Now we have our Perl script to chop the genome sequence into short-enough fragments, and we have the motif-search program running on the fast Westgrid server. So I've been trying to run motif searches on the whole genome. It sort-of works. Actually, it almost works great!
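For anyone curious what the chopping step looks like, here is a minimal sketch in Python. It is not the Perl script mentioned above (that script isn't shown here); the function names, the fragment length and the overlap are all assumptions made for illustration. The overlap is there so that a motif sitting across a cut point still turns up whole in one fragment.

```python
def chop_sequence(sequence, fragment_length, overlap=0):
    """Split a sequence into fragments of at most fragment_length bases.

    Consecutive fragments share `overlap` bases. A motif of width w that
    spans a cut point is guaranteed to appear whole in one fragment only
    if overlap >= w - 1. Returns (start, fragment) pairs, 0-based starts.
    """
    if fragment_length <= 0:
        raise ValueError("fragment_length must be positive")
    if not 0 <= overlap < fragment_length:
        raise ValueError("overlap must be >= 0 and < fragment_length")
    step = fragment_length - overlap
    fragments = []
    for start in range(0, len(sequence), step):
        fragments.append((start, sequence[start:start + fragment_length]))
        # Stop once a fragment reaches the end of the sequence.
        if start + fragment_length >= len(sequence):
            break
    return fragments


def to_fasta(fragments, prefix="fragment"):
    """Format (start, fragment) pairs as FASTA text, one record each."""
    records = []
    for number, (start, fragment) in enumerate(fragments, start=1):
        records.append(f">{prefix}_{number} start={start}\n{fragment}")
    return "\n".join(records) + ("\n" if records else "")
```

Calling `chop_sequence(genome, 10000, overlap=30)` would give fragments of up to 10,000 bases; those numbers are placeholders, not the values the motif-search program actually requires.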
What works: First, it usually runs without quitting prematurely. Second, it produces what I think is the correct output. I say "I think" because I don't understand the statistical parts of the output. Third, this time I asked it to search the sequence for 2 motifs rather than 1, and even that seems to be working. Fourth, much of the time the output shows the pattern I was expecting to see: an alignment of hundreds of short sequences, each containing a sequence related to the previously characterized USS. I trim these down (using Word's search-and-destroy function) and paste them into WebLogo, which generates logos like the one above, summarizing the pattern. And it's fast - analyzing the whole genome takes only 5-10 minutes.
What isn't yet right: Sometimes it misses what should be the very significant motif and instead returns a weak motif that has nothing to do with the USS; I think this means I've set the stringency too low by telling it to expect too many sites with the motif. Often it returns only part of the USS motif, cutting off one side of the USS in favour of positions that show no evident similarity at all (when represented as WebLogos). This happens partly because it has decided not to fragment the motif into sub-motifs separated by non-consensus bases - I don't know why. The logo in this paragraph shows such a case. Compare it to the logo in the previous paragraph, and you see that the leftmost AT-rich part is missing. In both images the red underlining shows the positions that the motif search program decided had significant consensus; in both, the program has included positions with no consensus and left out positions further to the left that would have a strong consensus. It could have included these positions by fragmenting the motif, but it didn't.
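One way to put a number on "strong consensus" versus "no consensus" is the per-position information content that a sequence logo's letter heights are based on. The sketch below is a simplified version and only an illustration: it assumes all four bases are equally likely in the background and skips the small-sample correction that, as far as I understand, WebLogo applies. The function name and the example sites are made up.

```python
import math
from collections import Counter


def column_information(sites):
    """Information content (bits) of each column of aligned DNA sites.

    Uses 2 - H per column, where H is the Shannon entropy of the base
    frequencies. Assumes a uniform background and applies no
    small-sample correction. Characters other than A, C, G, T are ignored.
    """
    if not sites:
        return []
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    information = []
    for column in range(width):
        counts = Counter(site[column].upper() for site in sites)
        total = sum(counts[base] for base in "ACGT")
        if total == 0:
            information.append(0.0)
            continue
        entropy = -sum(
            (counts[base] / total) * math.log2(counts[base] / total)
            for base in "ACGT"
            if counts[base]
        )
        information.append(2.0 - entropy)
    return information
```

A column where every site has the same base scores 2 bits, and a column with all four bases equally represented scores 0. Running this over the trimmed alignment, extended a few positions to the left, would show whether the positions the program left out really carry more information than the ones it kept.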
The biggest problem is the mysterious segmentation fault error. If it's using the full genome sequence (1.83 megabases), and if I ask it to find a motif bigger than 18 bp, the program begins the analysis but stops after a few cycles, reporting a segmentation fault. Googling "segmentation fault" tells me that this is probably because some string has become too long (the program is trying to put too much information into some location). I'm going to have to read the all-too-terse instructions to see if I can find a way around this. If I can't, I'm hoping that the person who sent me the binary code will take pity on my ignorance and help me solve the problem. The worst case will be if there is no way around this, but even then I think I can still get the analysis I need - it will just take more work on my part, combining results from different parts of the genome.
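If it does come to combining results from different parts of the genome, the bookkeeping could look something like the sketch below. This is only a guess at the workaround, not something the motif-search program provides: it assumes each part was searched separately, that each run reports the same motif at the same width, and that the hits have already been pulled out of the output as (position within the part, site sequence) pairs. The function name and data layout are hypothetical.

```python
def merge_hits(part_results):
    """Combine motif hits from separately searched genome parts.

    part_results is a list of (part_offset, hits) pairs, where
    part_offset is the 0-based genome position at which the part starts
    and hits is a list of (local_start, site_sequence) pairs.
    Returns a sorted list of unique (genome_start, site_sequence) pairs,
    so a site found twice in the overlap between parts is kept once.
    """
    merged = set()
    for part_offset, hits in part_results:
        for local_start, site in hits:
            merged.add((part_offset + local_start, site))
    return sorted(merged)
```

For example, a site at position 95 of a part starting at 0 and the same site at position 5 of a part starting at 90 both map to genome position 95 and are counted once.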
Not your typical science blog, but an 'open science' research blog. Watch me fumbling my way towards understanding how and why bacteria take up DNA, and getting distracted by other cool questions.