I had almost finished redoing all the BLAST analysis of USS variation using the new settings when I realized that my changes to the analysis had created a new problem.
Originally I had set the probability cutoff for the BLAST searches to 10e-10, because I wanted to exclude alignments to USS sequences that weren't true homologs (i.e. to sequences from unrelated positions in the genome that were similar because they were also USSs, not because they both descended from the same ancestral USS). This constraint meant that I got sequences with only 1 (or no) internal mismatches, and I was quite confident that these would all be true homologs. (In retrospect I might have been overconfident.)
After I did the first control analysis (of randomly selected non-USS sequences) and discovered that my BLAST searches were producing patterns unrelated to USS, I lowered the probability cutoff to 10e-4. This meant that I often got several different genomic sequences aligned to the same 39 bp USS sequence. I know that only one of these could be the true homolog, and I assumed that the best match would indeed be the true homolog (I had worked out an easy way to eliminate all but the best match). This would have been true if (1) all of the USS sequences in the H. influenzae Rd genome did indeed have true homologs in each of the other H. influenzae strain genomes, and if (2) I hadn't been searching the two strands of the genome separately. But (1) a small fraction almost certainly don't, and (2) I was, so a significant fraction of the alignments I was scoring were to the wrong sequences.
This wouldn't have been a big deal if including wrong sequences just introduced noise into the analysis, lowering its ability to detect differences caused by the USSs. But a long and heated discussion with one of the postdocs had helped me realize that including non-homologous USSs would have precisely the effect I was attributing to homologous USSs - reducing the amount of variation at USS positions relative to non-USS positions.
I had been thinking that the low probability cutoff (really 'high', as 10e-4 is higher than 10e-10) was good because it produced a set of alignments with more mismatched positions and thus more variation to feed into the analysis. But some of this variation was due to mistaken alignments, so I decided to go back to the original sets of alignments and redo the analysis, using only those that had no more than three mismatched positions. This eliminated almost all of the non-homologous alignments.
The graphs above show the difference between the two analyses for one of the three non-Rd genomes. I've cut off the 5 non-USS positions at each side, so these are only 29 bp wide. The 'better analysis' has lower genetic variation (Y axis goes to 3, not 4), because the alignments I had eliminated were the ones with the most variation (some of it due to non-homologous alignments), but the pattern is the same.
This is reassuring for two reasons. First, the pattern I saw in the earlier analyses (low variation at USS-motif positions) is still there, and still very strong, so I know it wasn't due to including non-homologous alignments. Second, because eliminating most of the problem alignments hasn't caused the pattern to become much weaker, I can be confident that any non-homologous alignments that might have slipped past the new more stringent cutoff aren't responsible for the pattern.
I redid the controls too with the new cutoff, and they give baseline variation of about 1.3%, with standard deviation of about 0.15%. So the least variable USS positions have much less variation than the rest of the genome, and the most variable have about twice as much. This high variation may be simply because the set of USS sequences has overall more variation than the rest of the genome. This is expected to be the case because USS are preferentially located in poorly constrained genomic positions such as non-coding regions and non-essential genes.
Now we just have to get the revised manuscript back to the journal editors before they lose patience with us.
- Home
- Angry by Choice
- Catalogue of Organisms
- Chinleana
- Doc Madhattan
- Games with Words
- Genomics, Medicine, and Pseudoscience
- History of Geology
- Moss Plants and More
- Pleiotropy
- Plektix
- RRResearch
- Skeptic Wonder
- The Culture of Chemistry
- The Curious Wavefunction
- The Phytophactor
- The View from a Microbiologist
- Variety of Life
Field of Science
-
-
From Valley Forge to the Lab: Parallels between Washington's Maneuvers and Drug Development4 weeks ago in The Curious Wavefunction
-
Political pollsters are pretending they know what's happening. They don't.4 weeks ago in Genomics, Medicine, and Pseudoscience
-
-
Course Corrections5 months ago in Angry by Choice
-
-
The Site is Dead, Long Live the Site2 years ago in Catalogue of Organisms
-
The Site is Dead, Long Live the Site2 years ago in Variety of Life
-
Does mathematics carry human biases?4 years ago in PLEKTIX
-
-
-
-
A New Placodont from the Late Triassic of China5 years ago in Chinleana
-
Posted: July 22, 2018 at 03:03PM6 years ago in Field Notes
-
Bryophyte Herbarium Survey7 years ago in Moss Plants and More
-
Harnessing innate immunity to cure HIV8 years ago in Rule of 6ix
-
WE MOVED!8 years ago in Games with Words
-
-
-
-
post doc job opportunity on ribosome biochemistry!9 years ago in Protein Evolution and Other Musings
-
Growing the kidney: re-blogged from Science Bitez9 years ago in The View from a Microbiologist
-
Blogging Microbes- Communicating Microbiology to Netizens10 years ago in Memoirs of a Defective Brain
-
-
-
The Lure of the Obscure? Guest Post by Frank Stahl12 years ago in Sex, Genes & Evolution
-
-
Lab Rat Moving House13 years ago in Life of a Lab Rat
-
Goodbye FoS, thanks for all the laughs13 years ago in Disease Prone
-
-
Slideshow of NASA's Stardust-NExT Mission Comet Tempel 1 Flyby13 years ago in The Large Picture Blog
-
in The Biology Files
Not your typical science blog, but an 'open science' research blog. Watch me fumbling my way towards understanding how and why bacteria take up DNA, and getting distracted by other cool questions.
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment
Markup Key:
- <b>bold</b> = bold
- <i>italic</i> = italic
- <a href="http://www.fieldofscience.com/">FoS</a> = FoS