I spent much of yesterday wrestling with a USS (uptake signal sequence) result generated by our bioinformatics colleague. It was counterintuitive, in that it seemed to disagree with both our prediction from basic principles and the results of a related analysis. She had a plausible hypothesis that would explain this result, and my wrestling was with how to present the result in our paper, and how to test her hypothesis.
We're quite confident in our prediction that USSs should be more common in less important parts of the genome. This was originally proposed by Ham Smith's group many years ago, and the various components of the paper we're preparing support this. In particular, the colleague has sent me data showing that proteins with no USSs in their genes have sequences that are more strongly conserved across different bacterial families than do proteins with 1, 2 or ≥3 USSs in their genes.
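The comparison described above amounts to binning genes by their USS count (0, 1, 2 or ≥3) and averaging a conservation score within each bin. Here is a minimal Python sketch. It assumes each protein has already been reduced to a (USS count, conservation score) pair. The function names are hypothetical, and this is not the colleague's actual pipeline.

```python
from statistics import mean


def uss_class(n_uss: int) -> str:
    """Bin a gene's USS count into the classes used in the post: 0, 1, 2, >=3."""
    if n_uss < 0:
        raise ValueError("USS count cannot be negative")
    return ">=3" if n_uss >= 3 else str(n_uss)


def mean_conservation_by_class(proteins):
    """proteins: iterable of (n_uss, conservation_score) pairs.

    Returns {class_label: mean score} for every class that has data.
    """
    groups = {}
    for n_uss, score in proteins:
        groups.setdefault(uss_class(n_uss), []).append(score)
    return {label: mean(scores) for label, scores in groups.items()}


# Made-up toy input, purely to show the shape of the output.
toy = [(0, 80.0), (0, 70.0), (1, 60.0), (4, 50.0), (3, 40.0)]
print(mean_conservation_by_class(toy))  # {'0': 75.0, '1': 60.0, '>=3': 45.0}
```

The prediction in the post corresponds to the "0" class having the highest mean score.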
I'd better give 'conserved' an explicit definition here. When we find that the sequences of two proteins are similar, we need to decide whether this is just a coincidence. If the similarity is too strong to be coincidence we conclude that the genes coding for these proteins must have had a common ancestor, and we say that the proteins are homologous (similar due to shared ancestry). (I'm taking the liberty of ignoring convergence as an explanation.) Homologous sequences that have remained very similar despite very long periods of independent evolution since their common ancestor lived are said to be highly conserved; this conservation is usually due to natural selection for maintenance of an important function. Sequences that have become very different (but still too similar to be a coincidence) are said to be only weakly conserved, and these sequences usually have less important functions.
Conservation scores in our analysis were measured as the % identical amino acids in BLAST alignments. This test only used those proteins that had good homologs in each of the three other bacterial genomes used for the test, with 'good' being defined as having a BLAST "E-value" of less than 10e-9. The surprising result comes from examination of proteins that didn't meet this criterion.
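This test has two ingredients: a percent-identity score, and a filter requiring a good homolog in all three test genomes. Both can be sketched in Python. The sketch makes several assumptions. Percent identity is taken over the full alignment length, including gap columns, which is roughly how BLAST's "Identities" line is reported. Each protein's best E-value in each test genome has already been extracted. The genome names are placeholders. The cutoff is left as a parameter, and the example reads the post's "10e-9" as 10^-9.

```python
GAP = "-"


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over the alignment length (gap columns included)."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != GAP)
    return 100.0 * identical / len(aligned_a)


def has_good_homologs(best_evalues: dict, genomes, cutoff: float) -> bool:
    """True only if the protein's best hit in EVERY test genome beats the cutoff.

    best_evalues maps genome name -> best E-value; a missing genome means no hit.
    """
    return all(g in best_evalues and best_evalues[g] < cutoff for g in genomes)


# Hypothetical genome names and E-values, for illustration only.
genomes = ["genome_A", "genome_B", "genome_C"]
best = {"genome_A": 1e-20, "genome_B": 3e-12, "genome_C": 1e-4}
print(percent_identity("MKVL", "MKIL"))             # 75.0
print(has_good_homologs(best, genomes, 1e-9))       # False: genome_C's hit is too weak
```

Only proteins for which `has_good_homologs` is true would get a conservation score in this analysis. Everything else falls into the group discussed next.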
A lot of proteins did not have good homologs in any of the three test genomes. I expected this to be because their sequences were only very poorly conserved, and so I expected these proteins to be even more likely to have USSs than the worst-conserved of those meeting the E<10e-9 criterion. Instead, proteins with no USSs were more likely to have no homologs than proteins with 1, 2, 3 or more USSs. It took me much of yesterday to get to the point where I could write the previous sentence. It seems clear now, but the nature of the data kept making my head spin. It wasn't just that I was particularly thick; it made the post-doc dizzy too.
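The tabulation behind this result can be sketched the same way. For each USS class, it computes the fraction of proteins whose best hit in every test genome misses the cutoff. This is a minimal sketch with hypothetical names. It assumes per-genome best E-values are available, and a genome missing from a protein's record means no hit was recorded.

```python
from collections import Counter


def uss_class(n_uss: int) -> str:
    """Bin a gene's USS count into the classes 0, 1, 2, >=3."""
    return ">=3" if n_uss >= 3 else str(n_uss)


def no_homolog_fraction_by_class(proteins, genomes, cutoff: float):
    """proteins: iterable of (n_uss, best_evalues) pairs, where best_evalues
    maps genome name -> best E-value.

    A protein counts as having 'no homologs' if no test genome gives a hit
    below the cutoff. Returns {class_label: fraction with no homologs}.
    """
    totals, no_hits = Counter(), Counter()
    for n_uss, best_evalues in proteins:
        label = uss_class(n_uss)
        totals[label] += 1
        if not any(best_evalues.get(g, float("inf")) < cutoff for g in genomes):
            no_hits[label] += 1
    return {label: no_hits[label] / totals[label] for label in totals}
```

In these terms, the surprising result is that the fraction for the "0" class came out higher than for the classes with USSs, the reverse of what the poor-conservation explanation predicts.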
The colleague's hypothesis is that many of the genes in the no-homologs class lack good homologs not because their homologs have diverged so far that they fall beyond the E<10e-9 cutoff, but because these genes were acquired by lateral gene transfer from distantly related bacteria, and thus have no homologs at all in the test genomes. I'm still wrestling with the best way to test this ('best' determined by the optimal combination of good data and not too much work).
Not your typical science blog, but an 'open science' research blog. Watch me fumbling my way towards understanding how and why bacteria take up DNA, and getting distracted by other cool questions.
1 comment:
You could lower your E-value cut-off. If "homologs" of these genes start appearing, you might be able to say that they are very divergent. I think the best test (which is probably too much work for what you want) is to look at synteny. Look at the location. Are the genomes without homologs missing a gene in that location? Do the genomes with homologs look like the gene came in recently? Although, if I recall correctly, these comparisons are between very divergent species, so synteny might be completely absent.
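The first suggestion above is to loosen the E-value cutoff and watch whether "homologs" appear. This is easy to sketch once each protein's best E-value per test genome has been recorded. The sketch assumes the BLAST searches were run permissively enough that weak hits were kept at all. The data structures and the cutoff series are hypothetical.

```python
def rescued_counts(best_evalues_per_protein, genomes, cutoffs):
    """For each cutoff, count proteins with a hit below it in at least one genome.

    best_evalues_per_protein: list of dicts mapping genome -> best E-value
    (a missing genome means no hit was recorded at any E-value).
    """
    result = {}
    for cutoff in cutoffs:
        result[cutoff] = sum(
            1
            for hits in best_evalues_per_protein
            if any(hits.get(g, float("inf")) < cutoff for g in genomes)
        )
    return result


# Hypothetical no-homolog proteins from the strict analysis.
proteins = [{}, {"A": 1e-6}, {"B": 1e-2}, {"C": 5.0}]
print(rescued_counts(proteins, ["A", "B", "C"], [1e-5, 1e-1, 10.0]))
# {1e-05: 1, 0.1: 2, 10.0: 3}
```

Proteins that pick up hits only at looser cutoffs would fit the very-divergent explanation. Proteins that never pick one up, like the empty record here, would fit the lateral-transfer explanation. However, hits at very loose E-values can also be coincidences.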