One question about the accumulation of USS sequences in genomes is the extent to which they interfere with coding for proteins and other 'real' functions of the DNA. I've calculated that USSs constrain 2-3% of the H. influenzae genome, taking into account the two flanking segments and also the strength of the consensus at these places. That was done years ago, and I probably should redo the calculation, especially as I'm not sure I can even find the original notes.
Seven or eight years ago we started working on this in collaboration with a theoretical physicist (turned bioinformatician) in Taiwan. One of his grad students did extensive analysis but has since gone on to other things, and his supervisor says we're free to finish the analysis and publish it without including them. So I've arranged with our current bioinformatics collaborators to redo the analysis, incorporating various improvements made possible both by our improved understanding of the issues and by the availability of more sequences to analyze.
This is a nice change from most of our work, in that we are starting out with a very good idea of what the results will look like. Not the details, but the general shape of things. I took advantage of this to write much of the paper in advance of getting the results the paper will describe. I made fake figures showing what the data will probably look like, and considered different ways we might present it. Then I sent the whole draft paper off to the collaborators, so they could see where their work is going. Now I'm sitting back waiting for them to do the heavy lifting of generating the data.
What are the main findings? We already know that in H. influenzae and Neisseria meningitidis the USSs are preferentially found in the non-coding regions (these make up only about 10% of the genome). In H. influenzae about 35% of USSs are in non-coding regions, and in N. meningitidis about 60%. We'll check the ratios for other genomes too. We hypothesize that this is because USSs constrain the ability of genes to code for the best amino acids.
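To make 'preferentially found' concrete, here is a rough back-of-envelope sketch in Python that turns the approximate percentages above into a density ratio (USSs per bp of non-coding DNA divided by USSs per bp of coding DNA). The function name is made up for illustration, and the inputs are just the round numbers quoted above, not new data.

```python
def noncoding_enrichment(frac_uss_noncoding, frac_genome_noncoding):
    """Ratio of USS density in non-coding DNA to USS density in coding DNA.

    Both arguments are fractions between 0 and 1 (exclusive).
    """
    # USS density in each compartment, relative to the genome-wide average
    density_noncoding = frac_uss_noncoding / frac_genome_noncoding
    density_coding = (1 - frac_uss_noncoding) / (1 - frac_genome_noncoding)
    return density_noncoding / density_coding


# Approximate figures from the text: ~10% of each genome is non-coding
for name, frac in [("H. influenzae", 0.35), ("N. meningitidis", 0.60)]:
    print(name, round(noncoding_enrichment(frac, 0.10), 1))
```

With these round numbers the non-coding DNA comes out with roughly 5 times the USS density of the coding DNA in H. influenzae, and more than 13 times in N. meningitidis.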
The big analysis is done on the USSs that ARE in the coding regions, because here we can determine true sequence homology with other bacteria. We can thus use sequence alignments to find out the degree to which USSs avoid the most highly conserved (= most functionally constrained) parts of proteins. The result is that USSs are preferentially found in two kinds of places. The first is parts of proteins that show little evidence of functional constraint - for example the flexible hinges and linkers between domains. The second is parts of proteins where USSs don't change the amino acids; i.e. where the USS specifies the same amino acids that are optimal anyway. We can also analyze these USSs by the kind of proteins (or RNAs) the different genes produce - USSs are preferentially found in the less important proteins. And we can check whether the protein-coding part of the genome has spare places where USSs could be put without changing the amino acid sequence of the protein. H. influenzae has quite a few of these (I forget the numbers).
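The 'spare places' check could be sketched like this. It is only my illustration of the idea, not the collaborators' actual analysis: it assumes the 9 bp core AAGTGCGGT as the H. influenzae USS (the core isn't spelled out in this post), uses the standard genetic code, ignores the flanking segments, and all the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

USS_CORE = "AAGTGCGGT"  # assumed H. influenzae core; not given in the post


def translate(dna):
    """Translate in frame 0, ignoring any trailing partial codon."""
    end = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, end, 3))


def revcomp(dna):
    """Reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def silent_sites(cds, motif=USS_CORE):
    """Positions where writing `motif` into `cds` leaves the protein unchanged.

    Positions that already carry the motif are skipped, since they are
    not 'spare'.
    """
    protein = translate(cds)
    sites = []
    for i in range(len(cds) - len(motif) + 1):
        if cds[i:i + len(motif)] == motif:
            continue
        edited = cds[:i] + motif + cds[i + len(motif):]
        if translate(edited) == protein:
            sites.append(i)
    return sites


# Toy gene: M K C G stop. Lys-Cys-Gly can also be spelled AAG TGC GGT.
toy = "ATGAAATGTGGCTAA"
print(silent_sites(toy))                     # forward orientation
print(silent_sites(toy, revcomp(USS_CORE)))  # USS on the other strand
```

For a real genome one would loop over every annotated gene and both orientations, and probably score near-matches to the full consensus rather than demanding the perfect core.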
Hmm, writing this overview is giving me better ideas of how the paper should be organized.
Not your typical science blog, but an 'open science' research blog. Watch me fumbling my way towards understanding how and why bacteria take up DNA, and getting distracted by other cool questions.
2 comments:
I spoke with someone yesterday about using WestGrid. Apparently it is not possible (?) to have the user supply the arguments (such as genome size, GC content, etc.) at the start of a run. So I think I should write these into the code. This will also make automation much easier. We can use several different mutation rates, conversion rates, and so on. I guess we should talk more about this, and about the definition of equilibrium.
I keep coming back to preferred codon usage in Hi, but in no clear way, and I need to do a lot more reading! But I wonder if the perfect USS can be thought of as an optimum in that it doesn't code for any particularly rare amino acids in any of its possible frames. Also, is it right to think of the perfect USS as being at the top of a fitness peak, and maybe a wonderful 3-off on the next peak (no rare amino acids in any frame etc.) has little chance of arising because of the combined effects of receptor preference and deleterious one-off and two-off changes?