Here are the logos for the N. meningitidis and H. influenzae uptake sequences after sorting the occurrences by the scores that the Gibbs Motif Sampler assigned them. (I'm pretty sure that each score is a measure of how well that occurrence's sequence matches the position weight matrix that Gibbs determined for this data set, but I don't know how the calculation is done.)
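To make the idea of "matching a position weight matrix" concrete, here is a minimal sketch of one common way to score a sequence against a matrix: a log-odds score relative to a background base composition. This is not the Gibbs calculation (which I said I don't know, and whose scores run from 0 to 1, so it is on a different scale); the function names, the pseudocount, the uniform background, and the toy sites are all assumptions made for illustration.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0):
    """Per-position base frequencies from aligned, equal-length sites.

    The pseudocount is an illustrative choice, not a Gibbs setting.
    """
    pwm = []
    for column in zip(*sites):
        counts = {b: pseudocount for b in BASES}
        for base in column:
            counts[base] += 1
        total = sum(counts.values())
        pwm.append({b: counts[b] / total for b in BASES})
    return pwm

def log_odds_score(seq, pwm, background):
    """Sum over positions of log2(P(base | motif) / P(base | background))."""
    return sum(math.log2(pwm[i][b] / background[b]) for i, b in enumerate(seq))

# Toy aligned sites (hypothetical data, not from the real data sets).
sites = ["GCCGTCTGAA", "GCCGTCTGAA", "GCCGTCTGAT", "ACCGTCTGAA"]
background = {b: 0.25 for b in BASES}  # assumed uniform background
pwm = build_pwm(sites)

for candidate in ["GCCGTCTGAA", "GCAGTATGAA"]:
    print(candidate, round(log_odds_score(candidate, pwm, background), 2))
```

A sequence that agrees with the matrix at every position gets a higher score than one with mismatches, and a mismatch at a strongly conserved position costs more than one at a variable position.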
The top set shows the logos for 5381 N. meningitidis DUSs. The numbers differ from those in yesterday's post because I realized I had been analyzing an N. gonorrhoeae data set. The overall picture is the same for N. meningitidis and N. gonorrhoeae: low-scoring DUSs retain a strong consensus for most of the central positions but have only very weak consensuses for the other positions. The drop-off is quite steep. The shapes of the logos are about the same for all the occurrences with scores lower than about 0.95.
The H. influenzae dataset is even more skewed; almost 60% of the USSs have perfect scores, and about 8% have zero scores. But the consensus decays fairly evenly across the positions, and even the zero-score occurrences have the full motif. Like the N. meningitidis DUS, the shapes of the USS logos are about the same for all occurrences with scores below 0.95.
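The sort-and-compare step behind these logos can be sketched as follows: group the occurrences by score, then compute the information content (the height of a logo column, in bits) at each position within each group. This is a rough sketch under stated assumptions, not the tool that drew the logos: the bin boundaries other than 0.95, the function names, and the toy scored sites are hypothetical, and no small-sample correction is applied.

```python
import math
from collections import Counter, defaultdict

def information_content(sites):
    """Bits per position: 2 minus the Shannon entropy of each column."""
    bits = []
    for column in zip(*sites):
        n = len(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in Counter(column).values())
        bits.append(2.0 - entropy)
    return bits

def score_bin(score):
    """Assign a score to one of four illustrative bins."""
    if score == 0.0:
        return "zero"
    if score < 0.95:
        return "0 < score < 0.95"
    if score < 1.0:
        return "0.95 <= score < 1"
    return "perfect"

# Toy (sequence, score) pairs; hypothetical, not real Gibbs output.
scored_sites = [
    ("AAGTGCGGT", 1.0), ("AAGTGCGGT", 1.0), ("AAGTGCGGA", 0.97),
    ("AAGTGCGGT", 0.5), ("ACGTGCGGT", 0.4),
    ("AAGTGAGGT", 0.0), ("TAGTGCGGT", 0.0),
]

groups = defaultdict(list)
for seq, score in scored_sites:
    groups[score_bin(score)].append(seq)

for label, seqs in groups.items():
    print(label, len(seqs), [round(b, 2) for b in information_content(seqs)])
```

Comparing the per-position bits between bins is a numerical version of comparing the logo shapes by eye: an even decay lowers every position together, while a skewed decay leaves a few positions near 2 bits and the rest near 0.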
I think the question in my mind was whether there is an obvious place to draw a line between 'real uptake sequence' and 'degenerate sequence that doesn't deserve to be treated as an uptake sequence'. Unfortunately the analysis is complicated by the different sizes of the datasets: the N. meningitidis set has almost twice as many sites as the H. influenzae set.
OK, I've dug out another set of H. influenzae runs, done with a high 'expected' setting to maximize the number of sites found. This has 3466 USSs, with a lot more having zero scores than in the previous set. Now the first and last Gs in the core are seen to be weaker in USSs with low scores, though not in the larger set of USSs with zero scores. Overall the consensus stays the same as the scores and consensus strengths decrease. Notably, the flanking AT-rich segments remain as important in poorly matched USSs as the core does.
5 comments:
I think this is telling us that there is no clear score cut-off for defining a non-degenerate US. Does this also tell us which positions are the most important for uptake and how well do they agree with Lindsay's data?
I think I'll do another post on how to interpret this, to clarify my muddled thinking about how the motif perspective fits with this data.
In Lindsay's data, changing the two Gs doesn't strongly affect uptake, but the effect is still less than some other positions that show stronger conservation in the low-score logos. (There are errors in the base labels in this figure; I'll need to check her notebooks.)
What is the "background" A+T content that Gibbs is using when calculating the strength of your motifs? Some Gibbs servers default to 60% A+T because Gibbs is mostly used to analyze promoter DNA. If this is the case, the overrepresentation of G+C in your weak Neisseria motifs may be an artifact of G+C bases being scored higher than A+Ts, when in fact the weak DUS sites in the genome vary equally at all positions. In other words, Gibbs may preferentially favour weak sites that differ at the A+T positions but maintain the G+C positions.
Could it help to use some simpler way to score the DUS sequences? I would focus on the 12 bp of the Neisseria DUS and then make lists of sequences allowing zero, one, two ... mismatches (using, for example, fuzznuc). Then one could sort the sequences and see if there is a high number of a certain sequence that does not fully resemble the consensus. At least then one deals with real sequences and not with some strange scores. I guess Neisseria itself would understand this way better than the Gibbs scores.
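The mismatch-list approach suggested in the comment above can be sketched in a few lines. This is only an illustration of the idea, not fuzznuc itself: the 12 bp consensus string is an assumption (the commonly cited Neisseria DUS 12-mer), the toy "genome" is invented, the function names are hypothetical, and only the forward strand is scanned.

```python
from collections import Counter

CONSENSUS = "ATGCCGTCTGAA"  # assumed 12 bp DUS consensus, for illustration

def mismatches(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def scan(genome, consensus, max_mismatches):
    """Return (position, window, mismatch count) for every forward-strand
    window within max_mismatches of the consensus."""
    k = len(consensus)
    hits = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        d = mismatches(window, consensus)
        if d <= max_mismatches:
            hits.append((i, window, d))
    return hits

# Toy sequence containing one perfect site and one single-mismatch site.
genome = "TTATGCCGTCTGAACCATGCCGTCAGAAGG"
hits = scan(genome, CONSENSUS, max_mismatches=2)

# How many sites at each mismatch level, and which variants recur most.
by_level = Counter(d for _, _, d in hits)
by_variant = Counter(window for _, window, _ in hits)
for level in sorted(by_level):
    print(level, "mismatches:", by_level[level])
print(by_variant.most_common(5))
```

Sorting `by_variant` this way would show whether any particular non-consensus sequence is unusually common, which is the question the comment raises.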
@Tim: I don't think the background base composition is the cause, because the changes are position-specific (e.g. some USS Gs get much weaker than others).
@Torsten: The reason I'm doing the Gibbs analysis is to get away from the erroneous repeat/mismatch view. The position-weight-matrix produced by the Gibbs analysis is much more consistent with how we think uptake sequences evolve. I'll try to clarify this in my next post.