Field of Science

And the result is....

Between-strain variation in nucleotide sequence is greatly reduced at positions that are part of the USS motif. This is clearly seen in the figure to the left, where the blue bars representing the amount of variation at each position are short wherever the motif letters are tall (strong consensus). The error bars are the standard deviations of the six datasets (forward and reverse strands of the three readily available genomes). The analysis used about 5000 alignments.

For the flanking AT-rich segments the magnitude of the reduction is proportional to the strength of the motif consensus, but variation in the USS core is most strongly reduced in the initial As.

This result was predicted last year by one of the post-docs, but at that time I didn't see an easy way to test it. How should we interpret it? Perhaps it means that most of the USSs in the genome have evolved to the optimum compromise between the preference of the uptake machinery and the needs of within-cell functions. Why don't the differences in % variation map precisely onto the logo?
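For anyone who wants to repeat the per-position comparison with a script, here is a minimal Python sketch. It is not how the figure was actually made. It assumes each dataset is a list of gap-free, equal-length aligned sequence pairs (one USS region from each of two strains), counts the percentage of pairs that differ at each position, and then takes the mean and standard deviation across datasets. The function names and the toy sequences are invented for illustration.

```python
from statistics import mean, stdev


def percent_variation(pairs):
    """Percent of aligned pairs that differ at each position.

    `pairs` is a list of (seq_a, seq_b) strings of equal length.
    Assumption: any differing character counts as variation.
    """
    if not pairs:
        raise ValueError("need at least one aligned pair")
    length = len(pairs[0][0])
    mismatches = [0] * length
    for a, b in pairs:
        if len(a) != length or len(b) != length:
            raise ValueError("all aligned sequences must have the same length")
        for i, (x, y) in enumerate(zip(a.upper(), b.upper())):
            if x != y:
                mismatches[i] += 1
    return [100.0 * m / len(pairs) for m in mismatches]


def summarize(datasets):
    """Per-position mean and standard deviation across datasets."""
    per_dataset = [percent_variation(pairs) for pairs in datasets]
    columns = list(zip(*per_dataset))  # one tuple of values per position
    means = [mean(col) for col in columns]
    # stdev needs at least two values; report 0.0 for a single dataset
    sds = [stdev(col) if len(col) > 1 else 0.0 for col in columns]
    return means, sds


# Toy data only: two made-up datasets of two aligned pairs each.
toy_datasets = [
    [("AAGTGCGGT", "AAGTGCGGT"), ("AAGTGCGGT", "AAGTGCGGA")],
    [("AAGTGCGGT", "AAGTGCGCT"), ("AAGTGCGGT", "AAGTGCGGA")],
]
means, sds = summarize(toy_datasets)
for position, (m, s) in enumerate(zip(means, sds), start=1):
    print(position, round(m, 1), round(s, 1))
```

With real data each dataset would hold the alignments for one strand of one genome comparison, and the means and standard deviations would correspond to the bar heights and error bars.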


  1. Hi

    what kind of operating systems do you have available in your lab? It seems that with some simpler applications you would be able to "kill" the Excel-Word duo and do most of your things in one program only.



  2. I have a hard time thinking of an explanation for this without hypothesizing that USSs have a cellular function.

  3. So maybe there are two factors affecting the USS here - one is the uptake bias favouring certain sequences, but what if that generated a relatively diverse pool, and then certain of these have sequences that are more likely to get recombined into the genome? For example, if part of the subset lies at the end of the fragment taken up, and has an A-rich 5'-end, then perhaps local melting is more likely to allow recombination of this subset into the genome, relative to others. So, even if the uptake machinery is rather relaxed in that region, an apparent bias might build up in the genome for fragments that can recombine in more easily. Something like this might explain a greater degree of stringency of motif in the genome relative to what is measured at the point of uptake.

  4. Don't know about interpretation, but that's a lovely figure. Well done!

    The logo + variation graph looks like a pretty good agreement to me on the whole. Are the minor differences simply due to the analysis (different methods, different datasets) or a genuine puzzle?

  5. Paulo: We use Macs, so can run Unix at will. If I was willing to take the trouble to reactivate my fast-fading babytalk-Perl skills, I could probably have spent several days figuring out how to do the analysis in Perl. If I was willing to read the >350-page BBEdit manual, I might have worked out how to do it with BBEdit. If I was going to need to do the same analysis many times, it might have been worth paying someone to do the above for me.

    But the way I did it used the tools I already know, and got the job done. Not done in a technically sophisticated way, but done. Think of Monty Python and the coconuts.

    Heather and Lindsay: I think you're invoking complicating factors when we haven't yet evaluated whether reduced variation is the null expectation for sequences maintained by this kind of molecular drive.

  6. From what I understand, you are editing the file in Word to insert columns (tabs) and then opening the same file in Excel to remove those columns.

    You can do it with BBEdit easily. Most recent text editors allow you to select a column and delete/insert characters in it. Page 65 of the BBEdit manual shows how to do it.

    I asked about your systems because in Linux, apart from Emacs and Vi (which run on Mac too), you have the option of using Kate, which is far simpler to learn and use (though also less powerful).

