Field of Science

Maybe the spacing is quite random after all

I think I've finally finished analysing the spacing of real uptake sequences. Here's the figure. (Hmm, Blogger has squished it, but it's still legible.)

The main histograms show the number of uptake-sequence spacings in each 100-bp bin, for N. meningitidis DUSs (blue bars) and H. influenzae USSs (red bars). The blip at the far end of each histogram is the number of spacings greater than 5000 bp.
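
For anyone who wants to reproduce this kind of histogram, here is a minimal Python sketch of the binning. It assumes the uptake sequences have already been located and reduced to their center coordinates, and that the chromosome is circular, so the gap from the last site back around to the first is counted. The function name and parameters are illustrative only, not the code used for the figure.

```python
from collections import Counter


def spacing_histogram(centers, genome_length, bin_size=100, cap=5000):
    """Bin center-to-center spacings between adjacent uptake sequences.

    Assumes a circular chromosome, so the wrap-around gap from the last
    site to the first is included. Returns (bins, overflow): bins maps the
    lower edge of each bin (0, 100, 200, ...) to a count, and overflow
    counts spacings greater than `cap` bp.
    """
    pos = sorted(centers)
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    gaps.append(genome_length - pos[-1] + pos[0])  # wrap-around gap
    bins = Counter()
    overflow = 0
    for gap in gaps:
        if gap > cap:
            overflow += 1  # the "blip" at the far end of the histogram
        else:
            bins[(gap // bin_size) * bin_size] += 1
    return dict(bins), overflow


# Toy example: four sites on a hypothetical 10 kb circular genome.
bins, overflow = spacing_histogram([10, 50, 400, 6000], genome_length=10_000)
print(sorted(bins.items()), overflow)  # [(0, 1), (300, 1), (4000, 1)] 1
```

Each site contributes exactly one spacing (to its downstream neighbour), so the bar heights plus the overflow blip always add up to the number of uptake sequences.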

The first bar of each histogram is a lot higher than its neighbours (especially the DUS one, which would be twice as high if I'd drawn it to scale). This tells us that each genome has a lot more closely spaced uptake sequences than expected from the spacings of the others. We already knew this from published work, and, as expected, almost all of these closely paired uptake sequences are in inverted orientation, allowing them to act as transcriptional terminators.
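
The orientation check can be sketched the same way. This minimal version assumes each uptake sequence is recorded as a hypothetical (center, strand) tuple with strand '+' or '-', treats two adjacent sites as a close pair when their centers fall within the first 100-bp bin, and ignores the wrap-around pair for simplicity. The representation and names are assumptions for illustration.

```python
def close_pair_orientations(sites, max_distance=100):
    """Count adjacent close pairs by relative orientation.

    `sites` is a list of (center, strand) tuples, strand '+' or '-'.
    A pair counts as close if the centers are less than `max_distance`
    bp apart. Returns (n_inverted, n_same_direction).
    """
    ordered = sorted(sites)
    inverted = same = 0
    for (p1, s1), (p2, s2) in zip(ordered, ordered[1:]):
        if p2 - p1 < max_distance:
            if s1 != s2:
                inverted += 1  # opposite strands: an inverted repeat that can fold into a stem
            else:
                same += 1
    return inverted, same


sites = [(10, '+'), (30, '-'), (500, '+'), (520, '+'), (2000, '-')]
print(close_pair_orientations(sites))  # (1, 1)
```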

The inset in each graph shows the spacings of the uptake sequences that are within 50 bp of each other (center-to-center distance). The N. meningitidis DUSs are mostly 15-25 bp apart, allowing the 12-bp uptake sequences to base-pair as a stem with a 3-13-nucleotide loop. But the H. influenzae USS 'pairs' are much closer together, effectively sitting one on top of the other. We think these sequences probably shouldn't be considered as pairs of uptake sequences (as in the published analysis), but as single palindromic uptake sequences that can act in both directions. Here are two examples: in the first, the centers of the 'two' uptake sequences are separated by 1 bp; in the second, by 11 bp.
nAAAGTGCGGTnAAATTTnnnnnnAATTTT
TTTTTAnnnnnnTTTAAAnTGGCGTGAAAn

nAAAGTGCGGTnAAATTTnnnnnnAATTTTnnnnnnnnnnnn
nnnnnnnnnnnTTTTAAnnnnnnTTTAAAnTGGCGTGAAAnn
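
The loop lengths quoted above follow from simple arithmetic: if two inverted sites of length L have centers d bp apart, the number of bases between them is d - L, and a negative value means the sites overlap. The sketch below (hypothetical helper names) checks the DUS numbers with L = 12 and applies the same rule to the H. influenzae examples, taking L = 29, which is simply the length of the USS as written out in the examples (an assumption, not a published definition).

```python
def bases_between(center_distance, site_length):
    """Unpaired bases between two inverted sites whose centers are
    `center_distance` bp apart; a negative value means the sites overlap."""
    return center_distance - site_length


def describe_pair(center_distance, site_length):
    gap = bases_between(center_distance, site_length)
    if gap < 0:
        return f"sites overlap by {-gap} bp"
    return f"{site_length}-bp stem with a {gap}-nt loop"


# N. meningitidis DUS (12 bp), centers 15 and 25 bp apart.
print(describe_pair(15, 12))  # 12-bp stem with a 3-nt loop
print(describe_pair(25, 12))  # 12-bp stem with a 13-nt loop
# H. influenzae USS as written in the examples (29 bp, assumed), centers 1 and 11 bp apart.
print(describe_pair(1, 29))   # sites overlap by 28 bp
print(describe_pair(11, 29))  # sites overlap by 18 bp
```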

Once I'd done this analysis I realized that I should take these pairs into account in deciding whether the spacing of the rest of the uptake sequences is random. The black line on each graph shows the average spacings of the same number of random positions as there are non-paired uptake sequences (averaged over 10 genomes' worth of random positions). Overall the lines fit the histograms quite well. The darker part of the first bar on each graph is the number of unpaired uptake sequences in the first 100-bp bin; there are slightly fewer of these than the random positions predict, but that's the only anomaly.
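
The random-position comparison can be sketched as follows. This is a minimal version under stated assumptions: both members of any pair whose centers are within 50 bp are simply dropped, positions are drawn uniformly without replacement on a circular genome, and the expected count in each bin is averaged over 10 simulated genomes. The function names are hypothetical, and the example input is toy data.

```python
import random
from collections import Counter


def drop_close_pairs(centers, genome_length, pair_cutoff=50):
    """Keep only sites with no neighbour within `pair_cutoff` bp (circular genome assumed)."""
    pos = sorted(centers)
    n = len(pos)
    kept = []
    for i, p in enumerate(pos):
        prev_gap = (p - pos[i - 1]) % genome_length      # wraps for the first site
        next_gap = (pos[(i + 1) % n] - p) % genome_length  # wraps for the last site
        if min(prev_gap, next_gap) > pair_cutoff:
            kept.append(p)
    return kept


def expected_random_spacings(n_sites, genome_length, n_genomes=10,
                             bin_size=100, cap=5000, seed=None):
    """Average spacing histogram of n_sites uniformly random positions.

    Keys are bin lower edges (0, 100, ...) plus 'overflow' for gaps > cap.
    """
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_genomes):
        pos = sorted(rng.sample(range(genome_length), n_sites))
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        gaps.append(genome_length - pos[-1] + pos[0])  # wrap-around gap
        for gap in gaps:
            key = "overflow" if gap > cap else (gap // bin_size) * bin_size
            totals[key] += 1
    return {key: count / n_genomes for key, count in totals.items()}


real_sites = [100, 120, 1000, 5000, 7300]  # toy centers, not real data
unpaired = drop_close_pairs(real_sites, genome_length=10_000)
print(unpaired)  # [1000, 5000, 7300]
expected = expected_random_spacings(len(unpaired), 10_000, seed=1)
```

Sampling a `range` without replacement guarantees that no two random positions coincide; the fixed `seed` only makes the toy run reproducible.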

This isn't the result I was expecting, so I'll have to do more thinking about how we present the spacing analysis of the simulated genomes.