Field of Science

Interaction effects in uptake bias and in the genome

Well, the postdoc and I continue to struggle with our revisions to his manuscript about the sequence bias of the Haemophilus influenzae DNA uptake machinery.  Quite a bit of the struggle is with each other, as we each try to clarify what we think.

One issue that's just come up is how interactions between bases at different positions of the preferred sequence motif will affect what sequences accumulate in the genome.

The top part of the figure below is a drawing of a double helix of DNA, with a specific sequence drawn on it, and below that are two 'sequence logos'.  The first one is the pattern derived from the uptake sequences in the genome, and below that is the pattern derived from the sequences that were preferentially taken up by the cells' uptake machinery.  The overall difference in height of the two logos isn't significant (they use sequences derived in very different ways), but the differences in the relative heights of the individual positions are.  For example, in the genomic logo all of the Gs on the left are about the same height, but in the uptake logo the first G is much smaller than the others.

One issue our paper needs to address is why these two logos are so different.

Both of these logos are derived by considering only how frequent each base (A, G, C or T) is at each position in the set of sequences being analyzed.  The analysis doesn't consider the actual sequences.  For example, the two sets of sequences in the figure below (made using WebLogo) give the same logo.  But the two sets of sequences are different; in the first set we have only strings of six As or six Ts, whereas in the second set the As and Ts are often interspersed or in strings of different lengths.
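Here is a minimal Python sketch of that point.  The two sets below are made up for illustration (they are not the sequences in the figure), and `position_frequencies` is just a hypothetical helper name.  A logo is built from exactly this kind of per-position frequency table, so two sets with identical tables give identical logos.

```python
from collections import Counter

def position_frequencies(seqs):
    """For each position, return a dict mapping base -> frequency at that position."""
    n = len(seqs)
    freqs = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        freqs.append({base: counts[base] / n for base in "ACGT"})
    return freqs

# Illustrative Set 1: only runs of six As or six Ts.
set1 = ["AAAAAA", "TTTTTT"] * 4

# Illustrative Set 2: As and Ts interspersed, chosen so that every
# position is still half A and half T.
set2 = ["ATATAT", "TATATA", "AATTAA", "TTAATT",
        "AAATTT", "TTTAAA", "ATTAAT", "TAATTA"]

same_logo = position_frequencies(set1) == position_frequencies(set2)
same_sequences = set(set1) == set(set2)
print("same per-position frequencies:", same_logo)
print("same sequences:", same_sequences)
```

The frequency tables match even though the two sets share no sequences, which is why the logo alone can't tell the two sets apart.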

The postdoc has done a detailed analysis of the actual sequences taken up by the cells (see figure in this post), to find out the importance to uptake of the interaction effects that the logo analysis doesn't consider.  We were both thinking that these interaction effects might be responsible for at least part of the difference between the uptake-bias logo and the genomic logo.

But one of the reviewers of the version we originally submitted said that we were wrong: "If the consensus in the genome reflects only the incoming DNA and the filtering at the outer membrane (as the authors state) then the two consensus should be similar with or without interaction effects because the genomic consensus is the simple result of the initial consensus."  I've thought about this today, and I now think the reviewer is correct.

Let's consider two simple situations for an imaginary uptake machinery whose preferred sequences gave the A&T logo above.  In Situation 1, the actual sequences were those in Set 1, and we would conclude that there were strong interaction effects between the positions because the machinery preferred a sequence where six Ts in one strand were basepaired with six As in the other strand.  In Situation 2, the actual sequences were those in Set 2, and we would conclude that the uptake machinery preferred a string of six A:T basepairs but didn't care which base was in which strand at any position.

Now let's imagine that species exist with each of these uptake biases, and that each uptake bias is causing its preferred sequences to accumulate in its species' genome (because these sequences come in as part of longer DNA fragments that often replace homologous sequences in the genome by recombination - this is our molecular drive model).  In Situation 1 the genome will accumulate strings of six As on one strand paired with six Ts on the other.  In Situation 2 the genome will accumulate strings of six A:T pairs in various orders.

Now we sequence the evolved genomes, collecting sets of the overrepresented sequences in each, and make logos of the sequences.  Both logos will look like the logo above.  To see how the interaction effects in the uptake bias affected the accumulated sequences in the genome, we'd have to do an interaction analysis of the genomic sequences.
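To make "interaction analysis" concrete, here is one simple way to look for dependence between positions: the mutual information between each pair of columns.  This is only a sketch under my own assumptions.  It is not necessarily the method used in our earlier genome analysis or in the postdoc's uptake analysis, the two sequence sets are invented stand-ins for the Situation 1 and Situation 2 genomes, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2

def mutual_information(seqs, i, j):
    """Mutual information (in bits) between positions i and j of aligned sequences."""
    n = len(seqs)
    p_i = Counter(s[i] for s in seqs)
    p_j = Counter(s[j] for s in seqs)
    p_ij = Counter((s[i], s[j]) for s in seqs)
    return sum(
        (c / n) * log2((c / n) / ((p_i[a] / n) * (p_j[b] / n)))
        for (a, b), c in p_ij.items()
    )

def mean_pairwise_mi(seqs):
    """Average mutual information over all pairs of positions."""
    pairs = list(combinations(range(len(seqs[0])), 2))
    return sum(mutual_information(seqs, i, j) for i, j in pairs) / len(pairs)

# Invented stand-ins for the overrepresented sequences in each evolved genome.
situation1 = ["AAAAAA", "TTTTTT"] * 4
situation2 = ["ATATAT", "TATATA", "AATTAA", "TTAATT",
              "AAATTT", "TTTAAA", "ATTAAT", "TAATTA"]

print("Situation 1 mean pairwise MI:", mean_pairwise_mi(situation1))
print("Situation 2 mean pairwise MI:", mean_pairwise_mi(situation2))
```

In the Situation 1 set, knowing the base at one position tells you the base at every other position, so each pair carries a full bit of information.  In the Situation 2 set the average is much lower, even though the two sets give the same logo.  With a set this small the individual values are noisy, so a real analysis would need many more sequences and some correction for sampling.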

Years ago we did an interaction analysis of the genome sequences; you can see the results in the last figure in this post from 2006.  It found only weak interactions, and only between adjacent or near-neighbour positions, very different from the interactions the postdoc has identified in the uptake bias.  More recently he applied his interaction analysis to the set of genomic uptake sequences, and he's now repeating it (that's easier than digging through his notes to find what it showed).
