Recently I've run some USS-evolution simulations that started with a 50 kb segment of the H. influenzae genome rather than with a random sequence of base pairs. I used the position weight matrix derived by Gibbs analysis of the whole genome, thinking that this would be a non-arbitrary measure of the over-representation of the uptake sequence pattern. I was hoping that this bias would be strong enough to maintain the uptake sequences already in the genome, but the genome score fell to about half (or less) of its original value after 50,000 or 100,000 cycles.
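To make the scoring idea concrete, here is a minimal sketch of scoring a sequence with a position weight matrix whose entries are per-position base frequencies. The matrix values, the motif length, and the scoring scheme (product of frequencies per window, summed over all windows) are illustrative assumptions, not the actual Gibbs-derived H. influenzae matrix or the simulation's real scoring rule.

```python
import random

BASES = "ACGT"

# Hypothetical 4-position frequency matrix: each entry gives the frequency
# of A, C, G, T at one motif position (each row sums to 1). The real
# Gibbs-derived USS matrix is much longer and has different values.
pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]

def window_score(window):
    """Product of the per-position base frequencies for one window."""
    score = 1.0
    for pos, base in enumerate(window):
        score *= pwm[pos][base]
    return score

def genome_score(seq):
    """Sum of window scores over every position in the sequence."""
    k = len(pwm)
    return sum(window_score(seq[i:i + k]) for i in range(len(seq) - k + 1))

random.seed(1)
segment = "".join(random.choice(BASES) for _ in range(1000))
print(genome_score(segment))
```

Under this scheme, a genome enriched in motif-matching windows scores higher than a random sequence of the same length, which is the sense in which a falling genome score indicates decay of the uptake sequences.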
That started me wondering whether the position weight matrix should be treated as a fixed set of values, or just as specifying the relative weight of each base at each position. Put another way, could different settings of the Gibbs analysis have given a very different matrix? The answer is no, but only because the Gibbs analysis reports the weight matrix as the frequency of each base at each position, so the weights of the four bases at each position must sum to 1. If we want to consider stronger or weaker matrices, there's no reason not to multiply all the values by whatever factor we want to test.
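The scaling idea can be sketched in a few lines. Note one consequence of multiplicative scoring (an assumption here, since the simulation's internal scoring rule isn't specified): multiplying every matrix entry by a factor f multiplies each k-bp window score by f**k, so how "strength" translates into uptake bias depends on how the simulation converts scores into uptake probabilities. The matrix below is a hypothetical 2-position example, not real data.

```python
# Hypothetical 2-position frequency matrix (each entry set sums to 1).
freq_matrix = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]

def scale_matrix(matrix, factor):
    """Multiply every entry by the same factor to test stronger or weaker biases."""
    return [{base: freq * factor for base, freq in col.items()} for col in matrix]

def score_window(matrix, window):
    """Product of per-position values for one window."""
    score = 1.0
    for col, base in zip(matrix, window):
        score *= col[base]
    return score

strong = scale_matrix(freq_matrix, 2.0)
# Doubling every entry scales each 2-bp window score by 2**2 = 4.
print(score_window(freq_matrix, "AG"), score_window(strong, "AG"))
```

Because the scaling is uniform, it leaves the *relative* ranking of windows unchanged; only the absolute magnitude of the bias changes, which is exactly what testing "stronger or weaker matrices" requires.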
So I think the first goal is to see what strength of matrix is needed to maintain the uptake sequences in the genome at their natural frequencies. Then we can see whether the uptake sequences are still in the places they started at, or whether these have decayed and new ones arisen in new places.