RRResearch: The one big problem

Now that we have more data from our model and I've summarized it, we're left with one big problem: Why don't more uptake sequences accumulate in our simulated evolved genomes?

In all the generic simulations the bias favouring perfect matches to the 10 bp consensus uptake sequence (US) is very strong: perfect matches are favoured 100-fold over sequences with one mismatch to the consensus. If nothing else is limiting, I would expect the final evolved genomes to have about one of these perfect US in each uptake-sized segment. That is, if the fragments taken up in the run were 1000 bp, I would expect the evolved genome to have about one perfect US per 1000 bp; if the simulated fragments were 100 bp, I'd expect about one perfect US per 100 bp. But my impression is that we get far fewer than this. The outcome appears to be independent of mutation rate, so the explanation shouldn't be that accumulation is limited by mutational decay of uptake sequences.
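The expectation above is simple division: genome length over fragment length. A minimal sketch, using an illustrative 200 kb genome (the genome size here is an assumption chosen for the example, and `expected_perfect_us` is a hypothetical helper, not part of the model):

```python
# Back-of-envelope expectation from the text: roughly one perfect uptake
# sequence (US) per fragment-sized segment, if nothing else limits accumulation.


def expected_perfect_us(genome_bp: int, fragment_bp: int) -> float:
    """Expected number of perfect US at ~1 per fragment-length segment."""
    return genome_bp / fragment_bp


if __name__ == "__main__":
    for fragment_bp in (100, 1000):
        n = expected_perfect_us(200_000, fragment_bp)
        print(f"200 kb genome, {fragment_bp} bp fragments: ~{n:.0f} perfect US")
```

For a 200 kb genome this gives about 2000 perfect US with 100 bp fragments and about 200 with 1000 bp fragments.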

The other variable I could change is how quickly the bias is reduced when fewer than the desired number of fragments have recombined. At present this factor is 0.75. Increasing it would make the early cycles go slower, but with such a high mutation rate this isn't a big problem. I'll try a couple of runs using 0.9 and 0.95.
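The post doesn't spell out exactly how the 0.75 factor is applied. Assuming it multiplies the bias once per cycle in which too few fragments recombine (an assumption about the model, not its documented rule), the number of such cycles needed to relax the bias by a given amount grows quickly as the factor approaches 1. The 100-fold target below is purely illustrative, and `cycles_to_reduce` is a hypothetical helper:

```python
import math

# ASSUMPTION: the bias is multiplied by `factor` once per cycle in which fewer
# than the desired number of fragments recombine. This is an illustrative
# reading of the parameter, not the simulation's actual code.


def cycles_to_reduce(factor: float, fold: float) -> int:
    """Smallest number of multiplications by `factor` that cuts the bias `fold`-fold."""
    if not 0.0 < factor < 1.0:
        raise ValueError("factor must be strictly between 0 and 1")
    # Solve factor**n <= 1/fold for the smallest integer n.
    return math.ceil(math.log(1.0 / fold) / math.log(factor))


if __name__ == "__main__":
    for factor in (0.75, 0.9, 0.95):
        n = cycles_to_reduce(factor, 100)
        print(f"factor {factor}: {n} cycles for a 100-fold reduction")
```

Under this reading, a 100-fold relaxation takes 17 cycles at 0.75, 44 at 0.9 and 90 at 0.95, so the slower settings would stretch the early phase roughly two- to five-fold.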

I wrote 'my impression is' above because I haven't really gone back and rechecked or rerun the simulations that should have given the most accumulation. These would be ones where the amount of the genome replaced by recombination each generation is almost 100%, or where the genome mutation rate is set to zero. I'll look carefully at what's already done and queue up some new runs today.

This is a really important issue. We need to understand what is limiting US accumulation in the model, and I don't think we do yet.

Update: I've checked the runs my co-author did before she left. With 100 bp fragments, replacing 50% of the genome in each cycle, she always got fewer than one perfect US per kb. A run that started with 200 perfects seeded in a 200 kb synthetic sequence ended with 33, not 2000. A run that started with a random sequence ended with 18 perfects. These may not have been the final equilibria, but in the last half of both runs the numbers of perfects wandered between 10 and 100, never more. In these runs the mutation rates for the genomes were very low (the rates for the fragments were 0.001), so the problem isn't background mutational decay of perfect US (the numbers of singly mismatched US at equilibrium were also low, about 100-150).
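Putting the seeded run's numbers side by side as densities makes the shortfall concrete. This sketch uses only the figures given for that run (200 kb, 100 bp fragments, 200 seeded, 33 at the end) plus the one-perfect-US-per-fragment-length expectation; the random-start run is left out because its genome length isn't stated. `per_kb` is a hypothetical helper:

```python
# Arithmetic for the seeded run: 200 kb synthetic genome, 100 bp fragments,
# 200 perfect US seeded at the start, 33 at the end of the run.

GENOME_BP = 200_000
FRAGMENT_BP = 100


def per_kb(count: float, genome_bp: int) -> float:
    """Convert a count of perfect US into a density per kilobase."""
    return count / (genome_bp / 1000)


# Expectation from the text: ~1 perfect US per fragment-sized segment.
expected = GENOME_BP / FRAGMENT_BP

if __name__ == "__main__":
    for label, count in [("seeded", 200), ("final", 33), ("expected", expected)]:
        print(f"{label:>8}: {count:6.0f} perfect US = {per_kb(count, GENOME_BP):6.3f} per kb")
    print(f"final / expected = {33 / expected:.2%}")
```

The run went from 1 perfect US per kb down to about 0.17 per kb, against an expected 10 per kb, i.e. under 2% of the expected number.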

While I was traveling I did some runs with 500% of the genome replaced. In practice this means that most parts of the genome would have been replaced several times before the cycle ended, and only a few bits would not have been replaced at all. These runs had less sensitivity because the genomes were only 20 kb, but they did give higher scores, with about 30-40 perfect US in the 20 kb genome. With this much recombination, the mutation rate of the genome itself becomes irrelevant, because almost the whole mutated genome is replaced by recombination in each cycle.
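The "replaced several times, a few bits never" picture is what a Poisson approximation predicts if fragments land at uniformly random positions (an assumption about how the simulation places fragments). If the total length of recombined fragments is c times the genome length (c = 5 for 500% replacement, and c = 0.5 if 50% is read the same way), the expected fraction of the genome never replaced in a cycle is about e^(-c). The sketch compares that formula with a brute-force simulation on a circular genome; the 100 bp fragment length for the 20 kb runs is an assumption, and both functions are hypothetical helpers:

```python
import math
import random


def unreplaced_fraction_poisson(coverage: float) -> float:
    """Poisson estimate of the fraction of positions no fragment covers."""
    return math.exp(-coverage)


def unreplaced_fraction_sim(genome_bp: int, fragment_bp: int,
                            coverage: float, seed: int = 1) -> float:
    """Monte Carlo check: drop fragments at uniformly random start positions
    on a circular genome and count positions never covered."""
    rng = random.Random(seed)
    n_fragments = round(coverage * genome_bp / fragment_bp)
    hit = bytearray(genome_bp)  # 0 = never replaced, 1 = replaced at least once
    for _ in range(n_fragments):
        start = rng.randrange(genome_bp)
        for i in range(start, start + fragment_bp):
            hit[i % genome_bp] = 1  # wrap around the circular genome
    return 1 - sum(hit) / genome_bp


if __name__ == "__main__":
    for coverage in (0.5, 5.0):
        est = unreplaced_fraction_poisson(coverage)
        sim = unreplaced_fraction_sim(20_000, 100, coverage)
        print(f"coverage {coverage}: Poisson {est:.4f}, simulated {sim:.4f}")
```

Since e^(-0.5) is about 0.61 and e^(-5) about 0.0067, roughly 61% of the genome escapes replacement in a cycle at coverage 0.5, versus well under 1% at 500% replacement.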

Should I repeat these runs, and do some with fragments only 50 bp long too? The only reason would be to increase my confidence that these are real equilibrium outcomes. I had terminated my runs when the scores seemed stable, but maybe they would creep up if left long enough (maybe using a seeded genome isn't a very good test of the true approach to equilibrium). The results were the same with mutation rates of 0.001 and 0.01, so I could use a rate of 0.01 and set it to run for 10,000 cycles, which is more than 10 times longer than my other runs lasted.
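"The scores seemed stable" can hide a slow upward creep. A minimal sketch of two hypothetical stopping rules for a series of perfect-US counts (one value per cycle); neither comes from the actual simulation code, and the window size and tolerance are arbitrary illustrative defaults:

```python
# Two HYPOTHETICAL stopping rules for a series of perfect-US counts,
# one value per cycle. Neither is taken from the actual simulation code.


def has_plateaued(counts, window=100, tolerance=0.05):
    """Window rule: stable if the mean of the last `window` cycles is within
    `tolerance` (as a fraction) of the mean of the `window` cycles before it."""
    if len(counts) < 2 * window:
        return False
    recent = sum(counts[-window:]) / window
    previous = sum(counts[-2 * window:-window]) / window
    if previous == 0:
        return recent == 0
    return abs(recent - previous) / previous <= tolerance


def trend_per_1000_cycles(counts):
    """Trend rule: least-squares slope of count vs cycle over the last half
    of the run, expressed as change in count per 1000 cycles."""
    half = counts[len(counts) // 2:]
    n = len(half)
    if n < 2:
        raise ValueError("need at least two cycles in the last half")
    x_bar = (n - 1) / 2
    y_bar = sum(half) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(half))
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    return 1000 * sxy / sxx


if __name__ == "__main__":
    # A run creeping upward by 0.01 perfect US per cycle (~10 per 1000 cycles).
    creep = [20 + 0.01 * i for i in range(2000)]
    print("window rule says stable:", has_plateaued(creep))
    print("trend per 1000 cycles:", round(trend_per_1000_cycles(creep), 2))
```

On a series rising by 0.01 perfect US per cycle, the window rule reports a plateau (the two 100-cycle means differ by only about 2.6%), while the slope shows the count still climbing by about 10 per 1000 cycles. That kind of slow creep is what a much longer run would expose.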

Field of Science
