The big goal is to simulate how uptake sequences accumulate in genomes of competent bacteria under the combined action of mutation pressure (a randomizing force) and biased uptake that prefers fragments containing these sequences. The model follows a single genome-sized sequence through repeated cycles in which:
1. Random segments of the genome are treated as if they were fragments in an external DNA pool released by descendants of the 'index genome'.
2. These fragments are scored for how well they match the ideal uptake sequence, and the best-scoring fragment is chosen for the uptake step.
3. In the conceptual meantime, the index genome itself undergoes random mutation, becoming the descendant index genome.
4. The chosen fragment's sequence replaces the homologous sequence in the descendant index genome.
5. This recombinant sequence becomes the new index sequence, and the cycle starts again at step 1.
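To make the steps of the cycle concrete, here is a minimal Python sketch of one pass through it. It is an illustration, not the program described in this post: the ideal uptake sequence (the H. influenzae core AAGTGCGGT), the scoring rule (number of matching positions in the best-matching window, one strand only), the uniform mutation model, and all function and parameter names are assumptions.

```python
import random

BASES = "ACGT"
USS = "AAGTGCGGT"  # assumed ideal uptake sequence (H. influenzae USS core)

def best_match_score(seq, motif=USS):
    """Matching positions in the best-matching window of seq (one strand only).
    Returns 0 if seq is shorter than the motif."""
    k = len(motif)
    return max((sum(a == b for a, b in zip(seq[i:i + k], motif))
                for i in range(len(seq) - k + 1)), default=0)

def mutate(genome, rate, rng):
    """Each base is redrawn uniformly with probability `rate` (so it is sometimes
    replaced by itself). A simplification: it does not target a base composition."""
    return "".join(rng.choice(BASES) if rng.random() < rate else b for b in genome)

def run_cycle(genome, n_fragments, frag_len, mut_rate, rng):
    # Step 1: random segments of the index genome stand in for the external DNA pool.
    starts = [rng.randrange(len(genome) - frag_len + 1) for _ in range(n_fragments)]
    # Step 2: score each fragment and keep the best (ties go to the first sampled).
    best = max(starts, key=lambda s: best_match_score(genome[s:s + frag_len]))
    fragment = genome[best:best + frag_len]
    # Step 3: the index genome mutates, becoming the descendant genome.
    descendant = mutate(genome, mut_rate, rng)
    # Steps 4-5: the fragment replaces the homologous stretch of the descendant,
    # and the recombinant is returned as the next index genome.
    return descendant[:best] + fragment + descendant[best + frag_len:]

rng = random.Random(1)
genome = "".join(rng.choice(BASES) for _ in range(2000))
for _ in range(50):
    genome = run_cycle(genome, n_fragments=20, frag_len=100, mut_rate=0.001, rng=rng)
print(len(genome), genome.count(USS))
```

With `mut_rate=0` a cycle returns the genome unchanged, because the chosen fragment is copied back into the stretch it came from. That makes a handy sanity check.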
There have been lots of issues to resolve along the way (how the mutation steps maintain the base composition of the sequence, how uptake sequences are scored), but we finally have a program that runs. It seems to be working correctly, but the undergraduate who's done most of the work tells me that it isn't causing any uptake sequences to accumulate. He's quite a sophisticated undergraduate (he has a biochemistry degree under his belt and is nearly finished with a second degree, in computer science), and he's done a lot of statistical analysis looking for the expected accumulation.
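One simple way for a mutation step to maintain base composition is to draw each replacement base from the target composition rather than uniformly. This is a hypothetical approach, since the post doesn't say how the program actually does it. The expected frequency of base b after one round is then (1 - rate) * current_b + rate * target_b. A genome already at the target composition therefore stays there on average, and any other genome converges toward it. The function names and the 38% G+C target below are illustrative assumptions.

```python
import random
from collections import Counter

def mutate_preserving_composition(genome, rate, freqs, rng):
    """Mutate each position with probability `rate`, drawing the new base from
    `freqs` (base -> frequency). Expected frequency of base b afterwards is
    (1 - rate) * current_b + rate * freqs[b]."""
    bases = list(freqs)
    weights = [freqs[b] for b in bases]
    return "".join(rng.choices(bases, weights=weights)[0] if rng.random() < rate else b
                   for b in genome)

def composition(seq):
    """Fraction of each base in seq."""
    counts = Counter(seq)
    return {b: counts[b] / len(seq) for b in "ACGT"}

# Hypothetical AT-rich target composition (38% G+C), for illustration only.
freqs = {"A": 0.31, "C": 0.19, "G": 0.19, "T": 0.31}
rng = random.Random(42)
start = "A" * 100_000
after = mutate_preserving_composition(start, 0.5, freqs, rng)
print(composition(after))  # A is expected near 0.5 + 0.5 * 0.31 = 0.655
```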
I suspect the problem is that so far the model has been run with inappropriate parameters (mutation rates too low or too high? uptake bias too weak or too fussy? too few cycles?). Today's goal is to figure out what appropriate values might be.