We can use the relationship between the genomic mutation rate the model is assuming and known real mutation rates for a rough calibration. Real mutation rates are on the order of 10^-9 per base pair per generation. Using such a low rate per simulation cycle would make our simulations take forever, so we've been using much higher rates (usually 10^-4 or higher) and treating each simulation cycle as collapsing the effects of many generations. But we didn't take the implications of this seriously.
If we do, we have each simulation cycle representing 10^5 generations. How much DNA uptake should have happened over 10^5 generations? We could assume that real bacteria take up about one fragment of about 1 kb every generation. The length is consistent with a recent estimate of the length of gene-conversion tracts in Neisseria, but the frequency is just a guess. I don't know whether it's a conservative guess, but if bacteria are taking up DNA as food it's probably on the low side of reality.
How much of this DNA recombines with the chromosome? For now let's assume it all does. This means that, in each cycle of our simulation, a simulated full-size genome would replace 10^5 kb of itself by recombination with incoming DNA. Because a real genome is only about 2 x 10^3 kb, each part of it would be replaced about 100 times in each cycle. We could cause this much recombination to happen in our model, but it wouldn't simulate reality because there wouldn't be any reproduction or DNA uptake between the multiple replacements of any one position.
We can make things more realistic by (1) assuming that only about 10% of fragments taken up recombine with the genome, and (2) decreasing the genomic mutation rate by a factor of 10, so each cycle only represents 10^4 generations. Now most of the genome gets replaced once in each cycle.
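The arithmetic behind this calibration can be sketched in a few lines of Python. This is only a back-of-envelope check, not part of the simulation program; the function and constant names are made up for illustration, and the inputs are the round numbers used above (one 1 kb fragment per generation, a 2 x 10^3 kb genome). With those inputs the exact quotients come out at 50 replacements per cycle for the first scenario and 0.5 for the revised one, the same order of magnitude as the "about 100" and "once" figures quoted above.

```python
# Back-of-envelope calibration; all names here are hypothetical.
REAL_MUTATION_RATE = 1e-9        # per bp per generation (order of magnitude)
GENOME_KB = 2e3                  # approximate size of a real genome, in kb
UPTAKE_KB_PER_GENERATION = 1.0   # guess: one 1 kb fragment per generation


def replacements_per_cycle(sim_mutation_rate, recombining_fraction):
    """Return (generations per cycle, times each position is replaced per cycle)."""
    # One cycle stands in for as many generations as it takes real
    # mutation to accumulate the simulated per-cycle rate.
    generations = sim_mutation_rate / REAL_MUTATION_RATE
    # DNA recombined into the genome during those generations, in kb.
    kb_recombined = generations * UPTAKE_KB_PER_GENERATION * recombining_fraction
    return generations, kb_recombined / GENOME_KB


# First scenario: rate 10^-4, every fragment recombines.
first = replacements_per_cycle(1e-4, 1.0)
# Revised scenario: rate 10^-5, only 10% of fragments recombine.
revised = replacements_per_cycle(1e-5, 0.1)

print("first:   %.0f generations, %.1f replacements per cycle" % first)
print("revised: %.0f generations, %.1f replacements per cycle" % revised)
```

Changing either assumption (the uptake frequency or the recombining fraction) scales the result linearly, so it is easy to see how sensitive the calibration is to the guessed uptake frequency.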
What about the fragment mutation rate? We might assume that, on average, the fragments that a cell takes up come from cells that are separated from it by about 5 generations. That is, the cell taking up the DNA and the cell the DNA came from had a common ancestor 5 generations back. This means that 10 generations of mutation separate the genome from the DNA it is taking up, so the fragment-specific mutation rate should be 10 times higher than the genomic mutation rate.
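The same reasoning as a small Python sketch, assuming (as above) that both lineages accumulate mutations at the genomic rate since their common ancestor. The function name is hypothetical and is not taken from the simulation program.

```python
def fragment_mutation_rate(genome_rate, generations_to_common_ancestor):
    """Mutation rate to apply to incoming fragments (hypothetical helper)."""
    # Donor and recipient have each diverged from the common ancestor,
    # so twice as many generations of mutation separate them.
    separating_generations = 2 * generations_to_common_ancestor
    return separating_generations * genome_rate


# Common ancestor 5 generations back, genomic rate 10^-5.
rate = fragment_mutation_rate(1e-5, 5)
print(rate)
```

With a common ancestor 5 generations back this gives a fragment rate 10 times the genomic rate; a more distant ancestor would raise the ratio proportionally.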
So I have a simulation running that uses a genome mutation rate of 10^-5 and a fragment mutation rate of 10^-4. The fragments are 1 kb long, and the cell considers 100 of these each cycle.
One other modification of the program matters here. We've now tweaked the program so that it can start a run either with a random-sequence 'genome' it has just created or with a DNA sequence we give it, which can be taken from a real genome with real uptake sequences.
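As an illustration of the two start-up options, here is a minimal Python sketch. It is not the actual program: the function name, its parameters, and the choice to reject any character other than A, C, G, or T are all assumptions made for this example.

```python
import random


def initial_genome(length=None, sequence=None, seed=None):
    """Return the starting genome as an uppercase string (hypothetical helper).

    If `sequence` is given it is used as the starting genome; otherwise a
    random sequence of `length` bases is generated.
    """
    if sequence is not None:
        # Strip whitespace and line breaks, e.g. from a pasted sequence file.
        cleaned = "".join(sequence.split()).upper()
        unexpected = set(cleaned) - set("ACGT")
        if unexpected:
            # Assumption: ambiguity codes such as N are not handled here.
            raise ValueError("unexpected characters: %s" % sorted(unexpected))
        return cleaned
    if length is None:
        raise ValueError("give either a sequence or a length")
    rng = random.Random(seed)  # seeded so a run can be repeated
    return "".join(rng.choice("ACGT") for _ in range(length))


random_start = initial_genome(length=50, seed=1)   # random-sequence genome
supplied_start = initial_genome(sequence="aagt gcggt\n")  # supplied sequence
print(len(random_start), supplied_start)
```

Keeping both options behind one function means the rest of a simulation does not need to know where the starting sequence came from.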
So the run I'm trying now starts not with a random sequence but with a 50 kb segment of the H. influenzae genome. This sequence already has lots of uptake sequences in it, so about half of the 1 kb fragments the model considers in each cycle pass the test and are recombined into the genome. I'm hoping the new conditions will enable the genome to maintain these uptake sequences over the 50,000-cycle run I have going overnight.