RRResearch: What factors limit simulation speed?

I'm at the part of the manuscript where we introduce the analyses done with our simulation model. I want to begin by considering which parameters affect how fast the runs are, because run speed is what limits our ability to fully investigate how the different factors affect uptake sequence evolution.

Mutation rate. Runs with lower mutation rates take longer to reach equilibrium. I expect the effect to be proportional - a run with a ten-fold lower mutation rate will take 10 times as many cycles to reach equilibrium.
Amount of DNA recombined per cycle. The individual cycles will take longer if more DNA needs to be recombined. The amount of DNA is the product of the number of fragments to be recombined and the length of the fragments; although cycles using many short fragments may take a bit longer than the same amount of DNA in a few long fragments, I expect this effect to be minor. On the other hand, fewer cycles should be needed when more DNA is recombined in each cycle, but this only applies when the total amount of recombination is substantially smaller than the length of the genome.
Present state of the genome. More fragments will need to be tested per cycle when the genome has few good uptake sequences.
Length of the genome. Treated in isolation, this effect should be minor. Runs with long genomes will take longer to score, but scoring is only done once every 10 cycles. And longer genomes give better estimates of equilibrium US density. But since we set the amount of DNA recombined to be a constant fraction of the genome, genome length indirectly has a very large effect on run length.
Number of fragments tested at a time. Taken in isolation, this is largely irrelevant to run time, because it should only make a difference at the start of runs initiated with genomes very rich in uptake sequences. But it determines how fast the bias is decreased, which determines how many fragments have to be tested to get the required recombination, so fewer fragments per set will make the run go faster because bias will decrease faster.
Recombination adjustment factor. This directly controls how quickly the bias is reduced within each cycle; a value close to 1 will slow the bias reduction and so make the cycles take longer than with a factor of, say, 0.5.
Recombination multiplier. Higher values (e.g. 10 or 100) will make each cycle take longer because the initial bias value for the cycles will be higher. But I think we should just keep it constant at 10.
Length of fragments. Short fragments are faster to score, but they are less likely to contain uptake sequences, so more fragments will need to be tested.
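
The rough scaling arguments in the list above can be written down as a few lines of Python. This is only a back-of-envelope sketch, not the simulation code: the function names, the reference rate, and the idea of a "target" bias that ends the reductions are all assumptions made for illustration. The only things taken from the text are that cycles to equilibrium should scale inversely with mutation rate, that DNA per cycle is fragments times fragment length, and that the adjustment factor shrinks the bias multiplicatively.

```python
def relative_cycles_to_equilibrium(mutation_rate, reference_rate=0.001):
    """Cycles needed relative to a run at reference_rate.

    Assumes the inverse proportionality guessed at in the text:
    a ten-fold lower rate needs ten times as many cycles.
    """
    if mutation_rate <= 0:
        raise ValueError("mutation_rate must be positive")
    return reference_rate / mutation_rate


def dna_recombined_per_cycle(n_fragments, fragment_length):
    """Amount of DNA recombined per cycle, in bases."""
    return n_fragments * fragment_length


def bias_reduction_steps(initial_bias, target_bias, adjustment_factor):
    """Count the multiplicative reductions needed to bring the bias
    from initial_bias down to target_bias (a hypothetical stopping point).
    """
    if not 0 < adjustment_factor < 1:
        raise ValueError("adjustment_factor must be between 0 and 1")
    steps = 0
    bias = initial_bias
    while bias > target_bias:
        bias *= adjustment_factor  # one reduction per set of fragments tested
        steps += 1
    return steps
```

For example, under these assumptions bias_reduction_steps(10, 1, 0.5) takes 4 reductions while bias_reduction_steps(10, 1, 0.9) takes 22, which is the sense in which a factor close to 1 makes each cycle slower.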

Now, which of these factors might affect the nature of the equilibrium?

Mutation rate. Yes, it certainly could. In earlier versions of the model it made a big difference, but the former post-doc's statistical analysis of the assorted preliminary runs found that it didn't matter with this version. I'm doing several runs right now specifically to confirm this, asking whether rates of 0.01, 0.001 and 0.0001 give the same equilibrium scores.
Amount of DNA recombined per cycle. This should be very important, because the bias only exerts its effect through recombination, whereas mutation is independent of recombination (I'm using the same rates for the genome and for the fragments).
Present state of the genome. By definition this shouldn't affect the equilibrium.
Length of the genome. The score isn't normalized for genome length; it's simply the sum of the scores of all the positions of the sliding window. (Aside on how the scoring is done: the window is the same size as the uptake sequence, and each window-position score is the product of the scores of the individual bases in the window. Because a matched base is worth 100 times more than a mismatched base, only the highest-scoring matches to the matrix make a significant contribution to the score.) We're interested in uptake sequence density, not absolute number, so this shouldn't matter. We can adjust the scores for genome length at the manuscript level if appropriate.
Number of fragments tested at a time. Because of its effect on how quickly bias is reduced, this should matter.
Recombination adjustment factor. This should matter because it sets how quickly bias is reduced.
Recombination multiplier. We'll keep this constant at 10.
Length of fragments. This should matter because the bias acts on the whole fragment recombined, not just on the uptake sequence whose score determined the probability that recombination would happen. When a long fragment recombines it may contain mutations that worsen other uptake sequences it spans. So we might expect that short fragments would give a higher density of uptake sequences and thus higher scores.
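
The scoring described in the aside above (sliding window, product of per-base scores, sum over all window positions) can be sketched like this. It is a simplified illustration, not the model's code: the real model scores against a matrix, whereas this sketch scores against a single consensus string, and the motif, the constant names, and the particular values 100 and 1 are assumptions. Only the 100:1 ratio between matched and mismatched bases comes from the text. The last function shows one possible way to adjust for genome length, by dividing by the number of window positions.

```python
MATCH_SCORE = 100    # assumed value; only the 100:1 ratio is from the text
MISMATCH_SCORE = 1   # assumed value


def window_score(window, motif):
    """Product of the per-base scores for one window position."""
    score = 1
    for base, target in zip(window, motif):
        score *= MATCH_SCORE if base == target else MISMATCH_SCORE
    return score


def genome_score(genome, motif):
    """Sum of window scores over every position of the sliding window."""
    width = len(motif)
    return sum(
        window_score(genome[i:i + width], motif)
        for i in range(len(genome) - width + 1)
    )


def score_per_position(genome, motif):
    """Genome score divided by the number of window positions."""
    n_windows = len(genome) - len(motif) + 1
    if n_windows <= 0:
        raise ValueError("genome must be at least as long as the motif")
    return genome_score(genome, motif) / n_windows


# Placeholder 9-base motif, used only to exercise the functions.
EXAMPLE_MOTIF = "AAGTGCGGT"
```

With these values a perfect 9-base match scores 100**9 and a window with a single mismatch scores 100**8, a hundred-fold less, so the sum is dominated by the best matches.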

If all goes well, my mutation-rate test will confirm that rate doesn't matter, and I can write up the former post-doc's set of analyses with a rate of 0.001. Then I'll add a few tests with real intergenic sequences and call it quits. The simulation results are only one part of the manuscript, so I shouldn't get carried away with trying to improve them more than necessary.
