The comments on my last post recommended using a Perl profiler to find out which parts of our simulation are actually responsible for its slow runtime. So I Googled "DProf", ran the simulation under it, and then poked around to find out how to use 'dprofpp' to turn the DProf output into a table showing where most of the time was being spent. Pretty soon I discovered that I just needed to type 'dprofpp' at the usual shell prompt. (Duh.)
Then I spent quite a while trying to figure out what the profile entries meant. Here's an example of the output.
    Total Elapsed Time = 65.63472 Seconds
      User+System Time = 63.53472 Seconds
    Exclusive Times
    %Time  ExclSec  CumulS  #Calls  sec/call  Csec/c  Name
     41.6   26.44   37.689  543882    0.0000  0.0000  Switch::case
     28.9   18.37   18.373  241893    0.0000  0.0000  Switch::switch
     17.7   11.24   11.243  543882    0.0000  0.0000  Switch::__ANON__
     16.5   10.52   38.066      12    0.8768  3.1722  main::tallyScores
     0.25    0.156   0.195    3065    0.0001  0.0001  Text::Balanced::_match_quotelike
     0.23    0.146   0.227    3692    0.0000  0.0001  Text::Balanced::_match_variable
     0.23    0.143   0.608      44    0.0032  0.0138  Switch::filter_blocks
     0.15    0.095   0.128    1599    0.0001  0.0001  Text::Balanced::_match_codeblock
     0.06    0.040   0.040    7432    0.0000  0.0000  Text::Balanced::_failmsg
     0.04    0.027   0.027   13146    0.0000  0.0000  Regexp::DESTROY
     0.03    0.020   0.029       9    0.0022  0.0032  Switch::BEGIN
     0.02    0.010   0.010       1    0.0100  0.0100  main::createRandomGenome
     0.02    0.010   0.010       1    0.0100  0.0100  DynaLoader::dl_load_file
     0.00    0.000   0.000       1    0.0000  0       Exporter::Heavy::heavy_export_ok_tags
     0.00    0.000   0.000       1    0.0000  0.0000  Exporter::Heavy::heavy_export
The problem is that our program wasn't written with this profiler in mind: the profiler attributes time to subroutines, but our big time hogs are loops within the main program, not separate subroutines. We could change this, but here 'we' means our programming assistant, not me.
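For anyone curious what that restructuring would look like: function-level profilers only attribute time to named subroutines, so the standard workaround is to hoist each hot loop into its own small function, which then shows up as a separate line in the report. Here's a sketch of the idea in Python (using cProfile as a stand-in for DProf, with made-up names, since our Perl source isn't shown here):

```python
import cProfile
import io
import pstats

def score_genome(genome, window=10):
    """Hypothetical hot loop, hoisted out of the main program so a
    function-level profiler can attribute its time to a named entry."""
    total = 0
    for i in range(len(genome) - window + 1):
        total += sum(genome[i:i + window])
    return total

def main():
    genome = [1] * 2000
    return score_genome(genome)

# Profile main(); score_genome now appears as its own line in the report.
profiler = cProfile.Profile()
result = profiler.runcall(main)

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).print_stats("score_genome")
report = stream.getvalue()
print("score_genome" in report)
```

The same refactoring in the Perl program would let dprofpp show the loop on its own line instead of burying it under main.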
I did figure out that the various 'Switch' entries are steps in the sliding-window scoring. But this scoring is applied to different components of the simulation, and in different ways. I tried running the profiler on different versions of the simulation (large or small genome sequences, large or small fragments, many or few fragments), but I couldn't disentangle the effects in the resulting profiles.
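Roughly, the sliding-window scoring the profile is pointing at looks something like this (a toy sketch in Python rather than our Perl; the weight matrix and names are invented for illustration):

```python
# Toy position-weight matrix: one base -> weight column per window position.
PWM = [{"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0} for _ in range(10)]

def score_window(seq, start, pwm):
    """Score one window of len(pwm) bases beginning at `start`."""
    return sum(col[seq[start + i]] for i, col in enumerate(pwm))

def score_sequence(seq, pwm=PWM):
    """Slide the window along the whole sequence.  The work done here is
    proportional to len(seq), which is why runtime scales with the total
    length of sequence scored."""
    w = len(pwm)
    return [score_window(seq, i, pwm) for i in range(len(seq) - w + 1)]

scores = score_sequence("A" * 100)
print(len(scores))  # 91 window positions on a 100 bp sequence
```

Every call into the window loop is one more pass through the Switch-heavy code, so anything scored this way contributes in proportion to its length.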
So instead I just began recording the runtimes of sets of runs in which I systematically varied one input parameter at a time, with the simulation set to run for only 10 cycles. The first graph shows the effect of changing the genome size while holding the fragment size and number of fragments constant (at values so low that they don't contribute much to the runtime).
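This kind of one-parameter sweep is easy to script. A sketch of the harness (in Python, with a stand-in workload in place of the real simulation, whose code isn't shown here):

```python
import time

def run_simulation(genome_len, frag_len=20, n_frags=5, cycles=10):
    """Stand-in for one simulation run: just a sliding-window pass over
    the genome each cycle, so the sweep has something to time."""
    window = 10
    total = 0
    for _ in range(cycles):
        for i in range(genome_len - window + 1):
            total += i % window
    return total

def sweep(genome_lengths):
    """Time a set of runs, varying only the genome length and holding
    every other parameter constant."""
    results = []
    for genome_len in genome_lengths:
        start = time.perf_counter()
        run_simulation(genome_len)
        results.append((genome_len, time.perf_counter() - start))
    return results

timings = sweep([1000, 2000, 4000])
for length, seconds in timings:
    print(length, round(seconds, 4))
```

Plotting the (parameter, runtime) pairs from each sweep gives the graphs discussed below.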
The first discovery is that the 'tally' we added to help us see how the genome evolves almost doubles the runtime for each genome length (compare the blue line to the red line). This is because obtaining the tally requires a sliding-window scoring step. Even though this scoring doesn't use the position-weight matrix, the time it takes is similar to the time needed for the formal scoring of the genome. The good news is that this step is completely dispensable.
The second discovery is that the steps that mutate the genome and the fragments add almost nothing to the runtime (compare the green and red lines). So we won't bother trying to further optimize the mutagenesis.
The third discovery is that the runtime is proportional to the length of the genome. Because the fragments are few and small, this is almost certainly due to the time needed for the sliding-window scoring. This tells us that reducing the number of times the genome needs to be scored should dramatically speed up the simulations.
The second graph shows how runtime depends on the number of fragments being tested. The fragment size was held at 200bp (ten times longer than in the previous set of runs), and the genome length was 10kb.
Here we see that including the tally step increases every run's time by about 20 seconds (compare the blue and red lines). This is just what the previous graph predicts for a 10kb genome. And again the mutation steps in each cycle contribute almost nothing to the runtime.
The runtime is directly proportional to the number of fragments, consistent with the sliding-window scoring being the major factor.
To confirm that other steps in processing fragments aren't a big problem, I also varied fragment length while holding fragment number constant. This third graph shows that runtime is linearly dependent on fragment length.
Finally, I held the total length of fragments to be screened constant (the product of fragment length and number of fragments screened each cycle). If the time needed for the sliding-window scoring of the fragments is the main factor responsible for the increasing runtime, then holding the total length constant should give a constant runtime even when fragment length and number are reciprocally varied.
That's exactly what we see, provided the fragments are longer than about 100bp. (As the fragment size decreases toward the size of the window, the number of positions the window can take decreases faster than the fragment size; a 10bp fragment can only be scored at one window position.) The runtime is proportional to the length of sequence being scored each cycle.
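The arithmetic behind that deviation is simple. Assuming a 10bp window (my guess at the actual window size), the number of positions a window can take on a fragment of length L is L - 10 + 1, so the positions-per-bp ratio collapses as fragments shrink toward the window size:

```python
WINDOW = 10  # assumed window size

def window_positions(frag_len, window=WINDOW):
    """Number of positions a sliding window can take on a fragment."""
    return max(0, frag_len - window + 1)

# Above ~100bp the ratio is near 1, so scoring time tracks length;
# near the window size the ratio collapses.
for L in (10, 20, 100, 200):
    print(L, window_positions(L), round(window_positions(L) / L, 2))
```

A 200bp fragment is scored at 191 positions (0.96 per bp), but a 10bp fragment at only one, which is why the constant-total-length runs bend away from flat below about 100bp.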
This suggests one more test. I've been keeping either the genome length or the total fragment length constant. What if I plot runtime as a function of the total length of sequence (genome plus fragments) being scored each cycle?
If the sliding-window scoring is indeed the major factor affecting runtime, then there should be a linear relationship between total sequence and runtime. And voila, here it is.
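All of the graphs are consistent with one simple per-cycle cost model. A sketch (the constant C is a hypothetical seconds-per-bp figure, and the factor of two for the tally step is my reading of the first graph, where the tally's extra sliding-window pass roughly doubled the genome-scoring time):

```python
def scored_length(genome_len, n_frags, frag_len, tally=True):
    """Total sequence scored per cycle: the genome (twice, if the
    dispensable tally step adds its own sliding-window pass over it)
    plus all the fragments."""
    genome_passes = 2 if tally else 1
    return genome_passes * genome_len + n_frags * frag_len

C = 1e-4  # hypothetical seconds per bp scored

def predicted_runtime(genome_len, n_frags, frag_len, tally=True):
    return C * scored_length(genome_len, n_frags, frag_len, tally)

# Reciprocally varying fragment length and number keeps runtime flat:
print(predicted_runtime(10000, 100, 200))
print(predicted_runtime(10000, 200, 100))
# Dropping the tally roughly halves a fragment-free run:
print(predicted_runtime(10000, 0, 0, tally=True),
      predicted_runtime(10000, 0, 0, tally=False))
```

If this model is right, cutting the number of times the genome and fragments get scored each cycle is where all the speedup is.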