As part of our new and improved Perl model of uptake sequence evolution, we had been intending to incorporate the usual transition:transversion bias into the part of the model that simulates mutation of the evolving sequence. But it's turning out to be HARD.
In the previous version, the mutation step incorporated a bias that matched the user-specified base composition. For the H. influenzae genome (38% G+C), the routine we were using caused the mutagenesis to produce As and Ts each 31% of the time and to produce Gs and Cs each 19% of the time. This was perfectly satisfactory (or would have been if not for other components of the mutagenesis that were unnecessarily cumbersome).
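A composition-only mutation step of this kind can be sketched in a few lines. This is a Python illustration, not the Perl routine from the model; the names `base_weights` and `mutate` are hypothetical, and it assumes the replacement base is drawn without regard to the base being replaced (so some "mutations" leave the base unchanged).

```python
import random

def base_weights(gc_content):
    """Probability of producing each base, given a target G+C fraction."""
    at_each = (1.0 - gc_content) / 2.0
    gc_each = gc_content / 2.0
    return {"A": at_each, "T": at_each, "G": gc_each, "C": gc_each}

def mutate(seq, mutation_rate, gc_content, rng):
    """Replace each position with probability mutation_rate.

    Assumption: the new base is drawn from base_weights() independently of
    the old base, so the draw can return the base that was already there.
    """
    weights = base_weights(gc_content)
    bases = list(weights)
    probs = [weights[b] for b in bases]
    out = []
    for base in seq:
        if rng.random() < mutation_rate:
            base = rng.choices(bases, weights=probs, k=1)[0]
        out.append(base)
    return "".join(out)

# Mutate every position of an all-A sequence once, then measure G+C.
rng = random.Random(1)
genome = mutate("A" * 20000, 1.0, 0.38, rng)
gc_fraction = (genome.count("G") + genome.count("C")) / len(genome)
```

Because the new base does not depend on the old one, the sequence is pulled toward the specified composition whatever it started as, which is why this version needed no further bookkeeping.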
At a recent planning session we thought we had figured out a way to also have transition mutations (A<->G and C<->T) occur twice as often as transversion mutations, while maintaining the specified base composition. But when we implemented these steps in a subroutine, the G+C content (initially 38%) increased with each cycle of mutagenesis, leveling out at about 45%. So we went back to the drawing board (the big whiteboards in the hall) and tried to understand what was wrong.
Several things were wrong. One was an error in the computer code. We fixed that, but there was another error in the implementation, so we fixed that too. Then it became clear that there was also a fundamental error in our planned steps. We had thought that we simply needed to specify the ratio of A+T to G+C and the transition bias (2-fold). But with transition bias, the number of each type of mutation depends not only on the properties of the mutagenesis algorithm but also on the proportions of the bases in the sequence. For example, mutagenesis of a genome with lots of As will produce more mutations to Gs than will the same mutagenesis steps acting on a genome with few As.
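The dependence on the current sequence can be seen in a small Python sketch. This is not a reconstruction of our scheme; it is one plausible, hypothetical scheme in which every mutated base must change, and the new base is chosen with weight (target frequency of the new base) x (2 if the change is a transition, 1 if it is a transversion). All function names are made up for illustration, and the "2-fold" bias is interpreted here as the weight on the transition partner relative to each transversion partner. The code tracks expected base frequencies rather than simulating a sequence.

```python
BASES = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def target_freqs(gc_content):
    at_each = (1 - gc_content) / 2
    gc_each = gc_content / 2
    return {"A": at_each, "C": gc_each, "G": gc_each, "T": at_each}

def mutation_matrix(gc_content, ts_bias):
    """matrix[old][new] = chance that a mutated `old` becomes `new`.

    Assumption: a mutated base always changes, so each row is rescaled to
    sum to 1 over the three other bases.
    """
    freq = target_freqs(gc_content)
    matrix = {}
    for old in BASES:
        weights = {}
        for new in BASES:
            if new != old:
                bias = ts_bias if (old, new) in TRANSITIONS else 1.0
                weights[new] = freq[new] * bias
        total = sum(weights.values())
        matrix[old] = {new: w / total for new, w in weights.items()}
    return matrix

def gc_after_cycles(matrix, start_gc, mutation_rate, cycles):
    """Expected G+C after repeatedly mutating a fraction of all positions."""
    comp = target_freqs(start_gc)
    for _ in range(cycles):
        updated = {b: comp[b] * (1 - mutation_rate) for b in BASES}
        for old in BASES:
            for new, p in matrix[old].items():
                updated[new] += comp[old] * mutation_rate * p
        comp = updated
    return comp["G"] + comp["C"]

matrix = mutation_matrix(0.38, 2.0)
final_gc = gc_after_cycles(matrix, 0.38, 0.01, 5000)
```

Starting at 38% G+C, `final_gc` settles at roughly 0.44 rather than staying at 0.38. In this scheme As and Ts are steered strongly toward their transition partners (G and C), while the rescaling of each row means nothing compensates for it, so the equilibrium composition is set jointly by the bias and by how often each base is present to be mutated.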
So I spent much of this afternoon doing algebra, trying to come up with a general relationship between the base composition bias of the mutagenesis steps and the equilibrium base composition it will produce. Unfortunately I only do algebra about once every 5 years and, although I remember the very basic rules I learned in grade 9, I have none of the skills and creativity that a regular user would have. Or maybe the problem I was trying to solve is just intrinsically messy. In any case, I covered two whiteboards with Xs and Fs and parentheses but the equations never simplified. I could call on a mathematician friend for help, or we could simply decide that incorporating a transition:transversion bias is an unnecessary refinement that actually won't make any difference to the outcome of our model.
For now we're going to take the latter approach, which will allow our programming assistant to create some working code. If we later figure out how to incorporate the transition:transversion bias, we can probably just add the necessary lines to the mutagenesis section of the program.
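For whenever we come back to this, here is one candidate approach, sketched in Python rather than Perl and under stated assumptions. It uses the same construction as the HKY85 substitution model from molecular phylogenetics: the per-cycle chance that base X becomes base Y is (a small scale factor) x (target frequency of Y) x (2 for a transition, 1 for a transversion), and the rows are not rescaled. The function names, the scale factor of 0.01, and the starting composition are all hypothetical choices for illustration.

```python
BASES = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def target_freqs(gc_content):
    at_each = (1 - gc_content) / 2
    gc_each = gc_content / 2
    return {"A": at_each, "C": gc_each, "G": gc_each, "T": at_each}

def per_cycle_probs(gc_content, ts_bias, scale):
    """probs[old][new] = chance per cycle that `old` becomes `new`.

    Assumption: off-diagonal entries are scale * freq[new] * bias and are
    NOT rescaled; the leftover probability is the chance of no mutation.
    `scale` must be small enough that every row's leftover is non-negative.
    """
    freq = target_freqs(gc_content)
    probs = {}
    for old in BASES:
        row = {}
        for new in BASES:
            if new != old:
                bias = ts_bias if (old, new) in TRANSITIONS else 1.0
                row[new] = scale * freq[new] * bias
        row[old] = 1.0 - sum(row.values())
        probs[old] = row
    return probs

def expected_gc(probs, start_gc, cycles):
    """Expected G+C after `cycles` rounds, tracking frequencies only."""
    comp = target_freqs(start_gc)
    for _ in range(cycles):
        comp = {new: sum(comp[old] * probs[old][new] for old in BASES)
                for new in BASES}
    return comp["G"] + comp["C"]

probs = per_cycle_probs(0.38, 2.0, 0.01)
final_gc = expected_gc(probs, 0.50, 5000)
```

Here `final_gc` comes back to 0.38 even though the run starts at 50% G+C. The flow from X to Y at equilibrium is freq[X] x freq[Y] x bias, which is the same in both directions, so the target composition is stable. The price is that the bases no longer mutate equally often: in this sketch Gs and Cs mutate somewhat more often per cycle than As and Ts, which may or may not be acceptable for the model.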