As part of our new-improved Perl model of uptake sequence evolution, we had been intending to incorporate the usual transition:transversion bias into the part of the model that simulates mutation of the evolving sequence. But it's turning out to be HARD.

In the previous version, the mutation step incorporated a bias of the same strength as the user-specified base composition. For the H. influenzae genome (38% G+C), the routine we were using caused the mutagenesis to produce As and Ts each 31% of the time and to produce Gs and Cs each 19% of the time. This was perfectly satisfactory (or would have been if not for other components of the mutagenesis that were unnecessarily cumbersome).
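A composition-biased mutation step of this kind can be sketched in a few lines. This is Python rather than the Perl of the actual model, the names `base_weights` and `mutate_base` are made up for illustration, and it assumes the replacement base is drawn from the target composition regardless of what the old base was.

```python
import random

def base_weights(gc_content):
    """Split the target G+C fraction evenly between G and C, the rest between A and T."""
    at_each = (1.0 - gc_content) / 2.0
    gc_each = gc_content / 2.0
    return {"A": at_each, "C": gc_each, "G": gc_each, "T": at_each}

def mutate_base(gc_content, rng=random):
    """Draw a replacement base from the target composition (hypothetical helper)."""
    weights = base_weights(gc_content)
    bases = list(weights)
    return rng.choices(bases, weights=[weights[b] for b in bases], k=1)[0]

weights = base_weights(0.38)
print(weights)
```

With a G+C content of 0.38 the weights come out at 0.31 each for A and T and 0.19 each for G and C, matching the figures above.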

At a recent planning session we thought we had figured out a way to also have transition mutations (A<->G and C<->T) occur twice as often as transversion mutations, while maintaining the specified base composition. But when we implemented these steps in a sub-program, the base composition (initially 38% G+C) increased with each cycle of mutagenesis, leveling out at about 45% G+C. So we went back to the drawing board (the big whiteboards in the hall) and tried to understand what was wrong.

Several things were wrong. One was an error in the computer code. We fixed that, but there was another error in the implementation, so we fixed that too. Then it became clear that there was also a fundamental error in our planned steps. We had thought that we simply needed to specify the ratio of A+T to G+C and the transition bias (2-fold). But with transition bias the number of each type of mutation depends not only on the properties of the mutagenesis algorithm but on the proportions of the bases in the sequence. For example, mutagenesis of a genome with lots of As will produce more mutations to Gs than will the same mutagenesis steps acting on a genome with few As.
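The dependence on the sequence's own composition is easy to see in a toy calculation. The sketch below is Python, not our Perl code, and it assumes one particular reading of the bias: every base is equally likely to be hit, and a hit base becomes its transition partner 2/3 of the time and each of the two transversion products 1/6 of the time. The function name and the two example compositions are invented for illustration.

```python
# Assumed scheme: every base is equally likely to be hit, and a hit base
# becomes its transition partner with probability 2/3 or either
# transversion product with probability 1/6 each.
TRANSITION_PARTNER = {"A": "G", "G": "A", "C": "T", "T": "C"}

def fraction_of_mutations_to(target, composition):
    """Fraction of all mutations in one round that produce `target`."""
    total = 0.0
    for source, freq in composition.items():
        if source == target:
            continue  # a base cannot mutate to itself
        if TRANSITION_PARTNER[source] == target:
            total += freq * (2.0 / 3.0)  # transition into target
        else:
            total += freq * (1.0 / 6.0)  # transversion into target
    return total

a_rich = {"A": 0.40, "C": 0.10, "G": 0.10, "T": 0.40}
a_poor = {"A": 0.10, "C": 0.40, "G": 0.40, "T": 0.10}
print(fraction_of_mutations_to("G", a_rich))
print(fraction_of_mutations_to("G", a_poor))
```

Under these assumptions about 35% of the mutations in the A-rich sequence produce a G, against about 15% in the A-poor one, even though the mutagenesis rules are identical.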

So I spent much of this afternoon doing algebra, trying to come up with a general relationship between the base composition bias of the mutagenesis steps and the equilibrium base composition it will produce. Unfortunately I only do algebra about once every 5 years and, although I remember the very basic rules I learned in grade 9, I have none of the skills and creativity that a regular user would have. Or maybe the problem I was trying to solve is just intrinsically messy. In any case, I covered two whiteboards with Xs and Fs and parentheses but the equations never simplified. I could call on a mathematician friend for help, or we could simply decide that incorporating a transition:transversion bias is an unnecessary refinement that actually won't make any difference to the outcome of our model.
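One simplification that might tame the algebra, offered as a sketch rather than a worked-out answer: if A and T behave alike and G and C behave alike, only two numbers matter per round. Call them u, the chance that an A/T site becomes G or C, and v, the chance that a G/C site becomes A or T (both symbols are introduced here, not taken from the whiteboards). At equilibrium the two flows balance, (1 - x) * u = x * v, so the equilibrium G+C fraction is x = u / (u + v). In Python, with a hypothetical function name:

```python
def equilibrium_gc(u, v):
    """Equilibrium G+C fraction x solving (1 - x) * u == x * v."""
    if u + v == 0:
        raise ValueError("no mutation between the A/T and G/C classes")
    return u / (u + v)

# Composition-only bias: a mutated site becomes G or C 38% of the time,
# so A/T -> G/C happens with weight 0.38 and G/C -> A/T with weight 0.62.
print(equilibrium_gc(0.38, 0.62))

# Transition bias only: every mutated site switches class 5/6 of the time
# (2/3 transition plus one of the two 1/6 transversions).
print(equilibrium_gc(5 / 6, 5 / 6))
```

The first case sits at 38% G+C and the second at 50%. This two-class shortcut only holds when the rate of leaving a class is the same for both bases in it; a scheme that treats A and T differently would need the full four-base bookkeeping.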

For now we're going to take the latter approach, which will allow our programming assistant to create some working code. If we later figure out how to incorporate the transition:transversion bias, we can probably just add the necessary lines to the mutagenesis section of the program.


Can I suggest calling on a mathematician friend for help? You're probably better than me at this stuff, but a while ago I tried noodling out some equations for a process I was interested in, and eventually got it down to a half page of a brute-force script. A while later a mathematician friend was visiting and I mentioned the problem to him. He immediately wrote out a single line consisting of 2 symbols and one constant that did the same as I was trying to do, but better...

I’m not a math wizard, and I fail hard at Perl, but if the mutagenesis algorithm is as simple as described (twofold more transitions than transversions), an equally simple explanation would be as follows: in a round of mutation, an A base has a 67% chance of being converted to G, and a 17% chance each of being converted to C or T, giving an 83% chance of ending up as a G or C. Likewise, a T would end up as G/C 83% of the time, and C or G bases would end up as A/T 83% of the time. As there are more A/T (62%) than G/C (38%) at the beginning, more A/T would be converted to G/C than vice versa, resulting in an increase in G/C. (If the rounds of mutagenesis were run to infinity, G/C would approach 50%.)
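The arithmetic in this comment can be checked with a short loop. This is a Python sketch under the comment's assumptions (a mutated base changes between the A/T and G/C classes 5/6 of the time); the fraction of bases mutated per round, 1%, is an arbitrary value chosen for illustration, and the function name is hypothetical.

```python
def gc_after_rounds(gc_start, mutated_fraction, rounds):
    """Track G+C when each mutated base switches class with probability 5/6."""
    switch = 2.0 / 3.0 + 1.0 / 6.0  # transition plus one transversion
    gc = gc_start
    history = [gc]
    for _ in range(rounds):
        gained = (1.0 - gc) * mutated_fraction * switch  # A/T -> G/C
        lost = gc * mutated_fraction * switch            # G/C -> A/T
        gc = gc + gained - lost
        history.append(gc)
    return history

history = gc_after_rounds(gc_start=0.38, mutated_fraction=0.01, rounds=500)
print(round(history[1], 4), round(history[-1], 4))
```

Starting from 38%, G+C rises every round and closes in on 50%, because the gain from the larger A/T pool always exceeds the loss from the smaller G/C pool until the two pools are equal.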

A crude but simple solution could be to allow more mutations at G/C bases than at A/T bases, probably in a 31:19 ratio. I don’t know how realistic that would be (on the other hand, how does the organism maintain its G/C content?), nor how well this solution would fit your purposes.
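The 31:19 figure follows from balancing the two flows: with G+C at 38%, the gain (0.62 times the A/T mutation rate) equals the loss (0.38 times the G/C mutation rate) when the G/C rate is 62/38 = 31/19 times the A/T rate. A quick Python check, using made-up per-round rates in that ratio and a hypothetical function name:

```python
def net_gc_change(gc, at_rate, gc_rate):
    """Net change in G+C per round when A/T and G/C sites mutate at different rates."""
    switch = 5.0 / 6.0  # mutated base changes class (transition or one transversion)
    gained = (1.0 - gc) * at_rate * switch  # A/T -> G/C
    lost = gc * gc_rate * switch            # G/C -> A/T
    return gained - lost

# Hypothetical per-round rates in the suggested 31:19 ratio (G/C : A/T).
print(net_gc_change(0.38, at_rate=0.019, gc_rate=0.031))
# Equal rates for comparison: G+C drifts upward.
print(net_gc_change(0.38, at_rate=0.025, gc_rate=0.025))
```

With the 31:19 rates the net change is zero apart from floating-point rounding, so 38% G+C is held in place; with equal rates it is positive.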

Dear Dr. Rosie Redfield

I was wondering if you might happen to know what an overall transition/transversion bias of R=9.394 means?

Thanks in any case.

dimitra.