Field of Science

Have bacteria evolved gene-specific rates of point mutations?

A paper just out in Nature (Martincorena et al. Evidence of non-random mutation rates suggests an evolutionary risk management strategy) concludes that E. coli genes have different mutation rates.  Genes that serve important 'housekeeping' functions mutate less often than genes that are used less often or whose functions are less important for survival.

Although such a difference in mutation rates might indeed be beneficial, since most non-neutral mutations are harmful, the result seems very improbable because we don't know of any mechanism by which the processes that cause mutations could adjust their activities according to the function of particular DNA sequences.  The authors don't know of any such mechanism either but they postulate that one must exist.

This is very reminiscent of the 'directed mutation' controversy that arose about 15 years ago, in response to work by Jim Shapiro and John Cairns showing that selection for ability to use a sugar was much more effective if the sugar was present in the environment.  That phenomenon has been shown to not be due to changes in the mutation rate (considered per base pair), but to initially unsuspected cryptic growth on the sugar and changes in the number of copies of the gene under selection.

Mutation rates are tricky to measure directly because mutations are identified by examining the phenotypes or DNA sequences of bacterial cultures many generations after the mutations would have happened.   This means that there has been plenty of time for confounding forces to also act on the mutations - we find only the mutations present in surviving cells, not all the mutations that happened.  The most important confounding force is thought to be natural selection acting on any phenotypic changes the mutations cause, but lots of other factors are known or suspected.

On first reading, I think that the authors of this paper did a good job of controlling for these factors.  But, given what we know about the processes that cause and prevent mutation, their results are so improbable that  I suspect they have missed other factors we don't know about yet.  So I predict that, like the directed mutation controversy, the long-term outcome of this work will be identification of additional confounding factors in the analysis of mutation rates rather than of a clever risk management strategy in the bacteria.

Here's a quick outline of what the authors did:  They started by comparing the genome sequences of 34 E. coli isolates; I think these were sequences available in GenBank, not ones they determined themselves.  Even very closely related bacteria like these have a lot of variation in which genes are present, so the authors first identified a set of 3420 genes, each of which was present in at least 75% of these genomes.  They then carefully compared the DNA sequences of these genes to find all the differences, which must have arisen by mutations accumulating over the many millions of years since these genes shared a common ancestor.

They then filtered out all the differences whose accumulation might have been confounded by natural selection.  First they eliminated from consideration all the differences that changed an amino acid encoded by the DNA.  Then they corrected for effects of E. coli's known codon biases, because mutations that don't change the specified amino acid may still change how efficiently that amino acid is incorporated into the specified protein.  They also corrected for suspected effects of RNA folding by trimming off the ends of the gene sequences (I'm not sure how effective this would be...).
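The first filtering step can be illustrated with a minimal sketch. This is not the authors' pipeline, only a toy classifier: it assumes two already-aligned, gap-free, upper-case coding sequences in the same reading frame, uses the standard genetic code, and counts differing codons rather than individual sites. The names `CODON_TABLE` and `count_codon_differences` are made up for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_differences(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of differing codons."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, equal length, whole codons")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue  # identical codons carry no information here
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1  # same amino acid: kept for rate estimation
        else:
            nonsynonymous += 1  # amino acid changed: filtered out
    return synonymous, nonsynonymous


# CTG -> CTA keeps leucine; GAT -> GAA changes aspartate to glutamate.
print(count_codon_differences("ATGCTGAAAGAT", "ATGCTAAAAGAA"))  # (1, 1)
```

Only the first number would feed into a mutation-rate estimate of the kind described above. The second is the sort of information that can be used separately as a rough indicator of selection on the protein.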

This analysis produced estimated gene-specific mutation rates that differed by as much as ten-fold (look at the jagged line and two examples below).  The mutation rates of nearby genes were strongly correlated over distances of 10-20 kb, especially for genes that were assigned the same 'function' and the same direction of transcription; these are likely to be mostly genes in the same operon.
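To make "the mutation rates of nearby genes were correlated" concrete, here is a rough sketch that computes the Pearson correlation between each gene's estimated rate and that of the next gene along the chromosome. It is an assumption-laden simplification, not the paper's analysis: it ignores physical distance in kb, function and transcription direction, and the function names and the toy rate lists are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def neighbour_correlation(rates):
    """Correlate each gene's rate with the rate of the next gene in order."""
    return pearson(rates[:-1], rates[1:])


# Toy data: rates listed in chromosomal order (arbitrary units).
clustered = [1, 1, 1, 5, 5, 5]    # similar rates sit next to each other
alternating = [1, 5, 1, 5, 1, 5]  # neighbours always differ

print(neighbour_correlation(clustered))
print(neighbour_correlation(alternating))
```

A positive value means neighbouring genes tend to have similar rates, which is the pattern reported here. By itself that says nothing about why they are similar.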

One factor I wanted more information about is the functional classification scheme used.  This was something I hadn't heard of - the Multifun classification for E. coli, developed by Monica Riley and M. H. Serres.  It looks good, certainly better for E. coli genes than the usual COG analysis (clusters of orthologous groups).

Another issue important for their conclusions is how they assigned functional importance to each gene. They estimated the strength of selection on each gene using the number of changes that did change the encoded amino acids (the info they had discarded in estimating mutation rates).  By this measure, genes in subsets with higher mutation rates tended to have weaker evidence of selection.  Genes in the low-mutation-rate subsets were also enriched for genes known to be essential for survival in lab culture in rich medium, and they were, on average, expressed as mRNA at higher levels.

The authors then examined how other confounding effects might alter the results, by examining the sequences for evidence that natural selection had acted on them, by checking the possible sizes of other confounding effects (transcription-coupled DNA repair, base composition, homologous recombination), and by using computer simulations to estimate the sizes of possible effects.  These analyses revealed only effects that would be much too small to explain the big differences in estimated mutation rates they found.
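For a feel of how much per-gene variation chance alone produces, here is a minimal Monte Carlo sketch of a null model in which every site changes with the same probability. It is not the simulation used in the paper. The gene count, sites per gene, and per-site probability below are arbitrary assumptions chosen so the example runs quickly, and the function names are hypothetical.

```python
import random
import statistics


def simulate_uniform_rates(n_genes, sites_per_gene, p_change, seed=0):
    """Simulate per-gene rates when every site has the same chance of changing."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rates = []
    for _ in range(n_genes):
        # Each site independently changes with probability p_change.
        changes = sum(rng.random() < p_change for _ in range(sites_per_gene))
        rates.append(changes / sites_per_gene)
    return rates


def central_95_interval(rates):
    """Return the 2.5% and 97.5% cut points of the simulated rates."""
    cuts = statistics.quantiles(rates, n=40)  # cut points every 2.5%
    return cuts[0], cuts[-1]


# Assumed toy parameters, not values taken from the paper.
rates = simulate_uniform_rates(n_genes=200, sites_per_gene=300, p_change=0.05)
low, high = central_95_interval(rates)
print(f"95% of simulated genes fall between {low:.3f} and {high:.3f}")
```

Observed gene rates that fall well outside such an interval are what would need explaining, whether by a real difference in mutation rate or by some confounding factor the null model leaves out.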

Bottom line:  This appears to be a very well done piece of work.  (The Supplementary Materials file is enormous and dense with relevant information and analyses.)  Nevertheless I'm very skeptical of their conclusion that cells have evolved a mechanism to mark important genes and protect them from mutation.  That's both because we don't know of any way cells could do this, and because I think natural selection on such 'evolvability' traits is likely to be many orders of magnitude weaker than as-yet-unidentified direct effects on mutation accumulation.


  1. Rosie, I haven't thought too much about this, but my first reaction is that methylation could be one explanation. Mutation rates increase with decreasing levels of DNA methylation in some eukaryotes. So perhaps methyl marking of genes (which tends to be on adenine in bacteria) is a way by which bacteria either passively or actively reduce mutation rates. Separating causation from correlation would be the tricky part.

  2. Sounds to me like they're measuring fixation frequencies, not mutation rates? Am I missing something?

    If mutation rates were constant and all the alleles were fixed by random genetic drift then in 34 strains you would expect a fairly wide range of differences just by chance. Did they compare their observed distribution to the distribution by chance alone?

    1. Hi Laurence,

      The red lines in the figure Rosie posted indicate what they think the 2.5% limits are if the mutations were totally random as determined by their Monte Carlo simulations. Since so much of the observed deviation occurs outside of that range they think there is more to the story.

      I think one possible explanation is definitely what Seth said. Another might be rampant homologous recombination. If the more important genes mutated at the same high rate, but were decoupled from the genomes and could sweep the populations then maybe you could expect there to be lower fixed diversity because you'd periodically reset the diversity through gene sweeps. This paper published recently seems to show that, at least in the vibrio species studied, positively selected genes are decoupled from genomes and sweep the populations, indicating a much higher level of homologous recombination than is often appreciated.

      There is one very concerted method that has been described that induces mutations in certain genes and not others: the diversity-generating retroelements first described in the Bordetella bacteriophage BPP-1 by a group at UCLA. I was disappointed not to see it mentioned in the paper; it is SUCH a cool story that it should get way more press than it does (no affiliation with the authors). From the abstract of a recent open source paper on the subject:

      "Diversity-generating retroelements (DGRs) are in vivo sequence diversification machines that are widely distributed in bacterial, phage, and plasmid genomes. They function to introduce vast amounts of targeted diversity into protein-encoding DNA sequences via mutagenic homing. Adenine residues are converted to random nucleotides in a retrotransposition process from a donor template repeat (TR) to a recipient variable repeat (VR)."

      First time posting a comment, so want to say thanks a lot for the blog Rosie, you've got a number of fans here in Pasadena.


    2. Based on several analyses in the paper, homologous recombination does not explain the observations.

    3. Yea, they definitely considered recombination a number of times in the SI, but what I was commenting on was the possibility that this assumption might not hold:

      "Furthermore, the conservation of long-distance linkage due to the lack of crossover recombination in bacteria ensures that almost all loci within a genome share a very similar demographic history and that the effective population size is largely uniform... ...This contrasts with sexual organisms that have crossover recombination, leading to linkage disequilibrium asymptotically decaying to zero with distance and so to different regions of the genome potentially having different demographic histories"

      My point was that it's possible that bacterial genomes behave a lot more like sexually reproducing genomes than we previously expected, which could maybe explain the results (as they found in the paper I referenced). Since posting, though, I've read more, and I think their plot of linkage disequilibrium vs genomic distance in Figure S16 might rule that possibility out. Is that what you were referencing, or is there something else?

    4. Haven't had a chance to thoroughly read the paper (damn teaching responsibilities!)...but is there a chance chromosomal position has something to do with this? There's an Ochman lab paper from the early 2000s that starts to look at this. Just kind of thinking out loud (although I like Seth's comment above).

  3. This comment has been removed by the author.

  4. I read the entire paper and all the supplemental information. I haven't got a clue about what they did, why they did it, or whether their methods are valid. It might as well be gibberish as far as I'm concerned.

    All I know is that it's extremely unlikely that there has been selection for ten-fold differences in the error rates of DNA replication/repair between different local regions in the E. coli genome.

    As the authors themselves note, this is an extraordinary claim that cannot be explained by our "current knowledge of factors influencing the mutation rate." It's too bad that there are only a handful of people in the entire world who are capable of evaluating their extraordinary evidence.

    1. I think you badly underestimate the number of people who possess appropriate training to evaluate the results of the authors. I can think of literally dozens of faculty members off the top of my head who would be up to the task, and the number would increase to well over 100 if you counted the students and postdocs trained by those faculty. And that's only in my rather narrow field. I'm certain that having people from other fields join the critique from different perspectives would result in a healthy number of potential critics on the order of 1,000 or more. And we really only need 5% or so of those people to actually get motivated enough to ensure that this gets thorough vetting.

  5. Hmmmm, I don't think the differences attributed to selection on the mutation rates are ten-fold. The non-randomness is justified from associations with selection on the proteins, expression level and so on. But these differences are much more moderate, definitely not ten-fold.

    So, only part of the variation observed can be claimed to be non-random with respect to function, and so caused by selection on error/repair rates. The rest may well be caused by other rather random factors.

  6. Sorry, my comment above was a reply to Laurence's comment.

  7. Mut says,

    The non-randomness is justified from associations with selection on the proteins, expression level and so on.

    You say that as though it's perfectly clear to you why the mutation rate should vary due to selection on proteins and expression level. Can you explain it to me?

    Why should the DNA replication error rate depend on the function of the protein encoded by the DNA sequence that's being duplicated?

  8. As you state in your blog, 'there has been plenty of time for confounding forces to also act on the mutations - we find only the mutations present in surviving cells, not all the mutations that happened'. So maybe this is a somewhat naive question, but is it possible, unlikely as it seems, that the analysis does not take into account that many mutations, especially those in genes with 'housekeeping' functions, are lethal and therefore never observed? I was not able to find out how this correction was applied to their results, but I might have overlooked it.

  9. thanks for the critique.

  10. Hi All,

    Thanks for the careful evaluation of our paper Rosie. We have recently written a review/perspective on the evolutionary forces and mechanisms that can give rise to the non-random mutation rates along a genome. I thought some of you may find it of interest.

    Best wishes,


  11. I've just randomly stumbled upon this post and I'm hoping that somebody might expand on Marleen's comment. My first reaction to the findings is mirrored in her comment. I see that the researchers tried to account for natural selection, but I have a hard time understanding how that can be done when many mutations in the "housekeeping" genes might have gone unobserved simply because the cells carrying them didn't survive. This seems too important to have possibly been overlooked or abused in some way and, since I'm not a molecular biologist, I would be interested in somebody explaining how they accounted for that. Thanks.

