A paper just out in Nature (Martincorena et al., "Evidence of non-random mutation rates suggests an evolutionary risk management strategy") concludes that E. coli genes have different mutation rates. Genes that serve important 'housekeeping' functions mutate less often than genes that are used less often or whose functions are less important for survival.
Although such a difference in mutation rates might indeed be beneficial, since most non-neutral mutations are harmful, the result seems very improbable because we don't know of any mechanism by which the processes that cause mutations could adjust their activities according to the function of particular DNA sequences. The authors don't know of any such mechanism either but they postulate that one must exist.
This is very reminiscent of the 'directed mutation' controversy that arose about 15 years ago, in response to work by Jim Shapiro and John Cairns showing that selection for ability to use a sugar was much more effective if the sugar was present in the environment. That phenomenon has since been shown to result not from changes in the mutation rate (considered per base pair), but from initially unsuspected cryptic growth on the sugar and from changes in the number of copies of the gene under selection.
Mutation rates are tricky to measure directly because mutations are identified by examining the phenotypes or DNA sequences of bacterial cultures many generations after the mutations would have happened. This means that there has been plenty of time for confounding forces to also act on the mutations - we find only the mutations present in surviving cells, not all the mutations that happened. The most important confounding force is thought to be natural selection acting on any phenotypic changes the mutations cause, but lots of other factors are known or suspected.
On first reading, I think that the authors of this paper did a good job of controlling for these factors. But, given what we know about the processes that cause and prevent mutation, their results are so improbable that I suspect they have missed other factors we don't know about yet. So I predict that, like the directed mutation controversy, the long-term outcome of this work will be identification of additional confounding factors in the analysis of mutation rates rather than of a clever risk management strategy in the bacteria.
Here's a quick outline of what the authors did: They started by comparing the genome sequences of 34 E. coli isolates; I think these were sequences available in GenBank, not ones they determined themselves. Even very closely related bacteria like these have a lot of variation in which genes are present, so the authors first identified a set of 3420 genes, each of which was present in at least 75% of these genomes. They then carefully compared the DNA sequences of these genes to find all the differences, which must have arisen by mutations accumulating over the many millions of years since these genes shared a common ancestor.
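To make the gene-selection step concrete, here is a minimal Python sketch of picking out genes present in at least 75% of a set of genomes. The function name, the data layout (a dict of isolate name to set of gene names) and the toy isolates are my own illustrative assumptions, not the authors' actual pipeline, which also had to decide which genes count as 'the same' across genomes.

```python
from typing import Dict, List, Set


def core_genes(genomes: Dict[str, Set[str]], min_fraction: float = 0.75) -> List[str]:
    """Return (sorted) the genes found in at least min_fraction of the genomes."""
    n_genomes = len(genomes)
    counts: Dict[str, int] = {}
    for gene_set in genomes.values():
        for gene in gene_set:
            counts[gene] = counts.get(gene, 0) + 1
    return sorted(g for g, c in counts.items() if c / n_genomes >= min_fraction)


# Hypothetical toy data: four isolates, not real genome contents.
toy_genomes = {
    "isolate_A": {"lacZ", "rpoB", "recA", "fimH"},
    "isolate_B": {"lacZ", "rpoB", "fimH"},
    "isolate_C": {"rpoB", "recA"},
    "isolate_D": {"lacZ", "rpoB", "recA"},
}

# fimH is in only 2 of 4 isolates, so it falls below the 75% cutoff.
print(core_genes(toy_genomes))
```

With real data the hard part is upstream of this: working out which sequences in different genomes are versions of the same gene.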
They then filtered out all the differences whose accumulation might have been confounded by natural selection. First they eliminated from consideration all the differences that changed an amino acid encoded by the DNA. Then they corrected for effects of E. coli's known codon biases, because mutations that don't change the specified amino acid may still change how efficiently that amino acid is incorporated into the specified protein. They also corrected for suspected effects of RNA folding by trimming off the ends of the gene sequences (I'm not sure how effective this would be...).
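The first filtering step, setting aside differences that change an amino acid, can be sketched in Python like this. It is only an illustration under simplifying assumptions of mine: the two sequences are already aligned in-frame with no gaps, and a codon that differs at more than one position is counted as a single difference. It does nothing about codon bias or RNA folding, and the function and variable names are hypothetical.

```python
from itertools import product
from typing import Tuple

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_differences(seq_a: str, seq_b: str) -> Tuple[int, int]:
    """Count (synonymous, nonsynonymous) codon differences between two
    aligned, in-frame, gap-free coding sequences of equal length."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be the same length and a multiple of 3")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1      # same amino acid: kept for rate estimates
        else:
            nonsynonymous += 1   # amino acid changed: set aside
    return synonymous, nonsynonymous


# Made-up example: GCT->GCC leaves alanine unchanged; AAA->AGA changes Lys to Arg.
print(count_codon_differences("ATGGCTAAA", "ATGGCCAGA"))
```

Only the first count would feed into a mutation-rate estimate; the second is the kind of information that can be reused as a rough gauge of selection on the gene.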
This analysis produced estimated gene-specific mutation rates that differed by as much as ten-fold (look at the jagged line and two examples below). The mutation rates of nearby genes were strongly correlated over distances of 10-20 kb, especially for genes that were assigned the same 'function' and the same direction of transcription; these are likely to be mostly genes in the same operon.
One factor I wanted more information about is the functional classification scheme used. This was something I hadn't heard of - the Multifun classification for E. coli, developed by Monica Riley and M. H. Serres. It looks good, certainly better for E. coli genes than the usual COG analysis (clusters of orthologous groups).
Another issue important for their conclusions is how they assigned functional importance to each gene. They estimated the strength of selection on each gene using the number of changes that did change the encoded amino acids (the info they had discarded in estimating mutation rates). By this measure, genes in subsets with higher mutation rates tended to have weaker evidence of selection. Genes in the low-mutation-rate subsets were also enriched for genes known to be essential for survival in lab culture in rich medium, and they were, on average, expressed as mRNA at higher levels.
The authors then examined how other confounding effects might alter the results: they looked in the sequences for evidence that natural selection had acted on them, checked the possible sizes of other confounding effects (transcription-coupled DNA repair, base composition, homologous recombination), and used computer simulations to estimate the sizes of possible effects. These analyses revealed only effects that would be much too small to explain the big differences in estimated mutation rates they found.
Bottom line: This appears to be a very well done piece of work. (The Supplementary Materials file is enormous and dense with relevant information and analyses.) Nevertheless I'm very skeptical of their conclusion that cells have evolved a mechanism to mark important genes and protect them from mutation. That's both because we don't know of any way cells could do this, and because I think natural selection on such 'evolvability' traits is likely to be many orders of magnitude weaker than as-yet-unidentified direct effects on mutation accumulation.