This is a continuation of the previous post about perfect and one-off USSs. I ended that post by concluding that the observed abundances of perfect and one-off USSs are predicted by both my simple math model and our complex simulation model.
I didn't clearly realize when I wrote it that this is in fact a strong test of my hypothesis that, to quote from a previous post, "...USSs accumulate in the genome by a kind of 'molecular drive', caused by the biased uptake system and by the occasional recombination with the cell's own DNA. This molecular drive is inevitable provided the cells sometimes take up DNA from other members of their own 'species', and provided this DNA sometimes replaces the corresponding part of the cells own DNA."
The low frequency of one-off USSs is not expected if we consider only the chances of USS-changing mutations and the presumed interaction between USSs and DNA-binding proteins. But the low frequency makes perfect sense if we also hypothesize that USSs accumulate by molecular drive. Said another way, when the molecular drive hypothesis is made explicit by expression as a model, it predicts precisely the otherwise-puzzling low frequency of one-off USSs.
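To make the logic concrete, here is a toy two-state calculation. It is only an illustrative sketch, not the math model or the simulation model from the previous post, and every name and number in it is an assumption: a 9 bp core, a per-base mutation rate `mu`, and a hypothetical `drive` parameter giving the per-generation chance that a one-off USS is replaced by a perfect one through biased uptake and recombination. It also ignores one-off USSs decaying further into two-offs. At equilibrium, the flow from perfect to one-off equals the flow back, so the ratio of one-off to perfect USSs is the loss rate divided by the gain rate.

```python
def one_off_to_perfect_ratio(length, mu, drive):
    """Equilibrium ratio (one-off USSs : perfect USSs) in a two-state toy model.

    length: number of positions in the USS core (assumed value)
    mu:     per-base mutation rate per generation (assumed value)
    drive:  hypothetical per-generation chance that molecular drive
            replaces a one-off USS with a perfect one
    """
    # Perfect -> one-off: a mutation at any one of the positions.
    loss = length * mu
    # One-off -> perfect: the one exact back-mutation (1 of 3 possible
    # changes at the mismatched position), or replacement by drive.
    gain = mu / 3.0 + drive
    # Balance condition: perfect * loss == one_off * gain.
    return loss / gain


if __name__ == "__main__":
    L_USS = 9    # assumed core length
    MU = 1e-8    # assumed mutation rate
    for d in (0.0, 1e-8, 1e-6):
        ratio = one_off_to_perfect_ratio(L_USS, MU, d)
        print(f"drive={d:g}  one-off:perfect = {ratio:.3g}")
```

With no drive, mutation alone leaves one-off USSs outnumbering perfect ones by about 3 × 9 = 27 to 1 in this toy. Once the assumed drive term is large relative to the mutation rate, the ratio drops well below 1, which is the qualitative pattern the molecular drive hypothesis predicts.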
Yesterday I told the post-docs that we should start outlining the paper we plan to write about this modeling work now, rather than waiting until we have the results. I offered to get it started, and did put together an outline, with a few paragraphs of text pasted in from previous stuff. This paper was going to include everything I described in Sunday's post. But now I realize that we would be wiser to split it into two papers, which I'm about to start outlining.
The first one will compare the observed frequencies and distributions of USSs in real genomes to those predicted by a molecular-drive-based model of USS accumulation. Probably the model's most sophisticated features will be uptake of perfect and one-off USSs, and some degree of functional constraint on accumulation of USSs in gene-sized segments. We can compare the model's predictions to observed frequencies of perfect, one-off and two-off USSs, and to observed distributions of USSs among parts of the genome that are more or less constrained by coding functions. The real-genome data for this latter comparison will come from a separate project I'm doing with bioinformatics researchers at another institution; I hope that project will be done in time for this modeling paper to use its data.
The conclusion of this work will be that the molecular drive hypothesis explains attributes of USSs that are otherwise unexpected.
The second paper will have a more sophisticated version of the model, and will address more complex questions about USS evolution. We won't worry much about the details until the first paper is well under way, but I'm going to start outlining it now.