Yesterday was my turn to do lab meeting, and I began by talking about the Gibbs searches I'm using to characterize the variation of the USS motifs in different species.
I made the point that USSs have usually been studied 'typologically', as if there were one 'real' USS (in H. influenzae, the core AAGTGCGGT) to be identified and characterized. I drew an analogy with how pre-Darwinian biologists thought about species, and said that we now need to make the transition to population-based thinking about USSs, treating the variation as an essential component of the phenomenon. That's what the Gibbs motif searches help us do.
But this morning I realized that population thinking doesn't go far enough. The different USSs in a genome are not a population in the sense that the members of a species are. Rather, each USS site evolves independently. They come to be similar because all of them are subject to the same evolutionary force: the molecular drive created by biased DNA uptake. This means that the different USS sites in the genome are similar only because of convergence.
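To make the convergence idea concrete, here is a toy sketch in Python. It is not a model of real DNA uptake. It assumes, purely for illustration, that each site starts as an unrelated random 9-mer and that single-base changes which reduce the match to the core are kept only with probability 1/bias. The function names and the `bias` and `steps` parameters are hypothetical. The point is only that sites with no common ancestry end up resembling each other when they all experience the same directional pressure.

```python
import random

CORE = "AAGTGCGGT"  # H. influenzae USS core, as given in the post
BASES = "ACGT"


def matches(site, motif=CORE):
    """Count positions at which a site agrees with the motif."""
    return sum(a == b for a, b in zip(site, motif))


def evolve_site(rng, steps=2000, bias=20.0):
    """Evolve one site independently, starting from a random sequence.

    Assumption (toy model): a proposed base change is always kept if it
    does not lower the match to CORE, and is kept with probability
    1/bias if it does. This stands in for biased uptake.
    """
    site = [rng.choice(BASES) for _ in CORE]
    for _ in range(steps):
        pos = rng.randrange(len(site))
        candidate = site[:pos] + [rng.choice(BASES)] + site[pos + 1:]
        if matches(candidate) >= matches(site) or rng.random() < 1.0 / bias:
            site = candidate
    return "".join(site)


def simulate(n_sites=50, seed=0, **kwargs):
    """Evolve n_sites sites; no site shares ancestry with any other."""
    rng = random.Random(seed)
    return [evolve_site(rng, **kwargs) for _ in range(n_sites)]


if __name__ == "__main__":
    sites = simulate()
    for s in sites[:10]:
        print(s, matches(s))
    mean = sum(matches(s) for s in sites) / len(sites)
    print("mean matches to core:", mean)
```

Unrelated random 9-mers would match the core at about 2.25 positions on average. In this sketch the evolved sites match at far more positions, even though none of them descends from another, and they still differ from one another at some positions. That is similarity by convergence, with the variation left in.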
Biologists are used to invoking evolutionary 'conservation' (preservation of similarities inherited from a common ancestor) whenever aligned sequences have more similarity than can be explained by chance. So, for example, we say that two recA genes are more conserved than the sequences that flank them.
I've been fighting the tendency (mine and others') to refer to conserved similarities when describing USS similarities, because I know that distinct sites do not have common ancestry. Maybe now I can just substitute 'converged' for 'conserved' in all the usual statements about similarities.