The reviewers of our USS manuscript didn't feel that our Gibbs motif analysis of USSs was much of an advance on the previous analyses. It's true that the motif identified by the Gibbs analysis is very similar to that found by searching for perfect USS cores. But the results could have been otherwise, and it's important to have found this out. So in our revisions we need to do a better job of explaining why the motif analysis was needed.
First I should clarify that the USS should be viewed not as a replicative 'element' but as a 'motif'. Both terms can refer to sequences or sequence patterns that are present at multiple sites in the genome but, at least for the purposes of this blog, a replicative element is a DNA sequence whose repeats have arisen by copying and insertion. Transposons and insertion sequences are examples of replicative elements that code for their own replication; Alu sequences are elements that are passively replicated. USSs could also have been elements produced by some sort of copying and insertion process, but we now know they are not.
The term 'sequence motif' can be used for any detectable sequence pattern that occurs in multiple locations or genomes. It is commonly applied to DNA sequence patterns that have been selected for binding by specific DNA-binding proteins such as polymerases, transcription factors, and repressors, and to amino acid sequence patterns that perform specific functions in proteins. These motifs arise by point mutations in preexisting sequences, not by copying and insertion. Typically they are short (5-25 bp) and have much weaker consensuses than do replicative elements, with most or all instances differing from the consensus at one or more positions. (Different copies of replicative elements are often identical over hundreds of bp.)
I won't go into the compelling evidence here, but we now know that individual USSs arise by normal point mutations, not by copying. Previous analyses of genomic USSs did not explicitly consider the distinction between replicative elements and motifs, and they were limited by the need to search for specific sequences. The results therefore reflected only the properties of those specific sequences, with no allowance for the true diversity of functional USSs.
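To make the limitation concrete, here is a minimal Python sketch of a specific-sequence search. It is not the code used in any of the earlier analyses. The core sequence `AAGTGCGGT`, the toy sequence, and the function names are all assumptions chosen for illustration; none of them come from this post.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))


def find_sites(genome, core, max_mismatches=0):
    """List (position, strand, mismatches) for every window on either
    strand that differs from `core` at no more than `max_mismatches`
    positions. Positions are 0-based on the forward strand."""
    hits = []
    k = len(core)
    for strand, target in (("+", core), ("-", reverse_complement(core))):
        for i in range(len(genome) - k + 1):
            window = genome[i:i + k]
            mismatches = sum(a != b for a, b in zip(window, target))
            if mismatches <= max_mismatches:
                hits.append((i, strand, mismatches))
    return hits


# Assumed core sequence, used here only as an example.
CORE = "AAGTGCGGT"

# Toy sequence (hypothetical): a perfect core at position 2, a
# one-mismatch variant at position 15, and a perfect core on the
# reverse strand at position 26.
toy = "CC" + "AAGTGCGGT" + "CCTT" + "AAGTGCGCT" + "TT" + "ACCGCACTT" + "GG"

exact = find_sites(toy, CORE)                       # perfect cores only
relaxed = find_sites(toy, CORE, max_mismatches=1)   # allow one difference
```

The exact search never sees the variant at position 15, and even the relaxed search only finds sites defined by their distance from the one sequence it was given. Either way the answer is shaped by the query.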
Given our new knowledge that USSs are motifs, any analysis of USS evolution had to be built on a solid understanding of their true diversity. The availability of the Gibbs motif sampler program let us search the whole H. influenzae genome for any patterns, without having to specify any sequence. Once the program found a pattern, it created a list of all the sites in the genome fitting the pattern, with a measure of the strength of each match. Thus it provided an unbiased analysis of the full diversity of USS-related sequences in the genome.
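The "list of sites with a measure of the strength of each match" step can be sketched roughly as follows. This is not the Gibbs motif sampler itself, which also has to discover the pattern; it only illustrates scoring windows against a pattern once one is in hand, using a simple log-odds position weight matrix. The aligned example sites, the uniform background frequency of 0.25, the pseudocount, and the threshold are all assumptions made for illustration.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds position weight matrix from equal-length aligned
    sites. Assumes a uniform background, which is a simplification."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(len(sites[0])):
        column = [s[pos] for s in sites]
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def score(window, pwm):
    """Sum the per-position log-odds scores for one window."""
    return sum(col[b] for b, col in zip(window, pwm))


def scan(genome, pwm, threshold):
    """Return (position, sequence, score) for forward-strand windows
    scoring at least `threshold`, strongest match first."""
    k = len(pwm)
    hits = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        s = score(window, pwm)
        if s >= threshold:
            hits.append((i, window, round(s, 2)))
    return sorted(hits, key=lambda h: -h[2])


# Hypothetical aligned sites standing in for a pattern a sampler has found.
example_sites = ["AAGTGCGGT", "AAGTGCGGT", "AAGTGCGGT", "AAGTGCGCT"]
pwm = build_pwm(example_sites)

# Toy sequence (hypothetical) with a perfect site and an imperfect one.
toy_genome = "CCAAGTGCGGTCCTTAAGTGCGCTTTACCGCACTTGG"
ranked = scan(toy_genome, pwm, threshold=5.0)
```

Unlike an exact-match search, every window gets a graded score, so imperfect sites appear in the list with a lower strength instead of being dropped. A real analysis would also scan the reverse strand and use the genome's actual base composition as the background.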