One of the projects we're working on will produce a paper titled "Defining the Haemophilus influenzae uptake signal sequence". Today I realized that a key conceptual step is making a clear distinction between two ways we've been using the term USS.
First, we've been using it to describe the sequence preference of the cell-surface machinery that takes up DNA. Second, we've been using it to describe the consensus of the many repeats in the cell's genome that appear to match this preference. The similarity between the preference and the repeats is much too strong to be a coincidence, but that doesn't justify conflating the two meanings, as we and others have done.
In previous papers (ours and others') the preference has been used as a guide to investigating the consensus, and vice versa. But in the present project we aim to define them separately. The first part of the project is the 'unbiased' analysis of USS-like repeats in the genome that I've posted about previously. The second part will be a reanalysis of old uptake data (see "New bottles for old wine"), and the third will be direct tests of how changing different positions in the sequence changes uptake.
We would like to create terms for the two kinds of "USS": the preference and the consensus. I started with "USS" for the preference and "USS-like sequence" for the consensus. But this suggests that the consensus isn't the same as the preference, though our null hypothesis predicts it will be. "USS-related sequence" isn't much better. Today at lab meeting one of the grad students suggested putting a letter in front of "USS" to indicate which one we are referring to: "c-USS" (or "g-USS") for the consensus in the chromosome (or genome), and "r-USS" for the preference of the receptor on the cell surface.
We expect our results to indicate that the preference and the consensus are the same sequence. If so, these terms will never need to be used outside of this paper, and it won't matter much if they are ugly or cumbersome. But on the off chance that the two sequences turn out to be different, we should at least try to find some terms we can live with.