Finally, I'm moving on to reanalyzing the old data. I posted about this back in August (New bottles for old wine?) but I'm only now getting back to it. The best old data is a set of 28 short DNA sequences of plasmid inserts that were preferentially taken up by competent H. influenzae cells, together with the amounts of each that the cells took up. I can do two things with this data.
First, I can use the Gibbs motif sampler to search it for USS-like patterns. This provides one direct estimate of the bias of the uptake machinery. Here are some replicate results. All of these searches used the same input dataset, but random differences between the search runs produced different output alignments, which in turn produced different logos. I haven't gone back and compared the outputs to see how many of the same sequences they found, but that will only take a few minutes to do, using Unix's 'diff' command.
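As a rough alternative to 'diff', which is sensitive to line order and formatting, the overlap between two runs can be counted directly once each run's output has been reduced to the set of sites it found. This is a minimal sketch that assumes each site has already been parsed into a hypothetical (insert name, start position) pair; it does not read the Gibbs sampler's actual output format.

```python
from typing import Iterable, Tuple

# Hypothetical site representation: (insert name, start of the aligned motif).
# The real Gibbs sampler output would first need to be parsed into this form.
Site = Tuple[str, int]


def compare_runs(run_a: Iterable[Site], run_b: Iterable[Site]) -> dict:
    """Count the motif sites two sampler runs share, ignoring output order."""
    a, b = set(run_a), set(run_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Jaccard index: 1.0 means both runs found exactly the same sites.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    # Invented example runs, for illustration only.
    run1 = [("insert01", 12), ("insert02", 40), ("insert05", 3)]
    run2 = [("insert01", 12), ("insert02", 41), ("insert07", 8)]
    print(compare_runs(run1, run2))
```

Note that this treats a site shifted by one base (insert02 above) as a different site; a looser comparison could match sites from the same insert whose start positions lie within a few bases of each other.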
Second, I can find out how well the uptake of these sequences is predicted by their degree of match to the genomic USS pattern. I know that I can use a program called PatSer (for Pattern Search, I guess) on the RSA Tools site to do this. It constructs a scoring matrix and then scores sequences you give it. The matrix will be constructed from the probability matrix that the Gibbs searches produce, but I need to ask one of the graduate students to help me do this. Once I have the scores, I'll plot a graph of molecules taken up as a function of PatSer score - a strong correlation would support the hypothesis that biased uptake is responsible for the accumulation of USS in the genome.
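The general idea of turning a probability matrix into a scoring matrix, scoring each insert, and checking the correlation with uptake can be sketched as below. This is not PatSer's code: it is a generic log-odds (position weight matrix) calculation, and the pseudo-probability, the uniform background base frequencies, and the choice to take the best-scoring window on either strand are all assumptions. For an AT-rich genome like H. influenzae, genome-specific background frequencies would probably be more appropriate than the uniform default.

```python
import math
from typing import Dict, List, Optional

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_matrix(prob: List[Dict[str, float]],
                    background: Optional[Dict[str, float]] = None,
                    pseudo: float = 0.01) -> List[Dict[str, float]]:
    """Convert per-position base probabilities into log2-odds weights."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumption: uniform background
    weights = []
    for column in prob:
        # A small pseudo-probability keeps log2 finite for unobserved bases.
        total = sum(column.get(b, 0.0) + pseudo for b in BASES)
        weights.append({
            b: math.log2((column.get(b, 0.0) + pseudo) / total / background[b])
            for b in BASES
        })
    return weights


def best_score(seq: str, weights: List[Dict[str, float]]) -> float:
    """Highest window score on either strand (-inf if no valid window)."""
    width = len(weights)
    seq = seq.upper()
    best = -math.inf
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):
        for start in range(len(strand) - width + 1):
            window = strand[start:start + width]
            if any(b not in BASES for b in window):
                continue  # skip windows containing N or other ambiguity codes
            best = max(best, sum(weights[i][b] for i, b in enumerate(window)))
    return best


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Invented toy matrix, sequences and uptake values; not the real data.
    prob = [{"A": 0.9, "G": 0.1}, {"A": 0.8, "T": 0.2}, {"G": 1.0}]
    w = log_odds_matrix(prob)
    seqs = ["TTAAGTT", "TTAAATT", "CCCCCCC"]
    uptake = [10.0, 4.0, 1.0]
    scores = [best_score(s, w) for s in seqs]
    print(scores, pearson(scores, uptake))
```

If the uptake amounts are very unevenly distributed, a rank-based (Spearman) correlation may be a more robust summary than Pearson's r, since a few strongly taken-up inserts would otherwise dominate the result.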