The Defining-the-USS paper needs one last analysis. I wrote earlier that the sites most likely to reveal the uptake bias are those that are neither in genes nor likely to function as transcriptional terminators. I now have this set (490 sites) and I've made a logo (here).
I also did a MatrixPlot analysis to look for covariation between bases at different positions. (I've posted about using MatrixPlot to analyze the whole-genome set here.) It showed stronger interactions than those seen in the whole-genome set. But I'm left with two unresolved issues.
First, I don't know how strong the covariation is, because MatrixPlot's statistical underpinnings aren't explained (or aren't explained in a way that's accessible to my statistically untrained mind). I can take care of this by doing a control analysis, using the same number of input sequences taken from random positions in the genome. This control is on hold because the MatrixPlot server is down. (I guess the control should really use random intergenic positions; I could do that.) This is a valid control even though it doesn't use any statistics.
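A minimal sketch of how such a control set could be drawn, assuming the genome is available as a single string. The function name `random_control_sites`, the site length, and the toy genome are all hypothetical and only for illustration; this version ignores strand and does not restrict the sample to intergenic positions.

```python
import random

def random_control_sites(genome, n_sites, site_len, seed=None):
    """Return n_sites windows of length site_len taken from distinct
    random start positions in the genome string."""
    rng = random.Random(seed)
    max_start = len(genome) - site_len
    # Sample start positions without replacement so no window is repeated.
    starts = rng.sample(range(max_start + 1), n_sites)
    return [genome[s:s + site_len] for s in starts]

# Toy example (assumed data): a fake 200-base "genome" and ten 20-base sites.
toy_genome = "ACGT" * 50
control = random_control_sites(toy_genome, n_sites=10, site_len=20, seed=1)
print(len(control))
```

For the real control, `n_sites` would be 490 to match the unconstrained set, and the start positions could be filtered against gene coordinates to get the intergenic version.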
Second, when MatrixPlot detects covariation between two positions it doesn't tell me which combinations of bases are found together more often than expected. For example, it usually reports that the base at position 18 (in the first flanking segment) is strongly correlated with the base at position 19. Both A and T are common at both of these positions, but MatrixPlot doesn't tell me whether the significant combinations are AA or AT or TA or TT.
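The question of which combinations are over-represented can be illustrated with a simple observed-versus-expected count. This is only a sketch under my own assumptions: it is not what MatrixPlot or the linkage disequilibrium program does, the function name `pair_enrichment` is hypothetical, and the sequences below are toy data. The expected count for a pair is what independence between the two positions would predict from the single-position base counts.

```python
from collections import Counter

def pair_enrichment(seqs, pos_a, pos_b):
    """For two 1-based alignment positions, return a dict mapping each
    base pair (e.g. 'AT') to (observed count, expected count), where the
    expected count assumes the two positions are independent."""
    i, j = pos_a - 1, pos_b - 1
    n = len(seqs)
    count_a = Counter(s[i] for s in seqs)   # base counts at first position
    count_b = Counter(s[j] for s in seqs)   # base counts at second position
    observed = Counter(s[i] + s[j] for s in seqs)
    result = {}
    for x in count_a:
        for y in count_b:
            expected = count_a[x] * count_b[y] / n
            result[x + y] = (observed[x + y], expected)
    return result

# Toy aligned sequences (assumed data), comparing positions 1 and 2.
toy = ["AA", "AA", "TT", "TT"]
for pair, (obs, exp) in sorted(pair_enrichment(toy, 1, 2).items()):
    print(pair, obs, exp)
```

Pairs whose observed count is well above the expected count are the candidates for the combinations driving the covariation; deciding whether the excess is significant still needs a proper test.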
My colleague Steve Schaeffer's linkage disequilibrium program will do that, so I'm about to email him and ask him to run the unconstrained (non-coding non-terminator) and control sets for us. But I want to get the control MatrixPlot results first, so I can more clearly explain what we think is going on.