The postdoc is back and we are driving each other nuts with ideas for more analyses (both of us), more analyses (him), and requests to stop analyzing the bloody data and finish writing the damned paper (me). Just now I found myself thinking "If only we had less data...".
One simple control analysis we really did need was using his USS-scoring matrices to score some simulated genomes (random-sequence strings of the same length and base composition as the H. influenzae genome). These are controls for the analysis I wrote about here. He's done these now, and they nicely show that both scoring motifs see the bulk of the genome as no different from random sequence, and that the ~200 high-scoring positions they both find are not found in random-sequence 'genomes'.
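For anyone wondering what this kind of control looks like in practice, here is a minimal sketch. It is not the postdoc's actual code. It assumes the scoring matrix is a simple list of per-position base scores that get summed across a window, and it makes the random 'genome' by shuffling the real sequence, which is just one way to keep length and base composition identical. The names `shuffled_genome`, `score_positions`, `count_high_scores` and the three-position `TOY_MATRIX` are all made up for illustration, and only one strand is scored.

```python
import random

# Hypothetical toy matrix: one dict of base scores per motif position.
# A real USS matrix would be much longer and have real-valued weights.
TOY_MATRIX = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 1.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0},
]


def shuffled_genome(seq, seed=None):
    """Return a random permutation of seq: same length, same base counts."""
    rng = random.Random(seed)
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def score_positions(seq, matrix):
    """Score every window of len(matrix) bases by summing per-position scores."""
    width = len(matrix)
    return [
        sum(matrix[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    ]


def count_high_scores(seq, matrix, threshold):
    """Count windows whose score is at or above the threshold."""
    return sum(1 for s in score_positions(seq, matrix) if s >= threshold)
```

The comparison is then just `count_high_scores(real_genome, matrix, cutoff)` against the same call on a handful of `shuffled_genome(real_genome)` strings, where `real_genome`, `matrix` and `cutoff` are whatever sequence, matrix and threshold you are actually using.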