OK, so I got to my office this morning all enthusiastic to do the additional runs that would clarify why N. meningitidis has twice as many forward-orientation DUSs in the strand synthesized discontinuously. I did three different things, all of which confirmed that the two-fold difference was just an aberration in the Gibbs analysis.
First, I plotted the distribution of forward-DUSs along both strands of the genome (yesterday I only had time to do it for the reverse-complement strand). This clearly showed that DUSs are distributed the same way on the two strands: in the figure below (just a close-up of part of the full plot), the blue ticks are DUSs on the forward strand and the red ones are DUSs on the reverse-complement strand.
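For anyone who wants to reproduce this kind of strand comparison, here is a minimal sketch of how DUS positions on each strand could be collected before plotting. It assumes the N. meningitidis DUS core is GCCGTCTGAA and that the genome is a plain string; the function names are hypothetical and this is not the script I actually used.

```python
# Hypothetical sketch: locate DUS copies on each strand of a genome.
# Assumption: the N. meningitidis DUS core is GCCGTCTGAA (forward orientation).
DUS = "GCCGTCTGAA"

# Translation table for complementing bases (upper- and lower-case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(seq: str, motif: str) -> list[int]:
    """Return every start index of motif in seq, including overlapping hits."""
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def dus_positions(genome: str, motif: str = DUS) -> tuple[list[int], list[int]]:
    """Return (forward-strand, reverse-complement-strand) DUS positions,
    both given in forward-strand coordinates."""
    forward = find_all(genome, motif)
    # A DUS on the reverse-complement strand shows up as the motif's
    # reverse complement when reading the forward strand.
    reverse = find_all(genome, reverse_complement(motif))
    return forward, reverse


if __name__ == "__main__":
    toy = "AA" + DUS + "TT" + reverse_complement(DUS) + "AA"
    print(dus_positions(toy))  # ([2], [14])
```

The two position lists can then be drawn as tick marks in two colours along the genome axis.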
Second, I completed the control analysis I had to interrupt yesterday. This analyzed the reverse complements of the 'leading' and 'lagging' sequences I had assembled yesterday, a way of repeating the analysis on different sequences that had the same information content. Result: very similar numbers of DUSs in both.
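The logic of this control can be sketched in a few lines: reverse-complementing a sequence keeps all its information but swaps which DUS orientation reads as "forward", so a genuine orientation bias should flip over, while an artifact of a particular search run need not. This is a toy illustration assuming the DUS core GCCGTCTGAA; the function names are hypothetical.

```python
# Hypothetical sketch of the reverse-complement control.
# Assumed DUS core: GCCGTCTGAA (not its own reverse complement).
DUS = "GCCGTCTGAA"
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_motif(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps."""
    return sum(1 for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i))


def orientation_counts(seq: str, motif: str = DUS) -> tuple[int, int]:
    """Return (forward-orientation, reverse-orientation) DUS counts in seq."""
    return count_motif(seq, motif), count_motif(seq, reverse_complement(motif))


if __name__ == "__main__":
    leading = "AA" + DUS + "CC" + DUS + "TT" + reverse_complement(DUS)
    print(orientation_counts(leading))                      # (2, 1)
    print(orientation_counts(reverse_complement(leading)))  # (1, 2)
```

Exact string matching always gives the swapped counts; a stochastic motif search like Gibbs sampling may not, which is what makes the control informative.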
Third, I assembled new 'leading' and 'lagging' sequences, using our SplitSequence.pl script to efficiently find the midpoints I'm using as surrogate termini, then reran the Gibbs analysis on these. Result: very similar numbers of DUSs in both, and these DUSs gave effectively identical logos.
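The splitting step could look roughly like the sketch below. This is not SplitSequence.pl; it is a hypothetical reconstruction that assumes the origin coordinate is known, that the surrogate terminus sits halfway around the circular chromosome from the origin, and that the forward-strand sequence of the clockwise replichore matches the continuously synthesized (leading) strand.

```python
# Hypothetical sketch (not SplitSequence.pl): build 'leading' and 'lagging'
# sequences from a circular genome, assuming
#   * the origin position is known, and
#   * the terminus lies halfway around the circle (surrogate terminus).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def split_replichores(genome: str, origin: int) -> tuple[str, str]:
    """Return (right, left) replichores as forward-strand sequences.

    right: origin -> surrogate terminus, moving clockwise
    left:  surrogate terminus -> origin, moving clockwise
    """
    rotated = genome[origin:] + genome[:origin]  # put the origin at index 0
    midpoint = len(rotated) // 2                  # surrogate terminus
    return rotated[:midpoint], rotated[midpoint:]


def leading_lagging(genome: str, origin: int) -> tuple[str, str]:
    """Assemble strand sequences by replication mode.

    Assumption: in the right replichore the forward-strand sequence matches
    the leading strand; the left fork runs the other way, so its leading
    strand is the reverse complement of the forward strand.
    """
    right, left = split_replichores(genome, origin)
    leading = right + reverse_complement(left)
    lagging = reverse_complement(right) + left
    return leading, lagging


if __name__ == "__main__":
    print(leading_lagging("AAAACCCC", origin=2))  # ('AACCTTGG', 'GGTTCCAA')
```

Each output sequence is as long as the genome, so forward-DUS counts in the two can be compared directly.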
So I went back and examined the Gibbs output that had reported twice as many DUSs as the others. For unknown reasons, both replicate runs had settled on less highly specified motifs, and thus included many more poorly matched sites in their output. Well, at least I can now very confidently report that there is no direction-of-replication bias in N. meningitidis DUSs.
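Why a looser motif inflates the site count can be illustrated with a simple mismatch-tolerance scan. This is not what the Gibbs sampler does internally (it works with position weight matrices); it is only an analogy, assuming the DUS core GCCGTCTGAA, scanning one strand, and using hypothetical function names.

```python
# Hypothetical illustration (not Gibbs sampling): the more mismatches a motif
# definition tolerates, the more poorly matched sites it admits.


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def count_sites(seq: str, consensus: str, max_mismatches: int) -> int:
    """Count windows on this strand within max_mismatches of the consensus."""
    k = len(consensus)
    return sum(
        1
        for i in range(len(seq) - k + 1)
        if hamming(seq[i:i + k], consensus) <= max_mismatches
    )


if __name__ == "__main__":
    toy = "GCCGTCTGAA" + "TT" + "GCCGTCTGTT"  # one perfect site, one 2-mismatch site
    for m in (0, 2):
        print(m, count_sites(toy, "GCCGTCTGAA", m))  # prints "0 1", then "2 2"
```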