My new binder now has notes and graphs for both the Perl simulation results and the Gibbs motif sampler analysis.
The new N. meningitidis Gibbs searches specifying small numbers of 'expected' occurrences succeeded in finding the DUS motif. The searches expecting 200 found about 2500 occurrences, and those expecting 500 found about 2600. I've now queued up some searches expecting more (1000, 1500 and 2000), only because my analyses of other genomes have used 1.5 times the number of perfect cores, and for N. meningitidis this needs 2809 occurrences.
The motifs Gibbs found from a genome sequence with the RS3 repeats removed are very similar to those from the full genome sequence. The 'residue frequencies', expressed as percent of each base at each position, are identical, and the 'motif probability models', expressed as probabilities to three decimal places, differ by no more than 0.001. This means that the RS3 repeats are not perturbing the results of whole-genome searches, but they may be perturbing the intergenic-sequence searches. I ran a bunch of intergenic searches last night, with and without RS3s removed. Some of these runs ran into 'segmentation fault' errors, which I remember dealing with a couple of years ago, and I haven't analyzed the outputs yet. I'll try to do that quickly because I really need to focus on the Perl simulation work today.
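The comparison of the two 'motif probability models' can be sketched in a few lines of Python. This is only an illustration of the check, not the script I actually used: the function name `max_abs_difference` is hypothetical, and the two small matrices are made-up placeholder values (two positions, bases in A, C, G, T order), not the real Gibbs output.

```python
def max_abs_difference(model_a, model_b):
    """Largest per-base difference between two motif probability models.

    Each model is a list of positions; each position is a list of four
    probabilities (A, C, G, T). Differences are rounded to three decimal
    places, matching the precision of the reported probabilities.
    """
    if len(model_a) != len(model_b):
        raise ValueError("models must have the same number of positions")
    return max(
        round(abs(pa - pb), 3)
        for row_a, row_b in zip(model_a, model_b)
        for pa, pb in zip(row_a, row_b)
    )


# Placeholder values for illustration only (NOT real search results).
full_genome = [
    [0.010, 0.020, 0.960, 0.010],
    [0.900, 0.050, 0.025, 0.025],
]
rs3_removed = [
    [0.011, 0.020, 0.959, 0.010],
    [0.900, 0.050, 0.025, 0.025],
]

largest = max_abs_difference(full_genome, rs3_removed)
# Tiny allowance for floating-point representation of 0.001.
models_agree = largest <= 0.001 + 1e-9
print(largest, models_agree)
```

Rounding before comparing avoids spurious mismatches from floating-point arithmetic when the inputs are only reported to three decimal places.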
Update: I had two OK searches with the RS3 repeats and one without. They found nearly identical numbers of occurrences (1570, 1573 and 1572). I made logos of one of each type and they are indistinguishable, so I can conclude that RS3 repeats don't affect the searches of intergenic sequences, at least when the search is expecting a small number of repeats.
The analyses of N. meningitidis coding sequences also showed that it's better to expect an unreasonably low number of occurrences. With exp=1500, the searches found about 6500 occurrences, but most were not DUS and the motif was only strong for four of the 12 positions. But with exp=100 the replicate searches found 900 and 901 occurrences, and these gave a very strong DUS-type motif.