I've finished analyzing the Gibbs motif results and converting them to logos. Now I have to decide what they mean.
For now I'll just consider the analyses of the Neisseria genomes. Almost all the searches were run the same way; I'll leave the exceptions for later.
1. The searches found a lot more sites than I had expected, and the consensus was weaker than I had expected. These two effects are related: the additional sites the searches found were quite weak matches, and these are responsible for the weak consensus. You can see both effects in the figure at the left, which shows two searches on the same genome (the top one was started with a prior file suggesting base frequencies, so they're not perfectly comparable).
So the scientific issue is: do the Gibbs searches have built-in biases or other effects that cause them to pick up more poor matches in these short motifs than they do with the longer Pasteurellacean motifs? Or are they finding more sites because the motif distribution is indeed different? For example, maybe the specificity of the uptake machinery is broader, or maybe biased uptake has been acting for a shorter time. One test is to run some searches that 'expect' fewer sites, and see if they too find this many. Those are running (or at least queued). Another test is to run searches for a similarly short motif on one or more Pasteurellaceae genomes, and see if they also find a lot more sites than the searches for the longer motifs did. I'll queue those now.
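To make the link between extra weak sites and a weaker consensus concrete, here is a minimal Python sketch of the usual logo calculation: the information content of each aligned position is 2 bits minus the Shannon entropy of the bases at that position. Everything in it is an illustrative assumption rather than part of the real analysis: the function names are hypothetical, the sites are made-up toy sequences (not Gibbs output), the background is taken to be 25% for each base, and no small-sample correction is applied.

```python
import math
from collections import Counter


def column_information(sites):
    """Information content (bits) of each column of equal-length DNA sites.

    Assumes a uniform background (25% per base), so the maximum is 2 bits
    per position. No small-sample correction is applied.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    info = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        total = sum(counts.values())
        # Shannon entropy of the observed base frequencies in this column
        entropy = -sum((c / total) * math.log2(c / total)
                       for c in counts.values())
        info.append(2.0 - entropy)
    return info


def total_information(sites):
    """Sum of per-column information content over the whole motif."""
    return sum(column_information(sites))


# Toy data (made up for illustration): eight perfect matches to a
# 10-bp core, plus four weaker matches that each differ at a few positions.
strong = ["GCCGTCTGAA"] * 8
weak = ["GACGTATGCA", "TCCGACTGAT", "GCAGTCTTAA", "ACCTTCGGAA"]

strong_bits = total_information(strong)
mixed_bits = total_information(strong + weak)

print(f"strong sites only: {strong_bits:.2f} bits")
print(f"strong + weak:     {mixed_bits:.2f} bits")
```

With only the perfect matches every position scores the full 2 bits; adding the weak matches lowers the total. That is the same dilution effect the logos show, and it is why a search that accepts many poor matches will necessarily report a weaker consensus.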