Yesterday the very helpful guy who wrote the motif-search program I'm using sent me an improved version that eliminates the segmentation fault I'd been experiencing. Now the program can analyze a file containing the whole H. influenzae genome. He also sent advice on improving its accuracy by having it do many more trials, but I'm going to let that wait until we've solved the remaining problem.
That problem is the program's reluctance to fragment the motifs it finds (illustrated by the two logo images in the segmentation fault post). Because the program is optimized for finding the relatively compact motifs typical of sites where regulatory proteins bind, it prefers central positions with weak consensus over distant positions with strong consensus.
The expert sent me instructions for specifying a "fragmentation mask" to overcome this. The mask is a string of digits that specifies the pattern of significant and nonsignificant positions in the desired motif. For example, 1110000111 specifies a motif with three significant positions on either side of four nonsignificant positions.
If I understand his instructions correctly, masks can be used in two ways: as starting suggestions or as strict rules. A mask containing only zeros and ones is used as a suggestion, telling the program "Start with a motif that matches this pattern, but if you find a better pattern you can ignore the mask." A mask containing "3"s is used as a rule, with each 3 specifying a position that must remain nonsignificant in the final motif. For example, 1113333111 specifies a motif whose four central positions must remain nonsignificant.
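To make my reading of the mask notation concrete, here's a small Python sketch that classifies each position of a mask string. It only encodes my interpretation described above ("1" = significant, "0" = nonsignificant at the start but free to change, "3" = nonsignificant and locked). The function name parse_mask is hypothetical, and this is not how the motif-search program itself parses masks or mask files.

```python
def parse_mask(mask: str) -> dict:
    """Classify each position of a fragmentation mask string.

    Hypothetical helper: follows my reading of the mask rules, not the
    program's documented specification.
      '1' -> significant position in the starting motif
      '0' -> nonsignificant in the starting motif, but may change later
      '3' -> nonsignificant, and must stay that way in the final motif
    """
    allowed = {"0", "1", "3"}
    bad = set(mask) - allowed
    if bad:
        raise ValueError(f"unexpected mask characters: {sorted(bad)}")
    return {
        "length": len(mask),
        "significant": [i for i, c in enumerate(mask) if c == "1"],
        "flexible_gap": [i for i, c in enumerate(mask) if c == "0"],
        "locked_gap": [i for i, c in enumerate(mask) if c == "3"],
        # Any '3' turns the mask from a suggestion into a rule.
        "is_rule": "3" in mask,
    }


if __name__ == "__main__":
    for m in ("1110000111", "1113333111"):
        print(m, parse_mask(m))
```

Under this reading, both example masks describe the same 10-position shape (positions 0 to 2 and 7 to 9 significant); the only difference is whether the four central positions are merely a starting suggestion or are locked as a rule.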
I'm trying to use a mask as a rule (with "3"s). But I suspect that I don't understand masks correctly, because the program reads my mask file but ignores the instructions in it. I have enough programming experience to know that this almost certainly means that my instructions are set up wrong. I've tried making the rule very simple (rather than the kind of rule I would use to force a USS-like fragmentation pattern), but even that doesn't work.
So I've emailed the helpful expert asking for more advice.