The program uses a random-number 'seed' to start searching the genome sequence for sequences whose consensus motif fits this pattern. ... Once the score is stable for a 'plateau' number of cycles the search ends and the program stores the final set of sequences that gave this pattern and goes on to try another random number seed.Well. the seeds are supposed to be random numbers. The program uses them to get approximately random starting patterns, from some of which it converge on very similar final motif patterns. In any one run only a few of any 100 seeds lead to any approximation of the USS pattern, with the rest stuck muddling around in poorly-matched pseudo-patterns. But of course starting with identical seeds will give identical outcomes
This morning I noticed that two replicate runs that had been put into the computer queue within a few seconds of each other had given exactly the same results for all 100 seeds they claimed to have used. The program doesn't report each of the seeds used, but it does give you the initial seed it started the whole run with. Surprise surprise, the identical runs had started with identical 10-digit seeds (1162169861). And the next run queue's had the number 1162169821, differing only in its second-last digit. And the next two runs also had identical seeds (1162169911) Not only are these numbers much too similar to be considered 'random' the identity of all the results implies that each of the other 99 seeds is completely determined by the first seed.
I had assumed that any respectable computer program would draw whatever random numbers it needed from the random-number generator that's built in to most programming language compilers. (That's what our Perl programs do.) These actually generate pseudo-random numbers, and a great deal of analysis has gone into designing and evaluating the algorithms they use (they're also used in computer-security applications).
But I was wrong. I just went back to the original program notes, and found that one of the many settings users can specify is a 'random number generator seed'. This feature isn't discussed anywhere in the notes, and none of the examples provided in the notes use it. But maybe it prevents exactly the problem I'm having.
I've sent an email to my helpful advisor, and in the meantime I've queue'd up substitute runs, spaced far enough apart in time that they should have distinct seeds.