Whence Gibbs?

I've successfully worked out how to get the Gibbs Motif Sampler to analyze the new genome sequences. I've only done it for two of them so far, because a better option has appeared.

A new version of the Gibbs motif software is available. It gives the option of using a 'centroid' sampling method that combines the best sites found in different runs (runs initiated with different random-number seeds), rather than simply taking all the sites identified in the run that had the best score. This has the big advantage of eliminating most of the weakly-matched 'false positive' sites.
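To make the difference between the two ways of choosing sites concrete, here is a toy sketch in Python. It is not the Gibbs software's code. It assumes each run's output can be reduced to a set of (sequence, position) sites, and it stands in for the centroid idea with a simple majority vote: a site is kept only if it turns up in more than half of the runs. The function names and input format are hypothetical.

```python
from collections import Counter


def best_run_sites(runs, scores):
    """Old approach: take every site from the single highest-scoring run."""
    best = max(range(len(runs)), key=lambda i: scores[i])
    return set(runs[best])


def consensus_sites(runs, threshold=0.5):
    """Centroid-like approach (toy majority vote): keep sites that appear
    in more than `threshold` of the runs, so sites that show up in only
    one or two runs (likely weak 'false positives') are dropped."""
    if not runs:
        return set()
    counts = Counter(site for run in runs for site in set(run))
    n = len(runs)
    return {site for site, c in counts.items() if c / n > threshold}


# Three hypothetical runs started from different random-number seeds.
runs = [
    {("seqA", 10), ("seqA", 200), ("seqB", 55)},
    {("seqA", 10), ("seqB", 55), ("seqC", 7)},
    {("seqA", 10), ("seqB", 55)},
]
scores = [12.1, 15.3, 11.0]

print(sorted(best_run_sites(runs, scores)))
print(sorted(consensus_sites(runs)))
```

The best-scoring run (the second one) includes ("seqC", 7), a site no other run found; the majority vote keeps only the two sites every run agreed on.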

It took me a few days to work out how to get it running on the computer cluster (the helpful administrator reset some permissions for me). The new release also includes a version that runs in the Mac terminal, and I now have that working too. But it didn't take long to discover that the centroid version runs about 100-fold slower (no, I'm not exaggerating) than the usual (non-centroid) version. This means that a good run analyzing a whole genome would take several weeks, or maybe more; getting rid of the false positives isn't worth that big an investment.

But the very helpful Gibbs expert has again offered to help; he says the centroid version shouldn't be slower at all. So I've sent him the test file I've been using (2% of the genome) plus examples of the output I get. He's going to see if he can find the problem and fix it.
