Field of Science

Gibbs can't find the DUS

Well, my four Gibbs searches of the whole genome (with and without the RS3 repeats) hit the 36-hour walltime limit when they were only about 1/3 of the way through their 100 seeds (= 100 replicate searches). And, judging by the scores reported for each seed's results, none of them found the DUS.

So I went back to old blog posts, to see whether this is a problem I had already solved. (Yes, I know that's pathetic, I should be able to remember what I've learned, but the ability to find forgotten results in blog posts is one of the big benefits of blogging about my research.) In one of those posts I considered using a prior that specifies the motif, but decided instead to seed the search with a few hundred bp enriched for the DUS and later remove these occurrences from the output. I'm not really sure that this is the best approach, so maybe I'll try both now (using only 25 seeds and specifying a 48 hr walltime).
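As a rough sanity check on the 25-seed, 48 hr plan, here is a back-of-the-envelope sketch in Python. It assumes that every seed takes about the same time and that "about 1/3 of the way through" means roughly 33 of the 100 seeds had finished at the 36 hour limit; both are approximations, and the function names are made up for illustration.

```python
def hours_per_seed(elapsed_hours, seeds_total, fraction_done):
    """Average hours per seed, assuming every seed takes about the same time."""
    seeds_done = seeds_total * fraction_done
    return elapsed_hours / seeds_done


def estimated_hours(n_seeds, per_seed_hours):
    """Projected run time for n_seeds at the given per-seed rate."""
    return n_seeds * per_seed_hours


# Figures from the failed runs: 36 hours used, about 1/3 of 100 seeds done.
per_seed = hours_per_seed(36, 100, 1 / 3)

# Planned rerun: 25 seeds with a 48 hr walltime.
needed = estimated_hours(25, per_seed)

print(f"about {per_seed:.2f} h per seed, about {needed:.0f} h for 25 seeds")
```

If those assumptions hold, 25 seeds should need somewhere around 27 hours, which leaves some slack under 48 hours. Runs with different priors or seeded sequences may not take the same time per seed, so this is only a guide.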

I'll queue up a lot of combinations of priors (just-length and base-frequencies), numbers of expected occurrences (200 and 2838), and seeded and unseeded sequences, but I'll do this only for the sequences with the RS3 repeats removed. That's because I already have results from sequences with the repeats, and the purpose of these new runs is just to find out whether removing the repeats makes a difference. If one or more of the runs succeed in finding the DUS, I'll do the same run with the sequence that still has the repeats and see if the motifs differ.
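To keep track of what gets queued, the combinations can be listed with a short Python sketch. It assumes a full cross of the three factors named above (two priors, two expected-occurrence counts, seeded or unseeded), which may be more or fewer runs than actually get submitted. The labels and dictionary keys are made up for bookkeeping; they are not Gibbs options.

```python
from itertools import product

# Factors taken from the plan above; the run labels are hypothetical.
PRIORS = ("just-length", "base-frequencies")
EXPECTED_OCCURRENCES = (200, 2838)
SEEDED = (True, False)


def build_runs():
    """Return one dict per combination of prior, expected count, and seeding."""
    runs = []
    for prior, expected, seeded in product(PRIORS, EXPECTED_OCCURRENCES, SEEDED):
        seed_tag = "seeded" if seeded else "unseeded"
        runs.append({
            "label": f"{prior}_exp{expected}_{seed_tag}",
            "prior": prior,
            "expected": expected,
            "seeded": seeded,
        })
    return runs


runs = build_runs()
for run in runs:
    print(run["label"])
```

A full cross gives 2 x 2 x 2 = 8 runs, all on the sequence with the RS3 repeats removed. Any run that finds the DUS can then be repeated on the sequence with the repeats.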

I'd rather find out why Gibbs can't find the DUS but can find the less frequent and more complex USS with no trouble.  But I don't have any ideas.
