I have to chop a 200 kb file into 20 kb pieces, because the USS position weight matrix I'm using (derived from Gibbs analysis of the H. influenzae genome) is so fastidious (???) that runs take forever. Specifically, a 200 kb simulation that starts from a pre-evolved sequence with quite a few uptake sequences already in it has taken 28 days to complete about 3300 cycles, and it's about to exceed its pre-specified time limit (800 hours, about 33 days) and be terminated before it finishes. Terminating prematurely means that it won't report the sequence it has so painstakingly evolved. And I had even given it a tenfold higher mutation rate to help it run fast!
Anyway, my clumsy solution was to chop the 200 kb input sequence into ten 20 kb segments and evolve them all in parallel. Because Word is good with word counts, I opened the sequence file (as a text file) in Word and marked off every 20 kb with a couple of line breaks. Then I opened the file in TextEdit and deleted everything except the last 20 kb to get a test file (no line breaks at all, that I could see). But it generated an 'unrecognized base' error message when I tried to use it, so my first suspicion was that Word had somehow generated a non-Unix line break.
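For anyone who would rather skip the word processor, the chopping step can be scripted. This is only a minimal sketch, not what was actually done here: the function names, the output file prefix, and the choice to write each segment with no line breaks at all are assumptions, and the simulation's real input requirements may differ.

```python
def split_sequence(seq, segment_length=20_000):
    """Return consecutive segments of at most segment_length bases.

    Any whitespace (spaces, tabs, CR or LF line breaks) is removed first,
    so stray line endings cannot end up inside a segment.
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    cleaned = "".join(seq.split())
    return [
        cleaned[start:start + segment_length]
        for start in range(0, len(cleaned), segment_length)
    ]


def write_segments(segments, prefix="segment"):
    """Write each segment to its own file; 'prefix' is a hypothetical name.

    Assumption: the simulation wants the bare sequence with no line breaks,
    so nothing is appended after the last base.
    """
    paths = []
    for number, segment in enumerate(segments, start=1):
        path = f"{prefix}_{number:02d}.txt"
        # newline="" stops Python from translating anything on write.
        with open(path, "w", newline="") as handle:
            handle.write(segment)
        paths.append(path)
    return paths
```

With a 200 kb sequence held in a string, `split_sequence(sequence)` would give ten 20 kb strings, and `write_segments` would save them as `segment_01.txt` through `segment_10.txt`.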
Sure enough, opening the file in Komodo showed that it had. But surprisingly, the problem wasn't a Mac-style line break, but a DOS/Windows line break! Maybe Word 2008 thinks all .txt files are for Windows?
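The same check can be made without an editor like Komodo by counting the line-break bytes directly. The sketch below assumes the file has been read in binary mode (for example with `open(path, "rb").read()`), and the function names are made up for illustration. It distinguishes DOS/Windows breaks (CR followed by LF), old Mac-style breaks (a lone CR) and Unix breaks (a lone LF), and can strip all of them.

```python
def count_line_breaks(data: bytes) -> dict:
    """Count DOS (CR LF), old Mac (lone CR) and Unix (lone LF) line breaks."""
    crlf = data.count(b"\r\n")
    return {
        "dos": crlf,
        # Every CR LF pair contains one CR and one LF, so subtract the pairs
        # to get the lone ones.
        "mac": data.count(b"\r") - crlf,
        "unix": data.count(b"\n") - crlf,
    }


def strip_line_breaks(data: bytes) -> bytes:
    """Remove every CR and LF byte, whatever style the file used."""
    return data.replace(b"\r", b"").replace(b"\n", b"")
```

A file that looks free of line breaks in a text editor but reports a non-zero `"dos"` count would be consistent with the 'unrecognized base' error, since the leftover CR is not a valid base.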
Not your typical science blog, but an 'open science' research blog. Watch me fumbling my way towards understanding how and why bacteria take up DNA, and getting distracted by other cool questions.
1 comment:
"Terminating prematurely means that it won't report the sequence". Can you run this simulation on open software? If so there's probably a cluster of Linux computers somewhere you can use to run big simulations. Your problem is like the one chip designers face -- sometimes they can switch to open tools to get results, sometimes it needs a proprietary SW tool.
64-bit computers like AMD's Opteron help some with running the proprietary SW at good speed on single computers.