RRResearch: simulation plans?

I'm back in Vancouver, slowly working my way through the three-week backlog of emails that I glanced at and then disregarded. These include lots of suggestions from helpful blog and evoldir readers about how I might be able to speed up my computer simulations of uptake sequence evolution. I'll try to send a group reply to them all, but I'll summarize the ideas here. (Note that this summary is being generated BEFORE I go back and read through all the emails, so it's not the final version.)

My original query had asked about the availability of a 'mainframe' or other fast computer, hoping for something that was at least ten times faster than my MacBook Pro laptop. The first thing I learned from the responses is that mainframes don't exist any more - they've all been replaced by clusters and grids, where large numbers of individually-slow computers are networked together.

Many replies asked if I could convert my code so it would take advantage of one of these clusters. Unfortunately the steps in the simulation are all very sequential; each takes information from the previous step and generates information for the next step. Even if they could be run in parallel, I don't have the computer skills to rewrite the code to take advantage of this. But I realized that I can quite easily do multiple slow runs with small sub-genomes instead of one run with a big genome, and then manually combine the outputs. For my present goals I only need to get data for a few slow runs with big genomes, so doing this manually wouldn't be a big deal (much easier than learning how to automate it). The only concern is edge effects, but I think these will be minor and I can run a test to confirm this.
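The split-and-combine idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `simulate` is a hypothetical stand-in that applies random point mutations (it is not the real uptake-sequence model), and the genome, the motif `AAGTG`, and the numbers of parts and cycles are all made up. Each sub-genome run is independent of the others, so each could be started by hand on a different machine. The last few lines show one simple way to look at an edge effect: motifs that straddle a cut point are missed when each piece is scored separately.

```python
import random

def split_genome(genome, n_parts):
    """Cut a genome string into n_parts contiguous sub-genomes of near-equal length."""
    size, extra = divmod(len(genome), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(genome[start:end])
        start = end
    return parts

def simulate(sub_genome, cycles, seed):
    """Hypothetical placeholder for the real simulation: random point mutations only."""
    rng = random.Random(seed)
    seq = list(sub_genome)
    for _ in range(cycles):
        pos = rng.randrange(len(seq))
        seq[pos] = rng.choice("ACGT")
    return "".join(seq)

def count_motif(seq, motif):
    """Count occurrences of motif in seq, including overlapping ones."""
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))

# Made-up inputs, for illustration only.
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(20000))
motif = "AAGTG"  # placeholder motif, not taken from the real model

# Run each sub-genome separately, then join the outputs in their original order.
parts = split_genome(genome, 4)
evolved_parts = [simulate(p, cycles=1000, seed=i) for i, p in enumerate(parts)]
combined = "".join(evolved_parts)

# Motifs spanning a cut point are counted in the joined genome but not in any piece.
whole_count = count_motif(combined, motif)
per_part_count = sum(count_motif(p, motif) for p in evolved_parts)
edge_loss = whole_count - per_part_count
print(whole_count, per_part_count, edge_loss)
```

A real test for edge effects would go further than this, comparing one full-genome run against the combined output of the sub-genome runs, since the real simulation may also behave differently near the ends of each piece.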

Some responses pointed me to an Amazon service called 'Elastic Compute Cloud' (EC2). I haven't yet gone beyond looking at the front page, but it is certainly worth investigating.

Several people generously offered me time on computers under their control, either personal computers that are somewhat faster than mine or clusters. The local WestGrid computers I could use are three times slower than my laptop, so I'm going to check out one of the offered clusters instead. I've been given an account on it, and it's located on the other side of the country.

But I still need to work out exactly which big slow runs I need data for, and I also need to discuss this with my main co-author (a former post-doc). Deciding what needs to be done is the real bottleneck.
