In the old days, computer programs ran on big centrally located computers that belonged to universities or big corporations, not on personal desktops or laptops. It just occurred to me that this might still be possible. I'd gladly pay a modest sum to run a few of my simulations on something that was, say, 10-100 times faster than the MacBook Pro I'm typing this on. I tried googling "buy time on fast computer" and other permutations, but couldn't find anything (lots of "When is it time to buy a new fast computer" pages).
I think that there must be places to do this. Perhaps one of my occasional readers will know. But, in case you don't, I'm going to send an email to the evoldir list, which reaches thousands of evolutionary biologists.
Not your typical science blog, but an 'open science' research blog. Watch me fumbling my way towards understanding how and why bacteria take up DNA, and getting distracted by other cool questions.
11 comments:
Talk to your computer science department. I'd be amazed if there's no high-performance computing cluster there. It may not be 100 times as fast as your laptop but there's probably some system you have access to that offers a 10-fold speedup.
Hi Rosie,
There are services such as Amazon's ec2 (Elastic Compute Cloud) that let you run programs on large numbers of virtual computers, where you pay for the units of compute that you use.
Given the tiny data sizes that you describe, it sounds like there is some intensive compute going on somewhere. If you have the resources, I would take another look at the way the algorithm is implemented. Can it be improved by changing the data structures or tricks such as memoization? If that seems fine, you may be pushing the limits of plain Perl's suitability for your problem. In that case, other avenues could include looking at specialist Perl modules such as PDL (Perl Data Language).
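As a concrete illustration of the memoization suggestion, here is a minimal Perl sketch using the core Memoize module; `score_sequence` is a hypothetical stand-in for an expensive, side-effect-free computation on a short sequence, not anything from the actual simulation.

```perl
# A minimal sketch of memoization, using Perl's core Memoize module.
# score_sequence is a hypothetical stand-in for expensive per-sequence work.
use strict;
use warnings;
use Memoize;

sub score_sequence {
    my ($seq) = @_;
    my $score = 0;
    $score += ord($_) for split //, $seq;   # stand-in for real work
    return $score;
}

memoize('score_sequence');   # repeated calls with the same input become lookups

print score_sequence("ACGT"), "\n";   # prints 287
```

This only helps if the same inputs recur often and the function depends on nothing but its arguments.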
The terms you are looking for are "on demand computing" or "utility computing". Or, potentially, "supercomputing".
For most of these suggestions, you will likely have to spend time reimplementing your software for the new system, which I am sure you would rather not do.
With Amazon's EC2 service, you get access to high-end computers one at a time, billed only for the time that you use. Each computer won't be any faster than yours, but you can get access to lots of computers for only as long as you need them. So this system would probably be best if you want to run a large number of experiments in parallel (maybe changing a variable between instances). I would bet frequent commenter Deepak Singh could point you in useful directions: http://www.linkedin.com/in/dsingh
The Sun Grid at network.com looks like it can run Perl programs, but I'm not sure whether it is faster:
http://biowiki.org/SunGridEngineExamples
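The "many independent runs in parallel" idea above can be sketched in plain Perl with fork; this only pays off when runs do not depend on each other. The file names and the rand() placeholder are hypothetical; each child would be one full simulation.

```perl
# A sketch of running independent simulations in parallel with fork.
# Each child stands in for one complete, self-contained run.
use strict;
use warnings;

my @pids;
for my $seed (1 .. 4) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {                        # child: run one simulation
        srand($seed);
        open my $fh, '>', "run_$seed.out" or die "open: $!";
        print $fh rand(), "\n";             # placeholder for real results
        close $fh;
        exit 0;
    }
    push @pids, $pid;                       # parent: remember the child
}
waitpid($_, 0) for @pids;                   # block until all runs finish
print "all runs finished\n";
```

On EC2 the same pattern scales out by launching one instance per run instead of one process per run.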
We have a bioinformatics cluster that you're welcome to test drive.
Check your email :^>
Rosie - you might be interested in checking out WestGrid [ http://www.westgrid.ca/ ], in particular its UBC node aka Glacier [ http://www.westgrid.ca/support/quickstart/glacier ], which is suitable for computationally intensive serial jobs. Although I have never used it to run Perl, it being an off-the-shelf Linux cluster, I would expect it to have Perl installed by default. Also, it's free!
Wow, 5 comments already!
ReplyDeleteI've used WestGrid (a cluster shared between several nearby universities) for other parts of this project (e.g. most of the Gibbs analyses). But their individual computers are VERY SLOW, and rewriting our program to use the grid feature would be an absurd waste of time for the small amount of work I want to do now.
The program is computationally intensive. In each cycle it does a bit of analysis on some short sequences, checks the results, changes a parameter a bit, and does the analysis on some more sequences. It keeps doing this until a requirement is satisfied; sometimes this takes hundreds of iterations for a single cycle. And I want to run at least 100,000 cycles.
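Schematically, the cycle structure described above looks something like the following sketch; every name and number in it is made up, and the toy `analyze` function stands in for the real per-sequence analysis.

```perl
# A schematic of the described algorithm: each cycle repeatedly analyzes
# "sequences", nudges a parameter, and stops when a requirement is met;
# each cycle inherits the state left by the previous one.
use strict;
use warnings;

my $param = 1.0;
for my $cycle (1 .. 100) {                # the real runs would use >= 100_000
    while (1) {
        my $result = analyze($param);     # stand-in for the sequence analysis
        last if abs($result) < 0.01;      # requirement satisfied
        $param -= 0.1 * $result;          # change the parameter a bit
    }
}
print "final param: $param\n";

sub analyze {
    my ($p) = @_;
    return $p - 0.5;                      # toy target; the real check differs
}
```

The serial dependence is visible here: cycle N+1 starts from the `$param` that cycle N left behind, which is why the cycles cannot simply be farmed out to independent machines.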
The nature of the program also means that cloud computing isn't an option. I'm simulating evolution, so each cycle takes as input the results of the previous cycle.
Rob, thanks for the offer. I'll send you the program and the settings file; maybe you can see if it runs on your system, and check how long a simple run takes.
I don't know enough about this to be much help, but SFU has a Beowulf cluster.
ReplyDeleteI shared your post on FriendFeed:
http://friendfeed.com/the-life-scientists/c7309267/rrresearch-can-we-buy-time-on-shared-computer
The Life Scientists room in particular has become quite the community hub, and a good place to ask for advice of all sorts.
I don't know how your perl script is setup or written, but there are many limiting factors here.
One: if, as you say, each of the 100,000 cycles depends on the previous one, running your script on a larger machine won't necessarily save you running time. Say your MacBook is a 2.3 GHz machine; even if you run on a cluster with 3.0 GHz Xeons you won't gain much, maybe 5-10% depending on how optimized your code is. A script that is not threaded or parallel won't take full advantage of the faster machine.
Two, as mentioned above, if it's not threaded or parallel, nothing will be gained on a cluster without some modifications. You would need to restructure the calculations and find a way to distribute them to other machines, either with an external framework such as Hadoop's MapReduce (again, I have no idea how your code is set up) or with some major changes to your script.
Three, it's Perl, an interpreted language: the code is not compiled, so any gain from a faster CPU may be nullified by an interpreter that is not optimized for the environment you're running in. Simulations and CPU-intensive applications like yours are better suited to compiled languages. You may be able to write some C++ code to do the calculations and wrap it with a Perl script that takes care of the string parsing, etc. Again, it will depend on how your code was designed.
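One low-effort version of the "do the hot loop in C, keep Perl for the glue" idea is the CPAN module Inline::C (not in core Perl), which compiles embedded C the first time the script runs. The `score` function here is a hypothetical hot spot, not anything from the actual simulation.

```perl
# A sketch of moving a hot inner loop into compiled C via Inline::C (CPAN).
# score() is a made-up example of per-character work on a sequence string.
use strict;
use warnings;
use Inline C => <<'END_C';
int score(char *seq) {
    int s = 0;
    while (*seq) s += *seq++;   /* toy work, runs as native code */
    return s;
}
END_C

print score("ACGT"), "\n";   # prints 287
```

Whether this helps depends on how much of the run time is really in a few tight loops rather than spread across the whole script.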
Four, there are many DNA sequence simulators available. One is DAWG, a wonderful and fast application; it can be compiled on Macs and gives you a lot of features. It may already do some of the things you're currently doing.
Five, I don't know if you already did, but it would help a lot if you posted your code. I understand you preach open science, but that's only half of it: either science is open, or it's not. Without your code we cannot help, because we don't know what you're doing with it. It would also help you, as you would be able to define more precisely what kind of help you want.
Paulo
To everyone who suggested modifying my code: thanks, but I'm not competent to make any serious changes to it, and not prepared to learn how at present. Maybe later, after this manuscript is done.
Now that I realize mainframe computers no longer exist, I see two alternatives. 1. I could decide that I don't need to do the very long runs I was considering. 2. I could break the genome I'll be simulating into many short genomes and run them all on the WestGrid machines.