Field of Science

Reading code

Our Perl-programming undergrad has just sent me a copy of the latest version of his program simulating the evolution of uptake sequences by molecular drive. So far I've gotten to about line 100 and found several trivial typos and one source of confusion (to me). I had thought that the order of steps in each cycle was as follows:
  1. Choose random fragments of a specified length from the genome and mutate them (as if they came from different daughter cells).
  2. Score each fragment for its match to the USS consensus.
  3. Mutate the original genome according to the same procedure used on the fragments.
  4. Replace the corresponding segment of the genome with the fragment that has the best USS score.
But the standard version of this code seems to instead do the following:
  1. Mutate the whole genome.
  2. Choose random fragments and mutate them (again).
  3. Score each fragment for its match to the USS consensus.
  4. Replace the corresponding segment of the genome with the fragment that has the best USS score.
So the fragments are getting mutated twice.
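
To make the difference concrete, here is a minimal Perl sketch of the two orderings. It is not the undergrad's code: the helper names (mutate, score, pick_and_mutate, best_fragment), the parameter hash and the consensus string are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: change each base to a different one with probability $rate.
sub mutate {
    my ($seq, $rate) = @_;
    my @bases = split //, $seq;
    for my $i (0 .. $#bases) {
        next unless rand() < $rate;
        my @others = grep { $_ ne $bases[$i] } qw(A C G T);
        $bases[$i] = $others[ int rand @others ];
    }
    return join '', @bases;
}

# Hypothetical helper: best count of matching positions over every
# alignment of the consensus within the fragment.
sub score {
    my ($frag, $uss) = @_;
    my $best = 0;
    for my $start (0 .. length($frag) - length($uss)) {
        my $matches = 0;
        for my $i (0 .. length($uss) - 1) {
            $matches++ if substr($frag, $start + $i, 1) eq substr($uss, $i, 1);
        }
        $best = $matches if $matches > $best;
    }
    return $best;
}

# Copy a random segment of the genome and mutate the copy.
sub pick_and_mutate {
    my ($genome, $p) = @_;
    my $pos = int rand(length($genome) - $p->{frag_len} + 1);
    my $seq = mutate(substr($genome, $pos, $p->{frag_len}), $p->{rate});
    return { pos => $pos, seq => $seq };
}

# Return the fragment with the highest USS score (first one wins ties).
sub best_fragment {
    my ($frags, $uss) = @_;
    my ($best, $best_score) = (undef, -1);
    for my $frag (@$frags) {
        my $s = score($frag->{seq}, $uss);
        ($best, $best_score) = ($frag, $s) if $s > $best_score;
    }
    return $best;
}

# The order I expected: fragments come from the unmutated genome,
# so each fragment carries one round of mutation.
sub cycle_expected {
    my ($genome, $p) = @_;
    my @frags = map { pick_and_mutate($genome, $p) } 1 .. $p->{n_frags};
    my $best  = best_fragment(\@frags, $p->{uss});
    $genome = mutate($genome, $p->{rate});
    substr($genome, $best->{pos}, length $best->{seq}, $best->{seq});
    return $genome;
}

# The order the code seems to use: the genome is mutated first, so each
# fragment carries the genome's mutations plus its own (two rounds).
sub cycle_observed {
    my ($genome, $p) = @_;
    $genome = mutate($genome, $p->{rate});
    my @frags = map { pick_and_mutate($genome, $p) } 1 .. $p->{n_frags};
    my $best  = best_fragment(\@frags, $p->{uss});
    substr($genome, $best->{pos}, length $best->{seq}, $best->{seq});
    return $genome;
}

# Example run with made-up parameters and an assumed consensus string.
my %params = (n_frags => 10, frag_len => 20, rate => 0.01, uss => 'AAGTGCGGT');
my $genome = join '', map { (qw(A C G T))[rand 4] } 1 .. 500;
$genome = cycle_expected($genome, \%params) for 1 .. 5;
$genome = cycle_observed($genome, \%params) for 1 .. 5;
print length($genome), " bases after 10 cycles\n";
```

The only difference between the two subroutines is where the call to mutate on the whole genome sits, which is exactly what determines whether a fragment sees one round of mutation or two.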

In actuality, this 'test' version of the code has a couple of steps commented out, and short-circuits the fragment-generation and mutation steps by simply specifying the sequence of every fragment (as a perfect USS). I think this makes it a lot easier to confirm that the code that does the scoring is working as intended. Tomorrow I'll sit down with the undergrad and go over it.
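
A check of that kind could be sketched roughly as follows. The score routine and the consensus string are assumptions for illustration, not the program's actual code. The idea is only that a fragment hard-coded as a perfect USS must get the maximum score, and a fragment with one mismatch must get one less.

```perl
use strict;
use warnings;

# Assumed consensus for illustration; the program's own consensus may differ.
my $USS = 'AAGTGCGGT';

# Hypothetical scoring routine: best count of matching positions over
# every alignment of the consensus within the fragment.
sub score {
    my ($frag, $uss) = @_;
    my $best = 0;
    for my $start (0 .. length($frag) - length($uss)) {
        my $matches = 0;
        for my $i (0 .. length($uss) - 1) {
            $matches++ if substr($frag, $start + $i, 1) eq substr($uss, $i, 1);
        }
        $best = $matches if $matches > $best;
    }
    return $best;
}

# Skip fragment generation and mutation: specify the fragments directly.
my $perfect = "CC${USS}CC";
(my $one_off = $perfect) =~ s/AAGT/AACT/;    # introduce a single mismatch

die "perfect USS did not get the maximum score\n"
    unless score($perfect, $USS) == length $USS;
die "one mismatch should cost exactly one point\n"
    unless score($one_off, $USS) == length($USS) - 1;
print "scoring behaves as intended on the hard-coded fragments\n";
```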

4 comments:

  1. Readability is vital, especially when code is shared around. Perl is a bit notorious for being hard to read, but this can be improved by good commenting, or inline documentation (to be read using "perldoc myprogram").

    Make sure your student understands the value of documentation. They should also commit the code to a repository using a version control system such as SVN; this will make them more disciplined and be useful for others who may need to read, modify or use it.

    Be prepared also for a bunch of people telling you that an advantage of Python is its readability :)

  2. I agree wholeheartedly with neil's advice. Furthermore, while your program remains a work in progress there are other things you can do to ensure it does what you intend (and nothing else!)

    As it's Perl I'm assuming that you do use strict (and thus those typos that you mentioned could not be typos in variable names!)

    Aim to write a program that is broken down into functions that each do well-defined operations e.g.

    choose_fragment($genome, $frag_length)

    mutate_fragment($frag, $mut_parameters)

    insert_fragment($frag, $genome, $position)

    and so on.

    All these functions should have a test that confirms that they do their job e.g. test_choose_fragment might test that choose_fragment returns fragments of the expected size, conforming to an expected distribution of genomic positions.
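
    A minimal sketch of such a test, using the core Test::More module. The body of choose_fragment is an assumption (only its name and arguments are given above), as are the genome length, the fragment length, the number of draws and the tolerance of the distribution check.

    ```perl
    use strict;
    use warnings;
    use Test::More;

    # Assumed implementation: returns a start position and the fragment found there.
    sub choose_fragment {
        my ($genome, $frag_length) = @_;
        die "fragment longer than genome\n" if $frag_length > length($genome);
        my $pos = int rand(length($genome) - $frag_length + 1);
        return ($pos, substr($genome, $pos, $frag_length));
    }

    sub test_choose_fragment {
        my $genome      = join '', map { (qw(A C G T))[rand 4] } 1 .. 200;
        my $frag_length = 20;
        my $max_pos     = length($genome) - $frag_length;
        my ($bad_length, $bad_pos, $left, $right) = (0, 0, 0, 0);

        for (1 .. 2000) {
            my ($pos, $frag) = choose_fragment($genome, $frag_length);
            $bad_length++ if length($frag) != $frag_length;
            $bad_pos++    if $pos < 0 || $pos > $max_pos
                          || $frag ne substr($genome, $pos, $frag_length);
            if ($pos < $max_pos / 2) { $left++ } else { $right++ }
        }

        is($bad_length, 0, 'every fragment has the requested length');
        is($bad_pos,    0, 'every fragment matches the genome at its position');
        # Crude distribution check: uniform sampling uses both halves about equally.
        cmp_ok(abs($left - $right), '<', 400, 'positions cover both halves');
    }

    test_choose_fragment();
    done_testing();
    ```

    The distribution check is deliberately loose so that it does not fail by chance; a stricter version could fix the random seed with srand or apply a proper statistical test.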

    Every time a significant bug is fixed a new test should be added that specifically confirms this by testing the condition that exposed the original error.

    This sounds like extra work, but it will save you time in the long run. You will have confidence in the correctness of your program, freeing you to work on the biological question. With testing, your program can be modified more easily - just run all the tests after any changes.

    Initially, don't worry about performance, rather focus on clarity and correctness.

  3. Have you guys been hanging around all term, waiting for me to drag myself away from my teaching long enough to post something? Or do you have a feed thing, so you know when I post?

    I like the idea of building in tests. The undergraduate has been doing tests, but not in an organized way and not building them into the code or writing the results up in a notebook.

    He's not bad about including comments, but they're not as clear and unambiguous as they need to be for me. He's been having to work without proper oversight, but that's about to change.

    I find it much easier to be disciplined about bench-work than about computer work. He hasn't learned this kind of discipline yet.

    The program is broken down into components, but not as clearly as it could be. But he is using strict; the typos are only in comments.

  4. @rosie yeah, we have feed things :)

    You're in the "open science" category of my Google Reader. The web is always watching and waiting.

