RRResearch: Outline of the Perl program

(in response to good advice in the comments)

Below is just a list of the main sections of the program, in its present 'test' incarnation.

MAIN PROGRAM:

1. Get parameter settings from a file (except it doesn't; the settings are hard-coded in this version).

2. Create a random-sequence 'genome' of the specified length and base composition.

3. Simulate a set number of cycles (presently 100), each consisting of genome mutation, fragment creation, fragment mutagenesis and scoring, recombination, and scoring of the genome.

3A. Mutate the genome by randomly changing bases with a specified probability. (This step should be later in the cycle, not here.)

3B. Select a specified number of segments of the same length from random positions in the genome. These represent fragments in the external gene pool.

3C. Record each fragment's sequence and 5' end position.

3D. Mutate each fragment's sequence.

3E. Score each fragment's sequence for how well it matches the uptake sequence (USS) motif, using a sliding window. I think the sliding-window scores are not being accumulated correctly.

3F. Choose the fragment with the highest score. Put its sequence at the corresponding genome position, replacing the genome's original sequence at that position. (This is a simulated form of recombination by gene conversion.) Any mutations introduced into this fragment in step 3D thus become changes in the genome sequence.

3G. Score the genome for how well its sequence matches the USS motif.

4. At the end of all the specified cycles, stop and report.
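For readers who want something concrete, here is a minimal Python sketch (not the author's Perl code) of steps 2 to 4. Everything specific in it is an assumption made for illustration: the parameter values, the 9-base motif `AAGTGCGGT` standing in for the USS, scoring each window as the number of positions that match the motif, and a simple mutation function that, unlike the real program, does not try to preserve base composition. It also shows one way to make sure the sliding-window scores in 3E and 3G accumulate correctly: the running total is set to zero once per sequence, and every window's score is added to it.

```python
import random

BASES = "ACGT"

# Hypothetical parameters, chosen only for illustration.
GENOME_LEN = 100
FRAG_LEN = 20
N_FRAGS = 10
CYCLES = 100
MUT_RATE = 0.01
COMPOSITION = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
MOTIF = "AAGTGCGGT"  # assumed USS consensus; the real program's motif may differ


def random_genome(length, rng):
    """Step 2: pick each base independently, weighted by the base composition."""
    return "".join(rng.choices(list(COMPOSITION), weights=list(COMPOSITION.values()), k=length))


def mutate(seq, rate, rng):
    """Give each position an independent chance `rate` of changing to another base.
    Simplification: the new base is uniform over the other three, so this does
    NOT preserve base composition the way the original program intends."""
    out = list(seq)
    for i, base in enumerate(out):
        if rng.random() < rate:
            out[i] = rng.choice([b for b in BASES if b != base])
    return "".join(out)


def window_score(window, motif=MOTIF):
    """Number of positions where the window matches the motif."""
    return sum(1 for a, b in zip(window, motif) if a == b)


def sequence_score(seq, motif=MOTIF):
    """Slide a motif-length window along seq and add up the window scores.
    The total is reset once per sequence, never inside the window loop."""
    total = 0
    for start in range(len(seq) - len(motif) + 1):
        total += window_score(seq[start:start + len(motif)], motif)
    return total


def run(seed=1):
    rng = random.Random(seed)
    genome = random_genome(GENOME_LEN, rng)                        # step 2
    history = []
    for _ in range(CYCLES):                                         # step 3
        genome = mutate(genome, MUT_RATE, rng)                      # 3A
        fragments = []
        for _ in range(N_FRAGS):                                    # 3B
            pos = rng.randrange(GENOME_LEN - FRAG_LEN + 1)
            fragments.append((pos, genome[pos:pos + FRAG_LEN]))     # 3C: 5' position and sequence
        fragments = [(pos, mutate(seq, MUT_RATE, rng)) for pos, seq in fragments]  # 3D
        best_pos, best_seq = max(fragments, key=lambda f: sequence_score(f[1]))    # 3E, 3F
        genome = genome[:best_pos] + best_seq + genome[best_pos + FRAG_LEN:]       # gene conversion
        history.append(sequence_score(genome))                      # 3G
    return genome, history                                          # step 4


if __name__ == "__main__":
    final_genome, scores = run()
    print(final_genome)
    print("first genome score:", scores[0], "last:", scores[-1])
```

Summing raw match counts over every window rewards background similarity as well as near-perfect motif hits; weighting motif positions, or counting only windows above a threshold, are alternatives worth considering depending on what the score is meant to model.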

-----------------------------------------------------

SUBROUTINES:

I. Creating the original random genome sequence: This is pretty simple; it just picks bases randomly, with probabilities specified by the base composition.

II. Mutating the genome or fragment sequence: This is more complex, partly because the mutations need to maintain the base composition (see subroutine III), but mainly because it works in a relatively non-obvious but more efficient way. It first decides how many mutations to make, by multiplying the genome (or fragment) length by the mutation rate and taking the integer value. (Oops, this will only work if the genome or fragment is big enough to get at least one mutation per cycle. The 'test' version has only a 100 nt genome and a specified mutation rate of 0.001, so 100 × 0.001 = 0.1 is truncated to 0 and the real mutation rate is zero.) The subroutine then randomly chooses positions for this number of mutations and makes the mutations at these positions.

III. Doing the calculations for the mutagenesis probabilities: This creates arrays holding, for each base (A, G, C or T), the probability of mutating to each of the other bases.
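The three subroutines can likewise be sketched in Python (again, not the author's Perl). Two parts go beyond the outline and are assumptions: `mutation_count` replaces plain integer truncation with random rounding, as one possible fix for the zero-mutation problem in subroutine II, and `mutation_table` assumes that a mutated position gets a new base drawn from the base composition itself (the idea behind the F81 substitution model), which is one way to keep the composition stable from cycle to cycle. All function names are hypothetical.

```python
import math
import random

BASES = "ACGT"


def random_genome(length, composition, rng):
    """Subroutine I: pick each base independently, weighted by the base composition."""
    bases = list(composition)
    return "".join(rng.choices(bases, weights=[composition[b] for b in bases], k=length))


def mutation_count(length, rate, rng):
    """Subroutine II, step 1: decide how many mutations to make.
    int(length * rate) would turn 100 * 0.001 = 0.1 into 0 every cycle.
    Rounding up with probability equal to the fractional part keeps the
    expected number of mutations equal to length * rate (assumed fix)."""
    expected = length * rate
    whole = math.floor(expected)
    return whole + (1 if rng.random() < expected - whole else 0)


def mutation_table(composition):
    """Subroutine III: table[old][new] = probability that a mutated `old` base becomes `new`.
    Assumption: the new base is drawn from the base composition (it can equal
    the old base), so the composition is unchanged on average."""
    return {old: dict(composition) for old in composition}


def mutate(seq, rate, composition, rng):
    """Subroutine II, step 2: choose distinct positions, then redraw the base at each."""
    table = mutation_table(composition)
    out = list(seq)
    n = min(mutation_count(len(out), rate, rng), len(out))
    for pos in rng.sample(range(len(out)), n):
        row = table[out[pos]]
        out[pos] = rng.choices(list(row), weights=list(row.values()), k=1)[0]
    return "".join(out)
```

With this rounding, a 100 nt genome at a rate of 0.001 gets one mutation in roughly one cycle out of ten instead of none. Because a redrawn base can be the same as the old one, only a fraction (1 minus the sum of the squared base frequencies) of the chosen positions actually change, so the rate could be divided by that fraction if the specified rate is meant to count real changes.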

I've got to stop this and work on my course's final exam for a while.

2 comments:

Unknown, April 17, 2008 at 7:41 PM
Why not post the code? It would be easier to provide some help then.
Paulo
Anonymous, April 18, 2008 at 6:40 PM
I'd second that.

The great thing about Perl is that there's more than one way of doing things. Sharing tips, tricks and problems is useful: we could learn as much from you as you do from us (which is why I did this).
