I've spend the last 48 hours watching simulations of uptake sequence evolution run (very slowly) and fiddling with code and queue-submission files and settings, but now it all seems to be falling into place.
First the former post-doc said that she thought the int was indeed an error, and that it probably arose when the coding-expert undergrad copied the code from another probabilistic step to the recombination step. She also made the excellent suggestion that we use the additive matrix in the US-variation paper.
Now that the erroneous int has been eliminated from the simulation code, runs with additive scoring do indeed work. So do runs with the less-extreme tenfold bias, and runs using the Gibbs position weight matrices derived from the H. influenzae and N. meningitidis genomes. None of this give as strong accumulation as the 100-fold bias matrix I have been using, but that's OK.
Testing the additive scoring led to several other improvements. I increased the resolution of the recombination decision by introducing a lower-cutoff for the random number range, as I mentioned in the previous post. I also found that when using the additive scoring we no longer needed to have the bias gradually decrease during each cycle, but could just leave it fixed. And I rediscovered a problem with scoring the genome additively. [Aside: scoring is done in two situations - first to decide how well the different positions in an individual fragment correspond to the matrix, so a decision about recombination can be made, and second to track the accumulation of preferred sequences in the whole genome, to follow the progress of the simulation.] When individual fragments are scored, only the highest position-score is used, and the scores of all other positions are discarded. But when the genome is scored we add up the scores of all the positions in the genome. With multiplicative scoring the low scores are many orders of magnitude lower than the good scores, so including them in the sum has little effect. But with additive scoring even really lousy matches to the matrix have significant scores, and the sum of these swamps out the effect of the rarer good scores. The solution to this problem is to only use additive scoring for the individual fragments, and to always score the whole genomes by using the matrix multiplicatively.
I also spent some time relearning how to run the simulation on the computer cluster in Nova Scotia that a colleague kindly gave me access to, and modifying the program to run properly in this different situation (just trivial fiddly stuff). I kept queueing runs on this system, thinking they were running OK, and then looking at the error file )"just in case" and finding that they had aborted at the start because forgot to specify a file, or specified it wrong, or specified it in the wrong order. But now I have about 15 runs going overnight!
My new plan for this part of the US-variation manuscript (it's most of the Results) is to start with the additively scored matrix, both because that's how position-weight matrices are commonly used in other situations and because this means we don't have to explain about reducing the bias during the cycles. I'll start with the 100-fold matrix, and show how we detect equilibrium, and that genome length doesn't affect the outcome (score per kb). Then I'll explain about mutation rates (genomic and fragment-specific). The preliminary runs I did today indicated that the mutation rate of the fragments doesn't affect the equilibrium score, which I continue to find surprising. I had suspected that this was somehow due to using the bias-reduction feature with the multiplicative scoring, but now I see it with additive scoring and fixed bias. But I have a more detailed set of runs going overnight which should make the situation clearer. Then I'll consider how the amount of recombination (amount of genome recombined in each cycle) affects the outcome, and the effect of fragment length on number and spacing of uptake sequences
Then I'll consider the effects of different matrices, first the ten-fold matrix I tried today, and then the multiplicatively scored 100-fold and ten-fold matrices, and then the Gibbs matrices. This last step links the model back to the Gibbs analyses that the Results starts with.
- Home
- Angry by Choice
- Catalogue of Organisms
- Chinleana
- Doc Madhattan
- Games with Words
- Genomics, Medicine, and Pseudoscience
- History of Geology
- Moss Plants and More
- Pleiotropy
- Plektix
- RRResearch
- Skeptic Wonder
- The Culture of Chemistry
- The Curious Wavefunction
- The Phytophactor
- The View from a Microbiologist
- Variety of Life
Field of Science
-
-
-
Political pollsters are pretending they know what's happening. They don't.5 weeks ago in Genomics, Medicine, and Pseudoscience
-
-
Course Corrections6 months ago in Angry by Choice
-
-
The Site is Dead, Long Live the Site2 years ago in Catalogue of Organisms
-
The Site is Dead, Long Live the Site2 years ago in Variety of Life
-
Does mathematics carry human biases?4 years ago in PLEKTIX
-
-
-
-
A New Placodont from the Late Triassic of China5 years ago in Chinleana
-
Posted: July 22, 2018 at 03:03PM6 years ago in Field Notes
-
Bryophyte Herbarium Survey7 years ago in Moss Plants and More
-
Harnessing innate immunity to cure HIV8 years ago in Rule of 6ix
-
WE MOVED!8 years ago in Games with Words
-
-
-
-
post doc job opportunity on ribosome biochemistry!9 years ago in Protein Evolution and Other Musings
-
Growing the kidney: re-blogged from Science Bitez9 years ago in The View from a Microbiologist
-
Blogging Microbes- Communicating Microbiology to Netizens10 years ago in Memoirs of a Defective Brain
-
-
-
The Lure of the Obscure? Guest Post by Frank Stahl12 years ago in Sex, Genes & Evolution
-
-
Lab Rat Moving House13 years ago in Life of a Lab Rat
-
Goodbye FoS, thanks for all the laughs13 years ago in Disease Prone
-
-
Slideshow of NASA's Stardust-NExT Mission Comet Tempel 1 Flyby13 years ago in The Large Picture Blog
-
in The Biology Files
Not your typical science blog, but an 'open science' research blog. Watch me fumbling my way towards understanding how and why bacteria take up DNA, and getting distracted by other cool questions.
No comments:
Post a Comment
Markup Key:
- <b>bold</b> = bold
- <i>italic</i> = italic
- <a href="http://www.fieldofscience.com/">FoS</a> = FoS