Our computer simulation model of USS evolution is coming together, but we've hit a snag. I'm hoping that trying to explain it here will give me a clearer understanding of how to get around it.
A critical step in the model is scoring each simulated DNA fragment for USSlike sequences; the score then determines the probability that this fragment will replace its homologous sequence in the evolving genome. Our basic plan is to use a 'sliding window' the width of the USS motif we're using. The window would begin at position 1 of the fragment sequence, score the sequence in the window, and then move over one position, score again, and continue until it reached the end of the fragment. The final score would be the sum of the scores of the sequences at each window position. I expected this to be time consuming ('computationally intensive') but straightforward, but I was wrong.
The simplest scoring scheme would be to just check, at each window position, whether the sequence in the window exactly matches the USS consensus. (For simplicity here I'll consider only the standard 9bp USS core AAGTGCGGT, but I want the model to consider the full 29bp USS motif.) If the sequence in the window exactly matches the core we add 1 to the running score, otherwise we don't. The final score would then tell us the number of perfectmatch USSs in the fragment. How the model would use this number to decide the probability of uptake is still to be decided, but I'll defer this problem to another post.
A slightly more subtle scheme would give partial scores for window positions containing sequences that nearly match the perfect USS core consensus. For example, any sequence that was mismatched at only 1 of the 9 positions could score 0.3, and any that were mismatched at 2 positions would score 0.1. Worse mismatches would score 0.
Both of the above simple schemes would work, but they do so because they ignore differences in the importances of different components of the USS, and in the tolerance for specific alternative vases at different positions. I had been planning to use a scoring matrix giving a value to each base at each position. Here's a very simple version of such a matrix, with the preferred base at each position scoring 1 and the other bases scoring 0. With this matrix a prefect USS core in the window would score 9, a singly mismatched USS would score 8, a doubly mismatched USS would score 7, etc.
Even a sequence that matched the USS consensus at only 1 position would get a nonzero score  it would be 1/9 of the score of a perfect USS. This creates two kinds of problems. First, we want only reasonably USSlike sequences to get significant scores, but under this scheme a random 9bp sequence will, on average, get a score of 2.5. Second, because the window evaluates 1000 positions in a 1kb sequence, the average score of a random sequence will be about 2500, and the average score of a fragment containing a perfect USS will be about 2506.5. The scoring scheme thus is far too weak in its ability to discriminate between fragments with and without good matches to the USS. A bit of thinking shows that it doesn't matter how big or small we make the individual numbers in the matrix.
But maybe I see the solution. What if only the best base at each position has a positive value in the scoring matrix, and the other bases have negative values? We'd want to adjust the negative values so that the average random sequence would get a score of zero. I still see lots of potential problems here, but maybe this is the way to go.
Added a bit later: There's a much simpler solution. Use any matrix we like, and just calculate the average score expected for random sequences, and subtract this from the actual score for each window position. The calculation is simple  just add up all the fractional scores for the different bases, remembering to correct for base composition. And rather than doing this correction at each window position, do it after the window has completed scanning all the positions in the fragment (subtract the product of the expected average score for each window position times the number of window positions scored).
For random sequences this should give fragment scores centered on zero. I don't know how broad the distribution would be, so I don't know how strongly a single USS would shift the distribution. We would like it to move the score out beyond almost all of the random fragments.
One other problem is what to do with negative scores. For random sequences these should be as common as positive scores, but if we simply treat them as zero scores then the effective average score will be half of the mean of the positive scores. Maybe we should subtract a number bigger than the expected average score, and treat all negative scores as zero.
 Home
 Angry by Choice
 Catalogue of Organisms
 Chinleana
 Doc Madhattan
 Games with Words
 Genomics, Medicine, and Pseudoscience
 History of Geology
 Moss Plants and More
 Pleiotropy
 Plektix
 RRResearch
 Skeptic Wonder
 The Culture of Chemistry
 The Curious Wavefunction
 The Phytophactor
 The View from a Microbiologist
 Variety of Life
Field of Science





Trump to appoint nonscientist as chief scientist of USDA1 week ago in Genomics, Medicine, and Pseudoscience








Harnessing innate immunity to cure HIV9 months ago in Rule of 6ix

WE MOVED!10 months ago in Games with Words







post doc job opportunity on ribosome biochemistry!2 years ago in Protein Evolution and Other Musings

Growing the kidney: reblogged from Science Bitez2 years ago in The View from a Microbiologist

Blogging Microbes Communicating Microbiology to Netizens2 years ago in Memoirs of a Defective Brain



The Lure of the Obscure? Guest Post by Frank Stahl4 years ago in Sex, Genes & Evolution


Lab Rat Moving House5 years ago in Life of a Lab Rat

Goodbye FoS, thanks for all the laughs5 years ago in Disease Prone


Slideshow of NASA's StardustNExT Mission Comet Tempel 1 Flyby6 years ago in The Large Picture Blog

in The Biology Files
Not your typical science blog, but an 'open science' research blog. Watch me fumbling my way towards understanding how and why bacteria take up DNA, and getting distracted by other cool questions.
No comments:
Post a Comment
Markup Key:
 <b>bold</b> = bold
 <i>italic</i> = italic
 <a href="http://www.fieldofscience.com/">FoS</a> = FoS