1. While the post-doc and I were fussing with the Perl simulation, figuring out the best settings for scoring sequences with the Gibbs position-weight matrix, we also tested the effect of calculating the score for each window of the sequence by adding the scores of the individual positions in the window rather than multiplying them. In fact the original versions of the simulation did add the position scores, but we changed that because adding didn't seem to give enough resolution (not enough difference in score between what our experiments had told us were good and bad uptake sequences).
Below I think I'll be able to explain how to get better resolution, but when the post-doc and I were discussing the additive and multiplicative versions we realized that of course neither has any chance of being realistic. That's because the proteins that interact with the sequence don't use arithmetic to decide whether to act on a particular sequence. Instead each base at each position will affect the probability of uptake in its own way. Thus the additive and multiplicative ways of applying the matrix should probably be seen as the two extremes of how the real interactions might work.
So when we present this analysis in our proposal to NIH, I think we should include the additive case as well as the multiplicative case, with the no-selection sequences as the control. The additive case will show only modest selection for fragments containing good matches to the matrix, and the multiplicative version will show very strong selection. And if we're lucky enough to have some real uptake data by then, we can compare that to the two kinds of simulated data; it should fall somewhere in between. (Ideally I'd put a graphic here showing what I expect these to look like, but I'll leave that for the post-doc to do once he's analyzed the simulated data.)
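To make the difference between the two scoring schemes in point 1 concrete, here is a minimal sketch in Python (not the Perl of our actual simulation). The three-position matrix and its values are made up for illustration and are not the Gibbs matrix, and the function names are hypothetical.

```python
from math import prod

# Hypothetical 3-position weight matrix; the values are invented for this
# sketch and are NOT taken from the real Gibbs matrix.
MATRIX = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]


def position_scores(window, matrix):
    """Look up the matrix value for the base at each position of the window."""
    return [column[base] for base, column in zip(window, matrix)]


def additive_score(window, matrix):
    """Score a window by adding its per-position scores."""
    return sum(position_scores(window, matrix))


def multiplicative_score(window, matrix):
    """Score a window by multiplying its per-position scores."""
    return prod(position_scores(window, matrix))


def window_scores(sequence, matrix, score):
    """Slide a matrix-wide window along the sequence and score each one."""
    width = len(matrix)
    return [
        score(sequence[i:i + width], matrix)
        for i in range(len(sequence) - width + 1)
    ]


best, worst = "AGC", "CAA"  # best and worst possible matches to MATRIX
print(additive_score(best, MATRIX) / additive_score(worst, MATRIX))
print(multiplicative_score(best, MATRIX) / multiplicative_score(worst, MATRIX))
```

With this toy matrix the best window outscores the worst by about 7-fold when the positions are added and by about 343-fold when they are multiplied, which is the resolution problem in miniature.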
2. Here's a way to improve the resolution of the additive selection: In the previous post I said that the simple selection method started by choosing a random number from the range between zero and the maximum possible score. But the range doesn't need to start at zero, and for the additive matrix it certainly shouldn't. Instead maybe it should start at the minimum possible score, or at the average score, or at some higher-than-average score.
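Here is a rough Python sketch of how moving the bottom of the range changes the selection. It assumes that a fragment is taken up when its score exceeds the random number, which is my shorthand for the simple method and may not match every detail of the real simulation. The scores and range limits are hypothetical additive values, not real data, and the function names are made up.

```python
import random


def accepts(score, lower, upper, rng=random):
    """Draw a random threshold in [lower, upper]; accept if the score beats it.

    Assumption for this sketch: 'selection' means score > random threshold.
    """
    threshold = rng.uniform(lower, upper)
    return score > threshold


def acceptance_probability(score, lower, upper):
    """Chance that a uniform threshold in [lower, upper] falls below the score."""
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    return min(1.0, max(0.0, (score - lower) / (upper - lower)))


# Hypothetical additive scores: a fairly good and a fairly poor window,
# with a minimum possible score of 0.3 and a maximum of 2.1.
good, poor = 1.5, 0.9
minimum, maximum = 0.3, 2.1

# Try starting the range at zero, at the minimum, and at a higher cutoff.
for lower in (0.0, minimum, 0.8):
    p_good = acceptance_probability(good, lower, maximum)
    p_poor = acceptance_probability(poor, lower, maximum)
    print(f"lower={lower:.1f}  good={p_good:.3f}  poor={p_poor:.3f}  "
          f"ratio={p_good / p_poor:.2f}")
```

Raising the bottom of the range leaves the ordering of the scores alone but widens the gap in acceptance probability between good and poor windows, which is the extra resolution the additive version was missing.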