Field of Science

Estimating equilibrium with noisy data

I'm trying to find a better way to identify an 'equilibrium' genome score for our simulations of uptake sequence evolution. Ideally I would just let the runs go until the score was stable, but two factors complicate this. First, the scores are very noisy due to genetic drift, so there is no stable score. Second, although in principle the noise can be smoothed out by taking the mean score over a sufficiently long interval, in practice the runs take a long time.

I have been working on complicated averaging strategies to deal with this, but for many runs I thought I could just stop trying to be economical with computer time. So I queued up a few very long runs, but this didn't really solve the problem, because the mean scores just drift up and down.

Finally I tried just plotting the 'grand mean' scores of the up-from-random-sequence-genome and down-from-seeded-sequence-genome runs ('up' and 'down' runs) on the same graph, and taking the average of the final values. The 'grand mean' score is the geometric mean of the genome scores at every cycle from the start of the run. For up runs it's an underestimate of the recent mean, and for down runs it's an overestimate, but in both cases it's much smoother than the recent mean. I'll paste a graph in later, but these give what look like pretty good stable estimates of equilibrium values. So I think I'll just state in the methods that runs were continued until the up and down grand mean scores differed by no more than two-fold, and then the final scores were averaged to give an equilibrium score.
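To make the stopping rule concrete, here is a minimal Python sketch of the grand-mean calculation and the two-fold convergence check described above. It is an illustration under stated assumptions, not the simulation's actual code: the function names (`grand_means`, `equilibrium_estimate`, `toy_run`), the toy score model, and all parameter values are hypothetical, and the scores are assumed to be strictly positive so that a geometric mean is defined.

```python
import math
import random


def grand_means(scores):
    """Running geometric mean of all scores from the first cycle onward.

    Element i is the geometric mean of scores[0..i]. Scores must be > 0.
    """
    log_sum = 0.0
    means = []
    for n, score in enumerate(scores, start=1):
        log_sum += math.log(score)
        means.append(math.exp(log_sum / n))
    return means


def equilibrium_estimate(up_scores, down_scores, max_fold=2.0):
    """Average the final grand means of an up run and a down run.

    Returns None if the two final grand means still differ by more than
    max_fold, i.e. the runs should be continued.
    """
    up_final = grand_means(up_scores)[-1]
    down_final = grand_means(down_scores)[-1]
    fold = max(up_final, down_final) / min(up_final, down_final)
    if fold > max_fold:
        return None
    return (up_final + down_final) / 2.0


def toy_run(start, equilibrium, cycles, rng, rate=0.01, noise=0.2):
    """Hypothetical stand-in for a simulation run: the underlying score
    relaxes toward `equilibrium`, and each reported score carries
    multiplicative noise as a crude imitation of drift."""
    level = start
    scores = []
    for _ in range(cycles):
        level += rate * (equilibrium - level)
        scores.append(level * math.exp(rng.gauss(0.0, noise)))
    return scores


rng = random.Random(0)
up = toy_run(1.0, 100.0, 5000, rng)       # starts below equilibrium
down = toy_run(1000.0, 100.0, 5000, rng)  # starts above equilibrium
print(equilibrium_estimate(up, down))
```

Because each grand mean includes the early, far-from-equilibrium cycles, the up value approaches the equilibrium from below and the down value from above, so averaging the two final values partly cancels those opposite biases.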
