RRResearch: How random is it?

My first Gibbs sampler run was 'out of time' at 12 hr, but the next one finished in 17 hr. I used its output to evaluate whether reducing the stringency of the plateau period would reduce the quality of the analysis. The result: future runs will be about 40% faster because they'll use a plateau of 100 cycles rather than one of 200 cycles. I already have some whole-genome results using 100 seeds and 100 cycles, so I've queued up enough additional runs to give me four with the forward-direction genome sequence and four with the reverse-complement sequence. These should provide enough data for all the whole-genome analyses.

I now realize there's another analysis I should try to do - testing the randomness of the positions of USSs around the genome. This is an interesting feature because USS spacing should reflect the forces that maintain all these USSs in the genome.

USS spacing was first addressed in the first genome USS analysis (Smith et al. 1995), but the authors said only that it was 'essentially random'. The human eye is notoriously bad at detecting randomness, so Karlin et al. took a much more rigorous approach (Karlin is a famous Stanford mathematician), calculating something called the r-scan statistic, which Karlin developed and which looks too hard for me to follow. Karlin et al. concluded that USS spacings were more even than expected for a randomly located sequence element. This non-randomness led the authors to suggest that USSs might

"contribute to global genomic activities such as replication and repair (the DNA repair hypothesis), sites of membrane attachments in association with domain loops, sites of nucleating Okazaki fragments or helix unwinding and/or sites contribution to genome packaging." (Yes, the syntax seems a bit off to me too.)

I think these suggestions are wrong, for reasons I'll go into another time, but the lack of randomness may still be telling us something important about the forces that maintain USS.

Karlin et al.'s analysis used only the positions of perfect USS cores (AAGTGCGGT and reverse complement). I think I should now repeat it on my new unbiased USS data. Well, really what I mean is that I think I should either find a tame mathematician/statistician who can show me how to do it, or find a similar analysis that's easier for me to understand. (Hmmm, I think my neighbour at a lunch on Thursday was a bioinformatics statistician - maybe she can give me some advice.)
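As a rough illustration of what an easier-to-follow analysis might look like, here is a minimal sketch of a simulation test for spacing evenness. To be clear, this is not Karlin's r-scan statistic; it swaps in a simpler idea: compute the coefficient of variation (CV) of the gaps between neighbouring sites on a circular genome, then compare it with the CVs obtained when the same number of sites is dropped at random. Randomly placed sites give a gap CV near 1, so spacing that is more even than random should give a lower value. The function names, the choice of CV as the summary statistic, and the numbers in the usage example are all assumptions made for illustration, not anything taken from the papers above.

```python
import random
import statistics


def circular_gaps(positions, genome_length):
    """Return the gaps between neighbouring sites on a circular genome."""
    pts = sorted(positions)
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    # Close the circle: gap from the last site around to the first one.
    gaps.append(genome_length - pts[-1] + pts[0])
    return gaps


def gap_cv(positions, genome_length):
    """Coefficient of variation of the gaps (0 means perfectly even spacing)."""
    gaps = circular_gaps(positions, genome_length)
    return statistics.pstdev(gaps) / statistics.mean(gaps)


def spacing_test(positions, genome_length, n_sim=1000, seed=0):
    """Estimate how often random placement is at least as even as observed.

    Returns (observed_cv, p_value). A small p_value suggests the sites are
    more evenly spaced than expected for randomly located sites.
    """
    rng = random.Random(seed)
    observed = gap_cv(positions, genome_length)
    n_sites = len(positions)
    as_even = 0
    for _ in range(n_sim):
        # Place the same number of sites at distinct random positions.
        simulated = rng.sample(range(genome_length), n_sites)
        if gap_cv(simulated, genome_length) <= observed:
            as_even += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (as_even + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Hypothetical data: 100 perfectly evenly spaced sites on a 100 kb circle.
    demo_positions = list(range(0, 100_000, 1_000))
    cv, p = spacing_test(demo_positions, 100_000, n_sim=200)
    print(f"observed gap CV = {cv:.3f}, p (more even than random) = {p:.4f}")
```

With real data, `positions` would be the genome coordinates of the USSs and `genome_length` the length of the genome sequence. A test like this only asks whether the spacing as a whole is more even than random; it does not look for local clusters or unusually long USS-free stretches, which is part of what a more careful statistic would address.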
