I now realize there's another analysis I should try: testing whether USSs are randomly positioned around the genome. This is an interesting feature because USS spacing should reflect the forces that maintain all these USSs in the genome.
USS spacing was first addressed in the first genome-wide USS analysis (Smith et al. 1995), but they only said it was 'essentially random'. The human eye is notoriously bad at detecting randomness, so Karlin et al. took a much more rigorous approach (Karlin is a famous Stanford mathematician). They calculated the r-scan statistic, which Karlin developed and which looks too hard for me to follow. Karlin et al. concluded that USS spacings were more even than expected for a randomly located sequence element. This non-randomness led the authors to suggest that USSs might "contribute to global genomic activities such as replication and repair (the DNA repair hypothesis), sites of membrane attachments in association with domain loops, sites of nucleating Okazaki fragments or helix unwinding and/or sites contribution to genome packaging." (Yes, the syntax seems a bit off to me too.) I think these suggestions are wrong, for reasons I'll go into another time, but the lack of randomness may still be telling us something important about the forces that maintain USSs.
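To make "more even than expected for a randomly located element" concrete, here is a much simpler test than the r-scan. To be clear, this is NOT Karlin's method. It is a swapped-in Monte Carlo sketch, assuming a circular chromosome, site positions given as 0-based coordinates, and a null model in which the same number of sites is dropped uniformly at random. Evenness is measured as the coefficient of variation (CV) of the gaps between neighbouring sites: lower CV means more even spacing. The test reports how often random placements come out at least as even as the real ones. All function names here are hypothetical.

```python
import random
import statistics


def circular_gaps(positions, genome_length):
    """Distances between consecutive sites on a circular chromosome.

    The last gap wraps around from the final site back to the first.
    """
    pos = sorted(positions)
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    gaps.append(genome_length - pos[-1] + pos[0])
    return gaps


def gap_cv(positions, genome_length):
    """Coefficient of variation of the gaps: lower means more evenly spaced."""
    gaps = circular_gaps(positions, genome_length)
    return statistics.pstdev(gaps) / statistics.mean(gaps)


def evenness_test(positions, genome_length, n_sims=1000, seed=1):
    """Monte Carlo test of whether sites are more evenly spaced than random.

    Returns (observed CV, fraction of random placements whose CV is as low
    or lower). A small fraction hints at unusually even spacing.
    """
    rng = random.Random(seed)
    observed = gap_cv(positions, genome_length)
    n = len(positions)
    as_even = 0
    for _ in range(n_sims):
        # Assumed null model: n distinct sites placed uniformly at random.
        random_sites = rng.sample(range(genome_length), n)
        if gap_cv(random_sites, genome_length) <= observed:
            as_even += 1
    return observed, as_even / n_sims
```

For example, `evenness_test(list(range(0, 100000, 1000)), 100000)` treats 100 perfectly spaced sites on a 100 kb circle; their CV is 0, so essentially no random placement is as even. This crude null model ignores things like sites being unable to overlap and local differences in base composition, so a small fraction would be a hint, not a verdict.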
Karlin et al.'s analysis used only the positions of perfect USS cores (AAGTGCGGT and its reverse complement). I think I should now repeat it on my new unbiased USS data. Well, really what I mean is that I should either find a tame mathematician/statistician who can show me how to do it, or find a similar analysis that's easier for me to understand. (Hmmm, I think my neighbour at lunch on Thursday was a bioinformatics statistician; maybe she can give me some advice.)
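Collecting the input positions Karlin et al. used (perfect cores on either strand) is the easy part. Here is a minimal sketch, assuming the genome is a single uppercase A/C/G/T string for a circular chromosome. It returns 0-based start positions on the given strand, tagged '+' for AAGTGCGGT and '-' for its reverse complement (ACCGCACTT). The function names are hypothetical, and this only finds exact matches, so it would not capture my unbiased USS set, which allows mismatches.

```python
CORE = "AAGTGCGGT"


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))


def find_core_sites(genome, core=CORE, circular=True):
    """Return sorted (position, strand) tuples for exact core matches.

    Positions are 0-based starts on the given strand; '+' marks the core
    itself and '-' its reverse complement. Overlapping matches are found
    because each search restarts one base after the previous hit.
    """
    genome = genome.upper()
    # For a circular chromosome, append the first k-1 bases so that matches
    # spanning the point where the sequence coordinates wrap are also found.
    search = genome + genome[: len(core) - 1] if circular else genome
    hits = []
    for motif, strand in ((core, "+"), (reverse_complement(core), "-")):
        start = search.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = search.find(motif, start + 1)
    return sorted(hits)
```

The first element of each tuple is the kind of position list a spacing analysis would take as input.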