I'm going to sort out what data this manuscript needs before I do another thing!
Genome analyses needed: I need to reanalyze the Neisseria meningitidis genome with the Gibbs motif sampler, but not until I've decided whether or not to first remove the copies of the RS3 repeat. I've emailed the person who discovered them, asking him whether he thinks they are insertions or arise in situ like uptake sequences. If the former, I'll use the genome sequence that I've already removed them from. I'll do the analysis on the whole genome, and then on the strands sorted by their direction of replication. I did this before and got weird results; if the same thing happens this time I'll investigate further.
I've already done the corresponding analyses for H. influenzae, though I should probably repeat the replication-direction analysis because that was done with a slightly different dataset.
I should also analyze both genome datasets for the numbers of one-off and two-off motifs (singly and doubly mismatched); that will be easy because we have a little Perl script (somewhere) to do that now.
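A minimal sketch of what such a one-off/two-off count could look like, written in Python rather than the Perl script mentioned above (which is not shown here). The function names, the both-strands scan, and the example core sequence AAGTGCGGT (the commonly cited H. influenzae uptake sequence core) are assumptions for illustration, not details taken from the actual script.

```python
from collections import Counter

# Translation table used to build the reverse complement of a DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def count_near_matches(genome, motif, max_mismatches=2):
    """Count windows that match the motif with 0, 1, ... mismatches.

    Each window is compared with the motif and with its reverse
    complement, and the smaller mismatch count is used, so occurrences
    on either strand are tallied. Returns {mismatches: count}.
    """
    genome = genome.upper()
    motif = motif.upper()
    k = len(motif)
    targets = {motif, reverse_complement(motif)}
    counts = Counter()
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        d = min(hamming(window, t) for t in targets)
        if d <= max_mismatches:
            counts[d] += 1
    return {d: counts[d] for d in range(max_mismatches + 1)}
```

Calling count_near_matches(genome_sequence, "AAGTGCGGT") returns a dictionary keyed by 0, 1 and 2, giving the numbers of perfect, singly mismatched and doubly mismatched occurrences. Overlapping windows are each counted, which may or may not be what the real analysis wants.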
I should look at the effect of coding constraints by doing Gibbs searches with the coding and intergenic subsets of both genomes. But I won't split up the coding subset by reading frame; that would be messy and not very informative.
The analysis of covariation has been done for Neisseria. I can't remember whether the H. influenzae covariation analysis was only done with the old dataset and so should be redone. The control analysis for Neisseria showed an odd pattern of weak covariation between every third position of random sequence segments. I don't think it's due to coding effects because I see the same pattern, a bit weaker, in the noncoding dataset. Maybe it's those blasted RS3 elements, so perhaps I should redo the Neisseria analysis with the RS3-deleted dataset.
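The post does not say how covariation was measured, so the following is only a sketch of one common approach: mutual information between pairs of positions in a set of equal-length sequence segments. The function names are hypothetical. Averaging the scores by the distance between positions is one way a period-3 pattern like the one seen in the control could be made visible.

```python
import math
from collections import Counter


def mutual_information(segments, i, j):
    """Mutual information (in bits) between positions i and j."""
    n = len(segments)
    pair_counts = Counter((s[i], s[j]) for s in segments)
    i_counts = Counter(s[i] for s in segments)
    j_counts = Counter(s[j] for s in segments)
    mi = 0.0
    for (a, b), c in pair_counts.items():
        p_ab = c / n
        p_a = i_counts[a] / n
        p_b = j_counts[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def covariation_matrix(segments):
    """Mutual information for every pair of positions (i < j)."""
    length = len(segments[0])
    if any(len(s) != length for s in segments):
        raise ValueError("all segments must have the same length")
    return {
        (i, j): mutual_information(segments, i, j)
        for i in range(length)
        for j in range(i + 1, length)
    }


def mean_mi_by_offset(segments):
    """Average mutual information grouped by the distance j - i."""
    sums = Counter()
    counts = Counter()
    for (i, j), mi in covariation_matrix(segments).items():
        sums[j - i] += mi
        counts[j - i] += 1
    return {d: sums[d] / counts[d] for d in sorted(counts)}
```

Running mean_mi_by_offset on the random control segments from the full dataset and from the RS3-deleted dataset would show whether the elevated values at offsets of 3, 6, 9 and so on disappear once the repeats are removed. Mutual information is biased upward for small samples, so the two datasets should be compared at similar segment counts.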
The analysis of within-species variation at uptake sequences in H. influenzae is done, and there's no N. meningitidis equivalent to do.
And finally, what needs to be done with the Perl simulation of uptake sequence evolution? The few paragraphs I've found in the manuscript (written by me last fall) say that I'm going to take 200kb of intergenic sequence (or maybe all the intergenic sequence) of H. influenzae and of N. meningitidis, and find out what combinations of mutation rates and uptake bias the simulation needs to maintain their present levels of uptake sequences. Sounds straightforward, though I bet it isn't really.
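To make the question concrete, here is a toy sketch in Python of the kind of loop such a simulation might run. It is not the Perl simulation itself, whose details are not described here. Every modelling choice below is an assumption for illustration: the donor DNA is a freshly mutated copy of the genome, fragments containing the motif are always taken up while others are taken up with probability 1/uptake_bias, only one strand is scanned, and all parameter names and defaults are hypothetical.

```python
import random

BASES = "ACGT"


def mutate(seq, rate, rng):
    """Change each base to a different base with probability `rate`."""
    out = list(seq)
    for i, base in enumerate(out):
        if rng.random() < rate:
            out[i] = rng.choice([b for b in BASES if b != base])
    return "".join(out)


def count_motif(seq, motif):
    """Count (possibly overlapping) occurrences of motif on one strand."""
    return sum(seq.startswith(motif, i)
               for i in range(len(seq) - len(motif) + 1))


def simulate(genome, motif, mutation_rate, uptake_bias, cycles,
             fragments_per_cycle=10, fragment_len=50, seed=0):
    """Return the motif count before the first cycle and after each cycle.

    Each cycle: the genome mutates, a further-mutated copy serves as
    donor DNA, and random donor fragments replace the homologous
    stretch of the genome if they are taken up. uptake_bias must be
    at least 1; a value of 1 means no preference for the motif.
    """
    rng = random.Random(seed)
    history = [count_motif(genome, motif)]
    for _ in range(cycles):
        genome = mutate(genome, mutation_rate, rng)
        donor = mutate(genome, mutation_rate, rng)
        for _ in range(fragments_per_cycle):
            start = rng.randrange(len(genome) - fragment_len + 1)
            fragment = donor[start:start + fragment_len]
            p_uptake = 1.0 if motif in fragment else 1.0 / uptake_bias
            if rng.random() < p_uptake:
                genome = (genome[:start] + fragment
                          + genome[start + fragment_len:])
        history.append(count_motif(genome, motif))
    return history
```

Scanning a grid of mutation_rate and uptake_bias values and checking which combinations keep the final count near the starting count is the sort of search the manuscript paragraphs describe. A real run on 200 kb of sequence would need something faster than per-base Python loops.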