RRResearch: Juggling data into the US-variation manuscript

I have two uptake sequence analyses I want to publish, and I'm trying to decide whether they would fit into the US-variation manuscript. One is the analysis of covariation between different USS positions and between different DUS positions (see here for example), and the other is BLAST analysis of the amounts of within-species sequence variation seen at different USS positions (see here). Both are certainly about variation in uptake sequences, and in previous blog posts I've described them as parts of this manuscript, but since then I've gotten bogged down in thinking that they don't really belong.
But let's assume that they do belong in this manuscript, and consider the next question: where would they fit best, and how would they connect with the other parts? They have to go after the first major section, which describes the analyses of the H. influenzae and N. meningitidis genomes with the Gibbs Motif Sampler, because they use the datasets that analysis generated. Should they go before the other major section, which describes the Perl model of uptake sequence evolution, or after it? These two major sections fit well together, because the model makes predictions that can be tested with the Gibbs datasets.
What if the model went first, using the generic uptake sequence? It could then be followed by the Gibbs analyses, applied both to the model's output and to the real genomes. And then by more versions of the model, using the position-weight matrices generated from the real genomes. We could then do the other analyses of the genomic data....
Here's an attempt at an outline using this order:
Intro:
Emphasize the goal of understanding the cause of uptake sequences. Summarize what we know of their properties and the evidence that they arise by point mutation and spread by transformation more efficiently than other sequences because they are preferentially taken up (cartoon figure contrasting drive and beneficial-variation models). We have developed a model of this evolutionary process, which we describe and test below.
Results:

The model (how it works, with a cartoon figure).
We run the simulations to equilibrium (evaluated as score or US count), using a 10 bp generic position-weight matrix.
Characterization of model: Effects of (i) genome length (each cycle takes longer); (ii) mutation rate (getting to equilibrium takes more cycles); and (iii) recombination rate (determines equilibrium score, below saturation).
Properties of equilibrium sequences (i): Proportions of perfect and mismatched occurrences. Use Gibbs to find them? Compare with direct counts of perfects, one-offs and two-offs? Explain why Gibbs is, in principle, better?
Properties of equilibrium sequences (ii): Spacings between occurrences (found by Gibbs). These are more even than random, as are the spacings between real uptake sequences. The spacing depends on the length of the recombining fragments.
Use Gibbs to reanalyze the Perl output sequences.
Use Gibbs to reanalyze the genome sequences.
Repeat the key Perl runs using the position-weight matrices from the Gibbs analyses of real genomes.
Compare what we find.
Do other reanalyses and new analyses of the Gibbs output of the genome analyses (variation in subsets of genome sequences: no interesting results, but worthwhile anyway). This would include the covariation analysis and the BLAST analysis of within-species sequence variation.
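The outline above asks whether direct counts of perfect, one-off and two-off occurrences could complement the Gibbs results, and whether occurrences are spaced more evenly than random. Here is a minimal Python sketch of both direct measures. It assumes a single consensus string (the 10 bp consensus in the test is a hypothetical placeholder, not the model's generic uptake sequence), scans only one strand, and uses the coefficient of variation (CV) of the gaps between occurrences as the evenness statistic. The function names are hypothetical; this is neither the Gibbs Motif Sampler nor the Perl model's code.

```python
from collections import Counter
from statistics import mean, pstdev


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_by_mismatches(seq: str, consensus: str, max_mismatches: int = 2) -> Counter:
    """Count windows of seq that match consensus with 0..max_mismatches mismatches.

    Keys are mismatch counts: 0 = perfect, 1 = one-off, 2 = two-off.
    Assumption: only the given strand is scanned; a real analysis
    would also scan the reverse complement.
    """
    k = len(consensus)
    counts = Counter()
    for i in range(len(seq) - k + 1):
        d = hamming(seq[i:i + k], consensus)
        if d <= max_mismatches:
            counts[d] += 1
    return counts


def occurrence_positions(seq: str, consensus: str, max_mismatches: int = 0) -> list:
    """Start positions of windows within max_mismatches of consensus."""
    k = len(consensus)
    return [
        i for i in range(len(seq) - k + 1)
        if hamming(seq[i:i + k], consensus) <= max_mismatches
    ]


def spacing_cv(positions: list) -> float:
    """Coefficient of variation (population SD / mean) of gaps between
    consecutive occurrences.

    Randomly placed sites give roughly exponential gaps, whose CV is
    close to 1; more evenly spaced sites give a CV below 1.
    """
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three occurrences")
    return pstdev(gaps) / mean(gaps)
```

Comparing the CV of real or simulated occurrences against the CV expected for random placement (about 1) gives one simple, Gibbs-independent check on the "more even than random" observation; shuffled or simulated control sequences would make that comparison more rigorous.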

That seems workable, and as sensible as the previous order. Now I need to decide whether it's sufficiently better than the old order to be worth the trouble of rearranging the whole Results section.
A couple of hours later: I don't think it is. For now I'm going to treat the manuscript as two distinct parts. First, the Gibbs analysis of the genome and sub-genome sequences, and analyses using the occurrences these identify, including the covariation and within-species variation analyses. Second, the Perl simulation of uptake sequence evolution, and comparison of its results with the genome analyses.

Field of Science
