I'm finally back to working on papers about uptake sequence evolution. Right now it's the analysis of evolutionary interactions between each genome's uptake sequences and its proteome.
While I've been neglecting the manuscript, my bioinformatics collaborator has been generating the final data and, I now discover, suggesting a different and more logical way to order the results. So I'm shuffling the sections around and rewriting the text that links them together and explains why we did each analysis. Well, that's not exactly true. Any scientist will admit that their papers don't always honestly explain the actual reasons why each experiment or analysis was done. That's because scientists often do good experiments for not-very-good reasons, and only later discover the logical thread that links their results together.
And sometimes, like now, we initially don't think to do experiments or analyses, only later realizing the contribution they will make to understanding or explaining other results. The reorganizing I've just done suggested two simple correlations I might look for, which might provide context for interpreting the result I had in mind. So I entered some of my collaborator's data on the tripeptides that uptake sequences specify into a new Excel file, plotted a couple of simple graphs, and presto, new results!
These aren't very important results in themselves. The relative frequencies of tripeptides specified by uptake sequences do correlate modestly (R² = 0.54) with the total frequencies of those tripeptides in their proteomes. And the proportion of tripeptides usable by uptake sequences but not used correlates even more modestly (R² = 0.4) with the tripeptide frequencies in their proteomes. But they provide context that makes our other results easier to understand.
So getting back to this manuscript has meant both reorganizing it and coming up with a couple of new simple analyses we had overlooked.
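For anyone who would rather not use Excel for this kind of check, here is a minimal Python sketch of the calculation behind an R² value for a simple two-variable correlation (the squared Pearson coefficient, which is what a linear trendline reports). The function name `r_squared` and the two short lists are made up purely for illustration; they are not our tripeptide data, and this is not necessarily how the graphs above were produced.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when one variable is constant")
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Placeholder numbers only (hypothetical, NOT real data):
    # frequency of each tripeptide in the proteome vs. among uptake sequences.
    proteome_freq = [0.10, 0.20, 0.30, 0.40]
    uptake_freq = [0.15, 0.18, 0.35, 0.32]
    print(f"R^2 = {r_squared(proteome_freq, uptake_freq):.2f}")
```

With real data, each list would hold one value per tripeptide, in the same order in both lists.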