I need to decide how to reanalyze some old data.
In a previous post (Representation matters) I mentioned that I had assembled into a Word file some data from old publications by other labs. This file now contains the sequences of all the DNA fragments whose uptake by H. influenzae has been measured, and some indication of how well each fragment was taken up. I want to reanalyze this data to see if I can pull out more information than was available to the original experimenters. I think this might qualify as a 'meta-analysis' because the data comes from several independent studies; I'll say a bit more about meta-analysis at the end of this post.
I've played around with this data in the past, just to see if I could spot any new patterns, but now it's time to get serious, because I want to include the analysis in a paper correlating the abundance of different USS variants in the genome with the preferences of the uptake machinery. First I should assemble all four papers and reread them carefully. One of them has disappeared and isn't available on-line - I may have to walk over to the library tomorrow to get a copy (how old-fashioned!).
The most detailed study (also the most recent - 1990) looked at 28 plasmids with inserts of short H. influenzae DNA fragments. The uptake scores were published, and the insert sequences are in GenBank. All of these plasmids were preferentially taken up over a plasmid with no insert. The uptake scores of 15 of these inserts were also measured after being cut away from the plasmid vector; in most but not all cases uptake of the fragment correlated well with uptake of the plasmid that contained it. And most but not all of the sequenced inserts contained sequences resembling the USS core.
The other papers are older (1980-84). The earliest reports the work that first identified the USS core sequence: the authors sequenced four fragments that were preferentially taken up, and found five copies of an 11 bp sequence. But there are complications that make this work hard to compare to the 1990 results. First, the DNA fragments came not from H. influenzae but from the related bacterium H. parainfluenzae; I don't think this should matter. Second, uptake was not quantitated, just scored as yes or no, although the band intensities in their gel suggest these fragments are taken up at least 10-fold better than other (no USS) fragments. A later paper (1982) from the same lab examined uptake of two of these fragments more quantitatively, and also looked at fragments with synthetic USSs, using what was then very new technology. The paper gives more gels, and also relative uptake scores for some of the fragments they tested. The final paper also measured relative uptake of purified fragments, this time from the H. influenzae phage HP1.
The 1990 data could be further analyzed by using software to do an unbiased motif search: does this find the USS pattern? The search could be restricted to those fragments that were strongly taken up, or applied to all the fragments. And does how well a particular fragment is taken up correlate with how closely it matches this USS motif?
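The last question could be approached with something like the Python sketch below. It is only an illustration under several assumptions: the motif is passed in as a plain consensus string rather than coming out of a real motif search, match quality is just the count of identical positions in the best window on either strand, and the correlation is a hand-rolled Spearman rank correlation. The function names and the sequences and uptake values in the demo are all made up for illustration; none of them come from the papers.

```python
def reverse_complement(seq):
    """Return the reverse complement; unknown bases become N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(b, "N") for b in reversed(seq.upper()))


def best_match(seq, motif):
    """Highest number of positions matching `motif` over all windows, both strands."""
    motif = motif.upper()
    best = 0
    for strand in (seq.upper(), reverse_complement(seq)):
        for i in range(len(strand) - len(motif) + 1):
            window = strand[i:i + len(motif)]
            score = sum(a == b for a, b in zip(window, motif))
            best = max(best, score)
    return best


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation (undefined if either list is constant)."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # ASSUMPTION: placeholder consensus; substitute whatever the motif search finds.
    motif = "AAGTGCGGT"
    # MADE-UP toy fragments and uptake scores, for illustration only.
    fragments = {
        "toy1": ("CCAAGTGCGGTCC", 9.0),
        "toy2": ("CCAAGTGCTTTCC", 3.0),
        "toy3": ("CCTTTTTTTTTCC", 1.0),
    }
    scores = [best_match(seq, motif) for seq, _ in fragments.values()]
    uptake = [u for _, u in fragments.values()]
    print(spearman(scores, uptake))
```

A rank correlation seems a safer first choice than a linear one here, since there is no reason to expect uptake to rise linearly with the number of matching positions. A position weight matrix would be a natural replacement for the simple match count once a motif search has been run.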
I would like to be able to then add the results of the earlier work into this analysis, but it's complicated by not having comparable measures of uptake. I think I will have to make some inferences, based on info that is common to the different papers, such as the uptake of the 'negative' fragments. But these inferences will probably not be as solidly justified as I would like.
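One way to picture the kind of inference I have in mind is to express every uptake value relative to the 'negative' fragment measured in the same study. The sketch below is hypothetical: the function name, the study layout and the numbers are invented for illustration, and it assumes each study reports a usable value for its negative fragment, which may not be true of the yes/no data.

```python
def relative_uptake(measurements, negative_id):
    """Divide each fragment's uptake by that of the study's negative fragment.

    `measurements` maps fragment name -> raw uptake value within ONE study.
    """
    negative = measurements[negative_id]
    if negative <= 0:
        raise ValueError("negative-control uptake must be positive")
    return {name: value / negative for name, value in measurements.items()}


if __name__ == "__main__":
    # MADE-UP numbers in arbitrary units; not taken from any of the papers.
    study_a = {"negative": 2.0, "fragment_1": 20.0, "fragment_2": 6.0}
    study_b = {"negative": 0.5, "fragment_3": 4.0}
    for study in (study_a, study_b):
        print(relative_uptake(study, "negative"))
```

This only puts the studies on a common scale if the negative fragments behave alike across the different experimental setups, which is exactly the kind of assumption I can't fully justify.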
I think this kind of problem is common to meta-analysis generally. Meta-analysis takes results from different studies that are individually weak or flawed in different ways, and tries to infer a stronger conclusion than could be obtained from any one of the studies. But because the studies are different, approximations always have to be made so the variables analyzed can be made comparable.