Yet another issue in interpreting microarray data

This afternoon I met with my collaborators on the sub-inhibitory antibiotic effects project, and we considered where the project could lead (or end). Because the results of the (low-resolution) microarrays suggest that some genes are turned up about two-fold by the antibiotic treatment, we plan to do real-time PCR on a few of these genes, looking for the same effect; call this analysis A. (I'm giving fewer details than usual to protect the delicate sensibilities of my collaborator.)

If the results of A show at least a two-fold increase, we do experiment B. If the results of B also show at least a two-fold increase, we could write a small paper. If they don't, we could repeat the microarray analysis (doing it better this time) and maybe examine protein levels. If A gives the two-fold increase, we might also consider writing a grant to get money to do this properly. If A doesn't show at least a two-fold increase, we could either end the project or repeat the microarray analysis anyway, in the hope of discovering something the first analysis missed.

But at the end of the meeting the technician raised an issue we hadn't thought about. The student who did the original microarray analysis didn't see the two-fold increase we see in our reanalysis of her data - I've been assuming this was because she did something wrong in her analysis. However, the technician described what the student had done differently from what we did, and I think it may not be wrong at all - it might be what we should have done.

Here's the issue: In an earlier post I raised the question of how much we should trust individual data points in a microarray analysis, and pointed out that strongly expressed genes are likely to give more trustworthy signals than weakly expressed genes. But the issue of signal strength and trust may also apply to whole microarray slides, not just individual genes on them. Some RNA preps give better probes than others, due to differences in RNA quality and/or in the various processing steps involved in creating dye-tagged cDNA from the RNA. And some hybridizations give better signals than others, due to variations in the quality of the slide and in the hybridization and washing conditions. An inexperienced researcher will also generate more variation between slides. The net result is that some slide hybridizations (some 'samples', in GeneSpring's terminology) may have substantially stronger signals than others, averaged across all the genes on the slide.

For each gene on each slide, an expression 'ratio' is calculated by dividing the fluorescence signals from the 'test' RNA by the fluorescence signals from the 'control' RNA. Should we put equal trust in slides that gave generally weak signals and slides that gave generally strong signals? Or should we put more confidence in the slides with generally strong signals, because the ratios calculated from those with weak signals will have suffered more from random effects?
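
To make the weak-signal problem concrete, here's a small simulation sketch. It isn't our actual data or analysis pipeline; the signal values and the noise level are invented. It just shows that when both channels carry the same absolute measurement noise, ratios from a dim slide scatter much more widely around the true value than ratios from a bright one:

```python
import random

random.seed(1)

def simulate_ratios(true_test, true_control, noise_sd=100.0, n=10_000):
    """Simulate per-gene test/control ratios when both channels carry the
    same absolute measurement noise (e.g. background fluorescence).
    The gene's true ratio is true_test / true_control."""
    ratios = []
    for _ in range(n):
        test = true_test + random.gauss(0.0, noise_sd)
        control = true_control + random.gauss(0.0, noise_sd)
        if control > 0:                      # can't form a sensible ratio otherwise
            ratios.append(test / control)
    return ratios

def fraction_far_from(ratios, target=2.0, tolerance=0.5):
    """Fraction of simulated ratios more than `tolerance` away from the true value."""
    return sum(abs(r - target) > tolerance for r in ratios) / len(ratios)

strong = simulate_ratios(true_test=2000, true_control=1000)   # bright slide, true ratio 2
weak = simulate_ratios(true_test=200, true_control=100)       # dim slide, same true ratio 2

print("strong slide: %.1f%% of ratios far from 2" % (100 * fraction_far_from(strong)))
print("weak slide:   %.1f%% of ratios far from 2" % (100 * fraction_far_from(weak)))
```

Under these made-up numbers only a few percent of the bright-slide ratios land far from the true two-fold value, while most of the dim-slide ratios do.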

What we have done is first calculate the gene's ratio for each slide, and then calculate a mean ratio over the three slides whose signals looked most consistent. But what the student did was first sum the 'test' signals for the three (?) slides and calculate their mean, similarly sum the 'control' signals and calculate their mean, and then use the ratio of these means to decide about significance. Provided there is little slide-to-slide variation in overall signal intensity or in the ratio of 'test' to 'control' signals for the gene in question, both methods (mean of ratios or ratio of means) will give similar answers. But if there is substantial variation the answers will differ.
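
To see how the two calculations can disagree, here's a toy example (the signal values are invented, not taken from our slides) in which one bright slide dominates the ratio of means and pushes the gene across the two-fold cutoff, while the mean of ratios stays below it:

```python
# Toy signals for one gene on three slides (test, control); values are
# invented purely to illustrate the arithmetic, not real data.
# The first slide gave much stronger overall signals than the other two.
slides = [
    (4000.0, 1800.0),   # bright slide, gene ratio ~2.2
    (150.0, 100.0),     # dim slide, gene ratio 1.5
    (160.0, 100.0),     # dim slide, gene ratio 1.6
]

# Our approach: compute the ratio on each slide, then average the ratios.
mean_of_ratios = sum(test / control for test, control in slides) / len(slides)

# The student's approach: average each channel across slides, then take
# the ratio of those averages (the bright slide dominates both averages).
mean_test = sum(test for test, _ in slides) / len(slides)
mean_control = sum(control for _, control in slides) / len(slides)
ratio_of_means = mean_test / mean_control

print("mean of ratios: %.2f" % mean_of_ratios)    # ~1.77, below a 2-fold cutoff
print("ratio of means: %.2f" % ratio_of_means)    # ~2.16, above a 2-fold cutoff
```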

Because the effect for most genes was not much bigger than the two-fold cutoff the student used, the difference in analysis methods could have determined whether these genes fell just above or just below the cutoff. For genes that were below the cutoff on the weak-signal slides but above it on the strong-signal slides, she would have seen significance where we didn't. On the other hand, for genes that were above the cutoff on the weak slides but below it on the strong ones, we would see significance where she didn't.

Should we worry about this? I've just looked back at the ratios for the different slides - no one slide has conspicuously more significant ratios than the others, as we would expect if one strong slide was biasing the averages. But if it isn't a lot more trouble, it might be wise to go back and redo the calculations the way the student did them, so we can be certain we understand why her conclusions differed from ours.
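
If we do redo the calculations, a quick per-slide tally along these lines would show whether one slide contributes a disproportionate share of the above-cutoff genes. This is only a sketch: the per-slide ratios below are placeholders, and in practice the numbers would come from the GeneSpring export rather than being typed in by hand.

```python
# Count, for each slide, how many genes meet the 2-fold cutoff on that
# slide alone. A slide with far more above-cutoff genes than the others
# would be the kind of dominant slide that could bias a ratio of means.
ratios_by_slide = {
    "slide_1": [2.3, 1.1, 0.9, 2.6, 1.8],   # placeholder values
    "slide_2": [2.1, 1.2, 1.0, 2.4, 1.9],
    "slide_3": [1.9, 1.0, 0.8, 2.2, 1.7],
}

CUTOFF = 2.0
for slide, ratios in ratios_by_slide.items():
    above = sum(r >= CUTOFF for r in ratios)
    print("%s: %d of %d genes at or above %.1f-fold" % (slide, above, len(ratios), CUTOFF))
```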

2 comments:

  1. And do you have a rationale for the 2-fold cutoff? Or is that an arbitrary cutoff value in order to limit the number of candidates?

  2. The 2-fold cutoff is indeed an arbitrary convention. I don't take it seriously; using it showed us a category of genes that seemed to have interesting changes, but we then examined the real expression changes of all the genes in the category. But I think the student used it rigorously.

