Making sense of old microarray data

Last night I went over the microarray data from the collaborative experiments that are testing how low concentrations of antibiotics affect H. influenzae gene expression. Unfortunately we've been unable to locate the notebook in which the student who did the work recorded the experimental details, and unless we can sort out some key points the array results will be unpublishable.

We have data from 4 or 5 arrays for each of three treatments. I think that all of the arrays have produced quite noisy data, so none is at all convincing in isolation. The different arrays for each treatment used several independent preparations of RNA, and the hope is that, by pooling the replicate array data for each treatment, we can see significant differences.
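As a rough illustration of what pooling means here, the sketch below averages per-gene log2(treated/untreated) ratios across replicate arrays and reports a standard error for each gene. This is only a minimal sketch, not the analysis we actually ran: the function name `pool_replicates`, the gene names, and all the numbers are invented for the example.

```python
from math import sqrt
from statistics import mean, stdev


def pool_replicates(log_ratios):
    """Pool replicate arrays for one treatment.

    log_ratios maps a gene name to a list of log2(treated/untreated)
    values, one value per replicate array (hypothetical layout).
    Returns gene -> (mean log ratio, standard error of the mean).
    """
    pooled = {}
    for gene, values in log_ratios.items():
        n = len(values)
        m = mean(values)
        # The standard error needs at least two replicates.
        se = stdev(values) / sqrt(n) if n > 1 else float("nan")
        pooled[gene] = (m, se)
    return pooled


# Invented values for two genes measured on four replicate arrays.
example = {
    "geneA": [1.2, 0.8, 1.5, 0.9],    # consistently up: convincing once pooled
    "geneB": [0.3, -0.4, 0.1, -0.2],  # scattered around zero: probably noise
}
result = pool_replicates(example)
for gene, (m, se) in result.items():
    print(f"{gene}: mean log2 ratio = {m:+.2f}, SE = {se:.2f}")
```

A gene whose mean is large relative to its standard error is a candidate for a real effect, whereas any single noisy array would not be convincing by itself.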

The first image shows part of the data from one treatment. Each line joins the points for one gene, and each point indicates the amount of RNA produced from that gene relative to the RNA from untreated cells, as measured by one of the arrays. This representation makes it easy to compare results from the replicate arrays.

Here we only see points for two replicates, coloured by the strength of their expression in the first replicate. We see that some lines (genes) have increased expression (red) in both replicates, and some genes consistently have reduced expression (blue). So far so good - these may be genes that are affected by the antibiotics we're testing.

The problem created by the lost notebook is that we don't know, for each sample, which RNA came from the untreated cells and which came from the cells treated with antibiotic. The consequence is illustrated by the second graph, which shows the data for a replicate where we may have reversed the treated and untreated RNAs. Notice that at this replicate the blue lines have moved to the top and the red lines to the bottom. This tells us that genes that appeared to be induced by the antibiotic (red) in the other replicates are here apparently repressed (at the bottom of the graph), and vice versa.
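One way to put a number on this pattern is to correlate the per-gene log ratios between replicates: if the treated and untreated RNAs were swapped in one replicate, every log ratio changes sign, so that replicate should correlate negatively with the others. The sketch below uses invented values for five genes, and `pearson` is a small helper written for this example rather than part of any array-analysis package.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Invented log2(treated/untreated) values for the same five genes.
rep1 = [1.0, 0.8, -0.9, -1.1, 0.1]
rep2 = [1.2, 0.6, -1.0, -0.8, -0.1]   # agrees with rep1
rep3 = [-1.1, -0.7, 0.8, 1.0, 0.0]    # looks like rep1 with the sign reversed

r12 = pearson(rep1, rep2)
r13 = pearson(rep1, rep3)
# Swapping the two RNAs negates every log ratio, which negates the correlation.
r13_flipped = pearson(rep1, [-v for v in rep3])

print(f"rep1 vs rep2:         r = {r12:+.2f}")
print(f"rep1 vs rep3:         r = {r13:+.2f}")
print(f"rep1 vs flipped rep3: r = {r13_flipped:+.2f}")
```

A strongly negative correlation only flags a replicate as a candidate for a swap. It is not independent evidence about which RNA was which.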

We could just switch the two RNAs to make the data look more reproducible. But in the absence of notebook evidence about which RNA is which, this would be scientifically suspect, like manipulating data points to better fit expectations. Worse, we thought we did have some evidence (from an image stored in a lab computer) that helped us decide which RNA was which, but using that evidence is what gave us this apparently switched replicate.

I'm hoping that a response from the student who did the work will reveal that we've misinterpreted what she told us about these RNAs. If this can't be sorted out we'll need to remove this replicate from our data set.
