A commenter on my most recent post about our messy microarray data pointed me to a paper suggesting a Bayesian approach to deciding whether apparent expression differences are significant. In principle this sounds great, but despite my attempts last summer to understand Bayesian methods, even this 'easy' paper is over my head. If the array work were a problem close to my heart, or if the preliminary data looked a lot more interesting than it does, I'd probably be prepared to master the necessary statistics. But it's not and it doesn't (sorry Julian), so I'm taking the statistical easy way out.
The previous post outlined the analysis that the technician has now done. For each treatment (two concentrations of rifampicin and one of erythromycin) she kept the data from the three best samples. Then she combined the data for the three samples into 'experiments', and filtered these to get lists of genes whose signals were consistently increased or decreased by at least twofold in all three samples. And she noted the descriptions of the proteins encoded by most of these genes.
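For readers who want the filtering step spelled out, here's a minimal sketch of the "consistently changed at least twofold in all three samples" rule. It assumes each gene has one normalized treated/control ratio per sample; the function names, gene IDs and ratios are hypothetical, and this isn't how GeneSpring itself implements its filters.

```python
# Sketch of the "consistent twofold change" filter. Assumes one normalized
# treated/control ratio per gene per sample; all names and data are hypothetical.

THRESHOLD = 2.0  # minimum fold change required in EVERY sample


def classify(ratios, threshold=THRESHOLD):
    """Return 'up', 'down', or None for one gene's per-sample ratios."""
    if all(r >= threshold for r in ratios):
        return "up"
    if all(r <= 1.0 / threshold for r in ratios):
        return "down"
    return None  # not consistently changed in all samples


def filter_experiment(experiment, threshold=THRESHOLD):
    """Split a {gene_id: [ratio, ...]} dict into sorted 'up' and 'down' lists."""
    up, down = [], []
    for gene, ratios in experiment.items():
        call = classify(ratios, threshold)
        if call == "up":
            up.append(gene)
        elif call == "down":
            down.append(gene)
    return sorted(up), sorted(down)


if __name__ == "__main__":
    # Hypothetical ratios for the three samples of one treatment.
    experiment = {
        "geneA": [2.1, 3.4, 2.0],   # up in all three samples -> kept as 'up'
        "geneB": [0.4, 0.3, 0.45],  # down in all three samples -> kept as 'down'
        "geneC": [5.0, 1.2, 2.5],   # strongly up in one sample, but not all -> dropped
    }
    up, down = filter_experiment(experiment)
    print("up:", up)
    print("down:", down)
```

Note that a single noisy sample can knock a gene off the list (geneC) or put one on it, which is why the quality of each sample matters so much in what follows.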
The previous post also raised several issues we need to be concerned about. One of these we still don't have information about - that's the level of 'trust' the software has assigned to each gene's data. It's likely that some of the genes should be removed from their lists because the results of the sample used to colour the line are not considered trustworthy, perhaps because the signals are very weak, or because the two replicate signals in the sample disagree. For now I'll set aside the issue of 'trust', and refer to all the genes on the list as being 'significantly changed' by the antibiotic treatment.
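I don't know how GeneSpring actually calculates 'trust', but the two likely problems named above (very weak signals, and replicate spots that disagree) could be checked with something like the sketch below. Both cutoffs (`MIN_SIGNAL`, `MAX_REPLICATE_FOLD`) and the function name are invented for illustration, not GeneSpring's values.

```python
# Hypothetical per-sample "trust" check for one gene. GeneSpring's real trust
# calculation isn't known here; both criteria and both cutoffs are assumptions.

MIN_SIGNAL = 200.0        # hypothetical floor for any raw spot intensity
MAX_REPLICATE_FOLD = 1.5  # hypothetical max allowed disagreement between replicates


def is_trustworthy(rep1, rep2, min_signal=MIN_SIGNAL, max_fold=MAX_REPLICATE_FOLD):
    """rep1, rep2: (treated_signal, control_signal) for the two replicate spots."""
    # Criterion 1: any very weak raw signal makes the ratio unreliable.
    if min(rep1 + rep2) < min_signal:
        return False
    ratio1 = rep1[0] / rep1[1]
    ratio2 = rep2[0] / rep2[1]
    # Criterion 2: the two replicate ratios must agree within max_fold.
    disagreement = max(ratio1, ratio2) / min(ratio1, ratio2)
    return disagreement <= max_fold


if __name__ == "__main__":
    print(is_trustworthy((1000, 400), (900, 420)))  # strong, replicates agree
    print(is_trustworthy((150, 60), (900, 420)))    # one replicate too weak
    print(is_trustworthy((1000, 400), (500, 500)))  # replicates disagree
```

A gene would then only count as 'significantly changed' if it passed both the twofold filter and this kind of check in every sample.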
What do we learn from these lists?
The experiment with the low concentration of rifampicin gives only a few genes with significant changes (one down, seven up). This is consistent with the 'noisy' appearance of the experiment in GeneSpring's graphical display. The first sample has a lot of variation, and this has little correlation with the variation in the other two samples. The first sample for the one gene that's significantly down is obviously unreliable, so I doubt that this gene is genuinely repressed by the treatment. All seven 'up' genes are close to the twofold cutoff in at least one sample, and none is up more than 3.8-fold in its highest sample. Five encode ribosomal proteins; the likely significance of this is discussed below.
The experiment with the higher concentration of rifampicin looks better, and gives more significantly changed genes (15 down, 36 up). None of the 'down' genes is the one seen in the low-concentration analysis, increasing my confidence that that gene should be ignored. None of the decreases is consistently very strong, and the described functions of these 'down' genes don't suggest any interesting patterns.
The 'up' genes include 18 ribosomal proteins. The TIGR database says that H. influenzae has 55 ribosomal protein genes out of about 1740 total genes (about 3%), so only about one would be expected by chance among 36 'up' genes; finding 18 is clearly a significant pattern. This adds confidence to the finding of five ribosomal proteins induced with the low rifampicin concentration, but the confidence is tempered because only two of the five are significantly up in both experiments. In the previous post I raised the concern that the strong signals expected from ribosomal protein genes might be giving an ascertainment bias, but the absence of ribosomal protein genes from the 'down' lists (and from the erythromycin lists discussed below) suggests that this isn't a problem. None of these ribosomal protein genes is very strongly induced (most are up 3- to 4-fold). Several other genes in the 'up' list are quite strongly induced in one or more samples, and amino acid and dipeptide transporters seem to be overrepresented.
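To put a number on "clearly significant", the standard tool is a one-sided hypergeometric test: if 36 genes were picked at random from about 1740, of which 55 encode ribosomal proteins, how likely is it to get 18 or more? The sketch below uses only Python's standard library; the gene counts are the ones quoted above (the genome total is approximate), and the function name is my own.

```python
from math import comb


def hypergeom_upper_tail(k, n_drawn, n_category, n_total):
    """P(X >= k): chance of at least k category genes when n_drawn genes are
    picked at random from n_total genes, n_category of which are in the category."""
    denom = comb(n_total, n_drawn)
    upper = min(n_drawn, n_category)
    hits = sum(
        comb(n_category, i) * comb(n_total - n_category, n_drawn - i)
        for i in range(k, upper + 1)
    )
    return hits / denom  # exact big-int numerator/denominator, then one float division


N_TOTAL = 1740       # approximate number of H. influenzae genes (TIGR)
N_RIBOSOMAL = 55     # ribosomal protein genes
N_UP = 36            # 'up' genes, high-concentration rifampicin
N_UP_RIBOSOMAL = 18  # ribosomal protein genes among them

expected = N_UP * N_RIBOSOMAL / N_TOTAL
p_value = hypergeom_upper_tail(N_UP_RIBOSOMAL, N_UP, N_RIBOSOMAL, N_TOTAL)

if __name__ == "__main__":
    print(f"expected by chance: {expected:.2f}")
    print(f"P(>= {N_UP_RIBOSOMAL} ribosomal genes by chance): {p_value:.2e}")
```

The expected count comes out at about 1.1, and the p-value is many orders of magnitude below any usual cutoff, so small errors in the genome total don't change the conclusion. What this test can't address is the ascertainment-bias worry, since it assumes every gene was equally likely to make the list.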
Analysis of the erythromycin experiment produced 19 'down' genes and 17 'up' genes. Neither list has any ribosomal proteins, increasing my confidence that their over-representation on the rifampicin 'up' lists reflects genuine induction. The 'down' effects are all quite weak, but several of the 'up' effects are strong. There are three pairs of 'up' genes that share operons, which increases my confidence that they are genuinely induced. Six genes are described as 'reductases' (including two dimethyl sulfoxide reductase subunits and biotin sulfoxide reductase), and five genes are involved in arginine/ornithine/putrescine pathways.
Although I don't have the GeneSpring program, I can get a good idea of how much trust it has assigned just from the screenshots I have. Only three of the strongly 'up' genes from the high-concentration rifampicin experiment have the dark colour GeneSpring uses to indicate high trust: a dipeptide transporter and two genes of unknown function. This is probably because the first sample is very noisy. Trust is generally stronger in the erythromycin experiment.
Treatment with rifampicin at the sub-inhibitory concentration of 0.05 microgram/ml induces expression of genes for ribosomal proteins and probably of genes for amino acid transporters. Does this make biological sense? Maybe. At much higher, inhibitory concentrations, rifampicin blocks transcription by RNA polymerase. If the main effect of a very weak inhibition is a shortage of the proteins the cell needs in the greatest quantity (ribosomal proteins), the cell might respond by turning up expression of the corresponding genes.
Treatment with erythromycin at the sub-inhibitory concentration of 0.1 microgram/ml probably induces genes for some reductases and proteins that break down arginine. Does this make biological sense? Not in any way I can see. Erythromycin at inhibitory concentrations blocks protein synthesis. The logic I suggest above for rifampicin would seem to apply more strongly to erythromycin, making me suspect that my application of it to rifampicin is just empty story telling.
This work's 'publishability' would be higher if we found effects on genes associated with virulence to the human host or with resistance to the antibiotic. Unfortunately, although standards for claiming 'virulence gene' status are lamentably low, none of the genes on the 'up' or 'down' lists is identified in any way with virulence.
I'm now going to email my collaborator, asking him to read this post and consider whether it's worth continuing with these experiments.