RRResearch: Arrays and GeneSpring

Yesterday we took another look at the microarray data that we hope will show whether very low antibiotic doses change expression of H. influenzae genes. As I posted the other day, the big problem is deciding whether the differences we see in some genes are due to the antibiotic treatment or just chance.

I think that this project is at a tipping point. (My collaborator, who initiated the project, may think differently.) We need to decide whether or not to keep putting time and money and brainpower into it, and this decision will depend on the results we find in the data we have. If we find that the antibiotic treatments have had strong effects on gene expression, and especially if the genes involved have functions that tell a scientifically interesting story, we’ll want to go on to collect more data and solidify the data we have. But if the apparent effects are weak and unreliable we may decide not to proceed.

How are we going to analyze the data we have? Our plan now is to first set aside the worst (most error-prone) microarray replicates from each of our three treatments, keeping only the three replicates that appear to have the smallest random effects.

We’ll then ‘filter’ each treatment to identify genes whose expression appears to have been at least doubled by the antibiotic treatment in all of the three replicates. We had thought this was going to be troublesome to do, but I discovered that GeneSpring has a number of automatic filtering routines that can easily be used to identify exactly the sets of genes that meet these constraints. This will probably give a list of about 20 ‘genes’ from each treatment. (We’ll also do the same analysis for genes whose expression is consistently reduced by half or more; everything I say below applies to the reduced-expression sets too.)
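To make the filtering rule concrete, here is a minimal Python sketch of the same logic. It is not what GeneSpring does internally; the function name, the gene labels and the ratio values are all made up for illustration, and I'm assuming each gene has one treated/untreated expression ratio per replicate.

```python
def consistently_changed(ratios, fold=2.0):
    """Split genes into consistently induced and consistently repressed sets.

    ratios: dict mapping a gene label to a list of treated/untreated
            expression ratios, one per replicate (hypothetical input format).
    fold:   the cutoff; 2.0 means 'at least doubled' or 'reduced by half or more'.
    """
    induced = {g for g, r in ratios.items()
               if r and all(x >= fold for x in r)}
    repressed = {g for g, r in ratios.items()
                 if r and all(x <= 1.0 / fold for x in r)}
    return induced, repressed


# Invented example values, three replicates per gene.
example = {
    "geneA": [2.4, 3.1, 2.0],   # doubled in all three replicates
    "geneB": [2.5, 1.6, 2.2],   # fails in one replicate
    "geneC": [0.4, 0.5, 0.3],   # halved or more in all three
    "geneD": [1.1, 0.9, 1.0],   # unchanged
}
induced, repressed = consistently_changed(example)
print(sorted(induced), sorted(repressed))
```

Requiring the cutoff in every replicate, rather than on the average, is what keeps a single wild replicate from putting a gene on the list.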

I put ‘genes’ in quotes in the previous paragraph to bring out the issue that not all of the spots on the microarrays represent genuine H. influenzae genes. Some are just control spots of fluorescent dye, and GeneSpring knows to ignore these. But others are various control DNAs, including genes from other organisms. As presently set up, GeneSpring doesn’t know that these aren’t real H. influenzae genes. (I didn’t set up this GeneSpring; it belongs to another lab. I think I know how to add the H. influenzae genome data it needs to do this but haven’t had time to try it.) It won’t be a lot of work to go through each list, removing the entries that aren’t real genes.

Two of the three treatments are different concentrations of the same antibiotic. Comparing the lists produced by these two treatments will help us decide whether the genes on the lists are there because of real antibiotic effects or chance: genes that show up on both lists can be confidently viewed as being genuinely induced by the antibiotic treatment.
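The list comparison is just a set intersection once the non-gene entries have been removed. A rough sketch, with invented identifiers (the function name, gene labels and control labels are all hypothetical):

```python
def shared_real_genes(list_low, list_high, control_ids):
    """Genes present on both treatment lists, after dropping control spots.

    list_low, list_high: spot labels from the two concentrations
                         of the same antibiotic.
    control_ids:         labels known not to be real H. influenzae genes.
    """
    controls = set(control_ids)
    real_low = set(list_low) - controls
    real_high = set(list_high) - controls
    return real_low & real_high


# Invented labels for illustration only.
low_dose = ["geneA", "geneB", "ctrl_1", "geneE"]
high_dose = ["geneA", "geneE", "geneF", "ctrl_1"]
controls = ["ctrl_1", "ctrl_2"]

shared = shared_real_genes(low_dose, high_dose, controls)
print(sorted(shared))
```

Note that a control spot appearing on both lists is dropped before the comparison, so it can't masquerade as a reproducible result.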

Then we’ll want to look at the identities of the H. influenzae genes that remain. First, how many such genes are there? Finding that only a few genes are changed would be less interesting than finding many. How strongly are they induced (or repressed)? Weak effects are not as interesting as strong ones, and are more likely to be due to chance. How consistent is the change across the three replicates?

Another issue is the level of ‘trust’ that GeneSpring has assigned to each gene’s expression level, which tells us how reliable GeneSpring thinks the data are. This depends on several factors, though I’m not sure whether GeneSpring considers all of them. First, each array has two replicate ‘spots’ for each gene, and each reported expression level is the average of these two spots’ scores. If the two are not very similar, the average is not very trustworthy. Second, results for genes that are weakly expressed may not be as trustworthy as those for strongly expressed genes because the signal is too weak. Third, the software that reads the array images (we use Imagene) scores the background around each spot, and if the background is too high the spot score is not very trustworthy. So we’ll pay more attention to results that are assigned high ‘trust’.
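Since I don't know how GeneSpring actually computes its ‘trust’ score, the following is only a sketch of how the three factors could be checked by hand. The function name and all three thresholds are arbitrary assumptions, chosen just to make the example run; they are not taken from GeneSpring or Imagene.

```python
def assess_gene(spot_a, spot_b, background,
                max_rel_diff=0.3, min_signal=200.0, max_bg_fraction=0.5):
    """Average two duplicate spots and list reasons to distrust the result.

    All thresholds are illustrative guesses, not GeneSpring's values.
    Returns (mean_signal, list_of_concerns).
    """
    mean = (spot_a + spot_b) / 2.0
    if mean <= 0:
        return mean, ["no signal"]
    concerns = []
    # Factor 1: the two duplicate spots should agree with each other.
    if abs(spot_a - spot_b) / mean > max_rel_diff:
        concerns.append("duplicate spots disagree")
    # Factor 2: weakly expressed genes give less reliable scores.
    if mean < min_signal:
        concerns.append("weak signal")
    # Factor 3: high background relative to the spot signal.
    if background / mean > max_bg_fraction:
        concerns.append("high background")
    return mean, concerns


good = assess_gene(1000.0, 1100.0, 100.0)
poor = assess_gene(150.0, 60.0, 80.0)
print(good)
print(poor)
```

A gene with an empty list of concerns would be one to pay more attention to; one with several concerns would be set aside even if its apparent fold change is large.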

Once all these factors have been taken into account we'll get to the most interesting one: what do the affected genes do? Some will be genes about which little or nothing is known – identified as genes because they could code for proteins, and often because the same hypothetical proteins also show up in other bacteria. But some will be genes whose functions are known, and this is the information we’ll use to decide what to do next.

One additional concern is ‘ascertainment bias’. Because microarray analysis is more accurate for genes that are strongly expressed in the untreated cells (produce lots of mRNA), it is more likely to confidently detect relatively small changes in gene expression in these genes, and to miss or not trust small changes in weakly-expressed genes. Several of the genes on the preliminary lists encode ribosomal proteins, which we know are normally highly expressed. If we find that, say, 40% of the induced genes encode ribosomal proteins, does this mean that ribosomal proteins are preferentially induced by antibiotic treatment, or only that they are the easiest to detect? One check on this would be whether ribosomal protein genes also show up in the sets of genes that appear to be repressed by the treatment; if so this is probably an ascertainment bias effect.
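The check described above amounts to comparing the fraction of ribosomal protein genes in the induced set with the fraction in the repressed set. A small sketch, using placeholder labels rather than real gene names or real results:

```python
def ribosomal_fraction(gene_set, ribosomal_genes):
    """Fraction of the genes in gene_set that encode ribosomal proteins."""
    genes = set(gene_set)
    if not genes:
        return 0.0
    return len(genes & set(ribosomal_genes)) / len(genes)


# Placeholder labels; none of these are real data.
ribosomal = {"ribo1", "ribo2", "ribo3"}
induced_set = {"ribo1", "ribo2", "g1", "g2", "g3"}
repressed_set = {"ribo3", "g4", "g5", "g6"}

frac_induced = ribosomal_fraction(induced_set, ribosomal)
frac_repressed = ribosomal_fraction(repressed_set, ribosomal)
print(frac_induced, frac_repressed)
```

If the two fractions turned out to be similarly high, that would point toward ease of detection rather than preferential induction. With lists of only about 20 genes, though, the fractions will be noisy, so this is a sanity check rather than a statistical test.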

If each treatment has produced only a few genes with significantly changed expression, and these barely make the two-fold cutoff, we might decide not to proceed with additional experiments, especially if we don’t know what these genes do.

If some ‘interesting’ genes show significant effects, we will probably use an independent method to confirm that they really are induced. This method is real-time (quantitative) PCR. It’s time-consuming and expensive, but it can give more accurate measurements of the amounts of specific RNAs in different samples.

And if the results are promising we’ll go on to do more microarray analysis, this time using cells carrying mutations that make them resistant to the harmful effects of high doses of these antibiotics. We suspect that these cells will respond differently to the low doses we’ve tested on normal cells. Because a lot is known about how such resistance mutations act, their effects on low-dose responses will help us understand how the low doses exert their effects, and the significance of these effects for antibiotics used to treat infections.

(I'm away from the lab for a few days; I wrote this and the next two posts on the plane. They can fill the gap until I get back.)
