I should have paid more attention in stats class
By Rosie Redfield on Saturday, March 21, 2009
One of the reviewers of the manuscript I'm revising for Genome Biology and Evolution asked whether we could do some statistical analysis of the data we present in a graph. On the left I've put the graphs and the data. The lower graph panel and lower block of data are the controls; we can ignore them for now. I think we can also safely ignore what the data represent.
I'll describe the significance questions with respect to the top-panel graph (A):
We want to know the following:
In the left group (four blocks of four bars, labelled SAV, TAL, KEG, PHF/L), are the four blue bars significantly higher than the red, yellow and green bars beside them?
In the middle group (four blocks of four bars, labelled QAV, TAC, TSG, PLV), are the four red bars significantly higher than the blue, yellow and green bars beside them?
In the right group (five blocks of four bars, labelled PSE, SDG, FRR, QTA, RLN/K), are the five yellow bars significantly higher than the blue, red and green bars beside them?
The actual numbers are in the upper part of the table, in the correspondingly coloured cells, and below I'll restate the above questions in terms of these numbers.
In the top four rows of the table (blue), are the numbers in the bright-blue cells significantly higher than the numbers in the light-blue cells in the same rows?
In the next four rows of the table (pink), are the numbers in the bright-pink cells significantly higher than the numbers in the light-pink cells in the same rows?
In the next five rows of the table (yellow), are the numbers in the bright-yellow cells significantly higher than the numbers in the light-yellow cells in the same rows?
I suspect this is an ANOVA (analysis of variance) type of problem. But I'm pretty sure it would require more complicated analysis than the simple ANOVA described in the new statistics textbook my author-colleague kindly gave me (probably to get me off his back with dumb statistics questions). Hmmm, maybe it would be possible to do a separate ANOVA on each group, i.e. one for the blue data, one for the red data, and one for the yellow data.
My basic version of Excel doesn't have the statistics add-in needed for ANOVAs, and I can't even remember the name of the statistics/graphing package the lab owns (it's not installed on my computer). But I found an online applet that does two-way ANOVAs here (I need two-way because I have two variables, the rows and the columns). So I pasted the data from the blue cells into the applet, with the following results.
"Conclusion on Treatments Effects: Very strong evidence against the null hypothesis." The null hypothesis is that all treatments (columns) gave the same results, so there are very significant differences between the data in the different columns (p=0.00058).
"Conclusion on Blocks Effects: Moderate evidence against the null hypothesis." The null hypothesis is that all blocks (rows) gave the same results, so there are moderately significant differences between the data in the different rows (p=0.011).
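For anyone who wants to see what a calculation like the applet's involves, here's a minimal Python sketch of a two-way ANOVA without replication: one number per row-column cell, with an additive model where rows are blocks and columns are treatments. I'm assuming that's the model the applet fits. The function names (`two_way_anova_no_rep`, `f_sf`) are my own hypothetical ones, and the incomplete-beta helper is a standard continued-fraction routine included only so the sketch needs nothing beyond Python's standard library for the p-values.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    # Continued fraction for the regularized incomplete beta function
    # (modified Lentz's method, as in Numerical Recipes' betacf).
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    # Regularized incomplete beta I_x(a, b), used here for F-distribution tails.
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, df1, df2):
    # Upper-tail p-value P(F > f) for an F(df1, df2) distribution.
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def two_way_anova_no_rep(rows):
    # rows: one list per table row (block); each position is a column (treatment).
    b, a = len(rows), len(rows[0])
    if a < 2 or b < 2 or any(len(r) != a for r in rows):
        raise ValueError("need a rectangular table with at least 2 rows and 2 columns")
    grand = sum(map(sum, rows)) / (a * b)
    row_means = [sum(r) / a for r in rows]
    col_means = [sum(r[j] for r in rows) / b for j in range(a)]
    # Partition the total sum of squares into columns, rows and residual error.
    ss_treat = b * sum((m - grand) ** 2 for m in col_means)
    ss_block = a * sum((m - grand) ** 2 for m in row_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_treat - ss_block
    df_treat, df_block = a - 1, b - 1
    df_error = df_treat * df_block
    ms_error = ss_error / df_error
    f_treat = (ss_treat / df_treat) / ms_error
    f_block = (ss_block / df_block) / ms_error
    return {
        "treatments": (ss_treat, df_treat, f_treat, f_sf(f_treat, df_treat, df_error)),
        "blocks": (ss_block, df_block, f_block, f_sf(f_block, df_block, df_error)),
        "error": (ss_error, df_error),
    }
```

Each inner list is one table row and each position within it is one column, so the four blue rows would go in as four lists of four numbers. If the applet uses the same model, this should give the same treatment (column) and block (row) p-values; if the residual sum of squares comes out exactly zero, the F ratios are undefined and the division raises a ZeroDivisionError.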
This is definitely the kind of information I want, so I guess I should track down the lab's statistical/graphing package and find someone to show me how to use it to do ANOVAs properly.
But this analysis doesn't let me see whether it's only the bright-blue column that's significantly different from the others. I guess I could repeat the analysis, leaving out the bright-blue data, and see whether the remaining columns still differ significantly, but I'm sure there's a better way to do this. After I play around with our statistical/graphing package for a bit, I might be knowledgeable enough to go ask my colleague for help without embarrassing myself too badly.
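One standard way to ask the bright-column question directly, instead of re-running the ANOVA without it, is a single planned contrast: compare the bright column's mean with the average of the other columns, using the residual mean square from the same row + column model and a t test on the error degrees of freedom. Here's a minimal Python sketch of that; `bright_column_contrast` and its `target` argument (the index of the bright column) are hypothetical names of mine. As stated, the test is only valid if the comparison was planned before looking at the data; for comparisons picked afterwards, something like Scheffé's method is the safer choice.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    # Continued fraction for the regularized incomplete beta function
    # (modified Lentz's method, as in Numerical Recipes' betacf).
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    # Regularized incomplete beta I_x(a, b), used here for t-distribution tails.
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    # Two-sided p-value P(|T| > |t|) for Student's t with df degrees of freedom.
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def bright_column_contrast(rows, target):
    # rows: one list per table row (block); target: index of the bright column.
    b, a = len(rows), len(rows[0])
    grand = sum(map(sum, rows)) / (a * b)
    row_means = [sum(r) / a for r in rows]
    col_means = [sum(r[j] for r in rows) / b for j in range(a)]
    # Residuals from the additive row + column model give the error mean square.
    ss_error = sum((rows[i][j] - row_means[i] - col_means[j] + grand) ** 2
                   for i in range(b) for j in range(a))
    df_error = (a - 1) * (b - 1)
    ms_error = ss_error / df_error
    # Contrast weights: +1 for the target column, -1/(a-1) for each other column.
    weights = [1.0 if j == target else -1.0 / (a - 1) for j in range(a)]
    estimate = sum(w * m for w, m in zip(weights, col_means))
    se = math.sqrt(ms_error * sum(w * w for w in weights) / b)
    t = estimate / se
    return estimate, t, df_error, t_two_sided_p(t, df_error)
```

The estimate is how far the bright column's mean sits above the mean of the other columns. Since the question is whether the bright column is higher, the one-sided p-value is half the two-sided one when t is positive. The same function could be run separately on the blue, pink and yellow blocks of rows.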