Field of Science
Not your typical science blog, but an 'open science' research blog. Watch me fumbling my way towards understanding how and why bacteria take up DNA, and getting distracted by other cool questions.
I should have paid more attention in stats class
One of the reviewers of the manuscript I'm revising for Genome Biology and Evolution asked if we could do some statistical analysis of the data we present in a graph. On the left I've put the graphs and the data. The lower graph panel and lower block of data are the controls; we can ignore them for now. I think we can also safely ignore what the data represent.
I'll describe the significance questions with respect to the top-panel graph (A):
We want to know the following:
In the left group (4 blocks of 4 bars, labels SAV, TAL, KEG, PHF/L), are the four blue bars significantly higher than the red, yellow and green bars beside them?
In the middle group (4 blocks of 4 bars, labels QAV, TAC, TSG, PLV), are the four red bars significantly higher than the blue, yellow and green bars beside them?
In the right group (5 blocks of 4 bars, labels PSE, SDG, FRR, QTA, RLN/K), are the five yellow bars significantly higher than the blue, red and green bars beside them?
The actual numbers are in the upper part of the table, in the correspondingly coloured cells, and below I'll restate the above questions in terms of these numbers.
In the top four rows of the table (blue), are the numbers in the bright-blue cells significantly higher than the numbers in the light-blue cells in the same rows?
In the next four rows of the table (pink), are the numbers in the bright-pink cells significantly higher than the numbers in the light-pink cells in the same rows?
In the next five rows of the table (yellow), are the numbers in the bright-yellow cells significantly higher than the numbers in the light-yellow cells in the same rows?
I suspect this is an ANOVA (analysis of variance) type of problem. But I'm pretty sure it would require a more complicated analysis than the simple ANOVA described in the new statistics textbook my author-colleague kindly gave me (probably to get me off his back with dumb statistics questions). Hmmm, maybe it would be possible to do a separate ANOVA on each group, i.e. one for the blue data, one for the red data, and one for the yellow data.
UPDATE:
My basic version of Excel doesn't have the statistics add-in needed for ANOVAs, and I can't even remember the name of the statistics/graphing package the lab owns (it's not installed on my computer). But I found an online applet to do two-way ANOVAs here (I need two-way because I have two variables, the rows and the columns). So I pasted the data from the blue cells into the applet, with the following results.
"Conclusion on Treatments Effects: Very strong evidence against the null hypothesis." The null hypothesis is that all treatments (columns) gave the same results, so there are very significant differences between the data in the different columns (p=0.00058).
"Conclusion on Blocks Effects: Moderate evidence against the null hypothesis." The null hypothesis is that all blocks (rows) gave the same results, so there are moderately significant differences between the data in the different rows (p=0.011).
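For anyone who wants to see what a two-way ANOVA like this is doing under the hood, here is a minimal sketch in plain Python. It assumes one measurement per cell (no replicates), which is my reading of the table, and it may not match exactly what the applet computes. The function name and the numbers in `example_table` are made up for illustration; they are not the data from the manuscript.

```python
def two_way_anova_no_replication(table):
    """Two-way ANOVA with one observation per cell.

    `table` is a list of rows (blocks); each row holds one value per
    column (treatment). Returns the F statistics for columns and rows
    together with their degrees of freedom.
    """
    n_rows, n_cols = len(table), len(table[0])
    if n_rows < 2 or n_cols < 2:
        raise ValueError("need at least 2 rows and 2 columns")
    if any(len(row) != n_cols for row in table):
        raise ValueError("every row needs the same number of columns")

    grand_mean = sum(sum(row) for row in table) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(row[j] for row in table) / n_rows for j in range(n_cols)]

    # Split the total variation into rows, columns and leftover (error).
    ss_total = sum((x - grand_mean) ** 2 for row in table for x in row)
    ss_rows = n_cols * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n_rows * sum((m - grand_mean) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    df_rows, df_cols = n_rows - 1, n_cols - 1
    df_error = df_rows * df_cols
    ms_error = ss_error / df_error

    return {
        "f_treatments": (ss_cols / df_cols) / ms_error,
        "f_blocks": (ss_rows / df_rows) / ms_error,
        "df_treatments": df_cols,
        "df_blocks": df_rows,
        "df_error": df_error,
    }


# Made-up numbers, NOT the data from the post: 4 rows (blocks) x 4 columns.
example_table = [
    [10, 2, 3, 1],
    [12, 4, 3, 5],
    [9, 1, 2, 0],
    [13, 5, 4, 6],
]
result = two_way_anova_no_replication(example_table)
print(result)
```

Each p-value is the upper tail of the F distribution with the matching degrees of freedom. If SciPy happens to be installed, `scipy.stats.f.sf(F, df_treatments, df_error)` should give the treatments p-value.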
This is definitely the kind of information I want, so I guess I should find the lab's statistical/graphing package and find someone to show me how to use it to do ANOVAs properly.
But this analysis doesn't let me see whether it's only the bright-blue column that's significantly different from the others. I guess I could repeat the analysis, leaving out the bright-blue data, and see if the others are not significantly different, but I'm sure there's a better way to do this. After I play around with our statistical/graphing package for a bit, I might be knowledgeable enough to go ask my colleague for help without embarrassing myself too badly.
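One possible "better way" is a planned contrast: within each row, subtract the mean of the other columns from the column of interest, then ask whether those differences are reliably above zero with a paired t statistic. This is only a sketch of one option, not necessarily what a statistician would recommend here; standard post-hoc tests such as Tukey's or Dunnett's are the more usual follow-up to an ANOVA. The function name and the numbers are invented for illustration and are not the manuscript's data.

```python
import math
import statistics


def contrast_t_statistic(table, target_col):
    """Paired t statistic for one column versus the mean of the others.

    For each row, take the value in `target_col` minus the mean of the
    remaining columns, then test whether those differences average zero.
    Returns (t, degrees_of_freedom).
    """
    diffs = []
    for row in table:
        others = [x for j, x in enumerate(row) if j != target_col]
        diffs.append(row[target_col] - sum(others) / len(others))
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Made-up numbers, NOT the data from the post; column 0 plays the
# role of the bright-blue column.
example_rows = [
    [10, 2, 3, 1],
    [13, 3, 3, 3],
    [8, 1, 2, 3],
    [13, 5, 4, 6],
]
t_stat, dof = contrast_t_statistic(example_rows, target_col=0)
print(t_stat, dof)
```

Since the question is whether the bright cells are higher, the t statistic would be compared against a one-sided critical value with `dof` degrees of freedom. With only four or five rows per group the test has little power, and running it on all three groups means three comparisons rather than one.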
2 comments:
I really like using Graphpad Prism for that kind of stuff. All of the statistical analysis tools are linked to the extensive help files, which almost make up a statistics textbook themselves. Very clear, very understandable. Even for someone like me, who also didn't pay that much attention in statistics class ;)
I think Graphpad Prism is the name of the package we already have (the one I need to learn to use)!