The postdoc gave me the actual numbers for the fragments with single mismatches at position 7: the input set contained 5940 of these, and the recovered (uptake) set had only 215. If we hypothesize that all of these 215 arise from contamination, then 3.6% (215/5940) of the fragments in the recovered pool come directly from the input pool. Because we know the exact sequence distribution of the uptake pool fragments (we sequenced 10^7 of them), we can correct the distributions in various subsets of the recovered pool for this possible contamination.
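The arithmetic behind the 3.6% figure can be written out as a small sketch. The function name is made up for illustration, and the calculation assumes (as the reasoning above does) that every recovered position-7 mismatch fragment is contamination and that the two counts are directly comparable.

```python
# Counts reported above for fragments with a single mismatch at position 7.
INPUT_COUNT = 5940      # fragments of this class in the input set
RECOVERED_COUNT = 215   # fragments of this class in the recovered (uptake) set


def max_contamination_fraction(recovered_count, input_count):
    """Upper limit on the contamination fraction.

    Assumption (not a measured quantity): every recovered fragment of this
    class entered the recovered pool as contamination from the input pool.
    """
    if input_count <= 0:
        raise ValueError("input_count must be positive")
    return recovered_count / input_count


fraction = max_contamination_fraction(RECOVERED_COUNT, INPUT_COUNT)
print(f"Maximum contamination: {fraction:.1%}")  # Maximum contamination: 3.6%
```

If some of the 215 fragments were genuinely taken up, the true contamination is lower, which is why this number is only an upper limit.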
The plan is to do the main analyses with and without the correction. We don't actually know how much contamination there is, but 3.6% is the upper limit. Any results that don't change when the correction is applied are robust.
The analysis I'm most concerned about is the test for interactions between bases at different positions in the uptake sequence. The measure of interactions between positions that don't have big effects on uptake is likely to be robust, as these samples are large and removing 3.6% is unlikely to make much difference. For positions with very strong effects (6, 7, 8 and 9), the contamination correction will definitely reduce the ability to detect any interactions (because the sample size will get much smaller)...
What we see when we ignore possible contamination: When all the sequences with a mismatch at a weak position (e.g. 5) are aligned, we see an increase in the importances of some other positions, and we think this means that the effects of the positions are interdependent. But when all the sequences with a mismatch at a very strong position (e.g. 8) are aligned, we see that the importances of the other positions all shrink dramatically. This could mean that when base 8 is incorrect the DNA is taken up by some sequence-independent process, or that the fragments with incorrect base 8 contain out-of-alignment uptake sequences that our analysis overlooked (we know this occurs). But it could also mean that the fragments with incorrect base 8 were not taken up at all, but entered the recovered pool as contamination. So we need to correct for the maximum possible contamination (3.6%) and see how the importances change.
How should the corrections be done? We have position-weight matrices for the recovered and input pools, and for each subset of these data (e.g. for all fragments with mismatches at position 5, or 8, or 14). We think that, to correct a recovered-pool matrix for contamination, we just need to subtract from each value 3.6% of the corresponding value in the corresponding input-pool matrix. This is easy, but when the postdoc tried it he sometimes got negative numbers (whenever 3.6% of an input value was larger than the recovered value). He thinks this means we need to use a more complicated calculation, but I wonder if it just means that, at this position of the matrix, the corrected value is indistinguishable from zero. We both think that it might be wise to consult a mathematician at this point.
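Here is a minimal sketch of the subtraction described above, not a settled method. It assumes each matrix row is one position and holds the proportions of A, C, G and T (so both pools are on the same scale). The function name, the `clamp` option and the toy numbers are all hypothetical. Clamping negatives to zero is just the "indistinguishable from zero" reading, and the renormalization step is an extra assumption added so each row still sums to 1.

```python
def correct_for_contamination(recovered, input_pool, fraction=0.036, clamp=True):
    """Subtract `fraction` of each input-pool value from the recovered-pool value.

    recovered, input_pool: lists of rows, one row per position, each row holding
    the proportions of A, C, G, T (assumed layout, for illustration only).
    clamp: if True, treat negative corrected values as zero.
    """
    corrected = []
    for rec_row, in_row in zip(recovered, input_pool):
        row = [r - fraction * i for r, i in zip(rec_row, in_row)]
        if clamp:
            row = [max(value, 0.0) for value in row]
        total = sum(row)
        # Rescale so the row is a set of proportions again (an added assumption).
        corrected.append([value / total for value in row] if total > 0 else row)
    return corrected


# Toy values, NOT data from the experiment: one position where G is very rare
# in the recovered pool, so 3.6% of the input value exceeds the recovered value.
toy_recovered = [[0.970, 0.010, 0.001, 0.019]]
toy_input = [[0.760, 0.080, 0.080, 0.080]]

raw = correct_for_contamination(toy_recovered, toy_input, clamp=False)
clamped = correct_for_contamination(toy_recovered, toy_input, clamp=True)
print("unclamped G:", raw[0][2])     # negative
print("clamped G:  ", clamped[0][2])  # 0.0
```

Running the analyses both ways (clamped and unclamped) would show whether the negative entries matter for the conclusions. Whether clamping is statistically defensible is exactly the question for the mathematician.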
1 comment:
Not a statistician here, just thinking out loud.... Why are negative numbers bad? A weighted coin might be 70/30 heads, so if you subtracted background (50/50) you would get +20 heads, -20 tails. Doesn't a negative number just imply selection against something at a certain position?