The postdoc and I are still/again working on his paper about DNA uptake specificity, and we're getting deep into the Results. I had thought I understood the specificity matrices and sequence logos, but tonight I realized that I didn't even understand the simple stuff (the ratios of different categories of sequences in the sequence reads from the input pool and the recovered pool).
While struggling with that, I realized that we had been assuming that all the DNA fragments in the recovered pool had indeed been taken up by the competent cells and then reisolated. We hadn't properly controlled for the possibility that the recovered pool was contaminated with DNA fragments that had never been taken up, but that had the same sequence distribution as the input pool.
I started thinking about contamination because of what appeared to be an odd result. The postdoc's results show that four positions in the 31 bp DNA uptake sequence (USS) are particularly important for uptake; fragments with a non-consensus base at even one of these positions were rarely taken up by competent cells. But he has sequences of 10^7 fragments from the recovered DNA pool, and this large dataset does include quite a few fragments with such mismatches.
Surprisingly, when we examine a set of fragments with one of these mismatches (e.g. fragments with a mismatched base, A, G or T rather than C, only at position 7, the most important position for uptake), the bases at all of the other positions of the USS appear to have made no contribution to uptake. This observation might be explained by some weird type of interaction, but it might also be a sign that the recovered DNA preparation is contaminated with DNA fragments that were never taken up.
In fact, we can use this observation to set limits on contamination, and to correct the dataset for possible effects of contamination, by considering the implications of two extreme hypotheses about contamination. The first is the hypothesis that there is no contamination - that every fragment in the recovered pool is there because it was taken up by a competent cell, including all the fragments with a mismatched base at position 7. The second is the hypothesis that fragments with A, G or T at position 7 are never taken up, and that all of the fragments with a mismatched base at position 7 are there because of contamination, not uptake. Under this second hypothesis, and given that contaminating DNA has the same sequence distribution as the input pool, the frequency of position-7 mismatches in the recovered pool divided by their frequency in the input pool gives an upper limit to the contamination of the recovered pool. We can then apply appropriate correction factors to all the analyses.
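Here is a minimal sketch of how that upper limit and the correction might be computed. It assumes a simple mixture model in which the recovered pool is a fraction c of contaminating DNA (with the input pool's sequence distribution) plus a fraction 1 - c of genuinely taken-up DNA. The function names and the numbers in the example are hypothetical, made up for illustration; they are not the postdoc's data or his analysis code.

```python
def max_contamination(recovered_mismatch_frac, input_mismatch_frac):
    """Upper limit on the contaminating fraction of the recovered pool.

    Assumes (second hypothesis) that fragments with a mismatch at
    position 7 are never taken up, so every such fragment in the
    recovered pool is contamination with the input pool's composition.
    """
    if not 0 < input_mismatch_frac <= 1:
        raise ValueError("input_mismatch_frac must be in (0, 1]")
    if not 0 <= recovered_mismatch_frac <= 1:
        raise ValueError("recovered_mismatch_frac must be in [0, 1]")
    # Contamination c satisfies: recovered = c * input, so c = recovered / input.
    return min(1.0, recovered_mismatch_frac / input_mismatch_frac)


def correct_for_contamination(recovered_frac, input_frac, contamination):
    """Estimate a category's frequency among fragments truly taken up.

    Assumed mixture model: recovered = c * input + (1 - c) * uptake,
    solved here for the uptake frequency.
    """
    if not 0 <= contamination < 1:
        raise ValueError("contamination must be in [0, 1)")
    corrected = (recovered_frac - contamination * input_frac) / (1 - contamination)
    # Sampling noise can push the estimate slightly below zero; clamp it.
    return max(0.0, corrected)


# Made-up illustrative numbers (NOT real data):
# position-7 mismatches are 50% of input reads and 5% of recovered reads.
c_max = max_contamination(recovered_mismatch_frac=0.05, input_mismatch_frac=0.5)

# Some other sequence category: 20% of input reads, 40% of recovered reads.
corrected = correct_for_contamination(0.40, 0.20, c_max)
print(f"max contamination: {c_max:.3f}, corrected frequency: {corrected:.3f}")
```

Because c_max is an upper limit, the corrected frequencies are the most extreme correction the data would allow; the uncorrected values (the no-contamination hypothesis) sit at the other end of the range.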
It's a bit embarrassing to realize that we've been neglecting this obvious issue with our data. But I'm really glad that we're catching it now and not leaving it for the referees to catch after we submit our manuscript.
Not your typical science blog, but an 'open science' research blog. Watch me fumbling my way towards understanding how and why bacteria take up DNA, and getting distracted by other cool questions.
Thinking about possible contamination in the postdoc's data
1 comment:
I like the "maximum contamination" calculation you have above.
However, I'd hesitate to just call it all contamination:
(1) It could be non-specific uptake. Clearly, not having a C at pos7 makes sequences very hard to take up, but that doesn't mean they weren't.
(2) At least a good portion of reads with mismatches in the inner core have what look like "good" uptake signals in other alignment frames... our "exception-that-proves-the-rule". It won't be all of the "contamination" but will account for a chunk.
(3) Our washing conditions were in principle quite stringent, but we never used DNase. It is possible that we are also seeing some things that can bind tightly to cells, but not be taken up...