We've posted the manuscript on the public arXiv.org server. You can download the full pdf, including all the supplementary data, at http://arxiv.org/abs/1201.6643.
Not your typical science blog, but an 'open science' research blog. Watch me fumbling my way towards understanding how and why bacteria take up DNA, and getting distracted by other cool questions.
Lists of Elsevier journals to boycott
Readers of this blog probably already know that there's a call out to boycott journals published by Elsevier because of their anti-scientific publishing practices. Initially researchers were signing a pledge to not contribute to Elsevier's activities, by refusing to publish in, referee for, or do editorial work for any Elsevier journal. I've signed this (at The Cost of Knowledge), and you should too.
Jon Eisen has now expanded this, asking that researchers also refrain from promoting work published in Elsevier journals (Boycotting Elsevier is not enough - time to make them invisible). Don't write blog posts about them, don't choose them for journal club, don't even cite them if a reasonable alternative is available (he didn't say that last one, but I do).
This is all very well in principle, but if you're like me you have only a very fuzzy idea of which journals in your field are part of the Elsevier empire. The information is available for all fields on Elsevier's web site, but I thought I'd do my bit by listing here the journals in my field. But then I saw how many there are! Of course my 'field' is an unusual overlap of genetics and molecular biology and microbiology and evolutionary biology, but still. Here are a few you might recognize:
- Cell
- Current Opinion in Microbiology
- FEBS Letters
- Gene
- Journal of Molecular Biology
- Microbiological Research
- Mutation Research
- Plasmid
- Protist
- Trends in Biochemical Sciences
- Trends in Cell Biology
- Trends in Ecology & Evolution
- Trends in Genetics
- Trends in Microbiology
Elsevier's list (94 journals): Microbiology and Virology
Elsevier doesn't have a list for Evolution or any related topic, so I did a search. Although that identified 179 journals, most are not focused on evolution. Here's a link to the first 25 on the list.
ArXiv submission?
I'd like to put our arseniclife submission to Science onto the arXiv server so that anyone who's interested can read it. Not many biologists use arXiv (it's mainly a physics thing) but it's a very convenient place to post manuscripts and other documents. And its use by physicists provides a great precedent for open science, because manuscripts are posted there and submitted for formal publication in peer-reviewed journals.
However, I'd like to first find out whether Science has any policy about arXiv pre-publication. Their Instructions to Authors say:
"Distribution on the Internet may be considered prior publication and may compromise the originality of the paper as a submission to Science. Please contact the editors with questions regarding allowable postings."
Has anyone had direct experience with this? I think I'd better send out a tweet...
Brief update
Things are progressing much faster than usual - this will be an epic week for paper-submitting!
The Research Associate submitted her manuscript about natural competence in E. coli a few days ago, to the Journal of Bacteriology. She's done a mass of work showing that a wide range of E. coli strains (including the full ECOR collection) are not naturally transformable even when their competence regulons are induced by artificial expression of Sxy from a high-copy plasmid. But a bit of transformation does happen if recombination functions are also artificially provided by inducing the lambda 'recombineering' genes. So the competence regulon does encode a functional DNA uptake machinery. We don't know why it's so inefficient compared to those of other bacteria, though we make a few suggestions.
The arseniclife analyses have all been replicated and the manuscript is almost ready for submission to Science as a Report. We're aiming for Monday - the grad student and his supervisor are still polishing up their figures.
After what seems like an eternity of wrestling with his DNA uptake specificity data, the analyses, and the interpretations, the post-doc and I now agree that we have an excellent manuscript that will be ready to submit to PNAS within a few days.
A manuscript by a visiting grad student from a few years ago is also going to be submitted within the next few days. It describes her investigations into competence of a relative of Haemophilus influenzae, the poultry pathogen Gallibacterium anatis. We're listed as authors because some of the work was done in our lab, and because we've contributed quite a bit to the analysis and writing.
And finally, my article about how the teaching of introductory genetics needs to change is just about ready to send to PLoS Biology!
Sorry for lack of posts...
We're busy finishing the Science/arseniclife paper, and the postdoc's uptake paper, and the RA's E. coli competence paper (submitted!), and an old visitor's competence paper, and my article about teaching genetics....
Sudsy gel
Why did I put SDS into the buffer of this agarose gel before I loaded it? So the DNA from the lysed cells wouldn't rise up out of the wells and spread out over the surface of the gel buffer, of course!
I'll tell you more tomorrow, if my experiment works out.
The Discussion for the post-doc's DNA uptake paper
The post-doc and I have been struggling, independently and together, to create a good Discussion section for his paper on the sequence specificity of DNA uptake. We have lots of things we could write about, but many of them aren't well connected to each other or to what the paper is about. But now that I've done some good work on the end of the Results, I think I've finally come up with a Discussion that might work.
The Results ends with the analysis of possible interactions between bases at different positions in the uptake signal sequence motif he derived. We motivate this analysis as a possible explanation of the discrepancy between his uptake sequence motif and the one I derived years ago for the uptake sequences in the H. influenzae genome. I'm reproducing the two motifs below and below them his figure of his interaction analysis.
He's now done an uptake experiment that validates (confirms the predictions of) the interaction analysis. It shows that having mutations at two interacting positions (positions 4 and 11, I think) does indeed reduce DNA uptake much more strongly than predicted by the effect of each mutation singly. This motivated me to clarify for myself the implications of the interaction analysis.
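One way to make "much more strongly than predicted by the effect of each mutation singly" concrete is to compare the double mutant with a no-interaction prediction. The sketch below assumes that independent positions would combine multiplicatively, which is an assumption made here for illustration and not necessarily the model used in the paper. The function name and the uptake values are hypothetical.

```python
def interaction_ratio(single_a, single_b, observed_double):
    """Compare observed double-mutant uptake with an independence prediction.

    All arguments are uptake values relative to the unmutated sequence (1.0).
    Assumption: without interaction, the two effects multiply.
    """
    expected_double = single_a * single_b
    return expected_double, observed_double / expected_double


# Hypothetical numbers for illustration only (not data from the paper).
expected, ratio = interaction_ratio(0.5, 0.4, 0.05)
print(f"expected without interaction: {expected:.2f}")
print(f"observed / expected: {ratio:.2f}")
```

A ratio well below 1 would mean the double mutant is taken up much less than the two single effects predict, which is the pattern described above.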
The diagram is at the top of this post. The center four positions of the core (left segment) are greyed out, because their effect on uptake is so strong that we can't make confident inferences about their interactions with other positions. The black brackets above each segment indicate that all of the bracketed positions participate in interactions with all the positions in the other bracketed segments, as indicated by the blue arrows. However, the positions within a single bracketed segment do not interact with each other, unlike the minor covariation interactions (figure below) we found long ago between adjacent positions in the genomic USS sequences (pdf of the paper here).
Anyway, back to the Discussion...
First we can explain how the interaction analysis nicely reinforces the hypothesis that the uptake sequences in the genome are there as a direct and unselected molecular-drive consequence of the bias of the uptake machinery. This is an 'exception that proves the rule' situation, where the initial finding that the simple uptake-bias motif didn't match the genomic USS motif created doubt about the hypothesis, and the subsequent demonstration that interactions explain the discrepancy increased our confidence in it.
Then we can say that this leaves only the uptake bias itself in need of an explanation, and that we propose that it exists as part of a solution to the mechanistic problem of getting stiff, highly charged DNA molecules through the narrow secretin pore. Because cells efficiently take up closed circular DNAs we know that uptake doesn't usually initiate at a fragment end, but must initiate internally on DNA fragments (see this very old post). We hypothesize that the uptake bias favours sequences that are readily kinked, and that this kinking occurs mainly as a consequence of interactions between the uptake sequence and mutually-interacting proteins of the uptake machinery (the uptake motif is itself only slightly bent, at the T-tracts). One reason to think that proteins mediate the interactions is that adjacent positions don't interact with each other.
Perhaps we can here pose a specific model of what parts of the uptake sequence interact with what parts of the machinery... This should take into account that the T-tracts interact with the core positions but not with each other.
Finally we can discuss the known or possible uptake biases of other species. First the other Pasteurellaceae, then the Neisserias, and finally bacteria where uptake bias may have been overlooked.
Growth of GFAJ-1 under phosphate limitation (correction)
Erika Check Hayden's otherwise-excellent Nature News report on our work contained one error, the statement that "Redfield was unable to grow any cells without adding a small amount of phosphorus".
Here's the email I had sent her in response to an earlier query about phosphorus concentrations:
Hi Erika,
The amount of phosphate in the medium used by Wolfe-Simon et al for their published growth analysis is indeed uncertain. Their ICP-MS analysis found that most of their media preparations contained 3-4 µM phosphorus, but one batch contained <0.3 µM and a solution containing only the AML60-medium salts had 7.8 µM. Because we don't know which batch was used for the results in their Figure 1, 3-4 µM is a good estimate of the phosphorus contamination, but the actual amount could have been substantially lower or higher.
My cells did grow in medium with no added phosphorus*, to about 5 x 10^6 cells/ml. This is about 1/4 of the density reached by GFAJ-1 in Wolfe-Simon et al's '-P/+As' medium. Adding 3 µM phosphorus to my medium increased GFAJ-1 growth fourfold, to the same density as reported in Wolfe-Simon et al's experiments. Simple algebra thus suggests that my unsupplemented medium contained about 1 µM phosphorus. The correspondence of the cell densities reached in my supplemented (3 µM) and their unsupplemented medium supports the estimate of 3-4 µM contaminating phosphorus in their medium.
My cells, like theirs, were clearly phosphorus-limited, because they grew to much higher densities when additional phosphorus was provided (see my recent RRResearch post and their Fig. 1).
I think this is the best that can be done, since Wolfe-Simon et al. apparently did not keep good enough records to determine the actual phosphorus concentration of the medium they used for their reported experiments.
Hope this helps,
Rosie
*The initial growth problem was not due to a lack of phosphorus but to the need for an amino acid, which I solved by supplementing the medium with a small amount of glutamate.
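The "simple algebra" in the email can be written out explicitly. This is a sketch that assumes the final cell density N is proportional to the total phosphorus in the medium (N = kP, with k an unknown constant), and that writes c for the contaminating phosphorus in the unsupplemented medium, in µM. Both symbols are introduced here for illustration.

```latex
\[
\frac{N(c+3)}{N(c)} = \frac{k\,(c+3)}{k\,c} = 4
\quad\Longrightarrow\quad c + 3 = 4c
\quad\Longrightarrow\quad c = 1\ \mu\mathrm{M}
\]
```

Under the same assumption, a medium that supports about four times the unsupplemented density would contain about 4c = 4 µM phosphorus, which is in line with the 3-4 µM estimate for the Wolfe-Simon et al. medium.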
GFAJ-1 growth curves in limiting phosphate
The BioScreen is a wonderful time-saver. Over the weekend it did growth curves using media with 9 different concentrations of phosphate, each with 10 replicates, taking readings every 20 minutes for 46 hr!
This data tells me that my choice of 3 µM added phosphate was good; it gives about four times as much growth as no added phosphate, and twice as much as 1 µM, so the unsupplemented medium probably has about 1 µM contaminating phosphate.
The big surprise is that cells reach higher densities with a moderate amount of phosphate (70 µM) than they do with 250 µM or with the 1500 µM used by Wolfe-Simon et al. I don't think this has any serious implications for our analysis.
I was also surprised to see that the cultures with the higher amounts of phosphate were still growing at the end of the time course. I'm going to replicate these results with another time course, and this time I'll run it for longer (3 days? 4 days?).
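The estimate of the contaminating phosphate can be checked against both comparisons mentioned above. This is a minimal sketch, assuming growth yield is proportional to total phosphate in this range; the function name is hypothetical and the ratios are the rounded values from the text, not the BioScreen readings themselves.

```python
def background_phosphate(added_low, added_high, yield_ratio):
    """Solve (c + added_high) / (c + added_low) = yield_ratio for c (in µM).

    Assumes yield is proportional to total phosphate (background c + added).
    """
    if yield_ratio <= 1:
        raise ValueError("yield_ratio must be greater than 1")
    return (added_high - yield_ratio * added_low) / (yield_ratio - 1)


# 3 µM added gives about four times the growth of no added phosphate.
from_zero = background_phosphate(0.0, 3.0, 4.0)
# 3 µM added gives about twice the growth of 1 µM added phosphate.
from_one = background_phosphate(1.0, 3.0, 2.0)
print(from_zero, from_one)
```

Both comparisons give about 1 µM, so the two rounded ratios are at least consistent with each other under the proportionality assumption.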
The CsCl/mass spectrometry data
Here's the figure the collaborating grad student sent, showing his LC-MS analysis results of two DNA samples from the first set of GFAJ-1 preparations I sent him.
Each data point is a fraction from one of the CsCl gradients he fractionated the two GFAJ-1 DNA samples on (one for the -As/-P DNA and one for the +As/-P DNA). The -P condition is actually 3 µM added phosphate - this gives growth to approximately the same density as Wolfe-Simon et al's '-P' condition.
The lines with the solid symbols show the amount of DNA in each fraction - these each show a nice DNA peak at around the 800 µl position in the gradient.
The lines with open symbols show the amount of arsenate in each of these fractions - these lines are hard to see because they're sitting right on top of the X-axis (yes, that means that the amounts of arsenate detected are ~ zero 'ion counts'). The real values aren't necessarily zero, but they're below the detection limit for this experiment.
The dashed line shows the amount of arsenate that should have been detected if 4% of the phosphate in the DNA had been replaced by arsenate, as predicted by Wolfe-Simon et al's gel analysis (data in their Table S2).
The second graph shows his standard curve for arsenate detection.
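For a rough sense of scale, the amount of arsenate implied by 4% replacement can be estimated from the mass of DNA. This is only a sketch: it assumes one phosphate per nucleotide and an average nucleotide mass of about 330 g/mol (a common rule-of-thumb value, used here as an assumption), and it ignores the small mass change from swapping phosphorus for arsenic. It does not reproduce the dashed line in the figure, which is in ion counts and depends on the standard curve.

```python
AVG_NUCLEOTIDE_MASS = 330.0  # g/mol, rule-of-thumb average; an assumption


def expected_arsenate_nmol(dna_ug, fraction_replaced=0.04):
    """Return nmol of arsenate expected in dna_ug micrograms of DNA.

    Assumes one backbone phosphate per nucleotide, with fraction_replaced
    of those phosphates replaced by arsenate.
    """
    nucleotides_nmol = dna_ug * 1e-6 / AVG_NUCLEOTIDE_MASS * 1e9
    return nucleotides_nmol * fraction_replaced


print(f"{expected_arsenate_nmol(1.0):.3f} nmol arsenate per µg DNA")
```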
Academic publishing gets even sleazier
An email from Scientific and Academic Publishing:
Dear Rosemary J. Redfield,
This is Scientific & Academic Publishing, USA. Nice to get your information from the journal PLOS Pathogens and also happy to pass on our regards to you from the editorial department of SAP.
We've finished reading the abstract of your paper Transformation of Natural Genetic Variation into Haemophilus Influenzae Genomes and will recommend it to our editors. If you are interested in our journals and want to publish it on our journals, please extend this paper and describe your latest research achievements and send it to us by our online submission system (http://www.manuscriptsystem.com). All manuscripts submitted will be considered for publication.
If this paper has been published, we also welcome you to submit other papers to us.
Welcome to visit our website at http://www.sapub.org.
In the second paragraph they seem to be first saying they'll recommend my already-published paper to their editors (for the editors to do what, read it with admiration?), and then asking me to add a bit of new material to it and submit it to them for publication. This reeks of self-plagiarism. But in the next sentence they ask for other papers instead.
Who are these guys? Their web site lists an impressive 133 journal titles. But most of the ones I clicked on are nonexistent - they have some Editorial Board members but no Editor in Chief or ISSN, and haven't published any papers. Only one (The International Journal of Plant Research) had 'published' any papers, and each of these had only an abstract and a reference list - the body of the paper was apparently 'coming soon'. Perhaps this is to be expected, given that this journal too lacks an Editor in Chief. It may lack editors entirely - authors are instructed that they must format the html links for the references they cite, a function normally done by a journal's copy editors.
Their office is in California, so they're not a third-world effort. I couldn't find any information about publication charges at all, but I don't suppose they're just doing this for the glory.
Hmm, the International Journal of Genetic Engineering needs an Editor in Chief - that would look good on my CV. All I need to do is check the boxes on the handy application form they provide!
Here's the gel photo
These DNAs were all stored in the fridge (4 °C) in aqueous solution (10 mM Tris 1 mM EDTA pH 8.0) for two months before this gel was run. The DNAs in the 'ss' lanes were heated to 95°C for 10 min before loading to separate the strands and reveal the effects of any single-strand breaks.
These DNAs show no sign of degradation; compare to the original photo here. In particular, the DNA fragments from cells grown with limiting phosphate and 40 mM arsenate are actually slightly longer than the fragments from cells grown with limiting phosphate and no arsenate. (I don't think this difference is significant; the important point is that the fragments aren't any shorter.)
Because these large fragments typically migrate at the resolving limit of the gel, all I can say with confidence is that the fragments in all four preps are all significantly larger than 30 kb. This is the size range we expect for chromosomal DNA in a normal DNA prep. I don't have size standards for single-stranded DNA (I should have heated the lambda fragments but forgot to) so all I can say about the length distribution of single strands is that the four preps are all very similar.
This result tells us that DNAs from arsenate-grown cells are not undergoing degradation in storage due to slow hydrolysis of arsenate diester bonds in the DNA backbone, as suggested by an earlier anonymous commenter.
Generating final data for the #arseniclife paper
1. Cells for new DNA preps: For the replicate DNA preps (for the replicate LC-MS analysis), yesterday I inoculated GFAJ-1 cells into two 50 ml cultures in AML60 medium with 1500 µM PO4, with and without 40 mM AsO4, and into two 500 ml cultures in AML60 medium with 3 µM PO4, with and without 40 mM AsO4. Most of these cultures are growing nicely, so tomorrow I think I'll have enough cells for the DNA preps. Well, the 1500 µM PO4 / 40 mM AsO4 culture isn't growing at all, but I don't think we need to replicate this one anyway. I need to get at least 50 µg of DNA from each prep, to give the grad student enough for his CsCl gradients. Last time one of the cultures (3 µM PO4, no AsO4) wasn't dense enough to give me the DNA I needed, but so far it looks as dense (or not-dense) as the parallel culture with AsO4. I'll prep the DNA today and if I don't have enough I'll just set up more cultures. I'd be able to prep the DNA from them on Sunday, so I would still have the DNAs ready to send on Monday.
2. Troll-suggested control: I've run the gel of the two-month-old DNAs from cells grown with and without arsenic, both native and denatured, and there's no difference in fragment length, with all double-stranded fragments being at least 30 kb in length. So there's no evidence of arsenic-bond strand breakage during long-term storage at 4 °C. I'll post a gel photo later (the image I saved isn't right).
3. Presentable growth curves: A lab in our research cluster has a BioScreen incubator/plate reader I can use to automate my growth curves. But the test cultures I set up in an ordinary microtiter plate aren't growing consistently, so I'll have to mess around a bit before I can do the growth curves.
Writing the #arseniclife paper
The grad student working on the mass-spectrometry analysis of GFAJ-1 DNA is still making sure his results meet his high standards, but as soon as they are ready he'll send them to me and I'll post them here. In the meantime, since he and his supervisors have concluded that the DNA contains no arsenic, we've started writing our paper. We're going to submit it to Science as a Brevia. These are very short peer-reviewed articles (one page, one figure), which we think suits this work very well.
But first we need to replicate our results. My plan is to generate some detailed growth curves for cultures with various levels of phosphate, with and without 40 mM arsenate. For this I'll use a BioScreen machine that belongs to a neighbouring lab. This machine automates collection of optical density data from cultures growing in wells of 100-well plates. I'll also grow big batches of cells for new DNA preps, using the same media and culture conditions as before.
This should only take a few days, and I hope to have the DNAs ready to send to my collaborators on Monday.
Two steps forward, one step back (the postdoc's uptake bias paper)
The postdoc's manuscript on uptake bias is inching towards completion. He's added most of the references and updated the figures, and we've only discovered one new analysis that needed to be done. But including this analysis at the right place in the Results makes writing the rest of the Results a lot more straightforward, so we're ahead of the game.
What is this analysis? Removing, from our dataset of 10^7 sequence reads of DNA fragments that the competent cells took up, some sequences that may have been interpreted incorrectly. The incorrect interpretation happens because the sequence responsible for their uptake isn't correctly aligned in our analysis. Here's a figure explaining the problem:
The top sequence is the consensus of the fragment we used. The lower-case bases at each end were not degenerate and function as controls. The first step in the analysis was to align each sequence read to this consensus at its left end, and below the consensus we see three correctly aligned reads, with their core uptake sequence indicated by the yellow arrows.
Below these are two reads that were misaligned because they contained either an insertion or a deletion of a single base. We think these insertions and deletions arose during synthesis of the pool of degenerate fragments. Although these fragments still contain good uptake sequences (red arrows), the incorrect alignment doesn't recognize this. Instead, the fragments appear to have been taken up despite having very poor agreement with the consensus.
Below these misaligned reads is a sequence that is correctly aligned but that contains a second match to the core consensus, indicated by the green arrow. This second match was created by several changes downstream of the consensus uptake sequences, but it isn't recognized by the analysis because it is out of alignment and, in this case, in the other orientation. The presence of two uptake sequences means that we can't attribute their uptake to the one sequence that's correctly aligned.
Sequences with these artefacts couldn't be removed from the dataset before the original analysis, because they couldn't be identified until we were able to score each fragment for matches to the 'uptake motif' that the initial analysis produced. Now that we've identified them, we can consider whether they would have confounded any of the analyses.
The main concern is the reads with insertions or deletions. Because the initial filtering required that the 10 control bases all be perfectly matched, most of these were removed, and the 10^7 recovered reads we analyzed only included about 1500 with insertions or deletions that misaligned the core. That's too few to have misled the initial analysis, but it is a concern for the analyses of possible contamination and sequencing errors, and for the analysis of interaction effects. The postdoc has now finished checking for effects on the interaction analysis (none) and still needs to check for contamination and error effects.
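The two artefacts described above (a core shifted out of alignment by an insertion or deletion, and a second core match elsewhere in the read) could both be flagged by scanning each read for core matches at every offset and in both orientations. This is a minimal sketch, not the postdoc's actual pipeline: the core string, the function names, and the example reads are assumptions chosen for illustration, and real scoring would use the derived uptake motif rather than a fixed string.

```python
CORE = "AAGTGCGGT"  # illustrative core motif; an assumption, not the derived motif

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def core_hits(read, core=CORE, max_mismatches=0):
    """List (offset, strand, mismatches) for every window matching the core."""
    read = read.upper()
    hits = []
    for strand, target in (("+", core), ("-", reverse_complement(core))):
        for offset in range(len(read) - len(target) + 1):
            window = read[offset:offset + len(target)]
            mismatches = sum(a != b for a, b in zip(window, target))
            if mismatches <= max_mismatches:
                hits.append((offset, strand, mismatches))
    return hits


def classify(read, expected_offset, max_mismatches=0):
    """Label a read by where, and how often, the core is found."""
    hits = core_hits(read, max_mismatches=max_mismatches)
    if not hits:
        return "no core match"
    if len(hits) > 1:
        return "multiple core matches"
    return "aligned" if hits[0][0] == expected_offset else "shifted"


# Made-up reads: lower-case flanks stand in for the fixed control bases.
aligned_read = "ccgt" + CORE + "ttacg"
shifted_read = "ccgtc" + CORE + "ttacg"  # one inserted base before the core
double_read = "ccgt" + CORE + "tt" + reverse_complement(CORE)

for read in (aligned_read, shifted_read, double_read):
    print(classify(read, expected_offset=4))
```

Exact matching is used by default to keep the example simple; raising `max_mismatches` would catch imperfect cores, at the cost of more chance matches.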
What is this analysis? Removing, from our dataset of 10^7 sequence reads of DNA fragments that the competent cells took up, some sequences that may have been interpreted incorrectly. The incorrect interpretation happens because the sequence responsible for their uptake isn't correctly aligned in our analysis. Here's a figure explaining the problem:
The top sequence is the consensus of the fragment we used. The lower-case bases at each end were not degenerate and function as controls. The first step in the analysis was to align each sequence read to this consensus at its left end, and below the consensus we see three correctly aligned reads, with their core uptake sequence indicated by the yellow arrows.
Below these are two reads that were misaligned because they contained either an insertion or a deletion of a single base. We think these insertions and deletions arose during synthesis of the pool of degenerate fragments. Although these fragments still contain good uptake sequences (red arrows), the incorrect alignment doesn't recognize this. Instead, the fragments appear to have been taken up despite having very poor agreement with the consensus.
Below these misaligned reads is a sequence that is correctly aligned but that contains a second match to the core consensus, indicated by the green arrow. This second match was created by several changes downstream of the consensus uptake sequences, but it isn't recognized by the analysis because it is out of alignment and, in this case, in the other orientation. The presence of two uptake sequences means that we can't attribute their uptake to the one sequence that's correctly aligned.
Sequences with these artefacts couldn't be removed from the dataset before the original analysis, because they couldn't be identified until we were able to score each fragment for matches to the 'uptake motif' that the initial analysis produced. Now that we've identified them, we can consider whether they would have confounded any of the analyses.
The main concern is the reads with insertions or deletions. Because the initial filtering required that the 10 control bases all be perfectly matched, most of these were removed, and the 10^7 recovered reads we analyzed only included about 1500 with insertions or deletions that misaligned the core. That's too few to have misled the initial analysis, but it is a concern for the analyses of possible contamination and sequencing errors, and for the analysis of interaction effects. The postdoc has now finished checking for effects on the interaction analysis (none) and still needs to check for contamination and error effects.
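To make the two artefacts concrete, here is a minimal Python sketch of how a read could be rescanned for core matches at every offset and in both orientations. It is not the postdoc's actual method, which scored fragments against the uptake motif. The core sequence `CORE`, the exact-match default, and the function names are all assumptions for illustration.

```python
# Hypothetical sketch: find every match to a core motif in a read,
# in either orientation and at any offset.

CORE = "AAGTGCGGT"  # assumed core sequence; substitute the motif from the real analysis

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def core_matches(read, core=CORE, max_mismatches=0):
    """List (offset, strand, mismatches) for each window close enough to the core."""
    read = read.upper()
    hits = []
    for strand, target in (("+", core), ("-", reverse_complement(core))):
        for offset in range(len(read) - len(target) + 1):
            window = read[offset:offset + len(target)]
            mismatches = sum(a != b for a, b in zip(window, target))
            if mismatches <= max_mismatches:
                hits.append((offset, strand, mismatches))
    return sorted(hits)


def classify(read, expected_offset, core=CORE, max_mismatches=0):
    """Label a read by where (and how often) the core appears in it."""
    hits = core_matches(read, core, max_mismatches)
    if not hits:
        return "no core match"
    if len(hits) > 1:
        return "multiple core matches"
    offset, strand, _ = hits[0]
    if offset == expected_offset and strand == "+":
        return "aligned"
    return "misaligned"
```

A read with a one-base insertion upstream of the core would come back as "misaligned", and a read carrying a second core in the other orientation as "multiple core matches". Raising `max_mismatches` loosens the match, at the cost of more chance hits.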
A troll raises a semi-valid concern
An anonymous troll posted this comment on a previous post:
im curious about how much time the DNA you prepared has spent in solution since you prepped it - supposing there were C-Ar bonds in the DNA at the time of lysis, has so much time passed now that, given the hydrolysis rates that were experimentally determined and recently reported in this JACS communication (offensive link deleted), I'm worried that theres no hope of seeing positive signal even if there was one to begin with. Any thoughts about this issue? Perhaps keeping the DNA stored as a dry pellet and avoiding any encounters with water until the last minute would be the way to go.
First, a few corrections: Arsenic is As, not Ar. The DNA backbone bonds are diester bonds, so the As is bound to oxygen (O), not carbon (C). It's of course not possible to purify DNA while avoiding any encounters with water.
It's indeed theoretically possible that the GFAJ-1 DNA from arsenic-grown cells originally contained As in its backbone, and that the As bonds were destabilized once the DNA was purified away from the proteins and other cellular components. If the chemists are correct, the As bonds would all hydrolyze within less than a second. The AsO4 liberated from the DNA would then be seen just as background in the CsCl gradient, or lost entirely in subsequent purification steps.
We would have to assume that the GFAJ-1 cells contain some powerful unknown-to-science DNA-binding proteins or other structures that stabilize the As bonds in the aqueous environment of the cell. These proteins would be removed during the initial DNA purification steps (SDS-lysis and phenol-extraction). Wolfe-Simon et al. did not report taking any precautions to prevent hydrolysis, so if their DNA prep really contained As bonds these must have been stable for at least the day it would have taken them to do the initial extractions, run the agarose gel, and cut out the DNA-containing gel slices. We don't need to count the time needed to send the gel slices to the LLNL lab at Livermore for Nano-SIMS analysis because the entire slices were analyzed (the DNA was not purified away from them).
Loss of As from the backbone would change the structure of the DNA. It would now be As-free, but would contain a single-strand break at each site that previously had an As. If a substantial fraction of the backbone was As, the DNA would be extensively degraded.
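To put a rough number on "extensively degraded", here is a back-of-the-envelope Python sketch. It assumes, purely for illustration, that each backbone linkage is independently an arsenate linkage with the same probability and that every one of them hydrolyzes. Under those assumptions the single-strand fragment lengths follow a geometric distribution whose mean is the reciprocal of the As fraction. The function name is hypothetical.

```python
def mean_fragment_length(as_fraction):
    """Mean single-strand fragment length in nucleotides.

    Assumes each backbone linkage is arsenate with probability as_fraction,
    independently of its neighbours, and that every arsenate linkage breaks.
    """
    if not 0 < as_fraction <= 1:
        raise ValueError("as_fraction must be in (0, 1]")
    return 1.0 / as_fraction


# Example: with 1% of linkages as arsenate, single strands would average
# about 100 nt after hydrolysis; with 0.01%, about 10,000 nt.
for fraction in (0.01, 0.0001):
    print(fraction, round(mean_fragment_length(fraction)))
```

So under these assumptions even a small As fraction would shorten the single strands dramatically compared with the long fragments in a fresh prep of normal DNA.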
One way to check for hydrolysis of As bonds would be to run the DNAs in a gel, both intact and after separating the strands by boiling. Normal DNA is stable for months, so we could compare the fragment lengths of the DNAs from As-grown and P-grown cells, both immediately after purification and after extended storage in aqueous solution. Here's a diagram of what we'd expect to see in an agarose gel. The upper orange band in the lefthand lanes is the long fragments of double-stranded DNA present in my DNA preps immediately after purification (see gel photo here), and the broader bands on the right are how single-stranded DNA would appear.
Later: Comment sent by email:
Since the cell’s interior is aqueous, then it would seem reasonable that there has to be something preventing the spontaneous hydrolysis in vivo. As you point out, some form of stabilizing protein would seem the most likely candidate. If this protein is going to work, then it could (for the purposes of this argument) remain attached throughout the purification process. If so, then you might find As in the purified DNA, and should also find traces of protein. I don’t know if this would be enough to alter the 260:280 ratio, but DNAse digestion followed by SDS-PAGE might detect something. Definitely a hotdog experiment, but once the DNA is purified, not that difficult.
SDS-phenol extractions are pretty harsh; for a protein to remain with the DNA it would need to have been covalently bound, which would certainly have complicated such cellular processes as DNA replication and transcription. And if the DNA contained a significant amount of As, the DNA would then contain enough protein that it wouldn't migrate properly in the gel but would stick as a blob of gunk in the well at the top of the gel. It also would band differently in a CsCl gradient.
How dumb do (a couple of) our students think we are?
Email I just received from a couple of undergraduates:
HI Dr.Rosie Redfield,
Here's a chance to represent UBC's Zoology Science Department.
The annual Science Week Events Committee , organized by the Science Undergraduate Society, would like to invite you to join our event on Thursday, January 26th, 2012 at 12:30-1:45.
As you may know, Science Week, is a week-long (form January 23rd - January 27th), multi-events celebration which allows students to show off their UBC pride while rewarding them for their first term achievements. So far we have jell-O wrestling, jeopardy, and a scavenger hunt. Although, we realized something was missing...(*1)
This year, we have added a new event to our venue, called the "Professor Pageant". UBC Professors, like yourself, will be participating in the pageant and showcasing their popularity, attitude and talents (*2). Only one competitor will come out on top, however, each participant will go home with a special customized award.
We would like to invite you to join us because you are deemed awesome (yes, awesome) by an overwhelming consensus among students (*3). This is an opportunity that you do not want to miss! (*4)
We would love to hear from you and provide more details. Also, if you would like to suggest any of your colleagues (open to all faculties), please provide their name so we could can invite them accordingly!
Sincerely
(Names withheld to protect the naive.)
(*1): Maybe it's "science"?
(*2): Ooh, why didn't anyone tell me that's what I'm valued for?
(*3) Somehow these students neglected to complete their evaluations of my teaching.
(*4) Oh yes I do!
More power of CsCl gradients
The grad student pointed out to me by email that I'd overlooked one big advantage of using CsCl gradients to clean up the DNA. He's not analyzing only the fractions that contain DNA, but all the fractions from the gradient. This allows him to detect where in the gradient any arsenic is, and thus lets him distinguish whether the arsenic is bound to the DNA or independent of it. So even a moderate level of arsenic contamination wouldn't be a problem.
Getting ready for some arsenic data
Any day now I hope to receive some preliminary results from the mass spectrometry test for arsenic in GFAJ-1 DNA. In preparation I thought I should at least attempt to understand the control data that the grad student doing the work sent me a couple of weeks ago. But I got sidetracked by the easier task of understanding some control CsCl-gradient data he also sent. This is a pre-analysis step, used to further purify the DNA before the analysis.
What he did: He ran control DNA (from cells grown with lots of phosphate and no arsenate) in two CsCl gradients and collected fractions (~ 100 µl fractions from gradients with total volumes of 1 or 2 ml). He then measured the volume and DNA concentration of each fraction. This showed a nice DNA peak in each gradient (green is the 1 ml gradient, red is the 2 ml gradient). He then pooled the high-DNA fractions of each gradient, desalted them to remove the CsCl, and digested the DNAs in preparation for mass spectrometry (LC-MS).
In both gradients, contaminants that weren't soluble in the CsCl solution will probably have either pelleted to the bottom of the tube or risen to the top of the tube, depending on their density relative to the CsCl. (As I recall from my undergrad experiences, RNA pellets but proteins rise.) Whether these would have recontaminated the fractions as they were collected depends on how the collection was done. Fractions collected as drops from the bottom of the tube would have avoided contaminants that had risen to the top but would have encountered any pelleted contaminants on their way out.
What I've done: Arithmetic to calculate how much purification these gradients would have accomplished.
The green data: 6778 ng of DNA (89% of the total DNA recovered) is in four fractions with a total volume of 300 µl (37% of the volume recovered). This means that the concentrations of soluble contaminants not bound to the DNA will have been reduced to about 40% of what they were.
The red data: 5135.3 ng of DNA (68% of the total DNA recovered) is in two fractions with a total volume of 310 µl (17% of the total volume recovered). This means that the concentration of soluble contaminants will have been reduced to about 25%.
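Here is one reading of that arithmetic as a small Python sketch. It assumes that soluble contaminants not bound to the DNA are spread evenly through the gradient, so the pooled fractions carry contaminant in proportion to their volume; the function name is made up for illustration.

```python
def relative_contaminant_level(dna_fraction_pooled, volume_fraction_pooled):
    """Soluble contaminant per unit of DNA after pooling, relative to before.

    Assumes unbound contaminants are evenly distributed through the gradient,
    so the pooled fractions hold a share of them equal to their volume share.
    """
    return volume_fraction_pooled / dna_fraction_pooled


# Green gradient: 89% of the DNA in 37% of the volume.
green = relative_contaminant_level(0.89, 0.37)
# Red gradient: 68% of the DNA in 17% of the volume.
red = relative_contaminant_level(0.68, 0.17)

print(f"green: about {green:.0%} of the original contaminant per unit DNA")
print(f"red:   about {red:.0%} of the original contaminant per unit DNA")
```

The ratios come out near 0.42 and 0.25, which is where the "about 40%" and "about 25%" figures come from.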
Hmm, that's not very efficient purification. Larger gradient volumes and longer spins might help. And of course the desalting step should have removed much more of the soluble contaminants.
But this arithmetic may not matter much. The real advantage of the CsCl step is not that it's removing soluble contaminants. Instead, it fractionates on principles completely independent of those behind the other steps we use, so it should reduce or remove contaminants that the other methods might miss. It should remove contaminants that might have coprecipitated with the DNA when it was spooled out of 70% ethanol, and ones that might elute with the DNA in the desalting column because they're insoluble and soluble under the same combinations of conditions as DNA (we typically have to treat these conditions as 'secret sauce' because the manufacturers of the desalting columns don't like to reveal how they work).
Do we also need to consider contaminants that might have banded at a specific density in the gradient? The centrifugation is powerful enough to cause the heavy Cs+ ions to move down in the tube; might it also affect the distribution of other ions? What does Wikipedia say? (Ah, the correct term is 'isopycnic centrifugation'.) Nothing about other ions. CsCl gradients have typically been used to separate DNAs with different base compositions from each other (e.g. nuclear DNA from mitochondrial or plastid DNA); I don't know if anyone has ever used them to separate DNA from soluble contaminants.
Bottom line: If the LC-MS data shows arsenic in the DNA, we can polish up these DNA purification steps. If it doesn't, we won't need to bother.
Must stop analyzing data....
The postdoc is back and we are driving each other nuts with ideas for more analyses (both of us), more analyses (him), and requests to stop analyzing the bloody data and finish writing the damned paper (me). Just now I found myself thinking "If only we had less data...".
One simple control analysis we really did need was using his USS-scoring matrices to score some simulated genomes (random-sequence strings of the same length and base composition as the H. influenzae genome). These are controls for the analysis I wrote about here. He's done these now, and they nicely show that both scoring motifs see the bulk of the genome as no different from random sequence, and that the ~200 high-scoring positions they both find are not found in random-sequence 'genomes'.
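For anyone wanting to try this kind of control, here is a minimal Python sketch. It is not the postdoc's code. It assumes a scoring matrix stored as a list of per-position dictionaries, builds the random 'genome' by shuffling the real one (which preserves length and base composition exactly), and scores only one strand; all names are hypothetical.

```python
import random


def simulated_genome(genome, rng):
    """Random-sequence 'genome' with the same length and base composition.

    Shuffling the real sequence keeps every base count identical while
    destroying any positional signal.
    """
    bases = list(genome)
    rng.shuffle(bases)
    return "".join(bases)


def score_window(window, matrix):
    """Sum the per-position scores of one window (matrix[i] maps base -> score)."""
    return sum(matrix[i][base] for i, base in enumerate(window))


def count_high_scoring(genome, matrix, threshold):
    """Count the positions whose window scores at or above the threshold."""
    width = len(matrix)
    return sum(
        score_window(genome[i:i + width], matrix) >= threshold
        for i in range(len(genome) - width + 1)
    )


def control_counts(genome, matrix, threshold, replicates=10, seed=0):
    """High-scoring counts for the real genome and for shuffled replicates."""
    rng = random.Random(seed)
    real = count_high_scoring(genome, matrix, threshold)
    shuffled = [
        count_high_scoring(simulated_genome(genome, rng), matrix, threshold)
        for _ in range(replicates)
    ]
    return real, shuffled
```

A real comparison would also score the reverse-complement strand and would use the actual scoring matrices. The sketch only shows the shape of the control: same scoring, same threshold, shuffled sequence.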