Field of Science

More on the NSERC proposal

Title:  I want something catchy but not frivolous.  'Do bacteria have sex?' has to be immediately followed by one particular definition of sex, as any process evolved (and maintained) by selection for the random genetic recombination it causes.  But 'Do bacteria have sex for sex?' might get the point across even in the absence of accompanying definitions of 'sex' and 'sex'.  It even communicates the point that there are two evolutionary definitions of sex that could apply to bacteria.

Rationale:  I explained this in a post a few days ago:
Natural competence, the ability to take up DNA fragments from the environment, is the only bacterial 'parasexual' process for which reasonable doubt remains about its non-recombinational function.  This doubt arises from the apparent self-specificity of DNA uptake by bacteria in two groups, the genus Neisseria and the family Pasteurellaceae.  Bacteria in these groups preferentially take up DNAs containing short sequence motifs that are ~100-fold more abundant in their own genomes than expected by chance.  This match between the bias of the DNA uptake machinery and the genomic abundance of a DNA motif has been interpreted as an adaptation that enhances the presumed recombinational (sexual) benefits of DNA uptake, by allowing mate-choice or excluding possibly harmful foreign DNAs.  However a simpler non-sexual explanation exists - that the preferred sequences play purely mechanistic roles in DNA uptake, and that the motifs' abundances in the respective genomes are due to a passive accumulation caused by biased uptake and subsequent unselected recombination.
Aims:  So the main aim of this proposal would be to test the DNA uptake systems of other naturally competent bacteria for sequence biases.

Outcomes:  Here are the three possibilities:

1.  Finding such biases in one or more bacteria whose genomes are not enriched for the favoured motif would be strong evidence that the bias exists for mechanistic reasons, not because uptake of self DNA enhances recombination.  In principle we only need to find one such case.

2.  Finding that DNA uptake has absolutely no sequence biases in one or more of the tested bacteria would greatly weaken our hypothesis.  We will have argued that DNA uptake is expected to  have sequence biases because the high forces that have been measured on DNA require very strong contacts between DNA and the proteins of the uptake machinery.  Such strong contacts always show some sequence-dependence.  Finding that uptake is completely unbiased in any species would mean that this argument is invalid.

3.  But what if we find that uptake is always biased and the corresponding genomes are always enriched for the preferred motifs?  This wouldn't disprove our hypothesis that the bias exists for mechanistic reasons - in fact it's exactly what our model predicts, given that the bacteria we're testing are known to recombine some of the DNA they take up in the lab.  But I need to think through the implications more carefully.  In the previous post I wrote:
If we do find overrepresentation, we can (1) decide whether this overrepresentation would create any significant degree of preferential uptake of self-DNA, and (2) use our simulation model (or refinements of it) to evaluate how much recombination must be going on to give this overrepresentation.
The molecular drive that happens in the simulation model is the null hypothesis for uptake sequences; it shows how sequences preferred by the uptake machinery accumulate in the genome, without invoking any benefit from recombination.  If uptake has a sequence bias, the DNA taken up is sometimes homologous to the cell's genome, and this DNA sometimes recombines with the chromosome, we expect the preferred sequences to slowly accumulate in the chromosome.  If recombination of chromosomal alleles has a net genetic benefit, selection for the self-uptake promoted by uptake sequences will act in addition to drive (not instead of it).  If recombination instead has a net cost, selection against it may oppose drive and reduce accumulation of the sequences preferred by the uptake machinery.
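To make the drive argument concrete, here is a minimal sketch of the logic, not our actual simulation model.  It tracks only the frequency of the preferred motif at an average site, and every name and parameter value (bias, recomb_rate, gain, loss) is an assumption chosen for illustration.  There is no selection term anywhere; the only asymmetry is that motif-bearing fragments are taken up more readily.

```python
def simulate_drive(bias, recomb_rate, gain=1e-6, loss=1e-4,
                   generations=200_000, p0=0.0):
    """Toy recursion for the frequency p of a preferred motif at an average site.

    All parameters are hypothetical:
      bias        - how many times more readily motif-bearing DNA is taken up
      recomb_rate - per-generation chance a site is replaced by taken-up DNA
      gain, loss  - per-generation mutation rates creating/destroying the motif
    """
    p = p0
    for _ in range(generations):
        # Mutation: motifs are lost far more easily than they arise.
        p = p * (1.0 - loss) + (1.0 - p) * gain
        # Biased uptake: chance that a homologous fragment taken up
        # from the population carries the motif.
        donor = bias * p / (bias * p + (1.0 - p))
        # Unselected recombination replaces the resident site with the fragment.
        p = (1.0 - recomb_rate) * p + recomb_rate * donor
    return p


if __name__ == "__main__":
    print("no bias:   ", simulate_drive(bias=1.0, recomb_rate=1e-4))
    print("10x bias:  ", simulate_drive(bias=10.0, recomb_rate=1e-4))
```

With bias set to 1 the motif just sits at its mutation-balance frequency; with bias above 1 it accumulates, even though recombination confers no benefit in this sketch.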

So, if we find accumulation of a preferred motif in bacteria previously thought to have unbiased uptake, we'll know that, in nature as well as in the lab, the cells must sometimes take up DNA that replaces chromosomal sequences by homologous recombination.  So we'll have generalized the bias-plus-abundance phenomenon already characterized in the Neisserias and Pasteurellaceae.  This would certainly be seen as an important (publishable) result.

But this wouldn't mean that the recombination this promotes must be beneficial, because drive can explain this accumulation.  Would there be any way to tell whether the accumulation has been significantly affected by selection?  Are there tests we could do in other bacteria that we can't do in H. influenzae?  Or that we should do in H. influenzae?  (Of course we can never rule out that the benefits of recombination are too small to detect...)

If we started by assuming there's no selection for or against the genetic consequences of recombination, we might think we could use the degree of enrichment to infer, using our model, how much recombination must be happening.  If (letting go of our assumption) the enrichment was really partly due to selection for recombination, then the actual level of recombination in the population would have to be less than the model predicts.  (Yikes, this logic is getting weird.)  But I don't think our model is nearly that realistic, and we couldn't use it to make predictions anyway because we don't have enough real information about the properties of any real population.

Enough for now...

NSERC Proposal

Yikes, even though our proposal to NSERC (I think that stands for the Natural Sciences and Engineering Research Council) wouldn't be due until November 1, I just discovered that we need to get a 'Notification of Intent' in to them by next Wednesday (August 1).   It hadn't occurred to me that NSERC had an advance registration requirement, and, if it did, I would have expected registration to be due a month before the grant submission deadline, as is the case for CIHR.

Luckily the people who send UBC faculty 'Grant Opportunity Updates' thought to include a reminder about this - otherwise I'd  have missed the registration deadline for sure.  If they're like CIHR, there's no recourse for missing the deadline, but it's worse for NSERC because they have only one submission deadline each year.

And of course NSERC doesn't seem to use the 'Common CV' system that most other agencies do, so I have to input all my information again...

And, because NSERC does send proposals out to external reviewers, unlike CIHR, I need to come up with a list of 5 possible reviewers.  That will take some thinking.  While I'm at it, I'll also ask for pointers from friends who regularly apply to NSERC.

Are proteins wet and gooey?

To me this protein image looks wet and gooey.  Initially I found that disconcerting, because I'm used to the dry solid appearance of the more standard visualizations.  But now I think that 'wet and gooey' is actually a realistic description of the surface properties of any protein.

And a possible NSERC proposal

If we decide to follow the CIHR proposal outline I've made in the previous post, we'll probably do a separate proposal to NSERC on the role of uptake sequences in the mechanism of DNA uptake.  I was just discussing this with the post-doc and he had a great idea about also using the analysis to make inferences about the frequencies of chromosomal recombination in naturally competent species.  So I'm going to write it out here quickly, before I forget it.

Motivation:  This work addresses a very important question with big/deep/fundamental importance to the colossal problem of the origin of meiotic sex in eukaryotes.  The question is 'Do bacteria have any processes that evolved because of selection for recombination of chromosomal alleles?'  We think this selection is the reason for the success of meiotic sexual reproduction in eukaryotes, but compelling evidence for this has been elusive.  Bacteria have four well-studied processes that do generate homologous recombination; three that transfer DNA between cells and one that carries out homologous recombination.  But almost every aspect of these processes has been shown to cause recombination as an unselected side effect of processes selected for other functions.  

Natural competence, the ability to take up DNA fragments from the environment, is the only one of the four for which reasonable doubt remains about its non-recombinational function.  The strongest selection for DNA uptake is generated by its nutritional consequences; DNA is an excellent and economical source of preformed nucleotides and of phosphate, and the nutrient function of DNA uptake is supported by its regulation by nutritional signals in many bacteria.  This doubt arises from the apparent self-specificity of DNA uptake by bacteria in two groups, the genus Neisseria and the family Pasteurellaceae.  Bacteria in these groups preferentially take up DNAs containing short sequence motifs that are ~100-fold more abundant in their own genomes than expected by chance.  This match between the bias of the DNA uptake machinery and the genomic abundance of a DNA motif has been interpreted as an adaptation that enhances the presumed recombinational (sexual) benefits of DNA uptake, by allowing mate-choice or excluding possibly harmful foreign DNAs.  However a simpler non-sexual explanation exists - that the preferred sequences play purely mechanistic roles in DNA uptake, and that the motifs' abundances in the respective genomes are due to a passive accumulation caused by biased uptake and subsequent unselected recombination.  (Something here about why mechanistic sequence biases are plausible/expected.)

Only a subset of competent species exhibit this uptake specificity (strongly biased uptake and strongly enriched genome).  The others are considered to have no uptake bias at all (and no genomic uptake sequences), largely because they show no preference for DNA fragments from their own genomes over those from other genomes.  Thus the most powerful test of this hypothesis is testing these species for cryptic uptake biases.  Finding even minor biases in their uptake machinery would confirm that biased uptake need not result from selection for mate-choice.  We have developed a simple and very powerful method to test for such biases, using Illumina sequencing of DNA fragments taken up from pools of highly degenerate DNA fragments.

As a bonus, this analysis will allow estimation, for a number of bacterial species, of the frequencies of chromosomal recombination between close relatives.  We will be working with species whose genomes have been sequenced.  Once any uptake biases have been identified, the corresponding genomes can be analyzed to detect any enrichment of the preferred motifs.  For motifs that are either short or not very strong (that is, unless they are as complex as the known Neisserial and Pasteurellacean ones), finding enrichment will not be evidence for a sexual function, since the same motifs will occur frequently in foreign DNAs.  However the enrichment will allow estimation of the frequency of recombination in the species, because we have developed a computer-simulation model of the accumulation process.


A.  Test naturally competent bacteria for biases in their DNA uptake mechanisms.  We've done H. influenzae, which has well-characterized uptake bias.  We'll test Acinetobacter baylyi and Thermus thermophilus, both Gram-negative bacteria thought to have no uptake bias and no genomic enrichment.  I've already arranged collaborations with a researcher; it will probably be simplest to travel to their lab to do the uptake experiments.  We'll use fully degenerate Illumina-ready DNA fragments, which we have already designed and ordered.  The analysis methods for the Illumina (or MiSeq) output have already been developed for H. influenzae; they may need modification for the fully degenerate sequences.  We could also test a Vibrio (maybe not cholerae!) and maybe Pseudomonas stutzeri.  We could also test bacteria that have some uptake bias, with (A. pleuropneumoniae and G. anatis) or without (Campylobacter) genomic uptake motifs, but this wouldn't really test our hypothesis.

Could we also test Gram-positive bacteria for uptake bias (Streptococcus or Bacillus)?  This is the equivalent of testing Gram-negative bacteria for translocation bias, and will be substantially more difficult because as the DNA enters the cytoplasm it becomes single-stranded and partly degraded (at one end).  But we think we can develop ways to fish out the single strands from the cytoplasm.

What if we find no evidence of bias?  I'd be surprised.  This wouldn't disprove our hypothesis, but it would weaken it, because our hypothesis is built on the expectation that all DNA binding proteins have some sequence specificity, and that tasks requiring application of force to DNA will benefit from the tighter binding created by sequence-specific interactions. 
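The comparison at the heart of aim A can be sketched roughly as follows.  This is only an illustration, not the analysis already developed for H. influenzae: it assumes the reads have been trimmed to the degenerate region, are all the same length, and contain only A, C, G and T.  The function names and the pseudocount are my own assumptions.  A positive score means a base at that position is over-represented among the fragments that were taken up.

```python
import math
from collections import Counter

BASES = "ACGT"


def position_frequencies(reads, pseudocount=1.0):
    """Per-position base frequencies for equal-length reads (hypothetical helper)."""
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("all reads must have the same length")
    total = len(reads) + 4 * pseudocount
    freqs = []
    for i in range(length):
        counts = Counter(read[i] for read in reads)
        # The pseudocount avoids dividing by zero for bases never observed.
        freqs.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return freqs


def uptake_bias(input_reads, recovered_reads):
    """log2(recovered / input) frequency for each base at each position."""
    f_in = position_frequencies(input_reads)
    f_rec = position_frequencies(recovered_reads)
    if len(f_in) != len(f_rec):
        raise ValueError("input and recovered reads differ in length")
    return [{b: math.log2(rec[b] / inp[b]) for b in BASES}
            for inp, rec in zip(f_in, f_rec)]


if __name__ == "__main__":
    # Made-up reads, purely to show the call; not real data.
    pool = ["AC", "CA", "GT", "TG"] * 5
    taken_up = ["AC"] * 10 + ["CA", "GT", "TG"]
    for pos, scores in enumerate(uptake_bias(pool, taken_up)):
        print(pos, {b: round(s, 2) for b, s in scores.items()})
```

Scoring each position independently will miss interactions between positions, so a real analysis would probably need to go further than this.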

B.  Examine the corresponding genome sequences for overrepresentation of the preferred motifs.  If we find no overrepresentation, we will have established that uptake biases need not result from selection for preferential uptake of self-DNA.  If we do find overrepresentation, we can (1) decide whether this overrepresentation would create any significant degree of preferential uptake of self-DNA, and (2) use our simulation model (or refinements of it) to evaluate how much recombination must be going on to give this overrepresentation.
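As a rough illustration of step B, the sketch below counts a motif on both strands of a genome and compares that count to what base composition alone would predict.  The background model (independent bases at the genome's own frequencies) is an assumption I've made for simplicity, and the function names are hypothetical; a real analysis might well use a higher-order background.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def count_overlapping(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def motif_enrichment(genome, motif):
    """Return (observed, expected, ratio) for motif on both strands.

    Expected counts assume independent bases drawn at the genome's own
    base frequencies (a simplifying assumption, not a claim about the data).
    """
    genome, motif = genome.upper(), motif.upper()
    windows = len(genome) - len(motif) + 1
    if windows <= 0:
        raise ValueError("motif is longer than the genome")
    counts = Counter(genome)
    total = sum(counts[b] for b in "ACGT")

    def probability(word):
        p = 1.0
        for base in word:
            p *= counts[base] / total
        return p

    # A set, so a palindromic motif is not counted twice.
    targets = {motif, reverse_complement(motif)}
    observed = sum(count_overlapping(genome, w) for w in targets)
    expected = sum(windows * probability(w) for w in targets)
    ratio = observed / expected if expected > 0 else float("inf")
    return observed, expected, ratio
```

The ratio is the kind of 'fold more abundant than expected by chance' figure quoted above for the Neisseria and Pasteurellaceae motifs, although the value depends heavily on the background model chosen.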

Back to the big picture:  If we find uptake bias without genomic overrepresentation, or without self-specificity, we'll have swept away the last feeble pillar still propping up the claim that bacteria have processes that evolved for genetic exchange.

A new CIHR proposal?

The more I think about the 'significance' part of our latest CIHR proposal, the weaker I think it was.  So here I'm going to lay out the bones of a different proposal, one that builds very well on our current strengths but allows much stronger claims of significance.

Possible title:  Predicting the outcomes of genetic exchange in polyclonal infections

The problem:  Genetic exchange between closely related pathogens can increase virulence and is responsible for many failed attempts at control (spread of antibiotic resistance, escape from immune surveillance and vaccine immunity).  At present we have no ways to anticipate or prevent this, largely because we are ignorant of the constraints and biases of the underlying steps.  Previous (sparse?) attempts to understand genetic exchange have been based on inferences from (i) laboratory studies of the mechanism and regulation of DNA-transfer processes and recombination and (ii) detection of past recombination events in natural populations.  The former bear little relation to events in real populations, and the latter are confounded by later time, genetic drift and natural selection.

Hypothesis: Identifying the constraints on transformational genetic exchange will allow the outcomes of natural recombination events to be predicted.

Significance: The ability to predict the most likely genetic exchange events will help researchers prepare for new variant strains of bacteria.  Identifying the causes of the constraints may also permit interventions that block genetic exchange in polyclonal infections such as the cystic fibrosis lung.


A. Identify the DNA sequence effects that constrain DNA uptake.

  1. Clarify the sequence biases of DNA uptake by H. influenzae.
  2. Identify the proteins that interact with the preferred sequences.
  3. Identify sequence biases of DNA uptake by other bacteria, especially those not known to exhibit bias.
  4. Characterize the effects of these biases in natural and simulated communities.

B. Identify the constraints on homologous recombination between H. influenzae strains.

  1. Identify the effects of DNA sequences and sequence heterologies on the extents and endpoints of recombination tracts.
  2. Identify the effects of DNA sequences and sequence heterologies on recombination frequencies across the H. influenzae genome

C. Use the results of the above studies to develop a probabilistic model of recombination, and test these predictions using datasets of recombination events from natural and simulated bacterial communities.

  1. A dataset of past recombination events inferred from genome sequence data of H. influenzae strains.
  2. A metagenomic dataset derived from a short-term evolution of a polyclonal H. influenzae laboratory culture.
  3. A metagenomic dataset from cystic fibrosis sputum.

The figure shows the structure of H. influenzae recombination tracts (data from Mell et al. Transformation of natural genetic variation into Haemophilus influenzae. PLoS Pathog 7(7): e1002151. doi:10.1371/journal.ppat.1002151)

On another topic, I've been getting emails about a new paper in the Journal of Biological Chemistry, which claims to explain the growth of GFAJ-1 by use of phosphate from degraded ribosomes.  I don't think this idea is sound, but I'm going to leave it for Carmen Drahl and Ed Yong to deal with.

Grant proposal plans (yes, again...)

But this time it's not just plans for a proposal to CIHR (the Canadian equivalent of NIH).  I'm certainly also going to apply to NSERC (the Canadian equivalent of NSF) and maybe to other agencies too (Cystic fibrosis?  NIH?).

NSERC grants are much smaller than CIHR grants, but they have one big advantage - that they're not restricted to health research.  For me this is important, as it will let me write a proposal that spells out the biggest reason I think my research is important, which has no health implications.

I'm having several colleagues read the previous (unsuccessful) proposal, asking specifically for big-picture feedback on whatever fundamental problems it might have.  I've had feedback from two, and they both say that the arguments for significance are weak.  Unfortunately the significance that drives me is pure research, not health-related, and I downplay this in the proposal because I know from experience that the reviewers won't appreciate it.  I clearly need to clarify and better-communicate the health-related reasons our work is important.

So I'm going to start here by simply tabulating all the significances I can come up with, health-related and not.
  1. DNA uptake leads to transfer of antibiotic resistance genes into previously sensitive strains.
  2. DNA uptake leads to changes in the cell surface that create strains insensitive to current vaccines.
  3. If uptake specificity exists for mechanistic reasons (not mate-choice) then bacteria don't have any analog of eukaryotic sex (no mechanisms evolved because of selection for creating new combinations of genes or alleles).
  4. Bacteria may use binding to extracellular DNA as a way to adhere to mucosal surfaces and biofilms.
  5. In the human body DNA is a valuable nutrient source that promotes bacterial persistence and virulence.
  6. DNA uptake is a novel kind of transport problem with special challenges (very long stiff charged molecules) - understanding how it's done will shed light on other membrane-transport problems.
  7. The H. influenzae T4P DNA uptake system (and that of other Pasteurellaceae) is unique in lacking any obvious retraction motor protein. 
  8. Type IV pili are important virulence determinants, especially for adhesion (but H. influenzae doesn't appear to normally have pili).
  9. H. influenzae is a major human pathogen in all vulnerable populations.
  10. H. influenzae is the model system for studies of competence in the gamma-proteobacteria.
  11. Understanding DNA uptake will shed light on the physical properties of DNA.
  12. Understanding DNA uptake specificity will shed light on the physical properties of DNA.
  13. Antibiotics (some) induce competence in some/many species
These are all fine significances, but they have problems.  Either they're not a high priority for the review committee (e.g. the Microbiology and Infectious Diseases committee really doesn't care about evolutionary issues, transport problems, or the physical structure of DNA), or it's not easy to use them to make a strong case for the experiments we've been proposing (e.g. we're not addressing the consequences of DNA uptake).

#arseniclife wrapup

I think that the #arseniclife saga is finally nearing its end.

Our refutation paper was published on Science Express on Sunday July 8; Science lifted its embargo early, to coincide with my Evolpalooza talk.  Another refutation paper (by Erb et al.) was released at the same time - we didn't know about this work but it nicely supports and complements ours.  At the same time Science released a rather platitudinous Editorial Statement (available here).  Wolfe-Simon continues to deny that any errors were made and states that results of her more recent work support the original claims (evasive email correspondence here).

Both papers will appear in print in the July 26 issue of Science - I don't know whether there will be any accompanying commentary.

So what should we learn from the whole mess?  The 'Cascade of FAIL' figure above is a summary slide from my Evolpalooza talk.  Although I think everyone involved failed, I'm happy to attribute this to a cascade of human error rather than malfeasance or misconduct.

Science's Editorial Statement smugly points out that the scientific process is self-correcting, but fails to acknowledge the harm done by the original error, and the cost to many of the correction process.  Unfortunately, nobody involved seems willing to apologize for the trouble they have caused.

Should the original paper be retracted?  David Sanders argues for this at Retraction Watch, but I think not.  The authors are unwilling to admit any errors, and I don't think journals should have the right to remove papers just because the authors made mistakes and their conclusions turn out to be unjustified.  That's especially true now, 18 months after the paper appeared, when the literature contains a number of new papers that respond to and refute these claims.

What about the long-term fallout for the public understanding of science?  It's not as bad as I had feared.  Most of the hits from a Google search for 'arsenic DNA' (below) are to pages discussing the new refutation results or the controversy; only one is to the original report.

NASA's cowardly responses to their #arseniclife FAIL

I've now seen two responses from NASA about the new publications that refute the Wolfe-Simon results.

The first was sent to Margaret Munro of Postmedia News, by James Schalkwyk of NASA:
We asked the director of the NASA Astrobiology Institute, which had provided funding for the GFAJ-1 research, and he said we're deferring to the actual researchers for their statements on their research. You can contact Felisa Wolfe-Simon, (email address redacted) for their official statement (I assume you already know this though).

I'm sorry this isn't more helpful but it appears we haven't been involved in the research for some time. Good luck!
The second was just posted by Dan Vergano of USA Today on their ScienceFair blog.  It comes from Michael New, astrobiology discipline scientist in NASA's Planetary Science Division at NASA Headquarters:
NASA supports robust and continuous peer review of any scientific finding, especially discoveries with wide-ranging implications. It was expected that the 2010 Wolfe-Simon et al. Science paper would not be exempt from such standard scientific practices, and in fact, was anticipated to generate significant scientific attention given the surprising results in that paper. The two new papers published in Science on the micro-organism GFAJ-1 exemplify this process and provide important new insights. Though these new papers challenge some of the conclusions of the original paper, neither paper invalidates the 2010 observations of a remarkable micro-organism that can survive in a highly phosphate-poor and arsenic-rich environment toxic to many other micro-organisms. What has emerged from these three papers is an as yet incomplete picture of GFAJ-1 that clearly calls for additional research.
I'm at a loss for words.

Here's a new predatory publishing trick!

I just received this email:

So I compile and edit a book of articles on evolutionary biology and genomics.  Research Signposts publishes it as a real hard-bound book, and I get 30% of the author publication charges.  Sounds great...

Wait, 'author publication charges'?  Do the authors PAY to be published in this book?  So I clicked on the 'publication plans' link.

They have four different publication plan options.

  1. In the first plan, the author(s) of each article in my book must buy 100 reprints of their article, for a minimum of $435, depending on the length of the article.  This also (I think) gets them three copies of the book.
  2. In the second plan, I pay a minimum of $1950 to have the book published, more for colour figures and pages >150.  I get five copies and each article's author(s) get one.
  3. In the third plan I purchase 25 copies of my book.  It's not clear how much these will cost, but the charges sheet mentions $899 for three copies plus a pdf.
  4. The fourth plan doesn't get me a hardbound book at all.  Instead I pay $995 (or more) for Research Signposts to make my book available online.

Thanks but no thanks.

Consider the publication embargo...

I'm having email conversations about my upcoming #arseniclife public-outreach talk with (i) editors at Science and (ii) publicists for the Evolpalooza congress.  

Our paper refuting the original claims won't appear in Science until late July, and Science's embargo policy asks authors to refrain from contacting the press.  Reporters on Science's list will be sent copies of the paper a week before it is published, on the understanding that they will not report on this information until it appears in Science.

From the info sheet Science sends to authors:
The embargo policy ensures that no single reporter or news organization gains an advantage over others and that reporters have an equal amount of time to write full and accurate stories. Your cooperation with this policy helps us gain excellent coverage for your research and protects you from problems that may jeopardize your paper’s publication.
(Ooh!  "...problems that may jeopardize your paper's publication"!)

Science asks authors to not initiate contact with the press about their publication, and to only talk to members of the press who have agreed to respect the embargo.  Authors are free to present their data at conferences, but are asked to inform Science of this in advance. 

All this seems a bit silly when applied to research results that have already been widely publicized, with the manuscript publicly available on the arXiv server (it's also on PubMed Central but not released yet).  So I emailed Science for clarification. The response asked for what seem to be slightly tighter restrictions (to not mention that the paper is in press at Science, to not talk to the press after my presentation).  These seem inappropriate, since this is a public-outreach talk and since a major focus of my presentation will be on how science is communicated.  

My plan is to describe the refutation results and the Science paper as a minor part of my talk, and to meet with whatever press the organizers arrange, either before or after my talk.  I'll make sure the journalists are aware of Science's embargo, but I'll happily talk with them about any aspect of the #arseniclife debacle.  I'll probably also mention the issues surrounding the embargo in my talk.

If you'd like to learn more about embargos in science journalism, I recommend Ivan Oransky's excellent Embargo Watch blog.

Note added Monday July 9:  Science unexpectedly decided to lift the embargo prematurely, so our paper (and another) appeared on Science Express at 8 pm Sunday, right in the middle of my Evolpalooza talk.  Both papers will appear in print in the July 26 issue of Science.