RRResearch: January 2010

Eureka!

Now I see how to give the Innovation section a narrative, and accomplish other things too. The key was to see it as a way to reinforce the rest of the proposal.

The proposal begins with a Specific Aims page, which provides a summary of what I propose to do and why. This is followed by several pages of Significance, which provide a more detailed explanation of what the problems are that this work will help address. After the Innovation section comes the Approach, where I spell out each Specific Aim in detail, emphasizing how it will be accomplished.

In an old-style proposal, the Specific Aims page would be called the Summary, the Significance section would be Background, the Approach would be Methods, and the place now occupied by Innovation would be where the Specific Aims were listed, connecting the problems raised in the Background with the solutions described in the Methods.

There's no reason that the Innovation section can't also accomplish what a traditional Specific Aims section accomplished. So mine will begin with "I am proposing three Specific Aims, each innovative in both concept and strategy." Then I will have a paragraph for each Aim in turn, explaining how our approach differs from previous approaches and why it is the best solution to its problem. In doing this I will also be giving the reader an overview of the Aims both in the context of the broad Significance they've just read and in the context of the detailed Approach sections they're about to read.

Are we innovative yet?

NIH wants its applicants to use somewhere between half a page and a full page of the 12-page proposal to explain how their proposals are innovative. Here are NIH's instructions:

Explain how the application challenges and seeks to shift current research or clinical practice paradigms.
Describe any novel theoretical concepts, approaches, methodologies, instrumentation, or intervention(s) to be developed or used and any advantage gained.
Explain any refinements, improvements, or new applications.

Other advice (from Jeffrey Benovic and Bruce Freeman):

Significance is why the work is important to do.
Innovation is why the work is different from (better than) what has been done before.
Definition of innovation: a new device or process resulting from study & experimentation; the act of introducing something new.
How will research in your field change as a result of your work?
Demonstrate the potential gains are not merely incremental.
Explain why concepts & methods are novel to one field or novel in a broad sense (or both).
Summarize (sans detailed data) novel findings to be presented as preliminary results in Approach
Focus on innovation in study design & outcomes

Morgan Giddings also emphasized describing specific ways the field will be different if our work is funded and successful.

The field is recombination (in its broad sense, everything from the molecular mechanisms to the evolutionary consequences), and we're going to fully characterize recombination between related strains. That is, we're going to find out (i) the properties of the recombination tracts and (ii) the probabilities of recombination for ~all differences between the strains. This will be done using deep sequencing of single and pooled recombinant genomes, and will give an extremely comprehensive picture of recombination across the ~40,000 SNPs and 300 indels and rearrangements that distinguish these two strains. Then we're going to use recombination to map genes responsible for the very low transformation frequency of one of the strains.

I've been making lists of things that are innovative about what we propose, but what's lacking is a coherent narrative that ties them together. So I'm going to just start listing them here and see if a narrative comes together...

This will be the first time that all of the recombination of a single recombinant genome (from a single transformation event) has been identified for any organism. And we're going to do it many times (50? 100?). Maybe include actual numbers - how many recombination tracts do we expect to characterize? how many breakpoints? Relate to how many actual recombination tracts have already been characterized (I'd have to dig into the literature...)?
By using deep sequencing to measure the frequency of recombination at tens of thousands of SNPs and indels we will characterize the full spectrum of sequence factors affecting recombination. The scale will be unprecedented. How many SNP-conversion events do we expect to detect? How many indel-conversion events?
We are breaking down the previously necessary tradeoff between high resolution and broad scope. Deep sequencing of tiny genomes lets us have it all!
We will develop new analytical tools to analyze recombinant genomes.
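As a rough illustration of the per-site measurement described in this list, here is a minimal Python sketch that turns read counts at each SNP into a donor-allele fraction for a pooled sample. The input layout (position mapped to donor and recipient read counts), the function name, and the toy numbers are all assumptions made for illustration. They are not our actual pipeline or data.

```python
def donor_allele_frequencies(counts):
    """Map each SNP position to the fraction of reads carrying the donor allele.

    counts: {position: (donor_reads, recipient_reads)}  (hypothetical layout)
    Positions with no coverage are skipped rather than reported as zero.
    """
    freqs = {}
    for pos, (donor, recipient) in counts.items():
        total = donor + recipient
        if total == 0:
            continue  # no reads cover this site, so no estimate is possible
        freqs[pos] = donor / total
    return freqs


# Made-up read counts at three SNP positions in a pooled recombinant sample
toy_counts = {1200: (3, 997), 5400: (0, 1000), 9100: (0, 0)}
toy_freqs = donor_allele_frequencies(toy_counts)
```

In a real analysis the fractions would also need correcting for sequencing error, which can be of the same order as the recombination frequencies themselves.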

Paragraph about Aim III:

Nobody else is focusing on the causes of the poor transformability of many strains of 'transformable' species.
Will use genome sequencing of recombinants with altered transformability to identify recombination tracts carrying the responsible alleles. This may seem wasteful but is very efficient; one lane of sequencing (even without multiplexing) is likely to define a stretch of no more than 1 kb. (If the difference is a SNP and not an indel - we should have done the phenotyping! This is a pitfall we need to write about.)
This will (we think) be the first time that the QTL mapping methods used for eukaryote genomes are applied to bacteria. (Is this right? We need to clarify the relationship between QTL mapping and sequencing. Is anyone sequencing recombinant genomes of yeast?) They were developed out of necessity for the very large eukaryote genomes, but, now that sequencing bacterial genomes is so cheap, are very efficient when applied to bacteria that lack the sophisticated genetic tools available for E. coli and B. subtilis. We're one of the first to start using deep sequencing to replace conventional molecular biology analysis (well, 'one of' is a cop-out as I don't really know...).

Maybe the narrative could reflect/reinforce the narrative of the Specific Aims and Significance sections: "Haemophilus is bad, recombination is devaluing our only weapons (vaccines and antibiotics), but finding out the ground truth about between-strain recombination can let us better predict and prevent it." Or at least emphasize that we're applying this innovation to a pathogen.

about PilT

One issue we need to deal with better in our revised CIHR proposal is the identity of the H. influenzae protein that retracts its type 4 pili and/or pseudopili (short stubby pili).

In the well characterized bacteria (Neisseria meningitidis, Pseudomonas aeruginosa), pilus retraction is done by the protein PilT, using energy it gets by hydrolyzing ATP. I'm just going to summarize the things I think are true, but once I've done that I'll need to read the latest papers to find the evidence for my statements, and to check what I may have gotten wrong.

The problem is that we (and others) haven't been able to identify a PilT homolog in H. influenzae, although everything we know about DNA uptake in other systems, and about the need for other proteins of the type 4 pilus system in H. influenzae, predicts that a PilT homolog should be needed to pull the DNA in (by pulling the pilus or pseudopilus in). The competence regulon (CRP-S regulon) includes all of the other proteins with recognizable T4P-family signal sequences ('prepilin protease-dependent leader sequences'), but none of these are good homologs of the PilT proteins identified in other bacteria (nor of the related PilU). Nor are there recognizable PilT or PilU homologs among proteins that don't have this leader sequence.

The closest H. influenzae relative of PilT in other systems is a protein assigned as the PilB homolog. In other bacteria PilB is essential for assembly of the T4P, and we know that H. influenzae PilB is essential for DNA uptake. I'm pretty sure that PilB can't also do the job of PilT, because both proteins are ATPases. That is, in the bacteria where its function has been studied (mainly N. meningitidis and P. aeruginosa) PilB uses energy from hydrolyzing ATP to assemble pilin subunits into a pilus fiber, and PilT uses energy from hydrolyzing ATP to disassemble the pilus fiber into its subunits. From the perspective of the pilus, the PilT reaction is a reversal of the PilB reaction, but from the perspective of the ATP these are very different reactions - the energy requirement tells us that PilB will not be able to carry out pilus disassembly.

But some protein must do the work of pulling in the DNA. One possibility is that H. influenzae has a cryptic PilT homolog - maybe an ATPase that gets to the right place in the inner membrane without having a recognizable T4P targeting sequence. Another is that this function is done by an unrelated protein. I'd expect such a protein to be an ATPase.

(Here's a link to a couple of movies of P. aeruginosa cells whose pili have been made visible with fluorescent antibody. In one you can see the pili (~5 times longer than the cell) shortening, and in the other you can see the tip of an elongated pilus attaching to the slide surface at a point distant from the cell, and then shortening, pulling the cell to the attachment point.)

I can't think of any way to select or screen for a defect in pilus disassembly. This is partly because pili have not been detected on strain Rd - the group that showed pili only did this work in the clinical strain 86028-NP. That strain shows detectable twitching motility under the alkaline conditions where it does produce visible pili, but it also doesn't have a PilT homolog. In fact (I think), none of the Pasteurellaceae have them. And I think that Pasteurellacean cells typically don't have type 4 pili at all. We do have a pilus-associated phenotype that we can screen for defects in - DNA uptake - and we expect a PilT mutant to be defective for this. But we have already identified lots of genes whose knockouts prevent DNA uptake and thus would be found in such a screen, so this is a lousy way to look for PilT mutants.

But let's think about this a bit more. Say there are about 25 genes needed for transformation. In principle we can do random knockouts using some well-behaved in vitro method and use transformation to put these into the chromosome (just like Gerry Barcak and Hanna Tomb did in Ham Smith's lab 20 years ago). Then we'd do a massive screen for strains that don't transform, and screen the clean nontransformers for DNA uptake, then check where the knockout was in each strain that didn't do uptake. Then we'd check out any new genes. (This strategy is looking more and more lousy with each additional screening step. If I had unlimited money I might do this, but I don't think I'd fund it over competing projects.)

An alternative plan is to start by screening the genome for proteins with ATPase motifs (Walker boxes) and then pick out the ones that have signals to target them to the inner membrane and don't have any other assigned function. If there are only a few candidates this would be reasonable, but there might be very many.
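The first step of this plan could be sketched roughly as below. The sketch assumes the proteome is already loaded as a dictionary of protein names and amino-acid sequences, and it uses the common Walker A (P-loop) consensus [AG]-x(4)-G-K-[ST]. The function name and the toy sequences are hypothetical. Checking for inner-membrane targeting signals and for existing functional annotation would be separate later steps, not shown here.

```python
import re

# Walker A (P-loop) consensus: [AG], any four residues, then G-K-[ST]
WALKER_A = re.compile(r"[AG].{4}GK[ST]")


def find_walker_a(proteins):
    """Return {name: [(start, matched_text), ...]} for proteins with a Walker A match.

    proteins: {name: amino-acid sequence}; start positions are 0-based.
    """
    hits = {}
    for name, seq in proteins.items():
        matches = [(m.start(), m.group()) for m in WALKER_A.finditer(seq.upper())]
        if matches:
            hits[name] = matches
    return hits


# Hypothetical toy sequences, not real H. influenzae proteins
toy_proteins = {
    "candidate_1": "MSLLKGPPGTGKTTLAKAIA",
    "candidate_2": "MKKLLPWWDDEEFFHHYY",
}
toy_hits = find_walker_a(toy_proteins)
```

A motif this short will match many proteins that are not ATPases, so the list from this step alone would probably be long; that is the "very many candidates" worry.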

Another plan is to wait for someone else to solve the problem. Another bacterium lacking PilT is also able to retract type 4 pili - I thought it was Myxococcus, but there's a 2003 paper describing the Myxococcus pilT gene - Oh, it's that Myxococcus pilT mutants can still retract their pili, though not as well as wildtype cells.

Articulating why us, why H. influenzae

The research associate has been going through the reviewers' comments on our unsuccessful CIHR research proposal. Much of the criticism was along the lines of "Why should you do this when it's already been done in Neisseria*?", "Why should you do this in H. influenzae instead of Neisseria*?" and "Why should anyone try to do this, when other scientists have been unsuccessful?"

So yesterday she took the devil's advocate position, pushing me to defend our plans. One argument we will make is that we're in a much better position than the Neisseria researchers to use uptake sequences as a tool to study the uptake mechanism. For example, we have a resource that they lack - the availability of related species with variant uptake specificity. We also have done more investigation (in H. influenzae) into the details of the uptake specificity, measuring uptake with a series of uptake sequences altered at single or double positions. And Aim I of our proposal will give us an immensely detailed characterization of how strongly every base at every uptake sequence position contributes to uptake.

Another defense is that we have thought more deeply about how uptake could work than others have. Nobody else has considered that it's not enough to have a type four pilus or pseudopilus pull on the DNA (a ratchet is needed). Furthermore, nobody else has recognized the significance of the ability to take up circular molecules intact.

These issues are better addressed in H. influenzae. The need for a ratchet is not obvious in Neisseria, because it has long external pili. But most competent bacteria appear to pull DNA to the inner/cytoplasmic membrane with short stubby structures (pseudopili). Neisseria doesn't need long pili either, but this is only seen in mutants, whereas in H. influenzae we're studying the natural mechanism, not an aberration. We know that H. influenzae can take up circular DNAs that remain supercoiled in the periplasm, but we don't know that for Neisseria.

We will downplay testing whether competent H. influenzae have external pili. None are visible in the few published electron micrographs of competent cells, but nobody has ever specifically looked for them. We will begin with the reasonable assumption that competent cells lack external pili but will check this assumption using an anti-pilin antibody.

We may also downplay the search for the proteins that bind to DNA. It's intellectually messy work and we don't have any experience with mass-spec.

* The reviewers' emphasis on Neisseria made us wonder if one or both of them might have a Neisseria background. But I just checked the member lists for the previous two versions of this review committee, and none of them have any obvious connections to Neisseria or natural competence or pili. (These are previous committees; the membership list for our actual committee won't be released for months.) But whoever our reviewers were, they were surprisingly knowledgeable!

If I'm on sabbatical why can't I get into the lab?

Well, the NIH proposal needs to get to Research Services in three weeks. I've received lots of very helpful advice, but it's mostly big-picture stuff that will take lots of work to implement. Lots of digging up papers (and reading them), lots of thinking about how stuff ties together, lots of trying to craft plausible descriptions of the significance for human health. Not to mention lots of struggling to understand what the post-doc is finding in all the wonderful sequencing data.

(I just realized that I should take advantage of the NIH database of funded proposals - I can search the list for 'recombination' + 'bacteria' to see the kinds of things people write. I also think I might focus the significance a bit more on H. influenzae, rather than just emphasizing the generality of the need to better understand recombination.)

The failed CIHR proposal also needs to be rewritten in the next few weeks, so we can get a draft to internal review a month before the due date. The research associate has offered to go through the reviewers' concerns, annotating the draft at each point that needs work. I'm afraid that this will be much of the proposal. She's working hard to get more preliminary data on our cross-species complementation plan. The concern is that her preliminary data could show that this isn't going to work as easily as we are hoping, so this is risky. Luckily the reviewers weren't concerned about the lack of preliminary data for the optical tweezers section - instead they were concerned about its significance.

On Monday afternoon I'm giving a talk about DNA uptake to the biophysics group at the university across town (henceforth SFU) where I'll be doing my optical tweezers sabbatical project. This will be a chalk talk because I want to keep it informal and get lots of interaction with the audience, and because I don't want to take the time to prepare polished slides. Luckily the room's walls are covered with chalkboards, and the organizer has promised coloured chalk.

I'll prepare for this talk by rereading the CIHR proposal (killing two birds with one stone).

Bad news from CIHR

Yesterday I got the review and scores for our DNA uptake grant proposal - not nearly as good as I'd hoped. It only made the 51st percentile, which means it won't get funded.

The reviewers said some good things. They thought it was very well written (as it was). They liked the mix of risky and safe projects.

Some of the reviewers' concerns were very reasonable: the mass-spec bit lacked details that would make it credible (we've no experience); the basic cross-species complementation should have been tested; the search for a pilT homolog should have been completed; why continue to look for pili when they're not seen in EM. They also want more justification of the personnel budget. These problems can easily be addressed in the resubmission (due March 1).

Others mainly asked for better explanation of significance. Why study DNA uptake in H. influenzae when we already have information about uptake by Neisseria and Bacillus? What will we learn from the optical tweezers experiments? I thought we had done a really good job of explaining this, but I guess I thought wrong.

A bigger concern was the reviewers' general lack of enthusiasm, which I suspect may also account for several unfounded criticisms. One reviewer wondered why we were going to look at binding of Neisseria pili to DNA, when we had clearly stated that this was just a positive control. Another said that the optical tweezers experiments had already been done for Neisseria and Myxococcus, when in fact these had only looked at pilus retraction, not DNA uptake. If we had managed to create more enthusiasm, maybe the reviewers would have more carefully checked whether these criticisms were valid.

CIHR includes a 'community reviewer' for each proposal. These are interested members of the general public - they usually read only the lay summary. The community reviewer of the previous submission (3 years ago) complained that the lay summary hadn't included the name of the organism, so this time I made a point of naming Haemophilus influenzae, clearly explaining that it was a bacterium that causes respiratory diseases. But the reviewer of this proposal nevertheless complains that the name misled them into thinking this proposal was about influenza, and recommends that we don't give any species names!

We're going to try to get a revised version done by the end of January, so we can take advantage of UBC's internal review option.

The US-variation manuscript finally is coming together

(Yes, I know I've said this before...)

Abstract

Uptake signal sequences are DNA motifs that promote DNA uptake by competent bacteria in the family Pasteurellaceae and the genus Neisseria. The genomes of these bacteria contain many copies of their canonical uptake sequence (often >100-fold overrepresentation), causing the uptake machinery to prefer DNA derived from close relatives over DNAs from other sources. However the molecular and evolutionary forces responsible for both the uptake bias and the abundance of uptake sequences in these genomes are not well understood. Here we thoroughly evaluate the simplest explanation, that uptake sequences accumulate in genomes by a form of molecular drive, generated by biased DNA uptake and genetically neutral recombination. A computer simulation model shows that these simple assumptions are sufficient to drive uptake sequences to high densities, with the spacings, stabilities and strong consensuses typical of real uptake sequences. In the absence of strong evidence of selection for a recombination function, it may thus be more parsimonious to treat uptake sequences as an epiphenomenon of biased DNA uptake rather than as evidence for a sexual function of natural competence.

Resolving the hotspot paradox (at least partly)

A while back I described our previous work on the paradoxical activity of meiotic recombination hotspots (their mode of action is self-destructive). A new paper by Simon Myers and coauthors (Drive Against Hotspot Motifs in Primates Implicates the PRDM9 Gene in Meiotic Recombination.) now goes a long way towards resolving the paradox, though it doesn't explain how our recombination system got itself into this mess.

This group had previously identified a 13-nt sequence motif typical of human hotspots (though not all of them); it's thought to be the sequence motif recognized by the process that initiates recombination by creating a double-strand break in the hotspot DNA. Previous work had suggested that chimpanzee hotspots are in different places than human hotspots, so the authors looked at the chimpanzee homologs of the human hotspots and found that although the sites did have this sequence (or variants of it), they didn't function as hotspots. The chimpanzee genome also had this motif at other sites, more than the human genome does. They concluded that many of the human occurrences of the motif had been lost from the human genome because their hotspot activity was self-destructive. They hypothesized that the motif was not ancestrally a hotspot but had become one in the human lineage, 1-2 million years ago.
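For readers who want to see what counting occurrences of such a motif involves, here is a minimal Python sketch that scans both strands of a DNA string for a degenerate 13-mer. The motif string used, CCNCCNTNNCCNC, is how I recall the motif being written in the published work; it is not given in this post, so treat it as an assumption. The function names and the toy sequence are made up for illustration.

```python
import re

# Only the IUPAC codes needed for this motif are handled here
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}
MOTIF = "CCNCCNTNNCCNC"  # assumed form of the 13-nt hotspot motif


def motif_regex(motif):
    """Build a lookahead regex so that overlapping matches are all reported."""
    return re.compile("(?=(" + "".join(IUPAC[base] for base in motif) + "))")


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(seq, motif=MOTIF):
    """Return sorted (start, strand) hits; starts are 0-based on the given strand."""
    seq = seq.upper()
    pattern = motif_regex(motif)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    # Scan the reverse complement, then convert coordinates back to the input strand
    n, k = len(seq), len(motif)
    hits += [(n - m.start() - k, "-") for m in pattern.finditer(reverse_complement(seq))]
    return sorted(hits)


toy_seq = "AAACCTCCCTAACCACGGG"  # made-up sequence with one motif copy at position 3
toy_hits = find_motif(toy_seq)
```

Comparing motif counts between two genomes, as the authors did for human and chimpanzee, would mean running a scan like this on each genome and on aligned homologous sites.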

They then decided to look for the protein that recognizes these sites. Based on their earlier work they had already hypothesized that it would be a zinc-finger protein with ≥12 'fingers' to bind the motif. So they used structural predictions to examine candidate zinc-finger proteins encoded by the human genome, and found five candidates, of which the best was a protein called PRDM9.

If PRDM9 is indeed the protein that, in humans, binds the 13 nt hotspot motif to initiate recombination, its human version should recognize this motif but its chimpanzee version should not. Consistent with this, PRDM9 was the only one of the five candidates that was different in chimpanzees (the other four had identical sequences in both species). Furthermore, its sequence didn't just have random differences, but had many of its differences in the zinc-fingers that recognize DNA sequence, with hallmarks of positive selection for the changes. And independent work on this protein in mice, and genetic mapping, both implicate it as playing a role in the initiation of recombination.

So what are the implications for the hotspot paradox? My simple view is that active hotspots do self-destruct over evolutionary time, as we predicted, and because we need recombination to hold chromosomes together in meiosis, this creates selection on the protein that recognizes them. Variant proteins that recognize new sequences are favoured (that's the positive selection) because they can cause more recombination and thus better prevent chromosome errors. So over long evolutionary periods, genomes may progress through a series of different hotspot motifs and locations.

Researchers have sometimes proposed that different kinds of genes evolve best with different amounts of recombination, and that the chromosomal locations of genes and/or hotspots have evolved to optimize the amount of recombination. This new paper throws cold water on that idea.

Added later: Turns out the Myers paper was published together with two other papers about PRDM9's role in recombination, confirming and extending the conclusions. And, the same gene was identified last year as a locus very important in speciation, so maybe changing your hotspot specificity changes who you can reproduce with.