RRResearch: February 2010

Submitted!

Hooray, the CIHR proposal has been submitted. We think it looks good - we've incorporated just about all of the suggestions of our two internal reviewers, and we didn't have to cut any of our little illustrations to fit it into the 11 pages.

We took a break from proposal-polishing this afternoon to drag our visiting Chinese grad student to the pub to watch the hockey game, telling him that he'd otherwise be kicking himself for missing such an exceptional Canadian cultural experience. And it was!

Now I'll get into the lab, tomorrow morning, to start working on the preliminary techniques for the optical tweezers experiments. Things like figuring out how to get the cells to stick to coverslips without killing them, and how to tell that I haven't killed them.

Getting shorter...

The proposal, not me.

It's down to 11.7 pages. And that's with some additions to clarify previously obscure points, and with all the tiny illustrative figures retained. There's not much deadwood left, but I think there are still lots of sentences we can shorten and maybe even cut.

I've also been going through the figures for the Appendix, making them clearer and adding 'Conclusion' boxes.

Nearly done the CIHR proposal

I have to click 'Submit' on the CIHR proposal by 10:00 am on Monday. The signed forms are already at Research Services, who have managed to get UBC a blanket institutional-signoff extension from CIHR. (I suspect that they claimed that all the signing authorities would have fled the city, either to avoid the predicted Olympics-associated chaos or because they went to Hawaii on the money they made by renting out their houses.)

It's looking pretty good now, thanks to valiant work by the postdoc and research associate, except that it's about 50% too long. It will get shorter when the reference citations ({name, year,Endnote #}) are replaced by simple numbers, and when the paragraph spacing is reduced from 6 pt to 3, but that won't be nearly enough to get it down from 17 pages to 11. So today's big challenge is to find and eliminate all the redundancies in the text, and all the irrelevant or dispensable information.

The research associate is off for the weekend - much deserved after some heroic work generating evidence of cross-species complementation of the pilin operons. I'm delighted with this result because I wasn't at all sure the operons could complement, and one of the best parts of the proposal depends on this complementation. I had started to test this several years ago but gave up when my plasmids were misbehaving. She's sorted out the plasmid mess (by throwing mine out and starting fresh) and gotten the result I had been hoping for.

Today and tomorrow the postdoc and I will cut and polish and cut and polish, with a break tomorrow afternoon to watch the hockey game (Canada vs US for Olympic gold!) at a campus pub. I'll also do the space-saving last minute editing that makes a paragraph a line shorter by eliminating one little word or nudging a paragraph margin by a millimeter. Then we'll assemble all the docs into one big pdf and I'll click 'Submit'.

And on Monday I'll get into the lab.

Explaining the gaps

OK, I've worked out how to organize the first part of the Background section of our CIHR proposal: First a very brief overview of DNA uptake in all bacteria, with three sub-headings: (i) Regulation, (ii) Uptake and translocation, and (iii) Degradation and recombination. (There might also be a little figure showing the four stages, but this might not be needed now there's a detailed figure soon after.) This section serves also to emphasize the generality of the problem - I'm not just proposing to solve an H. influenzae problem.

The next short section is titled Current model of DNA uptake. It presents a new version of the figure I posted yesterday, with the gene names and the nucleotide import removed. It describes what we think happens in gram-negative bacteria, relying on evidence mainly from H. influenzae and the Neisserias. The basic steps are described, but the detailed evidence and the problems aren't pointed out until the Gaps sections.

Initiation occurs internally on dsDNA fragments, preferentially at short sequence motifs called uptake sequences.
The force for DNA uptake is produced by retraction of cylindrical protein multimers called pseudopili, short versions of long extracellular filaments called type IV pili.
Transforming DNA enters the periplasm through outer membrane 'secretin' pores like those through which type IV pili exit.
DNA may bind nonspecifically to the pseudopilus, as it does to type IV pili, and this may be how the force of pseudopilus retraction is transmitted to the DNA.
Once DNA is in the periplasm, a fragment end interacts with the translocation machinery. One DNA strand is degraded to nucleotides as the other enters the cytoplasm.

Evidence underlying this model comes from several kinds of analyses, in H. influenzae, the Neisserias and other bacteria: the phenotypes of mutants, the fate of radiolabelled or genetically marked DNA in wildtype and mutant cells, protein properties predicted from the sequences of genes induced in competent cells or otherwise implicated in competence, and the activities of homologs that contribute to type IV pilus function.

This model is superficially satisfying, but a critical analysis reveals a number of problems. Next I describe the three serious gaps that will be filled by the experiments I propose.

Gap 1. What are the players? (What proteins contribute to DNA uptake? Which of these interact directly with DNA?)

The first paragraph explains that H. influenzae is particularly appropriate for this analysis because we've identified all the members of its competence regulon (this can't be done in Neisseria (no regulation) or the Gram-positives (no competence-specific regulation)).

The next two paragraphs describe what's known and not known about the H. influenzae competence genes; they include a little figure showing all the genes clustered under their likely roles. Only four genes can reasonably be excluded from roles in uptake. Four others have no known function, and most of the rest have only suggested functions. Only six are directly implicated in uptake by assays of labeled DNA and phenotypes of non-polar knockout mutations.

Another paragraph describes the need to identify the proteins that bind DNA, and the limited success to date (some evidence of non-specific binding).

This section ends with a paragraph describing the Specific Aim and the first two questions: which proteins contribute to DNA uptake (and translocation), and which proteins contact DNA during uptake and translocation. I think the paragraph should also summarize the strategy.

Gap 2. What is the uptake specificity? How does it act?

The first paragraph will give basic factoids about uptake specificity and uptake sequences. Both H. influenzae and N. gonorrhoeae have long been known to have very strong preferences for DNA from close relatives. We now know that this is due to an uptake bias favouring sequence motifs abundant in these DNAs; our measures show at least a 100-fold bias (published reports are inconsistent). The preferred H. influenzae sequence was initially identified as a 9 bp sequence, AAGTGCGGT (N. gonorrhoeae's is XXXXXXXXX). However, once genome sequences became available the focus shifted to characterizing the many copies of these sequences in the respective genomes. Once my lab realized that these genomic sequences are not replicative elements but motifs (like other protein-binding sites), we shifted to characterizing them as such (see attached manuscript).
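To make "characterizing the many copies" concrete, here is a minimal Python sketch that counts perfect matches to the 9 bp H. influenzae core on both strands of a sequence. The function names and the toy sequence are invented for illustration. Because uptake sequences behave as a degenerate motif, an exact-string search like this would miss imperfect copies; it is only a starting point.

```python
# Count perfect matches to the 9 bp H. influenzae uptake sequence core.
# Only exact matches are found; real analyses treat the USS as a motif.
USS_CORE = "AAGTGCGGT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_overlapping(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def count_uss(seq, core=USS_CORE):
    """Return (forward-strand hits, reverse-strand hits) for the core."""
    seq = seq.upper()
    forward = count_overlapping(seq, core)
    reverse = count_overlapping(seq, reverse_complement(core))
    return forward, reverse


# Toy sequence (hypothetical): one core in each orientation.
toy = "TTAAGTGCGGTCAGGACCGCACTTAA"
print(count_uss(toy))  # (1, 1)
```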

Relevance of uptake sequences to the mechanism of uptake: 1. Uptake sequences are likely to result from general features of DNA uptake in Gram-negative bacteria. Even though their sequences are different, the genomic H. influenzae and Neisseria uptake sequences share almost all other properties: frequency, spacing, frequent accessory role as transcription terminators, strong consensus (see attached manuscripts). These are the two best-studied Gram-negative DNA uptake systems. Their uptake specificities are also shared by other members of the Neisseria, and by all Pasteurellaceae. There is also some evidence of uptake specificity in other systems (Campylobacter, others?), and most have not been examined. The absence of obvious uptake-sequence-like repeats in genomes doesn't mean that the species' DNA uptake machinery has no sequence specificity. The failure to easily find a single protein that binds specifically to uptake sequences (in either H. influenzae or N. meningitidis) suggests that uptake specificity is not a detachable 'add-on' to the mechanism (a kind of pre-screening) but is rather intrinsic to the process.

One big problem with the uptake model presented above is the need for sharp DNA kinking, and that may be resolved by uptake sequences. Double-stranded DNA is quite stiff on the scale needed for uptake, with a persistence length of >50 nm (for comparison, the secretin pore has a diameter of about 6-7 nm). (Can this be indicated on the figure; is the figure drawn roughly to scale?) The model shows a slight bend at the uptake sequence (structural models predict a ??° bend at each of the two AT-rich segments in the H. influenzae uptake sequence, and we see the predicted gel retardation in a 200 bp model fragment), but clearly a sharp kink of nearly 180° is needed for passage through the pore. We know that uptake does not need to occur at the ends of fragments, because closed circular plasmids are taken up as efficiently as linear fragments. Current understanding of DNA structure is not good enough to predict whether uptake sequences are preferential kink sites, nor what kind of force might be needed to cause kinking. Because of DNA's high charge, bypassing the pore is not an option. (There's also the problem of fitting the DNA in the pore beside the pseudopilus, but it may be better to not even bring this up as I don't propose to solve it.)
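A very rough feel for the stiffness problem comes from the standard worm-like-chain estimate, in which smoothly bending a DNA segment of contour length L through an angle θ costs about (P/2L)·θ² in units of kT, where P is the persistence length. The Python sketch below just evaluates that formula. The 50 nm persistence length is the figure from the text; the arc lengths are arbitrary illustrative choices, and the smooth-bend model may not hold for a true sharp kink, so treat the numbers as order-of-magnitude only.

```python
import math


def bend_energy_kT(persistence_nm, arc_length_nm, angle_deg):
    """Worm-like-chain energy (in kT) to bend DNA uniformly.

    E/kT = P * theta^2 / (2 * L), with theta in radians.
    Assumes a smooth, uniform bend, not a localized kink.
    """
    theta = math.radians(angle_deg)
    return persistence_nm * theta ** 2 / (2.0 * arc_length_nm)


# Illustrative only: a 180 degree turn spread over different arc lengths.
for arc_nm in (5, 10, 20, 50):
    energy = bend_energy_kT(50, arc_nm, 180)
    print(f"arc {arc_nm:>2} nm: about {energy:.0f} kT")
```

The shorter the stretch of DNA that has to make the turn, the higher the cost, which is why a pore only 6-7 nm wide makes the 180° turn look so demanding.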

Say that we can't just rely on the genomic uptake sequence motif as a surrogate for the uptake specificity. The manuscript gives evidence that uptake bias corresponds only imperfectly to the genomic motif, and that strain-to-strain variation at uptake sequence positions doesn't correspond either.

I propose to carry out a very high resolution analysis of uptake specificity, examining both sequences in the uptake sequence motif and effects of nearby sequences. The high resolution will allow investigation of effects on uptake by interactions between different positions in the motif (like the genomic covariation analysis in the manuscript). We will also investigate the physical properties of the newly defined uptake motif. Finally, we will exploit the variant specificity of the related A. pleuropneumoniae DNA uptake system in a cross-species complementation experiment to identify the genes responsible for their different uptake specificities.

Gap 3. What forces does DNA experience during uptake?

Force must act on DNA to create the kink needed to initiate uptake; force is also needed to continue uptake. The model shown above addresses this by using pseudopilus retraction to pull DNA across the outer membrane, but it overlooks two big problems, the need for a ratchet and the absence of the retraction protein PilT.

In principle, retraction using a type IV pilus mechanism is certainly able to generate a sufficiently strong force; measurements using optical tweezers (where a cell is stuck to a coverslip and its pilus is attached to a bead in the optical laser trap) have recorded forces in excess of 150 pN, the strongest molecular forces known. The details of pilus assembly and retraction are shown in the figure (??? maybe we'll have a figure here). Prepilin subunits in the inner membrane are freed from their leader sequence by a prepilin peptidase (pilD), and assembled into the base of the elongating pilus by the PilB ATPase. Retraction occurs by disassembly of the subunits by the reverse of this reaction, typically catalyzed by the related PilT ATPase; this is where the force is generated. The entire complex is thought to be restrained in the inner membrane by other proteins (name them??).

Although H. influenzae has good homologs of all the other Tfp proteins, it lacks any identifiable homolog of PilT, as do all of its Pasteurellacean relatives. Thus we do not know where the force comes from. (One strain has been shown to be capable of assembling and retracting pili under special conditions, so we know it has all the required genes; to date the recognized ones are all in the competence regulon.) We will take two approaches to finding the source of the force. First, characterization of all proteins that interact with incoming DNA should identify it, even if it's not one of the known competence proteins. Second, using optical tweezers to characterize the forces acting on DNA during uptake will show whether the force has the properties expected of a Tfp mechanism.

The second problem with the current model of uptake is the need for a ratchet. This problem has been largely overlooked by the Neisseria researchers, perhaps because their main focus has been on the role of Neisseria's long type IV pili in pathogenesis. However, type IV pili have not been detected on H. influenzae cells under normal growth conditions, and even though competent cells dramatically upregulate all of the Tfp genes they do not have detectable pili. Neisseria cells also do not need long pili to take up DNA, as mutants defective in pilus assembly are proficient for uptake. Many other naturally competent Gram-negative bacteria also lack detectable type IV pili, despite possessing the same genes as H. influenzae (though they do have PilT).

Type IV pili can be several µm long, so a single pilus retraction could in principle pull in DNA fragments as long as 10 kb (if one end of the DNA bound to the proximal end of the pilus) (see 'Not this' figure). But DNA is normally taken up not by pili but by pseudopili, which are thought to only span the distance between the inner and outer membranes (??? 20 nm???), and not to protrude significantly beyond the cell surface. Thus a single pseudopilus retraction can only be expected to pull in about 100 bp of DNA (see 'But this' figure).
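These lengths can be sanity-checked with the usual B-form DNA rise of about 0.34 nm per base pair. The Python sketch below is only that conversion; the helper names are made up, and the 20 nm span is the tentative figure from the paragraph above.

```python
# Convert between DNA length in base pairs and contour length in nm,
# assuming B-form DNA at roughly 0.34 nm per base pair.
RISE_NM_PER_BP = 0.34


def bp_to_nm(bp):
    """Contour length in nm of a duplex of the given length in bp."""
    return bp * RISE_NM_PER_BP


def nm_to_bp(nm):
    """Number of base pairs spanning the given contour length in nm."""
    return nm / RISE_NM_PER_BP


print(f"10 kb of DNA is about {bp_to_nm(10_000) / 1000:.1f} µm long")
print(f"a 20 nm retraction pulls in about {nm_to_bp(20):.0f} bp")
print(f"100 bp corresponds to about {bp_to_nm(100):.0f} nm")
```

So 10 kb is indeed the length of a pilus a few µm long, while a retraction on the scale of the periplasm moves only tens of base pairs, the same order of magnitude as the 100 bp estimate.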

One possible solution would be coupling of translocation to uptake, with the pseudopilus only needed to bring some part of the DNA into contact with the translocation machinery. But we know this is not the solution in H. influenzae, because circular plasmids can be fully taken into the periplasm even though they cannot be translocated, and because uptake proceeds normally in a rec2 mutant, which cannot translocate DNA.

The simplest solution (though by no means simple) is for the pseudopilus to act as a ratchet, alternately elongating and binding DNA and then retracting and releasing the DNA. A detailed drawing of this mechanism is provided in the Appendix. The timescale of DNA uptake (a few minutes?) and the length of each retraction event make this potentially detectable with optical tweezers.
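A back-of-envelope version of the ratchet arithmetic, as a Python sketch. The 100 bp per retraction comes from the estimate above; the 10 kb fragment and the 3-minute uptake time are illustrative assumptions (the text only says "a few minutes?"), and the function names are hypothetical.

```python
import math


def ratchet_cycles(fragment_bp, bp_per_retraction):
    """Number of elongate/bind/retract/release cycles to take up a fragment."""
    return math.ceil(fragment_bp / bp_per_retraction)


def seconds_per_cycle(total_uptake_s, fragment_bp, bp_per_retraction):
    """Average time available per cycle if uptake takes total_uptake_s."""
    return total_uptake_s / ratchet_cycles(fragment_bp, bp_per_retraction)


# Assumed values: 10 kb fragment, 100 bp per retraction, 3 minutes total.
cycles = ratchet_cycles(10_000, 100)
pace = seconds_per_cycle(180, 10_000, 100)
print(f"{cycles} cycles, about {pace:.1f} s per cycle")
```

Under these assumptions the ratchet would need on the order of a hundred cycles, each lasting a second or two, which gives a sense of the stepping pattern a tweezers trace would have to resolve.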

In addition to their use for characterizing pilus retraction, optical tweezers have been used to measure forces on DNA during uptake by competent cells. This work shows that this is a good way to measure forces but was not informative about ratchet mechanisms because the bacteria used don't face this problem. The first measurements were done with B. subtilis, which needs its pseudopilus only to bring a part of the DNA to the cell surface where it is cut for translocation (mutants lacking the pseudopilus proteins can translocate DNA provided the cell wall barrier is removed). The only other tweezer measurements have been done with Campylobacter, which is (with its relative Helicobacter) the only uptake system that doesn't use Tfp machinery.

(Say more about forces here). (Also say that maybe the need for sequence specificity arises from the complications introduced by the outer membrane and the need for an uptake ratchet??)

Now describe what I propose to do. Need details here.

Other ideas to include somewhere else:

If we detect binding of a particular protein to DNA, we can then test whether mutations in other proteins affect this binding - this might help define the order of events within, e.g. DNA uptake. For example, if we see that pilin does contact DNA, we can test whether a secretin mutation prevents this.

If sequencing costs drop we will repeat the Q. 3 analysis of uptake specificity with A. pleuropneumoniae cells and their USS.

Background section for the CIHR proposal on DNA uptake

I've convinced myself that I need to reorganize the Background section of our upcoming proposal to CIHR (due March 1), but I'm not making much progress on paper so I thought I'd try to outline it here.

The plan is to first give a very brief overview of natural competence, saying what's generally true for all bacteria. This could also give some H. influenzae-specific information but I think it's better kept general.

Then I'll have a drawing of our current model of DNA uptake in gram-negative bacteria (applicable to both H. influenzae and N. meningitidis, and maybe to most Gram-negative bacteria). The figure below is one from a review we wrote - I'll modify it for the grant. I was originally (i.e. yesterday) planning to just briefly point out that this 'model' is really only a static picture of the known and hypothesized players. But I'm beginning to think I should give some description of the mechanisms illustrated in the figure, also telling the reader that it's all hypotheses based on limited information (what does and doesn't happen to transforming DNA in wildtype and mutants, what properties proteins are predicted to have based on their sequences, what homologs of the proteins are thought to do in Tfp function).

Initiation of uptake has a strong sequence bias towards a motif called the uptake sequence.
Initiation occurs internally on DNA fragments.
Force for uptake is produced by shortening of a protein multimer called the pseudopilus, which is closely related to long filaments called Type IV pili (more details below).
Transforming DNA enters the periplasm through an outer membrane secretin pore like those through which type IV pili exit the periplasm.
DNA can bind nonspecifically to type IV pili; this may be how the force is transmitted.
Once DNA is in the periplasm, a fragment end interacts with translocation machinery.
One strand is degraded (probably on the periplasm side of the membrane) and the other enters the cytoplasm.
The competence protein Rec2 may form a pore in the membrane.

The model is not informative about many points. We don't know:

What are the ultimate sources of the forces that pull DNA in (ATP? PMF? other?).
How the pseudopilus is disassembled.
How the DNA fits through the pore.
What role the sequence specificity plays.
What is the full uptake specificity.
Whether sequence specificity only matters at initiation.
Whether uptake and translocation are usually coupled.
What most of the genes in the H. influenzae competence regulon do.
What prevents backsliding.

The rest of the Background is headed by the three gaps I propose to fill.

Gap 1: Who (what?) are the players?
This section will describe what we know and don't know about the proteins that might contribute to DNA uptake. I need to also say what other proteins might do (process DNA in the cytoplasm)? End with an overview of what we'll do in Aim I.

Gap 2: What is the uptake specificity?
This section will describe why I think the uptake specificity is an important component of the mechanism (i.e. why I don't think it's just a Haemophilus-specific artefact). Emphasize that properties (genomic and uptake) are shared by Neisseria (just not the actual sequence itself) and because these are the two best studied systems we have to take them as exemplars. Then I can describe what we know: crude uptake assays, detailed genomic analysis, two types in the Pasteurellaceae, and give an overview of what we'll do in Aim II.

Gap 3: What are the forces?
This section will describe the unknowns about what forces act on the DNA, in the contexts of the B. subtilis and Helicobacter analyses. Here I'll describe the absence of PilT, and the apparent need for a ratchet mechanism. Also the backsliding problem. And the need for an uptake force that is independent of translocation.

Proposed dynamic model
The Background will end with my dynamic ratchet-based model of uptake.

Two submissions down

The NIH proposal got submitted (it just needed a 9-digit zip code), and yesterday the former post-doc submitted our manuscript on uptake sequence variation. It looks pretty good so we decided to submit to Genome Biology, even though BioMed Central is not my favourite journal publisher.

Now I'm back to working on the CIHR proposal about the mechanism of DNA uptake. I have comments from the two internal reviewers. One thought it was a pretty good proposal, and had lots of comments about ways to increase the clarity and improve the explanations. The other thought the global organization was quite bad - this was depressing but his suggestions were very good so we're doing a major reordering of the material.

Yesterday the postdoc and RA gave me their revisions to our Encyclopedia of Genetics entry on transformation. I did a bit of polishing, and now it's just about ready to go. They're working at the bench, along with our visiting grad student from China, but I think I'll have to wait until the CIHR grant is in to get to my bench.

NIH's new forms won't accept zip codes!

OK, so I dotted every 't' and crossed every 'i' for the new NIH R01 application forms. Everything done perfectly, to assemble the 16 form pages and dozen or so attachments into the multi-component Adobe-format application. But one ridiculous problem is preventing us (i.e. the UBC Research Services grants administrator) from submitting the application to NIH.

THE FORM DEMANDS BUT WON'T ACCEPT ZIP CODES!

My address, of course, doesn't have a zip code - I chose Country = Canada and entered the post code in the field above the zip code field, no problem. But my consultant is in the US, and I need to enter his address on the Key/Senior Personnel page. So I chose Country = United States and entered his zip code in the zip code field. But the text turned red and a pop-up window told me this isn't a valid zip code. I confirmed the zip code on his letter of support and on his university's web site, I tried other valid zip codes, no success. I tried to delete the Key/Senior Personnel section for the consultant, but that box is greyed out (apparently I can't delete it until I successfully complete it...). The 'Check application for errors' function told me that the application is perfect except for the invalid zip code.

So I took the all-but-complete form (on a memory stick) over to the grants administrator, hoping she could either submit the application as-is, or solve the problem. But the problem was the same on her computer, which told me that it's not a Mac-specific problem. She was too busy dealing with other people's messed-up NIH applications to contact NIH (or Grants.gov) for help.

Back to my office to try a few more things. Downloading a fresh copy of the application form took ages (mostly finding the right web page), but the new one had the same problem. In a way that's a relief, as I really don't want to have to redo the whole application on a fresh copy. As a test I tried changing my entry in the PI section, telling it that I was located in the United States, and giving myself a zip code. Same problem, so now I know it's a general problem with zip codes, not just with that one field.

While unsuccessfully searching Grants.gov for a technical support address or phone number, I found a tester version of the application form, provided to allow applicants to check that they had the right version of Adobe Reader. So I downloaded that form, and had no problem entering a zip code into it. So that tells me the problem isn't with my version of Adobe Reader.

I've tried Googling various combinations of terms (NIH, RO1, SF 424, "invalid zip code") but haven't found any evidence that other people are having this problem. The only solution I can think of now is to tell the form that the consultant is located in Canada (Charlottesville Virginia Canada?) and enter a fake post code.

Later: I was going to tell NIH that the University of Virginia is in the city of "Charlottesville VA USA 22908", in the country United Arab Emirates, but in the meantime the grants administrator discovered that the problem was simply a requirement for a 9-digit zip code!
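For anyone who hits the same wall: a check for a 9-digit (ZIP+4) code can be sketched in a few lines of Python. This is a hypothetical validator, not what the NIH form actually runs; the function name is made up, the '0001' extension below is a placeholder rather than the consultant's real code, and whether the real form wants the hyphen is unknown.

```python
import re

# Five digits, an optional hyphen, then four more digits (ZIP+4).
ZIP_PLUS_4 = re.compile(r"\d{5}-?\d{4}")


def is_nine_digit_zip(text):
    """Return True if text is a 9-digit US zip code, with or without hyphen."""
    return ZIP_PLUS_4.fullmatch(text.strip()) is not None


# The 5-digit code that the form rejected, and two 9-digit placeholders.
for candidate in ("22908", "22908-0001", "229080001"):
    print(candidate, is_nine_digit_zip(candidate))
```

A form that applies a rule like this but only reports "invalid zip code" would reject every ordinary 5-digit code, which matches what happened here.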

Now I've also fixed the errors that were identified post-submission, which existed because I didn't realize that the 'Months' section of the Budget for salaries was to indicate the amount of 'effort' each person would put into the project.

Last paragraph of the NIH proposal

I need to get it written in the next half hour, but my brain is jammed. Points it should make:

We're the best people to do this work. We have a unique combination of wonderful attributes.
The components of the work are well balanced. None is excessively risky, and later work is not dependent on the success of (or a particular outcome of) earlier work. Our preliminary results confirm that the basic strategy is robust.
The approach is cost-effective. Using genome sequencing to get answers about recombination is much cheaper than doing it with molecular biology, because of the breadth of information the sequences provide.
The results will give insights into the molecular mechanism of recombination.
The work is testing hidden assumptions about recombination. Over the past 60 years, studies of bacterial genetics have been forced to make assumptions about recombination events. These assumptions were reasonable, given the information available, but now we can finally test them.
The strains we will have sequenced are a resource for mapping clinically important phenotypes. They also provide a gold-standard control dataset for phylogenetic and epidemiological studies that must detect recombination.