RRResearch: July 2009

Manuscript drags on

My plan to explain how analysis of the Gibbs scores helps us understand the evolution of uptake sequences isn't working out - my brain refuses to think clearly about this. So I've set that question aside and am just trying to get the manuscript into slightly less awful shape before sending it back to my co-author. Then I think neither of us plans to put much effort into it until the fall.

Analysis of uptake sequences by score

Here are the logos for the N. meningitidis and H. influenzae uptake sequences after sorting the occurrences by the scores that the Gibbs Motif Sampler assigned them. (I'm pretty sure that each score is a measure of how well that occurrence's sequence matches the position weight matrix that Gibbs determined for this data set, but I don't know how the calculation is done.)

The top set are the logos for 5381 N. meningitidis DUSs. The numbers are different than in yesterday's post because I realized I had been analyzing a N. gonorrhoeae data set. The overall picture is the same for N. meningitidis and N. gonorrhoeae - low-scoring DUS retain strong consensus for most of the central positions but have only very weak consensuses for the other positions. The drop-off is quite steep. The shapes of the logos are about the same for all the occurrences with scores lower than about 0.95.

The H. influenzae dataset is even more skewed; almost 60% of the USSs have perfect scores, and about 8% have zero scores. But the consensus decays fairly evenly across the positions, and even the zero-score occurrences have the full motif. Like the N. meningitidis DUS, the shapes of the USS logos are about the same for all occurrences with scores below 0.95.

I think the question in my mind was whether there is an obvious place to draw a line between 'real uptake sequence' and 'degenerate sequence that doesn't deserve to be treated as an uptake sequence'. Unfortunately the analysis is complicated by the different sizes of the datasets - the N. meningitidis set has almost twice as many sites as the H. influenzae set.

OK, I've dug out another set of H. influenzae runs, done with a high 'expected' setting to maximize the number of sites found. This has 3466 USSs, with a lot more having zero scores than in the previous set. Now the first and last Gs in the core are seen to be weaker in USSs with low scores, though not in the larger set of USSs with zero scores. Overall the consensus still remains constant as the scores and consensus strengths decrease. Notably, the flanking AT-rich segments remain as important in poorly matched USSs as the core does.

Poster

I spent last week at the Gordon Conference on Microbial Population Biology. All the science communicated at a Gordon Conference is confidential, so I didn't blog about it. But here's a snapshot of my just-in-time poster, written on the back of a poster discarded from the previous week's Gordon Conference, using coloured pens I found in the school's art-supply cupboard.

While there I sat down with my former post-doc to discuss our manuscript on uptake sequence variation. We agreed that it needed major reorganization more urgently than it needed simple editing and polishing, so we worked out a new structure that we think will hold the ideas together much better. I made some new figures (cartoons of our explanation and of how our simulation model works) and rearranged the text into its new order, but I needed a paper copy to do any serious editing. Now that I'm home I've printed out the rearranged draft and am hoping to do one quick pass through it and then set it aside till our grant proposals are done.

But of course I immediately got distracted by the data. I want to be able to say something about how our new analysis of uptake sequences as motifs gives us insight into their evolution. The most promising angle is the distribution of good and poor matches to the motif, which I can analyze because the Gibbs Motif Sampler assigns a score to each occurrence it finds.

I dug out a set of 4646 DUSs Gibbs found in the N. meningitidis genome, and sorted them by score. And now I've spent a lot of time trying to force Excel to draw a proper histogram. I get the histogram values using a math teachers' web site called Illuminations, and paste them into Excel, but Excel refuses to use the data ranges as the X-axis (instead using its line numbers). I've found a work-around - the graph is very ugly but here it is. The red bars are the numbers of DUSs in each score range (0-.02, .02-.04, etc) and the lilac bars are the cumulative numbers with increasing scores. So about 0.5% of DUSs have zero scores, very few have non-zero scores lower than 0.5, about 2% have scores in each category from 0.5 to 0.98, and more than 40% have scores greater than 0.98.
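
The binning described above (0.02-wide score ranges plus a running total) could also be done outside Excel. Here is a minimal Python sketch; the function name and the example scores are made up for illustration and are not the real Gibbs output.

```python
from itertools import accumulate

def score_histogram(scores, bin_width=0.02):
    """Count scores in fixed-width bins covering 0..1.

    Returns (bin labels, counts per bin, cumulative counts).
    """
    n_bins = round(1 / bin_width)
    counts = [0] * n_bins
    for s in scores:
        # The tiny offset guards against floating-point results such as
        # 0.06 / 0.02 == 2.9999999999999996; a score of exactly 1.0 is
        # placed in the top bin rather than overflowing.
        i = min(int(s / bin_width + 1e-9), n_bins - 1)
        counts[i] += 1
    labels = [f"{i * bin_width:.2f}-{(i + 1) * bin_width:.2f}"
              for i in range(n_bins)]
    cumulative = list(accumulate(counts))
    return labels, counts, cumulative

# Made-up scores, for illustration only (not the real DUS data).
example_scores = [0.0, 0.51, 0.73, 0.97, 0.99, 1.0, 1.0]
labels, counts, cumulative = score_histogram(example_scores)
for label, count, running_total in zip(labels, counts, cumulative):
    if count:
        print(label, count, running_total)
```

The labels can then be used directly as the X-axis categories, which is the part Excel was refusing to do.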

I made weblogos for the top-scoring 50% and bottom-scoring 50% of the DUS occurrences (my previous analysis had only looked at the high-scoring ones). Here they are; the bottom 50% logo isn't evenly weak at all positions, instead it's quite strong at some positions and very much weaker at others. I don't know what I'm going to do with this analysis... I guess I could make a range of logos, maybe for the 10 deciles (is that the word I want?), to see how the consensus decays. And I should probably do the same thing for the H. influenzae USSs.
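
To see how the consensus decays across ten equal-sized score groups, one option is to compute the per-position information content that a logo displays for each group. This is a sketch under assumptions: the input is a hypothetical list of (score, sequence) pairs with aligned sequences of equal length, and the calculation omits the small-sample correction that logo programs may apply.

```python
import math

def split_into_groups(scored_sites, n_groups=10):
    """Sort (score, sequence) pairs by score and split them into
    n_groups nearly equal groups, lowest-scoring group first."""
    ordered = sorted(scored_sites, key=lambda pair: pair[0])
    n = len(ordered)
    return [ordered[n * g // n_groups: n * (g + 1) // n_groups]
            for g in range(n_groups)]

def information_per_position(sequences):
    """Information content in bits at each aligned position:
    2 minus the Shannon entropy of the base frequencies."""
    length = len(sequences[0])
    bits = []
    for pos in range(length):
        column = [seq[pos] for seq in sequences]
        entropy = 0.0
        for base in "ACGT":
            p = column.count(base) / len(column)
            if p > 0:
                entropy -= p * math.log2(p)
        bits.append(2.0 - entropy)
    return bits

# Toy data, for illustration only: 20 sites with made-up scores.
toy_sites = [(i / 20, "ACGT" if i % 2 else "ACGA") for i in range(20)]
for number, group in enumerate(split_into_groups(toy_sites), start=1):
    sequences = [seq for _, seq in group]
    print(number, [round(b, 2) for b in information_per_position(sequences)])
```

Plotting each group's values side by side would show which positions lose information first as the scores drop.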

Controlled nicking of supercoiled plasmid DNA

Last week I tried to introduce single-strand nicks into a closed circular plasmid (pUSS-R) by doing restriction digestion in the presence of ethidium bromide (background is here). I tested two different dilutions of HindIII, a restriction enzyme that should cut this plasmid once, with three different concentrations of ethidium bromide, to find conditions where a convenient range of incubation times gave a range of partially digested molecules.

I was hoping to see what's shown in the upper gel drawing - appearance of a novel band that migrated slower than the fully digested DNA in the rightmost lane and much slower than the supercoiled DNA in the leftmost lane. This new band would contain plasmid that had undergone a single-strand nick at its single HindIII site. I didn't know whether I might also see eventual appearance of linear DNA. But instead I saw what's shown in the middle gel drawing - gradual appearance of a linear-sized band as the supercoiled band disappeared.

I wondered if the HindIII now being sold no longer causes nicking (maybe New England Biolabs has 'improved' their HindIII clone...). So I did the control shown in the lower gel drawing, incubating pUSS-R with a very low concentration of DNase I, an enzyme widely used to create nicks in double-stranded DNA. Again I expected to see a novel, slow-migrating band, but instead saw only a linear-sized band. Damn!

I also ran samples from a few old plasmid preps. Most of these contained several bands, but I don't know if the slowest one is relaxed circles or dimers, because the plasmids may have been prepared from rec+ cells. I've now also checked the old literature, just in case I was mistaken in expecting nicked/relaxed DNA ("form II") to migrate slowly, but indeed it should (see for example the marker lanes in Fig. 2 of PNAS 86:1309).

Now what? Is the problem that the nicked DNA migrates at the same speed as linear DNA? Is this specific to this particular plasmid? Or do I not have any nicked DNA? Should I try another plasmid?

What to propose to NIH?

How about this?

Goal: To fully characterize all of the biases and sequence specificities of transformational recombination in H. influenzae.

Specific questions to answer:

1. Is DNA binding a distinct step that precedes the initiation of DNA uptake? If so, does it have the same sequence specificity as DNA uptake? Does it have any topological specificity?
2. What is the complete sequence specificity of DNA uptake? How absolute is the requirement for a good match to the USS consensus at uptake (are non-USS DNAs taken up at lower frequency or not at all)? This will be investigated with plasmids or short fragments containing 12% degenerate USSs, and with ones containing completely random sequences (we could create these or just use fragments of unrelated DNA). Does uptake have any topological specificity?
3. Does the translocation step impose any sequence specificity? How strict is the requirement for a pre-existing free end (does circular DNA sometimes get cut or nicked in the periplasm, or transported intact into the cytoplasm?)
4. Is there any sequence specificity to the DNA degradation that accompanies uptake and translocation?
5. What proteins interact directly with DNA during binding, uptake and translocation?
6. What recombination biases affect indels? (How efficiently are different indels transferred by recombination?)
7. What are the recombination biases along the full length of the chromosome?
8. How does mismatch repair affect the outcome of transformational recombination? How much of the recombination bias found by #7 is due to mismatch repair?

Proposal priorities

I've gone through our 2007 CIHR proposal, incorporating my polishing of the text and notes about things the reviewers liked or didn't like. I also added notes about what we could reasonably accomplish before Sept. 15; here is an updated version of the ideas I posted a few weeks ago.

Specific Aims:

1. "What is the H. influenzae uptake specificity? A pool of USSs that have been intensively but randomly mutagenized and then selected for the ability to be taken up by competent cells will be sequenced to fully specify the uptake bias." I've been getting info about designing and ordering the degenerate oligos. The post-doc had already set up a spreadsheet that does the calculations I wanted to think about, and he and I agreed that we should start with 12% degeneracy rather than 9%. This will reduce the fraction of oligos that are strongly preferred, giving more sensitivity for detecting weaker effects. I've had replies from two custom-oligo companies, and the good news is that our degenerate oligo pool will be easier to get and much cheaper than I had expected. For the proposal we'll only need to do some conventional sequencing, and as our USSs will already be in plasmids we think we'll just sequence each plasmid insert separately. This will be wasteful but not very expensive if we do the DNA reactions and cleanups ourselves, and probably cheaper when we consider the time/money we'll save by not having to troubleshoot a more 'efficient' sequencing strategy.
2. "What forces act on DNA during uptake? Laser-tweezer analysis of USS-dependent uptake by wild type and mutant cells will reveal the forces acting on the DNA at both the outer and inner membranes." My physicist collaborator is keen to have me back to get this working. The only thing I could aim to get done before September 15 is to attach some chromosomal DNA to the styrene beads and show that Bacillus subtilis will bind to it and pull on it in the tweezers apparatus, and that H. influenzae doesn't stick nonspecifically to beads with no DNA on them (a concern raised by a reviewer). The biotin-linked chromosomal DNA I'll use for this will also be used in other preliminary experiments.
3. "Does the USS polarize the direction of uptake? Using magnetic beads to block uptake of either end of a small DNA fragment will show whether DNA uptake is symmetric around the asymmetric USS." Our 1 micron paramagnetic streptavidin beads are on their way, and I'll use these to check for non-specific binding of cells to beads and for specific binding of competent cells to beads with DNA on them (using the same DNA prepared above). I found the source of the 50 micron beads and will order them tomorrow; maybe I can use them in the same way.
4. "Does the USS increase DNA flexibility? Cyclization of short USS-containing fragments will reveal whether the USS causes DNA to be intrinsically bent or flexible, and whether ethylation or nicking can replace parts of the USS." I'm going to try the nicking protocol tomorrow.
5. "Which proteins interact with incoming DNA? Cross-linking proteins to DNA tagged with magnetic beads, followed by HPLC-MS, will be used to isolate and identify proteins that directly contact DNA on the cell surface." We'll have the DNA on the big magnetic beads (see 3 above) and can use this to try out the formaldehyde cross-linking. It would be good to show that we can distinguish between non-specific proteins or peptides (independent of both DNA and competence) and peptides cross-linked only when cells are competent and the beads have DNA on them. One of the reviewers really liked this part, but thought we should be more ambitious and propose more diverse approaches to this problem, including in vitro cross-linking to purified secretin.
6. "Which proteins determine USS specificity? Heterologous complementation with homologs from the related Actinobacillus pleuropneumoniae (which recognizes a variant USS) will identify the proteins responsible for this specificity." It would be good to have results of some complementation experiments with single gene plasmids, as the whole-operon plasmids seemed to cause growth problems.

The RA will be back on Monday, and she, I and the post-doc will sit down together to work out who can try to do what. Then I'll start thinking about integrating these experiments and the post-doc's experiments into the NIH proposal.

Does EthBr turn restriction enzymes into nickases?

I'm looking for a way to create a nick at a specific point in a closed circular plasmid, to see if increased DNA flexibility facilitates DNA uptake. New England Biolabs now sells specific nickases, but their site preferences aren't very convenient. I also have an old paper (Shortle et al. 1982 PNAS 79:1588) describing use of ethidium bromide to convert restriction enzymes into nickases. So I think I'll just do a simple test. Here's the relevant paragraph from that paper:

Restriction Enzyme Nicking Reactions. Covalently closed circular pBR322 DNA was nicked with restriction endonucleases HindIII, Cla I, or BamHI by incubating 10 µg of plasmid DNA in a 100-µl solution of 20 mM Tris·HCl, pH 7.8, 7 mM MgCl2, 7 mM 2-mercaptoethanol*, gelatin* (100 µg/ml) and a concentration of ethidium bromide (50 µg/ml, 75 µg/ml, or 100 µg/ml, respectively) determined by titration to give an optimal level of nicking. An amount of restriction endonuclease was added sufficient to convert 50-90% of the input DNA to an open circular form on incubation at room temperature for 2-4 hr. The nicking reaction with the EcoRI enzyme consisted of 100 mM Tris·HCl, pH 7.6, 50 mM NaCl, 5 mM MgCl2, gelatin* (100 µg/ml), ethidium bromide (150 µg/ml). Reactions were stopped by addition of excess EDTA followed by phenol extraction and ethanol precipitation.

I have lots of supercoiled plasmid that has one EcoRI or BamHI or HindIII site, and a big bottle of 1 mg/ml EthBr. So I'll just set up a series of digests with different concentrations of EthBr and restriction enzyme and the standard digestion buffers*, and run a gel to see what I get.

*In the old days we used to add mercaptoethanol and gelatin to stabilize our restriction enzymes, but I won't bother.

Email to potential suppliers of a degenerate USS oligonucleotide

Dear oligonucleotide expert,

I'm writing to inquire about ordering a large highly degenerate oligonucleotide.

We're hoping to obtain an oligonucleotide that will be about 50 nt long and 12% degenerate at every position (so this will really be a population of degenerate oligonucleotides). Below I've listed specifications.

Would you be able to synthesize such an oligonucleotide for us? If so, could you let me know the estimated cost and time frame, and any other issues you think we should consider?

Thanks very much,

Rosie Redfield

Sequence: Not yet finalized, but probably similar to: AAAGTGCGGTTAATTTTTACAGTATTTTTACTGGTACCATATGAATTCCA

Degeneracy: Every position should be degenerate, with 4% each of the three other bases. For example, every position specified as an A should be 88% A and 4% each of T, G and C.

Scale: Because synthesizing this oligo will probably require a dedicated run, we would like to obtain as much as can easily be prepared in a single run.

(Calculation: 1 micromole of a 50 nt fragment is 6.02 x 10^17 molecules, with mass = 15 mg.)

Modifications: None.

Purification: For our experiments it will be important to minimize the fraction of oligos that are not full length. What purification methods do you recommend for oligos of this length, and what contamination level should we expect?
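
The scale calculation in the email above can be checked with a few lines of Python. This is a rough sketch: the molar mass uses a common rule of thumb for single-stranded DNA (about 303.7 g/mol per nucleotide plus 79 g/mol), which is an assumption here and not something a supplier specified, and the function name is made up.

```python
AVOGADRO = 6.022e23  # molecules per mole

def oligo_amounts(length_nt, micromoles, avg_nt_mass=303.7, end_correction=79.0):
    """Return (number of molecules, mass in mg) for a single-stranded oligo.

    avg_nt_mass and end_correction are rule-of-thumb values (assumptions),
    so the mass is approximate and ignores base composition.
    """
    moles = micromoles * 1e-6
    molecules = moles * AVOGADRO
    molar_mass = length_nt * avg_nt_mass + end_correction  # g/mol, approximate
    mass_mg = moles * molar_mass * 1e3
    return molecules, mass_mg

molecules, mass_mg = oligo_amounts(length_nt=50, micromoles=1.0)
print(f"{molecules:.3g} molecules, about {mass_mg:.1f} mg")
```

For 1 micromole of a 50 nt oligo this gives 6.02 x 10^17 molecules and a little over 15 mg, in line with the figure quoted in the email.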

Synthesis and sequencing of degenerate USS

One of our planned experiments (for our grant proposals) will thoroughly characterize the sequence bias of the H. influenzae DNA uptake system. We'll do this by having the cells take up plasmids from a pool containing many degenerate versions of the canonical preferred sequence, the USS, and then comparing the USS sequences of plasmids that were taken up with the sequences in the input plasmid pool.

What are the steps (the challenges)?

1. synthesizing the degenerate USS
2. putting the degenerate USS in a plasmid vector
3. having cells selectively take up plasmids from the pool
4. reisolating the plasmids that have been taken up
5. sequencing representative USS from the two plasmid pools (input and taken-up)
6. interpreting the sequences (characterizing the sequence diversity)

What I've thought of so far:

Steps 1 and 2. We'll obtain the degenerate USS as a pool of degenerate oligos, each with a 20 nt adapter on their 3' end. We can then make double-stranded versions using a sequence complementary to the adapter as a primer, using Klenow polymerase or Taq. Taq will leave A overhangs on both 3' ends and we'll use these to ligate the USSs into pTOPO or another TA-cloning vector. We'll order the oligos from a supplier, specifying the 30 nt USS consensus and flanking adapter sequence, and that each position should have 3% of each of the other nucleotides. They will probably need to be synthesized as a special batch, using specially prepared input nucleotide mixes (e.g. for A positions, 91% A and 3% each of G, T and C).

The 9% degeneracy was chosen to give an average of about 3 non-consensus bases in each USS, but I haven't done the math to predict the expected distribution. This is essential because we don't want the mix to consist mostly of USS variants that are just as good as the consensus. OK, the USS is 30 nt long, with 7 positions that show no consensus (bases at these positions probably don't influence uptake at all). So 21 positions that matter, with some probably mattering much more than others. That means the average USS will have only 2 non-consensus bases at positions that matter, and many will have the perfect consensus. I suspect this is too low - I'll work it out with the post-doc today.
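
The expected distribution can be worked out with a binomial model. This is a sketch that assumes each position is non-consensus independently with the same probability (9% or 12%); the position counts (30 in total, 21 that matter) are the ones used in this post, and the function name is made up.

```python
from math import comb

def mismatch_distribution(n_positions, degeneracy):
    """Binomial probabilities P(k non-consensus bases) for k = 0..n_positions,
    assuming every position varies independently with the same probability."""
    return [comb(n_positions, k)
            * degeneracy ** k
            * (1 - degeneracy) ** (n_positions - k)
            for k in range(n_positions + 1)]

for degeneracy in (0.09, 0.12):
    for n in (30, 21):
        dist = mismatch_distribution(n, degeneracy)
        mean = n * degeneracy
        print(f"{degeneracy:.0%} over {n} positions: mean {mean:.2f}, "
              f"P(perfect consensus) = {dist[0]:.3f}, "
              f"P(3 or more mismatches) = {sum(dist[3:]):.3f}")
```

Under this model, 9% degeneracy leaves roughly 14% of the oligos with a perfect match at the 21 positions that matter, and 12% degeneracy brings that down to roughly 7%.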

Step 3. The cells need to be as competent as possible (to maximize uptake and recovery), but they also need to be given the DNA at a concentration that maximizes their selectivity. So they shouldn't be given so much DNA that the uptake system is saturated. 200 ng/ml of DNA is saturating for standard preps of competent cells, so we'll give these cells 100 ng/ml. We usually have ~ 10^9 cells/ml, so if they each take up 10 plasmids we'll have a maximum of ~10^10 plasmids to reisolate and sequence (assuming no losses). Depending on other constraints, the whole experiment could also be done with short USS-containing fragments rather than plasmids.

Step 4. Reisolating the plasmids that have been taken up may be easiest if we use a rec-2 mutant, because these cells can't take DNA past the first stage of uptake. The post-doc is getting ready to do some preliminary tests of DNA recovery, using radiolabeled DNAs. If the plasmids remain un-nicked they can be isolated using a miniprep kit. If we can't reisolate them away from the chromosomal DNA we can instead PCR-amplify the USS sequences, using flanking plasmid and adapter sequences for the primers and the minimum number of PCR cycles to maintain diversity (to reduce amplification artefacts). We'd better think carefully about how much diversity our recovered DNA will have.

Step 5. The post-doc and I had a long discussion about the sequencing yesterday. For thorough sequencing of the input and taken-up pools we'd probably use one or two Illumina lanes - I need to be coached on the details of how this works. But for preliminary characterization (for the grant proposals) we could get by with much less sequencing - even sequencing just 100 of the recovered USSs should be enough to demonstrate that things are working. If the recovered USSs are in plasmids, we can just transform these into E. coli and do conventional Sanger/capillary sequencing of some inserts - this would be inefficient but not overly expensive. If the recovered DNA is in short linear fragments (from PCR?), I was originally suggesting that we'd ligate these end-to-end into blocks of 5-20 (depending on size) and clone these for conventional sequencing. But the post-doc pointed out that big problems can arise when sequencing repeats, so cloning and sequencing them singly might still be faster and cheaper than troubleshooting these problems.

Step 6. Analyzing 100 preliminary sequences for the grant proposal data will be no big deal, but analyzing many thousands from the Illumina runs will require a more sophisticated approach which I haven't thought through yet.
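
For the small preliminary set, one simple starting point (only a sketch, not a worked-out analysis plan) would be to compare, position by position, how often the consensus base appears in the input pool and in the taken-up pool. The function names, the toy 4-nt consensus and the toy sequences below are all made up for illustration.

```python
def consensus_match_fraction(sequences, consensus):
    """For each position, the fraction of sequences carrying the consensus base."""
    n = len(sequences)
    return [sum(seq[i] == base for seq in sequences) / n
            for i, base in enumerate(consensus)]

def consensus_enrichment(input_pool, recovered_pool, consensus):
    """Per-position difference (recovered minus input) in consensus-base
    frequency. Positive values mark positions where the consensus base is
    more common after uptake than before."""
    before = consensus_match_fraction(input_pool, consensus)
    after = consensus_match_fraction(recovered_pool, consensus)
    return [a - b for a, b in zip(after, before)]

# Toy data, for illustration only (not real USS sequences).
toy_consensus = "ACGT"
toy_input = ["ACGT", "ACGA", "TCGT", "ACGT"]
toy_recovered = ["ACGT", "ACGT", "ACGT", "ACGA"]
print(consensus_enrichment(toy_input, toy_recovered, toy_consensus))
```

With only about 100 sequences per pool the per-position differences will be noisy, so this would mainly show whether the selection is working at all.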

First uptake experiment

Didn't work.

The plasmid DNA minipreps didn't contain any plasmid at all (except for the positive control that I'd seeded with a known amount of plasmid). But this might just be because my frozen competent cells were old and not very competent at all. Their transformation frequency was only about 2.5 x 10^-4, so even if the competent cells had taken up supercoiled plasmid I probably wouldn't have been able to detect it in my minipreps.

I may repeat the experiment one time, using fresh competent wildtype and rec-2 cells. (The rec-2 mutant cells can't transport DNA out of the periplasm, so may give higher sensitivity.) Or I may decide that this experiment shouldn't be high on my priority list because it isn't something the reviewers of our previous proposal were concerned about.