Not your typical science blog, but an 'open science' research blog. Watch me fumbling my way towards understanding how and why bacteria take up DNA, and getting distracted by other cool questions.
Manuscript drags on
Analysis of uptake sequences by score
The top set are the logos for 5381 N. meningitidis DUSs. The numbers are different from those in yesterday's post because I realized I had been analyzing a N. gonorrhoeae data set. The overall picture is the same for N. meningitidis and N. gonorrhoeae - low-scoring DUS retain strong consensus for most of the central positions but have only very weak consensuses for the other positions. The drop-off is quite steep. The shapes of the logos are about the same for all the occurrences with scores lower than about 0.95.
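WebLogo scales each letter stack by the information content at that position, so the way the consensus decays with score can also be tracked as numbers rather than by eye. The sketch below makes several assumptions. The aligned DUS or USS occurrences are available as equal-length strings, each paired with its Gibbs score. The function names are hypothetical. The plain Shannon formula is used (2 bits minus the entropy), without the small-sample correction that WebLogo can apply.

```python
import math
from collections import Counter


def information_content(seqs):
    """Per-position information content (bits) of equal-length DNA sequences.

    Uses the uncorrected Shannon formula 2 - H; no small-sample correction.
    """
    ic = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        n = sum(counts.values())
        # Entropy of the base distribution at this position.
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        ic.append(2.0 - h)
    return ic


def ic_by_score(sites, cutoff=0.95):
    """Split (sequence, score) pairs at a score cutoff and profile each group."""
    high = [seq for seq, score in sites if score >= cutoff]
    low = [seq for seq, score in sites if score < cutoff]
    return information_content(high), information_content(low)
```

For example, `ic_by_score(sites, cutoff=0.95)` would compare occurrences above and below the score at which the logo shapes seem to stop changing.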
The H. influenzae dataset is even more skewed; almost 60% of the USSs have perfect scores, and about 8% have zero scores. But the consensus decays fairly evenly across the positions, and even the zero-score occurrences have the full motif. Like the N. meningitidis DUS, the shapes of the USS logos are about the same for all occurrences with scores below 0.95.
I think the question in my mind was whether there is an obvious place to draw a line between 'real uptake sequence' and 'degenerate sequence that doesn't deserve to be treated as an uptake sequence'. Unfortunately the analysis is complicated by the different sizes of the datasets - the N. meningitidis set has almost twice as many sites as the H. influenzae set.
OK, I've dug out another set of H. influenzae runs, done with a high 'expected' setting to maximize the number of sites found. This has 3466 USSs, with many more zero scores than in the previous set. Now the first and last Gs in the core are seen to be weaker in USSs with low scores, though not in the larger set of USSs with zero scores. Overall, though, the shape of the consensus stays the same even as scores and consensus strength decrease. Notably, the flanking AT-rich segments remain as important in poorly matched USSs as the core does.
Poster
While there I sat down with my former post-doc to discuss our manuscript on uptake sequence variation. We agreed that it needed major reorganization more urgently than it needed simple editing and polishing, so we worked out a new structure that we think will hold the ideas together much better. I made some new figures (cartoons of our explanation and of how our simulation model works) and rearranged the text into its new order, but I needed a paper copy to do any serious editing. Now that I'm home I've printed out the rearranged draft and am hoping to do one quick pass through it and then set it aside till our grant proposals are done.
But of course I immediately got distracted by the data. I want to be able to say something about how our new analysis of uptake sequences as motifs gives us insight into their evolution. The most promising angle is the distribution of good and poor matches to the motif, which I can analyze because the Gibbs Motif Sampler assigns a score to each occurrence it finds.
I dug out a set of 4646 DUSs Gibbs found in the N. meningitidis genome, and sorted them by score. And now I've spent a lot of time trying to force Excel to draw a proper histogram. I get the histogram values using a math teachers' web site called Illuminations and paste them into Excel, but Excel refuses to use the data ranges as the X-axis (instead it uses its own row numbers). I've found a work-around - the graph is very ugly but here it is. The red bars are the numbers of DUS with each score range (0-.02, .02-.04, etc) and the lilac bars are the cumulative numbers with increasing scores. So about 0.5% of DUS have zero scores, very few have non-zero scores lower than 0.5, about 2% have scores in each category from 0.5 to 0.98, and more than 40% have scores greater than 0.98.
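The binned and cumulative counts can also be computed directly from the sorted scores, which avoids the round trip through a web tool. This is a minimal sketch. It assumes the Gibbs scores are a list of floats between 0 and 1 and uses the same 0.02-wide bins as the graph. The function name is hypothetical.

```python
def score_histogram(scores, bin_width=0.02):
    """Count scores in bins [0, w), [w, 2w), ... with a cumulative percentage.

    A score of exactly 1.0 is placed in the last bin. A tiny epsilon guards
    against floating-point quotients like 0.06 / 0.02 == 2.9999999999999996.
    Returns (lower, upper, count, cumulative_percent) tuples.
    """
    n_bins = round(1 / bin_width)
    counts = [0] * n_bins
    for s in scores:
        idx = min(int(s / bin_width + 1e-9), n_bins - 1)
        counts[idx] += 1
    total = len(scores)
    rows, running = [], 0
    for i, c in enumerate(counts):
        running += c
        lower = i * bin_width
        rows.append((round(lower, 4), round(lower + bin_width, 4), c,
                     100.0 * running / total))
    return rows
```

The lower and upper bounds in each row can be written out as text labels such as "0.50-0.52" alongside the counts, so the spreadsheet has explicit category names to plot against.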
I made weblogos for the top-scoring 50% and bottom-scoring 50% of the DUS occurrences (my previous analysis had only looked at the high-scoring ones). Here they are. The bottom 50% logo isn't evenly weak at all positions; instead it's quite strong at some positions and very much weaker at others. I don't know what I'm going to do with this analysis... I guess I could make a range of logos, maybe for the 10 deciles (is that the word I want?), to see how the consensus decays. And I should probably do the same thing for the H. influenzae USSs.
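Deciles is indeed the word. Splitting the occurrences into score-ranked groups before making logos takes only a few lines. This is a sketch that assumes the sites are (sequence, score) pairs and uses a hypothetical function name. With k=2 it gives the top and bottom halves, and with k=10 it gives deciles. When the total isn't divisible by k, the group sizes differ by at most one.

```python
def split_into_quantiles(sites, k=10):
    """Sort (sequence, score) pairs by score, highest first, and split them
    into k groups of near-equal size (k=2: halves, k=10: deciles)."""
    ranked = sorted(sites, key=lambda pair: pair[1], reverse=True)
    n = len(ranked)
    return [ranked[i * n // k:(i + 1) * n // k] for i in range(k)]
```

The sequences in each group could then be written to a separate file and given to the logo tool one group at a time.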
Controlled nicking of supercoiled plasmid DNA
Last week I tried to introduce single-strand nicks into a closed circular plasmid (pUSS-R) by doing restriction digestion in the presence of ethidium bromide (background is here). I tested two different dilutions of HindIII, a restriction enzyme that should cut this plasmid once, with three different concentrations of ethidium bromide, to find conditions where a convenient range of incubation times gave a range of partially digested molecules.
I was hoping to see what's shown in the upper gel drawing - appearance of a novel band that migrated slower than the fully digested DNA in the rightmost lane and much slower than the supercoiled DNA in the leftmost lane. This new band would contain plasmid that had undergone a single-strand nick at its single HindIII site. I didn't know whether I might also see eventual appearance of linear DNA. But instead I saw what's shown in the middle gel drawing - gradual appearance of a linear-sized band as the supercoiled band disappeared.
I wondered if the HindIII now being sold no longer causes nicking (maybe New England Biolabs has 'improved' their HindIII clone...). So I did the control shown in the lower gel drawing, incubating pUSS-R with a very low concentration of DNase I, an enzyme widely used to create nicks in double-stranded DNA. Again I expected to see a novel, slow-migrating band, but instead saw only a linear-sized band. Damn!
I also ran samples from a few old plasmid preps. Most of these contained several bands, but I don't know if the slowest one is relaxed circles or dimers, because the plasmids may have been prepared from rec+ cells. I've now also checked the old literature, just in case I was mistaken in expecting nicked/relaxed DNA ("form II") to migrate slowly, but indeed it should (see for example the marker lanes in Fig. 2 of PNAS 86:1309).
Now what? Is the problem that the nicked DNA migrates at the same speed as linear DNA? Is this specific to this particular plasmid? Or do I not have any nicked DNA? Should I try another plasmid?
What to propose to NIH?
Goal: To fully characterize all of the biases and sequence specificities of transformational recombination in H. influenzae.
Specific questions to answer:
- Is DNA binding a distinct step that precedes the initiation of DNA uptake? If so, does it have the same sequence specificity as DNA uptake? Does it have any topological specificity?
- What is the complete sequence specificity of DNA uptake? How absolute is the requirement for a good match to the USS consensus at uptake (are non-USS DNAs taken up at lower frequency or not at all)? This will be investigated with plasmids or short fragments containing 12% degenerate USSs, and with ones containing completely random sequences (we could create these or just use fragments of unrelated DNA). Does uptake have any topological specificity?
- Does the translocation step impose any sequence specificity? How strict is the requirement for a pre-existing free end (does circular DNA sometimes get cut or nicked in the periplasm, or transported intact into the cytoplasm)?
- Is there any sequence specificity to the DNA degradation that accompanies uptake and translocation?
- What proteins interact directly with DNA during binding, uptake and translocation?
- What recombination biases affect indels? (How efficiently are different indels transferred by recombination?)
- What are the recombination biases along the full length of the chromosome?
- How does mismatch repair affect the outcome of transformational recombination? How much of the chromosome-wide recombination bias identified in the previous question is due to mismatch repair?
Proposal priorities
Specific Aims:
- "What is the H. influenzae uptake specificity? A pool of USSs that have been intensively but randomly mutagenized and then selected for the ability to be taken up by competent cells will be sequenced to fully specify the uptake bias." I've been getting info about designing and ordering the degenerate oligos. The post-doc had already set up a spreadsheet that does the calculations I wanted to think about, and he and I agreed that we should start with 12% degeneracy rather than 9%. This will reduce the fraction of oligos that are strongly preferred, giving more sensitivity for detecting weaker effects. I've had replies from two custom-oligo companies, and the good news is that our degenerate oligo pool will be easier to get and much cheaper than I had expected. For the proposal we'll only need to do some conventional sequencing, and as our USSs will already be in plasmids we think we'll just sequence each plasmid insert separately. This will be wasteful but not very expensive if we do the sequencing reactions and cleanups ourselves, and probably cheaper when we consider the time/money we'll save by not having to troubleshoot a more 'efficient' sequencing strategy.
- "What forces act on DNA during uptake? Laser-tweezer analysis of USS-dependent uptake by wild type and mutant cells will reveal the forces acting on the DNA at both the outer and inner membranes." My physicist collaborator is keen to have me back to get this working. The only thing I could aim to get done before September 15 is to attach some chromosomal DNA to the styrene beads and show that Bacillus subtilis will bind to it and pull on it in the tweezers apparatus, and that H. influenzae doesn't stick nonspecifically to beads with no DNA on them (a concern raised by a reviewer). The biotin-linked chromosomal DNA I'll use for this will also be used in other preliminary experiments.
- "Does the USS polarize the direction of uptake? Using magnetic beads to block uptake of either end of a small DNA fragment will show whether DNA uptake is symmetric around the asymmetric USS." Our 1 micron paramagnetic streptavidin beads are on their way, and I'll use these to check for non-specific binding of cells to beads and for specific binding of competent cells to beads with DNA on them (using the same DNA prepared above). I found the source of the 50 micron beads and will order them tomorrow; maybe I can use them in the same way.
- "Does the USS increase DNA flexibility? Cyclization of short USS-containing fragments will reveal whether the USS causes DNA to be intrinsically bent or flexible, and whether ethylation or nicking can replace parts of the USS." I'm going to try the nicking protocol tomorrow.
- "Which proteins interact with incoming DNA? Cross-linking proteins to DNA tagged with magnetic beads, followed by HPLC-MS, will be used to isolate and identify proteins that directly contact DNA on the cell surface." We'll have the DNA on the big (50 micron) magnetic beads from the uptake-polarity aim above and can use this to try out the formaldehyde cross-linking. It would be good to show that we can distinguish between non-specific proteins or peptides (independent of both DNA and competence) and peptides cross-linked only when cells are competent and the beads have DNA on them. One of the reviewers really liked this part, but thought we should be more ambitious and propose more diverse approaches to this problem, including in vitro cross-linking to purified secretin.
- "Which proteins determine USS specificity? Heterologous complementation with homologs from the related Actinobacillus pleuropneumoniae (which recognizes a variant USS) will identify the proteins responsible for this specificity." It would be good to have results of some complementation experiments with single gene plasmids, as the whole-operon plasmids seemed to cause growth problems.
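The choice between 9% and 12% degeneracy comes down to how many substitutions a typical oligo carries. The spreadsheet's exact calculations aren't described here, so this is only a minimal sketch. It assumes every position mutates independently, which is the binomial model implied by an 88% A / 4% T / 4% G / 4% C style specification. The 50-nt length is taken from the oligo in the supplier email, and the `length` parameter can be set to cover just the USS region instead. The function name is hypothetical.

```python
from math import comb


def mutation_distribution(length, degeneracy):
    """Binomial probability that one oligo carries exactly k substitutions,
    for k = 0..length, assuming each position is independently non-consensus
    with probability `degeneracy` (0.12 = 88% consensus + 4% x 3 others)."""
    return [comb(length, k) * degeneracy ** k * (1 - degeneracy) ** (length - k)
            for k in range(length + 1)]


for p in (0.09, 0.12):
    dist = mutation_distribution(50, p)
    mean = sum(k * prob for k, prob in enumerate(dist))
    print(f"{p:.0%} degeneracy: P(0 changes) = {dist[0]:.4f}, "
          f"P(<=2 changes) = {sum(dist[:3]):.4f}, mean = {mean:.1f}")
```

Under this model, raising the degeneracy from 9% to 12% moves the mean from 4.5 to 6 substitutions per 50-nt oligo. That shift is why fewer oligos stay close enough to the consensus to be strongly preferred.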
Does EthBr turn restriction enzymes into nickases?
Restriction Enzyme Nicking Reactions. Covalently closed circular pBR322 DNA was nicked with restriction endonucleases HindIII, ClaI, or BamHI by incubating 10 µg of plasmid DNA in a 100-µl solution of 20 mM Tris·HCl, pH 7.8, 7 mM MgCl2, 7 mM 2-mercaptoethanol*, gelatin* (100 µg/ml) and a concentration of ethidium bromide (50 µg/ml, 75 µg/ml, or 100 µg/ml, respectively) determined by titration to give an optimal level of nicking. An amount of restriction endonuclease was added sufficient to convert 50-90% of the input DNA to an open circular form on incubation at room temperature for 2-4 hr. The nicking reaction with the EcoRI enzyme consisted of 100 mM Tris·HCl, pH 7.6, 50 mM NaCl, 5 mM MgCl2, gelatin* (100 µg/ml), ethidium bromide (150 µg/ml). Reactions were stopped by addition of excess EDTA followed by phenol extraction and ethanol precipitation.
I have lots of supercoiled plasmid that has one EcoRI or BamHI or HindIII site, and a big bottle of 1 mg/ml EthBr. So I'll just set up a series of digests with different concentrations of EthBr and restriction enzyme and the standard digestion buffers*, and run a gel to see what I get.
*In the old days we used to add mercaptoethanol and gelatin to stabilize our restriction enzymes, but I won't bother.
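With a 1 mg/ml stock, each digest in the series just needs a C1V1 = C2V2 calculation. This sketch assumes 100-µl reactions, as in the quoted protocol, and the 50 to 150 µg/ml EthBr concentrations it lists. The function name is hypothetical, and `total_ul` can be changed if the digests are smaller.

```python
def stock_volume_ul(final_ug_per_ml, total_ul=100.0, stock_ug_per_ml=1000.0):
    """Microlitres of stock needed so that C1 * V1 = C2 * V2."""
    if final_ug_per_ml > stock_ug_per_ml:
        raise ValueError("final concentration exceeds stock concentration")
    return final_ug_per_ml * total_ul / stock_ug_per_ml


# Concentrations from the quoted protocol; 100 ul reactions, 1 mg/ml stock.
for conc in (50, 75, 100, 150):
    print(f"{conc} ug/ml EthBr: add {stock_volume_ul(conc):.1f} ul of 1 mg/ml stock")
```

At the top of the range the EthBr stock makes up 15% of the reaction volume. The buffer and water volumes need to be reduced to match so that the digestion buffer stays at 1x.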
Email to potential suppliers of a degenerate USS oligonucleotide
I'm writing to inquire about ordering a large highly degenerate oligonucleotide.
We're hoping to obtain an oligonucleotide that will be about 50 nt long and 12% degenerate at every position (so this will really be a population of degenerate oligonucleotides). Below I've listed the specifications.
Would you be able to synthesize such an oligonucleotide for us? If so, could you let me know the estimated cost and time frame, and any other issues you think we should consider?
Thanks very much,
Rosie Redfield
Sequence: Not yet finalized, but probably similar to: AAAGTGCGGTTAATTTTTACAGTATTTTTACTGGTACCATATGAATTCCA
Degeneracy: Every position should be degenerate, with 4% each of the three other bases. For example, every position specified as an A should be 88% A and 4% each of T, G and C.
Scale: Because synthesizing this oligo will probably require a dedicated run, we would like to obtain as much as can easily be prepared in a single run.
(Calculation: 1 micromole of a 50 nt fragment is 6.02 x 10^17 molecules, with mass = 15 mg.)
Modifications: None.
Purification: For our experiments it will be important to minimize the fraction of oligos that are not full length. What purification methods do you recommend for oligos of this length, and what contamination level should we expect?
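The scale calculation in the email can be checked in a few lines. This sketch assumes an average mass of about 300 g/mol per nucleotide residue, which is what the 15 mg figure implies. Values closer to 330 g/mol are also commonly used and would give about 16.5 mg. The function name is hypothetical.

```python
AVOGADRO = 6.022e23  # molecules per mole


def oligo_yield(micromoles, length_nt, avg_nt_mass=300.0):
    """Return (molecules, mass in mg) for a given synthesis scale.

    avg_nt_mass is an assumed average mass per nucleotide residue (g/mol).
    """
    moles = micromoles * 1e-6
    molecules = moles * AVOGADRO
    mass_mg = moles * length_nt * avg_nt_mass * 1e3  # grams -> milligrams
    return molecules, mass_mg


molecules, mass_mg = oligo_yield(1, 50)
print(f"{molecules:.3g} molecules, {mass_mg:.1f} mg")
```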
Synthesis and sequencing of degenerate USS
- synthesizing the degenerate USS
- putting the degenerate USS in a plasmid vector
- having cells selectively take up plasmids from the pool
- reisolating the plasmids that have been taken up
- sequencing representative USS from the two plasmid pools (input and taken-up)
- interpreting the sequences (characterizing the sequence diversity)
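For the last step, one simple way to characterize what uptake selects for is to compare base frequencies at each position between the input and taken-up pools. This is a hypothetical sketch that assumes the USS inserts have already been extracted and aligned as equal-length strings. A ratio above 1 means a base is over-represented among the taken-up plasmids. The pseudocount is an arbitrary choice that keeps ratios finite when a base is missing from a small conventionally sequenced sample.

```python
from collections import Counter

BASES = "ACGT"


def enrichment(input_seqs, uptake_seqs, pseudocount=0.5):
    """Per-position ratio of each base's frequency in the taken-up pool to
    its frequency in the input pool, with a pseudocount added to every base."""
    n_in, n_up = len(input_seqs), len(uptake_seqs)
    result = []
    for i in range(len(input_seqs[0])):
        c_in = Counter(s[i] for s in input_seqs)
        c_up = Counter(s[i] for s in uptake_seqs)
        row = {}
        for b in BASES:
            f_in = (c_in[b] + pseudocount) / (n_in + 4 * pseudocount)
            f_up = (c_up[b] + pseudocount) / (n_up + 4 * pseudocount)
            row[b] = f_up / f_in
        result.append(row)
    return result
```

Positions where the consensus base shows a large ratio would be the ones where uptake is most sensitive to mismatches. With only a few dozen sequences per pool, the ratios will be noisy.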