Field of Science

Planning the DNA sequencing part of the PhD student's project

The former post-doc (I'll call him the FPD) visited yesterday afternoon, and we had intense discussions of how to proceed with both the RNAseq work (summarized here on our Sense Strand blog) and with the PhD student's planned DNA uptake experiments.

His planned experiments take advantage of the phenotype of a rec2 knockout mutation.  These cells take up DNA normally across the outer membrane, into the periplasmic space, but they cannot transport it across the inner cell membrane.  This allows him to recover intact DNA that has been taken up, and to use DNA sequencing to compare it to the input DNA the ∆rec2 cells were given.

Some of the experiments will use genomic DNA of the species being tested, fragmented to appropriate length distributions, and some will use synthetic DNA fragments (~200 bp) containing a 30-50 bp stretch of random sequence (see figure).

The FPD, who developed the synthetic fragment protocol, pointed out that his experiments had used full lanes of Illumina sequencing only because it was not then possible for us to 'barcode' our different DNA samples and mix them for sequencing as a single lane.  The sequencing depth he obtained was useful, but it will be extreme overkill for the experiments the PhD student plans.  So we need to design barcoding into our analyses, so we can mix up to 24 samples in one lane for sequencing, and then separate the resulting sets of sequence reads by their different barcodes.  We'll still need to use two lanes, because each 'recovered' sample will need to have a corresponding identical 'input' sample.  Because these samples will have the same barcode they could not be distinguished if they were sequenced in the same lane.

So rather than doing one very deeply sequenced experiment, he'll be able to do multiple replicates, each sequenced at a moderate but entirely adequate depth.  If he uses a HiSeq machine for the sequencing, he'll be able to get 1.6 x 10^8 reads for each of 12 samples; with a NextSeq this would give 4 x 10^8 reads per sample.  (Is that right, per sample, not per lane?)

One issue to keep in mind is that it would be foolish to save all the sequencing for one big batch at the end of the thesis work.  Instead the work needs to be designed with an initial set of samples to be sequenced, so he can (1) tell whether everything is working as it should, and (2) begin analyzing sequence data from one part of the project while generating additional samples for other parts.  For a preliminary batch of sequencing, it might be better to use a MiSeq machine, whose smaller capacity would let us sequence a few samples more economically.

We also talked about how long the random-sequence segments should be in the 200 bp fragments, and about where to locate the barcode segments.  These consist of an independent sequencing primer followed by 8 bp that identify the source experiment.  Putting these to the right of the random segment will let him efficiently create the double-stranded 200 bp fragments, using the same long left-side oligo (containing the random segment) with many different right-side oligos, each containing a different barcode.
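Here is a minimal sketch of how one lane's reads could be separated by their 8 bp barcodes.  The barcode sequences and sample names are made up for illustration, and the one-mismatch tolerance is an assumption, not something we've decided.  Because each 'input' sample shares its barcode with the matching 'recovered' sample, the two lanes would have to be run through this separately.

```python
from collections import defaultdict

# Hypothetical 8-bp barcodes; the real ones would come from the right-side oligos.
BARCODES = {
    "ACGTACGT": "replicate_1",
    "TGCATGCA": "replicate_2",
}


def hamming(a, b):
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(read_pairs, barcodes, max_mismatches=1):
    """Group insert reads by the sample whose barcode matches the index read.

    read_pairs: iterable of (index_read, insert_read) strings.
    A read is assigned only if exactly one barcode is within the mismatch
    tolerance; everything else goes to 'unassigned'.
    """
    bins = defaultdict(list)
    for index_read, insert_read in read_pairs:
        hits = [
            sample
            for barcode, sample in barcodes.items()
            if len(barcode) == len(index_read)
            and hamming(barcode, index_read) <= max_mismatches
        ]
        key = hits[0] if len(hits) == 1 else "unassigned"
        bins[key].append(insert_read)
    return dict(bins)
```

In practice the sequencing facility's own software would probably do this step; the sketch is only meant to make the logic explicit.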

Sensitivity of the PhD student's planned analysis

The PhD student is proposing to use Illumina sequencing of input and recovered-after-uptake DNAs to detect possible biases in uptake of DNA by bacteria other than H. influenzae.  (This is a simplified version of the analysis proposed in our funded NSERC proposal.) We're discussing the factors that will affect the sensitivity of this analysis, so he can say how strong a bias would have to be in order for his experiment to detect it.

The factors we've thought of are:

A. Nature of the preferred sequence pattern: 
  1. How long is it (3 bp? 10 bp?)?  How specific is it (e.g. is each base specified, or just 'purine' or 'pyrimidine')?  Together these determine how often this pattern will occur in the input DNA (by chance or due to uptake bias-drive).
  2. How strong is the bias favouring uptake of fragments containing this pattern?  How strict is the preference (are variants of the specified pattern also taken up, but less strongly)?  Are fragments with more than one occurrence of the pattern more likely to be taken up?
B. Properties of the input DNA:
  1. If this is genomic DNA, what is the size range of the fragments?   The sensitivity of the experiment will be low if the fragments are so large that each has at least one occurrence of the preferred pattern.
  2. If this is a synthetic fragment containing a fully degenerate segment, how long is the degenerate segment?
C. Sequencing coverage:
  1. How high is the sequencing coverage?  Is it the same for the control input DNA and for the recovered DNA?  This will determine the noise due to random factors.  
  2. Does the error rate of the sequencing matter?
  3. For genomic input DNA, are there position-specific differences in coverage across the genome?
  4. For degenerate-fragment DNA, are there non-random factors in the input DNA or in its sequence-ability?
He's going to start by working through the values for a very-strong-bias case, detecting the H. influenzae uptake sequence in genomic DNA (figure below), and then relaxing the inputs.
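Factors A1 and B1 interact in a way that can be roughed out numerically.  The sketch below estimates how often a random fragment contains at least one occurrence of a pattern.  It assumes all four bases are equally frequent and treats every start position as independent (both are simplifications), and the function names are my own.

```python
# Number of bases allowed by each symbol (a small subset of the IUPAC codes).
ALLOWED = {"A": 1, "C": 1, "G": 1, "T": 1, "R": 2, "Y": 2, "N": 4}


def p_site_match(pattern):
    """Chance that one random position matches the pattern,
    assuming equal base frequencies (an assumption)."""
    p = 1.0
    for symbol in pattern:
        p *= ALLOWED[symbol] / 4
    return p


def p_fragment_has_pattern(pattern, fragment_len, both_strands=True):
    """Approximate chance that a random fragment has >= 1 occurrence.

    Treats start positions as independent, and counts both strands as
    separate chances (which slightly overcounts palindromic patterns).
    """
    starts = fragment_len - len(pattern) + 1
    if starts <= 0:
        return 0.0
    if both_strands:
        starts *= 2
    return 1 - (1 - p_site_match(pattern)) ** starts


if __name__ == "__main__":
    for length in (200, 1000, 10000):
        for pattern in ("ACG", "ACGTACGTAC"):
            p = p_fragment_has_pattern(pattern, length)
            print(f"{pattern:>10} in {length:>5} bp fragments: {p:.4f}")
```

When this probability is close to 1 for the input fragments, nearly every fragment carries the pattern and uptake can't discriminate among them, which is the low-sensitivity situation described in B1.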

Mutagenesis plans

(I'll add some explanations later.)

1.  Mutagenize more RR805 DNA, using a range of high EMS doses (10, 15, 20, 25, 30, 40 min in 50 mM).  Transform this DNA directly into competent KW20 (without EMS inactivation or DNA purification) and select for CmR and maybe for NovR.

2.  Mutagenize RR805 cells, using a range of high EMS doses (from expt. #180, 80 mM for 1 hr gives ~10^-2 survival).  The cells don't need to survive, because I'll just grow the culture for a couple of hours and then extract all the DNA and use that DNA to transform KW20 to CmR.

For both 1 and 2, I'll then pool the CmR transformants and transform the pool at low cell density to StrR with RR514 DNA.  Then I'll test individual StrR colonies for hypercompetence by colony transformation with MAP7 DNA.

3. Mutagenize NovR and NovS PCR fragments (made by the sabbatical visitor), using the same EMS concentrations as in experiment 1.  Then test the effects of the EMS mutagenesis by transforming each DNA into KW20, looking for gain of NovR in cells transformed with the NovS DNA, and loss of transforming ability of the NovR DNA.

I can do experiments 1 and 3 today (if I first pour lots of plates).  I can then do Experiment 2 tomorrow or on the weekend, once the cells have grown up.


1.  I must have put too little chloramphenicol in the Cm plates for this experiment, because all the cells grew on the Cm plates.  I need to repeat this experiment.

3.  Increasing exposure to EMS caused decreased transformation by the NovR fragment, as it should, but the corresponding exposures of the NovS fragment gave no NovR transformants, indicating no detectable mutagenesis.  So the decrease seen with the NovR fragment may just be due to damage, not mutation.

2.  My streak of RR805 cells has grown nice little colonies.


I've inoculated one of the RR805 colonies for an overnight culture, so I will be able to do the experiment 2 cell mutagenesis tomorrow.  And tomorrow I'll make lots and lots of Cm plates, with the right amount of chloramphenicol, so I can also repeat experiment 1.

No new candidate mutants (sigh...)

As I planned here, I pooled the CmR colonies resulting from transformation with EMS-mutagenized CmR murE+ DNA, and grew them to log phase (OD600 ~ 0.1).  The murE+ cells in the pool should have been non-competent under these conditions, but any murE* hypercompetence mutants should have been competent.  To select for these mutants I transformed the cells in each pool with DNA carrying a streptomycin-resistance mutation, and plated on Str plates.  One pool gave several hundred StrR colonies (many more than I would have expected as transformants), but the other pools had very few or none (4 total).  I then screened individual StrR colonies by mixing them with dilute NovR DNA and plating on Nov plates.

Unfortunately none of the StrR colonies transformed to NovR at the high frequency seen for the positive control (murE749) colonies.  In fact, none transformed any better than the murE+ negative control colonies.

This is a bit surprising, given that the 2-fold higher level of EMS mutagenesis reduced by 100-fold the ability of the CmR cassette to transform cells, and the 4-fold higher level eliminated it entirely.  I had assumed that this reduction/elimination was due to too-heavy mutagenesis, but perhaps it was a direct consequence of the DNA damage.  One possible explanation I'm considering is that damaged DNA is almost always repaired or destroyed, and rarely gives rise to recombinants.  Another possibility is that, when cells are mutagenized, mutations arise mainly when levels of damage are so high as to overwhelm the repair systems, allowing the damaged bases to be used as templates for DNA replication.  Maybe this also requires induction of the error-prone DNA polymerase.

So now the sabbatical visitor and I are designing a control experiment, to test whether this direct DNA mutagenesis is working as we think it should.  We're going to mutagenize two versions of a DNA fragment containing the gyrB locus.  One is wildtype, and the other has the novR allele we usually use in our transformation assays.  We expect the transformation efficiency of the novR allele to decline with high doses of EMS, and we hope that new novR mutations will arise from high doses to the wildtype allele.

* Here's some wishful thinking: Ideally we should be selecting for a G->A transition mutation because those are what EMS induces best.  But we're using novR (G->T) because we have the primers handy and know they work.  The mutation spectrum of EMS is reported to be much broader with the in vitro mutagenesis we're using, so we hope this will work.  But I just checked the numbers and they didn't see ANY of the kind of change we'd need.

Really we should use selection for streptomycin resistance, since its T->C mutation is a type that arose at high frequency with the in vitro EMS treatment.  I wonder if we have the primers for this - I think the post-doc might have gotten them for us.

Mutagenesis results

I don't have any novobiocin-resistant transformant-mutants after 24 hr (though slow-growing colonies might appear later), so I can't use that to tell how effective the mutagenesis was.  But I have tons of chloramphenicol-resistant ones at the low exposures to EMS (2, 5 and 13 minutes), 100-fold fewer at 30 minutes exposure and none at 60 minutes exposure (the highest dose).  This tells me that the EMS was doing its job, and that the DNA damage caused many potential transformants to have lethal mutations either in the CAT cassette or in nearby genes in the recombination tract.

So I think I'll go ahead and make pools of colonies from the 5-min and 13-min treatments and enrich them for hypercompetent mutants by selecting for StrR transformants in log-phase cultures. Then tomorrow I can screen these for hypercompetence by our crude colony-transformation assay.

Why not also the 2-min treatment?  OK, I'll include one pool of those too.

I'll have five pools (10^4 and 10^5 transformants from each of the 5-min and 13-min treatments, plus one pool from the 2-min treatment), which will be easy to handle.  What control cultures should I include?  RR805 (murE+) will give negative control colonies, and RR797 (murE749) will give positive control colonies.

* One reason to not use the ~1000 CmR colonies from the 30-min dose is that these are less likely to have recombination tracts extending all the way from the CAT cassette to murE.  That's because this segment contains two essential genes (ftsI & ftsL), and recombination tracts that cover the CAT-murE distance are much more likely to have had a lethal mutation in one of these genes than are tracts that don't reach to murE.

Mutagenesis planning in progress

By midday today I'll have checked my strains and made my DNA. 

The strains are RR805, which has a CAT cassette linked to the murE+ gene (normal competence), and RR797, which has the same cassette linked to the murE749 hypercompetence allele and a StrR point mutation elsewhere in the chromosome. I've checked their antibiotic resistances, done platings that will confirm their competence phenotypes (will count colonies this morning), and made crude DNA preps (I'll complete purification this morning).

Next I should do the mutagenesis dose-response curve, and I've now realized that this experiment can also be used for the first hunt for more hypercompetence mutants. 

Mutagenesis (today?):

Set up one tube containing 12 µg RR805 chromosomal DNA in 120 µl water or TE, at 37 °C.

Take a 20 µl time = 0 sample (see below).

Add EMS to the remaining DNA, to a final concentration of 50 mM. 

Take samples at time = 2, 5, 12, 30 and 60 minutes.  Immediately add each sample (including t = 0) to 100 µl of 5% sodium thiosulfate, which will inactivate the EMS and stop the mutagenesis.

The t = 2 sample will have had about 6-fold less exposure to EMS than used by the Lai et al. paper, and the final sample will have had 5-fold more.
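For reference, here is the dose series expressed relative to the Lai et al. exposure.  This assumes that exposure scales linearly with time at a fixed EMS concentration, and that the Lai et al. treatment is equivalent to 12 minutes under these conditions (a value inferred from the 6-fold and 5-fold ratios, not taken from the paper).

```python
# Assumption: the reference exposure corresponds to 12 min at 50 mM EMS,
# inferred from the 6-fold-less / 5-fold-more ratios in the plan.
REFERENCE_MIN = 12


def relative_exposure(minutes, reference_min=REFERENCE_MIN):
    """Exposure relative to the reference, assuming it scales linearly with time."""
    return minutes / reference_min


if __name__ == "__main__":
    for t in (0, 2, 5, 12, 30, 60):
        print(f"t = {t:>2} min: {relative_exposure(t):.2f} x reference exposure")
```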

Add NaCl to each sample to 0.15 M and add 2 volumes of ethanol to precipitate the DNA.  Rinse the pellets (probably invisible) with 70% ethanol and air dry.  Resuspend each in 50 µl TE.  (If the invisibility of the pellets is a problem I could add some E. coli DNA as carrier, since this won't interfere with the subsequent transformations.)

Transformations (today):

Thaw out lots of vials of frozen competent KW20 cells (wildtype).  I need one tube for each of the 6 DNA samples, and also one for RR797 DNA (chloramphenicol resistance control) and one for MAP7 DNA (transformation control).

Add 2.5 µl (= 100 ng) of each DNA to a tube containing 1 ml of cells.  Incubate for 15 min at 37 °C.
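A quick check that 2.5 µl really does carry 100 ng, following the volumes in the mutagenesis steps.  This assumes the ethanol precipitation recovers all of the DNA and ignores the small volume change from adding EMS; the `recovery` parameter is my own addition, to show what happens if some of the pellet is lost.

```python
def ng_per_transformation(
    total_dna_ng=12_000,        # 12 µg RR805 DNA in the starting tube
    tube_volume_ul=120,         # starting volume
    sample_volume_ul=20,        # each time-point sample
    resuspension_volume_ul=50,  # TE added to each dried pellet
    volume_added_ul=2.5,        # DNA added to each 1 ml of cells
    recovery=1.0,               # assumed fraction of DNA surviving precipitation
):
    """Nanograms of DNA added to each transformation tube."""
    stock_ng_per_ul = total_dna_ng / tube_volume_ul
    sample_ng = stock_ng_per_ul * sample_volume_ul
    final_ng_per_ul = sample_ng * recovery / resuspension_volume_ul
    return final_ng_per_ul * volume_added_ul


if __name__ == "__main__":
    print(ng_per_transformation())              # full recovery
    print(ng_per_transformation(recovery=0.5))  # half the pellet lost
```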

Add 3 ml sBHI and incubate for 90 min longer, to allow expression of the chloramphenicol resistance.

Dilute and plate on plain plates (10^-6, 10^-5), Nov1 plates (for low-level novobiocin resistance, plate undiluted and 10^-1) and Cm1 plates (plate 10^-3, 10^-2, 10^-1 and undiluted).

Freeze the remaining transformed cells in case I want to do more with them later.

Analysis and next steps (Friday):

Use the colony counts to assess the extent of mutagenesis and gene inactivation.  For doses that gave high NovR mutagenesis without reducing the CmR transformation rate, make pools of the CmR colonies from plates that have >1000 colonies (one pool per plate). 

Then I can grow each pool to early log in sBHI and transform it with StrR DNA to enrich for hypercompetent mutants.

Then I'll screen individual StrR colonies for hypercompetence by mixing them with MAP7 DNA and plating on Nov.

What if I don't get any NovR mutants? 

My previous use of EMS, mutagenizing cells, not DNA, gave NovR mutants at about 10^-6 of the survivors.  If this was the level of NovR mutations in my mutagenized DNA, the transformation assay probably wouldn't detect their presence because only about one cell in 1000 will have recombined the nov-containing DNA fragments, giving a transformation rate of 10^-9, below the detection limit.  But I expect the mutation rate to be much higher for the pure DNA, so I'm hoping that I'll see significant increases in resistant colonies.
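The detection-limit argument can be written out as a small calculation.  The two fractions come from the paragraph above; the number of cells plated is a hypothetical value chosen only for illustration.

```python
def expected_novr_colonies(mutant_fraction, recombinant_fraction, cells_plated):
    """Expected NovR colonies on a plate: the chance that a cell recombined
    the nov-containing fragment, times the chance that the fragment carried
    a NovR mutation, times the number of cells plated."""
    return mutant_fraction * recombinant_fraction * cells_plated


def fold_increase_needed(mutant_fraction, recombinant_fraction, cells_plated):
    """How much higher the mutant fraction must be to expect one colony."""
    return 1 / expected_novr_colonies(
        mutant_fraction, recombinant_fraction, cells_plated
    )


if __name__ == "__main__":
    mutant_fraction = 1e-6       # NovR mutants per survivor, from the earlier cell mutagenesis
    recombinant_fraction = 1e-3  # about one cell in 1000 recombines the fragment
    cells_plated = 1e8           # hypothetical: cells spread on one undiluted plate
    print(expected_novr_colonies(mutant_fraction, recombinant_fraction, cells_plated))
    print(fold_increase_needed(mutant_fraction, recombinant_fraction, cells_plated))
```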

If I don't?  I could just go ahead and screen a couple of the high-dose CmR pools for hypercompetent mutants anyway, since if I find some then I can just forget about the Nov test.  If I don't find any hypercompetent mutants I should repeat the mutagenesis using a NovS DNA fragment as control.

What mutation rate do I want for my experiment?

I need to decide on a desirable mutation rate for my murE mutagenesis experiment (described here).  To do this I need to think about (at least) how big the gene is, how large a region of the gene I want to investigate, what fraction of mutations will interfere with or eliminate gene function, and what fraction of mutations might cause hypercompetence.

How big is the gene?  1467 bp (489 aa).

Are hypercompetence mutations equally likely to occur anywhere in the gene?  The mutations we have are in domain 3, at amino acids 361 and 435, so maybe other mutations would be nearby.  But maybe not.  Let's first consider the whole gene, and then decide * if focusing on the last third of it would make any difference.

What are the expected frequencies of mutations with different effects?  About 50% of random base changes change an amino acid (surely someone has done this calculation...).  Since all three of our known mutations change an amino acid, let's assume that silent mutations don't affect competence. About 34% of random amino acid changes interfere seriously with protein function (Guo et al. 2004).  Our known mutants appear to have normal MurE catalytic function, and defective mutants will not show up in our screen because murE is an essential gene.  So that leaves about 1/3 of all the mutations as causing well-tolerated amino acid substitutions.

What fraction of well-tolerated amino acid substitutions cause hypercompetence?  We know of three that do.  How many different amino acids can each codon mutate to?  Probably about 9 or 10 on average.  So let's say we have 500 codons of interest; that's about 5000 different possible amino acid substitutions.  About 2/3 of these will be well-tolerated.  So we know that 3 out of 3,300 amino acid changes cause hypercompetence.  Other mutations may cause hypercompetence too, but since half the mutations will be silent, this lower bound means that at least 1/2000 colonies with a single murE mutation can be expected to be hypercompetent.  That's pretty good odds, given that our transformation-selection step can enrich 1000-fold for hypercompetence mutations.
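The same back-of-envelope estimate as a calculation, using the rounded values from the text.  Treating '1000-fold enrichment' as a 1000-fold increase in the odds of a colony being hypercompetent is my own reading, so that last step is an assumption.

```python
def hypercompetent_fraction(
    codons=500,                # codons of interest (rounded up from 489)
    subs_per_codon=10,         # amino acids reachable from each codon, "9 or 10"
    tolerated_fraction=2 / 3,  # substitutions that don't seriously harm function
    known_mutations=3,         # known hypercompetence substitutions
    nonsilent_fraction=0.5,    # base changes that change an amino acid
):
    """Lower-bound fraction of single-mutation colonies that are hypercompetent."""
    tolerated_substitutions = codons * subs_per_codon * tolerated_fraction
    per_tolerated_substitution = known_mutations / tolerated_substitutions
    return per_tolerated_substitution * nonsilent_fraction


def after_enrichment(fraction, fold=1000):
    """Fraction after selection, assuming 'fold' multiplies the odds."""
    odds = fraction / (1 - fraction) * fold
    return odds / (1 + odds)


if __name__ == "__main__":
    f = hypercompetent_fraction()
    print(f"about 1 in {1 / f:.0f} before enrichment")
    print(f"about {after_enrichment(f):.2f} of colonies after enrichment")
```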

So an average of 1-2 mutations per kb should give us easy-to-find hypercompetence mutations. Will higher mutagenesis give us more? Issues to consider:
  1. More mutations means more non-tolerated mutations, which means that some hypercompetence mutations won't be seen because their cells died.  I don't think this is a big deal, unless we made the mutation rate very high.
  2. More mutations means more irrelevant mutations in each gene we sequence.  This is important.  Inference will be greatly simplified if genes from hypercompetent cells have only one mutation.  So it's probably best to use the lowest level of mutagenesis that will give us easily-detected mutants.
The Lai et al. paper had 5-6 mutations per kb. This is probably too high for us.
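To see how the mutation rate affects the chance of getting genes with a single mutation, here's a sketch that treats the number of mutations per murE gene as Poisson-distributed.  The Poisson model is my assumption (it requires mutations to be independent and equally likely at every position); the gene length is the 1467 bp given above.

```python
import math

GENE_KB = 1.467  # murE length in kb


def poisson_pmf(k, mean):
    """Probability of exactly k events for a Poisson distribution."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def mutation_count_summary(rate_per_kb, gene_kb=GENE_KB):
    """Fractions of gene copies with 0, 1 or >= 2 mutations (rate must be > 0)."""
    mean = rate_per_kb * gene_kb
    none = poisson_pmf(0, mean)
    one = poisson_pmf(1, mean)
    return {
        "mean": mean,
        "none": none,
        "exactly_one": one,
        "two_or_more": 1 - none - one,
        # Among genes that carry any mutation, the fraction with just one.
        "single_given_mutated": one / (1 - none),
    }


if __name__ == "__main__":
    for rate in (1, 2, 5.5):
        s = mutation_count_summary(rate)
        print(
            f"{rate} per kb: none={s['none']:.3f} one={s['exactly_one']:.3f} "
            f"single among mutated={s['single_given_mutated']:.3f}"
        )
```

Under this model, higher rates quickly shrink the fraction of mutated genes that carry only one change, which is the concern raised in point 2.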

Another concern is mutations in the genes between the CAT cassette and murE.  Some of these are essential, and mutations in them will reduce the frequency of recovering viable transformants that contain both the CAT cassette and murE.  This is another reason to go for a low mutation rate.

* Back to a previous point.  Does it matter whether we want to screen only the last third of the gene?  No, because we don't have any way to isolate this from the rest.