Field of Science

A new kind of problem

The PhD student and I are analyzing the data from his mapping of H. influenzae's uptake of genomic DNA.  

The data was generated by Illumina sequencing of genomic DNA samples before and after they had been taken up by competent cells.  Using a rec2 mutant as the competent cells lets us recover the DNA intact after uptake.

He has done quite a bit of analysis of the resulting uptake ratios (ratio of 'recovered' coverage to 'input' coverage) but now we're going back and sorting out various anomalies that we initially ignored or overlooked.

One big issue is the surprisingly uneven sequencing coverage of the genome that we see even when just sequencing sheared genomic DNA (the 'input' samples).  The graph below shows the sequencing coverage of the first 10 kb of the genome of strain 'NP'.  The orange dots are from a short-fragment DNA prep (sheared, most fragments between 50 and 500 bp) and the blue dots are from a large-fragment DNA prep (sheared, most fragments between 1.5 and 9 kb).

Over most of this segment (and over most of the ~1900 kb genome), coverage of the 'short' sample is about 200-400 reads, about twice as high as the ~150-250 read coverage of the 'long' sample.  But in three places coverage of both samples falls to near-zero or zero.  Similar losses of coverage occur at many places throughout the genome, though this segment has more than usual.
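A minimal sketch of how the near-zero segments could be listed systematically, assuming the per-position coverage is available as a simple list of read depths. The function name, the threshold of 10 reads and the toy numbers are all illustrative assumptions, not part of our actual pipeline.

```python
from typing import List, Tuple


def low_coverage_segments(
    coverage: List[int], threshold: int = 10, min_length: int = 1
) -> List[Tuple[int, int]]:
    """Return (start, end) pairs (0-based, end exclusive) for runs of
    consecutive positions whose coverage is below `threshold`."""
    segments = []
    run_start = None
    for pos, depth in enumerate(coverage):
        if depth < threshold:
            if run_start is None:
                run_start = pos  # a new dip begins here
        elif run_start is not None:
            if pos - run_start >= min_length:
                segments.append((run_start, pos))
            run_start = None
    # Close a dip that runs to the end of the sequence.
    if run_start is not None and len(coverage) - run_start >= min_length:
        segments.append((run_start, len(coverage)))
    return segments


# Toy example (made-up depths): ten positions with one three-position dip.
toy = [250, 240, 3, 0, 1, 260, 255, 245, 250, 248]
print(low_coverage_segments(toy, threshold=10))
```

Running the same function on the 'short' and 'long' coverage tracks would give two lists of dips whose positions and widths could then be compared.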

What could cause this?  In principle it could be something about the DNA preparations, about the library construction, about the actual sequencing, or about the data processing and alignments to the reference genomes.  
  1. DNA prep artefacts: Long and short DNAs come from the same initial genomic DNA prep.  They were sheared differently: a Covaris G tube for the long-fragment prep, sonication for the short-fragment prep.  Perhaps there are places in the genome where breaks always occur?
  2. Library construction artefacts: This used Illumina Nextera XT kits (a transposon method).  Might there be strong sequence biases to this process?  The distribution of the spanning coverage of the reads indicates that the short library inserts were mostly 100-350 nt (mean about 200 nt) and the long library inserts were mostly 200-500 nt (mean about 325 nt).
  3. Sequencing artefacts:  The former post-doc used an Illumina NextSeq500 to collect 2-3 million paired-end reads of 2x150nt from each sample.  
  4. Analysis artefacts: We've been tracking down an artefact caused by indel errors in another genome used in this experiment, but the NP strain I'm analyzing now doesn't have these errors.  However it's possible that something about this reference sequence is causing reads that align to the dip positions to misalign or be discarded.

Very few fragments in the short DNA prep were longer than about 500 bp, and few library molecules had inserts longer than 350 bp, but one of the dips in coverage extends over more than 1 kb.  This means that the sequence feature responsible for this dip must either extend over more than 400 bp, or must cause loss of sequence over this distance.  

One thing I noticed is that the short coverage is more severely affected than the long coverage.  It falls lower and the dips are broader.  This is the opposite of what I would expect if reads containing a particular sequence were unsequenceable or misaligned.  This pattern shows very clearly in the graph below, which shows the ratio of coverage by the short-fragment and long-fragment samples (short divided by long). We see that the short coverage is generally about 2-fold higher than the long coverage, but that the ratio flips and falls to zero or near zero at the three low-coverage segments.
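A per-position ratio of two coverage tracks can be sketched as below. This is only an illustration: the function name, the `min_depth` cutoff and the example depths are assumptions, and positions where the denominator is too low are returned as `None` rather than as a misleading ratio.

```python
from typing import List, Optional


def coverage_ratio(
    numerator: List[float], denominator: List[float], min_depth: float = 1.0
) -> List[Optional[float]]:
    """Per-position ratio of two coverage tracks.

    Positions where the denominator is below `min_depth` return None,
    because a ratio there would be undefined or dominated by noise.
    """
    if len(numerator) != len(denominator):
        raise ValueError("coverage tracks must have the same length")
    ratios: List[Optional[float]] = []
    for num, den in zip(numerator, denominator):
        ratios.append(num / den if den >= min_depth else None)
    return ratios


# Made-up depths for five positions, with a dip at positions 2 and 3.
short_cov = [300, 320, 2, 0, 310]
long_cov = [150, 160, 40, 0, 155]
print(coverage_ratio(short_cov, long_cov))
```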


One thing we need to do is examine the locations of reads that were discarded because of low quality scores, either because of sequencing problems or because of ambiguous mapping (e.g. rDNA repeats).  Finding that reads at the dip positions were low quality would be a big clue.

We now join a series of experiments already in progress

In the previous post I described the RNA-seq data that indicates that our antitoxin knockout is unlikely to express functional toxin during log-phase growth, because the promoter it would otherwise repress appears to begin transcription well downstream of the toxin start codon.  (The graphic I showed was not exactly the right one, but the evidence is good.)

This would explain why the antitoxin knockout grows normally in rich medium, and suggests we should look for a growth defect when the competence promoter is active.  If we find such a defect, we could explain the observed DNA uptake defect by hypothesizing that cells expressing the toxin fail to take up DNA because they're suffering from a general toxicity (likely a blockage of translation or of gyrase activity), not because the toxin specifically blocks DNA uptake without affecting other processes.

So now I'm doing growth time courses, comparing the wildtype strain KW20 with our ∆0659 antitoxin knockout.  So far the results are a bit messy, but suggestive.

Here's a schematic of the experiments:

Cells are grown (shaking, 37°C) at low density in rich medium for at least two hours.  When the culture reaches OD600 = about 0.25 (about 5 x 10^8 cfu/ml) the cells are washed and transferred (at the same density) to a flask of the 'starvation medium' used to induce maximal competence (again shaking, 37°C).  After 100 minutes the cells are diluted 10-fold into a flask of fresh rich medium, and growth is monitored for several more hours.

And here's a schematic of the possible outcomes:

It's important to consider the outcome if not all cells are affected by the toxin (Outcome C), because we know from 'congression' experiments that many cells in 'competent' cultures cannot be transformed.  It's hard to give precise numbers because we don't know all the variables, and we don't know if the nontransformable cells in these cultures are expressing the competence genes or not.  Importantly, this is the same outcome we'd see if growth of all cells was slowed but not eliminated by the toxin. 

Results so far:  I was able to process a cfu-measurement time-point (KW20 and ∆0659 samples) every 10-15 minutes.  I took OD time points a bit less frequently (often not at the same times as the cfu samples).  Here's the cfu/ml data:

The blue lines are KW20 and the orange lines are ∆0659.  The lighter blue and orange lines are the cells in the starvation medium.  You can see that the two strains grow at very similar rates in rich medium, but that ∆0659's growth slows down more severely than KW20's in the starvation medium.  ∆0659 may also take longer to recover from starvation.

This looks like Outcome C.

(The plating data is noisier than I would like - I think the ∆0659 strain might be more sensitive to very minor differences in the agar plates.)

Here's the OD600 data for the same cultures:

But this doesn't look like Outcome C!

For the initial growth in rich medium, the OD600 lines parallel the cfu measurements.  I didn't follow the OD600 in competence medium (I will next time).  For the recovery growth, the lines differ. The OD600 values are the same for KW20 and ∆0659, and they both start increasing immediately at the same rate.  After about an hour the mutant's OD600 falls below that of KW20.

Optical density measurements reflect the amount of biomass in the cultures rather than the number of viable cells the biomass is distributed among.  In principle the difference between the two measures (cfu/ml and OD600) tells us about the size of  the cells.  So this result might mean that the mutant cells continue growing (getting bigger) in competence medium and immediately after transfer into rich medium, but are unable to divide.  This interpretation is complicated by the fact that the competence medium itself allows growth but limits cell division - cells typically form filaments in this medium.
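One way to make the comparison of the two measures concrete is to track OD600 per cfu over time. The sketch below assumes OD600 and cfu/ml were measured at the same time points; the function name and the numbers are invented for illustration and are not the measurements described here.

```python
from typing import List


def relative_biomass_per_cfu(od600: List[float], cfu_per_ml: List[float]) -> List[float]:
    """OD600 divided by cfu/ml at each time point, scaled so the first
    time point equals 1.0. Values above 1 suggest more biomass per
    colony-forming unit (bigger cells, filaments, or lost viability)."""
    if len(od600) != len(cfu_per_ml) or not od600:
        raise ValueError("need matched, non-empty OD600 and cfu/ml series")
    per_cfu = [od / cfu for od, cfu in zip(od600, cfu_per_ml)]
    baseline = per_cfu[0]
    return [value / baseline for value in per_cfu]


# Made-up numbers: OD600 doubles at each step, but cfu/ml lags behind.
od = [0.05, 0.10, 0.20]
cfu = [1.0e8, 1.0e8, 2.0e8]
print(relative_biomass_per_cfu(od, cfu))
```

A rising value would be consistent with cells that keep adding biomass without dividing, but it cannot by itself distinguish filamentation from loss of viability.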

Plan:  I'm going to repeat this experiment (probably twice), focusing on the 30 min before transfer to competence medium and the hour after.  I'll try to take a time point every 10 minutes, and to take OD600 measurements at the same times I take cells for the cfu/ml platings.  

If the results agree with those here, then I think we'll be ready to finish writing the manuscript.

p.s.  I also did a separate experiment to get Bioscreen growth curves, for cultures both before and after transfer to competence medium.  (This is also an OD600-based measurement.) The ∆0659 cells always grow a tiny bit slower than KW20 and the ∆toxin and double-knockout mutants, but their recovery kinetics are indistinguishable from their normal growth kinetics, giving no evidence of any time needed for recovery before normal growth resumes.

I'm actually going to do an experiment!

Last week I went to Philadelphia, to give a talk and spend a few days working with the former postdoc on the toxin-antitoxin manuscript.  This manuscript has a former Honours student as first author; it has been languishing for most of the past two years, with a few spurts of progress.

Until now we have been thinking that the toxin protein was not broadly toxic to cells, but somehow only prevented them from taking up DNA.  How it could do this was very puzzling, because our RNAseq analyses showed that the competence gene transcript levels reached normal levels under competence-inducing conditions.

The main evidence for lack of toxicity was that the antitoxin mutant grows normally in rich medium, which it shouldn't if the toxin is harmful to growth or viability.  This mutant produces high levels of toxin RNA during growth in rich medium, and because there is no antitoxin present, the resulting toxin protein should be able to interfere strongly with growth.  But it doesn't.

But we discovered that this interpretation is wrong, by looking at the coverage profile of the toxTA operon in wildtype and ∆toxA cultures.  The figure below shows the normalized read coverage of the toxTA region in wildtype (purple) and ∆toxA (green) cultures, in log phase growth ('M0') and after 10, 30 and 100 minutes in the competence-inducing medium MIV ('M1', 'M2', and 'M3' respectively).

No, these are not the right graphs.  I need to go back to the R scripts the other Honours student left me with and create the right graphs.

Does variation in sequencing coverage help explain apparent variation in recombination?

Preamble:  The grad student and I have been looking at the Illumina sequencing coverage of the DNAs used for our DNA-uptake project, and considering how best to exclude from analysis the bits of the genome that consistently have very low coverage.  We're probably just going to exclude all positions that have fewer than 10 reads in the control 'input' DNA sample.  But thinking about this got me thinking about the extent to which position-specific differences in coverage could influence the estimates of recombination frequency in the sister project I'm doing with the former post-doc.
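The exclusion rule could be sketched like this. It is only an illustration: the function name and example depths are invented, and scaling each sample to its own mean before taking the ratio is an assumption about how the uptake ratios would be normalized, not a description of the grad student's scripts.

```python
from typing import Dict, List


def uptake_ratios(
    input_cov: List[int], recovered_cov: List[int], min_input_reads: int = 10
) -> Dict[int, float]:
    """Map position -> (recovered / input) coverage, after scaling each
    sample to its own mean, skipping positions with too few input reads."""
    if len(input_cov) != len(recovered_cov) or not input_cov:
        raise ValueError("need matched, non-empty coverage lists")
    # Means are taken over all positions, including ones excluded below.
    input_mean = sum(input_cov) / len(input_cov)
    recovered_mean = sum(recovered_cov) / len(recovered_cov)
    if input_mean == 0 or recovered_mean == 0:
        raise ValueError("a sample with zero mean coverage cannot be scaled")
    ratios: Dict[int, float] = {}
    for pos, (inp, rec) in enumerate(zip(input_cov, recovered_cov)):
        if inp < min_input_reads:
            continue  # excluded: input coverage too low to trust a ratio
        ratios[pos] = (rec / recovered_mean) / (inp / input_mean)
    return ratios


# Made-up depths: position 1 has only 5 input reads and is dropped.
print(uptake_ratios([100, 5, 100, 195], [50, 0, 200, 150]))
```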

Here's a figure showing how dramatic the variation in coverage is (two 50 kb segments of the genome):

Here's a quick overview of the recombination project.  We transformed the Rd strain with DNA from the divergent NP strain, pooled 10,000 novR colonies, and genome-sequenced the pooled DNA at 20,000-fold coverage to measure the frequency of NP alleles at the ~35,000 SNP positions where the two strains differ.  This gave a genome-wide map of recombination frequency that showed surprisingly high variation.  The graph below shows the reproducibility of this variation across independent samples (one with colonies selected for novR and the other for nalR).

We need to check that some of the apparent differences in recombination frequency at different positions aren't actually due to differences between the sequencing efficiencies of the Rd and NP SNP alleles at these positions.  The former post-doc and I had a Skype conversation about this this morning.  Here's our plan.

He has the control sequencing data:  coverage for each position in the control NP genome sample and in the control Rd sample, each aligned to its own reference genome.  To simplify the comparisons he'll first normalize each data set to its mean coverage.  If we were to plot the SNP-position coverages against each other we'd expect to see something like this:

For each of the ~35,000 SNPs, the ratio of its NP allele and Rd allele coverages (call it the 'bias ratio') tells us how much sequencing biases could influence its apparent recombination rate.  

Now all the former postdoc needs to do is calculate the correlation between the bias ratios and the estimated recombination rates across the genome.  If he sees little or no correlation, then sequencing biases are not contributing to the measured frequencies of NP alleles in the recombinant-genome DNA pool and we can continue to search for the factors that do contribute.  But if he sees a solid correlation then we need to investigate further.

Steps for the former post-doc:  
  1. Normalize control-genome sequencing coverages to their means.
  2. Calculate bias ratios for each SNP.
  3. How much of the recombination variation is explained by the bias ratios? 
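The three steps above can be sketched as follows. This is a hedged illustration, not the former post-doc's code: the function names and example values are invented, and using the squared Pearson correlation as the 'fraction explained' assumes a roughly linear relationship (a log transform of the bias ratios might be more appropriate for real data).

```python
import math
from typing import List


def normalize_to_mean(coverage: List[float]) -> List[float]:
    """Step 1: express each position's coverage relative to the sample mean."""
    mean = sum(coverage) / len(coverage)
    return [c / mean for c in coverage]


def bias_ratios(np_norm: List[float], rd_norm: List[float]) -> List[float]:
    """Step 2: NP-allele coverage divided by Rd-allele coverage at each SNP."""
    if len(np_norm) != len(rd_norm):
        raise ValueError("need one NP and one Rd value per SNP")
    if any(r == 0 for r in rd_norm):
        raise ValueError("exclude SNPs with zero Rd coverage first")
    return [n / r for n, r in zip(np_norm, rd_norm)]


def pearson_r(x: List[float], y: List[float]) -> float:
    """Step 3: Pearson correlation; r squared estimates the variance explained."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Made-up values for four SNPs (already normalized coverages).
ratios = bias_ratios([1.0, 2.0, 0.5, 1.0], [1.0, 1.0, 1.0, 2.0])
recombination = [0.02, 0.04, 0.01, 0.01]
r = pearson_r(ratios, recombination)
print(ratios, round(r ** 2, 3))
```

In this toy example the recombination values were chosen to be exactly proportional to the bias ratios, so the correlation is perfect; real data would be expected to give something much messier.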

Making sense of RNA-seq comparisons

***Hey, it's RRResearch's 10th blogiversary!***

Back to work:

I'm working on the toxin-antitoxin manuscript, and trying to use the RNA-seq data to decide which genes have changed expression in which mutants.  This information should help us understand how the toxin acts to prevent DNA uptake, and what else it might affect.

The comparisons should be straightforward because the former undergrad/summer student left me with a superb set of analyses and R scripts, including EdgeR and DESeq2 analyses comparing expression of different strains at the various time points.  But I'm having a hard time making sense of the results, because some comparisons that I expect to give few significant differences give many, and others give very few.

There are also big inconsistencies between the EdgeR and DESeq2 results.  For example, in one comparison of two mutant strains (taxx vs antx, at time M1), EdgeR finds no genes that are significantly different but DESeq2 finds about 70 genes, with a cutoff for 'significant' that requires both of the following:
  1. must have padj or FDR score less than 0.05
  2. must have at least a twofold change in expression
In fact, all but two of the genes in this EdgeR comparison have FDR values greater than 0.5, and almost all have FDR values of 1.00.
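For clarity, here is a minimal sketch of the two-part cutoff applied to a table of results. The record type, field names and gene names are hypothetical stand-ins for whatever columns the exported tables actually use; this is not taken from the former student's R scripts.

```python
import math
from typing import List, NamedTuple


class GeneResult(NamedTuple):
    gene: str
    log2_fold_change: float
    adjusted_p: float  # stands in for a padj or FDR value


def significant_genes(
    results: List[GeneResult],
    max_adjusted_p: float = 0.05,
    min_fold_change: float = 2.0,
) -> List[str]:
    """Genes passing both cutoffs: adjusted p below the limit AND at
    least `min_fold_change`-fold change in either direction."""
    min_abs_log2 = math.log2(min_fold_change)
    return [
        r.gene
        for r in results
        # A missing (NaN) adjusted p fails the comparison and is excluded.
        if r.adjusted_p < max_adjusted_p and abs(r.log2_fold_change) >= min_abs_log2
    ]


# Invented example rows.
rows = [
    GeneResult("geneA", 1.5, 0.01),   # passes both cutoffs
    GeneResult("geneB", 0.4, 0.001),  # significant but less than twofold
    GeneResult("geneC", -2.0, 0.2),   # big change but not significant
    GeneResult("geneD", -1.0, 0.04),  # exactly twofold down, passes
]
print(significant_genes(rows))
```

Applying one function like this to both packages' outputs at least guarantees that any disagreement comes from the adjusted p-values and fold changes themselves, not from how the cutoff was applied.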

I had a Skype conversation with the former post-doc this morning, and he suggested an analysis that might clear things up for me.  But I'll need to ask the former student to do it for me, or to modify the R scripts so I can do it.

Step 1:  Identify all the genes whose expression is significantly changed at each of the MIV timepoints (M1, M2 and M3), relative to their expression in log phase in sBHI (M0 timepoint).

Step 2:  Using the same cutoff, examine the genes that differ significantly between different strains at a single time point.  How many of these are also among the 'MIV-induced' set for this timepoint?

If we find that a particular genetic difference (wildtype vs a mutant, or two mutants vs each other) causes changes in the same genes that are changed by MIV in wildtype cells, we could conclude that the genetic difference affects the cellular response to MIV.  If there's no more overlap than we'd expect by chance, then no.
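Whether an overlap is 'more than we'd expect by chance' can be estimated with a hypergeometric calculation. The sketch below is one possible way to do it, assuming every gene is equally likely to appear in either set; the function names and the numbers in the example are placeholders, not results from our data.

```python
from math import comb


def expected_overlap(total_genes: int, set_a_size: int, set_b_size: int) -> float:
    """Mean overlap if set B were drawn at random from all genes."""
    return set_a_size * set_b_size / total_genes


def overlap_p_value(
    total_genes: int, set_a_size: int, set_b_size: int, overlap: int
) -> float:
    """Probability of seeing at least `overlap` shared genes by chance
    (upper tail of the hypergeometric distribution)."""
    max_overlap = min(set_a_size, set_b_size)
    ways_total = comb(total_genes, set_b_size)
    ways_tail = sum(
        comb(set_a_size, k) * comb(total_genes - set_a_size, set_b_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return ways_tail / ways_total


# Placeholder numbers: 1000 genes, 100 MIV-induced, 50 differing between
# strains, 20 of them in both sets.
print(expected_overlap(1000, 100, 50))
print(overlap_p_value(1000, 100, 50, 20))
```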

Evidence for a likely sample switch in the RNA-seq dataset (or not)

I've been working on the toxin/antitoxin manuscript, trying to extract all the conclusions from the RNA-seq data trove.  We have two odd results, and I now think they are both best explained by a sample switch in the first set of samples.

One odd result that the former post-doc drew my attention to is that, when the antitoxin is deleted, expression of the competence genes appears to be down at the last time point ('M3', 100 minutes incubation in MIV competence-induction medium).

The other odd result, which I just discovered a couple of days ago, is that, when the toxin gene is deleted, expression of the competence genes appears to be up at the second time point ('M1'; 10 minutes in MIV).

Each of these results is based on the mean of three biological replicates (samples of the same strains cultured on different days).  I now think that they're reciprocal consequences of the same problem - switched identities of one pair of samples prepared on the same day.

History: I was originally focusing on the apparent up-regulation in the toxin deletion, which I discovered in comparisons of the toxin ('toxx') and toxin+antitoxin ('taxx') knockouts.  I was looking at this comparison because it's the only one where we would expect to see any expression differences that might be caused by action of the antitoxin on genes other than the toxin, and I wanted to know if we could rule these out.

The former summer undergrad had done pairwise comparisons of all the different mutants we'd tested, using both the EdgeR and DESeq2 packages, so I looked at the Excel files he'd generated comparing the toxx and taxx samples, sorting the expression ratios for each timepoint.  I was quite surprised to see that, with the EdgeR dataset, the genes with the most extreme expression differences at the M1 timepoint were ALL the competence genes (see below).  But there was no overexpression at the M2 timepoint, contrary to what I would expect if this was a competence-related effect, and inconsistent effects at M0 and M3.

I was worried that this might be due to a problem with only one of the three replicate samples that had been averaged, so I looked at the before-averaging data.  Initially I suspected that two of the taxx samples (M2_E and M3_E) had been switched.  That might still be true, but it was a small effect compared to the bigger toxx anomaly I found when I plotted bar graphs of competence gene coverage for all the taxx and toxx samples.  The graph below is for expression of the comABCDEF operon in the toxx mutant (∆toxT), but I found the same anomaly for the other operons:  the M1_A sample has much higher expression levels than we normally see at this time (usually only slightly higher than the M0 timepoint).

Now I was suspicious that this 'A' sample might be misidentified - not an M1 timepoint sample at all.  So I looked at all the samples that had been prepared on this day (Day 'A').  These samples were prepared by the former research associate; they were the first RNA prep she did for what turned into the big RNA-seq dataset.  Here's the plot of all her Day A samples.

Consistent with my sample-switch hypothesis, the overly high competence-gene expression levels in the toxx M1 sample are balanced by overly low competence-gene expression in the antx (∆toxA) M3 sample!  Again I'm only showing the comABCDEF operon, but pilABCD and comNOPQ show the same pattern.

So my new hypothesis is that the antx_M3 sample and toxx_M1 sample were switched.  This is a good discovery, because it probably explains both the apparent reduction of competence gene expression at M3 in the antx samples and the apparent increase in competence-gene expression at M1 in the toxx samples.  But it's a big hassle, because if I'm right we'll need to redo all the bioinformatics analyses that involve these samples.  Luckily the summer undergrad is still in the picture, and the R scripts he left us with should make this task easy.

But I want to be as confident as possible that my switch hypothesis is correct.  The best prediction is that we should see overexpression of toxT and absence of toxA in sample toxx_M1, and the reverse in sample antx_M3.  

(...Pause while I create this graph...)

But we don't!  

One confounding issue is that the expression scoring detects coverage of the remaining ends of the genes, because they weren't completely deleted.  (I can get the former summer student to look at the actual toxT and toxA coverage for each sample to confirm whether the deletions are present.)  

With this taken into account, I think we see the expression we would expect if the samples were not switched.  For the antx samples, we see elevated expression of toxT and toxA at all time points, and for the toxx samples we see normal (like KW20) expression of toxA and reduced expression of toxT. Importantly, the antx_M3 sample has much higher expression of toxT than the toxx_M1 sample.  So I think my hypothesis must be wrong!

OK, now I've checked the expression of toxA and toxT in the samples from other days, and they're nicely consistent with the expression in the Day A samples.  So I guess the samples are not switched.  DAMN!

So why are the competence gene levels so high in the toxx_M1 sample?  I suppose the research associate could just have been delayed in collecting this sample, so it has expression levels closer to those usually seen at 30 minutes.  (Unfortunately her notebook for this period has been lost.)

Maybe it will all seem clearer tomorrow...

Expression of DNA uptake genes in rich medium - a puzzle

I've been working on the toxin/antitoxin paper.  Right now I'm going through the RNA-seq data for the antitoxin knockout (again!), looking for hints of how unopposed toxin expression prevents DNA uptake.  The two graphs below show mRNA levels of the mutant compared to wildtype cells at the same stage of competence induction (upper panel, 30 min in MIV; lower panel, 100 min in MIV). The green bars are expression in the mutant (unopposed toxin) and the grey bars show the range of expression in wildtype cells. (In the upper panel the grey bars are centered on the mean expression at this time point.)

Conclusion:  Expression of competence genes is normal or near-normal at 30 min (when competence-gene transcription normally peaks), but is substantially lower than wildtype at 100 min (when DNA uptake and transformation peak).

Can this reduction explain the absolute competence defect of the mutant?  I think not.

Some other informative comparisons:  

1. Compare the antitoxin knockout (∆toxA) to the toxin knockout (∆toxT) and the toxin/antitoxin double knockout (∆toxTA):  At 30 min, competence genes in all three knockout mutants have very similar transcription levels (more similar to each other than to KW20).  But ∆toxT and ∆toxTA have normal competence.  At 100 min some ∆toxA operons are a bit lower than in ∆toxTA (comABCDE is at about 65% and pilABCD is at about 50%).

2. Compare the antitoxin knockout to an hfq knockout:  The hfq knockout (∆hfq) is the only mutant we tested whose competence is reduced but not eliminated; its MIV-induced transformation frequency is about 10% of the wildtype level.  At 30 min its competence-gene expression levels are mostly higher than those of ∆toxA, which has no detectable transformation (∆toxA TF is 3-4 orders of magnitude lower than ∆hfq).  At 100 min its expression is overall a bit lower than ∆toxA.

3. Compare the antitoxin knockout to wildtype cells in 'late log': Here's where it gets weird.  We've known for a long time that competence rises when cell growth slows as cultures get dense (peaking around OD = 1-2).  Our old microarray experiments showed that expression of competence genes increases then too; in the paper we said that expression levels increased about 4-20 fold, but we didn't present any data. So I decided to compare wildtype expression levels in late log with ∆toxA expression levels at 100 min of induction.  

But I was surprised to see that, in our RNA-seq data, competence-gene expression levels in rich medium don't increase as the culture gets dense.  In the graph below, each cluster of blue bars is a DNA-uptake gene, with three replicate bars at OD=0.02 (true log phase, light blue), OD=0.6 (end of log phase, medium blue) and OD=1.0 (dark blue).  In most cases the dark blue bars are not noticeably higher than the other bars, indicating that the gene is not induced at all when cell density increases.

My first response was to try to find the original microarray data, to see how big an induction we actually saw.  It's probably buried somewhere in my computer (not with the array manuscript files), but I can't find it.  So instead I looked in my notebooks for any problems with the wildtype samples used for the RNA-seq analysis, and here I think I found the explanation.  Along with each sample we prepared for the RNA analysis, we froze one tube of cells that could be checked later for competence or other issues (e.g. contamination).   In May 2015 we had noticed the unexpectedly low expression levels of these samples, so we thawed out OD=1.0 samples and transformed them.  They were about 100-fold less competent than they should have been, which is consistent with their low gene expression.  This comparison is still useful, because even with this nearly undetectable induction the cells did become at least 10-fold more competent than the ∆toxA cells do after MIV induction.

Biofilm assay results

The summer undergrad did the biofilm assay this week. The results are quite clear: Haemophilus influenzae does form what might be biofilms on glass tubes, but this is completely independent of competence gene expression or the ability to make Type 4 pili (T4P).  Thus we won't be able to use biofilm assays to clarify how the toxT toxin prevents DNA uptake.

The basic assay was as described in the previous post.  She tested four strains:  the wildtype parent, which expresses T4P genes and becomes moderately competent at the onset of stationary phase, a strain unable to induce its competence genes (including the T4P genes), a strain whose type 4 pilin gene is deleted, and a hypercompetent strain that expresses all the competence genes very strongly at all stages of growth.

Cultures were grown for one and two days in 2 ml of rich medium in new glass tubes, either stationary in a rack or being gently mixed on a roller wheel.  Here's a photo of two of the Day 2 culture tubes, inverted to dry after staining.  Most of the stationary-culture tubes had a film of cells, mainly at the bottom of the tube (exception explained below).  All of the rolling-culture tubes had a bright film at the air-medium interface.

And here are the results.  (Each bar is the mean of three replicate cultures.)  With the exception of the stationary culture of the 'no pilin' strain, which failed to grow, all cultures gave equivalent staining intensity.  There was no effect of expression of competence genes or deletion of the pilin gene.

Now I need to go back and look at the H. influenzae T4P literature, to see if this is a new result or an entirely predictable outcome.

Later:  I looked through the H. influenzae pilus/biofilm literature.  Other types of pili are needed for biofilm formation.  A knockout of the T4P pilin induced in competent cells causes biofilms (grown in a flow-through chamber and observed microscopically) to be thinner and less 'organized', and reduces biofilm formation in the inner ears of chinchillas, so we might have expected our mutants to show altered biofilm staining.  

Maybe it is worth having the summer student repeat her experiment, so we can describe this in the toxin/antitoxin paper.  What improvements should we include?  
  1. Including no-cells control tubes
  2. Measuring the OD600 of each culture?  But would this require that the tubes be vortexed to resuspend the cells?  Maybe just do it for the 'rolling' cultures (removing 100 µl to 900 µl blank), which won't need to be vortexed.
  3. Anything else?

Does H. influenzae need DNA uptake genes to form lab biofilms?

This morning I had another Skype conversation with the (most recent) former post-doc.  We mostly talked about the toxin/antitoxin work.  One question that came up was whether the antitoxin knockout strain was unable to form simple biofilms as well as to take up DNA.

The kind of biofilm I mean is a simple film of cells that might stick to the surface of the glass or plastic container the cells are being cultured in.  Formation of such films depends on the species (do its cells have a sticky surface), on the genotype (how much of the sticky substances are being produced), on the container properties (glass? polystyrene? polypropylene) and on the culture conditions (cells may stick more easily if the culture is not being shaken).

Here's a diagram of the basic assay; the amount of crystal violet depends on how many cells were stuck to the tube surface.

Many components of the cell surface can contribute to its stickiness, but we're interested in the effects of type 4 pilin (T4P) structures on the cell surface, because these are used both for adherence to surfaces and for DNA uptake. If our wildtype H. influenzae cells consistently form biofilms, and if this depends on the expression of the normal DNA uptake machinery, then we can test whether the DNA-uptake defect of our antitoxin knockout mutant is accompanied by a defect in forming biofilms.

Why do we care about this?  We know that this mutant has near-normal expression of the genes needed for DNA uptake, so why can't it take up DNA?  If the controls show that biofilm formation requires the uptake machinery, and the mutant does not form normal biofilms, we'll conclude that the toxin interferes with assembly of the basic T4P machinery.  If the mutant does form biofilms, we'll conclude that the toxin specifically blocks the DNA-uptake activity of the T4P machinery that has been assembled and is able to stick to surfaces, perhaps by blocking the retraction step that pulls the DNA in.

The experiments are quite straightforward.  Versions of this assay have been done on various H. influenzae clinical isolates, but not to examine the roles of the type 4 pilus machinery.  We'd use one of our competence-negative regulatory mutants, probably a sxy knockout.  The lab down the hall does similar assays with Campylobacter - I'll ask their advice before proceeding.

One more bicyclomycin try!

The previous Bioscreen experiment failed because, as we suspected, the vial we purchased didn't contain the expected mg of bicyclomycin.  The highest concentration we tested (20 µg/ml) caused only a very slight slowing of growth, so we contacted the supplier and had them send us a new vial.  This contained more visible powder than the previous one had (although still a very tiny amount), and we used it for a new Bioscreen experiment, testing concentrations up to 10 µg/ml.

This time the 10 µg/ml culture showed a substantial slowing of growth.  We also saw smaller decreases in growth, proportionally, with the lower concentrations.  Although the effects were smaller than we expected from the reported MIC (minimum inhibitory concentration) of 3 µg/ml, we think we can go on to do our experiment.

Before we do the big competence-induction experiment we should really do another Bioscreen run to test the higher bicyclomycin concentrations we would need to include in the big experiment.  We can't afford to use up much bicyclomycin to do this, so we'll decrease the numbers of replicate wells we use:

The summer student thinks she can do this tomorrow (she'll fill the other wells with plain medium (no cells) as her contamination control), and then we'll be able to do the big experiment on Friday!

Once again, with the real bicyclomycin!

Last month I wrote that we were abandoning our plan to test whether the antibiotic bicyclomycin induces competence in Haemophilus influenzae, as it does in Legionella pneumophila, because (i) the free 'bicyclomycin' we'd been given by a colleague turned out to be bicyclomycin benzoate, and (ii) the real bicyclomycin we wanted to test cost hundreds of dollars per milligram.

But last week I got the budget statement for our NSERC grant and discovered that we're not as broke as I thought.  The credit for this goes mainly to the PhD student, who has been earning a large fraction of his annual stipend by working as a teaching assistant!  So the summer undergrad and I worked out how much bicyclomycin we'd need to test its effect, and found that 1 mg should be enough to detect if there is any effect, and if there is to begin characterizing it.

I'm going to write out the plan and explain our calculations here, to check that we haven't made any dumb mistakes.

Step 1:  Confirm that the reported MIC is correct and determine the best concentrations to test.

We will use the lab next door's Bioscreen incubation system for this.  It can record density changes in two culture plates, each with 100 wells holding 0.3 ml of culture each.  We'll only use a single plate.

Muller et al (1979) tested a wide range of bicyclomycin derivatives, and reported that the MIC (minimum inhibitory concentration) of bicyclomycin for H. influenzae is 3.1 µg/ml.  (MICs are typically only tested in increments of 2-fold, so this is a rough estimate.)  We will evaluate the following µg/ml concentrations for growth effects: 0, 0.1, 0.2, 0.5, 1.0, 2.0, 3.5, 5.0, 10.0 and 20.0.  That's 10 wells (3 ml) of each concentration, which would need 126.9 µg.  We'll actually make up 3.5 ml to allow for measurement errors, so I'll round this up to 150 µg.  We could even omit the 20 µg/ml test, and would then only need about 80 µg.
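The arithmetic in the paragraph above can be double-checked with a few lines of Python. This is only a sketch: the function name `bicyclomycin_needed_ug` is made up for illustration, and it assumes each concentration is made up separately at the stated volume.

```python
# Concentrations (µg/ml) listed in the plan above.
CONCENTRATIONS_UG_PER_ML = [0, 0.1, 0.2, 0.5, 1.0, 2.0, 3.5, 5.0, 10.0, 20.0]


def bicyclomycin_needed_ug(concentrations, volume_ml):
    """Total µg of drug needed to make volume_ml at each concentration."""
    return sum(c * volume_ml for c in concentrations)


# 10 wells x 0.3 ml = 3 ml of each concentration
exact = bicyclomycin_needed_ug(CONCENTRATIONS_UG_PER_ML, 3.0)
# 3.5 ml of each concentration, to allow for measurement errors
with_margin = bicyclomycin_needed_ug(CONCENTRATIONS_UG_PER_ML, 3.5)
# Same 3.5 ml volumes, but omitting the 20 µg/ml treatment
without_top = bicyclomycin_needed_ug(CONCENTRATIONS_UG_PER_ML[:-1], 3.5)

print(f"{exact:.1f} µg, {with_margin:.1f} µg, {without_top:.1f} µg")
```

The three totals come out at roughly 127, 148 and 78 µg, which agrees with the rounded figures of 150 µg and 80 µg.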

We should start these cultures with a fairly high density of cells, already in log phase, rather than the usual very low density.  With the usual low density, the cells must double for at least several hours before the culture turbidity becomes high enough to detect, but these first doublings are where we would see the effect of bicyclomycin.  So we should start the Bioscreen cultures with cells that are in roughly the same state and density as the cells will be in the competence-test described below.  The actual density to use is discussed there.

Notes after setting up the actual experiment:  We decided to replace the [0.1 µg/ml] treatment with a no-cells treatment.  The culture we used was in log phase, at OD600 = 0.1 (about 3 x 10^8 cells/ml).  When we went to make up the bicyclomycin stock, the vial we had purchased appeared to be completely empty.  Looking at it with the dissecting microscope revealed a tiny amount of dust-like material in the vial; we were reassured when it dissolved rapidly in water, and I'm now optimistically assuming that the 1 mg of bicyclomycin in the vial had been added as a liquid and dried onto the bottom of the vial.  But if we don't see any effect of the bicyclomycin on cell growth, I'll begin to question whether the vial actually contained any.

Step 2:  Examine kinetics of growth and survival in different concentrations:

The Bioscreen only measures OD600; this is a good indicator of initial growth but not of long-term survival.  At the end of the run we will collect the cultures from individual wells, and dilute and plate the cells to measure overnight survival.  I think we should test two wells of several different concentrations: 0 (the control), the highest concentration that gave normal-looking growth, a concentration where culture growth was slowed but eventually reached normal density, and a concentration that drastically reduced growth.

Maybe a killing-curve experiment:

We'd be happy to do the competence-induction test with a concentration of bicyclomycin high enough to inhibit growth, but we don't want to use one that will actually kill the cells in the 30-90 minute incubation periods, since then we couldn't test whether any cells became competent.  The only way to find this out will be to design a killing-kinetics experiment after we have the Bioscreen results, where we give cells a high concentration and take samples every 15 minutes to measure survival.  But probably this can wait until after we have the results of the first competence-induction test described below.

Step 3:  Test effect of short-term exposure to bicyclomycin on competence induction:

The induction experiment:  (first draft)

This is intended to be a quick-and-dirty experiment.  We won't worry about getting all the conditions just right, but will quickly assess a range of plausible conditions to see if any induce competence.

  1. Start with non-competent cells (in log-phase growth, at OD600 = 0.2?).
  2. Transfer 5 ml of the culture to tubes or tiny flasks with three different concentrations of bicyclomycin: one that slows growth without killing the cells, one that more severely inhibits growth, and one that completely stops growth within 90 min.  Also a negative control culture with no bicyclomycin, and a positive control culture with 1 mM cAMP.
  3. At three different times (30 min, 60 min, 90 min?), pellet 1 ml of cells and resuspend them in medium containing MAP7 DNA (no bicyclomycin).  Incubate for 1 hr (to allow continued development of competence and then DNA uptake), dilute and plate ± novobiocin to measure transformation (and survival).

Timing and initial density: As planned above, total growth times will range from 90 min to 150 min; this is enough time for about 2.5 to 4 cell doublings under normal conditions.  Initially I planned to start with cultures at OD=0.2.  But from this initial density the negative control (no bicyclomycin) cells would go on to develop the usual low-level competence during the incubation with DNA, and transformation frequency would be 10^-5 - 10^-4 even for the shortest growth time. This would also mask the induction we expect in the positive control (+ cAMP).
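To see roughly where a culture would end up after these growth times, here is a small sketch. The 36 min doubling time is an assumption, chosen only because it is what "90 min is about 2.5 doublings" implies; it is not a measured value. The sketch also assumes that OD stays proportional to cell number, which will not hold at high densities.

```python
ASSUMED_DOUBLING_TIME_MIN = 36.0  # assumption: implied by 90 min ~ 2.5 doublings


def doublings(minutes, doubling_time_min=ASSUMED_DOUBLING_TIME_MIN):
    """Number of doublings during `minutes` of unrestricted growth."""
    return minutes / doubling_time_min


def projected_od(start_od, minutes, doubling_time_min=ASSUMED_DOUBLING_TIME_MIN):
    """OD expected after `minutes`, if growth stays exponential."""
    return start_od * 2 ** doublings(minutes, doubling_time_min)


# Compare the two starting densities considered in this plan.
for start_od in (0.2, 0.05):
    for total_min in (90, 150):
        print(start_od, total_min, round(projected_od(start_od, total_min), 2))
```

Under these assumptions a culture started at OD = 0.2 is past OD = 1 by the end of even the shortest growth time, which fits with the negative control developing low-level competence.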

What if we started at OD = 0.05 instead?  Negative-control samples using 90 min of total growth time would not give any transformants.  Samples that had longer incubations would, but we might still be able to detect an induction effect.

Or we could start even lower.  But this risks using so few cells that we can't detect low-level increases in transformation frequency (the limit of detection depends on the cell density), especially with bicyclomycin concentrations that inhibit growth.
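A rough way to see how the limit of detection depends on cell density is sketched below. It assumes that one colony on the selective (novobiocin) plate is the smallest signal we could detect, and that we plate 0.1 ml of undiluted cells; both the plated volume and the example densities are placeholders, not values from our protocol.

```python
def detection_limit(cfu_per_ml, plated_volume_ml=0.1):
    """Lowest transformation frequency that would give one selective colony.

    cfu_per_ml: viable cells per ml in the sample that gets plated.
    plated_volume_ml: volume of undiluted sample spread on the selective plate.
    """
    if cfu_per_ml <= 0 or plated_volume_ml <= 0:
        raise ValueError("density and plated volume must be positive")
    cells_plated = cfu_per_ml * plated_volume_ml
    return 1.0 / cells_plated


# Hypothetical densities: lower density means a higher (worse) limit.
for density in (3e8, 3e7, 3e6):
    print(f"{density:.0e} cfu/ml -> limit {detection_limit(density):.1e}")
```

Each 10-fold drop in cell density raises the lowest detectable transformation frequency 10-fold, which is why very low starting densities (or growth-inhibiting drug concentrations) could hide a small induction effect.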

We could use different initial densities for the different concentrations (OD=0.05 for 0 and low bicyclomycin and cAMP, and 0.1 for the higher concentrations).

Time management issues:

Another timing consideration is the need to be doing many things at once.  As set out above, we'd need to be pelleting and resuspending the t=90 samples at the same time as we're diluting and plating the t=30 samples.  Better to spread out the exposure times a bit more (say 20 min, 60 min and 100 min), so the different tasks are due at different times.  The initial choices were semi-arbitrary, so these new times should be just as good.

Semi-final plan:
  1. Start with non-competent cells (in log-phase growth, at OD600 = 0.1?).
  2. Transfer 5 ml of the culture (undiluted or diluted 1:1 with sBHI) to tubes or tiny flasks with no drug, cAMP (1 mM), and three different concentrations of bicyclomycin, chosen after consideration of the Bioscreen results.  Use the undiluted culture for flasks with high bicyclomycin.  
  3. At three different times (20 min, 60 min, 100 min), pellet 1 ml of cells and resuspend them in sBHI medium containing 1 µg MAP7 DNA (no bicyclomycin).  Incubate for 1 hr, dilute and plate ± novobiocin to measure transformation and survival.

Input DNA fragment sizes and shape of uptake peaks

The grad student has completed an analysis of the size distribution of the DNA fragments in the chromosomal DNA preps used for his uptake experiments.  Now we need to think about how we'll use this information.

He used two DNA preps, one sheared to an average length of about 6 kb (the long-fragment prep) and one sheared to an average length of about 250 bp (the short-fragment prep).  He analyzed both with a Bioanalyzer belonging to a neighbouring lab (thanks neighbours!).  This produced intensity traces for each sample (red line), with size-standard peaks (blue).

The intensity traces reflect the number of base pairs at each position in the gel, not the number of fragments, so the values needed to be normalized to fragment length to get the size distribution.  The purple line is the final distribution of fragment sizes.  We see that most fragments are between about 75 and 300 bp.
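The normalization step can be sketched as follows. This is a minimal illustration, not the grad student's actual script: it assumes the trace has already been converted to (fragment size, intensity) pairs, and that intensity is proportional to the total number of base pairs at each size. The function name and example numbers are made up.

```python
def fragment_number_distribution(sizes_bp, intensities):
    """Convert a base-pair-weighted trace into relative fragment numbers.

    Dividing each intensity by its fragment length gives a value proportional
    to the number of fragments; the result is scaled to sum to 1.
    """
    if len(sizes_bp) != len(intensities):
        raise ValueError("sizes and intensities must have the same length")
    if any(size <= 0 for size in sizes_bp):
        raise ValueError("fragment sizes must be positive")
    counts = [intensity / size for size, intensity in zip(sizes_bp, intensities)]
    total = sum(counts)
    return [count / total for count in counts]


# Hypothetical trace with equal intensity at three sizes.
sizes = [100, 200, 400]
fractions = fragment_number_distribution(sizes, [1.0, 1.0, 1.0])
print([round(f, 3) for f in fractions])
```

With equal intensity at every size, the shortest fragments are the most numerous, which is why the normalized curve shifts toward small sizes relative to the raw trace.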

Now, how do we use this information to predict the shape of the expected uptake peak around an uptake-promoting sequence (a USS)?

We first need to calculate the probability that the position we're looking at will be on the same fragment as a USS (call this value 'U').

 Now do this for each fragment size and plot it.

This is our expected peak shape, if all that matters is whether a USS is present anywhere on the DNA fragment.  We'll compare this to the average shape of well-isolated uptake peaks in the short-fragment dataset - the PhD student has already made a list of this subset of the peaks.
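Here is one way the calculation of U might be sketched. It rests on simplifying assumptions that are mine, not part of the analysis so far: the USS is treated as a single point, fragment ends fall at random positions, and only one USS is nearby. For a fragment of length L covering a position at distance d from the USS, the chance that it also contains the USS is then (L - d)/L, or zero if d is at least L. Longer fragments are more likely to cover any given position, so each size class is weighted by (number of fragments) x (length). The function names and the example distribution are hypothetical.

```python
def shared_fragment_probability(distance_bp, sizes_bp, fragment_fractions):
    """Estimate U for a position distance_bp away from a single USS.

    sizes_bp: fragment lengths; fragment_fractions: relative numbers of
    fragments of each length (the length-normalized distribution).
    """
    covering = 0.0           # weight of fragments covering the position
    covering_with_uss = 0.0  # weight of those that also contain the USS
    for size, fraction in zip(sizes_bp, fragment_fractions):
        # A fragment of this size can sit in `size` places and still cover
        # the position, but only `size - distance` of them reach the USS.
        covering += fraction * size
        covering_with_uss += fraction * max(0.0, size - distance_bp)
    return covering_with_uss / covering


def expected_peak(sizes_bp, fragment_fractions, half_width_bp, step_bp=10):
    """Return (offset, U) pairs for offsets on both sides of the USS."""
    offsets = range(-half_width_bp, half_width_bp + 1, step_bp)
    return [(x, shared_fragment_probability(abs(x), sizes_bp, fragment_fractions))
            for x in offsets]


# Hypothetical size distribution, for illustration only.
sizes = [100, 200, 300]
fractions = [0.5, 0.3, 0.2]
for offset, u in expected_peak(sizes, fractions, half_width_bp=300, step_bp=100):
    print(offset, round(u, 3))
```

Under these assumptions the predicted peak is symmetric around the USS and falls to zero at a distance equal to the longest fragment.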

To do the comparison properly we'll need to take peak height into consideration too.  So we should do separate comparisons for different peak-height classes.  If the prediction nicely overlays the observed peaks we'll conclude that a USS anywhere on the fragment is equally effective.

If the location of the USS on the fragment matters, or its orientation, the peak would have a different shape.  For example, if USSs near the ends of fragments don't promote uptake very well, the observed average peak would be narrower than predicted by fragment sizes.

For another example, if USSs in the forward orientation promote uptake well when they're near the left end of the fragment but poorly when they're near the right end, we might see different peak shapes for the two orientations - skewed right for 'forward' USSs and skewed left for 'reverse' USSs.   (Or is that backwards?)  If we only looked at the combined set of USSs in both orientations we might miss this effect.

Is there any other factor we could investigate using this analysis?  And what about the large-fragment data - should we treat it the same way?

Bicyclomycin ≠ bicyclomycin benzoate

A month ago I wrote a post about a planned experiment using the antibiotic bicyclomycin, to see if it induces H. influenzae cells to develop competence.  At the time I couldn't remember why this was a reasonable question, but a commenter pointed me to this paper, which describes the induction of competence by bicyclomycin in Legionella pneumophila.

Bicyclomycin is expensive, and we're close to broke, but a generous colleague had given us 4 mg of it to use in a trial experiment.  So I put our summer undergraduate to work on the project.  She began by testing H. influenzae's ability to grow in different concentrations of bicyclomycin, since we wanted to use a semi-inhibitory (but not lethal) concentration for our experiment.  We had found a paper that reported the minimum inhibitory concentration (MIC) for H. influenzae was 3.1 µg/ml, so she tested a wide range (up to 20 µg/ml).  But she saw no inhibition of growth at all.

That MIC had been for a clinical strain, not the lab workhorse KW20, so she repeated the test (this time using the neighbour-lab's BioScreen system) for both a clinical strain (86-028NP) and KW20, and for a couple of E. coli strains (the same paper reported MICs for E. coli  strains between 6 and 12 µg/ml), using bicyclomycin concentrations up to 50 µg/ml.  Still no evidence of growth inhibition!

But now I think I've solved the mystery.  Before making up our bicyclomycin stock we searched for solubility info.  We learned that it's reasonably soluble in water, but that there's a related antibiotic called bicyclomycin benzoate that needs to be made up in ethanol.  The colleague who gave us the 4 mg reminded me that she'd sent an email saying to dissolve it in ethanol.  I'd forgotten about this email, but reading it now reminded me of the solubility difference, and when I checked with her I found out that what she'd given us was bicyclomycin benzoate.

The same paper that gave us the H. influenzae MIC for bicyclomycin tested a wide range of derivatives, one of which was bicyclomycin benzoate.  Its MIC for H. influenzae was >100 µg/ml.  No wonder our cells didn't care about the concentrations we tested!

Bicyclomycin ($280/mg) is about 10 times more expensive than bicyclomycin benzoate, so I don't think we'll be doing this experiment after all.

One more toxin/antitoxin growth experiment

I have one more experiment to do for our toxin/antitoxin manuscript.  I need to make sure that survival into and recovery from stationary phase is normal in the antitoxin-knockout mutant.  This strain overexpresses the toxin gene and cannot inactivate the resulting toxin protein.  We already know that it produces a normal-looking growth curve using the Bioscreen (one of the lines in the graph below is for the antitoxin knockout), but this analysis is based on changes in culture turbidity and does not consider whether some of the cells contributing to turbidity might be dead.  This isn't a concern for rapidly growing cultures, but is for cells that have ceased growing.  So I need to complement the Bioscreen result with growth curves made by diluting cultures and plating the cells, to measure viable 'colony-forming-units' rather than just turbidity.
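For reference, the conversion from plate counts to viable cell density is simple enough to sketch. The function name and the example numbers are illustrative only, not real data.

```python
def cfu_per_ml(colonies, dilution, plated_volume_ml):
    """Viable cells per ml of the original culture.

    colonies: colonies counted on the plate.
    dilution: total dilution of the plated sample (e.g. 1e-5).
    plated_volume_ml: volume of the diluted sample spread on the plate.
    """
    if dilution <= 0 or plated_volume_ml <= 0:
        raise ValueError("dilution and plated volume must be positive")
    return colonies / (dilution * plated_volume_ml)


# Hypothetical count: 150 colonies from 0.1 ml of a 10^-5 dilution.
print(f"{cfu_per_ml(150, 1e-5, 0.1):.2e} cfu/ml")
```

Doing this for each time point gives a growth curve in cfu/ml that can be set beside the turbidity curve from the Bioscreen.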

I would normally set up four cultures (wildtype, toxin knockout, antitoxin knockout, double knockout), but there's a complication.  The double-knockout mutant only exists in a 'marked' version (with a spectinomycin-resistance cassette inserted in place of the missing genes) and the antitoxin knockout only exists in an 'unmarked' version (no cassette).  If this cassette influences growth or survival, this difference could cause anomalous results.  The toxin knockout exists in both forms, so I'll include both of them in the analysis.

First step is to restreak all the strains from the freezer stock.  I did this last week but foolishly let the cells die on the plates rather than restreaking them.