Field of Science

Input DNA fragment sizes and shape of uptake peaks

The grad student has completed an analysis of the size distribution of the DNA fragments in the chromosomal DNA preps used for his uptake experiments.  Now we need to think about how we'll use this information.

He used two DNA preps, one sheared to an average length of about 6 kb (the long-fragment prep) and one sheared to an average length of about 250 bp (the short-fragment prep).  He analyzed both with a Bioanalyzer belonging to a neighbouring lab (thanks neighbours!).  This produced intensity traces for each sample (red line), with size-standard peaks (blue).

The intensity traces reflect the number of base pairs at each position in the gel, not the number of fragments, so the values needed to be normalized to fragment length to get the size distribution.  The purple line is the final distribution of fragment sizes.  We see that most fragments are between about 75 and 300 bp.
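
As a rough illustration of that normalization, here is a minimal Python sketch.  It assumes the trace has been exported as paired lists of fragment sizes and baseline-subtracted intensities (the names sizes_bp and intensities are mine, not the Bioanalyzer's), and that intensity is proportional to the mass of DNA at each size.

```python
def number_distribution(sizes_bp, intensities):
    """Convert mass-weighted intensities into relative fragment numbers.

    Intensity at a given size is taken to be proportional to
    (number of fragments) x (fragment length), so dividing by length
    gives a value proportional to the number of fragments.
    """
    counts = [i / s for s, i in zip(sizes_bp, intensities)]
    total = sum(counts)
    # Rescale so the fractions sum to 1.
    return [c / total for c in counts]


# Hypothetical two-point trace: equal intensity at 100 bp and 300 bp.
sizes = [100, 300]
fractions = number_distribution(sizes, [1.0, 1.0])
```

With equal intensity at the two sizes, the shorter fragments come out three times as numerous as the longer ones, which is why the purple line sits well to the left of the red one.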

Now, how do we use this information to predict the shape of the expected uptake peak around an uptake-promoting sequence (a USS)?

We first need to calculate the probability that the position we're looking at will be on the same fragment as a USS (call this value 'U').

 Now do this for each fragment size and plot it.

This is our expected peak shape, if all that matters is whether a USS is present anywhere on the DNA fragment.  We'll compare this to the average shape of well-isolated uptake peaks in the short-fragment dataset - the PhD student has already made a list of this subset of the peaks.
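
One way to do this calculation, sketched below in Python, assumes that fragment ends fall at random positions and treats the USS as a single point.  A fragment of length L can cover the focal position in L ways, and only L - d of those also cover a USS that is d bp away, so the chance of sharing a fragment is the sum of f(L) x max(0, L - d) divided by the sum of f(L) x L, where f(L) is the fraction of fragments with length L.  The function and variable names are made up for illustration.

```python
def same_fragment_probability(distance_bp, sizes_bp, fractions):
    """Probability that a focal position is on the same fragment as a USS
    located distance_bp away, averaged over the fragment-size distribution.

    sizes_bp  : fragment lengths
    fractions : fraction of fragments (by number) at each length
    Assumes random break points and a point-like USS.
    """
    d = abs(distance_bp)
    # Placements of each fragment size that cover both positions.
    covering_both = sum(f * max(0, s - d) for s, f in zip(sizes_bp, fractions))
    # Placements of each fragment size that cover the focal position.
    covering_focal = sum(f * s for s, f in zip(sizes_bp, fractions))
    return covering_both / covering_focal


# Hypothetical distribution: 75% of fragments are 100 bp, 25% are 300 bp.
sizes = [100, 300]
fractions = [0.75, 0.25]
peak = [same_fragment_probability(d, sizes, fractions) for d in (0, 100, 300)]
```

Plotting this value against distance (on both sides of the USS) gives the predicted peak.  A real USS is about 30 bp long rather than a point, so the true peak would be slightly narrower than this sketch suggests.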

To do the comparison properly we'll need to take peak height into consideration too.  So we should do separate comparisons for different peak-height classes.  If the prediction nicely overlays the observed peaks we'll conclude that a USS anywhere on the fragment is equally effective.

If the location of the USS on the fragment matters, or its orientation, the peak would have a different shape.  For example, if USSs near the ends of fragments don't promote uptake very well, the observed average peak would be narrower than predicted by fragment sizes.

For another example, if USSs in the forward orientation promote uptake well when they're near the left end of the fragment but poorly when they're near the right end, we might see different peak shapes for the two orientations - skewed right for 'forward' USSs and skewed left for 'reverse' USSs.  (Or is that backwards?)  If we only looked at the combined set of USSs in both orientations we might miss this effect.

Is there any other factor we could investigate using this analysis?  And what about the large-fragment data - should we treat it the same way?

Bicyclomycin ≠ bicyclomycin benzoate

A month ago I wrote a post about a planned experiment using the antibiotic bicyclomycin, to see if it induces H. influenzae cells to develop competence.  At the time I couldn't remember why this was a reasonable question, but a commenter pointed me to this paper, which describes the induction of competence by bicyclomycin in Legionella pneumophila.

Bicyclomycin is expensive, and we're close to broke, but a generous colleague had given us 4 mg of it to use in a trial experiment.  So I put our summer undergraduate to work on the project.  She began by testing H. influenzae's ability to grow in different concentrations of bicyclomycin, since we wanted to use a semi-inhibitory (but not lethal) concentration for our experiment.  We had found a paper that reported the minimum inhibitory concentration (MIC) for H. influenzae was 3.1 µg/ml, so she tested a wide range (up to 20 µg/ml).  But she saw no inhibition of growth at all.

That MIC had been for a clinical strain, not the lab workhorse KW20, so she repeated the test (this time using the neighbour-lab's BioScreen system) for both a clinical strain (86-028NP) and KW20, and for a couple of E. coli strains (the same paper reported MICs for E. coli  strains between 6 and 12 µg/ml), using bicyclomycin concentrations up to 50 µg/ml.  Still no evidence of growth inhibition!

But now I think I've solved the mystery.  Before making up our bicyclomycin stock we searched for solubility info.  We learned that it's reasonably soluble in water, but that there's a related antibiotic called bicyclomycin benzoate that needs to be made up in ethanol.  The colleague who gave us the 4 mg reminded me that she'd sent an email saying to dissolve it in ethanol.  I'd forgotten about this email, but reading it now reminded me of the solubility difference, and when I checked with her I found out that what she'd given us was bicyclomycin benzoate.

The same paper that gave us the H. influenzae MIC for bicyclomycin tested a wide range of derivatives, one of which was bicyclomycin benzoate.  Its MIC for H. influenzae was >100 µg/ml.  No wonder our cells didn't care about the concentrations we tested!

Bicyclomycin is about 10 times more expensive than bicyclomycin benzoate ($280/mg) so I don't think we'll be doing this experiment after all.

One more toxin/antitoxin growth experiment

I have one more experiment to do for our toxin/antitoxin manuscript.  I need to make sure that survival into and recovery from stationary phase is normal in the antitoxin-knockout mutant.  This strain overexpresses the toxin gene and cannot inactivate the resulting toxin protein.  We already know that it produces a normal-looking growth curve using the Bioscreen (one of the lines in the graph below is for the antitoxin knockout), but this analysis is based on changes in culture turbidity and does not consider whether some of the cells contributing to turbidity might be dead.  This isn't a concern for rapidly growing cultures, but is for cells that have ceased growing.  So I need to complement the Bioscreen result with growth curves made by diluting cultures and plating the cells, to measure viable 'colony-forming-units' rather than just turbidity.

I would normally set up four cultures (wildtype, toxin knockout, antitoxin knockout, double knockout), but there's a complication.  The double-knockout mutant only exists in a 'marked' version (with a spectinomycin cassette inserted in place of the missing genes) and the antitoxin knockout only exists in an 'unmarked' version (no spectinomycinR cassette).  If this cassette influences growth or survival this difference could cause anomalous results.  The toxin knockout exists in both forms, so I'll include both of them in the analysis.

First step is to restreak all the strains from the freezer stock.  I did this last week but foolishly let the cells die on the plates rather than restreaking them.

Does bicyclomycin induce competence? (What was I thinking???)

Last summer I started the blog post below.
 Does bicyclomycin induce competence?
Yesterday the summer student pulled out the public data files for E. coli microarray experiments that had included measurements of sxy mRNA.  We don't know how sxy expression is controlled in E. coli - nobody has found a way to induce expression of the chromosomal gene (we used an inducible plasmid clone to study its effects on other genes).  So it's good to see that some treatments did induce it.
In the diagram below, each coloured vertical bar represents a single microarray comparison of sxy mRNA under two different conditions.  Mousing over the bar brings up a box describing the comparison and results.  Most of the bars are black or blackish; these are comparisons where sxy mRNA levels are the same.  Yellow bars are ones where it is down (bright yellow is ≥ 8-fold down), and blue bars are ones where it is up (bright blue is ≥ 8-fold up); the scale is 'log 2 expression ratio'.
It's hard for me to tell which (if any) patterns are biologically significant.  The one I'm excited about

And that's the end of the draft post!

Subsequently I found a colleague who kindly gave me some bicyclomycin (it's an antibiotic), and roughed out a simple experiment.  Now I'm planning to train up our new summer undergrad so she can do the experiment.

But I can't remember why I thought that bicyclomycin might induce competence! 

Bicyclomycin is an antibiotic.  I'd never heard of it until last summer, but it's of general interest because it's the only antibiotic that inhibits the Rho transcription termination protein.  Given that competence development is limited by folding of the 5' end of sxy mRNA, it could be that Rho-mediated termination plays a role in determining whether sxy mRNA is translated.

Searching my blog posts for 'bicyclomycin' found the unpublished post above, which tantalizingly breaks off in mid-sentence just at the point where I was about to explain my interest.  The figure is a screenshot from a microarray database, and I would expect that one of the bright-blue bars (sxy induction) would be from an array analysis involving bicyclomycin.  But that doesn't seem to be the case.  Of the five analyses with bright blue bars, one is UV irradiation, two are  biofilms, one is heat shock, and one is a glucose-lactose shift.  No mention of transcriptional termination.  Searching the microarray database for 'bicyclomycin' brings up the expression of the bcl gene, whose mutations confer resistance, and a study of transcription termination in which sxy expression is unchanged!

This microarray study of transcription used bicyclomycin to inhibit termination. So I dug farther into it to see if there were any changes in expression of the competence-gene homologs that sxy induces. Some of them are tantalizingly up (the major T4P pilin and the comABCDE-homologs that specify the secretin pore and components of the T4P motor responsible for DNA uptake), but others are unchanged.

Subsequent searching also found an email I'd sent to the summer student, with a link to this termination paper (Cardinale et al. 2008), asking 'Is this the one?'.  So I think this study is indeed what got me interested in bicyclomycin.

So let's see what the new summer student can find out!

More thinking/planning about the new uptake-sequencing data

Some housekeeping issues:

The sequence data:  The PhD student has found that some segments of the genome have very low coverage in the input data - some positions have coverage of zero.  This means that the calculated uptake ratios for these positions are either unreliable (low coverage) or missing (coverage = 0).  He's going to plot segments of the genome with the low coverage points in a different colour, so we can see how bad the problem is.

Part of the problem may be due to how the reads were originally mapped onto the donor genome. The mapping used a concatenated donor-recipient double genome to remove the contaminating recipient reads from the data.  Because the donor and recipient sequences used were those of NCBI reference genomes rather than of the exact cultures used for the experiment, sequencing errors in the reference genomes may have caused donor sequences to mis-align onto the recipient genome.

This can easily be checked by examining the full alignment of the input DNA.  This should not contain any contaminating recipient sequences, so any reads that align to the recipient are alignment errors.  The ideal solution would be to realign the reads using better reference sequences, but we could instead just add this misaligned coverage into the donor-aligned input dataset we're analyzing.

Any remaining positions with near-zero coverage in the input dataset should probably be flagged and removed from the analyses.
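
A minimal sketch of that flagging step, in Python.  It assumes per-position coverage lists for the uptake and input samples, scales each by its total coverage (my assumption about the normalization, not necessarily what was actually done), and uses an arbitrary placeholder threshold of 10 reads.

```python
def uptake_ratios(uptake_cov, input_cov, min_input_cov=10):
    """Per-position uptake ratio, or None where input coverage is too low.

    uptake_cov, input_cov : read coverage at each genome position
    min_input_cov         : placeholder cutoff below which a position is flagged
    """
    uptake_total = sum(uptake_cov)
    input_total = sum(input_cov)
    ratios = []
    for up, inp in zip(uptake_cov, input_cov):
        if inp < min_input_cov:
            # Flagged: ratio is unreliable (low coverage) or undefined (zero).
            ratios.append(None)
        else:
            ratios.append((up / uptake_total) / (inp / input_total))
    return ratios


# Hypothetical four-position example; the last position has no input coverage.
ratios = uptake_ratios([30, 10, 60, 0], [20, 20, 20, 0])
```

Keeping the flagged positions as None (rather than dropping them) preserves the genome coordinates, so they can still be plotted in a different colour.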

The USS-scoring matrices:  A careful reader might have noticed in yesterday's post that the two scoring matrices are not the same length.  The uptake-based matrix is 32 nt long, but the genome-based matrix is 37 nt long.  They are also not exactly aligned to each other; position 1 of the uptake-based matrix is position 3 of the genome-based matrix.  Rather than dealing with these discrepancies later (or forgetting to deal with them), we should create concordant matrices now to use for the scoring.

This requires deleting the first two positions and the last three positions of the genome-based matrix. Since the remaining last few positions have no 'information' in either matrix, we might as well delete a couple more, to give concordant matrices that are both 30 bp long.
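
The trimming arithmetic can be checked with a small Python sketch.  The matrices here are placeholders (one label per motif position, standing in for the real rows of base weights), and I'm assuming the 'couple more' deleted positions come off the right-hand end of both matrices.

```python
def trim_matrix(matrix, drop_start, drop_end):
    """Return the matrix without its first drop_start and last drop_end positions."""
    return matrix[drop_start:len(matrix) - drop_end]


# Placeholder matrices: one (name, position) label per motif position.
genomic = [("genomic", i) for i in range(1, 38)]  # 37 positions
uptake = [("uptake", i) for i in range(1, 33)]    # 32 positions

# Genomic matrix: drop 2 at the start, then 3 + 2 at the end.
genomic_30 = trim_matrix(genomic, 2, 5)
# Uptake matrix: already starts at the right place; drop 2 at the end.
uptake_30 = trim_matrix(uptake, 0, 2)
```

After trimming, both matrices have 30 positions, and original position 3 of the genome-based matrix lines up with position 1 of the uptake-based matrix.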

Forward-strand and reverse-strand USSs:  Since the USS motif is not symmetric (not a palindrome in DNA language), we need to identify and specify the locations of the USSs in the two strands.  The top panel below illustrates the problem.  To keep the position references consistent, the two strands are initially scored in the same left-to-right direction, with the reverse-strand scoring done using a matrix with complementary bases in the reverse orientation.  For both strands the left end of each USS initially specifies its position in the genome, but this is a bit misleading since it's not the centre or most important position of the USS.  Worse, since the crucial 'core' of the USS motif isn't at its centre, the initial positions of the forward USSs are skewed differently than the reverse USSs.
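
Here is a small Python sketch of that two-strand scoring, using a made-up 3-position matrix rather than the real USS matrix.  Both strands are scored left to right along the same sequence, and each score is reported at the left end of its window, as described above.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement_matrix(matrix):
    """matrix is a list of {base: weight} dicts, one per motif position.

    Reverse the order of the positions and swap each base for its complement.
    """
    return [{COMPLEMENT[base]: w for base, w in column.items()}
            for column in reversed(matrix)]


def score_windows(seq, matrix):
    """Score every window of the sequence; index = left end of the window."""
    n = len(matrix)
    return [sum(matrix[i][seq[start + i]] for i in range(n))
            for start in range(len(seq) - n + 1)]


# Toy matrix whose best match is AAG (so its reverse-strand match is CTT).
toy = [
    {"A": 1, "C": 0, "G": 0, "T": 0},
    {"A": 1, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 1, "T": 0},
]
forward_scores = score_windows("AAGCTT", toy)
reverse_scores = score_windows("AAGCTT", reverse_complement_matrix(toy))
```

In this toy case the forward match is reported at position 0 and the reverse match at position 3.  Moving either to a more meaningful reference point is then just a matter of adding a per-strand offset to these left-end positions.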

The lower panels indicate the two possible solutions.  Both are technically easy - we just create new USS positions by adding numbers to the original positions.  In the solution shown in the middle panel, we'd add 13 (I think) to both the forward and reverse positions (sorry, the figure shows the trimmed 30 bp USS but the numbers haven't been corrected for the removal of two positions at the start).  In the solution shown in the lower panel we'd add 7 to the forward strand positions and 21 to the reverse-strand positions. (I'm not certain these are the correct numbers...)

I think either solution would be fine, but we need to pick one.

Uptake dataset progress

The PhD student has been making lots of progress in analyzing the data from the chromosomal DNA uptake experiment.

The big progress came because we realized that we needed to stop looking at the data for the whole genome and instead examine a representative 5 kb segment.  This has allowed us to relate the results of each analysis to the specific sequence features and uptake data for each position in the segment. So now we have a pretty good understanding of what the various analyses can show us, and what they can't.

Rather than detailing what we learned, here I want to consider what our goals are, and what steps we should take.

Goals:  For the analysis of transformation frequencies (the bigger project this work is part of), we want to know how much of the variation in transformation frequencies across the genome is due to differences in DNA uptake.  In principle this could just be a number, e.g. 37%.

I guess one (mindless) way to do this would be just to subtract the differences in uptake from the differences in transformation.  I don't know whether the former post-doc has done this - I'm pretty sure we haven't discussed it.

A second approach would be to determine the extent to which the already-characterized effect of USS (uptake signal sequences) on DNA uptake explains differences in transformation across the genome. Doing this doesn't require any of the new DNA-uptake sequencing data, just the sequence of the genome of the DNA source.  The former post-doc has done simple versions of this, and he has a rotation student working on a more sophisticated version.

We (the PhD student and I) are instead using the new sequence data to improve our understanding of how DNA sequences determine how efficiently a fragment will be taken up by a competent cell.  This better understanding can then be used to predict the contribution of uptake to the transformation differences (as above), but its main value is more direct - understanding how DNA sequence differences affect uptake will help us understand the evolution of uptake biases and uptake signal sequences, in H. influenzae and other organisms.

So what have we learned so far:

Size distribution of the input DNA:  We don't yet have the direct DNA-analyzer data on length distribution.  But we can indirectly estimate this by looking at the graphs of uptake ratio as a function of genome position.  Positions that are more than 500 bp from the center of an uptake peak (location of a USS) have a very small uptake ratio (~ 0.01, often not distinguishable from zero).  This means that almost all of the fragments in the short DNA sample were shorter than 500 bp.  The mid-height widths of the (well-separated) peaks are about 400-500 bp, indicating that the average fragment was about 200-250 bp.  I haven't taken the time to get the best image for this analysis, so we could be more precise than this.

Importance of USS:  It's abundantly clear that most of the variation in uptake seen in our 'short' DNA sample is due to the locations of 'USS', sequences with strong matches to the USS motif.  Most fragments containing a strong match (score > 20 with the 'genomic' scoring matrix) are taken up several hundred times more efficiently than fragments without a good match.

We've only examined 5 kb in detail, but so far all the uptake peaks we've examined are centred on positions with strong USS scores.  The height of the peak correlates with the score.

Importance of the USS scoring matrix: We have two types of position-weight matrices for scoring how well a sequence matches an uptake-promoting motif.

The first is the 'genomic' matrix that the PhD student has been using so far, shown in the figure below. It's based on analysis of abundant USS elements in the H. influenzae Rd genome, identified using the Gibbs Motif Sampler (Maughan et al. 2010).  In the figure each bar represents a position in the motif, and its height represents the 'information content' at that position (the sum of the weighted values of each base at that position in the table).

The genomic analysis means that this matrix doesn't directly represent the preferences of the uptake machinery, but rather some combination of these preferences with other factors affecting how sequences accumulate in the genome over evolutionary time.

The second type of matrix comes from the former post-doc's direct analysis of uptake biases, done using a synthetic DNA fragment containing a degenerate USS (Mell et al. 2012).  This 'uptake' matrix gives a motif with a strong consensus only for a much smaller region, with only four very important bases.

We haven't yet analyzed any genome uptake data using this matrix, but it's high on our priority list. We expect similar results with both matrices, but the uptake matrix may be better because it's directly based on uptake data.

How will we decide if it's 'better'?  Here, 'better' means that position USS scores better predict the uptake ratios of nearby sequences.  We're still working our way to deciding the best way to do this. In addition to the USS score from the matrix, the prediction will need to consider how far the position is from the nearest 'USS' (on a list using a good score cutoff), whether fragments containing it are likely to contain more than one 'USS', and the size distribution of the DNA fragments in the prep.  Maybe some of this would be incorporated in a matrix of USS scores and distances...

Ideally (i.e. if computational time and resources were unlimited), for each focal position whose uptake we want to predict, the uptake prediction would incorporate:

  1. the USS scores at each distance from it (two scores for each distance), weighted by our observed correlation between USS score and height of uptake ratio peak
  2. For each distance, a weighting factor that reflects the probability that the focal position is in the same DNA fragment as the sequence being scored (based on the measured size distribution of the input DNA prep)
  3. A factor reflecting the interactions between USS scores at different positions, weighted by the probability that both USS would be in the same fragment.
In practice, our job is to characterize these effects and then distill the important ones into a computationally simple prediction algorithm. 
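
To make factors 1 and 2 concrete, here is a very rough Python sketch of what a simple prediction might look like.  Everything in it is an assumption for illustration: the additive way of combining USSs, the two helper functions, and the numbers.  Factor 3 (interactions between USSs) is left out entirely.

```python
def predict_uptake(focal, uss_sites, same_fragment_prob, score_weight):
    """Sum, over the listed USS sites, of
    (weight given to the site's score) x
    (probability that the site and the focal position share a fragment).

    uss_sites          : list of (position, score) pairs
    same_fragment_prob : function of distance in bp
    score_weight       : function converting a USS score to a peak height
    """
    return sum(score_weight(score) * same_fragment_prob(abs(focal - pos))
               for pos, score in uss_sites)


# Hypothetical helpers: every fragment is 250 bp, and peak height is score/10.
def toy_same_fragment_prob(distance_bp):
    return max(0.0, 1 - distance_bp / 250)


def toy_score_weight(score):
    return score / 10


sites = [(1000, 20.0)]
at_uss = predict_uptake(1000, sites, toy_same_fragment_prob, toy_score_weight)
nearby = predict_uptake(1125, sites, toy_same_fragment_prob, toy_score_weight)
far = predict_uptake(1300, sites, toy_same_fragment_prob, toy_score_weight)
```

Passing the two factors in as functions means either one can be swapped for a measured version (the real size distribution, the observed score-height relationship) without touching the rest.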

Understanding the results of the first analysis

The grad student did the analysis I had described in this post.  Here's what I had said I expected:

 And here's what he found:

His data extends over a larger scale, and there is no empty space on the left below the main peak of points, perhaps just because the dots are too big to resolve.  A few uptake ratios are as high as 10, which is  also expected.  Some of the distances to the nearest 'USS' (position on the USS list) were surprisingly large - outside of the common fragment sizes in the 'short' DNA prep, but these might represent the several places in the genome where USS are widely separated. 

The most surprising aspect was the appearance of well-defined lines of points forming peaks at distances longer than the fragment sizes, and the absence of the clusters of points I'd originally hypothesized.

These long-distance peaks made sense once the grad student identified the positions responsible for them  and checked their assigned USS scores.  At the site of the peak he found a position with a USS score only slightly lower than the cutoff he'd used when generating his list.  When he checked the USS scores for the positions of the other long-distance peaks he again found scores that were locally high but below the list cutoff. 

The figure below illustrates what we think is going on.  First consider the top graph, which is a simpler schematic version of the uptake-ratio graph in the earlier post.  It shows two local peaks in uptake, one at the site of a USS on the list, and one at the site of another uptake promoting sequence. In principle this sequence could be a lower-scoring USS, or it could be an unrelated sequence that also promotes uptake.

The lower graph shows what we expect when this data is replotted with the distance to the nearest 'USS' on the X axis.  As I originally expected, points close to the recognized USS give two lines heading down and away from position 0 (the position of that USS).  But because the other uptake-promoting position isn't recognized as a 'USS', its points show up farther along the x axis, according to their distance from the position-zero USS.

Are USS that fell below the list cutoff responsible for all of the long-distance peaks?  One simple test is to reduce the cutoff for the USS list, and see if the peaks go away.  Sure enough, when the grad student reduced his USS-score cutoff from 19.04 to 18, all but one of the peaks disappeared.  I'm a bit surprised that the long-distance low-uptake points disappeared too; I guess this means that they weren't just due to gaps in the genomic distribution of USSs.

Does this result mean that the genome doesn't contain any non-USS sequences that promote DNA uptake?  No.  There's still that one remaining peak at about 800 bp, whose USS scores need to be checked.  And there are all the points in the black part of the graph, where non-USS peaks may be obscured by all the other points.

More about analysis of the DNA-uptake sequencing data

The graph below shows the efficiency of DNA uptake (relative to the 'input' DNA sample) across a 13 kb segment of the H. influenzae Rd genome.  The red dots are for a 'short' sample with average fragment size about 0.25 kb, and the blue dots are for a 'long' sample, with an average fragment size of about 6 kb.  (The average lengths come from crude examination of agarose gels, which might underestimate the abundance of short fragments, so the actual length distributions will be measured with a DNA Analyzer.)

The previous post considered why the red data are so spiky - each spike corresponds to the location in the DNA of a short sequence matching the uptake-signal-sequence (USS) motif. Fragments containing a USS are taken up much better (maybe 25-50 times better?) than fragments lacking a USS.

But the blue data are also spiky, and I don't know why.  Ignoring the two big spikes for a minute, the spikes and dips have much smaller amplitude than the big red spikes (they don't go up as high or down as low), but they're also more frequent on the distance scale.    

The gradual rise and fall of the blue dots over distances of several kb is expected from the length distribution of the fragments, but this jaggedness is entirely unexpected, especially given the apparent smoothness of the red points between the USS spikes.  Is this just noise in the data?  Is it an artefact of how the uptake data were normalized to the input data?

The two high spikes might be a different puzzle, or they might be extreme cases of whatever is causing the low-amplitude spikiness.  How could variation in uptake of DNA fragments that are mostly at least several kb long give a spike that's only about 11 bp wide?  Could this be an alignment artefact that somehow affects 'uptake' DNA very differently than 'input' DNA?

Here's a different graph of the uptake ratios (over about 100 kb), made by the former post-doc; again we see much more spikiness in the long-fragment DNA than in the short-fragment DNA.
To investigate the cause(s), I think the first thing to do is to go back one step from the uptake ratio data and look separately at the coverage for the input DNA and the recovered 'uptake' DNA.  Luckily, the first thing the post-doc did when he got the sequencing results was to send us a screen shot of a 20 kb Integrative Genomics Viewer (IGV) view of the 4 sample types (long input and uptake, short input and uptake).

I'm surprised by how variable the input coverage is.  The very fine scale variation is perhaps noise, but the larger peaks and valleys (500-2000 bp) are quite consistent between the long and short input DNA samples.

Unfortunately I don't have the uptake ratio graph for the same region that I have this IGV analysis, and I don't have the R skills to generate it.  But I can ask the grad student to do it for me, and to send me his code so I can figure out how it's done.

How to analyze next-gen DNA uptake data

We want to understand why competent Haemophilus influenzae cells take up some parts of H. influenzae chromosomes more efficiently than others.

To this end, before Christmas the grad student reisolated preparations of DNA fragments of chromosomal DNA from strain 86-028NP (hence 'NP') that had been taken up by competent cells of the standard lab strain Rd.  He sent these DNA samples to the former post-doc for sequencing (with the original 'input' DNAs as controls).  The post-doc has now sent us the sequencing data, and the grad student is going to analyze this, with two main goals:
  1. Determine how a DNA fragment's probability of uptake is affected by the presence of sequences matching the uptake signal sequence ('USS') motif.
  2. Identify other sequence factors that influence uptake.
The grad student has written up an overview of his plan for accomplishing these goals, and that has stimulated me to also think about how it could be done.

He (or the former post-doc?) has already done the first step, scoring the degree of preferential uptake for every position in the genome.  I think this was done by comparing each genome position's coverage in the recovered-DNA dataset to its coverage in the control 'input' dataset.  This gives a score they call the 'uptake ratio'.

Here's a graph made by the grad student, showing the uptake ratios for two different preps of chromosomal DNA, over a 13 kb segment of the 1830 kb H. influenzae Rd chromosome. The dark blue points are for a DNA prep whose average fragment size was about 6 kb, and the red points for a DNA prep whose average fragment size was about 250 bp.  Because the actual distributions of fragment sizes in these preps have not yet been carefully measured, I'll refer to them as the large-fragment and small-fragment DNA preps respectively.

The first thing you notice is that the uptake ratios for the large-fragment prep are much less variable than those for the small-fragment prep.  We are very gratified to see this, because it's what we expected from the known contribution of the uptake motif.  Sequences with strong matches to this motif occur all around the chromosome, with an average spacing of about 1 kb.  Thus most fragments in the large-fragment prep will have contained at least one USS, but many fragments in the small-fragment prep will not have contained any USS.

The large-fragment prep does show two strong spikes of high uptake (at about 8000 and 18500 bp).  These are certainly very interesting, especially since they don't correspond to high uptake in the short-fragment prep.  But for now I'm just going to consider how we might analyze the short-fragment prep, since this provides much better resolution of what we think are the effects of individual USSs.

Here's a strategy I came up with:

Step 1:  Score each position of the NP genome for its match to either orientation of the 'genomic' USS motif.  This motif was identified by Gibbs Motif Sampler analysis of the Rd genome (see this paper).  Each position will have a '+' score and a '-' score; we need to make sure the positions are aligned at the most important position of the USS motif.  Because the score depends on correct alignment, the result will be punctate, with about one high-scoring position and about 999 low-scoring positions in each kb.  Here's a figure of what the analysis might look like for the 13 kb segment shown above.

Step 2:  Using a reasonable score cutoff, create a list of positions that qualify as 'USS' for the initial analysis of the uptake data.  In the case above we'd include all positions scoring higher than 15.  

Step 3: For each position in the genome, calculate its distance from the nearest 'USS' on the above list.  For now don't distinguish between 'USS' in + or - orientations.  (I'm keeping 'USS' in quotes to remind us that we used only one of many possible criteria to define our list.)

Step 4: For each genome position plot its uptake ratio as a function of its distance from a 'USS'.  Because most of the red peaks in the grad student's graph have uptake-ratio scores of about 4 and bases about 1 kb wide, I expect the graph to look something like this: 

There are a lot more points on this graph than on the previous one because there's a point for every position in the 1.8 Mb genome.  Most of the points fall on a rough band that drops from uptake ratios of 4 (peaks, for a very close 'USS') to uptake ratios that are about 0.1 (troughs, for positions that are more than 500 bp from a 'USS').
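
Steps 2 and 3 are simple enough to sketch in a few lines of Python.  This version assumes the Step 1 scores are already available as a list with one (best-orientation) score per genome position; the function names are my own.

```python
import bisect


def uss_list(scores, cutoff):
    """Step 2: positions (0-based) whose USS score is higher than the cutoff."""
    return [pos for pos, score in enumerate(scores) if score > cutoff]


def distance_to_nearest(position, sorted_sites):
    """Step 3: distance in bp from a position to the nearest listed 'USS'.

    sorted_sites must be in ascending order (uss_list returns it that way).
    Returns None if the list is empty.
    """
    if not sorted_sites:
        return None
    i = bisect.bisect_left(sorted_sites, position)
    candidates = []
    if i < len(sorted_sites):
        candidates.append(sorted_sites[i] - position)      # nearest site at or to the right
    if i > 0:
        candidates.append(position - sorted_sites[i - 1])  # nearest site to the left
    return min(candidates)


# Hypothetical scores for a 5-position 'genome', with the cutoff of 15 used above.
sites = uss_list([3, 16, 2, 20, 15], 15)
# Hypothetical 'USS' list with sites at positions 100 and 500.
distances = [distance_to_nearest(p, [100, 500]) for p in (90, 100, 300, 700)]
```

The binary search matters only for speed: with a point for every position in a 1.8 Mb genome, checking each position against the whole 'USS' list would be much slower.  This sketch treats the genome as linear, so distances across the origin of the circular chromosome would need extra handling.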

If we see a broad band with lots of scatter, this will mean that our distance-to-the-nearest-'USS' score doesn't capture other aspects of the USS that influence uptake.  These factors might include:
  1. whether the USS's orientation on the chromosome affects uptake (USS motifs are asymmetric)
  2. how well the USS's sequence matches the several different ways we can score sequences as possible USS (genome-based, uptake-based, and with or without internal interaction effects between positions)
  3. how much the presence and relative locations of additional USSs adds to uptake
We will come back to the above analysis and develop more nuanced measures of the effects of nearby USS, judging success by how much each nuance reduces the scatter of the points.

For now I'm more interested in identifying any non-USS sequence factors that influence uptake. These factors should appear in the above graph as outliers, positions whose uptake ratio is not correlated with their USS-distance score.  Our previous analysis suggests that these outliers should be common.  If they are common, they might be clustered as shown above, but they're probably more likely to be scattered all over the place and perhaps not easily distinguishable from the overall background scatter. 

The best way to see if these positions are not noise is to see if their scores correlate with genomic positions.  Below is one way I've thought of to do this.

Step 5:  Use the uptake vs USS-distance graph to develop an equation that best predicts uptake ratio (U) as a function of distance to nearest 'USS' (D, in bp).  For the above example, a very crude equation might be

U = 0.1 or (4 - D/100), whichever is greater.

Step 6:  For each position in the genome, use this equation and the 'USS' list to predict an uptake ratio, and then calculate the difference between its predicted and observed uptake ratios.
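
As a rough sketch, the crude equation and the Step 6 difference could be written like this in Python. The function names are hypothetical, the default values simply restate the crude example equation, and the sign convention (observed minus predicted) is an assumption.

```python
def predicted_uptake(distance_bp, peak=4.0, floor=0.1, drop_per_bp=0.01):
    """Crude prediction from the example equation: U = max(0.1, 4 - D/100)."""
    return max(floor, peak - distance_bp * drop_per_bp)

def uptake_anomaly(observed_ratio, distance_bp):
    """Observed minus predicted uptake ratio for one genome position.

    Positive values mean more uptake than the 'USS' distance predicts,
    negative values mean less (sign convention chosen for this sketch).
    """
    return observed_ratio - predicted_uptake(distance_bp)
```

With these example numbers the prediction reaches the 0.1 floor at D = 390 bp, so every position farther than that from a 'USS' gets the same predicted ratio.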

Step 7:  Now plot this 'anomaly' as a function of genome position.  If we're lucky it will look something like this:

If some of the apparent scatter is due to positions where non-USS sequences influence uptake, these will show up as peaks and troughs above and below the main bands, and we can go on to analyze these sequences bioinformatically for shared features and experimentally for direct effects on uptake.  If the scatter really is due to noise, then it will be scattered over the genome and not fall into discrete peaks and troughs. 
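
One simple way to tell discrete peaks and troughs from scattered noise would be to average the anomaly over a sliding window along the genome. Neighbouring positions that share a real sequence effect should keep a large mean, while independent noise should average out toward zero. This is just one possible approach, not something we have settled on, and the window size and threshold would have to be chosen by looking at the data.

```python
def sliding_mean(values, window):
    """Mean of each run of `window` consecutive values (running-sum version)."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    means = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]  # slide the window one position
        means.append(running / window)
    return means

def flag_windows(anomalies, window, threshold):
    """Start indices of windows whose mean anomaly is beyond +/- threshold."""
    return [i for i, m in enumerate(sliding_mean(anomalies, window))
            if abs(m) > threshold]

# Toy data: a clustered anomaly is flagged, alternating noise is not
clustered = flag_windows([0, 0, 0, 2, 2, 2, 0, 0, 0], window=3, threshold=1.5)
noisy = flag_windows([1, -1, 1, -1, 1, -1], window=2, threshold=0.5)
```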

Ready for sequencing?

I think I finally have the appropriate PCR fragments from my A. pleuropneumoniae mutants, to be sent for sequencing:

I have 3 knockout mutants, removing the toxin, antitoxin and toxin+antitoxin segments (∆T, ∆A, and ∆TA respectively).  I designed new 'S-up' and 'S-dn' primers to use with the original 'F' and 'R' primers to amplify the segments on either side of the spectinomycin-resistance cassette that's inserted at the sites of deletion.  I need to check the sequences of these to be sure that the appropriate segments have been removed, and that the remaining gene is intact.

I've successfully used these primers (black arrows above) to amplify the ∆T and ∆A segments shown above (light blue and lilac bars).  Now I just need to clean up the PCR products, check their concentrations, and send them with the appropriate S-up and S-dn primers (red arrows above) for sequencing.  I don't need to sequence the far ends of the fragments.

I also tried to use these primers for the ∆TA double knockout but for some reason I can't get any amplification.  This may mean that there's something wrong with the mutant, but I've decided I don't really need to discuss this mutant at all in our paper, since both the ∆T and ∆A mutants have normal growth and competence phenotypes.  (Well, I think I do need to do at least one more check of the transformation frequencies, since there's been a lot of variation in my colony counts.)

[Ooh, idea!  Maybe the ∆TA mutant won't amplify because its Spec cassette is inserted in the opposite orientation to the others!  The Honours student created each mutant by blunt-end ligation, so either orientation is possible.  I'll go set up one more pair of PCR reactions with the alternate combinations of primers right now...

And YES!  Reversing the primers gave the expected amplification!]

What do the toxin and antitoxin gene products do?

Now that I'm finally close to finishing my benchwork task for the Honours student's manuscript, I've gone back to thinking about the results and implications of our RNA-seq analysis.

When the Honours student wrote the manuscript (actually her Honours thesis, but in excellent manuscript format), we had only incomplete RNA-seq results - specifically we had only one replicate of the critical antitoxin mutant.  The other two replicates were in the pipeline at the time, and the full dataset was analyzed subsequently by the other Honours student when he stayed on for the summer.

I'm going to just summarize the results now, and come back to them later.

Basic points:

  1. The antitoxin knockout mutant has normal RNA levels for all the genes that regulate the competence regulon (crp and sxy, which encode the transcription enhancers CRP and Sxy, and cya, which encodes the adenylate cyclase that synthesizes the essential cyclic AMP cofactor for CRP).
  2. Consistent with this, the expression levels of the competence regulon genes are not very different from those in wildtype cells.  A few genes are down by 40-50%, but most are near-normal, with error bars that overlap the range of wildtype expression (see his complicated green figure below - compare the heights of the bright-green bars with the spans of the grey shaded areas, which represent the normal expression levels at the bright-green timepoint).
  3. The double knockout (∆toxin/antitoxin) transforms normally, so the competence defect of the antitoxin mutant is due to competence-blocking activity of the toxin.
  4. The transformation defect of the antitoxin knockout is much more extreme than these expression levels would predict.  We see few or no transformants (transformation frequencies less than 10^-8), whereas wildtype cells give transformation frequencies higher than 10^-3.  
  5. The antitoxin mutant also has an extreme DNA uptake defect, so the transformation defect is not caused by defective recombination machinery. 
  6. The summer student also did an RNA-seq analysis of the hfq knockout mutant he had worked on for his Honours project. This mutant has a more severe reduction in expression of all the competence-induced genes, but a much less severe defect in transformation (only about ten-fold lower than wildtype cells).  Thus the antitoxin mutant's competence defect is unlikely to be due to modestly lower expression of one or more key competence genes.
  7. In the antitoxin mutant the toxin mRNA is overexpressed during exponential growth.  This is consistent with the roles of related antitoxins in other systems, where they act as repressors of transcription of the toxin-antitoxin operon.
  8. The antitoxin knockout cells have a normal doubling time in exponential growth, and survive competence induction and stationary phase just as well as wildtype cells do, so the toxin protein must not be toxic for growth or survival.

Where does all this leave us?  One possibility is that the toxin directly blocks DNA uptake, by some mechanism we are completely ignorant of.  But related toxins are known to act by cutting mRNAs on the ribosome, so it's possible that the RNA-seq results are misleading in that they detect all RNAs, including ones that have been cut.

Luckily the summer student wrote an R script to compare coverage patterns between wildtype and mutant cells, and generated lovely graphics showing the effect of the antitoxin knockout on coverage of segments containing competence-induced genes.  Just as an example, here's his comparison of expression of the pilABCD operon in wildtype (purple) and hypercompetent (green) cells.

He's generated data for all the competence-induced genes in the antitoxin knockout, so I'll check these to see if there are any alterations in transcript profiles that might indicate the action of an mRNA-cleaving toxin.

Toxin/antitoxin knockout updates, and bonus DNA uptake results

My last post was all about failure, so it's high time I updated things with some successes.

Constructing an Actinobacillus pleuropneumoniae antitoxin gene knockout:  At the last report, I had what I thought were four independent knockout mutants, but my attempts to PCR-amplify the genomic segment containing the knockout were not working.

I eventually switched to using a different thermostable polymerase (NEB's standard OneTaq) rather than the fancier Q5 polymerase I had been using.  Eureka - the PCRs all worked perfectly, giving strong bands of approximately the expected sizes.

...then I let everything sit around for a month while I dealt with other things...

Now I'm finally following up.  The first step is to digest these PCR products with a few other enzymes that should cut in either the genomic segments or the inserted SpecR cassette.  I've made rough predictions of the expected fragment sizes, which are all different for the ∆A mutant, wildtype cells, and the two mutants made by the Honours student (∆T and ∆TA).

The next step will be to do more PCR amplifications.  My original amplifications used the F and R primers that amplify a 2.6 kb segment containing the toxin and antitoxin genes (~300 bp each).  Now I'll use the F primer with the S-R reverse primer for the SpecR cassette, and the R primer with the S-F forward primer for the cassette.

If these both give the expected fragments then I'll (probably) send the PCR amplicons for each mutant to be sequenced.

If the sequencing confirms that the knocked-out genes are gone but the remaining gene is intact, then I'll give a sigh of relief.

Determining the competence phenotype of the Actinobacillus pleuropneumoniae antitoxin gene knockout:  My first test of the transformability of my first two ∆antitoxin mutants showed transformation defects, but in later tests they transformed within the range of the wildtype control.  But there was a lot of experiment-to-experiment variation in transformation levels (see graph below), so I'd like to do it one more time, to get clean publishable data.

Bonus DNA uptake results:  Just before Christmas the grad student finished his DNA preps of H. influenzae chromosomal DNA fragments that had been recovered after being taken up into the periplasm of competent H. influenzae.  He sent these to the former post-doc for sequencing, and the post-doc has now sent us some lovely preliminary results.  

The grad student had used DNA preps that had been sheared to two different size ranges.  We expected the genome coverage of the long fragments (mean length ~6 kb) to be fairly uniform, since almost all of them should contain at least one instance of the preferred uptake sequence motif.  These 'USS' motifs are distributed fairly evenly around the chromosome, with a mean spacing of about 1 kb.  We do see this, but with enough anomalies to keep things interesting.  And we expected coverage by the short fragments (mean length ~0.25 kb) to be much more strongly dependent on chromosomal position, since many such fragments would not include a USS.  And we do see this, again with interesting anomalies.
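
To put rough numbers on these expectations, here's a back-of-the-envelope sketch. It assumes USSs are scattered at random (a Poisson model) with the ~1 kb mean spacing, treats each USS as a single point, and uses one fragment length per prep rather than the real size distributions. The function name is made up for illustration. Because the real USSs are spaced fairly evenly rather than randomly, these numbers are only a rough guide.

```python
import math

def prob_fragment_has_uss(fragment_length_bp, mean_spacing_bp=1000.0):
    """Chance that a fragment contains at least one USS under a Poisson model.

    Assumptions: USSs placed at random with the given mean spacing, each
    treated as a point; the fragment has a single fixed length.
    """
    expected_uss = fragment_length_bp / mean_spacing_bp
    return 1.0 - math.exp(-expected_uss)

p_long = prob_fragment_has_uss(6000)   # long-fragment prep, ~6 kb
p_short = prob_fragment_has_uss(250)   # short-fragment prep, ~0.25 kb
```

Under these assumptions nearly all 6 kb fragments (about 0.998) would carry a USS, but only about 0.22 of 250 bp fragments would, which fits the expectation of near-uniform coverage for the long fragments and strongly position-dependent coverage for the short ones.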

DAMN! Complete PCR failure!

Yesterday I ran a PCR amplification using DNAs from single colonies of 7 different A. pleuropneumoniae isolates, and got absolutely no DNA fragments from any of them.

This amplification worked fine last time.  Can I figure out what went wrong?

  • I checked the run record of the PCR machine - it looks fine.
  • I checked the freezer box with the tubes of dNTP stock, 5X buffer, and Q5 polymerase, to be sure I hadn't picked up a wrong tube.
  • I checked my notes, to be sure I hadn't left out any component of the reaction mix.  I'd checked off each reagent as I added it, and the final volume was as expected.
  • I checked the 'F' and 'R' primer tubes (in another freezer box) to make sure I'd used the correct ones.  I'd made up more of the 10 µM dilution stock, so I also checked that I'd used the right tubes of the more-concentrated 100 µM stock to do this.  I even checked the remaining volumes in the two primer tubes - if I'd added one primer twice and not the other these volumes should differ by about 17 µl, but they're within a few µl.
  • I prepped the colony DNAs slightly differently.  Last time (prep 1) I put a whole colony into 100 µl of medium, then diluted 5 µl of that into 45 µl water and heated to 98 °C for 10 min to lyse the cells and free their DNA.  This time (prep 2) I put part of a colony into 100 µl water, heated that, and then pelleted out any cell debris.  Both times I used 1 µl of the heated sample.
What could I try now?
  • Use leftover Prep 1 colony DNA as template
  • Vortex the Prep 2 colony DNA tubes
  • Use as template purified DNA from lab stocks
  • Use a different pair of primers (the Spec-cassette ones worked well last time)
  • Repeat with the same reagents and template I used this time
  • Make fresh colony DNA preps
  • Make proper DNA stocks to use as templates
  • Prep 2 14-1 colony DNA, Spec primers
  • Vortexed Prep 2 14-1 colony DNA, F & R primers
  • Prep 1 14-1 colony DNA, F & R primers
  • Prep 1 14-1 colony DNA, Spec primers
  • 1/100 dilution of lab-stock DNA, F & R primers


    When I last posted, nearly 3 weeks ago, my first attempt to generate the desired full-length knockout construct had given a mixture of fragments rather than just the desired full-length one.  But this mixture did include a relatively faint fragment of the desired size (3.6 kb).

    I did try to get a better PCR product, but increasing the annealing temperature made things worse, and I couldn't find a PCR app that would let me diagnose which incorrect-priming reactions were producing the unwanted fragments.  So I went ahead and transformed competent Actinobacillus pleuropneumoniae cells with the mixture, selecting for spectinomycin resistance.

    My logic was that only the desired fragment is likely to efficiently transform cells to SpecR, because other fragments were unlikely to have the correct homologous DNAs flanking the SpecR cassette.  If the 3.6 kb fragment was what I hoped it was, I should get thousands of transformants even though it was only about 10% of the total DNA in the mixture.  If it wasn't what I wanted, then it would probably transform very inefficiently if at all and I would get very few transformants.

    I got thousands of transformants in my first try.  Since the real goal of this project is to find out whether knocking out the antitoxin gene prevents transformation in A. pleuropneumoniae as it does in H. influenzae, I did a quick-and-dirty competence assay, using 7 pooled SpecR colonies and some kanamycin-resistant A. pleuropneumoniae chromosomal DNA.  This gave lots of KanR transformants, but luckily I didn't take this as a final result.

    Instead I went back and redid the transformation of A. pleuropneumoniae with the PCR mixture, this time using a lot less DNA.  I did this because the high DNA concentration used in the original transformation meant that many cells could have taken up multiple DNA fragments.  In H. influenzae such fragments are known to undergo ligation in the periplasm, allowing formation of chimeric recombinants that give very confusing results.  Using 100-fold less DNA still gave plenty of SpecR transformants, and I streaked 4 of these to get clean single colonies.  (Two of the picked colonies were large, and two were smaller, but all gave large colonies on their streak plates.)

    I tested 2 of these colonies by PCR.  Only one (14-1) gave the expected 3.6 kb full-length product and 1.1 kb Spec cassette products.  The other (14-2) gave no product with the full-length primers and what looked to be a slightly small product with the Spec primers.  The control wildtype cells gave the expected 2.6 kb full-length fragment and no Spec fragment.

    At the same time I tested both colonies for the ability to be transformed.  Both were defective, with transformation frequencies 100-fold lower than the wildtype cells.  This is the most interesting result - it suggests that the toxin-antitoxin system in A. pleuropneumoniae plays the same role in competence as its homologue does in H. influenzae.

    Next steps: More comprehensive characterization of all the A. pleuropneumoniae mutants.  First do the full-length PCR on colonies 14-1, -2, -3 and -4, on the ∆toxin and ∆toxin/antitoxin mutants made by the honours student, and on the wildtype control, this time running the gel more slowly to better characterize the fragment lengths.  Then do additional PCRs using other primers, to confirm the mutant structures, and repeat the competence assays on all these strains.

    I'll also need to get all the final mutants sequenced, to confirm that they have only the expected deletions. I'll email the former RA to ask her the best way to do this (do I send genomic DNA or PCR products, what primers are best...).


    Here are the results of the first attempt at getting full-length PCR products:

    In the left lane (high template) I see a faint full-length band, and a stronger band of the size expected for one of the intermediates.  In the right lane (low template) I see only the intermediate band.

    This is a fine result.  I'm now using 0.5 µl of the high-template reaction as the template in a new reaction with the same F and R primers and an annealing temperature optimized for them.  I hope this will give me lots of the full-length product.

    In anticipation of having the desired full-length DNA fragment, I've just streaked out the recipient Actinobacillus pleuropneumoniae cells I will transform this fragment into.  There are several steps I need to do before the final transformation:

    • Streak out the honours undergrad's A. pleuropneumoniae SpcR mutants (she made three different ones with the same cassette I'm using).
    • Check the sensitivity of A. pleuropneumoniae to spectinomycin, since this is the selection I will be using for transformation by my fragment.  The honours undergrad did this but her notes are not very good here.  I need to identify a concentration that will prevent colony formation by the sensitive cells but allow it by the resistant cells.
    • Make a competent stock of the recipient (SpcS) by growing the cells in MIV starvation medium.  
    • Check the competence of these cells by transforming them with genetically marked DNA.  I know I have some old DNA for this purpose (NalR?), but it would be good to select for SpcR using DNA from one of the undergrad's SpcR strains, if I can find this.
    Before doing the final transformation I should also digest my transforming fragment with a couple of diagnostic restriction enzymes, just to be sure it is what I want.