Field of Science

Might H. influenzae competence be phase-variable?

One of the postdocs just raised an issue I've never seriously considered. Many surface structures on bacterial cells undergo what's called "phase variation". That is, a key gene controlling the structure has evolved to have a high rate of mutations that switch it from an active allele to an inactive allele, and from the inactive allele to an active one.

By "high rate" here I mean more often than one switch per million cell divisions. Switching is thus still a very rare event, but its rate is much higher than the background mutation rate for normal DNA sequences. Such elevated rates are usually caused either by short sequence repeats that cause DNA polymerase to add or miss bases in critical positions, or by specific DNA-altering enzymes that recognize the gene.

That's the proximate cause of the variation. The ultimate cause (the evolutionary cause) is thought to be natural selection created by predators or host immune systems that recognize the surface structure and attack cells expressing it. Under such pressure, a cell that has turned the structure off will have an advantage, so cells with elevated mutation rates affecting the structure are favoured. Because the structure is strongly advantageous in the absence of external attack, selection favours cells that also have a high rate of reversion mutations that switch the structure back on. Such genes are often called "contingency loci".

Competence for DNA uptake requires expressing DNA uptake proteins on the cell surface, so it's a logical target for attack by the host immune system, and thus perhaps for phase variation. But how would we detect it? In Neisseria competence is known to be phase variable, but only because it depends on the phase-variable expression of type 4 pili, a phenotype that is easily assayed in the lab. Screening H. influenzae cells for phase variation of competence is likely to be very difficult, as our only assays are uptake of radioactive DNA and transformation to antibiotic resistance.

Rather than screening for variation, a more efficient approach is to examine the H. influenzae genome for sequences that could promote such variation, and check each for its ability to affect competence. These have been thoroughly investigated by Richard Moxon and his colleagues. They found no enzymatic switches but many short sequence repeats affecting production of complex carbohydrates on the cell surface. Now we need to carefully check whether any of these could also affect genes needed for DNA uptake.

Simplifying the cyclization experiments

One of the analyses I proposed in the DNA-uptake grant proposal is designed to find out whether uptake signal sequences are unusually flexible or bent, by testing whether presence of a USS helps short DNA sequences bend around into circles whose ends can be joined by DNA ligase.

Because competition for grants is very tight this time around, we want to be prepared to submit even better proposals in September if the ones we submitted two months ago aren't successful. This means we want to have lots more preliminary data showing that the experiments we propose will actually work. So one of the post-docs has been working to get preliminary results for this circularization experiment.

The DNA fragments should be about 200bp long; shorter ones usually can't circularize at all, and longer ones have enough flexibility that the ends readily bump into each other. Even for fragments in the right size range, exact length is critical because of the 'polarity' of the ends of the double-stranded DNA. Each 'end' is really the ends of two base-paired strands, one ending in a 3'-OH and the other in a 5'-P. When the ends do bump into each other, ligase can only join them if a 3'-OH is aligned with a 5'-P (the bond must be a 3'-5' connection, not a 3'-3' or 5'-5'). Because the two strands of the DNA wind around each other every 10.4 base pairs, the length of the DNA fragments must be approximately a multiple of this length so that the ends will meet in the right alignment. The strategy is to use PCR to synthesize fragments of the right length, and she designed two sets of primers, giving fragments of 208 and 260bp (my notes say 260 but I think I have it wrong as this seems too long, unless it's the positive control).
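The helical-phase requirement can be checked with a little arithmetic. Here is a minimal sketch, assuming the 10.4 bp repeat quoted above and taking fragment lengths at face value (it ignores any bases removed or left as overhangs by the restriction digest). The function name and the 203 bp comparison length are made up for illustration.

```python
HELICAL_REPEAT_BP = 10.4  # base pairs per helical turn, as quoted in the text


def helical_phase(length_bp, repeat=HELICAL_REPEAT_BP):
    """Return (turns, offset_bp) for a fragment of the given length.

    turns is the nearest whole number of helical turns; offset_bp is how far
    the length falls from that exact multiple (negative means too short).
    """
    turns = round(length_bp / repeat)
    offset_bp = length_bp - turns * repeat
    return turns, offset_bp


# 208 and 260 are the lengths from the text; 203 is a hypothetical
# out-of-phase length included only for contrast.
for length in (203, 208, 260):
    turns, offset = helical_phase(length)
    print(f"{length} bp: nearest {turns} turns, offset {offset:+.1f} bp")
```

By this arithmetic both 208 and 260 are whole multiples of 10.4, so the 260 in the notes is at least consistent with the phasing rule, whatever its role in the experiment.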

The ends created by the PCR process are not easy to ligate, because they have inconvenient incompatible tails, so her primers include sites for digestion by restriction enzymes. Cutting both ends of the PCR product with the same restriction enzyme will generate compatible 'sticky ends' that can base pair with each other. The base pairing will hold together any ends that do bump into each other in the right orientation until ligase can seal them together permanently.

Sounds good so far. But there's one more factor. The circularization reactions must be done at low DNA concentration to decrease the frequency of ends of different molecules bumping into each other. This intermolecular reaction has 'bimolecular' kinetics, meaning that its rate depends on the DNA concentration. In contrast, circularization has 'unimolecular' kinetics, and its rate is independent of the presence of other molecules and thus of DNA concentration. Doing the reaction at low concentration is easy (need less DNA), but detecting the results of the ligation is hard, because small amounts of DNA (linear or circular) are difficult to see in the gels used to separate the different conformations that result from the ligation.
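To see why dilution favours circles, here is a rough sketch assuming simple mass-action kinetics: each molecule circularizes at a fixed rate k1, and joins another molecule at a rate k2 times the DNA concentration. The ratio j = k1/k2 has units of concentration (it is often called the j-factor). The value of j used below is purely hypothetical, not a measurement.

```python
def fraction_circular(conc_nM, j_factor_nM):
    """Fraction of initial end-joining events that are intramolecular.

    Circularization is unimolecular (rate per molecule = k1), while joining
    two different molecules is bimolecular (rate per molecule = k2 * conc).
    With j = k1 / k2, the circular share of joining events is j / (j + conc).
    """
    if conc_nM < 0 or j_factor_nM <= 0:
        raise ValueError("concentration must be >= 0 and j-factor > 0")
    return j_factor_nM / (j_factor_nM + conc_nM)


J_NM = 10.0  # hypothetical j-factor, for illustration only

for conc in (0.1, 1.0, 10.0, 100.0):
    share = fraction_circular(conc, J_NM)
    print(f"{conc:6.1f} nM DNA: {share:.2f} of joining events are circles")
```

The absolute numbers mean nothing without a real j, but the trend is the point: the lower the DNA concentration, the larger the share of ligation events that make circles.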

The solution is to label the DNA fragments with 32-P, making even very small quantities easy to detect. The post-doc followed a published method for labeling DNA for these experiments, which should put a 32-P at each 5' end of each fragment. The fragments are first cut with the restriction enzyme, purified using little spin-columns, and then incubated first with a phosphatase, to remove the non-radioactive P from each 5' end, and then with the 32-P nucleotide and a kinase, to put the hot phosphate on. Then the fragments are purified again.

Initially I was concerned by the low recovery; only about 10% of the input DNA was recovered after all the intervening reactions and clean-up steps. After a bit more thought I became more concerned by the labeling reactions, mostly because I've always found phosphatases to be nasty treacherous enzymes that don't know when to stop. If the phosphatase removes more than the single terminal phosphate, the fragment will not be circularizable even if the kinase then does its job correctly. Furthermore, any fragment that the kinase misses will also not be circularizable, even if the phosphatase has behaved itself. Even if one end is processed correctly, any problem with the other end will still prevent circularization. In principle these problems can be controlled for, but any experiment-to-experiment variation will invalidate the conclusions we're hoping to achieve.

Luckily, once I started worrying about these issues I realized that we can eliminate both the recovery problem and the phosphatase/kinase problems by labeling the DNA internally rather than at its ends. So the new plan is to add a labeling step after the PCR reaction. This will be essentially one extra PCR cycle, this time with one radioactive precursor nucleotide added to the mix. The resulting DNA fragments will then only need to be digested with the restriction enzyme and cleaned up once.

When the post-doc gets back from her visit home, we'll still need to solve the problem of why the gels run so oddly, but at least we'll have enough labeled DNA to do lots of tests. The gel problem may be related to the high concentration of ligase needed in these reactions. The standard ligase stock is purchased in 50% glycerol at the low concentrations needed for cloning reactions, and the circularization reactions use so much ligase that they are about 25% glycerol. I'm hoping we'll be able to buy a high-concentration ligase stock, rather than having to make our own ligase....

no transductants??

I thought I had a lot of KanR AmpR transductants from infection of the ppdD::lacZ strain (AmpR) with the P1 lysate made on the sxy::kan strain, because lots of colonies grew overnight on the Kan+Amp plates from this infection but not from mock-infected cells. But I picked 4 colonies and streaked them on various plates (Kan, Amp, Kan+Amp, MacConkey-lac) and they mostly didn't grow. Now I suspect that some of the plates I used may have been too old (Amp is unstable) or otherwise problematic.

My test transduction of lacY from W3110 into C600 doesn't seem to have worked either (though I did find the TTC and get cute little red colonies). And the subsequent transduction of crp::kan into the ppdD::lacZ strain doesn't seem to have worked either.

After I carefully recheck the controls I'll redo it all with fresh plates.

Doing it right (P1)

I've been futzing around with phage P1, trying to get a high-titer lysate from a single clear (P1vir) plaque, with no success. Part of the problem is the high frequency of what I guess are revertant phage, giving larger and turbid plaques, and part is that I've been trying to squeeze the work in between other stuff (peer-reviewing manuscripts and a proposal, preparing my freshman biology final exam, etc.).

But today's clear, and I'm ready to roll. Last night I inoculated cultures from single colonies of the E. coli wildtype strain W3110 and three mutant strains: ppdD::lacZ, sxy::kan and crp::kan. I also poured lawns of W3110 with appropriate dilutions of two of my not-very-satisfactory lysates. When I count the plaques this morning I'll know the titer of these lysates, and can then use one or the other to infect the four E. coli strains. I'll do these infections in broth and maybe also on plates.
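Getting from a plaque count to a titer is just division by the amount of original lysate that went onto the plate. A minimal sketch follows; the function name and the example plate (150 plaques from 0.1 ml of a 10^-7 dilution) are hypothetical, not counts from these experiments.

```python
def titer_pfu_per_ml(plaques, dilution, volume_plated_ml):
    """Titer of the undiluted lysate, from one countable plate.

    dilution is the fraction of original lysate in the plated sample,
    e.g. 1e-7 for a ten-million-fold dilution.
    """
    if plaques < 0 or dilution <= 0 or volume_plated_ml <= 0:
        raise ValueError("need plaques >= 0, dilution > 0 and volume > 0")
    return plaques / (dilution * volume_plated_ml)


# Hypothetical plate, for illustration only.
example = titer_pfu_per_ml(plaques=150, dilution=1e-7, volume_plated_ml=0.1)
print(f"{example:.1e} pfu/ml")
```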

Then I'll collect the lysates and titer them all overnight on W3110 lawns. I may also do a test transduction overnight, using the W3110 lysate (before I know its titer) to transduce the lacY gene into the lacY mutant strain C600. This should give me all the information I'll need for tomorrow, when I want to infect the sxy and crp knockout mutant strains with the ppdD fusion lysate, and vice versa, and select for transductants that have both the ppdD fusion and the sxy or crp knockout. Then I can test whether the high 'baseline' expression of lacZ in the ppdD::lacZ fusion is due to high baseline activity of its CRP-S promoter, as such activity should drop dramatically if sxy or crp is knocked out.

I'll need to make some minimal-lactose plates to properly score the result of my test transduction, for which I'll need to make up and sterilize stock solutions of the amino acids threonine and leucine, the vitamin thiamine, and lactose, because C600 is thi- thr- leu- as well as lacY-. (I used this strain a lot in grad school, and its genotype is burned into my brain.) And ideally I should also make up some of the TTC we used to put into minimal plates that makes the colonies turn red and easy to see. Right now I can't remember what TTC stands for, but it will come back to me, and I think we have a big bottle of the stuff somewhere.

How to compare protein sequences?

In the last post I described an analysis that depends on comparing protein sequences. There are two different ways to do the comparison, and I need to decide which is more appropriate for our analysis.

Both methods rely on first aligning the amino acids in the proteins to be compared (here we'll only be comparing two proteins at a time), and then comparing the amino acids at each aligned position. The goal of the alignment is to align amino acids that are homologous - that is, those that are similar because of descent from the same position in the ancestral sequence both proteins evolved from. (If the proteins are not themselves homologous the analysis can't be done.)

In studies of evolutionary relationships, the usual method of comparison is to simply count how many of the positions have identical amino acids, giving a "% identity" score. In studies of protein function, scores based on the functional similarity of the aligned amino acids are often used. These rely on a matrix that gives similarity scores of all pairwise combinations of amino acids. These matrix scores are themselves derived from comparisons of large numbers of aligned amino acids, giving highest scores to amino acids which most often have evolved to perform the same role in a protein. For example, valine and leucine are commonly found in homologous positions, and matrices give this pair a high score. Wikipedia gives a good explanation of the use of matrices to compare protein sequences.
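The difference between the two kinds of score is easy to see on a toy example. This is only a sketch: the similarity values below are invented for illustration (a real analysis would use a published substitution matrix), the two aligned sequences are made up, and gaps are simply skipped.

```python
# Toy similarity scores, invented for illustration only.
TOY_SIMILAR_PAIRS = {
    frozenset("VL"): 2,  # valine / leucine
    frozenset("DE"): 2,  # aspartate / glutamate
    frozenset("KR"): 2,  # lysine / arginine
}
MATCH_SCORE = 4      # identical residues (toy value)
MISMATCH_SCORE = -1  # any other pair (toy value)


def aligned_pairs(a, b):
    """Pairs of residues at aligned positions, skipping gap characters."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]


def percent_identity(a, b):
    pairs = aligned_pairs(a, b)
    if not pairs:
        raise ValueError("no aligned positions to compare")
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def similarity_score(a, b):
    total = 0
    for x, y in aligned_pairs(a, b):
        if x == y:
            total += MATCH_SCORE
        else:
            total += TOY_SIMILAR_PAIRS.get(frozenset((x, y)), MISMATCH_SCORE)
    return total


# Two made-up aligned sequences.
seq1, seq2 = "MKVLDE", "MRLLDQ"
print(percent_identity(seq1, seq2), similarity_score(seq1, seq2))
```

Here the K/R and V/L positions count as plain mismatches for % identity but still earn partial credit in the matrix-based score, which is the distinction that matters for choosing between the two methods.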

I'll post later about the issues raised by our analysis problem.

Analyzing the effect of USS on the coding function of genes

While I've been doing other things a collaborator has been working hard on a comparative genomics project that will tell us how much impact uptake signal sequences (USS) have on gene function.

Reminder: USS are short sequence motifs (the longest are ~30bp) present in many copies in the genomes of naturally transformable bacteria, probably because the cells preferentially take up DNA fragments containing the motif. Most of the USS in the Haemophilus influenzae genome are in coding sequences, and we want to find out whether their presence forces genes to specify sub-optimal amino acids at positions encoded by USS.

This analysis is testing the effect of USS by comparing the amino acid sequences of proteins with and without USS. For each H. influenzae gene with one or more USSs, we first find homologous protein sequences from at least three genomes with no USS. We compare these three protein sequences with each other (that's three no-USS comparison scores), to get a measure of how strongly selection acts on the protein, especially on the segment that in H. influenzae is specified by a USS. Then we compare each of the three with the H. influenzae sequence (that's three +USS comparison scores).

Then we compare the mean no-USS score with the mean +USS score; if the scores are similar then we conclude that the USS doesn't significantly constrain the protein's function. There's a lot of random variation, so we do this for every USS-containing gene in the genome and then plot each pair of scores as a point on a scatter-plot. Points that fall on a diagonal line represent genes whose USSs don't constrain them, and points that fall below the line represent genes whose USSs may be causing problems.
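The bookkeeping for one gene can be sketched as follows. This is not our collaborator's actual pipeline, just an illustration of the comparison scheme described above; it assumes the sequences are already aligned without gaps, uses simple fractional identity as the score, and the four short sequences are made up.

```python
from itertools import combinations
from statistics import mean


def identity(a, b):
    """Fraction of aligned positions with identical residues (no gaps)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, equal length, non-empty")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def scatter_point(hi_seq, no_uss_seqs, score=identity):
    """Return (mean no-USS score, mean +USS score) for one gene.

    no_uss_seqs are homologs from genomes without USS; hi_seq is the
    H. influenzae protein. With three homologs there are three scores
    of each kind.
    """
    no_uss = [score(a, b) for a, b in combinations(no_uss_seqs, 2)]
    plus_uss = [score(hi_seq, s) for s in no_uss_seqs]
    return mean(no_uss), mean(plus_uss)


# Made-up aligned segments, for illustration only.
x, y = scatter_point("MKAL", ["MKVL", "MKVI", "MRVL"])
print(f"no-USS {x:.3f}, +USS {y:.3f}, below diagonal: {y < x}")
```

Any other pairwise score (for example a matrix-based one) could be passed in as the score argument without changing the rest.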

We're not interested in specific genes, but in the general picture - we want to know whether, on average, USSs cause problems or not. A preliminary analysis done years ago suggested they don't, but the answer from this new improved analysis will be interesting in any case.


Well, lysates aren't really that exciting, but it's been a while since I got to do anything with phage. I used both the turbid-plaque and the clear-plaque streaks to infect two cultures, one with the ppdD::lacZ reporter fusion and one with the sxy::kan knockout. They all grew up nicely and then promptly lysed. Titering showed that the clear-plaque phage gave about 2x10^10 pfu/ml, all the small clear plaques expected of P1vir. The other phage gave a mix of clear and turbid plaques, suggesting that the colleague who made the source lysate had neglected to start from a single plaque (naming no names).

Today I'm going to make a good stock lysate, by infecting the "wild type" E. coli strain W3110. (I put wild type in quotes because the strain does not carry the lambda phage present in the original "wild" K-12 isolate.) And I'm going to do the transduction that was the point of getting a P1 lysate, to create a strain carrying both the ppdD::lacZ reporter and the sxy::kan knockout. I have lysates of both strains so I'll try the transduction in both directions, each selecting for AmpR KanR (the reporter carries AmpR). And a control transduction transferring lacY from W3110 into C600.

P1 (vir?)

I have some phage P1, courtesy of a colleague. But (as usual) complications have arisen. The colleague is out of town, and his P1 collection consisted of five different lysate tubes, of unknown ages as none of the tubes had a date. So I tested them all for viable phage by streaking a drop onto a plate that had a lawn of host cells on it, and looked this morning for the plaques (spots of lysis) that phage would make.

Two of the lysates gave no plaques, so they were probably very old. Two others (#1 and #4) gave lots of conspicuous plaques, and #2 gave tiny plaques. The colleague tells me (by email) that #1 is his newest stock, but the big plaques are not what I expected.

The phage should be not normal (wildtype) P1 but a mutant called P1vir (vir = virulent). Wildtype P1 can form lysogens, becoming dormant in cells that are then immune to being killed by other P1. The large plaques I see look like plaques made by wildtype P1, because they have cloudy centers (the plaques are "turbid"), where surviving cells are growing; these are likely to be lysogens. The virulent mutant should not form lysogens, so its plaques should be clear. Our manual of bacterial genetics methods says that P1vir plaques are tiny, so I think lysate #2 is more likely to be genuine P1vir. Lysates #1 and #4 would then have been grown from P1vir that had mutated, reverting to wildtype. (Perhaps the person who made the lysate mistakenly picked a "nice big plaque" instead of the more common tiny plaques.)

Wildtype P1 is not very suitable for doing transductions because many of the survivors are likely to be lysogens, whereas we usually want to work with phage-free cells. I'll do test transductions with lysates #1 and #2; if #2 works well I won't bother doing experiments on #1 to sort out the virulence issue, but just make a good fresh lysate from a single plaque of #2. Part of this lysate will be given to the colleague who supplied the original lysates.

Before making proper lysates and testing transduction I need to grow up appropriate E. coli host strains. I did the preliminary checking on the strain carrying the ppdD::lacZ fusion. Unfortunately the computer that we keep our strain list on is in the shop, but one of the post-docs has a recent backup.

Results of beta-galactosidase assays

So I did some beta-galactosidase assays on the strain with ppdD fused to lacZ. The immediate goal was to characterize baseline transcription of the ppdD gene, with the longer goal of finding ways to turn it up by inducing expression of sxy.

The researcher who kindly sent us the strain with this fusion said that its colonies are pale blue on plates with the beta-gal indicator X-gal, but for me they were quite a strong blue. So I wasn't too surprised that my assays showed moderate beta-gal activity in all the conditions I tested. I tested cells in exponential growth and after overnight culture in LB, in LB+glucose (which should prevent production of cAMP and thus expression from the ppdD gene's CRP-S-regulated promoter), and in LB+glycerol, which should allow cAMP production like plain LB but provide the same amount of extra carbon source that glucose does. I also tested cells transferred from log-phase growth in LB+glucose to minimal salts with added amino acids ("M9+caa"), a treatment that might roughly approximate the competence-inducing effect of transferring H. influenzae cells from sBHI to MIV. None of these treatments made much difference; all the samples produced between 150 and 500 units of enzyme activity per ml of cells.

So the next test is to find out whether the transcription of ppdD depends on Sxy. If it does, then this suggests that Sxy is being produced at least a bit under the standard conditions I tested. If not, the expression is genuinely baseline, and maybe I should test other genes as indicators of sxy expression.

To do this test I need to introduce our sxy knockout into the ppdD::lacZ fusion strain. So tomorrow I'll go searching for the needed P1 lysate. The colleague who has it has unfortunately gone off to Europe for a couple of months, but he tells me that one of the people in his lab can help me find it.

Preparing for the E. coli sxy induction experiments

The strain and plasmids we've been waiting for arrived on Wednesday (thank you to the Pugsley lab), so now I can get started on my attempts to induce sxy expression in E. coli. And we do already have an E. coli sxy knockout from the Japanese group, so solving the recombineering problems isn't critical (though I should get to work on that anyway).

What to do first? The initial tests will use a reporter strain carrying a fusion of the ppdD gene to lacZ. The person who sent the strain says it should be pale blue on X-gal plates, indicating weak baseline expression of lacZ. I should first characterize its lacZ expression more precisely by growing it under defined conditions and measuring the amount of beta-galactosidase (the lacZ product) with the substrate ONPG. This is a simple classic assay, done in every introductory molecular biology lab. I'll grow cells in rich medium and minimal medium, doing the assay at various cell densities.
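For converting ONPG readings into activity, one common convention is the Miller unit, which normalizes the yellow colour (OD420, corrected for light scattering at OD550) to reaction time, culture volume and cell density (OD600). The sketch below assumes that convention; the post itself doesn't commit to a unit definition, and the readings in the example are hypothetical.

```python
def miller_units(od420, od550, od600, minutes, volume_ml):
    """Beta-galactosidase activity by the standard Miller formula:

        1000 * (OD420 - 1.75 * OD550) / (minutes * volume_ml * OD600)

    od420 and od550 are read from the reaction mix, od600 from the culture;
    volume_ml is the volume of culture used in the assay.
    """
    if minutes <= 0 or volume_ml <= 0 or od600 <= 0:
        raise ValueError("minutes, volume_ml and od600 must be positive")
    return 1000.0 * (od420 - 1.75 * od550) / (minutes * volume_ml * od600)


# Hypothetical readings, for illustration only.
units = miller_units(od420=0.6, od550=0.02, od600=0.5, minutes=10, volume_ml=0.1)
print(f"{units:.0f} Miller units")
```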

I need to find out whether this baseline expression is independent of the transcriptional regulators Sxy and CRP/cAMP, which we know activate the ppdD promoter. This requires transferring the ppdD::lacZ fusion into strains that carry knockouts of either sxy or crp (or cya, which encodes the adenylate cyclase that makes cAMP), or transferring the sxy and crp knockouts into the ppdD::lacZ strain. We have both these knockouts, but I don't yet have the P1 lysate I'll need to do the transfers. And I need to find out the antibiotic resistances associated with each of these knockouts, so I can plan the selections. And I need to get all the strains from the grad student who's been working with them.

I do have what I need to test whether baseline expression of the ppdD::lacZ fusion is affected by cAMP levels. The grad student tells me cAMP is normally high in the rich medium LB, presumably because LB lacks glucose, causing the phosphotransferase system to activate adenylate cyclase to make lots of cAMP. If this is correct, simply adding glucose to LB should reduce cAMP levels. So should growing cells in a minimal medium with glucose as the carbon source. If the baseline lacZ expression from the ppdD promoter is due to weak activation of the promoter by CRP + Sxy, these conditions should reduce it, and adding exogenous cAMP should restore it. If it's just constitutive activity of the promoter, these conditions should have no effect.

Negative control for E. coli sxy experiments

I'm still planning the experiments to identify conditions that induce sxy expression (and possibly competence) in E. coli.

One issue I hadn't considered previously is the appropriate negative controls. If I'm using the sxy-inducible ppdD::lacZ fusion to indicate sxy expression, the best control will be a sxy-knockout mutant carrying the same lacZ fusion. We might already have such a mutant (from the wonderful Japanese group that provides clones and knockouts of E. coli genes), but if not I'll need to make one. This would be best done using the genetic technique called recombineering, which one of the post-docs has been trying to get working for us. I'll start working with her on this.

How hot can it be?

I'm going to test biotin-tagging of the ends of my big DNA fragments by doing a preliminary tagging with a radioactive (33-P) nucleotide. I need this test because I don't have specific information about the short single-stranded overhangs I expect these fragments to have. I don't know whether most fragments have overhangs, whether 3' and 5' overhangs are equally common, or how long the average overhang is.

But I realized that unless the overhangs are very long, they will constitute only a tiny fraction of the DNA in such long molecules. So I don't expect much of my 33-P or biotin to be incorporated. I do know the specific activity of the 33-P I'll use (2500 Curie/mmol* **), and this lets me do a back-of-the-envelope calculation of how hot the DNA can get if the tagging reaction works perfectly.

[This is a long but not difficult calculation, relying on the kind of 'dimensional analysis' I learned in Grade 11 Physics class. It has Avogadro's number and Rosie's universal constant (10^18 bp/g) and arithmetic-simplifying assumptions that the average fragment is 75kb long and that a single 33-P nucleotide gets incorporated at each end. This last assumption would be right if 3' and 5' ends were equally common, blunt ends insignificant, and the average overhang about 8 bases.]

The result of this calculation: a perfect labeling reaction could incorporate enough 33-P to give only about 240 dpm per microgram of DNA. This is so low that the 33-P labeling experiment may not be worth doing at all. Using 32-P won't make a big difference, as the standard specific activity of 32-P nucleotides is 3000 Ci/mmol.
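The back-of-the-envelope calculation can be written out step by step. This sketch uses only the figures quoted in the post (10^18 bp/g, 75 kb fragments, one 33-P per end, 2500 Ci/mmol, 2.2 x 10^12 dpm per Curie) plus Avogadro's number; the function and variable names are just for illustration.

```python
AVOGADRO = 6.022e23       # molecules per mole
DPM_PER_CURIE = 2.2e12    # from the footnote
BP_PER_GRAM = 1e18        # the constant quoted in the text
FRAGMENT_BP = 75_000      # assumed average fragment length
LABELS_PER_FRAGMENT = 2   # one 33-P nucleotide at each end
CI_PER_MMOL = 2500.0      # specific activity of the 33-P nucleotide


def max_dpm_per_microgram(bp_per_gram=BP_PER_GRAM,
                          fragment_bp=FRAGMENT_BP,
                          labels_per_fragment=LABELS_PER_FRAGMENT,
                          ci_per_mmol=CI_PER_MMOL):
    """Upper bound on dpm per microgram of DNA if every end is labeled."""
    bp_per_microgram = bp_per_gram * 1e-6
    fragments = bp_per_microgram / fragment_bp
    labeled_nucleotides = fragments * labels_per_fragment
    mmol_label = labeled_nucleotides / (AVOGADRO * 1e-3)
    curies = mmol_label * ci_per_mmol
    return curies * DPM_PER_CURIE


print(f"{max_dpm_per_microgram():.0f} dpm per microgram")
```

With the quoted constants this lands at roughly the 240 dpm figure above. The answer scales linearly with the bp/g constant, so that number is worth double-checking: at about 650 g/mol per base pair, Avogadro's number gives nearer 10^21 bp per gram, which would raise the estimate about a thousandfold.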

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

*A Curie is 2.2 x 10^12 dpm (radioactive disintegrations per minute).

** This is close to the theoretical maximum specific activity, with a 33-P as the alpha phosphate of every nucleotide.