Field of Science

Yuck!

  

I found this in a small box of antibiotics in the cold room. It's mycophenolic acid that had somehow eaten through the foil ring (blue) that held the bottle's rubber seal in place.  

I had no idea why we would have this chemical, but Wikipedia says it's an inhibitor of de novo purine biosynthesis in eukaryote cells (but maybe not in E. coli?), so maybe we had been planning to try it on a protist or on Haemophilus.  I threw it out of course.

Strategy for making the HI0660/HI0659 double mutant

Here's my plan for making the mutant strain knocked out for both HI0660 and HI0659:

I'll start with the two single-knockout plasmids that the RA made by recombineering.  Both were made from the same parent plasmid, which carries a chromosomal segment (green) containing both genes and about 500 bp of flanking DNA on each side.  In the left plasmid the HI0660 gene has been replaced by a SpcR cassette (orange).  In the right plasmid the HI0659 gene has been replaced by the same cassette.


I'll cut both plasmids with the same two restriction enzymes.  SpeI cuts in the vector, close to the left end of the insert, and SacII cuts in the SpcR cassette, close to its right end.  Then I'll inactivate the enzymes (with heat or phenol extraction), mix the two digested DNAs, ligate the mixture and transform it into E. coli, selecting for AmpR and maybe SpcR.

The single fragments won't be able to self-ligate because the enzymes give different sticky ends, but 10 different bi-molecular ligation products are possible.  The plasmid I want (A+D) is shown below.  Three of the others won't be able to replicate (A+A, A+C, C+C), and three others will contain inverted duplications of the vector (B+B, B+D, D+D) and thus probably be unstable; they'll also be much larger than A+D and not SpcR. The other unwanted combinations will also be larger than A+D (A+B, C+D and B+C) so I should be able to easily distinguish them.
Once I identify the plasmid I want I'll just transform our wildtype H. influenzae strain with its insert DNA and select for SpcR.  Then I can find out whether deleting HI0660 eliminates the need for HI0659.
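
The count of 10 products can be checked by listing every unordered pairing of the four fragments.  This is only a bookkeeping sketch: the fragment labels and the outcome for each pairing are copied from the description above, not derived from any sequence data, and the function name classify is made up for the illustration.

```python
from itertools import combinations_with_replacement

# Fragment labels follow the post: A, B, C and D are the four pieces
# produced by cutting the two plasmids with SpeI and SacII.
FRAGMENTS = ["A", "B", "C", "D"]

# Outcomes exactly as stated in the post (not computed from sequences).
WANTED = {("A", "D")}
CANNOT_REPLICATE = {("A", "A"), ("A", "C"), ("C", "C")}
INVERTED_VECTOR_DUPLICATION = {("B", "B"), ("B", "D"), ("D", "D")}


def classify(pair):
    """Return the post's predicted outcome for one pairing of fragments."""
    if pair in WANTED:
        return "wanted"
    if pair in CANNOT_REPLICATE:
        return "cannot replicate"
    if pair in INVERTED_VECTOR_DUPLICATION:
        return "inverted vector duplication, probably unstable"
    return "larger than A+D"


# Unordered pairs, with a fragment allowed to pair with a copy of itself.
products = list(combinations_with_replacement(FRAGMENTS, 2))

for pair in products:
    print("+".join(pair), "->", classify(pair))
print(len(products), "possible products")
```

The three pairings that fall through to the last category are A+B, B+C and C+D, matching the list in the text.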

HI0659 progress and plans

I'm making progress in figuring out how the knockout of gene HI0659 prevents cells from becoming competent.  I don't know the answer yet, but I've ruled out some alternatives.  We know that the mutation blocks both DNA uptake and transformation, so the defect is not at the translocation or recombination steps - the mutant cells must either fail to induce their competence genes correctly or be blocked at some point in DNA uptake (assembly of the uptake machinery or uptake itself).

The first experiments were to see if treatments or mutations that normally induce competence would override the competence defect of the knockout strain (strain RR3112, HI0659::spc, but for convenience here I'll just call it HI0659-).  Competence induction requires that the CRP protein bind to its cofactor cyclic AMP (cAMP), which is normally synthesized under competence-inducing conditions, and then induce transcription of competence genes.  Adding cAMP restores competence to cells unable to synthesize it, so I tested whether the HI0659- competence defect was corrected by adding cAMP.  It's not.

I could also test whether the HI0659 mutation interferes with the ability of CRP to function, by assaying the strain's ability to ferment CRP-regulated sugars.  But I don't need to do that because one of the other experiments I've done (described below) shows that CRP regulation works normally in HI0659 mutants.

I also tested whether competence is restored to the HI0659- strain by mutations that cause expression of competence genes under conditions that normally repress this (hypercompetence mutations).  We have two sets of these mutations, in the sxy gene and in the murE gene.  I made double mutants by transforming these strains with DNA of strain RR3112, selecting for its SpcR cassette, and tested their competence.  They were not competent at all, even after normal MIV induction, so the defect isn't that the competence genes just require stronger-than-normal induction.

The next test asked whether HI0659- cells fail to induce competence genes.  This is a bit odd to think about since HI0659 is itself a competence gene whose transcription is induced by Sxy and CRP+cAMP, but maybe once some HI0659 gene product is made it increases or stabilizes transcription or translation of the other genes, or protects transcripts from degradation.  As I explained here, we have 'reporter' strains that let us detect transcription of the comA and rec2 competence genes because these genes' promoters have been fused to a lacZ gene, whose beta-galactosidase product is easy to detect with a colorimetric assay.

I introduced the HI0659- mutation into four fusion strains, two carrying a comA::lacZ fusion and two carrying a rec2::lacZ fusion, and assayed their production of beta-galactosidase and their competence.  Here are the results:


All the HI0659 mutants have the same beta-galactosidase levels as their HI0659+ parents (yellow bars and tubes).  Only the comA parent was included in this assay (the leftmost column), but you can see the induced and uninduced activities of both parent strains in the previous post.  Importantly, all the HI0659- strains were completely non-transformable (blue bars), confirming that they had replaced their HI0659+ allele with the HI0659- allele.

This result tells us that the HI0659 mutation does not act by interfering with normal transcription or mRNA stability of comA or rec2.  It's possible that it specifically affects the expression of another of the competence genes, but this is unlikely.

We've been preparing to do 'RNA-Seq' analysis of the HI0659 mutant - this analysis uses Illumina or other 'next-gen' sequencing of reverse-transcribed mRNAs to measure the amounts of transcripts present in the cell.  We have been hoping that it would reveal changes in transcription caused by the mutation, but the lacZ fusion results make that unlikely.

The RNA-Seq analysis is expensive and quite a lot of work - should we still do it?  The controls we'd need to do would give us lots of solid information about the regulation of competence, complementing the microarray analysis we did ten years ago.

I have one more analysis to do, suggested by the bioinformatics analysis I did a couple of weeks ago, described here.  The bioinformatics suggested that the HI0660/HI0659 gene pair might be a toxin-antitoxin system (or derived from one), with HI0660 being the 'toxin' and HI0659 the 'antitoxin'.  If so, then HI0659's job is likely to be preventing HI0660 from doing something that prevents competence.  This is consistent with the normal phenotypes of the HI0660::spc and HI0660 unmarked mutants.  They both take up DNA and transform normally, even though the HI0660::spc insertion might be expected to interfere with expression of the downstream HI0659 gene and thus reduce competence.  If HI0659's job is just to stop the HI0660 product from doing something that prevents competence, then the competence defect of the HI0659 mutant should be corrected by adding a HI0660 knockout.

This is simple in principle (just transform the HI0660 cells with HI0659::spc DNA), but complicated by how small the two genes are and how close they are to each other.  We have the E. coli plasmids carrying the mutations, and my plan is to instead construct a new plasmid that's deleted for both genes, with one spcR cassette inserted, and transform wildtype cells with this DNA to get the desired double mutant.  How easy this construction is will depend on whether the HI0660/0659 genes and the spcR cassette have convenient restriction sites, so I'm going to spend this afternoon looking for them.



That's more like it!

I'm using fusions of lacZ to the comA and rec2 competence genes to find out whether the HI0659 mutation acts by blocking competence induction.  The first step was to put the HI0659 mutation (strain RR3112, HI0659::spc) into strains carrying these fusions.  That was easy, because the fusion strains are still transformable (the fusions were introduced as duplications of the chromosomal comA and rec2 genes rather than as replacements), and because I can select for the HI0659 mutation using spectinomycin.

We have two versions of each fusion strain in the freezer - the original strains sent to us by their creator, and derivatives we'd made by transforming the fusions into our standard strain KW20.  I decided to start by using them both, in case anything wonky turned up.

So I made the strains competent and transformed them with RR3112 DNA and, as a control, with our standard MAP7 DNA.  Both transformations worked fine, with transformation frequencies between 10^-3 and 10^-2.  I streaked two colonies from each RR3112 transformation onto chloramphenicol plates to make sure they still had the fusion - only one didn't.  The next steps are to freeze these new strains (in case we want to do more with them), and to make them competent by incubation in MIV starvation medium.  I'll then test the competent cells for transformation (should be negative) and for expression of the lacZ fusions.
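
For anyone following along, a transformation frequency is just transformants per ml divided by total viable cells per ml.  Here is a minimal sketch of that arithmetic.  The colony counts, dilutions and plating volume are invented for the example (they are not my plate counts), and the function names are made up.

```python
def cfu_per_ml(colonies, dilution, volume_plated_ml=0.1):
    """Convert a plate count to colony-forming units per ml of undiluted culture.

    dilution is the fraction of the original culture in the plated sample
    (for example 1e-2 for a hundredfold dilution).
    """
    return colonies / (dilution * volume_plated_ml)


def transformation_frequency(transformants_per_ml, total_cells_per_ml):
    """Transformants per viable cell."""
    return transformants_per_ml / total_cells_per_ml


# Invented example counts: selective (spectinomycin) plate and plain plate.
spc_resistant = cfu_per_ml(colonies=150, dilution=1e-2)
total = cfu_per_ml(colonies=100, dilution=1e-5)

freq = transformation_frequency(spc_resistant, total)
print(f"transformation frequency = {freq:.1e}")
```

With these made-up counts the frequency lands inside the 10^-3 to 10^-2 range mentioned above.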

But first I needed to check that competence induction did induce the fusions in the parent strains.  When I had made these competent (for the RR3112 transformations) I had frozen aliquots of log-phase and competent cells, so I thawed them out and did beta-galactosidase assays on them.  My first set of assays were a complete failure (no yellow colour even after 18 hours!), because I'd used 10% SDS rather than 0.1%, but the second set worked great, with bright yellow colour after 20 minutes.


Here's the graph:


I forgot to label the Y-axis - it's the OD420 reading, indicating the level of expression of the fusion.  (I didn't bother to convert these numbers into Miller units.)  Three of the strains have almost no fusion expression in log phase and high expression after competence induction, which is what we expect of strains with normal competence regulation.  But the fourth strain (878) has high expression in log phase, because it also carries a mutation (murE749) that causes the competence genes to be highly induced even in log phase, giving a 'hypercompetent' phenotype.  I'll include the HI0659 derivative of this strain in my assays as a control.
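
For reference, the conversion I skipped is the standard Miller formula: 1000 x (OD420 - 1.75 x OD550) / (minutes x ml of culture x OD600).  A minimal sketch follows.  The function name and all of the input values are placeholders for illustration, since the OD550, OD600 and culture-volume values for these assays aren't given in this post.

```python
def miller_units(od420, od550, od600, minutes, culture_volume_ml):
    """Standard Miller-unit conversion for a beta-galactosidase assay.

    od420: absorbance of the reaction (yellow o-nitrophenol plus scatter)
    od550: light-scattering correction for cell debris
    od600: density of the culture that was assayed
    minutes: reaction time
    culture_volume_ml: volume of culture added to the reaction
    """
    return 1000.0 * (od420 - 1.75 * od550) / (minutes * culture_volume_ml * od600)


# Placeholder readings, not values from the assays described above.
units = miller_units(od420=0.9, od550=0.0, od600=0.5,
                     minutes=20, culture_volume_ml=0.1)
print(round(units))
```

Because all the samples in one set were treated identically, raw OD420 readings are good enough for comparing strains within that set; the conversion matters mainly for comparing across experiments.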

Why does white gunk develop on the anode?

 

The gunk is soft, almost-gel-like.  In the photo it's sitting in lumps in the buffer in the bottom tank, but only because I gently scraped it off the anode wire with a spatula after I took the gel plates out of the apparatus.  The anode was clean before I ran the gel.

The gel buffer was TBE with 10 mM MgCl2 added; might this be Mg(OH)2?

Later:  The white gunk is alkaline and dissolves in acid but not alkali.  It's not from the gel.  It also appeared when I ran a test minigel using TAE buffer with 10 mM MgCl2 added.

Why my gels wouldn't set?


This is the expiry date on my bottle of TEMED (the catalyst for acrylamide polymerization).

500 ml?


I think our polypropylene measuring cylinders must shrink with age or autoclaving.  The black lines mark the height of the water when this '500 ml' cylinder is filled with 500 grams of water.  (Well yes, the temperature is only ~20°C, not 25°C).  The markings are off by about 50 ml!
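
The temperature caveat is easy to check with a bit of arithmetic.  This sketch uses rounded handbook densities for pure water (about 0.9982 g/ml at 20°C and 0.9970 g/ml at 25°C); the variable and function names are just for the example.

```python
# Rounded handbook densities of pure water, in g/ml.
WATER_DENSITY_G_PER_ML = {20: 0.9982, 25: 0.9970}


def water_volume_ml(mass_g, temp_c):
    """Volume occupied by mass_g of water at one of the tabulated temperatures."""
    return mass_g / WATER_DENSITY_G_PER_ML[temp_c]


v20 = water_volume_ml(500, 20)
v25 = water_volume_ml(500, 25)
print(f"500 g of water: {v20:.1f} ml at 20°C, {v25:.1f} ml at 25°C")
print(f"difference: {v25 - v20:.2f} ml")
```

The temperature difference accounts for well under 1 ml, so it can't explain markings that are off by about 50 ml.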

And the USS is bent at...

So yesterday I poured and ran my first polyacrylamide gel in many many years.  Actually I ran my second gel, because the first one didn't set at all.  I was quite proud of myself for remembering a lot of little tricks, like flushing the wells before loading them and flushing the bubbles out from under the gel.

And the results told me that the USS is bent at the T-tracts.



Above is the gel photo. I tested seven different versions of the USS sequence, each embedded in an otherwise-identical 200 bp fragment.  The white bands in the gel are the positions that the DNAs migrated to during the electrophoresis (6% acrylamide in 0.5 x Tris-Borate buffer, run at 60-70 volts for about 5 hours).  'S' indicates DNAs that I scored as running slower, and 'F' DNAs that ran faster.

Below is the key to the differences between the DNAs:

The first good result is that the randomized-sequence DNA did run faster than the consensus USS, as it had in the former postdoc's experiment five years ago.  That might not look obvious in the gel photo, where the random fragment (in the leftmost lane) is at about the same position as the USS fragment on its right.  But this end lane ran slower than the others because it stayed cooler - I've tried to indicate this by the shape of the blue tracking-dye bands I've drawn onto the gel.

Most of the other DNA fragments ran with the same mobility as the USS, but DNAs 6 and 7 ran faster, like the randomized-sequence DNA.  These are the only two DNAs whose T-tracts are changed: 6 has one T-G substitution in each T-tract, and 7 has these plus the same two outer-core changes as DNA 4.

You may wonder why I didn't run the DNAs for longer, to better resolve the migration differences.  But I expected the DNA to have run much further; the Molecular Cloning manual I was using as a guide said that xylene cyanol (the upper turquoise dye band, labeled 'xc') would migrate at the same rate as a 260 bp DNA fragment in a 5% polyacrylamide gel, and at the same rate as a 160 bp fragment in an 8% gel, so I expected my DNAs to coincide with the xc band.
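
Here is the rough reasoning behind that expectation, as a sketch.  It assumes the co-migrating fragment size changes linearly with gel percentage between the manual's two data points, which is only a convenient guess and not something the manual states.  The function name is made up.

```python
def comigrating_size_bp(gel_percent, low=(5.0, 260.0), high=(8.0, 160.0)):
    """Linearly interpolate the DNA size that runs with xylene cyanol.

    low and high are (gel percent, fragment size in bp) pairs taken from
    the manual; linear behaviour between them is an assumption.
    """
    (p0, s0), (p1, s1) = low, high
    return s0 + (gel_percent - p0) * (s1 - s0) / (p1 - p0)


estimate = comigrating_size_bp(6.0)
print(f"xylene cyanol in a 6% gel runs with roughly a {estimate:.0f} bp fragment")
```

That estimate is a little above the size of my 200 bp fragments, so they should have been close to the xc band rather than far behind it.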

The only explanation I can think of is that I put 10 mM MgCl2 into the gel buffer but forgot to also put it in the running buffer in the tanks.  The manual says that even minor differences in ionic strength 'can greatly distort the migration of DNA'.  So I should probably repeat the gel with the correct buffer.  Maybe I'll also try running it in the cold room.

What part of the USS is bent?

For years I've been referring to the gel photos and DNA-structure diagram as evidence that the H. influenzae consensus USS sequence is naturally bent.  The gel images on the left (gel run by a former postdoc) show that a 222 bp fragment containing the consensus USS migrates slower than the control fragment with a randomized-sequence version of the USS.  Slow migration is an established consequence of DNA bending.  The structure diagram on the right (generated by the MDDNA website, which appears to have gone tits-up) compares the same consensus and randomized sequences, and predicts a slight bend at each of the USS T-tracts.


The current postdoc's analysis of uptake bias indicates that the T-tracts play a relatively minor role in uptake specificity, which got me thinking about whether it really is the T-tracts that are bent.  We're in a position to test this, since he's synthesized a number of different variants of the USS for uptake assays, and I can easily repeat the gel analysis to see if any of the changes affect gel migration.

Below is a list of the different USS variants that he has synthesized.  (The DNAs don't contain 'n's; I've just put these in at positions where the bases don't matter for DNA uptake.)


From the top, we have:
  • the consensus USS
  • three different USS with changes in different parts of the outer core of the USS (A4G, T6G, and T11G)
  • a USS with two changes in the outer core (A4G + T11G)
  • a USS with a change at the most important position in the inner core (C8A)
  • a USS with two changes, one in each of its two T-tracts (T17G + T27G)
  • a USS with 4 changes, two in the outer core and one in each T-tract (A4G + T11G + T17G + T27G)
  • a control 'USS' with the same bases as the consensus USS but in randomized order.

These variants let us examine effects of changing the inner core, the outer core and the T-tracts, singly and in combination.  We expect to see slower migration of the consensus fragment than the randomized fragment.  Finding that all the variant fragments have the same mobility as the consensus fragment might not tell us much because the individual changes we're testing might not have dramatic effects (e.g. maybe we'd need to change the whole T-tract, not just one T).  But finding that one or more variants migrates faster than the consensus would be very informative.

The fragments we'll test aren't cloned; the postdoc generates them when needed using PCR and specific primers.  So he's going to make them for me, and clean them up to remove primers and primer-dimers.  Then I'll check their DNA concentrations (because concentration can affect gel migration) and run them all in a gel.

I'll try to dig up the gel details in the former postdoc's notebooks - I'm guessing 1% agarose, TAE buffer, and a fairly high voltage, but she might have done something different.  The gel image was included in our 2007 CIHR grant proposal so I'm guessing she did the experiment in January or February of 2007.  OK, I found the postdoc's notes (December 2006).  The gel was 6% acrylamide, run at 70V, with 10 mM MgCl2 added to the buffer.  Hmm, I wonder if we have acrylamide made up.

She then resequenced both the DNAs to confirm that they were the same lengths (these were cloned fragments).  This seems a bit silly for the PCR fragments we'll test, though it might be worth it if the gel results are interesting.  I can't think of any other way to check that the fragments are the same lengths - I don't know of any gel conditions where bending wouldn't happen or where it wouldn't affect mobility.


Questions about the new 'journal' PeerJ

Yesterday people were tweeting about a new open-access journal called PeerJ, which at present offers a provocative web page and blog but no solid information.


Their website says:
The $99 Sustainable Model
PeerJ is establishing a new sustainability model. Researchers will be able to purchase Lifetime Memberships, starting at just $99, giving them the rights to publish their articles in our peer reviewed journal. All published articles are made freely available to the public. Subscription fees made sense in a pre-Internet world, but now they just slow the progress of science. It's time to change that.
But there's no explanation of what they will offer or how it will be accomplished.  Apparently this will be revealed in a few weeks.  In the meantime viewers are encouraged to build the buzz by pointing more people to the site.

There's a good discussion here about the costs of running a site, comparing it to the ArXiv and to PLoS ONE.  Maybe this could be feasible.

But for me the big issue is the peer review.  Peer review has critical parts that can't be automated or delegated to authors, and if this fails then the journal becomes just another repository for anything researchers want to post.  Somebody has to select the reviewers - authors can be encouraged to suggest them, but peer review that depended entirely on the reviewers suggested by the authors would be close to worthless.  And getting reviewers to agree to take on a manuscript, and to do the reviewing they promised, takes more than automated emails.  And someone with appropriate expertise has to read the reviews and make a decision about whether the manuscript should be published.  Is this all going to be done by volunteers?  Or is this going to be a journal with very low standards?

The soon-to-be-former Publisher of PLoS ONE, Peter Binfield, is apparently behind PeerJ.  He certainly should know what he's doing, but this combination of spin and secrecy is something I expect of companies looking to make a buck out of the unwary.

HI0659/HI0660 bioinformatics

I haven't done my experiment with the HI0659 knockout yet, because I'm waiting for the RA to pop in and get me the correct strain she made (it hadn't been frozen in the lab stocks).  In the interim I've been poking around in the databases to see what I can find.

HI0659 has a helix-turn-helix motif and not much else (it's only ~100 aa long).  HI0660 is about the same length; it has a motif typical of Holliday junction resolvases.

In our paper about the knockout mutants, our summary table says that no homologs of HI0659 have been found, and in the text we say that alleles of HI0660 in other H. influenzae strains often contain deletions and that homologs are missing from most Pasteurellaceae.  I've now found good homologs of both HI0659 and HI0660 in Actinobacillus pleuropneumoniae, but nothing in any other Pasteurellaceae.  Homologs in other bacterial groups are rare - the only genus whose name I recognize is Streptococcus, where several species have homologs of both genes, but many don't.

Here's the Genomes Region Comparison view from the JCVI Comprehensive Microbial Resource.  I included quite a few other Pasteurellacean and Streptococcal species in the search, but only these gave homologs.


A BLAST search with the HI0659 protein sequence turns up homologs in only the same species.  Some of these are annotated as members of an 'Xre-family toxin-antitoxin system' (I think HI0660 is homologous to the toxin component, and HI0659 to the antitoxin component).  HI0660 is also tagged as a member of the Gp49 superfamily (also phage proteins I think).  'Xre family repressors are known to perform a variety of regulatory functions unrelated to toxin-antitoxin systems' (ref).  The same paper suggests that the Tad toxin components might be mRNA-cleaving ribonucleases.  Maybe that's what HI0660 does, and HI0659 is a repressor that prevents it from acting.  If so, and if sxy mRNA was HI0660's target, then the mutant phenotypes would make sense.

Maybe I should transform the HI0659::spc mutation into our lacZ reporter strains (sxy::lacZ, comA::lacZ and rec2::lacZ) and see if it affects their expression.  I think I might have pooh-pooh'd this idea when the postdoc or RA suggested it a few months ago, but now maybe I see the light.  We're going to do RNA-seq on the HI0659 mutant, but if its only job is to repress HI0660 we might miss this.  But, if HI0660 is an mRNA-destroying RNase, we might see the loss of other mRNAs.

Back to the bench

I think I'm about done with manuscript writing for a while.
  • The RA's manuscript on natural competence in E. coli has been published in PLoS One.
  •  The visiting grad student's paper on natural competence in Gallibacterium anatis is in press at Applied and Environmental Microbiology.
  • The RA submitted her manuscript on all her H. influenzae competence knockout mutants last week; it's now under review at J. Bacteriology.  
  • The post-doc's DNA uptake manuscript will be ready to submit tomorrow or Monday to Nucleic Acids Research.  
  • On Thursday I submitted what I hope are the final revisions for my PLoS Biology opinion piece on genetics teaching.  
  • And, we're still waiting for the word from Science about our revised #arseniclife manuscript (it was sent back to the reviewers).
So it's time to do some experiments.  In fact I hope that the whole summer will be time to do some experiments.  I think I'll start with something easy, checking the knockout mutant of HI0659 for effects of known competence inducers.

HI0659 is a competence-induced gene that is needed for both DNA uptake and transformation.  Lots of genes are needed for these processes, but what's surprising about HI0659 is that the protein it specifies doesn't appear to be part of the DNA uptake machinery.  Not only does it lack any features of known DNA uptake proteins, it's small, cytoplasmic, and has a helix-turn-helix domain (these usually function as DNA binding elements).

I'm going to check whether any of the known competence inducers can turn on competence when HI0659 is knocked out.  Two of the inducers I'll test are genetic - the sxy-1 mutation and the murE749 mutation, both of which cause cells to be competent when they normally wouldn't be (hypercompetent).  The third is the small molecule cyclic AMP.

I'll need to make double mutants (sxy-1 HI0659::spc and murE749 HI0659::spc).  This should be straightforward - I'll just transform the hypercompetent mutants with DNA from HI0659::spc and select for spectinomycin resistance.

First I need to streak the cells out - no time like the present.  Then on Monday I'll do a quick DNA prep of HI0659 and transform the hypercompetent mutants.  No need to make them competent - they'll be competent enough straight from the plate, or in log phase growth.

Monday:  The cells all grew up nicely but I'm not going to proceed until the RA has confirmed (by email, she's on leave) that the HI0659 strain in the freezer stocks is the correct new one, not the incorrect old one.  The label looks old, and there's no note in the strain list saying that the vials of the incorrect strain were discarded and replaced.

A simple bioinformatics experiment, done with Word

One of the results of the postdoc's lovely analysis of DNA uptake specificity is that one block of four positions of the 32 nt 'uptake sequence' is critically important for uptake.  This is 5'-GCGG-3' (and 5'-CCGC-3' in the other strand).

The H. influenzae genome contains 10,044 occurrences of this sequence, but a random sequence of the same length and base composition is expected to only contain 4107.  This suggests that the molecular drive arising from biased DNA uptake may have caused the excess ~6000 occurrences to accumulate in the genome.  We know that about 2000 of them have strong matches to the full uptake sequence motif, but what about the rest?  Might they also have more-or-less-weak matches, because they are under weaker drive?
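
One simple way to get an expected count for a random sequence is to multiply the number of possible start positions by the product of the base frequencies in the motif, and to do this for both the motif and its reverse complement.  The sketch below shows that calculation under the assumption that bases are independent of their neighbours.  It is not necessarily how the 4107 figure above was obtained, and the sequence length and base frequencies in the example are toy values, not the H. influenzae numbers.

```python
from math import prod


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def expected_motif_count(motif, sequence_length, base_freqs):
    """Expected occurrences of motif on one strand of a random sequence.

    Assumes each base is drawn independently with the given frequencies.
    """
    start_positions = sequence_length - len(motif) + 1
    return start_positions * prod(base_freqs[base] for base in motif)


# Toy inputs: a 1003 bp sequence with equal base frequencies.
toy_freqs = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
motif = "GCGG"
both_orientations = (
    expected_motif_count(motif, 1003, toy_freqs)
    + expected_motif_count(reverse_complement(motif), 1003, toy_freqs)
)
print(reverse_complement(motif), both_orientations)
```

Swapping in a real genome length and measured base frequencies gives the comparison number for the observed count.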

The postdoc could do a thorough test of this using R, but he's busy with the final polishing of his uptake-motif paper (to be submitted by Monday, we hope).  So I just did a quick test using Word.


I had used Word's Find/Replace function to count the GCGGs and CCGCs, so I did it again, this time highlighting the occurrences.  I copied the sequences around the first 30 GCGGs in the genome sequence, and around 30 CCGCs from the middle of the genome, aligned them by hand, and used WebLogo to look for any patterns in the flanking sequences.
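
The same steps could be scripted instead of done by hand in Word.  This is a minimal sketch, not what I actually ran: the function names, the flank length and the toy sequence are all made up for the example.  Note that it counts overlapping matches (GCGGCGG contains GCGG twice), which a word processor's Find may not do.

```python
import re


def find_motif(sequence, motif):
    """Return the start index of every occurrence, including overlapping ones."""
    # A lookahead matches without consuming, so overlapping hits are found.
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", sequence)]


def flanking_windows(sequence, motif, flank=5, limit=30):
    """Cut out up to `limit` equal-length windows centred on the motif.

    Occurrences too close to either end of the sequence are skipped so that
    every window has the same length and the windows are already aligned.
    """
    windows = []
    for start in find_motif(sequence, motif):
        left = start - flank
        right = start + len(motif) + flank
        if left < 0 or right > len(sequence):
            continue
        windows.append(sequence[left:right])
        if len(windows) == limit:
            break
    return windows


# Toy sequence standing in for the genome.
toy = "TTTTTGCGGAAAAACCGCTTTTT"
print(find_motif(toy, "GCGG"), flanking_windows(toy, "GCGG"))
print(find_motif(toy, "CCGC"), flanking_windows(toy, "CCGC"))
```

The windows that come out are all the same length, so they can be pasted straight into WebLogo.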

No patterns.

Interaction effects (yes, again...)

I'm mostly very confident that the ideas I wrote in my last post were correct, but the postdoc still has doubts, and each time I try to explain my position to him I come away with my own little niggling doubts.   So here's a different perspective.

This time I'll start from the biology of DNA uptake and the consequent accumulation of preferred sequences in the genome.


Our computer simulation model showed that the sequences that accumulate in the genome have the same properties as the bias of the uptake machinery.  The model could consider only the base biases of each position individually (there were no between-position interaction effects), but I think we can safely expect that any interaction effects in the uptake bias will also be seen in the sequences that accumulate in the genome.

When we, as researchers, examine the sequence patterns of the preferred-uptake sequences and the overrepresented-in-the-genome sequences, we first compare the single-position pattern and then compare the interaction-effect pattern.  Because the two sets of sequences are the same, we will find the same patterns.

        - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Now let's consider a real research situation, where we don't know anything about the underlying biology. We identify a set of sequences that are preferred by the uptake machinery, and another set that are overrepresented in the genome.  We analyze each set, and find that the single-position patterns are different.


What can we infer about the underlying biology?  We must conclude that the two sets of sequences  have different properties, and thus that the sequences preferred by the uptake machinery are not the same as the sequences overrepresented in the genome.  The differences must be due to post-uptake forces that alter the sequences accumulating in the genome.

We might go on to analyze the interaction effects in the uptake set of sequences and in the genomic set of sequences.  These analyses may give us insights into the uptake process and the post-uptake forces, but they won't change the fact that the two sequence sets are different.

Interaction effects in uptake bias and in the genome

Well, the postdoc and I continue to struggle with our revisions to his manuscript about the sequence bias of the Haemophilus influenzae DNA uptake machinery.  Quite a bit of the struggle is with each other, as we each try to clarify what we think.

One issue that's just come up is how interactions between bases at different positions of the preferred sequence motif will affect what sequences accumulate in the genome.

The top part of the figure below is a drawing of a double helix of DNA, with a specific sequence drawn on it, and below that are two 'sequence logos'.  The first one is the pattern derived from the uptake sequences in the genome, and below that is the pattern derived from the sequences that were preferentially taken up by the cells' uptake machinery.  The overall difference in height of the two logos isn't significant (they use sequences derived in very different ways), but the differences in the relative heights of the individual positions are.  For example, in the genomic logo all of the Gs on the left are about the same height, but in the uptake logo the first G is much smaller than the others.


One issue our paper needs to address is the reasons that these two logos are so different.

Both of these logos are derived by considering only how frequent each base (A, G, C or T) is at each position in the set of sequences being analyzed.  The analysis doesn't consider the actual sequences.  For example, the two sets of sequences in the figure below (made using WebLogo) give the same logo. But the two sets of sequences are different; in the left one we have only strings of six As or six Ts, whereas in the second the As and Ts are often interspersed or in strings of different lengths.
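
To make this concrete, here is a small sketch that builds two toy sets and shows that their per-position base frequencies (the only thing a logo uses) are identical.  The sets are stand-ins I made up, not the ones in the figure: Set 1 is half AAAAAA and half TTTTTT, and Set 2 is every possible six-letter mix of A and T.

```python
from collections import Counter
from itertools import product

# Toy stand-ins for the two sets in the figure.
set1 = ["AAAAAA"] * 32 + ["TTTTTT"] * 32                 # only pure runs
set2 = ["".join(p) for p in product("AT", repeat=6)]     # every A/T mixture


def position_frequencies(seqs):
    """Per-position base frequencies, the information a sequence logo is built from."""
    freqs = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        freqs.append({base: counts[base] / len(seqs) for base in "ACGT"})
    return freqs


print(position_frequencies(set1)[0])
print(position_frequencies(set1) == position_frequencies(set2))
```

Every position in both sets is half A and half T, so a logo cannot tell the sets apart even though no sequence in Set 1 mixes the two bases.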


The postdoc has done a detailed analysis of the actual sequences taken up by the cells (see figure in this post), to find out the importance to uptake of the interaction effects that the logo analysis doesn't consider.  We were both thinking that these interaction effects might be responsible for at least part of the difference between the uptake-bias logo and the genomic logo.

But one of the reviewers of the version we originally submitted said that we were wrong: "If the consensus in the genome reflects only the incoming DNA and the filtering at the outer membrane (as the authors state) then the two consensus should be similar with or without interaction effects because the genomic consensus is the simple result of the initial consensus."  I've thought about this today, and I now think the reviewer is correct.

Let's consider two simple situations for an imaginary uptake machinery whose preferred sequences gave the A&T logo above.  In Situation 1, the actual sequences were those in Set 1, and we would conclude that there were strong interaction effects between the positions because the machinery preferred a sequence where six Ts in one strand were basepaired with six As in the other strand.  In Situation 2, the actual sequences were those in Set 2, and we would conclude that the uptake machinery preferred a string of six A:T basepairs but didn't care which base was in which strand at any position.

Now let's imagine that species exist with each of these uptake biases, and that each uptake bias is causing its preferred sequences to accumulate in its species' genome (because these sequences come in as part of longer DNA fragments that often replace homologous sequences in the genome by recombination - this is our molecular drive model).  In Situation 1 the genome will accumulate strings of six As on one strand paired with six Ts on the other.  In Situation 2 the genome will accumulate strings of six A:T pairs in various orders.

Now we sequence the evolved genomes, collecting sets of the overrepresented sequences in each, and make logos of the sequences.  Both logos will look like the logo above.  To see how the interaction effects in the uptake bias affected the accumulated sequences in the genome, we'd have to do an interaction analysis of the genomic sequences.
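One common way to look for interactions between positions is to calculate the mutual information for each pair of positions.  This isn't necessarily the analysis the postdoc used; it's just a minimal Python sketch, with invented sequence sets standing in for the genomes of Situations 1 and 2.

```python
from collections import Counter
from itertools import combinations, product
from math import log2

def mutual_information(seqs, i, j):
    """Mutual information (in bits) between positions i and j of a set of sequences."""
    n = len(seqs)
    count_i = Counter(s[i] for s in seqs)
    count_j = Counter(s[j] for s in seqs)
    count_ij = Counter((s[i], s[j]) for s in seqs)
    return sum(
        (c / n) * log2((c / n) / ((count_i[a] / n) * (count_j[b] / n)))
        for (a, b), c in count_ij.items()
    )

# Hypothetical genomic sets (assumptions for illustration only).
situation1 = ["AAAAAA", "TTTTTT"]                            # six As or six Ts
situation2 = ["".join(p) for p in product("AT", repeat=6)]   # every order of A and T

# Both sets have 50% A and 50% T at every position, so their logos are identical,
# but only Situation 1 shows interactions between positions.
print(mutual_information(situation1, 0, 5))  # 1.0
print(max(mutual_information(situation2, i, j)
          for i, j in combinations(range(6), 2)))  # 0.0
```

In Situation 1 knowing the base at one position tells you the base at every other position (1 bit of information), whereas in Situation 2 it tells you nothing.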

Years ago we did an interaction analysis of the genome sequences; you can see the results in the last figure in this post from 2006.  It found only weak interactions, and only between adjacent or near-neighbour positions, very different from the interactions the postdoc has identified in the uptake bias.  More recently he applied his interaction analysis to the set of genomic uptake sequences, and he's now repeating it (that's easier than digging through his notes to find what it showed).


Have bacteria evolved gene-specific rates of point mutations?

A paper just out in Nature (Martincorena et al. Evidence of non-random mutation rates suggests an evolutionary risk management strategy) concludes that E. coli genes have different mutation rates.  Genes that serve important 'housekeeping' functions mutate less often than genes that are used less often or whose functions are less important for survival.

Although such a difference in mutation rates might indeed be beneficial, since most non-neutral mutations are harmful, the result seems very improbable because we don't know of any mechanism by which the processes that cause mutations could adjust their activities according to the function of particular DNA sequences.  The authors don't know of any such mechanism either but they postulate that one must exist.

This is very reminiscent of the 'directed mutation' controversy that arose about 15 years ago, in response to work by Jim Shapiro and John Cairns showing that selection for ability to use a sugar was much more effective if the sugar was present in the environment.  That phenomenon has been shown to not be due to changes in the mutation rate (considered per base pair), but to initially unsuspected cryptic growth on the sugar and changes in the number of copies of the gene under selection.

Mutation rates are tricky to measure directly because mutations are identified by examining the phenotypes or DNA sequences of bacterial cultures many generations after the mutations would have happened.   This means that there has been plenty of time for confounding forces to also act on the mutations - we find only the mutations present in surviving cells, not all the mutations that happened.  The most important confounding force is thought to be natural selection acting on any phenotypic changes the mutations cause, but lots of other factors are known or suspected.

On first reading, I think that the authors of this paper did a good job of controlling for these factors.  But, given what we know about the processes that cause and prevent mutation, their results are so improbable that  I suspect they have missed other factors we don't know about yet.  So I predict that, like the directed mutation controversy, the long-term outcome of this work will be identification of additional confounding factors in the analysis of mutation rates rather than of a clever risk management strategy in the bacteria.

Here's a quick outline of what the authors did:  They started by comparing the genome sequences of 34 E. coli isolates; I think these were sequences available in GenBank, not ones they determined themselves.  Even very closely related bacteria like these have a lot of variation in which genes are present, so the authors first identified a set of 3420 genes, each of which was present in at least 75% of these genomes.  They then carefully compared the DNA sequences of these genes to find all the differences, which must have arisen by mutations accumulating over the many millions of years since these genes shared a common ancestor.
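The gene-selection step can be sketched in a few lines of Python.  This is only an illustration of the 75% cutoff, not the authors' actual pipeline; the isolate names, gene names and the function name are all hypothetical.

```python
from collections import Counter

def genes_in_most_genomes(presence, min_fraction=0.75):
    """Return the genes found in at least min_fraction of the genomes.

    presence maps each genome name to the set of genes it contains.
    """
    n_genomes = len(presence)
    counts = Counter(gene for genes in presence.values() for gene in genes)
    return {gene for gene, c in counts.items() if c / n_genomes >= min_fraction}

# Made-up example with four isolates (the paper used 34).
genomes = {
    "isolate1": {"geneA", "geneB", "geneC"},
    "isolate2": {"geneA", "geneB"},
    "isolate3": {"geneA", "geneB", "geneC"},
    "isolate4": {"geneA", "geneD"},
}

shared = genes_in_most_genomes(genomes)
print(sorted(shared))  # ['geneA', 'geneB']
```

With 34 genomes the same cutoff means a gene has to be present in at least 26 of them.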

They then filtered out all the differences whose accumulation might have been confounded by natural selection.  First they eliminated from consideration all the differences that changed an amino acid encoded by the DNA.  Then they corrected for effects of E. coli's known codon biases, because mutations that don't change the specified amino acid may still change how efficiently that amino acid is incorporated into the specified protein.  They also corrected for suspected effects of RNA folding by trimming off the ends of the gene sequences (I'm not sure how effective this would be...).
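The first filtering step (setting aside differences that change an amino acid) can be sketched like this.  It's a simplified illustration, not the authors' method: it assumes two aligned, in-frame coding sequences with no gaps, uses the standard genetic code, and counts differing codons rather than individual base changes.  The sequences and function name are invented.

```python
from itertools import product

# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_differences(gene_a, gene_b):
    """Count codons that differ without (synonymous) or with (nonsynonymous)
    a change in the encoded amino acid."""
    synonymous = nonsynonymous = 0
    for k in range(0, len(gene_a) - 2, 3):
        codon_a, codon_b = gene_a[k:k + 3], gene_b[k:k + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous

# Hypothetical alleles: CTG -> CTA keeps leucine, GAA -> GAT changes Glu to Asp.
allele_1 = "ATGCTGGAAAAA"
allele_2 = "ATGCTAGATAAA"
print(classify_differences(allele_1, allele_2))  # (1, 1)
```

Only the synonymous differences would go on to the mutation-rate estimate; the nonsynonymous ones are the information the authors later used to estimate the strength of selection.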

This analysis produced estimated gene-specific mutation rates that differed by as much as ten-fold (look at the jagged line and two examples below).  The mutation rates of nearby genes were strongly correlated over distances of 10-20 kb, especially for genes that were assigned the same 'function' and the same direction of transcription; these are likely to be mostly genes in the same operon.



One factor I wanted more information about is the functional classification scheme used.  This was something I hadn't heard of - the Multifun classification for E. coli, developed by Monica Riley and M. H. Serres.  It looks good, certainly better for E. coli genes than the usual COG analysis (clusters of orthologous groups).

Another issue important for their conclusions is how they assigned functional importance to each gene. They estimated the strength of selection on each gene using the number of changes that did change the encoded amino acids (the info they had discarded in estimating mutation rates).  By this measure, genes in subsets with higher mutation rates tended to have weaker evidence of selection.  Genes in the low-mutation-rate subsets were also enriched for genes known to be essential for survival in lab culture in rich medium, and they were, on average, expressed as mRNA at higher levels.

The authors then examined how other confounding effects might alter the results, by examining the sequences for evidence that natural selection had acted on them, by checking the possible sizes of other confounding effects (transcription-coupled DNA repair, base composition, homologous recombination), and by using computer simulations to estimate the sizes of possible effects.  These analyses revealed only effects that would be much too small to explain the big differences in estimated mutation rates they found.

Bottom line:  This appears to be a very well done piece of work.  (The Supplementary Materials file is enormous and dense with relevant information and analyses.)  Nevertheless I'm very skeptical of their conclusion that cells have evolved a mechanism to mark important genes and protect them from mutation.  That's both because we don't know of any way cells could do this, and because I think natural selection on such 'evolvability' traits is likely to be many orders of magnitude weaker than as-yet-unidentified direct effects on mutation accumulation.



Conference social skills

(Hmm, new Blogger interface...)



I'm just back from EVO-WIBO, a small conference for evolutionary biologists in the Pacific Northwest (WIBO=Washington, Idaho, BC and Oregon).  The quality of the talks and the science was very high, but a few experiences got me thinking that I should write a post about how to handle social interactions at conferences.  So here goes.

On the conference bus:  Maybe you're sitting next to someone you don't know, and maybe they're too nerdy or shy or intimidated or self-centered to start a conversation.  Don't just sit there, ignoring each other.  Say 'Hi, my name is Sandra.  I work on axolotl toenail proteins in Joe Blow's lab.  What do you do?'  Or 'Hi, I'm Sam.  What did you think of that last talk?'

At the first-night mixer:  You and a friend (or a new acquaintance) are chatting with each other, when a complete stranger walks over and stands near you, looking like maybe they'd like to join the conversation.  Don't just ignore them!  Say 'Hi, we were just chatting about the snacks.  Do you think this could be real caviar?'  Or 'Oh, sorry, we're having a bit of a private conversation.  We'll go talk in the corner where it's quieter.'

At meals:  If your conference includes meals, try to sit at a table with people you don't already know.  If you're already seated and talking with someone when another person sits down, smile and say 'Hi, we're talking about the weird last slide in Susan Smith's talk.'  Then turn a bit so they feel included in the conversation.  If more people show up, start a round of introductions.  If you're planning a free-time side trip to the swimming hole or the farmer's market, ask your lunch companions if they'd like to come along.

If everyone has to find their own lunch and you're on your own, try to strike up a post-talk conversation with someone.  You can then say 'I'm going to look for some lunch, like to join me?'  If they're on their own too, they'll appreciate the invitation. If they already have lunch plans, maybe they'll invite you along. If you're the one who already has lunch plans, consider inviting someone who might otherwise be on their own.

At your poster:  Maybe you're explaining your poster to someone, thankful that it's attracted at least a bit of interest, when a second person walks up.  Don't ignore them until the first visitor walks away!  Make eye contact, smile, say 'Hi, I'm just explaining how we collected our data.  If you can wait a minute I'll be able to talk about our goals.'  Then continue your original conversation, but make it easy for the new person to join in or ask questions.

In the question period after your talk:  Try to choose questioners who aren't Mr. Big in the field, and who aren't your friends or labmates.  Make it easy for junior researchers to be heard.

You get the picture.  One of the big reasons we come to conferences is to talk with other researchers in our field.  Do what you can to help this along.  Many of the people at conferences are junior scientists, are there for the first time, and don't know anyone.  Make them feel welcome and included.  If you're one of these people, you should expect to be welcomed as a new colleague.  If someone instead treats you as an interloper, go talk to someone with better social skills.

Latest on our #arseniclife manuscript

Here's an update on the status of our #arseniclife manuscript.

We originally submitted our manuscript to Science at the end of January, and posted a copy of it on the arXiv server, asking for comments/critiques from readers.  We received a few of these, and on March 16 we received three detailed reviews from Science, and a provisional acceptance.  On April 13 we submitted the revised version, and we're waiting with fingers crossed for final acceptance.

I've just posted the revised manuscript on arXiv, replacing the original version.  Here's the link.  We tried to incorporate suggestions from blog comments too.

Below I've pasted the text of our 'Response to Reviews' letter.  We didn't do a detailed response to the reviews because the Editor had clearly indicated the changes she thought important.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -


                                                                                                                                    April 13, 2012

Dr. Caroline Ash,
Senior Editor, Science

Dear Dr. Ash,
Thank you for giving us the opportunity to improve our manuscript.  We are now submitting the revised version.
We have closely followed the suggestions in the pre-edited copy of the manuscript that you provided. We felt that the most important request of the reviewers was to directly measure the phosphate concentration in our basal AML60 medium. To this end, we conducted ICP-MS, obtaining a concentration of 0.5 µM, in close agreement with our prior estimates based on cell growth. This new measurement fully supports our conclusion that the growth of GFAJ-1 in the hands of Wolfe-Simon et al. was due to residual phosphate in their putative -P conditions. When combined with shortening other text as indicated in the pre-edits, this has reduced the manuscript’s length from 2193 words to 1577 words.
We have retained a few sentences discussing explanations for the discrepancies between our results and those of Wolfe-Simon et al.  Since these discrepancies are the point of our paper we feel that possible explanations for them should be considered even when they cannot be directly tested.
We would prefer to retain our original title, as we feel that the word ‘negligible’ puts undue emphasis on the trace of arsenate present in the DNA.  Two alternative titles we would be happy with are ‘Absence of detectable arsenate in DNA from arsenate-grown GFAJ-1 cells’ and ‘No covalently bound arsenate in DNA from arsenate-grown GFAJ-1 cells’.
Changes in response to points raised by the reviewers:
The ‘-P’ and ‘+P’ growth conditions we used are now clarified in both the Methods and the legend to Figure 1.
We now explain how cell numbers were determined.
We now explicitly say that we obtained strain GFAJ-1 from the authors of the Wolfe-Simon et al. paper.
The discrepancy in glutamate concentrations and the incorrect formulae have been corrected.
The ingredients of AML60 medium are now given in the Methods.
Reviewer 3 was concerned about our statement (in Methods) that cells were pre-grown in phosphate-limited medium containing 40 mM arsenate.  We now explain that the cells were thoroughly washed to remove the arsenate before being frozen, and that the purpose of this pregrowth was to deplete cellular reserves of phosphate and to replicate the standard growth conditions used by Wolfe-Simon et al.
All of the reviewers, but especially Reviewer 3, would have liked more information about GFAJ-1’s growth properties and metabolism.  Unfortunately, characterizing these in depth is beyond the scope of this work.  We do not know why GFAJ-1 cells need glutamate or another amino acid for growth in our AML60 medium independent of phosphate supplementation, why they reproducibly grew to a higher density in AML60 medium with 70 µM phosphate than with 250 or 1500 µM phosphate, nor why they did not grow in Wolfe-Simon’s low-phosphate AML60 medium unless arsenate was provided. 

Sincerely,

Rosemary Redfield

Final checks on a surprising competence gene (Whew!)

We now have almost all the data in place for our paper about the roles of all the genes in Haemophilus influenzae's competence regulon.  We (really the RA) created deletion mutations of all the 26 genes except ssb, which is essential; these deletions remove almost all of each gene's coding sequence.  One set of mutations contains spectinomycin cassettes inserted at the site of the deletion; these are very useful because they let us select for each mutation by the SpcR phenotype it causes.  The other set is 'unmarked', and these clean deletions are 'in-frame', preventing disruptions of translation that could interfere with expression of downstream genes in the same operon ('polarity').

For each unmarked mutant we've examined (1) its growth using the Bioscreen incubator/recorder, (2) its survival after transfer to the MIV starvation medium that induces competence, (3) its MIV-inducible ability to take up radiolabeled DNA and (4) its ability to be transformed by genetically marked chromosomal DNA.  For all but one of the genes these phenotypes are at least roughly consistent with what we expected from the phenotypes of known mutations in H. influenzae or other bacteria and from the predicted properties of the encoded proteins.

But one gene's phenotypes surprised us.  HI0659 is predicted to be a small cytoplasmic protein, and it has a predicted helix-turn-helix that would be expected to bind to DNA, probably at a specific sequence.  Its mRNA is induced about 20-fold on MIV treatment.  We expected it to either play no role in DNA uptake and transformation or to have normal uptake but reduced transformation.  But our unmarked mutant (∆HI0659) doesn't take up any detectable DNA and doesn't transform at all, which suggests that it is required either for assembly/function of the uptake machinery or for continued expression of the competence regulon after initial induction by Sxy and CRP.  That's of course very interesting, and we've thought of lots of cool experiments we could eventually do to find out how it acts.

But there's one wrinkle that needs to be cleared up before we publish this result.  The phenotype of the marked (SpcR) HI0659 mutant (∆HI0659::spc) is not the same as that of the unmarked mutant - its transformation frequency is much higher, though still substantially lower than that of wildtype cells. (I don't know if its DNA uptake has been tested.)  This is unexpected and suggests that there's a problem with the structure of either the marked or the unmarked mutation.

The structure of the unmarked mutation has already been carefully checked by PCR and it appears exactly as it should, so we suspect a problem with the marked mutation.  The RA has now created new versions of the marked mutation, and yesterday I made four of these MIV-competent and transformed them.  I'll learn the results of this test later today - if they don't transform at all we'll conclude that all is well with our mutants.

But if the new marked mutants do transform, we'll have to suspect that something is instead wrong with the unmarked mutant.  The most likely problem is that this strain accidentally acquired a mutation elsewhere in its chromosome that prevents DNA uptake.  Testing for this is a bit tricky, but here's my plan (diagrammed below).

HI0659 is in the same operon as HI0660, whose mutants both transform normally.  If the new ∆HI0659::spc mutants transform, I'm going to transform the marked HI0660 mutant (∆HI0660::spc; SpcR) with a PCR fragment containing the normal HI0660 allele and the unmarked version of HI0659.  I have frozen ∆HI0660::spc competent cells ready to use, and the RA is making the PCR fragment for me using primers she already used for another experiment.  I'll plate the transformation mix without selection, using a control transformation with chromosomal NovR DNA to confirm that the cells were competent.  Then I'll screen the colonies for loss of SpcR by picking them onto plain and Spc agar plates.  Colonies that grow on the plain plate but not on the Spc plate will be ones that have lost their spc cassette by recombination with the ∆HI0659 fragment.  I'll test these for transformability - if those that have acquired the ∆HI0659 deletion (checked by PCR) have lost the ability to transform then we can be reasonably confident that the ∆HI0659 deletion prevents transformation.  If not then something is probably wrong with the ∆HI0659 mutant.

This would be a fair amount of work, probably too much to get done before the RA goes on a few months' leave at the end of April, so I very much hope that the transformations I did yesterday give no transformants.

Later:  All the new ∆HI0659::spc mutants are nontransformable, just like the ∆HI0659 mutant.  Because I actually tested deletion mutants created in two independent experiments, this means we can be extra confident that the deletion is responsible for the loss of competence.