Sorry for the paucity of posts

The research side of my brain has been devoured by the looming need to teach introductory biology to 450 freshmen (two sections of 225). Last year at this time I was focusing on grant proposal writing, and so I let my teaching coast on the course preparation I'd done the year before (the first year I taught this course). This year I'm trying to make up for last year's neglect, and my brain is struggling to come up with concept maps and personal response system questions and stimulating homework assignments and lecture content that better matches our new learning objectives and classroom activities suitable for large lectures and ...

But I did spend much of the last couple of days working with one of the post-docs on her manuscript about the competence phenotypes of diverse H. influenzae strains. One of the issues that came up in the Discussion is why our standard lab strain is one of the most competent, rather than being more typical of average strains.

Our initial thought was that perhaps, over more than 50 years of lab culture, descendants of the original isolate had been gradually selected for higher and higher competence in lab transformation experiments. That is, each time a transformation was done, variants that had taken up or recombined more DNA would be enriched in the plate of transformed colonies. But such transformants do not replace the original lab stock; instead they become new lab strains with new names and new places in the freezer. The original strain has (I think) always been maintained as a frozen stock, with individuals occasionally replacing their depleted vials with a new culture grown from descendants of a previous one. Depending on the culture history in the intervals between thawing the parental stock and freezing a new one, these cells are likely to have been variably but unintentionally selected for improved growth in broth or on agar, or for longer survival after growth had stopped. We have no particular evidence that the ability to take up DNA would have played a significant role in this selection.

But there are other explanations for why the Rd strain is so competent. First, it was not a completely random isolate. The original H. influenzae transformation paper (Leidy and Alexander 1952?) reports testing strains of different serotypes, with Rd being the most competent. Second, if this most-competent isolate had transformed poorly, H. influenzae might not have become the first model organism for studies of competence in gram-negative bacteria.

We'll need to concisely explain this thinking in our Discussion, as a reviewer is likely to raise the issue.

Genespring progress and problems

We did fork out the $$$$ for GeneSpring to analyze our new microarray data, and the post-docs have been hard at work analyzing their E. coli Sxy arrays. It looks like E. coli Sxy and H. influenzae Sxy have overlapping but not identical effects.

It's not clear yet whether these differences are in degree or in kind. That is, using a cutoff of at least a 4-fold effect in at least 3 of the 4 replicate arrays, some genes are turned on (or off) by E. coli Sxy but not by H. influenzae Sxy. But in some cases it may just be that these genes are affected more strongly by E. coli Sxy than by H. influenzae Sxy.
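For concreteness, here's a minimal sketch of how such a cutoff might be applied, assuming the normalized array data were exported as a table of log2 ratios with one column per replicate (the file and column names are hypothetical, not GeneSpring's own):

```python
# Sketch of the cutoff described above: a 4-fold effect (|log2 ratio| >= 2)
# in at least 3 of the 4 replicate arrays. File and column names are
# hypothetical; real GeneSpring output would need its own parsing.
import pandas as pd

data = pd.read_csv("sxy_arrays.txt", sep="\t", index_col="gene")
reps = ["rep1", "rep2", "rep3", "rep4"]  # log2(Sxy-induced / control) per replicate

passes = (data[reps].abs() >= 2).sum(axis=1) >= 3
print(f"{passes.sum()} genes pass the cutoff")
```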

The postdocs have also had to work hard to get GeneSpring set up. The promised 24/7 tech support appears to have been outsourced, with phone advisors who rely on a set of help files rather than personal experience. Some of its promised functionalities can't be made to work at all, even with the data that comes with the software. We've escalated our complaints, and results are promised, but of course not until after the holidays.

Data on E. coli protein abundances

My previous post complained that our new mass spec data was difficult to interpret, partly because it gives no information about the relative abundances of the proteins it identifies in our Sxy prep. But it occurred to me that useful data about this may be available online.

This Sxy prep was purified from a standard E. coli K-12 strain, probably growing in LB + an antibiotic. So I did some Google searching for "coli protein abundance" and easily found a year-old paper in Nature Biotechnology that compares the abundances of mRNAs and the corresponding proteins for E. coli and also for yeast (full text here). This paper nicely explains the reasons why mass spec alone can't really estimate protein abundance, and then describes a new method of combining data to do this. And a supplementary file provides all of their data on E. coli protein abundance ("APEX" estimates, as molecules per cell) in an Excel spreadsheet!

[Reading this paper also taught me that I was wrong to say in the previous post that peptide composition was calculated from each peptide's molecular weight. Instead each peptide is digested and put through a second mass spec that directly detects the amino acids it is composed of.]

What can we do with this information? We want to know, among other things, whether CRP is just one of many proteins that the purification procedure used for our Sxy prep fails to completely wash away, or whether our Sxy prep contains CRP because CRP specifically interacts with Sxy and thus co-purifies with it. If the latter, we expect our prep to contain more CRP than would be predicted based on its usual abundance in cells. Thus I think we should check the APEX abundances of all the proteins identified in our sample. If CRP has a much lower APEX value than the other contaminating proteins the mass spec analysis identified, we can suspect that CRP does interact with Sxy.
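Here's a rough sketch of the check I have in mind, assuming we export the paper's APEX spreadsheet and our mass-spec protein list to simple files (all file, column and protein names below are hypothetical):

```python
# Sketch of the proposed check: look up the APEX abundance of every protein
# the mass spec identified in the prep, then see where CRP falls in that
# distribution. File, column and protein names are hypothetical.
import pandas as pd

apex = pd.read_csv("apex_abundances.csv", index_col="protein")  # molecules/cell
identified = [line.strip() for line in open("ms_protein_list.txt")]

found = apex.loc[apex.index.intersection(identified), "molecules_per_cell"]
print("median APEX of identified proteins:", found.median())
print("CRP APEX:", found.get("CRP"))
# A CRP abundance far below the other contaminants' would suggest that CRP
# is in the prep because it interacts with Sxy, not because it's abundant.
```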

Of course, lots of confounding factors are likely to affect the efficiency with which different proteins will be removed by the purification procedure. Unfortunately our 'informed' guesses about these factors are not informed by much. But I think this is still worth a try.

Progress (lack thereof?)

I keep asking myself "Why aren't I getting any science done?". The answer seems to be partly that I've been bogged down with reviewing manuscripts and reading theses, partly that preparing for next term's teaching is starting to loom large, and partly that I've been doing incremental bits of work on various fronts that don't feel as much like research as I would like.

Today I struggled to understand what we could and couldn't learn from the mass-spec analysis of a purified protein. This is a His-tagged E. coli Sxy protein, purified from E. coli cells using a nickel-affinity column. The prep has two odd features. First, in a gel it gives two bands, both close to the size of the expected Sxy-His product and both of roughly equal intensity. We want to find out what distinguishes these two proteins. Second, the prep behaves in band-shift assays as if it also contains a small amount of another protein, CRP. There's not enough CRP to detect in a gel (I forget whether we can detect it with an antibody in Western blots). We hoped the mass spec would tell us whether the prep does indeed contain enough CRP to explain the bandshift results.

Now that I have a better idea what the mass spec analysis does and doesn't do, I see that it can't give us very useful answers.

Here's what it does: All the protein in the prep is first digested into peptides by the protease trypsin. Mass spec analysis of this mixture then determines the exact (to about 7 significant figures) molecular weight of each detectable peptide. The threshold of detection is very low; I think the post-doc who best understands this told me that it's about a femtomole of peptide. Software then calculates the amino acid composition or compositions that could give this molecular weight. (Because different combinations of amino acids can sum to the same weight, several alternative compositions may be possible.)

Other software has analyzed the E. coli proteome, calculating the composition of every peptide that could be produced by trypsin digestion. This database is then compared with the observed peptides in the mass spec sample, to see which mass spec peptides could have come from which E. coli proteins. If a significant number of mass spec peptides match a particular E. coli protein, the software reports that that protein was likely present in the sample.
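As I understand it, the matching works roughly like the following toy sketch (simplified to match peptide masses directly, rather than the amino acid compositions the real software uses). The residue masses are real monoisotopic values, but the two-protein 'proteome' and the observed masses are invented for illustration:

```python
# Toy version of the matching logic: digest each predicted protein with
# trypsin (cut after K or R, but not before P), compute peptide masses, and
# count how many observed masses each protein can account for.

RESIDUE_MASS = {  # monoisotopic residue masses in daltons (subset)
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "R": 156.10111,
}
WATER = 18.01056  # one water per peptide

def tryptic_peptides(seq):
    """Split a sequence after each K or R that isn't followed by P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        if aa in "KR" and (i + 1 == len(seq) or seq[i + 1] != "P"):
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides

def peptide_mass(pep):
    return sum(RESIDUE_MASS[aa] for aa in pep) + WATER

proteome = {"protA": "AGSKVTLKR", "protB": "PVKSTLAK"}  # toy sequences
observed = [361.196, 459.306]                           # toy observed masses

for name, seq in proteome.items():
    masses = [peptide_mass(p) for p in tryptic_peptides(seq)]
    hits = sum(any(abs(m - o) < 0.01 for m in masses) for o in observed)
    print(name, "peptide matches:", hits)  # protA: 2, protB: 0
```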

We have two main problems with the results from our sample. The first is that the mass spec analysis is so sensitive - it identified about 150 proteins in our sample! The second is that the report gives no indication of the relative amounts of the different peptides in the sample - we have no idea which proteins are abundant and which are present only in femtomole amounts. Sxy is one of the 150, which reassures us that we have purified the right protein. So is CRP. Finding that CRP was absent from the prep would have been very significant, because it would have meant that CRP could not be responsible for the bandshifts the prep causes, but finding that it is present doesn't advance things very much. This is largely because we get no information about how much CRP is present, relative to Sxy and to all the other proteins.

We also have some practical problems in interpreting the data. First, the results file is full of hyperlinks, but none of them work (we're 'not authorized'), so we can't tell what we would learn by clicking on them. Second, some of the peptides seem to not match the indicated protein - we don't know if there's a flaw in the software or if we're just misinterpreting the data. So more consultation with the person who does the mass spec analysis is needed.

We had been planning to cut out each of the Sxy-sized bands from a gel and run them separately through the mass spec analysis. But if each of these excised bands is even slightly contaminated with protein from the other, the mass spec will detect them both in both preps. Excising the bands will remove (or at least greatly decrease) most of the contaminating proteins, so the results should be much simpler, but I don't know how much we can learn about the identities of the proteins in these bands, especially if one or both of them differs in sequence from the predicted E. coli proteins.

Luckily the post-docs have lots of ideas for tests that don't rely on mass spec.

We're a winner!

Our lab's web site just won a Judge's Choice award in The Scientist's Laboratory Web Site and Video Awards contest! The judge said that our site "...gives us clues on how the lab sites of the future should look".

"No correlation" can be a result...

This afternoon was my turn to present at lab meeting, so I talked about the results of the uptake sequences-vs-proteomes manuscript. One of the analyses we've done compares the degree of conservation (measured by % identity of BLAST alignment) with the numbers of uptake sequences. I had originally thought this was going to show a strong negative correlation (higher % identity = fewer uptake sequences), consistent with the general pattern that uptake sequences preferentially accumulate in genes lacking strong functional constraint.

But when I saw the graph of the final data I was disappointed, because the sets of genes with no uptake sequences had only slightly higher mean % identities than the sets of genes with several uptake sequences. We haven't done any statistics on these means yet, but it looked like the correlation was weak at best. So I was considering just leaving this analysis out of the manuscript. But the post-doc suggested instead keeping it in, and describing the lack of correlation as an interesting result. That seems like a good idea (though first we need to do the stats - I don't have the raw data so I've emailed my collaborator).

The same post-doc also reminded me of an analysis I did last summer (link to post). I don't think this result should go in this manuscript, as it has nothing to do with proteomes. But it might fit nicely in the reworked Gibbs-analysis manuscript.

A Correspondence Arising for Nature

Today I submitted our Correspondence Arising on the Diggle et al. paper I posted about a couple of weeks ago. The delay was because Nature asks authors of such submissions to first send them to the authors of the paper in question, and to include the resulting correspondence (i.e. the emails) with the submission. By requiring this step Nature makes sure that there is a genuine and serious issue being raised by the Correspondence, not just a confusion that can be quickly cleared up.

In our case the authors replied promptly, but their response didn't make the problem go away. Instead it confirmed that we had correctly interpreted their descriptions of what they had done, and that they agreed with us on the immediate causes of the results they had observed. Most importantly, it confirmed that we strongly disagree about the significance of the results.

Here's hoping that Nature thinks this issue sufficiently important to publish. If they do, they will contact the authors directly to solicit a formal response to our submission, and will then publish our submission and any response online (but not in the print version). If they don't I expect we'll hear from them within a few days.

Subscription-supported journals are like the qwerty keyboard

Tomorrow afternoon I'm participating with several other faculty in a panel on open access/scholarly communication. It's being organized by our research librarians, who hope this will help them make the best use of their meager resources. I have 10-15 minutes to talk, as do the others, and then we'll be 'participating in discussion groups about these topics with other faculty/librarians'. My theme will be "Why subscription-supported journals are like the qwerty keyboard."

As you probably know, the arrangement of letters on the 'qwerty' keyboard that all our computers come with is far from optimal for efficient typing. The original mechanical typewriters had the keys arranged alphabetically. But this caused levers to jam up if their letters were typed in rapid succession, so a key arrangement was devised that interspersed the commonly used letters with uncommon letters, and split up commonly-used sequences of letters. This was a good solution: although it slowed down the speed at which a skilled typist could hit the keys, it eliminated the time they would otherwise have to spend unjamming the levers. You can read all about this on Wikipedia.

The jammed-levers problem ceased to be an issue with the invention of type-ball typewriters such as the IBM Selectric, but by then the qwerty keyboard had become standard and there was no market for a more-optimal keyboard. Now everyone uses computers - these of course have no levers to jam, and can quite easily be switched to, for example, the Dvorak simplified keyboard.

But switching the users is a lot harder. We're used to doing our typing the hard way, and unlearning one keyboard and learning another seems so daunting that very few of us ever even try.

Using reader subscriptions to support the cost of scientific publishing is a lot like the qwerty keyboard. The first scientists disseminated their results by sending letters to their colleagues. The cost of disseminating the research (paper, ink and postage) was seen as part of the cost of doing the research.

Later the desire to reach more readers, and to reach readers not known to the author, led to the first scientific journals, published by scientific societies or for-profit publishers and supported by subscription fees paid by the readers. (The formal peer-review component was added later.) A large part of the cost of publishing a journal was physical, and required specialized facilities that only a professional publisher could afford. Because the cost of producing and mailing a paper copy for each subscriber was rightly borne by the person or institution receiving it, it made sense that they should also bear the cost of the editorial process.

As subscription costs rose, university libraries spent more and more of their budgets on journal subscriptions. If a journal's readership was large enough, some of the cost could be paid by advertisers, but the more specialized journals had to cover their full costs from subscriptions. As the publication costs got higher, some journals, especially those that wanted to remain independent of advertisers, introduced 'page charges' to the authors. As subscription fees rose higher and higher, fewer and fewer people could afford them, so publishers began charging individuals much less than the supposedly deep-pocketed institutional libraries. Publisher profits got higher and higher, because there was no competition to hold them in check.

Like the qwerty keyboard, subscription-supported scientific publishing was a solution to a technical problem that no longer exists - how to distribute research to an audience. Now that journals can be published online, the costs of producing and mailing paper copies are gone, and there is no need for massive printing presses. In principle we should be able to go back to the original state, where the dissemination costs are considered part of the cost of doing the research, rather than a price the reader pays for the privilege of access. Instead of paper, ink and postage, these costs are now those of administering peer review, copy editing, and web-site maintenance. But the principle is the same.

But we're tied down by history. Our reputations depend on rankings of the journals we can get our papers into, so we're very reluctant to shift to new ones of dubious reputation. The cost of journal subscriptions (now often electronic rather than paper) is entrenched in university budgets, and we don't want to spend our tight research funds on publication charges just so people we've never met can read our papers.

Are there solutions? One reason for optimism is that changing how we pay the costs of disseminating research is not an all-or-nothing change like switching from qwerty to Dvorak keyboards. Some new open-access journals are very prestigious. Granting agencies are giving strong 'in-principle' support to open access publishing, and my last grant proposal's budget included a hefty amount for open-access publication charges. And libraries are looking for ways to escape the burden of subscription charges.

Why take the risk of writing a research blog?

Dave Ng at The World's Fair (part of the Science Blogs group) has written a post about our research blogs, and Boing Boing has picked it up. So this is a good time to try to answer the obvious question of why we do this. Several comments on Dave's post ask why we take the risk of being scooped. To quote one:
"... isn't there a massive chance of one of her lab members getting scooped to a paper because they aired unpublished results to the world?"
This is the big fear that seems to stop researchers from even considering blogging about their work. But for most labs the risk is not very high, and there are benefits for everyone.

Benefits first. I'm a bit of an idealist about science - I think cooperation is more powerful than competition. NIH thinks so too - if you call them with a new research idea, they don't warn you to keep it under your hat because others are working on similar stuff. Rather they try to put you in touch with these people to encourage collaboration. Blogging about our ongoing research not only actively promotes interaction with other researchers, it helps me remember that science should be a community activity.

I also think the risks are overestimated. Although one dramatic scientific stereotype is of research groups competing for glory, in reality very few of us are engaged in fierce competition with groups trying to use the same methods to answer the same questions. If you are in such a competition, blogging about your research might not be a good idea. On the other hand, thinking about blogging might cause you to consider ways you could reduce the competition and promote collaboration instead.

Getting GeneSpring?

The post-docs have generated a lot of E. coli microarray data, so we need to reactivate our long-expired license to the GeneSpring software we use for array analysis. Unfortunately the rep won't return our calls.

GeneSpring has been bought out by Agilent. In the US a one-year license costs about $3300. But that's not the problem. In Canada it costs over $4000, even though our dollars are now at par because the US dollar has fallen against everything! The helpful GeneSpring/Agilent rep in the US tells us that we're forbidden to buy it directly from the US at the US price. But the Canadian rep won't return our calls or emails.

We could: 1. Buy it online through the US web site, paying the outrageously inflated Canadian price; 2. Wait for the Canadian rep to reply, hoping to be able to negotiate a better price; 3. Call Agilent in the US and complain (to someone higher than the nice rep) about the Canadian rep and price.

I think I'll start with 3 because it will make me feel less helpless, and then move on to 1.

Results on the Results

I spent yesterday continuing to sort out the Results section of our paper about how uptake sequences affect proteomes.

Because we've changed the order of topics several times, each time renumbering the new versions of the figures, the data files and figure files are a big mess. For example, data for one figure is in files variously named "Fig. 3", "Fig. 5", "Fig. 6", "altFig5" ... you get the picture. The additional complication that I and my collaborator are on different sides of the continent has been mitigated by having a Google Groups page where we have posted the recent files, albeit under a variety of names and figure-number attributions.

But now I have the Results in what I hope will be their final order. To keep the files straight I've created a folder for each section (Results-A, Results-B, etc) and put the associated data and figure files into it. (Previously I just had one folder for data and another for figures.) I'm hoping that this will let us keep the files together even if we do change the order of the sections.

Today it's checking over the Methods section (written by my collaborator - so should be fine) and the as-yet almost nonexistent Discussion (needs to be written by me).

Back to the USS manuscripts

I'm finally back to working on papers about uptake sequence evolution. Right now it's the analysis of evolutionary interactions between each genome's uptake sequences and its proteome.

While I've been neglecting the manuscript my bioinformatics collaborator has been generating the final data and, I now discover, suggesting a different and more logical way to order the results. So I'm shuffling the sections around, rewriting the text that links them together and explains why we did each analysis. Well, that's not exactly true. Any scientist will admit that their papers don't always honestly explain the actual reasons why each experiment or analysis was done. That's because we often do good experiments for not-very-good reasons, and only later discover the logical thread that links the results together.

And sometimes, like now, we initially don't think to do experiments or analyses, only later realizing the contribution they will make to understanding or explaining other results. The reorganizing I've just done suggested two simple correlations I might look for, which might provide context for interpreting the result I had in mind. So I entered some of my collaborator's data on the tripeptides that uptake sequences specify into a new Excel file, plotted a couple of simple graphs, and presto, new results!

These aren't very important results in themselves. The relative frequencies of tripeptides specified by uptake sequences do correlate modestly (R2 = 0.54) with the total frequencies of those tripeptides in their proteomes. And the proportion of tripeptides usable by uptake sequences but not used correlates even more modestly (R2 = 0.4) with the tripeptide frequencies in their proteomes. But they provide a context for other results that makes them easier to understand.
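For the record, the correlation itself is trivial to compute; here's a sketch, assuming the two sets of tripeptide frequencies were exported to a two-column file (the file and column names are hypothetical):

```python
# Sketch of the correlation described above: frequency of each
# USS-specified tripeptide vs. its overall frequency in the proteome.
import numpy as np
import pandas as pd

data = pd.read_csv("tripeptide_freqs.csv")
r = np.corrcoef(data["uss_tripeptide_freq"], data["proteome_freq"])[0, 1]
print(f"R^2 = {r**2:.2f}")  # e.g. 0.54 for the first comparison above
```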

What people come here to read

I've just discovered that last year's post about the relative safety of the DNA stain ethidium bromide is now the #3 hit for Google searches on "ethidium bromide". Word must be getting around.

Data from the quorum sensing paper supports diffusion sensing

With the colleague who brought the flawed quorum sensing paper to my attention, I'm writing a 'Commentary' letter to Nature pointing out the paper's glaring flaws. Crafting this letter has been scientifically instructive in two ways.

First, the colleague is a biomathematician who uses bacterial cultures as models of evolutionary and ecological processes. He wasn't familiar with my diffusion sensing hypothesis, but once he read the paper he realized that work he'd been doing about communities containing exploiters (the Snowdrift model) nicely applied to this system. So he's been educating me about his model.

Second, I realized that the flawed paper provides excellent data supporting some assumptions of the diffusion sensing hypothesis. Although the paper's interpretation and conclusions are flawed, the experiments themselves look to have been carefully done and to have produced solid data. In particular, they determined the final cell densities of pure cultures of bacteria growing in rich medium, where protease production is not beneficial.

The diffusion sensing hypothesis assumes (sensibly) that synthesis and secretion of such effector molecules as proteases, antibiotics and siderophores is expensive, but that production and secretion of the autoinducers that regulate the effectors is cheap.

In one of the paper's experiments, cultures of protease-producing cells grew to about 35% lower density than cultures of non-producers. This was very nicely controlled by including cells that did not produce the autoinducer signal that activates protease production. Without added autoinducer these cells grew as well as the cells that made no protease because they couldn't recognize the autoinducer, but when autoinducer was provided externally they grew as poorly as wildtype cells.

This tells us two things. First, production of the protease is indeed quite costly - a 35% difference in final cell density means that natural selection will strongly favour cells that don't secrete proteases when they're not needed (confirmed by the paper's competition experiments). Second and more important, secretion of the autoinducer is very cheap. The final cell densities of the cells that didn't produce autoinducer and of the cells that produced it but couldn't respond to it were identical (within the resolution of the figure), so the cost of production must be very low. The cost is unlikely to be zero - this could be tested by competition experiments between the two strains.

I've never posted about quorum sensing?

About 5 years ago I wrote an opinion piece suggesting that the widespread phenomenon of bacterial quorum sensing had been misinterpreted.

Everyone had been assuming that bacteria secrete small autoinducer molecules and detect the concentration of these because this lets them estimate population density and thus predict the utility of investing in cooperative behaviour that benefits the whole population. Such behaviour is costly to individuals, and in populations containing such cooperators, individuals will do better by cheating (sitting back and letting others do the cooperative work). The difficulty of explaining how cooperation could evolve in the presence of cheaters is a serious problem for this 'quorum sensing' hypothesis. But most microbiologists have at best a very superficial understanding of evolution, and the appealing assumption that bacteria use autoinducers to talk to each other and act cooperatively spread like wildfire.

My radical suggestion (Redfield 2002, Trends in Microbiology 10: 365-370) was that bacteria instead secrete and detect autoinducers as a method of determining whether the benefits of secreting more expensive effector molecules will be limited by diffusion and mixing. For example, it would be a waste of resources to secrete degradative enzymes such as proteases if they are going to immediately wash away. This 'diffusion sensing' proposal was welcomed by at least some evolutionary biologists but resolutely ignored by almost all microbiologists working on quorum sensing, I suspect because it removed the glamour from their research.

A paper on the evolution of quorum sensing has just appeared in Nature (Diggle et al. 2007 Nature 450:411-414). The authors evaluated the above-mentioned cheating problem using laboratory cultures, and then showed that cheaters do not prosper if they are prevented from associating with cooperators. This is hardly surprising. Here's what they did:

They used the bacterium Pseudomonas aeruginosa, which uses autoinducer secretion to induce production of a protease (among other things). They grew these bacteria in a culture medium whose major nutrient was a protein called BSA. Bacterial cells can't take up intact protein, but wildtype P. aeruginosa can grow well on the amino acids released when the protein is degraded by the protease it secretes.

1. They showed that mutant cells unable to produce the protease grew poorly in BSA medium, reaching only about 1/3 the cell density of wildtype cells after 24 hr. These mutants had normal protease genes but either could not produce or could not detect the autoinducer signal, so they could not turn these genes on. They could grow at all in this medium only because they still produced other proteases.

2. They showed that the mutants could grow fine if either autoinducer or protease were added to the medium. This confirmed that their poor growth was because of their regulatory defects, not some other factor.

3. They showed that the mutants grew fine on rich medium containing lots of amino acids where the protease was not needed. In fact the mutants grew better than the wildtype cells, which the authors logically interpret as being because they did not waste resources producing a protease they had no use for. They confirmed this by adding autoinducer to the cells that couldn't make their own - as predicted, this caused them to now grow poorly, presumably because they were now producing the expensive protease.

4. They then showed that the mutants acted as 'cheaters' when they were mixed with wildtype cells and grown in the BSA medium. Cultures that began with only 1-3% mutants had >40% mutants after 48 hr growth. The mutants outgrew the wildtype cells because they didn't expend resources making the protease. They got all the amino acids they needed because the wildtype cells around them produced protease. They thus prospered at the expense of the wildtype 'cooperators'. This is not at all surprising given the also-unsurprising results in points 1-3 above. The mutants never took over the cultures, because their advantage depended on the cultures also containing lots of wildtype cells.

The authors then said:
"Our results show that quorum sensing is a social trait, susceptible to exploitation and invasion by cheats. Given this, how is quorum sensing maintained in natural populations?"
This is rather sneaky. The authors have just nicely shown that, in mixed laboratory cultures optimized to make growth dependent on protease secretion, cells that use protease secreted by other cells are favoured. So it's probably legitimate to describe protease production under these conditions as a social trait. But they jump over the question of whether natural populations grow under comparable conditions. Instead they just allow us to assume this, and go on to propose the explanation they want to test:
"The most likely explanation is kin selection - if neighbouring cells are close relatives they will have a shared interest in communicating honestly and cooperating."
What? No mention of the by-far more likely explanation that autoinducers exist mainly for a cell-autonomous function (diffusion sensing) that is not subject to cheating? This is the first big problem with this paper.

The other big problem is that the experiment that they claim tests kin selection is really just a repeat of the growth experiments they did in point 1 above. Here's a diagram of the experiment as it is described in their Methods section.

Treatment A (on the left) is a 6-cycle repeat of the mixed culture experiment described in point 4 above. In each step 12 tubes of mixed culture (only 6 are shown) were pooled together after 48 hr growth. This mixture was then inoculated into fresh tubes for another 48 hr growth. So the mutant cells (potential cheaters) were always growing with wildtype cells that produced the protease the mutants needed. It's not surprising that, at the end of 6 cycles of this mixed growth the culture still contained about 35% mutants.

Treatment B is claimed to provide conditions that allow kin selection. But after the first cycle it's really just a series of pure-wildtype or pure-mutant cultures like those described in point 1 above. This is because, after the overnight cultures were pooled, the cells were grown into single colonies on an agar plate, and a different colony was used to inoculate each new tube for overnight growth. So each of these tubes contained a pure clone of either a mutant or wildtype cell. As the cultures in point 1 showed, the mutant cells grow poorly when they have no wildtype cells to provide protease. It's not surprising that, at the end of 6 cycles, all of the 12 cultures are wildtype. I did the calculations and this is exactly the result predicted by the differences in growth seen in point 1.
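My calculation was essentially the following back-of-the-envelope iteration (a deterministic sketch of my reading of their Methods; the 40% starting frequency and the exact 3-fold yield difference are assumptions based on points 1 and 4 above, not the paper's numbers):

```python
# Deterministic sketch of Treatment B: each cycle the pooled cultures are
# plated, fresh tubes are founded from single colonies picked in proportion
# to the current cell frequencies, and pure mutant cultures reach only ~1/3
# the density of pure wildtype cultures (point 1 above).

mutant_cell_freq = 0.40   # assumed mutant fraction after the first mixed cycle
relative_yield = 1 / 3    # pure mutant culture density relative to wildtype

for cycle in range(1, 7):
    mutant_tube_freq = mutant_cell_freq      # expected fraction of mutant tubes
    mutant_cells = mutant_tube_freq * relative_yield
    wildtype_cells = 1 - mutant_tube_freq    # wildtype tubes reach full density
    mutant_cell_freq = mutant_cells / (mutant_cells + wildtype_cells)
    print(f"cycle {cycle}: expected mutant frequency = {mutant_cell_freq:.4f}")

# By cycle 6 the expected mutant frequency is around 0.1%, so with only 12
# tubes it's no surprise that all 12 final cultures were wildtype.
```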

The experiments are competently done and the data looks very solid. But to have gotten this into Nature the authors must both be masters of spin and have had very inept reviewers.

So why am I blogging about this? Partly because the authors are pushing an unjustified conclusion, and partly because I'm very annoyed that they completely ignored the point of my 2002 paper, even though the senior author and I discussed it in some detail when he visited here last year (he gave no indication then that he thought I was wrong). Worse, the authors* do cite this paper, but for something it doesn't actually contain (evidence that cheaters can invade quorum-sensing populations). I suspect that a not-totally-inept reviewer told them that they should cite me, and they avoided having to discuss the diffusion-sensing explanation by citing me for something I didn't do.

*One of the authors has emailed me expressing dismay over the harsher term I originally used, so I've changed it.

Shifting our perspective

Oh no, has it been a whole week since I last posted?

Among other things I've been working with one of the post-docs on her manuscript about the amount of variation in competence and transformability between different isolates of Haemophilus influenzae. Today we progressed to thinking about what we should say in the Discussion section, specifically about how selection on competence genes might have changed since the common ancestor of these strains.

(I'll use 'strains' interchangeably with 'isolates' in this post. In so doing I'm implicitly (here explicitly) assuming that the properties of the human-dwelling H. influenzae cell that gave rise to the original lab colony have not been changed by whatever laboratory propagation its descendants might have experienced.)

To discuss this variation we have to change how we've been thinking about variation. The data the post-doc has generated tell us about the ability of 34 present-day strains to take up DNA and recombine it into their chromosomes. To discuss the data's evolutionary implications we need to integrate it into the (unknown) history of these strains and of the species.

We don't know anything directly about the common ancestor of these strains, or of all the bacteria we call H. influenzae. But maybe we can start by making some inferences from a large published survey of the genetic variation in H. influenzae strains, and from the published genome sequences of some strains.

The large survey was an 'MLST' study, in which the same 7 genes were sequenced in each of more than 700 strains (Meats et al. 2003). I don't remember whether the authors were able to draw any specific conclusions about evolutionary history, but if they did we should certainly consider whether they can be applied to our analysis.

About 12 H. influenzae genomes have been sequenced (and the sequences are 'available'), but only a few of them have been analyzed in any detail. Much of the sequencing work is being done in the context of an explicit evolutionary hypothesis - that H. influenzae and other bacterial pathogens are best described as having a 'distributed genome'. This is Garth Ehrlich's idea; here's how one of his papers explains it:
The distributed genome hypothesis (DGH) states that pathogenic bacteria possess a supragenome that is much larger than the genome of any single bacterium, and that these pathogens utilize genetic recombination and a large, non-core set of genes as a means of diversity generation.
Well, that's certainly very relevant to our analysis of the distribution of transformability! Now we just need to clarify, first for ourselves and then for potential readers of our manuscript, how having a diversity of competence and transformation phenotypes fits into this.

The Redfield Factor

The Redfield Factor: The number of kilobase pairs in a gram of DNA: 10^18.

The Inverse Redfield Factor: The weight in grams of 1000 base pairs of DNA: 10^-18.
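The arithmetic checks out, assuming the usual average of about 650 daltons per base pair of double-stranded DNA:

```python
# Quick check of the arithmetic, assuming ~650 daltons per base pair.
AVOGADRO = 6.022e23      # molecules per mole
DA_PER_BP = 650          # approximate average daltons per base pair

grams_per_kb = 1000 * DA_PER_BP / AVOGADRO
print(f"grams per kb: {grams_per_kb:.1e}")    # ~1.1e-18 g (Inverse Redfield Factor)
print(f"kb per gram:  {1/grams_per_kb:.1e}")  # ~9.3e17, i.e. ~10^18 (Redfield Factor)
```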

(Inspired by The World's Fair)

Where are they now? (competence proteins in Bacillus)

Dave Dubnau's group has been doing excellent work on the molecular biology of competence and DNA uptake in Bacillus subtilis. Their latest paper (Naomi Kramer, Jeanette Hahn and David Dubnau, 2007. Mol. Micro. 65:454-464) looks at the subcellular locations of induced proteins in competent cells.

They find that many proteins co-localize at the poles of the cells, and that DNA is taken up preferentially at the poles. Absence of one protein due to mutation causes perturbations in other proteins, confirming that they interact in some way. They interpret these interactions as forming a complex that exists to promote recombination between incoming DNA and the cell's chromosome.

However there's another interpretation. Another recent paper on B. subtilis competence showed time-lapse movies of cells developing and then losing competence (An excitable gene regulatory circuit induces transient cellular differentiation. Süel et al. Nature 440:545-550; see supplementary movies 1 & 2). These show that competent cells form filaments - such suppression of cell division is typical of cells whose DNA replication has been arrested. Only on seeing these movies did I realize that B. subtilis cells arrest DNA replication when they become competent, although this has been known for a long time (papers published ca. 1970).

How does this replication arrest work (what's cause and what's effect)? It seems to be caused by accumulation of the competence transcription factor ComK, and released when MecA causes ComK to be degraded, but I don't know how ComK arrests replication. Possibilities come from looking at the lists of genes that are induced by ComK (Berka et al. 2002 Mol. Micro 43:1331-1345). There are a lot, and this paper (also from Dubnau's group) concludes that competence is only one aspect of a complex stress response. Here's the last paragraph of their Introduction:
Unexpectedly, we have found that the expression of at least 165 genes is upregulated in the presence of ComK. These include open reading frames (ORFs) that were previously shown to be ComK dependent as well as many for which there was no prior evidence of ComK control. In several cases, validation of the microarray data was achieved through the use of promoter fusions. This profound alteration in the expression programme involves many genes that appear to have no role in transformation. We propose that competence as usually defined is but one feature of a differentiated, growth-arrested state, which we propose to call the K-state.
And here are the last paragraphs from their Discussion:
It is certain that many (probably most) of the newly identified ComK-dependent genes are not required for competence, originally defined as receptivity to transformation (Lerman and Tolmach, 1957), nor for the recombination and recovery steps that follow DNA uptake. We have demonstrated that this is the case for pta and oxdC/yvrL, and it is certainly true for many of the 29 intermediary metabolism genes, the six sporulation genes and many of the newly identified ComK-dependent transcriptional regulators. It is therefore no longer appropriate to refer to the ComK-determined physiological state as 'competence', as more is involved than transformability. We propose to refer to this instead as the 'K-state', a neutral term with no functional connotation.

The cell shape and cell division genes are of particular interest, as the K-state is associated with inhibition of cell elongation and division (Hahn et al., 1995; Haijema et al., 2001). The competence gene comGA plays a role in the inhibition of these two processes. One ComK-dependent gene cluster (Fig. 2) includes the genes for Maf (an inhibitor of cell division that has also been implicated in DNA repair; Butler et al., 1993; Minasov et al., 2000), MreB, MreC, MreD (shape determining factors; Jones et al., 2001), MinC and D (inhibitors of cell division; Levin et al., 1992) and RadC, a probable DNA repair protein. In addition to this cluster, mbl, tuaF, tuaG, cwlH and cwlJ are activated. Mbl plays a role in cell shape determination (Henriques et al., 1998; Jones et al., 2001), TuaG and F are required for the synthesis of teichuronic acid (Soldo et al., 1999), and CwlH and J are cell wall hydrolases. It appears that the K-state is accompanied by a reprogramming of cell shape, cell division and cell wall synthesis genes.

A minority of the cells in a given population reach the K-state, and these cells are arrested in cell division and growth (Haijema et al., 2001). The reversal of this growth inhibition requires at least the degradation of ComK (Turgay et al., 1998). In this sense, the K-state appears to be in some respects a resting state and is associated with the induction of a number of genes (exoA, radC, recA, ssb, topA, maf and dinB) that are likely to be involved in DNA repair. The arrest of cell division and growth may be an advantage if the K-state has evolved in part to deal with DNA damage, or if DNA repair is required after transformation, as growth in the presence of DNA lesions may be detrimental. In E. coli, the SulA protein is induced as part of the SOS regulon and inhibits cell division (Bi and Lutkenhaus, 1993), presumably until DNA damage has been repaired. Several genes that are activated in the K-state might facilitate the assimilation of novel nutritional sources. These include malL, sucD, yoxD and ycgS and the putative transport genes pbuX, yckA, yckB, ycbN, ywfF, yvrO, yvrN, yvrM, yqiX, ywoG and yvrP. Several of these transport proteins, in particular ywfF and ywoG, might function instead as detoxifying efflux pumps. In this connection, it is worth mentioning some additional ComK-dependent genes. oxdC encodes an acid-induced oxalate dehydrogenase (Tanner and Bornemann, 2000), which has been suggested to play a role in pH homeostasis in response to acid stress. hxlA and hxlB encode enzymes of the ribulose monophosphate pathway (Yasueda et al., 1999), and hxlR encodes a positive activator of their expression. It was suggested that this pathway is involved in the detoxification of formaldehyde. ComK induces all three of these genes. Additional stress response genes that are apparently upregulated in response to ComK include groES and possibly yqxD. Finally, two genes are likely to be involved in the synthesis of antibiotics (sboA and cypC; Hosono and Suzuki, 1983; Matsunaga et al., 1999), which may serve to eliminate competitors.

In conclusion, we propose that the K-state is a global adaptation to stress, distinct from sporulation, which enables the cell to repair DNA damage, to acquire new fitness-enhancing genes by transformation, to use novel substrates (possibly including DNA; Redfield, 1993; Finkel and Kolter, 2001) and to detoxify environmental poisons. This view of the K-state suggests a reason for its expression in only a fraction of the cells in a given population. The K-state represents a specialized strategy for dealing with danger, but also carries with it inherent risks. Transient arrest of growth and cell division confer vulnerability to overgrowth by competing populations, and transformability opens the cell to invasion by foreign DNA. The genome may therefore activate alternative systems in subpopulations to deal with adversity, and the K-state may be one such system. This strategy maximizes the probability that the genome will survive when faced with changing environments, a valuable capability for a soil-dwelling organism.
Arresting DNA replication is a fairly desperate measure, and I'd like to know what makes the K-state worth the risk. Despite the statements above about competence being only one of the K-state's functions, Dubnau's group seems to have slipped back into assuming it's the only function. Here's a paragraph from the Discussion of a more recent paper on how the K-state is triggered and maintained or lost (Maamar & Dubnau 2005, Mol. Micro. 56:615-624). It assumes that competence is the function of the K-state and that transformation is the function of competence. It then constructs an evolutionary just-so story to explain why only small fractions of the cells in a lab culture enter this state:
It has been known for many years (Nester and Stocker, 1963; Hadden and Nester, 1968; Haseltine-Cahn and Fox, 1968) that competence in the domesticated laboratory strains of B. subtilis, is expressed in 10–20% of the cells in a given culture (Fig. 2B). In natural isolates of B. subtilis, the fraction of cells expressing competence is markedly lower than this, presumably because these strains have not been artificially selected for high transformability. In one such isolate, only about 1% of the cells express a comK–gfp fusion, but in these rare cells, expression is at a high level (J. Hahn, H. Maamar and D. Dubnau, unpubl.) This dramatic example of population heterogeneity may have evolved so that few cells in a clone will commit to a particular fitness-enhancing strategy. As the prolonged semidormancy that accompanies the K-state (Haijema et al., 2001) poses a potential challenge to survival, this strategy serves to minimize risks to the genotype. If, on the other hand, the few cells expressing the K-state happen to enjoy an advantage, the chances that the genotype will survive will be enhanced. Presumably the heterogeneity mechanism has evolved to maximize the benefit-to-risk ratio. There may be many examples of population heterogeneity selected by evolution in single celled organisms (see for instance Balaban et al., 2004), and an understanding of the mechanisms that regulate this heterogeneity would be of general interest.

rec-2 weirdness

The protein encoded by the H. influenzae rec-2 locus is needed to transport DNA across the cell's inner membrane, from the periplasm to the cytoplasm. Rec-2 homologs play similar roles in other gram-negative bacteria and in gram-positive bacteria, where they are needed for transport across the homologous cytoplasmic membrane. However a H. influenzae rec-2 knockout also has unexpected pleiotropic effects; before considering the old evidence for these we need to consider the bizarre history of this mutation. The original isolation is reported in a 1971 paper by Ken Beattie and Jane Setlow (Nature New Biology 231:177-179).

Setlow had originally isolated the rec-1 mutant of H. influenzae. This strain has a mutation in the recA homolog; it takes up DNA normally but is unable to recombine it into its chromosome. Like other recA mutants it is also very sensitive to DNA damage. I think the mutant was isolated after nitrosoguanidine treatment of Rd cells and was identified by its sensitivity to UV irradiation; its strain name is DB117.

She wanted to find other mutants unable to transform. Because she had recently discovered that many H. influenzae cells died after taking up DNA from the related H. parainfluenzae, she decided to use this as a way to select for cells that couldn't take up DNA. So she and Ken Beattie repeatedly gave competent H. influenzae DNA from H. parainfluenzae, isolated the survivors, made them competent and gave them more of the same toxic DNA. (She later discovered that the 'toxicity' arises because heterologous DNA induces the SOS response to DNA damage, which induces a resident prophage that kills the cells.) However, even 20 repetitions of this treatment produced a population of cells with only a slight transformation defect.

So (I don't know why) they tried again, this time pretreating the cells by exposing them once to DNA from her rec-1 mutant, followed by 20 cycles of exposure to H. parainfluenzae DNA. Surprisingly (to me), this treatment gave populations with 1000-fold reductions in transformation. The pretreatment with rec-1 DNA was almost as effective as mutagenesis with nitrosoguanidine to 60% survival.

They then tested single colony isolates from these populations for DNA uptake and transformation. Almost all of them did not take up detectable amounts of DNA, but a few took up as much DNA as normal cells but did not produce any transformants. None of the 8 mutants isolated from the population treated with rec-1 mutant DNA had the DNA repair defects of the rec-1 mutant. (Note that these mutants are very likely to be multiple descendants of a single original mutant.)

Because of this they erroneously but only temporarily concluded that the rec-1 mutant's two defects (in transformational recombination and in DNA repair) were due to different mutations. They seem to have also invoked another mutation, mex, causing sensitivity to the DNA-damaging chemical MMS. They suggested that the new mutant (one of the 8) had this mex mutation as well as another mutation that prevented recombination, perhaps acquired from the H. parainfluenzae DNA it had been repeatedly exposed to. They initially called this mutant Rd(DB117)^rec but later simply called it rec-2.

They and others did a lot of work on the phenotype of this rec-2 mutant. They found that the mutant was a bit sensitive to MMS (attributed to the somewhat-hypothetical mex mutation). It took up DNA into a state that was resistant to externally added DNaseI and to the restriction enzymes known to be in the cytoplasm. This state was originally thought to be inside the vesicles then called 'transformasomes' (see this post about these) but we're now pretty sure it's just the periplasm. Making the mutant competent for DNA uptake across the outer membrane did not increase the ability of the cells to support phage recombination (see this post) as it did for wildtype cells. Competent cells of the rec-2 strain did not develop the single-strand DNA gaps detected in wildtype cells.

However interpretations were always confounded by uncertainty about its genotype. Did it carry a mex mutation (whatever that might be)? Did it contain any other DNA from the rec-1 strain? Did it contain any segments of H. parainfluenzae DNA? Did it have any loss-of-function mutations?

In 1989 Dave McCarthy tried to sort this mess out (McCarthy Gene 75:135-143). He isolated a transformation-preventing miniTn10kan insertion into a H. influenzae gene, and showed that a plasmid carrying the wild-type version of this gene restored transformability to Setlow's rec-2 mutant. By probing Southern blots with the cloned gene he showed that Setlow's rec-2 mutant contains a large rearrangement (later identified as a ~80kb insertion) in this gene. I'll call his rec-2::miniTn10kan mutant rec-2*. This mutant had the same DNA uptake defect as Setlow's mutant.

With Doris Kupfer he then characterized the phenotype of the rec-2* mutant. Like Setlow's rec-2, it took up DNA but could not translocate it across the inner membrane. It was also just as defective in phage recombination, and examination of its DNA by electron microscopy showed that competence induction did not cause the increase in single-strand gaps or tails seen in wildtype cells.

The DNA translocation defect is consistent with the phenotypes of rec-2 homolog mutants in other bacteria. But the phage recombination and single-strand gap differences make no sense to me.

Do competent cells have weird DNA?

One of the postdocs suggested we work our way through the old H. influenzae competence literature, so we've been meeting more-or-less weekly to do that. We pick a time interval (e.g. 1975-79) and decide which papers look like they deserve serious attention.

Last time we considered two papers from Jane Setlow's lab, one reporting that DNA of competent cells contains single-stranded regions, and one analyzing single-stranded regions that appear in the DNA such cells take up. This reminded me of a more recent paper from David McCarthy (1987), and of some experiments I did. Here I'll post about the competent-cell DNA issue. Later I'll post about the strandedness of incoming DNA, and about the weird history and behaviour of rec-2 mutants.

The Setlow research was done in the mid-1970s. This was before agarose gels came into general use, and they analyzed the sizes of DNA fragments using sedimentation in sucrose density gradients (big fragments are rapidly pushed to the bottom while small fragments move only partway down the gradient). In the first paper they used gradients containing NaOH to separate the strands of the DNA, and pulse-chase labeling with 3H to identify newly synthesized segments. They found that newly synthesized DNA from competent cells contained many more short single-stranded segments than DNA from log-phase cells. They also used columns of BND-cellulose, which DNA with single-stranded regions should stick to. More competent-cell DNA stuck than log-phase-cell DNA. They confirmed that this DNA was enriched for single-stranded regions by digesting it with the nuclease S1, which preferentially cuts single-stranded segments. And they used CsCl 'isopycnic' density gradients to confirm that the strands were indeed newly synthesized.

But there are good reasons why sucrose gradients were discarded once agarose gels became available. These experiments have very poor resolution and lots of artefacts, and it's hard for me to understand what they showed. The very existence of newly synthesized strands in competent-cell DNA may be an artefact...

The McCarthy paper used a different technique, electron microscopy, to directly compare the structures of DNA from log-phase and competent cells. They found DNA from competent cells to contain more single-stranded regions and single-stranded tails. They also used cross-linking of DNA to prevent branch migration. So their results confirmed Setlow's interpretation of the sucrose gradient results.

Although we might expect cells to develop some aberrant DNA structures after having been abruptly transferred from a rich replication-supporting medium to a starvation medium lacking DNA precursors, the presence of gaps and tails is a bit surprising. At that time I was a post-doc in Ham Smith's lab and had been doing a lot of work with pulsed-field agarose gels that nicely resolved very large fragments of chromosomal DNA. So I devised some experiments to look for evidence of these gaps and tails.

The first experiments used a nuclease from mung beans. This is like S1 nuclease in preferentially cutting single-stranded DNA, but is less likely to also cut double-stranded DNA. These experiments showed two things. First, DNA from competent cells is no more sensitive to mung bean nuclease than is DNA from log-phase cells. Second, DNA from competent cells is much more sensitive than log-phase DNA to being heated in the presence of the buffer used for the nuclease digestions. This buffer contains zinc and is at pH 5; I had been heating the DNA preps before loading them into the gels because the DNA had been prepared from cells embedded in low-melting-point agarose to protect it from shearing. The DNAs were all fine if I simply put slices of this DNA-in-agarose into the wells of the pulsed-field gels, but if I instead melted the slices at 65C and pipetted them into the wells the competent-cell DNA bands looked very faint. This effect was independent of any nuclease treatment.

My second experiments used a DNA polymerase to find out whether DNAs from log-phase and competent cells had different numbers of gaps and tails. The 'Klenow' fragment of DNA polymerase can replicate single-stranded DNA in vitro, provided that adjacent double-stranded DNA has a 3' end that can serve as a primer. So chromosomal DNA with a single-stranded gap is a perfect template. I incubated my log-phase and competent cell DNAs with Klenow polymerase and precursor nucleotides (including 32P-dGTP), and ran them in a pulsed-field gel. The two DNAs incorporated the same amount of radioactivity and gave the same patterns in the gel. As a control I used DNA from cells that had been treated with chloramphenicol - this protein-synthesis inhibitor lets cells complete any DNA replication they have initiated but blocks initiation of new rounds of replication. This DNA incorporated only about 1/3 as much radioactivity.

So neither experiment provided any support for the presence of frequent single-strand gaps or tails in the DNA of competent cells. The sensitivity to the nuclease buffer did indicate that there is something funny about this DNA. I was left wondering whether the DNA might have incorporated ribonucleotides or other non-standard subunits. Because ribonucleotides are sensitive to alkali, segments containing them would become gaps if the DNA was denatured with NaOH. But this wouldn't explain McCarthy's results.

The other weird thing about both McCarthy's and Setlow's results was that they found DNA from 'competent' cells of the non-transformable mutant rec-2 to be like DNA from log phase cells. I'll do a separate post about this.

Controlling mutagenesis

I've made some more progress on the problem of how to score USSs in simulated DNA fragments, and written it up in our work-in-progress document of how our model works. But here I want to get back to thinking of how mutagenesis can be set up to allow control of both the base composition and the ratio of transition mutations to transversion mutations. The problem was introduced in a previous post.

I have a printout of the analysis done for us by my biomathematician colleague. I'll try to summarize here what I think it says, and then I'll ask her to look at this post and tell me if I've got it wrong.

Before our program does any mutagenesis it will need to first calculate the values of three parameters, alpha, beta and gamma (spelled out because Blogger doesn't do Greek letters). These parameters specify the rates at which the specific bases mutate to each of the other three bases, as indicated in the table ("Blaisdell's 1985 mutation matrix"). In our model we will normalize these by dividing by their sums, as indicated below the table, so we can use them as probabilities.

The values these parameters will take depend on the three parameters we give to the model; these are described in the blue box. The formulas in the green box were derived by our colleague - we will write Perl code that uses these to calculate alpha, beta and gamma from G, mu and R. The program will then put these values into the table. Then, at each mutation step, the program will determine whether the mutating base is an A, T, C or G, and look up the appropriate probabilities of the bases it can mutate to.
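
Since the table and boxes are in figures, here's a bare-bones Perl sketch of just the lookup step. The numbers are placeholders, not the real alpha, beta and gamma values the green-box formulas would produce; each row has already been normalized to sum to 1, as described above.

    use strict;
    use warnings;

    # Normalized mutation probabilities: row = the mutating base, entries =
    # the probability of mutating to each of the other three bases.
    # These numbers are placeholders, not the real alpha/beta/gamma values.
    my %mutation = (
        A => { G => 0.50, C => 0.25, T => 0.25 },
        T => { C => 0.50, G => 0.25, A => 0.25 },
        G => { A => 0.50, T => 0.25, C => 0.25 },
        C => { T => 0.50, A => 0.25, G => 0.25 },
    );

    # At each mutation step: given the base being mutated, pick its
    # replacement according to the probabilities in its row.
    sub mutate_base {
        my ($base) = @_;
        my $r = rand();
        for my $new ( sort keys %{ $mutation{$base} } ) {
            return $new if ( $r -= $mutation{$base}{$new} ) <= 0;
        }
        return ( sort keys %{ $mutation{$base} } )[-1];    # floating-point guard
    }

Walking down the row and subtracting each probability from a single random number avoids having to store cumulative values.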

If we were to begin the simulation with a genome that did not have the desired base composition, it would also not initially have the desired ratio of transitions to transversions. If no opposing forces were acting, the base composition and ratio would equilibrate at the desired values. This should not be an issue for us because our genomes will be created with the desired base compositions.

How should our USS model score USS sequences?

Our computer simulation model of USS evolution is coming together, but we've hit a snag. I'm hoping that trying to explain it here will give me a clearer understanding of how to get around it.

A critical step in the model is scoring each simulated DNA fragment for USS-like sequences; the score then determines the probability that this fragment will replace its homologous sequence in the evolving genome. Our basic plan is to use a 'sliding window' the width of the USS motif we're using. The window would begin at position 1 of the fragment sequence, score the sequence in the window, and then move over one position, score again, and continue until it reached the end of the fragment. The final score would be the sum of the scores of the sequences at each window position. I expected this to be time consuming ('computationally intensive') but straightforward, but I was wrong.

The simplest scoring scheme would be to just check, at each window position, whether the sequence in the window exactly matches the USS consensus. (For simplicity here I'll consider only the standard 9bp USS core AAGTGCGGT, but I want the model to consider the full 29bp USS motif.) If the sequence in the window exactly matches the core we add 1 to the running score, otherwise we don't. The final score would then tell us the number of perfect-match USSs in the fragment. How the model would use this number to decide the probability of uptake is still to be decided, but I'll defer this problem to another post.
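
As a sanity check on the logic, here's what this simplest scheme looks like in Perl (the function name and test fragment are mine, just for illustration):

    use strict;
    use warnings;

    my $core = 'AAGTGCGGT';    # the 9bp USS core

    # Slide a 9bp window along the fragment one position at a time,
    # adding 1 to the score for each exact match to the core.
    sub count_perfect_uss {
        my ($fragment) = @_;
        my $width = length $core;
        my $score = 0;
        for my $pos ( 0 .. length($fragment) - $width ) {
            $score++ if substr( $fragment, $pos, $width ) eq $core;
        }
        return $score;
    }

    print count_perfect_uss('CCAAGTGCGGTTT'), "\n";    # prints 1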

A slightly more subtle scheme would give partial scores for window positions containing sequences that nearly match the perfect USS core consensus. For example, any sequence that was mismatched at only 1 of the 9 positions could score 0.3, and any that were mismatched at 2 positions would score 0.1. Worse mismatches would score 0.

Both of the above simple schemes would work, but only because they ignore differences in the importance of the different components of the USS, and in the tolerance for specific alternative bases at different positions. I had been planning to use a scoring matrix giving a value to each base at each position. Here's a very simple version of such a matrix, with the preferred base at each position scoring 1 and the other bases scoring 0. With this matrix a perfect USS core in the window would score 9, a singly mismatched USS would score 8, a doubly mismatched USS would score 7, etc.

Even a sequence that matched the USS consensus at only 1 position would get a non-zero score - it would be 1/9 of the score of a perfect USS. This creates two kinds of problems. First, we want only reasonably USS-like sequences to get significant scores, but under this scheme a random 9bp sequence will, on average, get a score of about 2.25 (a 1-in-4 chance of matching the consensus at each of the 9 positions, assuming equal base frequencies). Second, because the window evaluates about 1000 positions in a 1kb sequence, the average score of a random fragment will be about 2250, and the average score of a fragment containing a perfect USS will be only about 2257. The scoring scheme thus is far too weak in its ability to discriminate between fragments with and without good matches to the USS. A bit of thinking shows that it doesn't matter how big or small we make the individual numbers in the matrix.

But maybe I see the solution. What if only the best base at each position has a positive value in the scoring matrix, and the other bases have negative values? We'd want to adjust the negative values so that the average random sequence would get a score of zero. I still see lots of potential problems here, but maybe this is the way to go.

Added a bit later: There's a much simpler solution. Use any matrix we like, and just calculate the average score expected for random sequence, and subtract this from the actual score for each window position. The calculation is simple - just add up all the fractional scores for the different bases, remembering to correct for base composition. And rather than doing this correction at each window position, do it after the window has completed scanning all the positions in the fragment (subtract the expected average score per window position multiplied by the number of window positions scored).
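
Here's how that correction might look in Perl, using the simple 1-or-0 matrix from above and our 38% G+C base composition (the names and numbers are illustrative, not settled). The last line also treats negative fragment scores as zero, one way of handling the problem discussed below.

    use strict;
    use warnings;

    # Simple position-specific matrix: 1 for the consensus base at each
    # of the 9 core positions, 0 for the other three bases.
    my @core = split //, 'AAGTGCGGT';
    my @matrix;
    for my $consensus (@core) {
        push @matrix, { map { $_ => ( $_ eq $consensus ? 1 : 0 ) } qw(A C G T) };
    }

    # Base composition of the genome (38% G+C).
    my %freq = ( A => 0.31, T => 0.31, G => 0.19, C => 0.19 );

    # Expected score of one window position on random sequence: at each
    # matrix position, sum each base's score weighted by its frequency.
    my $expected = 0;
    for my $column (@matrix) {
        $expected += $freq{$_} * $column->{$_} for qw(A C G T);
    }

    sub fragment_score {
        my ($fragment) = @_;
        my $width   = scalar @matrix;
        my $windows = length($fragment) - $width + 1;
        my $score   = 0;
        for my $pos ( 0 .. $windows - 1 ) {
            my @window = split //, substr( $fragment, $pos, $width );
            $score += $matrix[$_]{ $window[$_] } for 0 .. $width - 1;
        }
        $score -= $expected * $windows;    # subtract the random-sequence expectation
        return $score > 0 ? $score : 0;    # treat negative scores as zero
    }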

For random sequences this should give fragment scores centered on zero. I don't know how broad the distribution would be, so I don't know how strongly a single USS would shift the distribution. We would like it to move the score out beyond almost all of the random fragments.

One other problem is what to do with negative scores. For random sequences these should be as common as positive scores, but if we simply treat them as zero scores then the effective average score will be half of the mean of the positive scores. Maybe we should subtract a number bigger than the expected average score, and treat all negative scores as zero.

Modeling USS evolution

Lately we've been working on our new computer simulation model of USS evolution.

I asked my favourite mathematical colleague about how to model mutation so that (1) transitions occurred at a different frequency than transversions, and (2) a desired base composition was maintained. She whipped out Wen-Hsiung Li's Molecular Evolution book and opened it to a page of equations describing various mathematical models of mutation that can be used to infer the evolutionary history of DNA sequences. These models include up to five different parameters (5 different Greek letters!), depending on how many independent factors are included in the model.

The equations accomplish the reverse of what we want, but she confidently offered to solve the most appropriate ones for us in a way that would let us use the desired final base composition of our sequence to calculate the values of the mutation parameters for our program, and within 30 minutes she'd emailed the solutions to me. I think she used a program called Mathematica rather than mysterious mathematical superpowers to get the solutions. I still have to work through what she's sent, to see how our program will best use it.

p.s. The cells are once again growing normally in new batches of our usual medium. Unfortunately we still don't know the cause(s) of our recent problems.

Progress of a sort

One of the post-docs tested the media bottles for toxic detergent residues by making one big batch of our standard BHI medium and pouring it into 20 different bottles of all the types we use for media. E. coli grew in the medium from all these bottles.

I don't know whether she tested growth of H. influenzae over the weekend. If they all grew too then we have the unfortunate resolution that the problem has gone away without our ever discovering what causes it. In any case I think we're going to have our lab assistant thoroughly rinse all our bottles.

Modeling mutation with transition bias

As part of our new-improved Perl model of uptake sequence evolution, we had been intending to incorporate the usual transition:transversion bias into the part of the model that simulates mutation of the evolving sequence. But it's turning out to be HARD.

In the previous version, the mutation step incorporated a bias of the same strength as the user-specified base composition. For the H. influenzae genome (38% G+C), the routine we were using caused the mutagenesis to produce As and Ts each 31% of the time and to produce Gs and Cs each 19% of the time. This was perfectly satisfactory (or would have been if not for other components of the mutagenesis that were unnecessarily cumbersome).

At a recent planning session we thought we had figured out a way to also have transition mutations (A<->G and C<->T) occur twice as often as transversion mutations, while maintaining the specified base composition. But, when we implemented these steps into a sub-program, the base composition (initially 38%G+C) increased with each cycle of mutagenesis, leveling out at about 45% G+C. So we went back to the drawing board (the big whiteboards in the hall) and tried to understand what was wrong.

Several things were wrong. One was an error in the computer code. We fixed that, but there was another error in the implementation, so we fixed that too. Then it became clear that there was also a fundamental error in our planned steps. We had thought that we simply needed to specify the ratio of A+T to G+C and the transition bias (2-fold). But with a transition bias the number of each type of mutation depends not only on the properties of the mutagenesis algorithm but also on the proportions of the bases in the sequence. For example, mutagenesis of a genome with lots of As will produce more mutations to Gs than will the same mutagenesis steps acting on a genome with few As.
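
To see this concretely, a quick numeric check is easier than the algebra (the scheme and numbers here are made up for illustration, not our real implementation). Iterating a mutation matrix that has a 2-fold transition bias but no compositional bias on a genome starting at 38% G+C, the base composition drifts to 50% G+C regardless of where it starts - the equilibrium is a property of the mutagenesis scheme, not of the starting sequence.

    use strict;
    use warnings;

    # Illustrative scheme only: transitions (A<->G, T<->C) are twice as
    # likely as each transversion, with no base-composition bias.
    my %mutation = (
        A => { G => 0.50, C => 0.25, T => 0.25 },
        T => { C => 0.50, G => 0.25, A => 0.25 },
        G => { A => 0.50, T => 0.25, C => 0.25 },
        C => { T => 0.50, A => 0.25, G => 0.25 },
    );

    # Start at the H. influenzae base composition (38% G+C) and mutate
    # 1% of bases per cycle, tracking the expected composition.
    my %freq = ( A => 0.31, T => 0.31, G => 0.19, C => 0.19 );
    my $rate = 0.01;

    for my $cycle ( 1 .. 500 ) {
        my %next = ( A => 0, T => 0, G => 0, C => 0 );
        for my $from ( keys %freq ) {
            $next{$from} += $freq{$from} * ( 1 - $rate );
            for my $to ( keys %{ $mutation{$from} } ) {
                $next{$to} += $freq{$from} * $rate * $mutation{$from}{$to};
            }
        }
        %freq = %next;
    }
    printf "G+C after 500 cycles: %.3f\n", $freq{G} + $freq{C};    # ~0.500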

So I spent much of this afternoon doing algebra, trying to come up with a general relationship between the base composition bias of the mutagenesis steps and the equilibrium base composition it will produce. Unfortunately I only do algebra about once every 5 years and, although I remember the very basic rules I learned in grade 9, I have none of the skills and creativity that a regular user would have. Or maybe the problem I was trying to solve is just intrinsically messy. In any case, I covered two whiteboards with Xs and Fs and parentheses but the equations never simplified. I could call on a mathematician friend for help, or we could simply decide that incorporating a transition:transversion bias is an unnecessary refinement that actually won't make any difference to the outcome of our model.

For now we're going to take the latter approach, which will allow our programming assistant to create some working code. If we later figure out how to incorporate the transition:transversion bias, we can probably just add the necessary lines to the mutagenesis section of the program.

Cells are (sometimes) growing but understanding isn't

In the new batches of media I made, both H. influenzae and E. coli grew in A (our Difco BHI) and in C (borrowed BHI) and D (old HI). They didn't grow at all in B (MBP BHI). Reinoculation of media from these bottles by one of the post-docs and by me (using independent H. influenzae stocks) gave good growth in A, C and D but not B.

But there was surprising contamination in both our cultures. Surprising because previous uses of the identical stocks had no contamination, the no-cells controls showed no contamination in two control media (indicating that my hemin and NAD supplements were uncontaminated), and an agar-plate streak of my H. influenzae inoculum showed no evidence of contamination. Even the contaminants couldn't grow in medium B, but this would make better sense if another post-doc hadn't previously found that H. influenzae grows just fine in MBP BHI.

To test whether the identity of the person doing the preparation matters, another post-doc made media that should have been identical to my new A, B, C and D. But H. influenzae grew in all of these batches, even the MBP BHI. We can now (almost) conclude that growth results are reproducible for any specific bottle of medium we prepare, but not for different preparations made from the same powder. (I write 'almost' because of a couple of pesky exceptions observed last week.)

I'm wondering if there's something wrong with our clean bottles....

Cells are growing but we don't know why

The dilution test showed that adding as little as one part sBHI to 4 parts sLB was enough to prevent H. influenzae cells from growing. The plating test showed that H. influenzae cells would form colonies on the surface of an sBHI agar plate, but would not form colonies if they were embedded in a layer of sBHI top agar. None of these results makes any sense.

So we decided to go back and retest all the variables, on the assumption that at least one of our previous tests had given a misleading result. The first thing to retest was the effect of different sources of BHI powder, because this is the most likely culprit. We have (A) the Difco/Bacto stock we've been using (a nearly-empty 2.5kg tub), (B) a newly opened bottle of Marine BioProducts brand, and (C) a Difco BHI bottle from another lab (dated 2003). We also have (D) an ancient bottle of Difco heart infusion (HI) powder that I bought when I started the lab in 1990 but never opened.

I made 200 ml of each, weighing the powder directly into the bottle (no weighboat) and filling the bottle directly from the water carboy (no measuring cylinder). I inoculated 5 ml of each with 200 ul of H. influenzae cells from the LB overnight. As controls I used the reproducibly toxic BHI stock E I had been previously using, and the LB that the cells reproducibly grew in.

After a few hours it was clear that the cells were growing slowly in medium A, faster in media C, D and LB, and not at all in B and E. The growth in A is unexpected, as A should be identical to E. We also inoculated E. coli into these media - we'll see the growth results this morning.

Not the size of the inoculum, and not the water...

I tested the effect of inoculum size by trying to grow serial dilutions of both E. coli and H. influenzae in a variety of media, with 10-fold dilutions ranging from 10^-2 to 10^-9 of a turbid resuspension. All of the E. coli dilutions grew fine in LB and none grew in BHI or sBHI (testing the effect of hemin and NAD on E. coli in BHI). And all of the H. influenzae dilutions grew fine in sLB but not at all in sBHI. So the growth failure doesn't depend on the number of cells inoculated.

So today we made up BHI using water obtained from various other labs nearby (labs with fancy water-purification systems) and from tap water. None supported growth of H. influenzae, and they only weakly supported growth of E. coli. Because I had realized that the LB that always supported growth had been made up about 6 weeks ago, we also made fresh LB with our current stock of distilled water. It supported growth just as well as the old LB. So the problem isn't something in our water.

My tests of the different agar plates confirmed that media that don't support growth as broth do support growth just fine when solidified with agar. And plating of the previous day's E. coli 'didn't grow' cultures showed that the broth contained about 10^6 cfu/ml, not very different from the number of cells it was inoculated with.

One of the post-docs has set up a test of whether the growth state of the inoculated cells matters. She took H. influenzae cells left over from last night's test (the same ones that failed to grow in the various water tests but did grow in sLB), and cells from the newly growing sLB culture.

I've now set up overnight tests of whether the agar-solidified medium supports H. influenzae growth if the cells are embedded in a layer of top agar (more dilute agar) on top of the sBHI, or are embedded in 10ml of sBHI agar. I tested both top agar made with medium E and made with half medium E and half LB.

And to test whether added sLB restores growth to H. influenzae cells in sBHI, or whether added sBHI poisons cells in sLB, I've inoculated cells into various mixtures (5:0, 4:1, 3:2, 2:3, 1:4, 0:5).

The situation is getting scary, as we're fast running out of variables to test.

Curiouser and curiouser

So yesterday I did almost exactly the same test of medium A I had done the day before. The H. influenzae results were the same - cells grew in sLB but not in sBHI. But this time the E. coli didn't grow in sBHI either (it did grow in sLB), whereas yesterday it had grown well in both media. The growth/no growth distinctions were made by both microscopic examinations and turbidity checks. This confirms that the post-docs' earlier result of E. coli sometimes not growing in sBHI was not due to something odd they had done on that day. So whatever the problem is, it can affect both E. coli and H. influenzae.

The only differences I'm aware of are that I inoculated the cultures with fewer cells (both E. coli and H. influenzae), and, it being a different day, I used cells taken from newly grown colonies on agar plates. I had the same results with the other batches of media the post-docs had tested (B, C and D), as did the post-docs. And I had the same result with medium batch E, freshly prepared by one of the post-docs.

I also tested survival of the cells by plating the inoculum (H. influenzae only) before it was added to the test media, and the cultures at different times (H. influenzae after 30 minutes and 3 hours, and E. coli after 3 hours). I used both old plates that were known to work fine and new plates made with batch E BHI, so this will also tell me whether the problem occurs only in liquid medium.

Today I'm going to test whether the size of the inoculum makes a difference. This will require lots of plates, so I hope the test batch E plates worked fine. It would be hard to do this test properly using just the microscope and turbidity checks. Inoculum size can make a difference - I remember once when we had to use a borrowed shaker for our culture tubes (our roller wheel being broken) we found that a too-small inoculum of H. influenzae gave no growth in sBHI. It was as if the medium contained a limited amount of something toxic that was removed from the medium when it was absorbed by the cells, so that when only a few cells were present each received a toxic dose of this hypothetical substance, whereas when more cells were present each received a proportionally smaller and thus non-toxic dose. We didn't bother tracking down this mystery because we stopped using the borrowed shaker as soon as our roller wheel was fixed.

The plating I've done should tell me whether the BHI kills the cells or just doesn't allow them to grow. I may also do the 50:50 (sLB:sBHI) mixing test, which I didn't have time to do yesterday - maybe even testing different ratios of the two media.

I may also seek out another source of pure water (from another lab), just in case something has gone weird with either our still or the carboys we store its output in.

BHI problems

Yesterday one of the post-docs took me through the various tests they'd done trying to find out what's wrong with our brain-heart infusion medium (BHI).

First some background: The standard procedure is to supplement BHI with hemin and NAD from stock supplies that are already made up to standard concentrations (giving sBHI), to put 5ml of sBHI into a 25ml glass culture tube, to add H. influenzae cells either from a frozen stock or a previous culture (usually a 1/100 dilution), and to incubate the culture overnight on a roller wheel in the 37°C incubator. H. influenzae will normally also grow in the E. coli medium LB, provided it has been supplemented with hemin and NAD (sLB). E. coli doesn't need hemin or NAD and will grow well in both LB and BHI, better in BHI because it's a richer (and more expensive) medium. Growth can be checked various ways. The easiest is simply looking at the turbidity of the culture, but we can also measure the turbidity in a spectrophotometer, dilute the culture and plate the cells on agar to count the colonies, or look at the cells under a microscope. The last is not very useful for H. influenzae, mainly because the cells are so tiny but partly because our very expensive microscope is badly out of alignment. (We are arranging for a long-overdue visit from the serviceman.)

The first tests, done with the original problematic batch of BHI, found that inoculation of sBHI culture tubes with H. influenzae produced no growth but inoculation of E. coli directly into the bottles of BHI produced abundant growth.

The lab assistant then made four bottles of new BHI using combinations of the two stocks of BHI powder (old Difco brand and new MBP brand) and two sources of distilled water (secondary carboy and source carboy); this gave four bottles, labeled A, B, C, and D. One of the post-docs supervised this to make sure she wasn't making any errors. The post-docs then inoculated tubes of these media with H. influenzae from a freezer stock (amounts not carefully controlled), and with E. coli from a fresh culture. All of the H. influenzae tubes grew but none of the E. coli ones did! They then inoculated H. influenzae from the tubes that had grown into fresh tubes of media A-D, and this time the cells didn't grow!

I wanted to find out what the medium was doing to the cells by looking at them under the microscope. So yesterday I inoculated tubes of sBHI (from bottle A) and sLB with measured amounts of H. influenzae and of E. coli. Both inocula were prepared by resuspending cells taken from fresh colonies on plates into a small amount of LB, and then adding 50 microliters to 2.5ml of medium. I used disposable plastic culture tubes rather than our standard glass culture tubes because we wanted to exclude the possibility that dirty tubes were causing the problem.

I looked at the cells immediately after inoculation and after 30', 60' and 120' in the incubator. The E. coli cells in both sLB and sBHI did what healthy cells do - they gradually became longer and divided so that, after 120' the culture was very cloudy and each microscope 'field of view' contained 5-10 times more cells than it had at the start. The H. influenzae cells in sLB also grew. Because they're so little they looked like tiny specks and threads, but the number and proportion of threads got higher, indicating that the cells were elongating and dividing into new cells. But the cells in sBHI just sat there, continuing to look like a mixture of specks and short threads. The post-doc measured the culture turbidities in the spectrophotometer, confirming that E. coli was at high density in both media (higher density in the sBHI) and that H. influenzae was at higher density in sLB than in sBHI.

What did I learn? First, the problem is reproducible. Second, it isn't dirty culture tubes. Third, the problem manifests itself quite quickly. It isn't that the cells grow initially and then run out of some key nutrient - rather they don't grow at all. Fourth, the problem isn't the hemin or NAD. These results reinforce my notion that we should focus on the inability of H. influenzae to grow in BHI medium that does allow growth of E. coli, and not worry for now about the time when H. influenzae did grow and E. coli didn't.

So what will I do today? The post-doc made a big batch of BHI and BHI agar yesterday for me to do tests with, and streaked out H. influenzae and E. coli on agar plates.
  • I think I'll first repeat yesterday's experiment with media A-D and the new batch (call it E), this time measuring the turbidities of all cultures at the start as well as after 2 hours.
  • I'll also use oil-immersion to look at the H. influenzae cells under the microscope - this is a bit more hassle but gives higher resolution.
  • I'll also dilute and plate the H. influenzae cells that are in the medium they wouldn't grow in yesterday. By doing this I can find out whether the cells die (and how quickly) or just fail to grow. Finding that the cells die would suggest that the medium contains something toxic, whereas finding that they just fail to grow would suggest that the medium is lacking an important nutrient.
  • I'll also try mixing the sBHI 50:50 with LB. This might also show whether there's something missing from the BHI (if H. influenzae grows in the mixture) or something toxic in the BHI (if H. influenzae doesn't grow in the mixture).
  • The batch E BHI agar was made with the same medium as batch E broth, so I'll pour plates of this and see if cells grow into colonies overnight. So far the problem has been found only with cells in liquid culture, but the liquid medium and agar have been from batches made on different days. (I can do my other plating on another batch of plates that one of the post-docs poured on Monday - we know cells do grow into colonies on these plates.)
This is all very tiresome but it's part of normal science, reinforcing my adage that "Most scientists spend most of their time trying to figure out why their experiments won't work."

Why has our culture medium suddenly become toxic?!?

This week we're wrestling with a practical mystery - for unknown reasons the 'brain heart infusion' culture medium we usually use for H. influenzae no longer supports growth.

The problem first surfaced on Monday, when one of the post-docs found that cultures she had inoculated the night before had no growth. Suspecting that she'd made a mistake, she carefully reinoculated them, only to find no growth again on Tuesday morning. On Wednesday another post-doc found that her cultures hadn't grown either.

The medium had been prepared by our new lab assistant, so naturally we wondered if she'd made a mistake. This seemed unlikely because (1) she's very careful, (2) making it is very simple (just dissolve 37 g of BHI powder in 1 liter of water, pour it into bottles and autoclave), and (3) the medium looked pretty normal (clear, golden brown) and we had a hard time imagining what could entirely prevent bacteria from growing in it. When she came in she confirmed that the medium had been made from the nearly-empty container of Difco BHI powder we've been using for months now, not the other-brand bottle we were going to try out someday soon.

A measuring error might have given less growth, but not none. Making the medium up with the 'wrong' water shouldn't have mattered - I'm pretty sure cells would grow fine even if the medium was made with plain tap water instead of the distilled water we meticulously use. The usual supplements of hemin and NAD were from stock tubes that had worked fine in previous cultures.

Simple tests didn't find a problem - the BHI smelled normal, and had a reasonable pH of about 8. The post-docs tried inoculating E. coli into it (less fussy than H. influenzae), but these cells didn't grow either. As well as inoculating the E. coli into 5ml of BHI in our usual culture tubes they also cleverly tried inoculating it directly into the stock bottle, and surprisingly these cells grew fine. So they suspected that maybe the problem was with the culture tubes rather than the BHI. In a parallel test they had made a fresh batch of BHI, and tested this in culture tubes with H. influenzae and E. coli. This time the E. coli again didn't grow, but the H. influenzae did, again directing suspicion to the culture tubes.

They did some more tests yesterday, so maybe today we'll figure out what the problem is...

The sxy manuscript is done!

The long gestation period of our sxy manuscript is finally over. It's been accepted, we sent in the signed open access forms, and yesterday we sent our corrections to the proofs back to the copy editors. So it should appear soon on the Advance Access page of Nucleic Acids Research.

The first results reported in this paper (the existence of sxy mutations that might cause hypercompetence by disrupting base pairing in sxy mRNA) were generated about 12 years ago. If we'd properly gotten our act together we could have published a less thorough analysis a long time ago. But we didn't, for various reasons, and I'm happy the results are finally appearing in such a fine paper.