Not your typical science blog, but an 'open science' research blog. Watch me fumbling my way towards understanding how and why bacteria take up DNA, and getting distracted by other cool questions.
What people come here to read
I've just discovered that last year's post about the relative safety of the DNA stain ethidium bromide is now the #3 hit for Google searches on "ethidium bromide". Word must be getting around.
Data from the quorum sensing paper supports diffusion sensing
With the colleague who brought the flawed quorum sensing paper to my attention, I'm writing a 'Commentary' letter to Nature pointing out the paper's glaring flaws. Crafting this letter has been scientifically instructive in two ways.
First, the colleague is a biomathematician who uses bacterial cultures as models of evolutionary and ecological processes. He wasn't familiar with my diffusion sensing hypothesis, but once he read the paper he realized that work he'd been doing about communities containing exploiters (the Snowdrift model) nicely applied to this system. So he's been educating me about his model.
Second, I realized that the flawed paper provides excellent data supporting some assumptions of the diffusion sensing hypothesis. Although the paper's interpretation and conclusions are flawed, the experiments themselves look to have been carefully done and to have produced solid data. In particular, they determined the final cell densities of pure cultures of bacteria growing in rich medium, where protease production is not beneficial.
The diffusion sensing hypothesis assumes (sensibly) that synthesis and secretion of such effector molecules as proteases, antibiotics and siderophores is expensive, but that production and secretion of the autoinducers that regulate the effectors is cheap.
In one of the paper's experiments, cultures of protease-producing cells grew to about 35% lower density than cultures of non-producers. This was very nicely controlled by including cells that did not produce the autoinducer signal that activates protease production. Without activator these cells grew as well as the cells that could not produce protease because they couldn't recognize the autoinducer signal, but when activator was provided externally they grew as poorly as wildtype cells.
This tells us two things. First, production of the protease is indeed quite costly - a 35% difference in final cell density means that natural selection will strongly favour cells that don't secrete proteases when they're not needed (confirmed by the paper's competition experiments). Second and more important, secretion of the autoinducer is very cheap. The final cell densities of the cells that didn't produce autoinducer and of the cells that produced it but couldn't respond to it were identical (within the resolution of the figure), so the cost of production must be very low. The cost is unlikely to be zero - this could be tested by competition experiments between the two strains.
I've never posted about quorum sensing?
About 5 years ago I wrote an opinion piece suggesting that the widespread phenomenon of bacterial quorum sensing had been misinterpreted.
Everyone had been assuming that bacteria secrete small autoinducer molecules and detect the concentration of these because this lets them estimate population density and thus predict the utility of investing in cooperative behaviour that benefits the whole population. Such behaviour is costly to individuals, and in populations containing such cooperators, individuals will do better by cheating (sitting back and letting others do the cooperative work). The difficulty of explaining how cooperation could evolve in the presence of cheaters is a serious problem for this 'quorum sensing' hypothesis. But most microbiologists have at best a very superficial understanding of evolution, and the appealing assumption that bacteria use autoinducers to talk to each other and act cooperatively spread like wildfire.
My radical suggestion (Redfield 2002, Trends in Microbiology 10: 365-370) was that bacteria instead secrete and detect autoinducers as a method of determining whether the benefits of secreting more expensive effector molecules will be limited by diffusion and mixing. For example, it would be a waste of resources to secrete degradative enzymes such as proteases if they are going to immediately wash away. This 'diffusion sensing' proposal was welcomed by at least some evolutionary biologists but resolutely ignored by almost all microbiologists working on quorum sensing, I suspect because it removed the glamour from their research.
A paper on the evolution of quorum sensing has just appeared in Nature (Diggle et al. 2007 Nature 450:411-414). The authors evaluated the above-mentioned cheating problem using laboratory cultures, and then showed that cheaters do not prosper if they are prevented from associating with cooperators. This is hardly surprising. Here's what they did:
They used the bacterium Pseudomonas aeruginosa, which uses autoinducer secretion to induce production of a protease (among other things). They grew these bacteria in a culture medium whose major nutrient was a protein called BSA. Bacterial cells can't take up intact protein, but wildtype P. aeruginosa can grow well on the amino acids released when the protein is degraded by the protease it secretes.
1. They showed that mutant cells unable to produce the protease grew poorly in BSA medium, reaching only about 1/3 the cell density of wildtype cells after 24 hr. These mutants had normal protease genes but either could not produce or could not detect the autoinducer signal, so they could not turn this gene on. Their ability to grow at all in this medium was because they still could produce other proteases.
2. They showed that the mutants could grow fine if either autoinducer or protease were added to the medium. This confirmed that their poor growth was because of their regulatory defects, not some other factor.
3. They showed that the mutants grew fine on rich medium containing lots of amino acids where the protease was not needed. In fact the mutants grew better than the wildtype cells, which the authors logically interpret as being because they did not waste resources producing a protease they had no use for. They confirmed this by adding autoinducer to the cells that couldn't make their own - as predicted, this caused them to now grow poorly, presumably because they were now producing the expensive protease.
4. They then showed that the mutants acted as 'cheaters' when they were mixed with wildtype cells and grown in the BSA medium. Cultures that began with only 1-3% mutants had >40% mutants after 48 hr growth. The mutants outgrew the wildtype cells because they didn't expend resources making the protease. They got all the amino acids they needed because the wildtype cells around them produced protease. They thus prospered at the expense of the wildtype 'cooperators'. This is not at all surprising given the also-unsurprising results in points 1-3 above. The mutants never took over the cultures, because their advantage depended on the cultures also containing lots of wildtype cells.
The authors then said:
"Our results show that quorum sensing is a social trait, susceptible to exploitation and invasion by cheats. Given this, how is quorum sensing maintained in natural populations?"
This is rather sneaky. The authors have just nicely shown that, in mixed laboratory cultures optimized to make growth dependent on protease secretion, cells that use protease secreted by other cells are favoured. So it's probably legitimate to describe protease production under these conditions as a social trait. But they jump over the question of whether natural populations grow under comparable conditions. Instead they just allow us to assume this, and go on to propose the explanation they want to test:
"The most likely explanation is kin selection - if neighbouring cells are close relatives they will have a shared interest in communicating honestly and cooperating."
What? No mention of the by-far more likely explanation that autoinducers exist mainly for a cell-autonomous function (diffusion sensing) that is not subject to cheating? This is the first big problem with this paper.
The other big problem is that the experiment that they claim tests kin selection is really just a repeat of the growth experiments they did in point 1 above. Here's a diagram of the experiment as it is described in their Methods section.
Treatment A (on the left) is a 6-cycle repeat of the mixed culture experiment described in point 4 above. In each step 12 tubes of mixed culture (only 6 are shown) were pooled together after 48 hr growth. This mixture was then inoculated into fresh tubes for another 48 hr growth. So the mutant cells (potential cheaters) were always growing with wildtype cells that produced the protease the mutants needed. It's not surprising that, at the end of 6 cycles of this mixed growth the culture still contained about 35% mutants.
Treatment B is claimed to provide conditions that allow kin selection. But after the first cycle it's really just a series of pure-wildtype or pure-mutant cultures like those described in point 1 above. This is because, after the overnight cultures were pooled, the cells were grown into single colonies on an agar plate, and a different colony was used to inoculate each new tube for overnight growth. So each of these tubes contained a pure clone of either a mutant or wildtype cell. As the cultures in point 1 showed, the mutant cells grow poorly when they have no wildtype cells to provide protease. It's not surprising that, at the end of 6 cycles, all of the 12 cultures are wildtype. I did the calculations and this is exactly the result predicted by the differences in growth seen in point 1.
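The calculation behind that last sentence isn't shown in the post, but a minimal sketch of one way to do it follows. It rests on assumptions that are not from the paper: that the pool after the first (mixed) cycle is about 40% mutant (borrowed from point 4), that pure mutant tubes reach about 1/3 the density of pure wildtype tubes (borrowed from the 24 hr figure in point 1), and that colonies are picked in proportion to their frequency in the pool. The function names are hypothetical.

```python
def next_mutant_fraction(f, relative_yield):
    """Mutant fraction of the pool after one cycle of pure-clone growth.

    Each tube is started from a single colony, so a fraction f of tubes are
    pure mutant and 1 - f are pure wildtype. Mutant tubes reach only
    `relative_yield` times the wildtype density before the tubes are pooled.
    """
    mutant_cells = f * relative_yield
    wildtype_cells = 1.0 - f
    return mutant_cells / (mutant_cells + wildtype_cells)


def treatment_b(start_fraction, relative_yield, cycles):
    """Return the pooled mutant fraction before and after each pure-clone cycle."""
    fractions = [start_fraction]
    for _ in range(cycles):
        fractions.append(next_mutant_fraction(fractions[-1], relative_yield))
    return fractions


START = 0.40     # ASSUMED mutant fraction after the first, mixed cycle (point 4)
YIELD = 1 / 3    # ASSUMED mutant yield relative to wildtype in pure culture (point 1)
CYCLES = 5       # the remaining cycles of the 6-cycle experiment

history = treatment_b(START, YIELD, CYCLES)
for cycle, f in enumerate(history, start=1):
    print(f"after cycle {cycle}: mutant fraction {f:.4f}, "
          f"expected mutant tubes out of 12: {12 * f:.2f}")
```

Under these assumptions the odds of picking a mutant colony shrink threefold every cycle, so the expected number of mutant tubes out of 12 falls far below one by the end. That is consistent with all 12 final cultures being wildtype, with no need to invoke kin selection.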
The experiments are competently done and the data looks very solid. But to have gotten this into Nature the authors must both be masters of spin and have had very inept reviewers.
So why am I blogging about this? Partly because the authors are pushing an unjustified conclusion, and partly because I'm very annoyed that they completely ignored the point of my 2002 paper, even though the senior author and I discussed it in some detail when he visited here last year (he gave no indication then that he thought I was wrong). Worse, the authors* do cite this paper, but for something it doesn't actually contain (evidence that cheaters can invade quorum-sensing populations). I suspect that a not-totally-inept reviewer told them that they should cite me, and they avoided having to discuss the diffusion-sensing explanation by citing me for something I didn't do.
*One of the authors has emailed me expressing dismay over the harsher term I originally used, so I've changed it.
Shifting our perspective
Oh no, has it been a whole week since I last posted?
Among other things I've been working with one of the post-docs on her manuscript about the amount of variation in competence and transformability between different isolates of Haemophilus influenzae. Today we progressed to thinking about what we should say in the Discussion section, specifically about how selection on competence genes might have changed since the common ancestor of these strains.
(I'll use 'strains' interchangeably with 'isolates' in this post. In so doing I'm implicitly (here explicitly) assuming that the properties of the human-dwelling H. influenzae cell that gave rise to the original lab colony have not been changed by whatever laboratory propagation its descendants might have experienced.)
To discuss this variation we have to change how we've been thinking about variation. The data the post-doc has generated tell us about the ability of 34 present-day strains to take up DNA and recombine it into their chromosomes. To discuss the data's evolutionary implications we need to integrate it into the (unknown) history of these strains and of the species.
We don't know anything directly about the common ancestor of these strains, or of all the bacteria we call H. influenzae. But maybe we can start by making some inferences from a large published survey of the genetic variation in H. influenzae strains, and from the published genome sequences of some strains.
The large survey was an 'MLST' study, in which the same 7 genes were sequenced in each of more than 700 strains (Meats et al. 2003). I don't remember whether the authors were able to draw any specific conclusions about evolutionary history, but if they did we should certainly consider whether they can be applied to our analysis.
About 12 H. influenzae genomes have been sequenced (and the sequences are 'available'), but only a few of them have been analyzed in any detail. Much of the sequencing work is being done in the context of an explicit evolutionary hypothesis - that H. influenzae and other bacterial pathogens are best described as having a 'distributed genome'. This is Garth Ehrlich's idea; here's how one of his papers explains it:
The distributed genome hypothesis (DGH) states that pathogenic bacteria possess a supragenome that is much larger than the genome of any single bacterium, and that these pathogens utilize genetic recombination and a large, non-core set of genes as a means of diversity generation.
Well, that's certainly very relevant to our analysis of the distribution of transformability! Now we just need to clarify, first for ourselves and then for potential readers of our manuscript, how having a diversity of competence and transformation phenotypes fits into this.
The Redfield Factor
The Redfield Factor: The number of kilobase pairs in a gram of DNA: 10^18.
The Inverse Redfield Factor: The weight in grams of 1000 base pairs of DNA: 10^-18.
(Inspired by The World's Fair)
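For anyone who wants to check the round numbers, here is a quick sketch of the arithmetic. It assumes an average mass of about 650 g/mol per base pair of double-stranded DNA, a common rule of thumb that is not stated in the post; the function names are made up for illustration.

```python
AVOGADRO = 6.022e23           # molecules per mole
BP_MASS_G_PER_MOL = 650.0     # ASSUMED average mass of one base pair, g/mol


def grams_per_kb(bp_mass=BP_MASS_G_PER_MOL):
    """Mass in grams of 1000 base pairs of double-stranded DNA."""
    return 1000 * bp_mass / AVOGADRO


def kb_per_gram(bp_mass=BP_MASS_G_PER_MOL):
    """Number of kilobase pairs in one gram of double-stranded DNA."""
    return 1.0 / grams_per_kb(bp_mass)


redfield = kb_per_gram()
inverse_redfield = grams_per_kb()
print(f"Redfield Factor:         {redfield:.2e} kb per gram")
print(f"Inverse Redfield Factor: {inverse_redfield:.2e} g per kb")
```

With that assumed base-pair mass both values land within about 10% of the round figures 10^18 and 10^-18.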
Where are they now? (competence proteins in Bacillus)
Dave Dubnau's group has been doing excellent work on the molecular biology of competence and DNA uptake in Bacillus subtilis. Their latest paper (Naomi Kramer, Jeanette Hahn and David Dubnau, 2007. Mol. Micro. 65:454-464) looks at the subcellular locations of induced proteins in competent cells.
They find that many proteins co-localize at the poles of the cells, and that DNA is taken up preferentially at the poles. Absence of one protein due to mutation causes perturbations in other proteins, confirming that they interact in some way. They interpret these interactions as forming a complex that exists to promote recombination between incoming DNA and the cell's chromosome.
However there's another interpretation. Another recent paper on B. subtilis competence showed time-lapse movies of cells developing and then losing competence (An excitable gene regulatory circuit induces transient cellular differentiation. Süel et al. Nature 440, 545-550; see supplementary movies 1 & 2). These show that competent cells form filaments - such suppression of cell division is typical of cells whose DNA replication has been arrested. Only on seeing these movies did I realize that B. subtilis cells arrest DNA replication when they become competent, although this has been known for a long time (papers published ca. 1970).
How does this replication arrest work (what's cause and what's effect)? It seems to be caused by accumulation of the competence transcription factor ComK, and released when MecA causes ComK to be degraded, but I don't know how ComK arrests replication. Possibilities come from looking at the lists of genes that are induced by ComK (Berka et al. 2002 Mol. Micro 43:1331-1345). There are a lot, and this paper (also from Dubnau's group) concludes that competence is only one aspect of a complex stress response. Here's the last paragraph of their Introduction:
Unexpectedly, we have found that the expression of at least 165 genes is upregulated in the presence of ComK. These include open reading frames (ORFs) that were previously shown to be ComK dependent as well as many for which there was no prior evidence of ComK control. In several cases, validation of the microarray data was achieved through the use of promoter fusions. This profound alteration in the expression programme involves many genes that appear to have no role in transformation. We propose that competence as usually defined is but one feature of a differentiated, growth-arrested state, which we propose to call the K-state.
And here are the last paragraphs from their Discussion:
It is certain that many (probably most) of the newly identified ComK-dependent genes are not required for competence, originally defined as receptivity to transformation (Lerman and Tolmach, 1957), nor for the recombination and recovery steps that follow DNA uptake. We have demonstrated that this is the case for pta and oxdC/yvrL, and it is certainly true for many of the 29 intermediary metabolism genes, the six sporulation genes and many of the newly identified ComK-dependent transcriptional regulators. It is therefore no longer appropriate to refer to the ComK-determined physiological state as 'competence', as more is involved than transformability. We propose to refer to this instead as the 'K-state', a neutral term with no functional connotation.
Arresting DNA replication is a fairly desperate measure, and I'd like to know what makes the K-state worth the risk. Despite the statements above about competence being only one of the K-state's functions, Dubnau's group seems to have slipped back into assuming it's the only function. Here's a paragraph from the Discussion of a more recent paper on how the K-state is triggered and maintained or lost (Maamar & Dubnau 2005, 56:615-624). It assumes that competence is the function of the K-state and that transformation is the function of competence. It then constructs an evolutionary just-so story to explain why only small fractions of the cells in a lab culture enter this state:
The cell shape and cell division genes are of particular interest, as the K-state is associated with inhibition of cell elongation and division (Hahn et al., 1995; Haijema et al., 2001). The competence gene comGA plays a role in the inhibition of these two processes. One ComK-dependent gene cluster (Fig. 2) includes the genes for Maf (an inhibitor of cell division that has also been implicated in DNA repair; Butler et al., 1993; Minasov et al., 2000), MreB, MreC, MreD (shape determining factors; Jones et al., 2001), MinC and D (inhibitors of cell division; Levin et al., 1992) and RadC, a probable DNA repair protein. In addition to this cluster, mbl, tuaF, tuaG, cwlH and cwlJ are activated. Mbl plays a role in cell shape determination (Henriques et al., 1998; Jones et al., 2001), TuaG and F are required for the synthesis of teichuronic acid (Soldo et al., 1999), and CwlH and J are cell wall hydrolases. It appears that the K-state is accompanied by a reprogramming of cell shape, cell division and cell wall synthesis genes.
A minority of the cells in a given population reach the K-state, and these cells are arrested in cell division and growth (Haijema et al., 2001). The reversal of this growth inhibition requires at least the degradation of ComK (Turgay et al., 1998). In this sense, the K-state appears to be in some respects a resting state and is associated with the induction of a number of genes (exoA, radC, recA, ssb, topA, maf and dinB) that are likely to be involved in DNA repair. The arrest of cell division and growth may be an advantage if the K-state has evolved in part to deal with DNA damage, or if DNA repair is required after transformation, as growth in the presence of DNA lesions may be detrimental. In E. coli, the SulA protein is induced as part of the SOS regulon and inhibits cell division (Bi and Lutkenhaus, 1993), presumably until DNA damage has been repaired. Several genes that are activated in the K-state might facilitate the assimilation of novel nutritional sources. These include malL, sucD, yoxD and ycgS and the putative transport genes pbuX, yckA, yckB, ycbN, ywfF, yvrO, yvrN, yvrM, yqiX, ywoG and yvrP. Several of these transport proteins, in particular ywfF and ywoG, might function instead as detoxifying efflux pumps. In this connection, it is worth mentioning some additional ComK-dependent genes. oxdC encodes an acid-induced oxalate dehydrogenase (Tanner and Bornemann, 2000), which has been suggested to play a role in pH homeostasis in response to acid stress. hxlA and hxlB encode enzymes of the ribulose monophosphate pathway (Yasueda et al., 1999), and hxlR encodes a positive activator of their expression. It was suggested that this pathway is involved in the detoxification of formaldehyde. ComK induces all three of these genes. Additional stress response genes that are apparently upregulated in response to ComK include groES and possibly yqxD. 
Finally, two genes are likely to be involved in the synthesis of antibiotics (sboA and cypC; Hosono and Suzuki, 1983; Matsunaga et al., 1999), which may serve to eliminate competitors.
In conclusion, we propose that the K-state is a global adaptation to stress, distinct from sporulation, which enables the cell to repair DNA damage, to acquire new fitness-enhancing genes by transformation, to use novel substrates (possibly including DNA; Redfield, 1993; Finkel and Kolter, 2001) and to detoxify environmental poisons. This view of the K-state suggests a reason for its expression in only a fraction of the cells in a given population. The K-state represents a specialized strategy for dealing with danger, but also carries with it inherent risks. Transient arrest of growth and cell division confer vulnerability to overgrowth by competing populations, and transformability opens the cell to invasion by foreign DNA. The genome may therefore activate alternative systems in subpopulations to deal with adversity, and the K-state may be one such system. This strategy maximizes the probability that the genome will survive when faced with changing environments, a valuable capability for a soil-dwelling organism.
It has been known for many years (Nester and Stocker, 1963; Hadden and Nester, 1968; Haseltine-Cahn and Fox, 1968) that competence in the domesticated laboratory strains of B. subtilis, is expressed in 10–20% of the cells in a given culture (Fig. 2B). In natural isolates of B. subtilis, the fraction of cells expressing competence is markedly lower than this, presumably because these strains have not been artificially selected for high transformability. In one such isolate, only about 1% of the cells express a comK–gfp fusion, but in these rare cells, expression is at a high level (J. Hahn, H. Maamar and D. Dubnau, unpubl.) This dramatic example of population heterogeneity may have evolved so that few cells in a clone will commit to a particular fitness-enhancing strategy. As the prolonged semidormancy that accompanies the K-state (Haijema et al., 2001) poses a potential challenge to survival, this strategy serves to minimize risks to the genotype. If, on the other hand, the few cells expressing the K-state happen to enjoy an advantage, the chances that the genotype will survive will be enhanced. Presumably the heterogeneity mechanism has evolved to maximize the benefit-to-risk ratio. There may be many examples of population heterogeneity selected by evolution in single celled organisms (see for instance Balaban et al., 2004), and an understanding of the mechanisms that regulate this heterogeneity would be of general interest.
rec-2 weirdness
The protein encoded by the H. influenzae rec-2 locus is needed to transport DNA across the cell's inner membrane, from the periplasm to the cytoplasm. Rec-2 homologs play similar roles in other gram-negative bacteria and in gram-positive bacteria, where they are needed for transport across the homologous cytoplasmic membrane. However, a H. influenzae rec-2 knockout also has unexpected pleiotropic effects; before considering the old evidence for these we need to consider the bizarre history of this mutation. The original isolation is reported in a 1971 paper by Ken Beattie and Jane Setlow (Nature New Biology 231:177-179).
Setlow had originally isolated the rec-1 mutant of H. influenzae. This strain has a mutation in the recA homolog; it takes up DNA normally but is unable to recombine it into its chromosome. Like other recA mutants it is also very sensitive to DNA damage. I think the mutant was isolated after nitrosoguanidine treatment of Rd cells and was identified by its sensitivity to UV irradiation; its strain name is DB117.
She wanted to find other mutants unable to transform. Because she had recently discovered that many H. influenzae cells died after taking up DNA from the related H. parainfluenzae, she decided to use this as a way to select for cells that couldn't take up DNA. So she and Ken Beattie repeatedly gave competent H. influenzae cells DNA from H. parainfluenzae, isolated the survivors, made them competent and gave them more of the same toxic DNA. (She later discovered that the 'toxicity' arises because heterologous DNA induces the SOS response to DNA damage, which induces a resident prophage that kills the cells.) However, even 20 repetitions of this treatment produced a population of cells with only a slight transformation defect.
So (I don't know why) they tried again, this time pretreating the cells by exposing them once to DNA from her rec-1 mutant, followed by 20 cycles of exposure to H. parainfluenzae DNA. Surprisingly (to me), this treatment gave populations with 1000-fold reductions in transformation. The pretreatment with rec-1 DNA was almost as effective as mutagenesis with nitrosoguanidine to 60% survival.
They then tested single colony isolates from these populations for DNA uptake and transformation. Almost all of them did not take up detectable amounts of DNA, but a few took up as much DNA as normal cells but did not produce any transformants. None of the 8 mutants isolated from the population treated with rec-1 mutant DNA had the DNA repair defects of the rec-1 mutant. (Note that these mutants are very likely to be multiple descendants of a single original mutant.)
Because of this they erroneously but only temporarily concluded that the rec-1 mutant's two defects (in transformational recombination and in DNA repair) were due to different mutations. They seem to have also invoked another mutation, mex, causing sensitivity to the DNA-damaging chemical MMS. They suggested that the new mutant (one of the 8) had this mex mutation as well as another mutation that prevented recombination, perhaps acquired from the H. parainfluenzae DNA it had been repeatedly exposed to. They initially called this mutant Rd(DB117)^rec but later simply called it rec-2.
They and others did a lot of work on the phenotype of this rec-2 mutant. They found that the mutant was a bit sensitive to MMS (attributed to the somewhat-hypothetical mex mutation). It took up DNA into a state that was resistant to externally added DNaseI and to the restriction enzymes known to be in the cytoplasm. This state was originally thought to be inside the vesicles then called 'transformasomes' (see this post about these) but we're now pretty sure it's just the periplasm. Making the mutant competent for DNA uptake across the outer membrane did not increase the ability of the cells to support phage recombination (see this post) as it did for wildtype cells. Competent cells of the rec-2 strain did not develop the single-strand DNA gaps detected in wildtype cells.
However, interpretations were always confounded by uncertainty about its genotype. Did it carry a mex mutation (whatever that might be)? Did it contain any other DNA from the rec-1 strain? Did it contain any segments of H. parainfluenzae DNA? Did it have any loss-of-function mutations?
In 1989 Dave McCarthy tried to sort this mess out (McCarthy Gene 75:135-143). He isolated a transformation-preventing miniTn10kan insertion into a H. influenzae gene, and showed that a plasmid carrying the wild-type version of this gene restored transformability to Setlow's rec-2 mutant. By probing Southern blots with the cloned gene he showed that Setlow's rec-2 mutant contains a large rearrangement (later identified as a ~80kb insertion) in this gene. I'll call his rec-2::miniTn10kan mutant rec-2*. This mutant had the same DNA uptake defect as Setlow's mutant.
With Doris Kupfer he then characterized the phenotype of the rec-2* mutant. Like Setlow's rec-2, it took up DNA but could not translocate it across the inner membrane. It was also just as defective in phage recombination, and examination of its DNA by electron microscopy showed that competence induction did not cause the increase in single-strand gaps or tails seen in wildtype cells.
The DNA translocation defect is consistent with the phenotypes of rec-2 homolog mutants in other bacteria. But the phage recombination and single-strand gap differences make no sense to me.
Do competent cells have weird DNA?
One of the postdocs suggested we work our way through the old H. influenzae competence literature, so we've been meeting more-or-less weekly to do that. We pick a time interval (e.g. 1975-79) and decide which papers look like they deserve serious attention.
Last time we considered two papers from Jane Setlow's lab, one reporting that DNA of competent cells contains single-stranded regions, and one analyzing single-stranded regions that appear in the DNA such cells take up. This reminded me of a more recent paper from David McCarthy (1987), and of some experiments I did. Here I'll post about the competent-cell DNA issue. Later I'll post about the strandedness of incoming DNA, and about the weird history and behaviour of rec-2 mutants.
The Setlow research was done in the mid-1970s. This was before agarose gels came into general use, and they analyzed the sizes of DNA fragments using sedimentation in sucrose density gradients (big fragments are rapidly pushed to the bottom while small fragments move only partway down the gradient). In the first paper they used gradients containing NaOH to separate the strands of the DNA, and pulse-chase labeling with 3H to identify newly synthesized segments. They found that newly synthesized DNA from competent cells contained many more short single-stranded segments than DNA from log-phase cells. They also used columns of BND-cellulose, which DNA with single-stranded regions should stick to. More competent-cell DNA stuck than log-phase cell DNA. They confirmed that this DNA was enriched for single-stranded regions by digesting it with the nuclease S1, which preferentially cuts single-stranded segments. And they used CsCl 'isopycnic' density gradients to confirm that the strands were indeed newly synthesized.
But there are good reasons why sucrose gradients were discarded once agarose gels became available. These experiments have very poor resolution and lots of artefacts, and it's hard for me to understand what they showed. The very existence of newly synthesized strands in competent-cell DNA may be an artefact...
The McCarthy paper used a different technique, electron microscopy, to directly compare the structures of DNA from log-phase and competent cells. They found DNA from competent cells to contain more single-stranded regions and single-stranded tails. They also used cross-linking of DNA to prevent branch migration. So their results confirmed Setlow's interpretation of the sucrose gradient results.
Although we might expect cells to develop some aberrant DNA structures after having been abruptly transferred from a rich replication-supporting medium to a starvation medium lacking DNA precursors, the presence of gaps and tails is a bit surprising. At that time I was a post-doc in Ham Smith's lab and had been doing a lot of work with pulsed-field agarose gels that nicely resolved very large fragments of chromosomal DNA. So I devised some experiments to look for evidence of these gaps and tails.
The first experiments used a nuclease from mung beans. This is like S1 nuclease in preferentially cutting single-stranded DNA, but is less likely to also cut double-stranded DNA. These experiments showed two things. First, DNA from competent cells is no more sensitive to mung bean nuclease than is DNA from log phase cells. Second, DNA from competent cells is much more sensitive than log-phase DNA to being heated in the presence of the buffer used for the nuclease digestions. This buffer contains zinc and is at pH 5; I had been heating the DNA preps before loading them into the gels because the DNA had been prepared from cells embedded in low-melting-point agarose to protect it from shearing. The DNAs were all fine if I had simply put slices of this DNA-in-agarose into the wells of the pulsed-field gels, but if I had instead melted the slices at 65°C and pipetted them into the wells the competent-cell DNA bands looked very faint. This effect was independent of any nuclease treatment.
My second experiments used a DNA polymerase to find out whether DNAs from log-phase and competent cells had different numbers of gaps and tails. The 'Klenow' fragment of DNA polymerase can replicate single-stranded DNA in vitro, provided that adjacent double-stranded DNA has a 3' end that can serve as a primer. So chromosomal DNA with a single-stranded gap is a perfect template. I incubated my log-phase and competent cell DNAs with Klenow polymerase and precursor nucleotides (including 32P-dGTP), and ran them in a pulsed-field gel. The two DNAs incorporated the same amount of radioactivity and gave the same patterns in the gel. As a control I used DNA from cells that had been treated with chloramphenicol - this protein-synthesis inhibitor lets cells complete any DNA replication they have initiated but blocks initiation of new rounds of replication. This DNA incorporated only about 1/3 as much radioactivity.
So neither experiment provided any support for the presence of frequent single-strand gaps or tails in the DNA of competent cells. The sensitivity to the nuclease buffer did indicate that there is something funny about this DNA. I was left wondering whether the DNA might have incorporated ribonucleotides or other non-standard subunits. Because ribonucleotides are sensitive to alkali, segments containing them would become gaps if the DNA was denatured with NaOH. But this wouldn't explain McCarthy's results.
The other weird thing about both McCarthy's and Setlow's results was that they found DNA from 'competent' cells of the non-transformable mutant rec-2 to be like DNA from log phase cells. I'll do a separate post about this.
Controlling mutagenesis
I've made some more progress on the problem of how to score USSs in simulated DNA fragments, and written it up in our work-in-progress document of how our model works. But here I want to get back to thinking of how mutagenesis can be set up to allow control of both the base composition and the ratio of transition mutations to transversion mutations. The problem was introduced in a previous post.
I have a printout of the analysis done for us by my biomathematician colleague. I'll try to summarize here what I think it says, and then I'll ask her to look at this post and tell me if I've got it wrong.
Before our program does any mutagenesis it will need to first calculate the values of three parameters, alpha, beta and gamma (spelled out because Blogger doesn't do Greek letters). These parameters specify the rates at which the specific bases mutate to each of the other three bases, as indicated in the table ("Blaisdell's 1985 mutation matrix"). In our model we will normalize these by dividing by their sums, as indicated below the table, so we can use them as probabilities.
The values these parameters will take depend on the three parameters we give to the model; these are described in the blue box. The formulas in the green box were derived by our colleague - we will write Perl code that uses these to calculate alpha, beta and gamma from G, µ and R. The program will then put these values into the table. Then, at each mutation step, the program will determine whether the mutating base is an A, T, C or G, and look up the appropriate probabilities of bases it can mutate to.
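The per-step lookup described above can be sketched as follows. This is a Python illustration only (the model itself is to be written in Perl), and the probabilities in `MUTATION_TABLE` are hypothetical placeholders, not values calculated from alpha, beta and gamma; the function names are also made up for this sketch.

```python
import random

# Hypothetical, already-normalized mutation probabilities: each row gives the
# chance that the row's base mutates to each of the other three bases.
# Placeholder numbers only, NOT derived from alpha, beta and gamma.
MUTATION_TABLE = {
    "A": {"G": 0.50, "C": 0.25, "T": 0.25},
    "G": {"A": 0.50, "C": 0.25, "T": 0.25},
    "C": {"T": 0.50, "A": 0.25, "G": 0.25},
    "T": {"C": 0.50, "A": 0.25, "G": 0.25},
}


def mutate_base(base, rng=random):
    """Return the base that `base` mutates to, drawn from its table row."""
    row = MUTATION_TABLE[base]
    targets = list(row)
    weights = [row[t] for t in targets]
    return rng.choices(targets, weights=weights, k=1)[0]


def mutate_genome(genome, n_mutations, rng=random):
    """Apply n_mutations point mutations at randomly chosen positions."""
    bases = list(genome)
    for _ in range(n_mutations):
        pos = rng.randrange(len(bases))
        # Look up the current base at this position, then pick its replacement.
        bases[pos] = mutate_base(bases[pos], rng)
    return "".join(bases)
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which would help when comparing simulations that differ only in the table values.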
If we were to begin the simulation with a genome that did not have the desired base composition, it would also not initially have the desired ratio of transitions to transversions. If no opposing forces were acting, the base composition and ratio would equilibrate at the desired values. This should not be an issue for us because our genomes will be created with the desired base compositions.
How should our USS model score USS sequences?
Our computer simulation model of USS evolution is coming together, but we've hit a snag. I'm hoping that trying to explain it here will give me a clearer understanding of how to get around it.
A critical step in the model is scoring each simulated DNA fragment for USS-like sequences; the score then determines the probability that this fragment will replace its homologous sequence in the evolving genome. Our basic plan is to use a 'sliding window' the width of the USS motif we're using. The window would begin at position 1 of the fragment sequence, score the sequence in the window, and then move over one position, score again, and continue until it reached the end of the fragment. The final score would be the sum of the scores of the sequences at each window position. I expected this to be time consuming ('computationally intensive') but straightforward, but I was wrong.
The simplest scoring scheme would be to just check, at each window position, whether the sequence in the window exactly matches the USS consensus. (For simplicity here I'll consider only the standard 9bp USS core AAGTGCGGT, but I want the model to consider the full 29bp USS motif.) If the sequence in the window exactly matches the core we add 1 to the running score, otherwise we don't. The final score would then tell us the number of perfect-match USSs in the fragment. How the model would use this number to decide the probability of uptake is still to be decided, but I'll defer this problem to another post.
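A minimal sketch of this exact-match scheme, in Python for illustration (the model itself is planned in Perl). The function name is hypothetical, and the sketch scores only the strand it is given; whether the model should also score the reverse complement is not settled here.

```python
USS_CORE = "AAGTGCGGT"  # the 9 bp core consensus used in this post


def count_perfect_uss(fragment, motif=USS_CORE):
    """Slide a motif-width window along the fragment, adding 1 per exact match."""
    width = len(motif)
    score = 0
    # The last valid window starts at len(fragment) - width.
    for start in range(len(fragment) - width + 1):
        if fragment[start:start + width] == motif:
            score += 1
    return score
```

A fragment shorter than the motif simply scores 0, because the loop has no window positions to visit.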
A slightly more subtle scheme would give partial scores for window positions containing sequences that nearly match the perfect USS core consensus. For example, any sequence that was mismatched at only 1 of the 9 positions could score 0.3, and any that were mismatched at 2 positions would score 0.1. Worse mismatches would score 0.
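A sketch of this partial-credit scheme in Python, using the example values from the paragraph above (0.3 for one mismatch, 0.1 for two) and assuming a perfect match scores 1 as in the simplest scheme. The names are illustrative only.

```python
USS_CORE = "AAGTGCGGT"

# Score awarded for a window with the given number of mismatches to the core.
# Anything with more than 2 mismatches scores 0.
MISMATCH_SCORES = {0: 1.0, 1: 0.3, 2: 0.1}

def score_fragment_by_mismatches(fragment, motif=USS_CORE):
    """Sum the partial scores over every window position in the fragment."""
    width = len(motif)
    total = 0.0
    for start in range(len(fragment) - width + 1):
        window = fragment[start:start + width]
        mismatches = sum(a != b for a, b in zip(window, motif))
        total += MISMATCH_SCORES.get(mismatches, 0.0)
    return total
```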
Both of the above simple schemes would work, but only because they ignore differences in the importance of different components of the USS, and in the tolerance for specific alternative bases at different positions. I had been planning to use a scoring matrix giving a value to each base at each position. Here's a very simple version of such a matrix, with the preferred base at each position scoring 1 and the other bases scoring 0. With this matrix a perfect USS core in the window would score 9, a singly mismatched USS would score 8, a doubly mismatched USS would score 7, etc.
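The simple 0/1 matrix described here can be sketched in Python as one dictionary per position. This is only an illustration built from the 9bp core; a realistic matrix would use different weights at different positions.

```python
USS_CORE = "AAGTGCGGT"

# One column per motif position: the preferred base scores 1, the others 0.
SCORING_MATRIX = [
    {base: (1 if base == best else 0) for base in "ACGT"}
    for best in USS_CORE
]

def score_window(window, matrix=SCORING_MATRIX):
    """Add up the matrix value of the base found at each window position."""
    return sum(column[base] for column, base in zip(matrix, window))
```

With this matrix, `score_window("AAGTGCGGT")` is 9 and a singly mismatched window such as `"AAGTGCGGA"` scores 8.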
Even a sequence that matched the USS consensus at only 1 position would get a non-zero score - it would be 1/9 of the score of a perfect USS. This creates two kinds of problems. First, we want only reasonably USS-like sequences to get significant scores, but under this scheme a random 9bp sequence will, on average, get a score of about 2.25 (9 positions, each matching with probability 1/4 if the four bases are equally frequent). Second, because the window evaluates about 1000 positions in a 1kb sequence, the average score of a random sequence will be about 2250, and the average score of a fragment containing a perfect USS will be only about 2257. The scoring scheme thus is far too weak in its ability to discriminate between fragments with and without good matches to the USS. A bit of thinking shows that it doesn't matter how big or small we make the individual numbers in the matrix: multiplying every value by the same factor scales both scores equally.
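The arithmetic behind this weak discrimination can be checked with a few lines of Python. This sketch assumes equal base frequencies (so each position matches with probability 1/4), rounds the number of windows in a 1kb fragment to 1000, and ignores the windows that only partly overlap the USS. The `scale` argument is a made-up knob for multiplying every matrix value by the same factor.

```python
WINDOW_WIDTH = 9             # width of the USS core
WINDOWS_PER_FRAGMENT = 1000  # rounded count of window positions in 1kb
P_MATCH = 0.25               # assumes all four bases are equally frequent

def background_and_signal(scale=1.0):
    """Return (random-fragment score, score with one perfect USS,
    relative increase) for a 0/1 matrix multiplied by `scale`."""
    per_window = scale * WINDOW_WIDTH * P_MATCH
    background = per_window * WINDOWS_PER_FRAGMENT
    # One window now scores the maximum instead of the random average.
    with_uss = background + (scale * WINDOW_WIDTH - per_window)
    return background, with_uss, (with_uss - background) / background
```

Under these assumptions the relative increase from one perfect USS comes out the same whatever value of `scale` is used.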
But maybe I see the solution. What if only the best base at each position has a positive value in the scoring matrix, and the other bases have negative values? We'd want to adjust the negative values so that the average random sequence would get a score of zero. I still see lots of potential problems here, but maybe this is the way to go.
Added a bit later: There's a much simpler solution. Use any matrix we like, and just calculate the average score expected for random sequences, and subtract this from the actual score for each window position. The calculation is simple - just add up all the fractional scores for the different bases, remembering to correct for base composition. And rather than doing this correction at each window position, do it after the window has completed scanning all the positions in the fragment (subtract the product of the expected average score for each window position times the number of window positions scored).
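A sketch of this correction in Python: compute the expected score of one random window from the matrix and the base composition, then subtract it once per window position after the whole fragment has been scanned. The base frequencies below are an assumed AT-rich composition chosen for illustration, not values from the model, and the function names are hypothetical.

```python
USS_CORE = "AAGTGCGGT"
MATRIX = [
    {base: (1.0 if base == best else 0.0) for base in "ACGT"}
    for best in USS_CORE
]
# Assumed base composition (illustrative only); must sum to 1.
BASE_FREQS = {"A": 0.31, "C": 0.19, "G": 0.19, "T": 0.31}

def expected_window_score(matrix, base_freqs):
    """Average score of a random window: at each position, weight every
    base's matrix value by that base's frequency, then sum over positions."""
    return sum(base_freqs[b] * column[b] for column in matrix for b in column)

def corrected_fragment_score(fragment, matrix=MATRIX, base_freqs=BASE_FREQS):
    """Raw sliding-window score minus (expected window score x window count)."""
    width = len(matrix)
    n_windows = max(0, len(fragment) - width + 1)
    raw = sum(
        matrix[i][fragment[start + i]]
        for start in range(n_windows)
        for i in range(width)
    )
    return raw - expected_window_score(matrix, base_freqs) * n_windows
```

Doing the subtraction once at the end gives the same result as correcting at every window position, with less arithmetic inside the loop.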
For random sequences this should give fragment scores centered on zero. I don't know how broad the distribution would be, so I don't know how strongly a single USS would shift the distribution. We would like it to move the score out beyond almost all of the random fragments.
One other problem is what to do with negative scores. For random sequences these should be as common as positive scores, but if we simply treat them as zero scores then the effective average score will be half of the mean of the positive scores. Maybe we should subtract a number bigger than the expected average score, and treat all negative scores as zero.
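One way to picture this last idea, as a tiny Python sketch: subtract the expected score plus some extra margin, and clamp anything negative to zero. The `extra_margin` parameter is hypothetical; how large it should be is exactly the open question above.

```python
def uptake_score(raw_score, expected_score, extra_margin=0.0):
    """Fragment score after subtracting the expected random score plus an
    extra margin, with negative results treated as zero."""
    return max(0.0, raw_score - (expected_score + extra_margin))
```

With `extra_margin=0.0`, roughly half of random fragments would still get a small positive score; raising the margin pushes more of them to zero.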
Modeling USS evolution
Lately we've been working on our new computer simulation model of USS evolution.
I asked my favourite mathematical colleague about how to model mutation so that (1) transitions occurred at a different frequency than transversions, and (2) a desired base composition was maintained. She whipped out Wen-Hsiung Li's Molecular Evolution book and opened it to a page of equations describing various mathematical models of mutation that can be used to infer the evolutionary history of DNA sequences. These models include up to five different parameters (5 different Greek letters!), depending on how many independent factors are included in the model.
The equations accomplish the reverse of what we want, but she confidently offered to solve the most appropriate equations for us in a way that would let us use the desired final base composition of our sequence to calculate the values of the mutation parameters for our program. Within 30 minutes she'd emailed the solutions to me. I think she used a program called Mathematica rather than mysterious mathematical superpowers to get the solutions. I still have to work through what she's sent, to see how our program will best use it.
p.s. The cells are once again growing normally in new batches of our usual medium. Unfortunately we still don't know the cause(s) of our recent problems.