Field of Science

Why GTA genes can't be maintained by 'selfish' transmission

Below is the line of reasoning showing that the genes responsible for producing GTA particles cannot maintain themselves or spread into new populations by GTA-mediated transfer of themselves into new cells.  I initially worked this out with a rigorous set of mathematical equations, but then realized that the problem was so glaringly obvious that math isn't needed.

The main GTA gene cluster is too big to fit inside a single GTA particle, so GTA particles can't transmit DNA that converts a GTA- cell into a GTA+ cell.  Some genes outside the main cluster are also required for GTA production.


But GTA particles can (and do) contain one or more individual GTA genes.  If a fragment containing a particular GTA gene is injected into a formerly-GTA+ cell that is now GTA- because it has a mutated version of this gene, the resulting recombination can restore the cell's original GTA+ genotype.

But these transfer events would not allow GTA+ cells to invade a GTA- population, or to maintain themselves in the face of loss of GTA function by mutation.  That's true for all known GTA systems, even in the simplest (imaginary) case where production of GTA particles requires only a single gene that could easily fit into a GTA particle, as illustrated below.  

Why?  Three factors together ensure that production of GTA particles reduces the total number of GTA+ cells in the population:

Problem 1:  GTA particles can only be released to the environment if the GTA+ producer cell lyses.  So each production event removes one GTA+ cell from the population.

Problem 2:  The GTA genes in the producer cell are not over-replicated as a phage genome would be, so each production event can produce at most one G+ particle (containing the GTA gene or cluster).  

If all steps occurred with 100% efficiency, problems 1 and 2 would allow, at best, replacement of the lost GTA+ cell with a new one created by GTA-mediated recombination.  But this would not maintain the numbers of GTA+ cells in the face of occasional loss of GTA genes by mutation or deletion.  Nor would it allow GTA+ cells to invade a GTA- population.

Problem 3:   GTA particle production, transmission of their DNA to recipient cells, and recombination with the recipient genome are all likely to be at least moderately inefficient.  Here's a partial list of expected inefficiencies:
  1. Burst size:  Actual burst sizes are unknown, but packaging all the DNA in an R. capsulatus genome (at 4.4 kb per particle) would need 841 particles, which is much larger than typical burst sizes for DNA phages.  Capsid proteins may be limiting, since they would be produced from single-copy GTA genes rather than replicated phage genomes.
  2. Dispersion:  The GTA particles will disperse in the environment, and many will probably not find cells to attach to.
  3. Stability:  Lab preps of GTA particles are unstable in non-optimal storage conditions, so many particles will likely fall apart.
  4. Recombination efficiency:  Only one DNA strand enters the cytoplasm, and some DNA degradation is likely.  The highest observed transduction frequency is only ~4×10^-4 (theoretical maximum: 1.2×10^-3), so recombination efficiency is probably only ~0.3.  Recombining in a novel gene will be less efficient than simple strand replacement.
  5. Self-conversion:  Some G+ particles may attach to cells that are already GTA+.
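A rough Python sketch of how these factors multiply, for the best case of a single GTA gene small enough to fit in one particle. It assumes a genome of about 3.7 Mb (which, at 4.4 kb per particle, gives the 841 figure above), random packaging with no over-replication, and hypothetical values for dispersion and stability, which haven't been measured. The function names are made up for illustration. Note that 1/841 ≈ 1.19×10^-3, close to the theoretical maximum transduction frequency quoted in item 4.

```python
# Best-case bookkeeping for one GTA gene small enough to fit in a 4.4 kb particle.
GENOME_BP = 3.7e6    # ASSUMPTION: approximate R. capsulatus genome size
PARTICLE_BP = 4.4e3  # RcGTA DNA capacity per particle


def particles_to_package_genome(genome_bp=GENOME_BP, particle_bp=PARTICLE_BP):
    """Number of particles needed to package every part of the genome once."""
    return genome_bp / particle_bp


def recombination_efficiency(observed=4e-4, theoretical_max=1.2e-3):
    """Observed transduction frequency as a fraction of the theoretical maximum."""
    return observed / theoretical_max


def new_gta_plus_per_lysis(burst_size, p_finds_cell, p_stays_intact,
                           p_recombines, freq_gta_plus):
    """Expected new GTA+ cells created per GTA+ producer cell lost to lysis.

    The gene is single-copy and packaged at random, so the expected number of
    particles carrying it is burst_size / particles_to_package_genome(), capped
    at 1 because the cell has only one copy to package.
    """
    p_packaged = min(1.0, burst_size / particles_to_package_genome())
    p_hits_gta_minus = 1.0 - freq_gta_plus  # the rest 'self-convert' GTA+ cells
    return (p_packaged * p_finds_cell * p_stays_intact
            * p_recombines * p_hits_gta_minus)


if __name__ == "__main__":
    print(round(particles_to_package_genome()))   # 841
    print(round(recombination_efficiency(), 2))   # 0.33
    # HYPOTHETICAL dispersion (0.5) and stability (0.5) values, with half the cells GTA+.
    print(new_gta_plus_per_lysis(100, 0.5, 0.5, recombination_efficiency(), 0.5))
```

Even with every efficiency set to 1 and a burst big enough to package the whole genome, the function returns at most 1: one new GTA+ cell per GTA+ cell lost, which is Problems 1 and 2 in numerical form.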

Might GTA be a vaccination system for infecting phages?

My work at Dartmouth (to be described in upcoming posts) showed conclusively that genes encoding Gene Transfer Agents (such as the GTA system of Rhodobacter capsulatus) cannot be maintained by 'selfish' transfer of either whole GTA gene clusters or single GTA genes into GTA- recipients.  Neither can the GTA genes be maintained by general recombination benefits that can arise when fragments of chromosomal DNA are transferred into new cells.  So, although 'gene transfer agent' does accurately describe one activity of these genes, it cannot be the activity for which they are selected.


The main obstacle to the maintenance of GTA genes, which applies to all of these benefits, is that any GTA+ cell that actively produces GTA particles must die, since cell lysis is needed to release the particles into the environment.  Another obstacle, applying to selfish transfer, is that GTA genes are not over-replicated during GTA production (and are not preferentially packaged), so each cell death can produce at most one GTA+ particle. 


I presented these results at the Analytical Genetics conference last week, and asked the other participants if they could think of alternative benefits of producing GTA particles.  Sanna Koskiniemi from Uppsala University made the very interesting suggestion that GTA particles could serve as a syringe, packaging DNA fragments from a phage that's infecting the producer cell and transferring these fragments into other as-yet-uninfected cells, where they could trigger development of CRISPR immunity.

I love this idea and want to test it.  It doesn't overcome the cell-death obstacle, but it does overcome the selfish-transfer obstacle, since a single producer cell could package fragments of a single phage genome into many particles, and even more if the phage genome is replicated.


One way to see if this could provide sufficient benefits to maintain the GTA genes is by simulation modeling like the modeling I used to examine the recombination benefits.  This could clarify the important factors that would need to be examined.

Here I want to start considering experimental tests of this hypothesis.

The ideal test would be to infect the GTA-producing strain with a phage, preferably under low-growth conditions where phage infections are often abortive.  (Luckily R. capsulatus produces most of its GTA under such conditions.)  Then some recipient cultures would be exposed to the GTA-containing culture medium (and some not, as controls), and then all exposed to a lysate of the phage.

"But wait!", you say.  "Won't the GTA-containing culture medium also contain some phage?"  Yes, probably.  I don't think there's any way to inactivate the phage particles without also inactivating the GTA particles, or vice versa.  We might be able to come up with either perfectly-abortive infection conditions (where infected cells don't produce any phage), or a cellular mutation that prevents phage production.  If not, we might have to combine the GTA-exposure and phage-infection steps.

"And won't any phage lysate also contain some GTA particles?"  Yes, probably.  But we could use a GTA- mutant as the host for lysate production.  Not the mutant that can't lyse, but the one with the main GTA gene cluster completely deleted.

What resources are available for this project?  First I checked with my GTA colleagues, who confirm that R. capsulatus does have a CRISPR-Cas9 system.  Then I asked if there were any well-characterized phage systems able to infect R. capsulatus.  Until quite recently the answer would have been 'No', but a recent paper reported the isolation and sequences of 4 R. capsulatus phages.  A Mu-like phage of R. capsulatus has also been characterized, but it did not form plaques on SB1003.

The report about the 4 new phages used a different host strain (YW1-derived, not SB1003), so the first thing I'll need to do is check whether they form plaques on SB1003.  Then I'll need to play around with infection and plating conditions...  My idea of fun!

Model of GTA evolution by infectious transfer

Here's the description of my model addressing Explanation 1 for GTA persistence.  For now I've just pasted in the text of a Word file I prepared about 10 days ago.

A constant-population-size model of large-head GTA transmission
(Based on Xin Chen’s model, but with stepwise generations and without logistic growth.)

Assumptions:
The population:
1.     Population size is constant.  Loss of GTA+ cells due to lysis during GTA production is made up by growth of all cells after the transduction step.
2.     Dense, well-mixed culture in liquid medium (so cells frequently encounter GTA particles)
GTA production:
3.     GTA particles come in two sizes.  Small particles contain 4 kb DNA fragments.  The hypothetical large particles contain fragments that must be at least 14 kb (the size of the GTA gene cluster) but could be as big as 50 kb. 
4.     The number of GTA particles a cell produces does not depend on the proportion of small and large particles.
5.     DNA packaging by GTA is random; all parts of the cell’s genome are equally represented.  But in this model we only consider the particles containing the full-length GTA cluster.
6.     This is the killer:  If the cell’s chromosome is 5 Mb and the large-particle capacity is 15 kb, only 2×10^-4 of large particles will contain complete GTA gene clusters (will be G+ particles).  If we change the large-particle capacity to 20 kb, then about 1×10^-3 of large particles will contain a complete cluster.  A 50 kb capacity and a 3 Mb chromosome would probably get it up to about 10^-2.  (And this ignores the recombination machinery’s need for homologous DNA flanking the GTA cluster to promote recombination.)
Transduction:
7.     GTA- cells completely lack the main GTA gene cluster.  They can only be converted to GTA+ by homologous recombination with GTA-containing DNA from G+ particles.
8.     GTA particles cannot tell the difference between GTA+ and GTA- recipients.  Particles capable of transducing GTA- cells to GTA+ can also ‘transduce’ GTA+ cells to GTA+.
9.     All GTA particles produced in one cycle are taken up by and transduce cells in that cycle.  (The efficiency of infection and recombination is 1.) 
10.  The model ignores large and small GTA particles that don’t transduce GTA+.
11.  Each cell takes up only one G+ particle (or none).  This is reasonable, since the number of G+ particles is always going to be much smaller than the number of cells.
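The numbers in assumption 6 follow from a simple geometric argument, sketched here in Python. It assumes uniformly random fragment start points on a circular chromosome and ignores the flanking-homology requirement: a fragment of length L contains a cluster of length C only if it starts within L – C of the cluster's start, so the fraction is about (L – C)/(genome size). The function name is hypothetical.

```python
def fraction_containing_region(capacity_kb, region_kb, genome_kb):
    """Approximate fraction of randomly positioned fragments that contain a whole region.

    ASSUMPTION: fragment start points are uniform around a circular chromosome.
    A fragment of length L covers a region of length C only if it starts within
    (L - C) of the region's start, out of about genome_kb possible start points.
    """
    if capacity_kb < region_kb:
        return 0.0
    return (capacity_kb - region_kb) / genome_kb


if __name__ == "__main__":
    CLUSTER_KB = 14  # main GTA gene cluster
    for capacity_kb, genome_kb in [(15, 5000), (20, 5000), (50, 3000)]:
        t = fraction_containing_region(capacity_kb, CLUSTER_KB, genome_kb)
        print(f"{capacity_kb} kb heads, {genome_kb / 1000:g} Mb chromosome: T = {t:.1e}")
```

This gives 2.0×10^-4, 1.2×10^-3 and 1.2×10^-2 for the three cases, in line with the rounded values above. The same arithmetic with 4.4 kb small particles and an assumed ~1 kb gene in a ~3.7 Mb genome gives roughly 0.1%.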

Parameters:
F    Initial frequency of GTA+ cells (we want to consider a wide range)
c    Fraction of GTA+ cells producing GTA particles (and consequently lysing).  (In wildtype lab cultures this is <3%.)
b    Number of GTA particles produced by each burst.  Default value is 100.  (We have no actual measurements.)
µ    Fraction of GTA particles that are large.  (We expect this fraction to be small, since large particles have not been observed.)
T    Fraction of large GTA particles that are G+ particles (able to transduce GTA).  (This is limited by genome size, GTA gene cluster size, and the DNA capacity of these hypothetical particles.  Plausible values are between 10^-2 and 10^-4.)
G    = µT.  Fraction of GTA particles that contain complete GTA genes.

What happens in one generation:
GTA production and cell lysis:
N   Proportion of GTA particles to cells remaining in the medium after GTA+ cells have burst. 
      = (Fcb)/(1 – Fc)  (Note: Fcb is the GTA production per original cell.  1 – Fc normalizes this to the number of cells remaining after lysis.)
N+  Proportion of GTA particles, per remaining cell, that carry the complete GTA gene cluster (are ‘G+’ particles able to transduce the GTA-production genotype to GTA- cells). 
= NµT   =  NG
Fraction of surviving GTA+ cells per original cell (will be normalized to remaining cells later): = F(1 – c)
Transduction:
Fraction of GTA- cells transduced to GTA+: N+(1 – F).  (Note: the 1 – F corrects for the G+ particles that attach to and ‘transduce’ GTA+ cells.) 
Fraction of GTA+ cells (per original cell) after transduction:  F(1 – c) + N+(1 – F).  (Note: F(1 – c) removes cells killed by lysis, N+(1 – F) adds cells gained by transduction.)
Fraction of GTA- cells (per original cell) remaining after transduction:  (1 – F) – N+(1 – F).  (Note: 1 – F is the original fraction of GTA- cells, N+(1 – F) removes cells lost by transduction to GTA+.)
Cell growth:
Now we normalize the cell numbers to ‘per remaining cell’:
Total fraction of cells remaining after GTA production and transduction:
            1 – (Fc)  (Note: To normalize, divide the above cell fractions by this value.)
Fraction of GTA+ cells after one complete cycle:
$$F' = \frac{F(1 - c) + N^{+}(1 - F)}{1 - Fc}$$

How to evaluate the change in the proportion of GTA+ cells?
We can expand N+ and pull out the F, then look at the before/after ratio:
$$F' = \frac{F(1 - c) + cbF\mu T(1 - F)}{1 - Fc} = F \cdot \frac{(1 - c) + cb\mu T(1 - F)}{1 - Fc}$$

$$\frac{F'}{F} = \frac{(1 - c) + cb\mu T(1 - F)}{1 - Fc}$$

When the value of this expression is greater than 1, GTA+ is increasing; when it is less than 1, GTA+ is decreasing.
For simplicity, below I combine b, µ and T into the compound variable W = bµT.

What happens if we vary F, holding everything else constant?
Increase of GTA+ depends only on W.  If W>1, GTA+ increases; if W<1, GTA+ decreases.
The rate of change is very slow when F is close to 1 (when almost all cells are GTA+), and fast when F is close to 0 (when almost all cells are GTA-).
What happens if we vary c, holding everything else constant?
c affects how fast change happens, but not its direction.  If W>1, GTA+ still spreads; if W<1, GTA+ still decreases.
What happens if we vary W, holding everything else constant?
If W<1, the numerator will always be smaller than the denominator.
If W>1, the numerator will always be larger than the denominator.
In both cases, all the other parameters cancel out.  This confirms that the direction of selection on GTA+ depends only on whether W is higher or lower than 1.
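Writing out the comparison makes the cancellation explicit.  Subtracting 1 from the F'/F expression, with W = bµT:

$$\frac{F'}{F} - 1 = \frac{(1 - c) + cW(1 - F) - (1 - Fc)}{1 - Fc} = \frac{c(1 - F)(W - 1)}{1 - Fc}$$

For 0 < F < 1 and 0 < c ≤ 1 the factor c(1 – F)/(1 – Fc) is positive, so the sign of the change depends only on W – 1.  The same factor (1 – F) also shows why change is slow when F is close to 1.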
Would the result change if the population were growing?
I don’t think so, since GTA+ and GTA- cells grow at the same rate.

Since plausible values of W are all much lower than 1, I conclude that GTA+ cells cannot increase by GTA-mediated transduction of GTA- cells to GTA+.
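A minimal Python sketch that iterates the F'/F expression above, F' = F[(1 – c) + cW(1 – F)]/(1 – Fc) with W = bµT.  The parameter values in the example (c = 0.02, b = 100, µ = 10^-2, T = 10^-3) are illustrative assumptions chosen within the ranges discussed, not measurements, and the function names are hypothetical.

```python
def next_gta_plus_freq(F, c, W):
    """One generation of the constant-population model (the F'/F expression).

    F: frequency of GTA+ cells; c: fraction of GTA+ cells that produce GTA and lyse;
    W: b * mu * T, the number of G+ particles per producer cell.
    The result stays <= 1 only while c * W * F <= 1 (few G+ particles per cell,
    as assumption 11 requires).
    """
    return F * ((1 - c) + c * W * (1 - F)) / (1 - F * c)


def simulate(F0, c, b, mu, T, generations):
    """Return the GTA+ frequency at the start and after each generation."""
    W = b * mu * T
    freqs = [F0]
    for _ in range(generations):
        freqs.append(next_gta_plus_freq(freqs[-1], c, W))
    return freqs


if __name__ == "__main__":
    # ASSUMED illustrative values: W = 100 * 1e-2 * 1e-3 = 1e-3, far below 1.
    freqs = simulate(F0=0.5, c=0.02, b=100, mu=1e-2, T=1e-3, generations=200)
    print(f"GTA+ frequency: start {freqs[0]:.3f}, after 200 generations {freqs[-1]:.3f}")
```

Raising W above 1 (for example µ = 1 and T = 0.02, so W = 2) makes the same loop increase instead; W = 1 leaves F unchanged.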

GTA could spread by transduction if it did preferentially package the GTA gene cluster into its particles.  Of course, then it would be a phage.
How the model’s assumptions affect this outcome:
Basically, all the assumptions are either neutral or increase the chance that GTA+ will spread. Making the simulation more realistic would just make things worse for GTA+, not better.
The population:
1.  Population size is constant.  Loss of GTA+ cells due to lysis during GTA production is made up by growth of all cells after the transduction step.
I don’t think adding growth would affect the outcome.
2.  Dense, well-mixed culture in liquid medium (so cells frequently encounter GTA particles).
If the culture were more dilute or poorly mixed, some GTA particles would not find new cells to attach to.  This would reduce the amount of transduction (effectively reducing W).
GTA production:
3.  GTA particles come in two sizes.  Small particles contain 4 kb DNA fragments.  The hypothetical large particles contain fragments that must be at least 14 kb (the size of the GTA gene cluster) but could be as big as 50 kb. 
This is the central assumption of the model.  The size of the small particles is known.  The hypothesized large particles could be as small as 15 kb (allowing a bit of homologous sequence on each side of the cluster to promote recombination).  Phage capsids can in principle be very large, but it’s parsimonious to assume a modest size.
4.  The number of GTA particles a cell produces does not depend on the proportion of small and large particles.
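Large capsids will require more capsid protein molecules, so if capsid protein is limiting, making large particles would reduce the total number of particles per cell.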
5.  DNA packaging by GTA is random; all parts of the cell’s genome are equally represented.  But in this model we only consider the particles containing the full-length GTA cluster.
Experimental results show slightly less packaging of GTA sequences.  If this applies to the hypothetical large particles it would reduce production of G+ particles.  If particles preferentially package GTA, GTA would be a phage.
6.  This is the killer:  If the cell’s chromosome is 5 Mb and the large-particle capacity is 15 kb, only 2×10^-4 of large particles will contain complete GTA gene clusters (will be G+ particles).  If we change the large-particle capacity to 20 kb, then about 1×10^-3 of large particles will contain a complete cluster.  A 50 kb capacity and a 3 Mb chromosome would probably get it up to about 10^-2.  (And this ignores the recombination machinery’s need for homologous DNA flanking the GTA cluster to promote recombination.)
See point 3 above.
Transduction:
7.  GTA- cells completely lack the main GTA gene cluster.  They can only be converted to GTA+ by G+ particles.
Transduction depends on homologous recombination.  Small GTA particles can transduce functional alleles of individual GTA genes, replacing versions that became mutated or even deleted in an ancestor of the recipient cell.  But they cannot introduce GTA genes into cells that completely lack the GTA cluster, because there will be no homologous sequences to recombine with.
8.  GTA particles cannot tell the difference between GTA+ and GTA- recipients.  Particles capable of transducing GTA- cells to GTA+ can also ‘transduce’ GTA+ cells to GTA+.
I think some phages and conjugative plasmids may be able to detect whether potential hosts/recipients already have the element, but we have no evidence that transduction frequencies differ between GTA+ and GTA- recipients.  Wall et al (1975) surveyed 33 strains and found wide variation in both GTA production and transduction, but no correlation between these abilities.
9.  All GTA particles produced in one cycle are taken up by and transduce cells in that cycle.  (The efficiency of infection and recombination is 1.) 
This is unlikely to be true, but assuming this increases the chance that each G+ particle successfully transduces a GTA- cell to GTA+.
If we were to relax this assumption the model would need to include an explicit uptake process and to specify what happens to particles that are not taken up.
10. The model ignores large and small GTA particles that don’t transduce GTA+. 
This should be OK, since these should not interfere with transduction by G+ particles, especially because their total number per cell will be small. Removing this assumption would make GTA+ spread less likely.
11. Each cell takes up only one G+ particle (or none). 
This is a reasonable assumption, since the number of G+ particles is always going to be much smaller than the number of cells.  If the number of G+ particles were high, sometimes two G+ particles might inject their DNAs into the same cell, which would reduce the efficiency of transduction.




Thinking about Gene Transfer Agent

I'm at Dartmouth for three months, working with Olga Zhaxybayeva's group to improve our evolutionary understanding of Gene Transfer Agent.  I'm writing an R-script simulation of the genetic exchange it causes (finally learning R), but my control runs with epistasis don't give the expected results.  So I'm writing this post and creating a PowerPoint deck to clarify my thinking.

First, what's Gene Transfer Agent?  A number of different kinds of bacteria produce 'transducing particles' called Gene Transfer Agents.  These look like small phage capsids but they don't usually contain phage DNA; instead they contain random fragments of chromosomal DNA.  In the best-characterized GTA ('RcGTA'), these fragments are all 4.4 kb in length, which appears to be the DNA capacity of the tiny GTA heads.  Like phage, GTA particles inject their DNA into recipient cells (usually of the same species), where it often recombines with the chromosome and can change the cell's genotype.



GTA particles aren't infectious like phages are, both because they don't preferentially package the DNA that encodes them and because their heads are too small to contain this DNA.  The RcGTA head and tail proteins are encoded by a 14 kb gene cluster.  The sequences and organization of these genes strongly resemble those of homologous phage genes, so the known GTA systems are generally thought to have descended from what were integrated prophages. 

In lab cultures of cells with the RcGTA genes (Rhodobacter capsulatus cells), GTA is produced mainly after exponential growth has ceased, and only by a small subset of cells.  As with release of phage particles from infected cells, release of GTA requires lysis of the cell; the holin and endolysin proteins needed for lysis are encoded by genes outside the main RcGTA cluster.



There are good reasons to think that GTAs are not simply defective prophages that still can package small DNA fragments:
  1. The main RcGTA gene cluster has been somewhat stably inherited over a very long time, maybe more than a billion years.  Some descendants have lost all the genes, but about 25% of the 225 alpha-proteobacterial genomes examined have retained versions of a single large cluster, typically containing 14-17 co-transcribed genes, most of which encode capsid head and tail proteins.
  2. Expression of this gene cluster is at least partly controlled by cellular regulatory mechanisms.  
  3. Other genes, at other chromosomal locations, are also needed for efficient RcGTA production.
I just crunched some numbers from a detailed phylogenetic tree for the alpha-proteobacteria showing which taxa have GTA.  The large GTA cluster is only found in a subclade (148 taxa, 109 distinct species names); the authors estimate that this subclade is 1.0 - 1.4 billion years old.  57% of the taxa in this subclade have the large GTA gene cluster.

My goal for these three months is to generate models of GTA evolution (probably computer simulations) that evaluate the following candidate explanations for its persistence:

  1. Infectious spread of GTA by rare large-head particles that package the 14 kb gene cluster.
  2. Restoration of mutated GTA genes by unidirectional recombination with functional alleles from GTA-producing cells.
  3. Beneficial recombination of chromosomal genes.
Flawed model for Explanation 1:  Nobody has seen the large heads postulated by Explanation 1, but nobody has explicitly looked for them.  The Zhaxybayeva lab already has an unpublished mathematical model that addresses this explanation, created by a mathematically-inclined former post-doc.  It asks how frequent such heads would need to be in order to maintain GTA-producing cells in a mixed population of GTA+ cells and GTA- cells lacking the gene cluster.  The model assumes that large heads are produced at frequency µ, and that these inject the GTA gene cluster into GTA- cells, converting them into GTA+ cells.  Only a small fraction of GTA+ cells are activated to produce GTA in any one generation, and these lyse after GTA production.  

The conclusion from this model is that GTA+ cells can persist at high frequency even if they only make one large particle for every 10^5 normal small particles.  Because the model assumed a reasonable 'burst size' of 100 GTA particles per producer cell, this means that GTA+ can persist if only one producer cell in a thousand produces a single large particle.    

But I didn't think this result could be correct.  Since each cell lysis destroys a GTA+ cell and only one in a thousand creates a new GTA+ cell from a GTA- cell, the GTA+ population should be continually decreasing.  Production of new GTA+ cells only compensates for 0.1% of the loss of GTA+ cells.  
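The arithmetic behind this objection, as a small Python sketch.  The function names are hypothetical; it approximates the fraction of large particles as 1/10^5 and, like the lab's model as I understand it, assumes every large particle carries the cluster and converts a GTA- cell.

```python
def large_particles_per_producer(small_per_large, burst_size):
    """Mean large particles released per lysing producer, approximating the
    large-particle fraction as 1 / small_per_large."""
    return burst_size / small_per_large


def fraction_of_lysis_loss_replaced(small_per_large, burst_size, p_carries_cluster=1.0):
    """New GTA+ cells per GTA+ cell lost to lysis, in the best case where every
    large particle that carries the cluster converts one GTA- cell."""
    return large_particles_per_producer(small_per_large, burst_size) * p_carries_cluster


if __name__ == "__main__":
    per_cell = large_particles_per_producer(small_per_large=1e5, burst_size=100)
    print(f"producer cells per large particle: {1 / per_cell:,.0f}")
    print(f"fraction of lysis losses replaced: {fraction_of_lysis_loss_replaced(1e5, 100):.1%}")
```

Setting p_carries_cluster to about 10^-3, as assumption 6 of my constant-population model suggests for 20 kb heads, drops this to about one new GTA+ cell per million lysed producers.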

I initially had a hard time fully understanding the mathematics of this model.  It included expressions for logistic growth, which complicated the math without adding anything to its utility.  So I created my own version of this model, which gave a very different answer.

New model for Explanation 1:  I'm going to put the description of this model into another post, because here I want to get on to my beneficial recombination model.  Bottom line: the model's result is that transduction of the GTA gene cluster by large-head GTA particles can't come close to maintaining GTA+ cells in a mixed population even if every cell produces a large-head particle.  This is because:

  1. All cells that produce GTA die; 
  2. Only a small fraction of large-head particles will contain a complete gene cluster (maybe 0.1 to 1%); 
  3. Except when GTA+ cells are rare, many particles will attach to GTA+ cells rather than to GTA- cells; 
  4. In a natural environment many GTA particles will fail to find recipients.  (This issue isn't part of the model.)
  5. To overcome these obstacles each GTA-producing cell would need to produce more than 1000 (10,000? 100,000?) large-head particles.
Finding the flaw in the lab's model:  Assuming that I understand the lab's model correctly, the main error is that it 'corrects' for the probability that a GTA particle will attach to a GTA+ cell rather than a GTA- cell by multiplying by the number of GTA- cells rather than by their frequency.  Since the model assumes populations of 10^7 to 10^9 cells, this overestimates the amount of transduction by orders of magnitude, leading to a comparable underestimate of the frequency of large heads needed to maintain GTA+.
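A schematic Python sketch of the difference.  The function names and the particle count are hypothetical, and this is not the lab's actual code, only the error as described above: scaling by the count instead of the frequency inflates the result by a factor equal to the population size.

```python
def transductions_by_frequency(n_g_plus_particles, n_gta_minus, n_total):
    """Expected transductions: each G+ particle lands on a random cell, which is
    GTA- with probability n_gta_minus / n_total (the frequency of GTA- cells)."""
    return n_g_plus_particles * n_gta_minus / n_total


def transductions_by_count(n_g_plus_particles, n_gta_minus):
    """The error described above: scaling by the number of GTA- cells instead."""
    return n_g_plus_particles * n_gta_minus


if __name__ == "__main__":
    n_total = 10**8              # within the 10^7 to 10^9 range the model assumes
    n_gta_minus = n_total // 2   # half the population lacks the cluster
    particles = 10               # HYPOTHETICAL number of G+ particles
    right = transductions_by_frequency(particles, n_gta_minus, n_total)
    wrong = transductions_by_count(particles, n_gta_minus)
    print(right, wrong, wrong / right)
```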

Model for Explanation 2:  I modified the basic structure of my Explanation 1 model to consider a related hypothesis.  Defective alleles of GTA genes are expected to arise by random mutation.  At least some of these will also prevent the cell from lysing when GTA production is induced.  These cells can still receive functional alleles of their defective genes from GTA particles produced by 'wildtype' cells, but they can't transmit their defective alleles to the wildtype cells because they can't produce GTA.  This asymmetry favours spread of functional alleles, and might be able to maintain GTA, although it wouldn't allow GTA+ to spread to cells that completely lack the GTA genes.

As with the model for Explanation 1, the result is a strong NO.  Because the models are very similar, it's not surprising (in retrospect) that spread of functional alleles faces the same obstacles:

  1. All cells that produce GTA die; 
  2. Only a small fraction (about 0.1%) of particles will contain whatever GTA gene is mutated in a recipient cell; 
  3. Except when GTA+ cells are rare, many particles will attach to cells with the functional allele rather than to those with mutated allele; 
  4. In a natural environment many GTA particles will fail to find recipients.  (This issue isn't part of the model.)
  5. To overcome these obstacles each GTA-producing cell would need to produce more than 1000 (10,000? 100,000?) GTA particles.
Models for Explanation 3:  Most microbiologists assume that GTAs are maintained in their genomes by selection for presumed benefits of chromosomal recombination.   They implicitly assume that randomizing the combinations of chromosomal alleles in a population creates a benefit strong enough to overcome the cost of the cell death associated with GTA production.  They don't explicitly assume this, because they're not used to thinking rigorously about evolutionary processes.  Instead their explanation usually relies on GTA-mediated recombination creating some specific beneficial new combination, and ignores the selective costs associated with other combinations.

In fact, many very smart people have spent many years looking for conditions where random chromosomal recombination creates benefits strong enough to maintain the genes that cause it.  These 'evolution of sex' models have identified some conditions, but usually these benefits are small and occur only under special circumstances.  Most of the time recombination appears to be a waste of time at best.

Recombination Model 1:  Way back when I was a new post-doc spending a year in Dick Lewontin's lab, I developed a computer-simulation model of recombination by natural transformation (Redfield 1988, Evolution of bacterial transformation: Is sex with dead cells ever better than no sex at all?).  In this model I applied a relatively simple model of the evolution of sex to a population of naturally competent bacteria.  My first goal for addressing Explanation 3 is to adapt this model so it applies to recombination caused by GTA rather than by natural transformation.  I'll describe my progress (and current deadlock) in the next post.

Recombination Model 2:  Model 1 is 'deterministic'; it ignores random ('stochastic') events, effectively assuming that the population is infinitely large.  But the strongest benefits of recombination are now thought to arise from precisely the stochastic effects Model 1 ignores.  So I also want to make a stochastic model that tracks individual cells, or at least a model that takes stochastic processes into account.  I haven't started writing this model yet, but I might pattern it on the transformation model described by Takeuchi et al, 2014.