Model of GTA evolution by infectious transfer

Here's the description of my model addressing Explanation 1 for GTA persistence.  For now I've just pasted in the text of a Word file I prepared about 10 days ago.

A constant-population-size model of large-head GTA transmission
(Based on Xin Chen’s model, but with stepwise generations and without logistic growth.)

Assumptions:
The population:
1.     Population size is constant.  Loss of GTA+ cells due to lysis during GTA production is made up by growth of all cells after the transduction step.
2.     Dense, well-mixed culture in liquid medium (so cells frequently encounter GTA particles)
GTA production:
3.     GTA particles come in two sizes.  Small particles contain 4 kb DNA fragments.  The hypothetical large particles contain fragments that must be at least 14 kb (the size of the GTA gene cluster) but could be as big as 50 kb. 
4.     The number of GTA particles a cell produces does not depend on the proportion of small and large particles.
5.     DNA packaging by GTA is random; all parts of the cell’s genome are equally represented.  But in this model we only consider the particles containing the full-length GTA cluster.
6.     This is the killer:  If the cell’s chromosome is 5 Mb and the large-particle capacity is 15 kb, only 2x10^-4 of large particles will contain complete GTA gene clusters (will be G+ particles).  If we change the large-particle capacity to 20 kb, then about 1x10^-3 of large particles will contain a complete cluster.  A 50 kb capacity and a 3 Mb chromosome would probably get it up to about 10^-2.  (And this ignores the recombination machinery’s need for homologous DNA flanking the GTA cluster to promote recombination.)
Transduction:
7.     GTA- cells completely lack the main GTA gene cluster.  They can only be converted to GTA+ by homologous recombination with GTA-containing DNA from G+ particles.
8.     GTA particles cannot tell the difference between GTA+ and GTA- recipients.  Particles capable of transducing GTA- cells to GTA+ can also ‘transduce’ GTA+ cells to GTA+.
9.     All GTA particles produced in one cycle are taken up by and transduce cells in that cycle.  (The efficiency of infection and recombination is 1.) 
10.  The model ignores large and small GTA particles that don’t transduce GTA+.
11.  Each cell takes up only one G+ particle (or none).  This is reasonable, since the number of G+ particles is always going to be much smaller than the number of cells.
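The arithmetic behind assumption 6 can be sketched in a few lines of Python.  This is only an illustration: the function name is made up, and it assumes that fragment start points are uniformly distributed around the chromosome, so a fragment contains the whole 14 kb cluster only if it starts within the (capacity – cluster) kb window upstream of the cluster.  Flanking homology is ignored, as in the text.

```python
def complete_cluster_fraction(genome_kb, capacity_kb, cluster_kb=14.0):
    """Fraction of randomly packaged fragments that span the whole cluster.

    Assumes uniformly random fragment start points (an illustrative
    simplification, not a measured property of GTA packaging).
    """
    if capacity_kb < cluster_kb:
        return 0.0  # the fragment is too short to hold the cluster at all
    return (capacity_kb - cluster_kb) / genome_kb


# The three scenarios mentioned in assumption 6.
for genome_kb, capacity_kb in [(5000, 15), (5000, 20), (3000, 50)]:
    t = complete_cluster_fraction(genome_kb, capacity_kb)
    print(f"{genome_kb} kb genome, {capacity_kb} kb capacity: T = {t:.1e}")
```

Under these assumptions the three cases come out at about 2x10^-4, 1.2x10^-3 and 1.2x10^-2, matching the rough values quoted above.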

Parameters:
F    Initial frequency of GTA+ cells (we want to consider a wide range)
c    Fraction of GTA+ cells producing GTA particles (and consequently lysing).  (In wildtype lab cultures this is less than 3%.)
b    Number of GTA particles produced by each burst.  Default value is 100.  (We have no actual measurements.)
µ    Fraction of GTA particles that are large.  (We expect this fraction to be small, since large particles have not been observed.)
T    Fraction of large GTA particles that are G+ particles (able to transduce GTA).  (This is limited by genome size, GTA gene cluster size, and the DNA capacity of these hypothetical particles.  Plausible values are between 10^-2 and 10^-4.)
G    = µ * T.  Fraction of all GTA particles that are G+ particles (large particles containing the complete GTA gene cluster).

What happens in one generation:
GTA production and cell lysis:
N   Proportion of GTA particles to cells remaining in the medium after GTA+ cells have burst. 
      = (Fcb)/(1 – Fc)  (Note: Fcb is the GTA production per original cell.  1 – Fc normalizes this to the number of cells remaining after lysis.)
N+  Proportion of GTA particles, per remaining cell, that carry the complete GTA gene cluster (are ‘G+’ particles able to transduce the GTA-production genotype to GTA- cells). 
= NµT   =  NG
Fraction of surviving GTA+ cells per original cell (will be normalized to remaining cells later): = F(1 – c)
Transduction:
Fraction of GTA- cells transduced to GTA+: N+(1 – F).  (Note: the 1 – F corrects for the G+ particles that attach to and ‘transduce’ GTA+ cells.)
Fraction of GTA+ cells (per original cell) after transduction:  F(1 – c) + N+(1 – F).  (Note: F(1 – c) removes cells killed by lysis, N+(1 – F) adds cells gained by transduction.)
Fraction of GTA- cells (per original cell) remaining after transduction:  (1 – F) – N+(1 – F).  (Note: 1 – F is the original fraction of GTA- cells, N+(1 – F) removes cells lost by transduction to GTA+.)
Cell growth:
Now we normalize the cell numbers to ‘per remaining cell’:
Total fraction of cells remaining after GTA production and transduction:
            1 – (Fc)  (Note: To normalize, divide the above cell fractions by this value.)
Fraction of GTA+ cells after one complete cycle:
F’ = [F(1 – c) + N+(1 – F)] / (1 – Fc)
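The steps of one generation can be written out as a short Python function.  This is a sketch that follows the definitions above literally (N, then N+, then survivors plus transductants, then normalization to the remaining cells); the function and variable names are my own, and the values of c, µ and T in the example are illustrative picks, not measurements.

```python
def one_generation(F, c, b, mu, T):
    """Return the GTA+ frequency after one production/transduction/growth cycle."""
    remaining = 1 - F * c           # cells left after lysis, per original cell
    N = (F * c * b) / remaining     # GTA particles per remaining cell
    N_plus = N * mu * T             # G+ particles per remaining cell
    survivors = F * (1 - c)         # GTA+ cells that did not lyse, per original cell
    converted = N_plus * (1 - F)    # GTA- cells transduced to GTA+, per original cell
    return (survivors + converted) / remaining  # normalize to remaining cells


# Illustrative values: half the cells GTA+, 3% of them lysing, burst size 100.
F_next = one_generation(F=0.5, c=0.03, b=100, mu=1e-3, T=1e-3)
print(f"F goes from 0.5 to {F_next:.5f}")
```

With these values transduction recovers almost none of the cells lost to lysis, so F falls.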

How to evaluate the change in the proportion of GTA+ cells?
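We can expand N+ and pull out the F, then look at the before/after ratio:
F’   = [F * (1 – c) + c * b * F * µ * T * (1 – F)] / (1 – F * c)
      = F * [(1 – c) + c * b * µ * T * (1 – F)] / (1 – F * c)

F’ / F = [(1 – c) + c * b * µ * T * (1 – F)] / (1 – F * c)
(Note: Strictly, N+ already contains one factor of 1 / (1 – F * c), so the exact transduction term is c * b * µ * T * (1 – F) / (1 – F * c).  Because c is small this factor is very close to 1, and it is dropped here to keep the algebra simple.)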

When the value of this expression is greater than 1, GTA+ is increasing; when it is less than 1, GTA+ is decreasing.
For simplicity, below I combine b, µ & T as the compound variable W (W = b * µ * T, the number of G+ particles produced per burst).

What happens if we vary F, holding everything else constant?
Increase of GTA+ depends only on W.  If W is >1, GTA+ increases.  If W is <1, GTA+ decreases.
The rate of change is very slow when F is close to 1 (when almost all cells are GTA+), and fast when F is close to 0 (when almost all cells are GTA-).
What happens if we vary c, holding everything else constant?
c affects how fast change happens, but not its direction.  If W>1, GTA+ still spreads; if W<1, GTA+ still decreases.
What happens if we vary W, holding everything else constant?
If W<1, the numerator will always be smaller than the denominator.
If W>1, the numerator will always be larger than the denominator.
(The numerator minus the denominator is c * (1 – F) * (W – 1), which has the same sign as W – 1.)
In both cases, all the other parameters cancel out.  This confirms that the direction of selection on GTA+ depends only on whether W is higher or lower than 1.
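These claims are easy to check numerically.  The sketch below reads the F’ / F expression with the whole numerator over (1 – F * c), and evaluates it over a grid of F and c values for a given W.  The function names and the grid values are arbitrary choices made for this illustration.

```python
def growth_ratio(F, c, W):
    """F'/F, with the whole numerator divided by (1 - F*c)."""
    return ((1 - c) + c * W * (1 - F)) / (1 - F * c)


def directions(W, Fs=(0.01, 0.1, 0.5, 0.9, 0.99), cs=(0.001, 0.03, 0.3)):
    """Set of outcomes ('up' or 'down') seen across the grid of F and c."""
    outcomes = set()
    for F in Fs:
        for c in cs:
            outcomes.add("up" if growth_ratio(F, c, W) > 1 else "down")
    return outcomes


for W in (1e-4, 0.5, 2.0):
    print(f"W = {W}: {sorted(directions(W))}")

# Change is faster when GTA+ is rare than when it is common.
print(abs(growth_ratio(0.01, 0.03, 0.5) - 1), abs(growth_ratio(0.99, 0.03, 0.5) - 1))
```

For every W below 1 the grid gives only 'down', and for every W above 1 only 'up', whatever the values of F and c.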
Would the result change if the population were growing?
I don’t think so, since GTA+ and GTA- cells grow at the same rate.

Since plausible values of W are all much lower than 1, I conclude that GTA+ cells cannot increase by GTA-mediated transduction of GTA- cells to GTA+.
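To see what this means over many generations, the update can simply be iterated.  The sketch below uses the F’ / F expression (whole numerator over 1 – F * c) with b = 100 and T = 10^-2 from the parameter list; the value µ = 10^-3 and the starting frequency are placeholders I chose for illustration, since nobody has measured the frequency of large particles.

```python
def next_F(F, c, W):
    """GTA+ frequency in the next generation."""
    return F * ((1 - c) + c * W * (1 - F)) / (1 - F * c)


def simulate(F0, c, W, generations):
    """Return the list of GTA+ frequencies, starting with F0."""
    history = [F0]
    for _ in range(generations):
        history.append(next_F(history[-1], c, W))
    return history


b, mu, T = 100, 1e-3, 1e-2   # mu is a hypothetical value
W = b * mu * T               # G+ particles per burst
history = simulate(F0=0.5, c=0.03, W=W, generations=200)
for generation in (0, 50, 100, 200):
    print(f"generation {generation:3d}: F = {history[generation]:.4f}")
```

GTA+ declines in every generation.  Note that with b = 100 and T no higher than 10^-2, W cannot exceed 1 even if every particle were large.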

GTA could spread by transduction if it did preferentially package the GTA gene cluster into its particles.  Of course, then it would be a phage.
How the model’s assumptions affect this outcome:
Basically, all the assumptions are either neutral or increase the chance that GTA+ will spread. Making the simulation more realistic would just make things worse for GTA+, not better.
The population:
1.  Population size is constant.  Loss of GTA+ cells due to lysis during GTA production is made up by growth of all cells after the transduction step.
I don’t think adding growth would affect the outcome.
2.  Dense, well-mixed culture in liquid medium (so cells frequently encounter GTA particles).
If the culture were more dilute or poorly mixed, some GTA particles would not find new cells to attach to.  This would reduce the amount of transduction (effectively reducing W).
GTA production:
3.  GTA particles come in two sizes.  Small particles contain 4 kb DNA fragments.  The hypothetical large particles contain fragments that must be at least 14 kb (the size of the GTA gene cluster) but could be as big as 50 kb. 
This is the central assumption of the model.  The size of the small particles is known.  The hypothesized large particles could be as small as 15 kb (allows a bit of homologous sequence on each side of the cluster to promote recombination).  Phage capsids can in principle be very large, but it’s parsimonious to assume a modest size.
4.  The number of GTA particles a cell produces does not depend on the proportion of small and large particles.
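Large capsids will require more capsid protein molecules, so a cell that makes some large particles would probably make fewer particles in total.  Relaxing this assumption would reduce W.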
5.  DNA packaging by GTA is random; all parts of the cell’s genome are equally represented.  But in this model we only consider the particles containing the full-length GTA cluster.
Experimental results show slightly less packaging of GTA sequences.  If this applies to the hypothetical large particles it would reduce production of G+ particles.  If particles preferentially package GTA, GTA would be a phage.
6.  This is the killer:  If the cell’s chromosome is 5 Mb and the large-particle capacity is 15 kb, only 2x10^-4 of large particles will contain complete GTA gene clusters (will be G+ particles).  If we change the large-particle capacity to 20 kb, then about 1x10^-3 of large particles will contain a complete cluster.  A 50 kb capacity and a 3 Mb chromosome would probably get it up to about 10^-2.  (And this ignores the recombination machinery’s need for homologous DNA flanking the GTA cluster to promote recombination.)
See point 3 above.
Transduction:
7.  GTA- cells completely lack the main GTA gene cluster.  They can only be converted to GTA+ by G+ particles.
Transduction depends on homologous recombination.  Small GTA particles can transduce functional alleles of individual GTA genes, replacing versions that became mutated or even deleted in an ancestor of the recipient cell.  But they cannot introduce GTA genes into cells that completely lack the GTA cluster, because there will be no homologous sequences to recombine with.
8.  GTA particles cannot tell the difference between GTA+ and GTA- recipients.  Particles capable of transducing GTA- cells to GTA+ can also ‘transduce’ GTA+ cells to GTA+.
I think some phages and conjugative plasmids may be able to detect whether potential hosts/recipients already have the element, but we have no evidence that transduction frequencies differ between GTA+ and GTA- recipients.  Wall et al (1975) surveyed 33 strains and found wide variation in both GTA production and transduction, but no correlation between these abilities.
9.  All GTA particles produced in one cycle are taken up by and transduce cells in that cycle.  (The efficiency of infection and recombination is 1.) 
This is unlikely to be true, but assuming this increases the chance that each G+ particle successfully transduces a GTA- cell to GTA+.
If we were to relax this assumption the model would need to include an explicit uptake process and to specify what happens to particles that are not taken up.
10. The model ignores large and small GTA particles that don’t transduce GTA+. 
This should be OK, since these should not interfere with transduction by G+ particles, especially because their total number per cell will be small.  Removing this assumption would make spread of GTA+ less likely.
11. Each cell takes up only one G+ particle (or none). 
This is a reasonable assumption, since the number of G+ particles is always going to be much smaller than the number of cells.  If the number of G+ particles were high, sometimes two G+ particles might inject their DNAs into the same cell, which would reduce the efficiency of transduction.
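A rough sense of how safe assumption 11 is can be had by treating particle attachment as a Poisson process.  That is an extra assumption of this sketch, not part of the model, and the function name is invented.  If each cell receives on average n G+ particles, the sketch gives the share of 'hit' cells that received two or more.

```python
import math


def multi_hit_share(n_plus):
    """Among cells hit by at least one G+ particle, the share hit by two or more.

    Assumes the number of G+ particles per cell is Poisson-distributed
    with mean n_plus (an illustrative assumption).
    """
    p_none = math.exp(-n_plus)    # probability a cell receives no G+ particle
    p_one = n_plus * p_none       # probability it receives exactly one
    return (1 - p_none - p_one) / (1 - p_none)


for n_plus in (1e-3, 1e-1, 1.0):
    print(f"mean {n_plus} G+ particles per cell: {multi_hit_share(n_plus):.4f}")
```

When G+ particles are much rarer than cells the share of double hits is negligible, and it only becomes substantial when the mean approaches one G+ particle per cell.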




Thinking about Gene Transfer Agent

I'm at Dartmouth for three months, working with Olga Zhaxybayeva's group to improve our evolutionary understanding of Gene Transfer Agent.  I'm writing an R-script simulation of the genetic exchange it causes (finally learning R), but my control runs with epistasis don't give the expected results.  So I'm writing this post and creating a PowerPoint deck to clarify my thinking.

First, what's Gene Transfer Agent?  A number of different kinds of bacteria produce 'transducing particles' called Gene Transfer Agents.  These look like small phage capsids but they don't usually contain phage DNA; instead they contain random fragments of chromosomal DNA.  In the best-characterized GTA ('RcGTA'), these are all 4.4 kb in length, which appears to be the DNA capacity of the tiny GTA heads.  Like phage, GTA particles inject their DNA into recipient cells (usually of the same species), where it often recombines with the chromosome and can change the cell's genotype.



GTA particles aren't infectious like phages are, both because they don't preferentially package the DNA that encodes them and because their heads are too small to contain this DNA.  The RcGTA head and tail proteins are encoded by a 14 kb gene cluster.  The sequences and organization of these genes strongly resemble those of homologous phage genes, so the known GTA systems are generally thought to have descended from what were integrated prophages.

In lab cultures of cells with the RcGTA genes (Rhodobacter capsulatus cells), GTA is produced mainly after exponential growth has ceased, and only produced by a small subset of cells. Like release of phage particles from infected cells, release of GTA requires lysis of the cell, and the genes for the holin and endolysin proteins are encoded separately from the main RcGTA cluster.



There are good reasons to think that GTAs are not simply defective prophages that still can package small DNA fragments:
  1. The main RcGTA gene cluster has been somewhat stably inherited over a very long time, maybe more than a billion years.  Some descendants have lost all the genes, but about 25% of the 225 alpha-proteobacterial genomes examined have retained versions of a single large cluster, typically containing 14-17 co-transcribed genes, most of which encode capsid head and tail proteins.
  2. Expression of this gene cluster is at least partly controlled by cellular regulatory mechanisms.  
  3. Other genes, at other chromosomal locations, are also needed for efficient RcGTA production.
I just crunched some numbers from a detailed phylogenetic tree for the alpha-proteobacteria showing which taxa have GTA.  The large GTA cluster is only found in a subclade (148 taxa, 109 distinct species names); the authors estimate that this subclade is 1.0 - 1.4 billion years old.  57% of the taxa in this subclade have the large GTA gene cluster.

My goal for these three months is to generate models of GTA evolution (probably computer simulations) that evaluate the following candidate explanations for its persistence:

  1. Infectious spread of GTA by rare large-head particles that package the 14 kb gene cluster.
  2. Restoration of mutated GTA genes by unidirectional recombination with functional alleles from GTA-producing cells.
  3. Beneficial recombination of chromosomal genes.
Flawed model for Explanation 1:  Nobody has seen the large heads postulated by Explanation 1, but nobody has explicitly looked for them.  The Zhaxybayeva lab already has an unpublished mathematical model that addresses this explanation, created by a mathematically-inclined former post-doc.  It asks how frequent such heads would need to be in order to maintain GTA-producing cells in a mixed population of GTA+ cells and GTA- cells lacking the gene cluster.  The model assumes that large heads are produced at frequency µ, and that these inject the GTA gene cluster into GTA- cells, converting them into GTA+ cells.  Only a small fraction of GTA+ cells are activated to produce GTA in any one generation, and these lyse after GTA production.

The conclusion from this model is that GTA+ cells can persist at high frequency even if they only make one large particle for every 10^5 normal small particles.  Because the model assumed a reasonable 'burst size' of 100 GTA particles per producer cell, this means that GTA+ can persist if only one cell in a thousand produces a single large particle.

But I didn't think this result could be correct.  Since each cell lysis destroys a GTA+ cell and only one in a thousand creates a new GTA+ cell from a GTA- cell, the GTA+ population should be continually decreasing.  Production of new GTA+ cells only compensates for 0.1% of the loss of GTA+ cells.  
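The back-of-envelope numbers behind this objection can be laid out in Python.  This is only a sketch: the function name is invented, and it generously assumes that every large particle carries a complete gene cluster and converts a GTA- cell, so it gives an upper bound on the gain.

```python
def new_gta_plus_per_lysis(small_per_large, burst_size):
    """Upper bound on GTA+ cells gained for each GTA+ cell lost to lysis."""
    large_fraction = 1 / small_per_large   # approximate share of particles that are large
    return burst_size * large_fraction     # large particles released per lysed cell


gain = new_gta_plus_per_lysis(small_per_large=1e5, burst_size=100)
print(f"gain per lysis: {gain:.3f} ({gain:.1%} of the loss)")
```

Each lysis removes one GTA+ cell and, at best, creates about 0.001 new ones.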

I initially had a hard time fully understanding the mathematics of this model.  It included expressions for logistic growth, which complicated the math without adding anything to its utility.  So I created my own version of this model, which gave a very different answer.

New model for Explanation 1:  I'm going to put the description of this model into another post, because here I want to get on to my beneficial recombination model.  Bottom line: the model's result is that transduction of the GTA gene cluster by large-head GTA particles can't come close to maintaining GTA+ cells in a mixed population even if every cell produces a large-head particle.  This is because:

  1. All cells that produce GTA die; 
  2. Only a small fraction of large-head particles will contain a complete gene cluster (maybe 0.1 to 1%); 
  3. Except when GTA+ cells are rare, many particles will attach to GTA+ cells rather than to GTA- cells; 
  4. In a natural environment many GTA particles will fail to find recipients.  (This issue isn't part of the model.)
  5. To overcome these obstacles each GTA-producing cell would need to produce more than 1000 (10,000? 100,000?) large-head particles.
Finding the flaw in the lab's model:  Assuming that I understand the lab's model correctly, the main error is that it 'corrects' for the probability that a GTA particle will attach to a GTA+ cell rather than a GTA- cell by multiplying by the number of GTA- cells rather than by their frequency.  Since the model assumes populations of 10^7 to 10^9 cells, this overestimates the amount of transduction by orders of magnitude, leading to a comparable underestimate of the frequency of large heads needed to maintain GTA+.
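A toy calculation shows the scale of this mismatch.  It is not the lab's model, whose equations aren't reproduced here; the function name and the 50% GTA+ frequency are invented for illustration.  It simply compares the two possible weightings of the transduction term.

```python
def gta_minus_weighting(total_cells, gta_plus_frequency):
    """Return (frequency, count) of GTA- cells for a given population."""
    frequency = 1 - gta_plus_frequency    # fraction of cells that are GTA-
    count = frequency * total_cells       # number of cells that are GTA-
    return frequency, count


for total_cells in (1e7, 1e9):
    frequency, count = gta_minus_weighting(total_cells, gta_plus_frequency=0.5)
    inflation = count / frequency         # how much the count overstates the frequency
    print(f"{total_cells:.0e} cells: transduction term inflated {inflation:.0e}-fold")
```

Whatever the GTA+ frequency, using the count in place of the frequency inflates the transduction term by a factor equal to the population size.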

Model for Explanation 2:  I modified the basic structure of my Explanation 1 model to consider a related hypothesis.  Defective alleles of GTA genes are expected to arise by random mutation.  At least some of these will also prevent the cell from lysing when GTA production is induced.  These cells can still receive functional alleles of their defective genes from GTA particles produced by 'wildtype' cells, but they can't transmit their defective alleles to the wildtype cells because they can't produce GTA.  This asymmetry favours spread of functional alleles, and might be able to maintain GTA, although it wouldn't allow GTA+ to spread to cells that completely lack the GTA genes.

As with the model for Explanation 1, the result is a strong NO.  Because the models are very similar, it's not surprising (in retrospect) that spread of functional alleles faces the same obstacles:

  1. All cells that produce GTA die; 
  2. Only a small fraction (about 0.1%) of particles will contain whatever GTA gene is mutated in a recipient cell; 
  3. Except when GTA+ cells are rare, many particles will attach to cells with the functional allele rather than to those with the mutated allele; 
  4. In a natural environment many GTA particles will fail to find recipients.  (This issue isn't part of the model.)
  5. To overcome these obstacles each GTA-producing cell would need to produce more than 1000 (10,000? 100,000?) GTA particles.
Models for Explanation 3:  Most microbiologists assume that GTAs are maintained in their genomes by selection for presumed benefits of chromosomal recombination.   They implicitly assume that randomizing the combinations of chromosomal alleles in a population creates a benefit strong enough to overcome the cost of the cell death associated with GTA production.  They don't explicitly assume this, because they're not used to thinking rigorously about evolutionary processes.  Instead their explanation usually relies on GTA-mediated recombination creating some specific beneficial new combination, and ignores the selective costs associated with other combinations.

In fact, many very smart people have spent many years looking for conditions where random chromosomal recombination creates benefits strong enough to maintain the genes that cause it.  These 'evolution of sex' models have identified some conditions, but usually these benefits are small and occur only under special circumstances.  Most of the time recombination appears to be a waste of time at best.

Recombination Model 1:  Way back when I was a new post-doc spending a year in Dick Lewontin's lab, I developed a computer-simulation model of recombination by natural transformation (Redfield 1988, Evolution of bacterial transformation: Is sex with dead cells ever better than no sex at all?).  In this model I applied a relatively simple model of the evolution of sex to a population of naturally competent bacteria.  My first goal for addressing Explanation 3 is to adapt this model so it applies to recombination caused by GTA rather than by natural transformation.  I'll describe my progress (and current deadlock) in the next post.

Recombination Model 2:  Model 1 is 'deterministic'; it ignores random ('stochastic') events, effectively assuming that the population is infinitely large.  But the strongest benefits of recombination are now thought to arise from precisely the stochastic effects Model 1 ignores.  So I also want to make a stochastic model that tracks individual cells, or at least a model that takes stochastic processes into account.  I haven't started writing this model yet, but I might pattern it on the transformation model described by Takeuchi et al, 2014.