Field of Science

Unexpected differences in uptake of DNA from two closely related strains

The PhD student's long careful reanalysis of the DNA uptake data has finally produced uptake ratio plots.  These confirm a surprising difference between the DNAs from two closely related strains, 86-028NP ('NP') and PittGG ('GG').  We also saw this difference in our preliminary analysis, but we thought it might be an artefact of how the analysis was done.

In the experiment underlying this data, cells of a third strain, KW20, took up DNA that had been purified from NP or GG cells.  We recovered the taken-up DNA and sequenced it, comparing how well each position in the ~1,800,000 bp genome was represented in the 'uptake' DNA relative to parallel sequencing of the 'input' NP or GG DNA.
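As a rough illustration of the kind of per-position comparison described here, below is a minimal Python sketch. It is not necessarily the calculation used in the actual analysis: the function name `uptake_ratio` and the choice to normalize each sample by its own mean sequencing depth are assumptions made for this example.

```python
def uptake_ratio(uptake_cov, input_cov):
    """Per-position uptake ratio from two coverage vectors.

    uptake_cov, input_cov: read depth at each genome position in the
    'uptake' and 'input' samples (same length, same coordinate order).
    Each sample is divided by its own mean depth (an assumed
    normalization) so that differences in sequencing effort cancel out.
    Positions with zero input coverage get None, because the ratio is
    undefined there.
    """
    if len(uptake_cov) != len(input_cov):
        raise ValueError("coverage vectors must have the same length")
    if not uptake_cov:
        raise ValueError("coverage vectors must not be empty")

    mean_uptake = sum(uptake_cov) / len(uptake_cov)
    mean_input = sum(input_cov) / len(input_cov)
    if mean_uptake == 0 or mean_input == 0:
        raise ValueError("a sample has no coverage at all")

    ratios = []
    for u, i in zip(uptake_cov, input_cov):
        if i == 0:
            ratios.append(None)
        else:
            ratios.append((u / mean_uptake) / (i / mean_input))
    return ratios
```

With this normalization, a ratio of 1 means a position is represented in the uptake DNA exactly as well as in the input DNA; values above 1 are peaks and values below 1 are valleys.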

We expected to see peaks and valleys of high and low DNA uptake, because we knew:

  • that the DNA of each strain contains many occurrences of a short sequence that's strongly preferred by the DNA uptake machinery ('uptake sequences'), and
  • that the DNA had been broken into fragments so small that most of them wouldn't contain this sequence.
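The two points above can be made concrete with a small Python sketch that locates a motif on either strand and asks what fraction of fixed-length fragments would contain it. Everything here is illustrative: the post does not give the uptake sequence or the fragment length, so `EXAMPLE_MOTIF` is only a placeholder value, the function names are hypothetical, and the genome is treated as linear rather than circular to keep the code short.

```python
EXAMPLE_MOTIF = "AAGTGCGGT"  # placeholder value, not taken from the post

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def find_motif_starts(genome, motif):
    """0-based start positions where the motif occurs on either strand."""
    genome = genome.upper()
    motif = motif.upper()
    targets = {motif, reverse_complement(motif)}
    k = len(motif)
    return [i for i in range(len(genome) - k + 1) if genome[i:i + k] in targets]


def fraction_of_fragments_with_motif(genome, motif, fragment_len):
    """Fraction of all fragment_len windows that fully contain the motif.

    This is a brute-force scan, fine for short test sequences but too
    slow for a whole genome.
    """
    n_windows = len(genome) - fragment_len + 1
    if n_windows <= 0:
        return 0.0
    starts = find_motif_starts(genome, motif)
    k = len(motif)
    containing = 0
    for w in range(n_windows):
        if any(w <= s and s + k <= w + fragment_len for s in starts):
            containing += 1
    return containing / n_windows
```

The shorter the fragments relative to the spacing between motif occurrences, the smaller this fraction becomes, which is why fragments lacking the sequence would show up as valleys.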

The two strains' DNA sequences are only 2-3% different, and we've found that uptake sequences are usually less variable between strains than other sequences.  Thus we expected the overall pattern of uptake to be very similar between the two strains (approximately the same number of peaks, and approximately the same distribution of peak heights).

The graphs below show that the numbers of peaks are quite similar, but their height distributions are not.  For DNA from strain NP (upper graph), most of the peaks have quite similar heights, and almost all are between 3.5 and 4.5.  But DNA from strain GG (lower graph) has much more variation, with many peaks below 2.5 and many higher than 5 or even 10.

Below is the same data, this time plotted on log scales.  This lets you see how deep the valleys are, and how high the highest GG peaks are.
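One way to put numbers on this visual comparison is to summarize each strain's list of peak heights with the thresholds mentioned above. The Python sketch below assumes the peak heights have already been extracted from the uptake ratio data; the function name `summarize_peak_heights` and the use of the standard deviation of log10 heights as a spread measure are choices made for this illustration, not part of the original analysis.

```python
import math
import statistics


def summarize_peak_heights(heights, core_range=(3.5, 4.5), low=2.5, high=5.0):
    """Summarize a list of positive peak heights (uptake ratios).

    core_range, low and high default to the thresholds quoted in the
    text. The spread is measured on log10-transformed heights so that
    very tall and very short peaks are treated symmetrically.
    """
    if not heights:
        raise ValueError("no peaks supplied")
    if any(h <= 0 for h in heights):
        raise ValueError("peak heights must be positive to take logs")

    n = len(heights)
    log_heights = [math.log10(h) for h in heights]
    return {
        "n_peaks": n,
        "frac_in_core": sum(core_range[0] <= h <= core_range[1] for h in heights) / n,
        "frac_below_low": sum(h < low for h in heights) / n,
        "frac_above_high": sum(h > high for h in heights) / n,
        "sd_log10": statistics.pstdev(log_heights),
    }
```

Running this separately on the NP and GG peak lists would give two small tables that can be compared directly: similar `n_peaks` but a larger `sd_log10` for GG would match what the graphs show.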



Cause of the strain differences?

We don't know what causes this difference.  We'd expect it to be differences in the sequences of the two genomes, since both DNAs were highly purified before use.  But it could be a methylation difference, since the two strains might contain different methylation systems, especially those associated with restriction-modification genes.

In principle, sequence differences in the uptake sequences could accumulate over evolutionary time if one strain had lost the ability to take up DNA.  But in lab experiments strains GG and NP both transform poorly relative to the highly transformable lab workhorse strain KW20 (NP a bit worse than GG).

How to find out the cause?

In his preliminary analysis the PhD student examined uptake sequences associated with the high and low GG peaks and didn't see any obvious differences.  We'll want to do this again with the improved datasets.

We can do this at a more detailed level, examining specific uptake sequence occurrences at positions of high and low uptake.  We should particularly focus on parts of the genomes where the NP and GG genomes are 'syntenic' - where they have homologous sequences in homologous locations.  That will let us compare pairs of NP and GG uptake sequences that we know share a recent evolutionary ancestor.
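A pairwise comparison of this kind could be sketched as follows in Python. It assumes that syntenic pairs of uptake sequences have already been identified and aligned to equal length, and that each member of a pair has an associated uptake ratio; the function name `compare_syntenic_pairs` and the tuple layout are hypothetical and chosen only for this example.

```python
import math


def compare_syntenic_pairs(pairs):
    """Compare homologous NP and GG uptake sequences.

    pairs: iterable of (np_seq, gg_seq, np_ratio, gg_ratio) tuples, where
    the two sequences have equal length and both ratios are positive.
    Returns one dict per pair with the 0-based mismatch positions and the
    log2 fold difference in uptake (GG relative to NP).
    """
    rows = []
    for np_seq, gg_seq, np_ratio, gg_ratio in pairs:
        if len(np_seq) != len(gg_seq):
            raise ValueError("paired sequences must have the same length")
        if np_ratio <= 0 or gg_ratio <= 0:
            raise ValueError("uptake ratios must be positive")
        mismatches = [
            i
            for i, (a, b) in enumerate(zip(np_seq.upper(), gg_seq.upper()))
            if a != b
        ]
        rows.append(
            {
                "mismatch_positions": mismatches,
                "log2_gg_over_np": math.log2(gg_ratio / np_ratio),
            }
        )
    return rows
```

Pairs with identical sequences but a large log2 difference would be the interesting ones: they would point away from the uptake sequence itself and toward flanking sequence or something like methylation.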

Let's not rush into this

I'm keen to find out what's going on, but I think it's important to exercise restraint.  We should proceed systematically through the analyses we've planned, rather than jumping onto this tempting problem.
