RRResearch: Planning the DNA sequencing part of the PhD student's project

The former post-doc (I'll call him the FPD) visited yesterday afternoon, and we had intense discussions of how to proceed with both the RNAseq work (summarized here on our Sense Strand blog) and with the PhD student's planned DNA uptake experiments.

His planned experiments take advantage of the phenotype of a rec2 knockout mutation. These cells take up DNA normally across the outer membrane, into the periplasmic space, but they cannot transport it across the inner cell membrane. This allows him to recover intact DNA that has been taken up, and to use DNA sequencing to compare it to the input DNA the ∆rec2 cells were given.

Some of the experiments will use genomic DNA of the species being tested, fragmented to appropriate length distributions, and some will use synthetic DNA fragments (~200 bp) containing a 30-50 bp stretch of random sequence (see figure).

The FPD, who developed the synthetic fragment protocol, pointed out that his experiments had used full lanes of Illumina sequencing only because it was not then possible for us to 'barcode' our different DNA samples and mix them for sequencing as a single lane. The sequencing depth he obtained was useful, but it will be extreme overkill for the experiments the PhD student plans. So we need to design barcoding into our analyses, so we can mix up to 24 samples in one lane for sequencing, and then separate the resulting sets of sequence reads by their different barcodes. We'll still need to use two lanes, because each 'recovered' sample will need to have a corresponding identical 'input' sample. Because these samples will have the same barcode they could not be distinguished if they were sequenced in the same lane.
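The sorting step could look something like the sketch below. It is only an illustration: it assumes each read arrives with its lane number and an exactly matching barcode, and the barcodes, sample names, lane assignments and read sequences are all made up.

```python
from collections import defaultdict

def demultiplex(reads, barcode_to_sample, lane_to_role):
    """Group reads by (role, sample).

    reads: iterable of (lane, barcode, sequence) tuples.
    barcode_to_sample: dict mapping each known barcode to a sample name.
    lane_to_role: dict mapping lane number to 'input' or 'recovered'.
    Reads with an unrecognised barcode are binned as 'unassigned'.
    """
    bins = defaultdict(list)
    for lane, barcode, sequence in reads:
        sample = barcode_to_sample.get(barcode, "unassigned")
        role = lane_to_role[lane]
        bins[(role, sample)].append(sequence)
    return dict(bins)

# Hypothetical barcodes, lanes and reads, for illustration only.
barcode_to_sample = {"ACGTACGT": "replicate_1", "TGCATGCA": "replicate_2"}
lane_to_role = {1: "input", 2: "recovered"}
reads = [
    (1, "ACGTACGT", "GATTACA"),
    (2, "ACGTACGT", "TTTGGGA"),
    (2, "TGCATGCA", "CCGGTTA"),
    (2, "NNNNNNNN", "AAAAAAA"),
]
by_sample = demultiplex(reads, barcode_to_sample, lane_to_role)
```

Because the lane is part of the key, the 'input' and 'recovered' versions of a sample stay separate even though they carry the same barcode.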

So rather than doing one very deeply sequenced experiment, he'll be able to do multiple replicates, each sequenced at a moderate but entirely adequate depth. If he uses a HiSeq machine for the sequencing, he'll be able to get 1.6 x 10^8 reads for each of 12 samples; with a NextSeq this would give 4 x 10^8 reads per sample. (Is that right: per sample, not per lane?)
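To check the per-sample versus per-lane question, here is a rough calculation. It assumes the yields are per lane, uses the HiSeq figures suggested in the comments below (over 160M reads with V3 chemistry, up to 250M with V4), and assumes the samples are pooled perfectly evenly, which is optimistic.

```python
def reads_per_sample(reads_per_lane, samples_per_lane):
    """Expected reads per sample if a lane is shared perfectly evenly."""
    return reads_per_lane / samples_per_lane

# Per-lane yields taken from the comment thread; treat them as rough guides.
lane_yields = {"HiSeq V3": 160e6, "HiSeq V4": 250e6}

for label, lane_reads in lane_yields.items():
    for n_samples in (12, 24):
        per_sample = reads_per_sample(lane_reads, n_samples)
        print(f"{label}, {n_samples} samples per lane: "
              f"{per_sample / 1e6:.1f}M reads per sample")
```

With 12 samples per lane this gives roughly 13M to 21M reads per sample, so a figure of 1.6 x 10^8 looks like a per-lane number rather than a per-sample one.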

One issue to keep in mind is that it would be foolish to save all the sequencing for one big batch at the end of the thesis work. Instead the work needs to be designed with an initial set of samples to be sequenced, so he can (1) tell whether everything is working as it should, and (2) begin analyzing sequence data from one part of the project while generating additional samples for other parts. For a preliminary batch of sequencing, it might be better to use a MiSeq machine, whose smaller capacity would let us sequence a few samples more economically.

We also talked about how long the random-sequence segments should be in the 200 bp fragments, and about where to locate the barcode segments. These consist of an independent sequencing primer followed by 8 bp that identify the source experiment. Putting these to the right of the random segment will let him efficiently create the double-stranded 200 bp fragments, using the same long left-side oligo (containing the random segment) with many different right-side oligos, each containing a different barcode.
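One way to choose the 8 bp barcodes is sketched below. This is an assumption on my part, not something we settled on: it greedily picks barcodes that differ from each other at three or more positions, so that a single sequencing error cannot turn one barcode into another. The primer sequence is a placeholder, and a real design would probably also screen out homopolymer runs and extreme GC content.

```python
from itertools import product

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def pick_barcodes(n, length=8, min_distance=3):
    """Greedily pick n barcodes with pairwise Hamming distance >= min_distance."""
    chosen = []
    for candidate in product("ACGT", repeat=length):
        barcode = "".join(candidate)
        if all(hamming(barcode, other) >= min_distance for other in chosen):
            chosen.append(barcode)
            if len(chosen) == n:
                return chosen
    raise ValueError("not enough barcodes satisfy the distance constraint")

def right_side_oligos(barcodes, primer):
    """Build one right-side oligo per barcode: sequencing primer, then barcode."""
    return [primer + barcode for barcode in barcodes]

# Placeholder primer sequence, for illustration only.
PLACEHOLDER_PRIMER = "ATCGATCGATCG"

barcodes = pick_barcodes(24)
oligos = right_side_oligos(barcodes, PLACEHOLDER_PRIMER)
```

Each of the 24 right-side oligos would then be paired with the same long left-side oligo carrying the random segment.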

2 comments:

Moritz, March 23, 2015 at 1:04 AM
So, of the 200 bp DNA you want to give to the bacteria, 150 bp will always be the same?
I'd fear to get a strong bias there. What if one barcode happens to be a stretch of DNA the bacteria preferentially take up (or don't like to take up)? This would skew all experiments with different barcodes.
Or the bacteria like the flowcell priming sequence, maybe so much that the random sequence becomes completely irrelevant!

I'd use only the DNA you are interested in as an input and add all the DNA needed for the sequencing afterwards.

James@cancer, April 9, 2015 at 6:45 AM
Rosie, you should get over 160M from HiSeq running V3 chemistry and up to 250M from V4; talk to your provider about the version they are using. It is unclear to me why the two samples (random and gDNA) need to get the same barcodes. Pooling 12 samples per lane is likely to return 13-20M reads per sample on HiSeq; this will only be the case if you get the balance of samples spot-on. I'd recommend qPCR of the individual libraries if this is important (KAPA). Sounds like a fun PhD project!
