The former post-doc (I'll call him the FPD) visited yesterday afternoon, and we had intense discussions of how to proceed with both the RNAseq work (summarized here on our Sense Strand blog) and with the PhD student's planned DNA uptake experiments.
The PhD student's planned experiments take advantage of the phenotype of a rec2 knockout mutation. These cells take up DNA normally across the outer membrane into the periplasmic space, but they cannot transport it across the inner cell membrane. This allows him to recover intact DNA that has been taken up, and to use DNA sequencing to compare it to the input DNA the ∆rec2 cells were given.
Some of the experiments will use genomic DNA of the species being tested, fragmented to appropriate length distributions, and some will use synthetic DNA fragments (~200 bp) containing a 30-50 bp stretch of random sequence (see figure).
The FPD, who developed the synthetic fragment protocol, pointed out that his experiments had used full lanes of Illumina sequencing only because it was not then possible for us to 'barcode' our different DNA samples and mix them for sequencing in a single lane. The sequencing depth he obtained was useful, but it would be extreme overkill for the experiments the PhD student plans. So we need to design barcoding into our analyses, so that we can mix up to 24 samples in one lane for sequencing and then separate the resulting sets of sequence reads by their different barcodes. We'll still need to use two lanes, because each 'recovered' sample needs a corresponding 'input' sample made from the identical DNA. Because the two samples in each pair will carry the same barcode, they could not be distinguished if they were sequenced in the same lane.
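The sorting step is simple enough to sketch. This is only an illustration of the idea, not the pipeline we will actually use: I'm assuming each read arrives as a pair (barcode read, fragment read), that barcodes must match exactly, and the sample names are made up.

```python
from collections import defaultdict


def demultiplex(read_pairs, barcode_to_sample):
    """Sort fragment reads into per-sample bins by their barcode read.

    read_pairs: iterable of (barcode_read, fragment_read) strings.
        This pairing is an assumption about the data layout.
    barcode_to_sample: dict mapping each barcode to a sample name
        (the names are hypothetical).
    Reads whose barcode is not in the dict go to an 'unassigned' bin.
    """
    bins = defaultdict(list)
    for barcode_read, fragment_read in read_pairs:
        sample = barcode_to_sample.get(barcode_read, "unassigned")
        bins[sample].append(fragment_read)
    return dict(bins)
```

Run once per lane, this gives one bin per barcode. The 'input' and 'recovered' bins for the same barcode then come from the two different lanes.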
So rather than doing one very deeply sequenced experiment, he'll be able to do multiple replicates, each sequenced at a moderate but entirely adequate depth. If he uses a HiSeq machine for the sequencing, he'll be able to get 1.6 x 10^8 reads for each of 12 samples; a NextSeq would give 4 x 10^8 reads per sample. (Is that right: per sample, not per lane?)
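To see how much the per-sample versus per-lane question matters, here is the arithmetic as a small sketch. It assumes reads are split evenly among the pooled samples, which real runs only approximate, and it treats 1.6 x 10^8 as a per-lane figure purely for the sake of comparison.

```python
def reads_per_sample(reads_per_lane: float, samples_per_lane: int) -> float:
    """Expected reads per sample if a lane's reads divide evenly among samples."""
    if samples_per_lane < 1:
        raise ValueError("need at least one sample per lane")
    return reads_per_lane / samples_per_lane


# If 1.6e8 were the yield of a whole lane (an assumption, not a confirmed figure):
print(f"{reads_per_sample(1.6e8, 12):.2e}")  # 1.33e+07 with 12 samples pooled
print(f"{reads_per_sample(1.6e8, 24):.2e}")  # 6.67e+06 with 24 samples pooled
```

Under that reading each sample would get roughly a tenth as many reads as the per-sample interpretation, so the question is worth settling before planning the replicates.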
One issue to keep in mind is that it would be foolish to save all the sequencing for one big batch at the end of the thesis work. Instead the work needs to be designed with an initial set of samples to be sequenced, so he can (1) tell whether everything is working as it should, and (2) begin analyzing sequence data from one part of the project while generating additional samples for other parts. For a preliminary batch of sequencing, it might be better to use a MiSeq machine, whose smaller capacity would let us sequence a few samples more economically.
We also talked about how long the random-sequence segments should be in the 200 bp fragments, and about where to locate the barcode segments. These consist of an independent sequencing primer followed by 8 bp that identify the source experiment. Putting these to the right of the random segment will let him efficiently create the double-stranded 200 bp fragments, using the same long left-side oligo (containing the random segment) with many different right-side oligos, each containing a different barcode.
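We didn't settle how the 8 bp barcodes themselves would be chosen. One common approach, sketched here as an assumption rather than as our plan, is to pick barcodes that differ from each other at several positions, so that a single sequencing error cannot turn one barcode into another. The minimum distance of 3 and the function names are my own illustrative choices.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def pick_barcodes(n: int = 24, length: int = 8, min_dist: int = 3) -> list:
    """Greedily collect n barcodes that differ pairwise at >= min_dist positions.

    Candidates are tried in alphabetical order, so the result is deterministic.
    No other filters (GC content, homopolymer runs) are applied here.
    """
    chosen = []
    for bases in product("ACGT", repeat=length):
        candidate = "".join(bases)
        if all(hamming(candidate, c) >= min_dist for c in chosen):
            chosen.append(candidate)
            if len(chosen) == n:
                return chosen
    raise ValueError("not enough barcodes satisfy the distance constraint")
```

Each barcode chosen this way would go into its own right-side oligo, to be paired with the shared left-side oligo. A real design would probably also screen out barcodes with long single-base runs or extreme GC content.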