For each of our 12 genuinely contaminated uptake samples we want to create multiple replicate fake-contaminated input samples, each fake-contaminated with an independent set of Rd reads at that sample's level of contamination.
For example, our UP01 sample has 5.3% Rd contamination. Its corresponding input sample is UP13. UP13 has about 2.7x10^6 reads, so to make a fake-contaminated sample for UP13 we need to add (2.7x10^6 * 0.053)/(1-0.053) = 1.5x10^5 Rd reads to the UP13 reads. Since our Rd sample contains 4,088,620 reads, for UP01 we could make 27 such fake-contaminated sets (4,088,620 / 1.5x10^5, rounded down). Other samples might need more Rd reads per set, and for those we wouldn't be able to make as many sets.
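The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and the UP13 read count is the rounded 2.7x10^6 figure from the text, not the exact count.

```python
def rd_reads_needed(input_reads, contamination):
    """Rd reads to add so that they make up `contamination` (a fraction)
    of the combined fake-contaminated sample."""
    return input_reads * contamination / (1.0 - contamination)


def max_independent_sets(rd_pool, reads_per_set):
    """How many non-overlapping sets of this size fit in the Rd read pool."""
    return int(rd_pool // reads_per_set)


RD_POOL = 4_088_620   # reads in the Rd sample (from the text)
UP13_READS = 2.7e6    # approximate size of the UP13 input sample
UP01_CONTAM = 0.053   # 5.3% Rd contamination in UP01

needed = rd_reads_needed(UP13_READS, UP01_CONTAM)
n_sets = max_independent_sets(RD_POOL, needed)
print(f"Rd reads per set: {needed:.0f}; independent sets possible: {n_sets}")
```

With these rounded inputs the result is about 1.5x10^5 reads per set and 27 sets, matching the numbers above.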
We'd like to use the same number of replicate sets for each of our 12 uptake samples, so we need to identify the uptake sample that needs the most reads per set, since it will have the lowest number of possible sets. The table below shows that this is sample UP08, which needs 943,299 reads per set and so allows creation of only 4 independent sets (4 independent fake-contaminated input samples). This value is lowest because UP08 has a very high level of Rd contamination (16.6%) and its corresponding input control sample (UP15) is quite large (4.7x10^6 reads). UP15 is not the largest control sample (that's UP16 with 1.0x10^7 reads), but the uptake samples corresponding to UP16 have much less contamination than UP08 does.
So we should plan on creating 4 independent fake-contaminated input samples for each uptake sample, and then using the average coverage of these 4 samples as the denominator in the uptake ratio calculation.
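One way this plan might be carried out is sketched below, assuming the Rd reads can be addressed by index and that coverage is available as a per-position list for each fake-contaminated sample. The function names and the toy numbers are hypothetical, not part of our actual pipeline.

```python
import random


def draw_independent_sets(pool_size, reads_per_set, n_sets, seed=0):
    """Return n_sets non-overlapping lists of read indices from the Rd pool."""
    if reads_per_set * n_sets > pool_size:
        raise ValueError("not enough Rd reads for that many independent sets")
    rng = random.Random(seed)
    indices = list(range(pool_size))
    rng.shuffle(indices)  # random order, so consecutive slices are random sets
    return [indices[i * reads_per_set:(i + 1) * reads_per_set]
            for i in range(n_sets)]


def mean_coverage(coverages):
    """Position-wise mean of several equal-length coverage lists."""
    n = len(coverages)
    return [sum(values) / n for values in zip(*coverages)]


# Toy example: a pool of 100 reads, 4 sets of 20 reads each.
sets = draw_independent_sets(pool_size=100, reads_per_set=20, n_sets=4)

# Toy coverage values at 3 positions for 4 fake-contaminated samples.
toy_coverages = [[10, 12, 8], [11, 13, 9], [9, 11, 7], [10, 12, 8]]
denominator = mean_coverage(toy_coverages)
```

Because the sets are slices of a single shuffled index list, no Rd read is used in more than one set. For the limiting case, UP08, 4 sets of 943,299 reads use 3,773,196 of the 4,088,620 Rd reads.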
Rosie, for UP01 the table says that the contamination is 5.3%, not 2.7%. Did you divide the contamination by 2?
Or maybe you were referring to UP11?
Oops! Corrected, thanks.