He used two DNA preps, one sheared to an average length of about 6 kb (the long-fragment prep) and one sheared to an average length of about 250 bp (the short-fragment prep). He analyzed both with a Bioanalyzer belonging to a neighbouring lab (thanks neighbours!). This produced intensity traces for each sample (red line), with size-standard peaks (blue).
The intensity traces reflect the number of base pairs at each position in the gel, not the number of fragments, so each value needed to be divided by the fragment length at that position to get the distribution of fragment numbers by size. The purple line is the final distribution of fragment sizes. We see that most fragments are between about 75 and 300 bp.
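Here is a minimal sketch of that normalization, assuming the trace has already been converted to (fragment size, intensity) pairs. The function name and the numbers are made up for illustration; they are not the real Bioanalyzer output.

```python
def intensity_to_fragment_distribution(sizes_bp, intensities):
    """Convert a mass-weighted trace into relative fragment numbers.

    Intensity is proportional to base pairs, so dividing by fragment
    length gives a value proportional to the number of fragments.
    """
    counts = [i / s for s, i in zip(sizes_bp, intensities)]
    total = sum(counts)
    return [c / total for c in counts]  # fractions that sum to 1


# Hypothetical values, chosen only to show the effect of the correction.
sizes = [50, 100, 200, 400]
intensity = [1.0, 4.0, 8.0, 4.0]
dist = intensity_to_fragment_distribution(sizes, intensity)
```

In this toy trace the 200 bp position has twice the intensity of the 100 bp position, but after the correction the two sizes contribute the same number of fragments.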
Now, how do we use this information to predict the shape of the expected uptake peak around an uptake-promoting sequence (a USS)?
We first need to calculate the probability that the position we're looking at will be on the same fragment as a USS (call this value 'U').
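One way to sketch this calculation, under assumptions that are mine rather than part of the analysis: the USS is treated as a single point, fragment ends fall at random, and the fragment-size distribution is given as fractions of fragment numbers (not mass). A fragment of length L that covers our position has L equally likely start points, and L - d of them also cover a USS d bp away. Longer fragments are also more likely to cover the position in the first place, in proportion to L. The function name and the size distribution below are hypothetical.

```python
def prob_same_fragment(distance_bp, length_probs):
    """U(d): chance that a fragment covering a position also covers
    a point USS located distance_bp away.

    length_probs maps fragment length (bp) -> fraction of fragments.
    """
    d = abs(distance_bp)
    # Fragments covering the position, weighted by length.
    covering = sum(p * length for length, p in length_probs.items())
    # Of those, the placements that also reach the USS.
    both = sum(p * max(0, length - d) for length, p in length_probs.items())
    return both / covering


# Hypothetical size distribution, for illustration only.
lengths = {100: 0.5, 200: 0.3, 300: 0.2}
peak = [prob_same_fragment(d, lengths) for d in range(-300, 301, 50)]
```

U is 1 at the USS itself and falls to 0 once the distance reaches the longest fragment length, so the predicted peak can be no wider than twice the longest fragment.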
Plotted as a function of distance from the USS, U is our expected peak shape, if all that matters is whether a USS is present anywhere on the DNA fragment. We'll compare this to the average shape of well-isolated uptake peaks in the short-fragment dataset - the PhD student has already made a list of this subset of the peaks.
To do the comparison properly we'll need to take peak height into consideration too. So we should do separate comparisons for different peak-height classes. If the prediction nicely overlays the observed peaks we'll conclude that a USS anywhere on the fragment is equally effective.
If the location of the USS on the fragment matters, or its orientation, the peak would have a different shape. For example, if USSs near the ends of fragments don't promote uptake very well, the observed average peak would be narrower than predicted by fragment sizes.
For another example, if USSs in the forward orientation promote uptake well when they're near the left end of the fragment but poorly when they're near the right end, we might see different peak shapes for the two orientations - skewed right for 'forward' USSs and skewed left for 'reverse' USSs. (Or is that backwards?) If we only looked at the combined set of USSs in both orientations we might miss this effect.
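A toy calculation can check the direction of the skew. Everything here is an assumption made for illustration: a single fragment length, a point USS, and an uptake weight that declines linearly from 1 when the USS is at the left end of the fragment to 0 when it is at the right end.

```python
def oriented_peak(length, max_offset):
    """Relative uptake coverage at each offset from a forward USS at offset 0."""
    coverage = {d: 0.0 for d in range(-max_offset, max_offset + 1)}
    for k in range(length):              # k = distance of the USS from the left end
        weight = 1.0 - k / (length - 1)  # hypothetical: best at the left end
        for d in range(-k, length - k):  # offsets spanned by this fragment
            if d in coverage:
                coverage[d] += weight
    return coverage


peak = oriented_peak(length=200, max_offset=200)
right = sum(v for d, v in peak.items() if d > 0)
left = sum(v for d, v in peak.items() if d < 0)
```

Under these assumptions `right` is larger than `left`: fragments with the USS near their left end extend mostly to its right, so the forward-USS peak has its longer tail on the right. A reverse USS would give the mirror image.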
Is there any other factor we could investigate using this analysis? And what about the large-fragment data - should we treat it the same way?