RRResearch: Understanding the results of the first analysis

The grad student did the analysis I had described in an earlier post. Here's what I had said I expected:

And here's what he found:

His data extend over a larger scale, and there is no empty space on the left below the main peak of points, perhaps just because the dots are too big to resolve. A few uptake ratios are as high as 10, which is also expected. Some of the distances to the nearest 'USS' (position on the USS list) were surprisingly large, beyond the common fragment sizes in the 'short' DNA prep, but these might represent the several places in the genome where USS are widely separated.

The most surprising aspect was the appearance of well-defined lines of points forming peaks at distances longer than the fragment sizes, and the absence of the clusters of points I'd originally hypothesized.

These long-distance peaks made sense once the grad student identified the positions responsible for them and checked their assigned USS scores. At the site of the peak he found a position with a USS score only slightly lower than the cutoff he'd used when generating his list. When he checked the USS scores for the positions of the other long-distance peaks he again found scores that were locally high but below the list cutoff.
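The check described above can be sketched in a few lines of Python. This is only an illustration, not the grad student's actual code: the function name `best_score_near`, the 50 bp window, and all positions and scores are invented, and only the 19.04 cutoff comes from his analysis.

```python
def best_score_near(scores, center, window):
    """Return (position, score) for the highest USS score within
    +/- window bp of center, or None if no scored position is that close."""
    nearby = {pos: s for pos, s in scores.items() if abs(pos - center) <= window}
    if not nearby:
        return None
    best_pos = max(nearby, key=nearby.get)
    return best_pos, nearby[best_pos]

CUTOFF = 19.04  # cutoff used to build the original USS list

# Invented toy scores: position -> USS score (assumption, not real data).
scores = {1000: 21.3, 2480: 9.1, 2500: 18.6, 2530: 11.4}

# Suppose a long-distance uptake peak sits at position 2500 (hypothetical).
hit = best_score_near(scores, center=2500, window=50)
pos, score = hit
below = score < CUTOFF
print(f"best nearby position {pos}, score {score}, below cutoff: {below}")
```

In this toy case the best nearby score is locally high but falls just under the cutoff, which is the pattern he saw at the real peaks.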

The figure below illustrates what we think is going on. First consider the top graph, which is a simpler schematic version of the uptake-ratio graph in the earlier post. It shows two local peaks in uptake, one at the site of a USS on the list, and one at the site of another uptake-promoting sequence. In principle this sequence could be a lower-scoring USS, or it could be an unrelated sequence that also promotes uptake.

The lower graph shows what we expect when these data are replotted with the distance to the nearest 'USS' on the x axis. As I originally expected, points close to the recognized USS give two lines heading down and away from position 0 (the position of that USS). But because the other uptake-promoting position isn't recognized as a 'USS', its points show up farther along the x axis, according to their distance from the position-zero USS.
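Here is a minimal Python sketch of that replotting step, under stated assumptions. The helper `distance_to_nearest` is a hypothetical name, and the positions and uptake ratios are invented toy values: one listed USS at position 1000 and an unlisted uptake-promoting site at 2500.

```python
from bisect import bisect_left

def distance_to_nearest(position, uss_positions):
    """Distance (bp) from position to the closest entry in the
    sorted list uss_positions."""
    if not uss_positions:
        raise ValueError("uss_positions must not be empty")
    i = bisect_left(uss_positions, position)
    candidates = []
    if i < len(uss_positions):      # nearest listed USS at or to the right
        candidates.append(uss_positions[i] - position)
    if i > 0:                       # nearest listed USS to the left
        candidates.append(position - uss_positions[i - 1])
    return min(candidates)

# Toy data (assumption): only the USS at 1000 is on the list.
listed_uss = [1000]
# position -> uptake ratio, with local peaks at 1000 and at 2500.
uptake = {900: 3.0, 1000: 8.0, 1100: 3.0,
          2400: 2.5, 2500: 6.0, 2600: 2.5}

# Replot: x = distance to nearest listed USS, y = uptake ratio.
replotted = sorted((distance_to_nearest(pos, listed_uss), ratio)
                   for pos, ratio in uptake.items())
for distance, ratio in replotted:
    print(distance, ratio)
```

The peak at the listed USS lands at distance 0, while the peak at the unlisted site lands at distance 1500, far from the origin, even though nothing about its uptake is unusual.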

Are USS that fell below the list cutoff responsible for all of the long-distance peaks? One simple test is to reduce the cutoff for the USS list, and see if the peaks go away. Sure enough, when the grad student reduced his USS-score cutoff from 19.04 to 18, all but one of the peaks disappeared. I'm a bit surprised that the long-distance low-uptake points disappeared too; I guess this means that they weren't just due to gaps in the genomic distribution of USS.
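The logic of the cutoff test can be sketched as follows. This is a toy illustration with invented scores and positions, not the real analysis; `uss_list` and `nearest_distance` are hypothetical helpers, and treating the cutoff as inclusive (score >= cutoff) is my assumption.

```python
def uss_list(scores, cutoff):
    """Sorted positions whose USS score is at or above the cutoff
    (inclusive comparison is an assumption)."""
    return sorted(pos for pos, score in scores.items() if score >= cutoff)

def nearest_distance(position, uss_positions):
    """Distance (bp) from position to the closest listed USS."""
    return min(abs(position - u) for u in uss_positions)

# Invented toy scores: position -> USS score.
scores = {1000: 21.3, 2500: 18.6, 4000: 12.0}
peak_position = 2500  # hypothetical site of a local uptake peak

for cutoff in (19.04, 18.0):
    listed = uss_list(scores, cutoff)
    distance = nearest_distance(peak_position, listed)
    print(f"cutoff {cutoff}: list = {listed}, peak plotted at distance {distance}")
```

With the higher cutoff the site at 2500 is missing from the list, so its uptake peak is plotted 1500 bp from the nearest 'USS'. With the lower cutoff the site joins the list and the same peak moves to distance 0, so the long-distance peak disappears.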

Does this result mean that the genome doesn't contain any non-USS sequences that promote DNA uptake? No. There's still that one remaining peak at about 800 bp, whose USS scores need to be checked. And there are all the points in the black part of the graph, where non-USS peaks may be obscured by all the other points.
