This time I'll start from the biology of DNA uptake and the consequent accumulation of preferred sequences in the genome.
When we, as researchers, examine the sequences preferred for uptake and the sequences overrepresented in the genome, we first compare their single-position patterns and then compare their interaction-effect patterns. Because in this scenario the two sets of sequences are the same, we will find the same patterns.
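Here is a minimal Python sketch of the two comparisons. It assumes the sequences are already aligned and all the same length. The single-position pattern is taken as the base frequencies at each position. The interaction-effect pattern is approximated by pairwise mutual information between positions. Mutual information is one illustrative choice, not necessarily the measure used in the real analyses. The function names and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2

BASES = "ACGT"


def single_position_profile(seqs):
    """Fraction of each base at each position of equal-length aligned sequences."""
    n = len(seqs)
    profile = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        profile.append({b: counts[b] / n for b in BASES})
    return profile


def mutual_information(seqs, i, j):
    """Mutual information (bits) between positions i and j; 0 means no detectable dependence."""
    n = len(seqs)
    count_i = Counter(s[i] for s in seqs)
    count_j = Counter(s[j] for s in seqs)
    count_ij = Counter((s[i], s[j]) for s in seqs)
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        mi += p_ab * log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


def interaction_profile(seqs):
    """Mutual information for every pair of positions (i < j)."""
    length = len(seqs[0])
    return {(i, j): mutual_information(seqs, i, j)
            for i, j in combinations(range(length), 2)}


# Toy aligned sequences (hypothetical, not real uptake or genome data).
toy = ["AAGT", "AAGT", "CCGT", "CCGA"]
print(single_position_profile(toy)[0])  # {'A': 0.5, 'C': 0.5, 'G': 0.0, 'T': 0.0}
print(interaction_profile(toy)[(0, 1)])  # 1.0
```

In the toy set, positions 0 and 1 always co-vary (A with A, C with C), so their mutual information is 1 bit. The invariant position 2 shows no interaction with any other position.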
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Now let's consider a real research situation, where we don't know anything about the underlying biology. We identify a set of sequences that are preferred by the uptake machinery, and another set of sequences that are overrepresented in the genome. We analyze each set and find that their single-position patterns are different.
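The claim that "the single-position patterns are different" can be expressed numerically. For each position, this sketch computes the total variation distance between the two sets' base distributions. The distance is 0 when the distributions are identical and 1 when they share no bases. The distance measure and function names are assumptions chosen for illustration, and the sequences are toy examples, not data.

```python
from collections import Counter

BASES = "ACGT"


def base_frequencies(seqs, i):
    """Base frequencies at position i of a set of equal-length aligned sequences."""
    counts = Counter(s[i] for s in seqs)
    return {b: counts[b] / len(seqs) for b in BASES}


def per_position_difference(set_a, set_b):
    """Total variation distance between the two sets' base distributions at each position."""
    diffs = []
    for i in range(len(set_a[0])):
        fa = base_frequencies(set_a, i)
        fb = base_frequencies(set_b, i)
        diffs.append(0.5 * sum(abs(fa[b] - fb[b]) for b in BASES))
    return diffs


# Hypothetical toy sets: they differ only at position 0.
toy_uptake = ["AAGT", "AAGT"]
toy_genome = ["AAGT", "CAGT"]
print(per_position_difference(toy_uptake, toy_genome))  # [0.5, 0.0, 0.0, 0.0]
```

With real samples, sampling alone will produce some nonzero difference. A statistical test or resampling would therefore be needed before concluding that the two sets genuinely differ.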
What can we infer about the underlying biology? We must conclude that the two sets of sequences have different properties, and thus that the sequences preferred by the uptake machinery are not the same as the sequences overrepresented in the genome. The differences must be due to post-uptake forces that alter the sequences accumulating in the genome.
We might go on to analyze the interaction effects in the uptake set of sequences and in the genomic set of sequences. These analyses may give us insights into the uptake process and the post-uptake forces, but they won't change the fact that the two sequence sets are different.
The differences could also be due to pre-uptake forces, for example a bias in which sequences are available to be taken up. The uptake machinery may *really* prefer sequence X, but if there is never any sequence X in the environment, you'll never see uptake and incorporation of that sequence. (Though if you add sequence X exogenously, you'll see good uptake.)
Now, is this likely? Probably not. Post-uptake forces seem much more likely to me, but you can't completely discount biases from the pre-uptake end of things.
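The pre-uptake argument can be sketched in a few lines. For illustration only, this assumes that the rate at which a sequence is taken up equals its machinery preference multiplied by its availability in the environment. The sequence names and numbers are hypothetical.

```python
def incorporated_shares(preference, availability):
    """Expected share of each sequence among incorporated fragments.

    Assumes (hypothetically) uptake rate = preference * availability.
    Sequences missing from `availability` are treated as absent.
    """
    weights = {seq: p * availability.get(seq, 0.0) for seq, p in preference.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no preferred sequence is available")
    return {seq: w / total for seq, w in weights.items()}


# Hypothetical preferences: the machinery 'really' prefers X.
preference = {"X": 10.0, "Y": 2.0, "Z": 1.0}

# Environment with no X at all.
natural = incorporated_shares(preference, {"Y": 0.5, "Z": 0.5})
print(natural)  # X gets a share of 0.0 despite its high preference

# X added exogenously.
spiked = incorporated_shares(preference, {"X": 0.5, "Y": 0.25, "Z": 0.25})
print(max(spiked, key=spiked.get))  # X
```

Sequence X has the highest preference, but it contributes nothing to the incorporated set while it is absent. Once it is supplied, it dominates.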
BTW, I agree that if uptake itself were the only relevant effect, then the uptake profiles should match the genomic profiles, or at least the uptake profile measured under biologically relevant conditions should.
There may be threshold effects that skew the relative profiles. Genomic incorporation reflects low-level uptake over long periods, whereas your uptake assays were probably done at high levels over short periods. If uptake of the different sequences doesn't scale in the same linear way with changes in concentration and time, the uptake profiles measured under assay conditions might not reflect the historically relevant uptake profiles. Again, this probably isn't likely.
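This sketch shows one way a saturation effect could reorder preferences. It assumes, hypothetically, that each sequence's uptake follows a saturating Michaelis-Menten-style curve with its own maximum rate and half-saturation concentration. Only concentration is modeled, not time, and the parameter values are invented for illustration.

```python
def uptake_rate(conc, vmax, km):
    """Saturating (Michaelis-Menten-style) uptake rate; an assumed model, not a measured one."""
    return vmax * conc / (km + conc)


# Hypothetical (vmax, km) for two sequences:
# A has high capacity but low affinity, B has low capacity but high affinity.
SEQS = {"A": (10.0, 1.0), "B": (2.0, 0.05)}


def ranking(conc):
    """Sequences ordered from most to least taken up at a given DNA concentration."""
    rates = {name: uptake_rate(conc, vmax, km) for name, (vmax, km) in SEQS.items()}
    return sorted(rates, key=rates.get, reverse=True)


print(ranking(0.01))   # ['B', 'A']  (low concentration, long-term-like conditions)
print(ranking(100.0))  # ['A', 'B']  (near saturation, assay-like conditions)
```

At low concentration the ratio vmax/km determines the ranking, so the high-affinity sequence B wins. Near saturation vmax determines it, so A wins. An assay run near saturation would then rank the sequences differently from long-term, low-concentration uptake.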