I've been using the motif patterns identified by the Gibbs motif sampler to reanalyze old DNA uptake data. My goal is to see if the published uptake differences correlate well with motif scores that reflect how well each fragment matches the pattern of genomic USSs. That is, how similar is the uptake bias to the sequences that accumulate in the genome? We've independently done our own experiments to examine this correlation, but I like the notion of finding more value in old data. (A good scientist is a lazy scientist.)
So I've been using Patser (on RSAtools) to generate a motif score for each 'suitable' fragment. To be suitable, the sequence of the fragment must be available (either from the original authors or from subsequent sequencing projects) and the uptake data must be quantitative. It's not enough that the original paper provided images of gels showing that some fragments were taken up better than others; I need numerical measures of uptake. In the best data (from Sol Goodgal and Marylin Mitchell) the uptake is reported as numbers of molecules taken up per cell under standard conditions. In another experiment (Danner et al. 1982) uptake is presented as % relative to a standard fragment. Several other papers describe uptake results qualitatively ('strong', 'weak', 'undetectable'), and may show gels, but I can't use these.
Here are the results. The blue and red points are data from Goodgal and Mitchell, and the green ones from Danner et al.
Goodgal tested a set of 28 plasmids for uptake, reporting the results as plasmid molecules taken up per cell. These are the blue points; they form two clear clusters. Cells take up fewer than 30 molecules of plasmids with USS scores below about 8, and take up between 65 and 80 molecules of plasmids with USS scores above 9.
(Oops, I forgot to take into account the sequence of the plasmid vector the genome fragments were cloned into. This is pUC18. So I just got the pUC18 DNA sequence (Googled it), tidied it up by removing spaces and line breaks, and ran it through Patser to see if it has any USS-like sequences. This gave four surprisingly high-scoring sequences. The scores are all between 10.5 and 11.5, but these sequences don't look anything like USSs to me. I'll need to think about this some more, but as all the tested plasmids were in the same vector I don't think this compromises the results.)
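The tidying step is easy to script. Here's a minimal Python sketch, assuming the sequence was pasted as text with spaces, line breaks, and perhaps position numbers. The function name `tidy_sequence` is just an illustration (it isn't part of Patser or RSAtools), and stripping the digits is my assumption about what the pasted text looks like.

```python
def tidy_sequence(raw: str) -> str:
    """Return only the sequence letters from pasted text, in upper case.

    Drops spaces, tabs, and line breaks. Also drops digits, on the
    assumption that the pasted text may carry position numbers.
    """
    return "".join(ch for ch in raw if ch.isalpha()).upper()


# Made-up snippet for illustration only, not the real pUC18 sequence.
pasted = "1 acgt acgt\n11 ttga\n"
clean = tidy_sequence(pasted)
print(clean)
```

The cleaned string can then be pasted into the Patser input box as a single unbroken sequence.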
The red points are from the same data set. Goodgal and Mitchell purified the insert fragments from 15 of their plasmids and tested them again, this time measuring the % of the DNA that was taken up but not converting this to the number of molecules per cell. (Hmm, I wonder if I could do that myself? They say they used about 0.01 µg of fragment per 0.1 ml of competent cells. That's about 10^8 cells, because competent cells are usually at about 10^9 cells/ml. Using Rosie's universal constant (a 1 kb fragment weighs 10^-18 g), 0.01 µg is 10^10 kb of DNA, or 10^13/(fragment size in bp) molecules, which is 10^5/(fragment size in bp) molecules per cell. Multiplying by %uptake/100, I see that molecules taken up per cell = 1000 x %uptake/fragment size in bp.) OK, I'll go back later and make the conversion.
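Here's that dimensional analysis as a small Python sketch, so the shortcut formula can be checked against the long way round. The constants are the approximate values quoted above (0.01 µg of fragment, 10^8 cells, 10^-18 g per kb); the function and variable names are my own, and the example inputs are invented, not values from the paper.

```python
FRAGMENT_MASS_G = 0.01e-6  # about 0.01 µg of fragment per assay
CELLS_PER_ASSAY = 1e8      # 0.1 ml of cells at about 10^9 cells/ml
GRAMS_PER_KB = 1e-18       # approximate mass of a 1 kb fragment


def molecules_per_cell(percent_uptake: float, fragment_bp: float) -> float:
    """Convert % of DNA taken up into molecules taken up per cell."""
    grams_per_molecule = GRAMS_PER_KB * fragment_bp / 1000
    molecules_added = FRAGMENT_MASS_G / grams_per_molecule
    molecules_taken_up = molecules_added * percent_uptake / 100
    return molecules_taken_up / CELLS_PER_ASSAY


def shortcut(percent_uptake: float, fragment_bp: float) -> float:
    """The simplified form: 1000 x %uptake / fragment size in bp."""
    return 1000 * percent_uptake / fragment_bp


# Invented example: 10% uptake of a 500 bp fragment.
print(molecules_per_cell(10, 500), shortcut(10, 500))
```

Both functions should agree to within rounding error for any input, which is a quick sanity check that the factor of 1000 is right.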
The green points are from Danner et al.'s analysis of synthetic USSs they constructed using the then-new ability to synthesize DNA oligomers of desired sequence. Their results were reported as the amount of uptake relative to a DNA fragment that had already been shown to be taken up quite efficiently (they used this fragment as an internal control in each uptake assay they did). To conveniently fit their numbers on the graph above I multiplied each relative-uptake value by 0.5.
Does it matter that I threw in this fudge-factor of 0.5? I think not, because the numbers are all relative to an arbitrary internal standard for which I have no absolute uptake data. My main goal is to see whether better uptake correlates with a better score, and in general it does. I'm not going to try to draw any more detailed conclusions.
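A quick way to convince yourself that the fudge factor is harmless within the Danner set: multiplying every uptake value by the same constant leaves the correlation with the scores unchanged. Here's a minimal Python sketch with a hand-rolled Pearson correlation. The numbers are placeholders I made up to show the arithmetic, NOT the Danner et al. data, and the helper name `pearson` is mine.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Placeholder values for illustration only, not real measurements.
scores = [4.0, 7.5, 9.0, 11.0]
relative_uptake = [5.0, 20.0, 60.0, 100.0]
scaled_uptake = [0.5 * u for u in relative_uptake]

r_raw = pearson(scores, relative_uptake)
r_scaled = pearson(scores, scaled_uptake)
print(r_raw, r_scaled)
```

This only holds within one data set. A correlation computed over the pooled blue, red, and green points would depend on the scaling, which is one more reason not to draw detailed conclusions from the combined graph.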