Comments on RRResearch: Analysis of uptake sequences by score

2009-07-27T13:25:55.938-07:00

@Tim: I don't think the background base composition is the cause, because the changes are position-specific (e.g. some USS Gs get much weaker than others).

@Torsten: The reason I'm doing the Gibbs analysis is to get away from the erroneous repeat/mismatch view. The position weight matrix produced by the Gibbs analysis is much more consistent with how we think uptake sequences evolve. I'll try to clarify this in my next post.

2009-07-27T12:41:57.018-07:00

Could it help to use a simpler way to score the DUS sequences? I would focus on the 12 bp of the Neisseria DUS and then make lists of sequences allowing zero, one, two ... mismatches (using, for example, fuzznuc). Then one could sort the sequences and see whether some sequence that does not fully match the consensus occurs in unusually high numbers. At least then one deals with real sequences and not with some strange scores. I guess Neisseria itself would understand this approach better than the Gibbs scores.
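The counting approach described above can be sketched in plain Python as a stand-in for fuzznuc (this is not fuzznuc's interface). It assumes the 12 bp consensus is ATGCCGTCTGAA, an illustrative choice that should be checked against the literature, and the function names are hypothetical. It scans both strands, groups every 12-mer within a given number of mismatches by its mismatch count, and tallies how often each exact sequence occurs.

```python
from collections import Counter

# ASSUMPTION: illustrative 12 bp Neisseria DUS consensus; verify before real use.
CONSENSUS = "ATGCCGTCTGAA"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string (other characters kept as-is)."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def tally_near_matches(genome: str, consensus: str = CONSENSUS, max_mismatches: int = 2):
    """Return {mismatch_count: Counter(site_sequence -> occurrences)}.

    Both strands are scanned; hits on the reverse strand are recorded in the
    consensus orientation so they can be compared directly with forward hits.
    """
    genome = genome.upper()
    k = len(consensus)
    by_class = {m: Counter() for m in range(max_mismatches + 1)}
    for strand in (genome, reverse_complement(genome)):
        for i in range(len(strand) - k + 1):
            window = strand[i:i + k]
            d = hamming(window, consensus)
            if d <= max_mismatches:
                by_class[d][window] += 1
    return by_class
```

Sorting each class with `Counter.most_common()` then shows whether a particular non-consensus variant is unusually abundant, e.g. `for m, c in sorted(tally_near_matches(genome).items()): print(m, c.most_common(5))`, where `genome` is the chromosome sequence as a string.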

2009-07-27T11:51:22.723-07:00

What is the "background" A+T content that Gibbs is using when calculating the strength of your motifs? Some Gibbs servers default to 60% A+T because Gibbs is mostly used to analyze promoter DNA. If this is the case, the overrepresentation of G+C in your weak Neisseria motifs may be an artifact of G+C bases being scored higher than A+Ts, when in fact the weak DUS sites in the genome vary equally at all positions. In other words, Gibbs may preferentially favour weak sites that differ at the A+T positions but maintain the G+C positions.
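To make the background effect concrete, here is a minimal sketch of generic log-odds position-weight-matrix scoring, log2(p/q), where p is a base's frequency at a motif position and q its background frequency. It assumes a hypothetical four-position motif "GATC" in which 85% of each column matches the consensus (toy numbers, not Lindsay's or the Neisseria data), and it does not reproduce any particular Gibbs sampler, which may add pseudocounts or other corrections. It compares two weak variants: "AATC" loses a G, "GGTC" loses an A.

```python
import math


def background(at_fraction: float) -> dict:
    """Per-base background frequencies from an overall A+T fraction."""
    at = at_fraction / 2
    gc = (1 - at_fraction) / 2
    return {"A": at, "T": at, "G": gc, "C": gc}


def toy_frequencies(consensus: str, p_match: float = 0.85):
    """HYPOTHETICAL motif columns: p_match for the consensus base, the rest split evenly."""
    p_other = (1 - p_match) / 3
    return [{b: (p_match if b == c else p_other) for b in "ACGT"} for c in consensus]


def log_odds_matrix(freqs, bg):
    """Convert per-position base frequencies to log2(p / q) weights."""
    return [{b: math.log2(col[b] / bg[b]) for b in "ACGT"} for col in freqs]


def score(site: str, matrix) -> float:
    """Sum of position-specific log-odds weights for one site."""
    if len(site) != len(matrix):
        raise ValueError("site length must equal motif length")
    return sum(col[b] for col, b in zip(matrix, site))


freqs = toy_frequencies("GATC")
for at in (0.5, 0.6):
    m = log_odds_matrix(freqs, background(at))
    print(at, round(score("AATC", m), 2), round(score("GGTC", m), 2))
```

With a 50% A+T background the two variants score the same. With a 60% A+T background, "GGTC" (which keeps its G+C positions) scores roughly 3.7 versus 2.5 for "AATC", because a matching G or C is divided by a smaller background frequency. That is the kind of bias the question above is asking about.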

2009-07-27T06:38:25.477-07:00

I think I'll do another post on how to interpret this, to clarify my muddled thinking about how the motif perspective fits with this data.

In Lindsay's data, changing the two Gs doesn't strongly affect uptake, but its effect is still smaller than that of some other positions that show stronger conservation in the low-score logos. (There are errors in the base labels in this figure; I'll need to check her notebooks.)

2009-07-27T06:03:23.561-07:00

I think this is telling us that there is no clear score cut-off for defining a non-degenerate US. Does this also tell us which positions are most important for uptake, and how well do they agree with Lindsay's data?