I spent a long time suspecting that these repeats were Correia elements, very short but complex transposable elements common in Neisseria genomes. But I couldn't find a clear illustration of the Correia consensus, and the Correia sequences I could find didn't match the stray motif in my DUS dataset at all well.
Finally I realized that I could try using the Gibbs motif sampler to characterize the motif. So I took my set of intergenic sequences, used Word to delete all the perfect DUS (both orientations), and asked Gibbs to find a long motif. I didn't know how long the stray motif actually was, so I tried guessing 20 bp, then 30, then 40. But this didn't seem to be working: instead of finding a couple of hundred long Correia-like motifs, it would find a couple of thousand occurrences of something with what looked like a very poor consensus. So I seeded the sequence set with about 20 occurrences of the motif, taken from the dataset where I'd first noticed it.
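For anyone who would rather script the DUS-deletion step than do it in Word, here is a minimal sketch in Python. It is not what I actually ran. It assumes the 10-bp DUS core `GCCGTCTGAA` (swap in whatever definition your dataset uses), and the function names are made up for illustration. Like a find-and-replace-all, it makes a single pass and removes perfect matches in either orientation.

```python
import re

# Assumption: the 10-bp Neisseria DUS core. Change this if your dataset
# defines the DUS differently (for example, the 12-bp extended form).
DUS = "GCCGTCTGAA"

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def strip_motif(seq, motif=DUS):
    """Delete every perfect occurrence of `motif`, in either orientation.

    Returns (stripped_sequence, number_of_deletions). Matching ignores case.
    """
    forward = motif.upper()
    both = sorted({forward, reverse_complement(forward)})
    pattern = re.compile("|".join(re.escape(m) for m in both), re.IGNORECASE)
    return pattern.subn("", seq)


if __name__ == "__main__":
    # Toy sequence: one DUS in each orientation, separated by filler.
    toy = "AAA" + DUS + "TTT" + reverse_complement(DUS) + "CCC"
    stripped, n = strip_motif(toy)
    print(stripped, n)
```

One design note: because the deletion is a single pass, two flanking fragments that happen to join into a new DUS after a deletion are left alone, which is the same behaviour as a replace-all in a word processor.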
Gibbs again returned about 1500 of what looked like poor-consensus occurrences, but this time I had a bit more confidence that this might be what I was looking for, so I trimmed away all the notation and pasted them into WebLogo. This gave me a palindromic repeat that I'll paste below later, and a bit of Google Scholar searching showed me that this isn't Correia at all, but a short repeat called RS3, known to be especially common in intergenic sequences of the N. meningitidis strain I'm using.
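A logo is essentially a picture of per-column base counts across the aligned occurrences, so a quick sanity check is possible before pasting anything into WebLogo. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the input is assumed to be equal-length occurrences with the Gibbs notation already trimmed off, and the toy sites are invented (they are not RS3). It takes the most common base in each column as a crude consensus and tests whether that consensus equals its own reverse complement, which is what "palindromic" means for DNA.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def column_counts(sites):
    """Count bases in each column of a set of equal-length aligned sites."""
    if len({len(s) for s in sites}) != 1:
        raise ValueError("all sites must have the same length")
    return [Counter(column) for column in zip(*(s.upper() for s in sites))]


def consensus(sites):
    """Most common base per column (ties resolve arbitrarily)."""
    return "".join(c.most_common(1)[0][0] for c in column_counts(sites))


def is_rc_palindrome(seq):
    """True if the sequence equals its own reverse complement."""
    seq = seq.upper()
    return seq == seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Invented toy sites, NOT real RS3 occurrences.
    toy_sites = ["GAATTC", "GAATTC", "GTATTC", "GAATAC"]
    cons = consensus(toy_sites)
    print(cons, is_rc_palindrome(cons))
```

A majority-vote consensus throws away most of what a logo shows (how strongly each position is conserved), so this is a rough check rather than a substitute for the logo itself.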
So now I can write a sensible manuscript sentence explaining what these repeats are and why I'm justified in removing them from the dataset.
Don't you love it when a piece of work just comes together? Good for you, and good luck with writing the manuscript. I had earlier read through some of your posts outlining the difficulties you had with Elsevier. I would suggest that in the future, perhaps with this manuscript, you consider open-access journals from publishers such as PLoS or BioMed Central.
We've had bad experiences with BMC too. Not with the open-access issue, but with excessive delays and bad judgement. We like PLoS a lot.