RRResearch: Maybe we CAN do this analysis...

One of the reviewers of our USS manuscript asked that we compare the sequences of the USSs we'd found in the Haemophilus influenzae Rd genome with the homologous sequences in the genomes of other sequenced H. influenzae isolates. We've been planning to dismiss this as both too difficult ("beyond the scope of the present work") and unlikely to yield any relevant information. I still expect that such a comparison will probably show that the USS sequences are just like the rest of the genome, with a few differences here and there. But I've just realized that the analysis might be much easier than I had thought.

The Gibbs motif search provided us with a file containing about 2000 sequences, each about 40bp centered on a USS. In principle it should be straightforward to use BLAST to search the other available H. influenzae genome sequences for matches to each of these USS sequences. The USS sequences are long enough that truly homologous sequences will be found; I expect most of these to be perfectly matched, with the occasional single-base mismatch.
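The "perfect match or single-base mismatch" expectation is easy to check once hits are in hand. Here is a minimal Python sketch, assuming each hit has already been extracted as an ungapped string the same length as the query. The function names and the example strings are made up for illustration; real BLAST alignments can contain gaps, which this does not handle.

```python
def count_mismatches(query, hit):
    """Count positions where two equal-length sequences differ (case-insensitive)."""
    if len(query) != len(hit):
        raise ValueError("sequences must be the same length (ungapped comparison)")
    return sum(1 for a, b in zip(query.upper(), hit.upper()) if a != b)


def classify_hit(query, hit):
    """Label a hit as a perfect match, a single mismatch, or something more divergent."""
    n = count_mismatches(query, hit)
    if n == 0:
        return "perfect"
    if n == 1:
        return "single mismatch"
    return f"{n} mismatches"


if __name__ == "__main__":
    # Illustrative strings only, not sequences taken from the real data set.
    print(classify_hit("AAGTGCGGT", "AAGTGCGGT"))
    print(classify_hit("AAGTGCGGT", "AAGTGCGGA"))
```

Tallying these labels over all the USS sequences and all the genomes would give the "few differences here and there" picture directly.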

I think I can easily use my highly developed 'find and replace' MS Word skills to convert the list of USS sequences to the FASTA format needed by BLAST. I don't know how to set up a search that uses multiple input sequences to search a few genomes, but I'm pretty sure that the guy in the next office will be able to help me. I'm not sure what form the results will take, and this may be the complicating factor. The standard BLAST display isn't designed for this kind of analysis. But output will probably be provided as a text file.
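For anyone who would rather not do the FASTA conversion in Word, a few lines of Python would do the same job. This is only a sketch: it assumes the Gibbs output has been reduced to one bare sequence per line, and the `USS_0001`-style names and the file names are invented for illustration.

```python
def to_fasta(lines, prefix="USS"):
    """Turn an iterable of bare sequences (one per line) into FASTA text.

    Blank lines are skipped; each record gets a numbered header such as >USS_0001.
    The header scheme is an arbitrary choice, not something BLAST requires.
    """
    records = []
    count = 0
    for line in lines:
        seq = line.strip().upper()
        if not seq:
            continue  # ignore empty lines between sequences
        count += 1
        records.append(f">{prefix}_{count:04d}\n{seq}")
    return ("\n".join(records) + "\n") if records else ""


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python to_fasta.py uss_list.txt > uss.fasta
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            sys.stdout.write(to_fasta(handle))
```

Each header line starts with `>` and is followed by the sequence on the next line, which is all the FASTA format needs for short sequences like these.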

OK, I'm doing a test run, searching for one fake USS sequence against the 13 (wow!) H. influenzae genome sequences in the BLAST microbial genomes database. Result: "No significant similarity found. For reasons why click here." OK, I should be using the 'Search for short nearly exact matches' option rather than regular BLAST. Hmmm..., various complications. I think I need to try this using a real USS sequence rather than a made-up one. I'll play around more later.

2 comments:

Anonymous, July 7, 2007 at 8:18 AM
If I'm following what you want to do with the BLAST search, it would almost certainly be easier to do using BLAST on your desktop (e.g. WU-BLAST, blast.wustl.edu). Certainly that can run multiple searches from a FastA file and there's plenty of room to tweak the search.

I'm by no means an expert but I do have Blast set up on my computer and have worked out some relatively similar (I think) search conditions. Let me know if this would help.

Heather, July 7, 2007 at 3:47 PM
Rosie,

You could also consider using Jeffrey Lawrence's DNAMaster program to search for these sequences. How long are they? I deal in 8 base pair sequences with degeneracy all the time in my work and I could run your sequences through our program and output the sort of bar code and enumerated location plots that I use in my research (http://www.pitt.edu/~heh1/Research.html).
We published on our sequence stuff last year in JME.

thoughts?

Heather Hendrickson
