One of the reviewers of our USS manuscript asked that we compare the sequences of the USSs we'd found in the Haemophilus influenzae Rd genome with the homologous sequences in the genomes of other sequenced H. influenzae isolates. We've been planning to dismiss this as both too difficult ("beyond the scope of the present work") and unlikely to yield any relevant information. I still expect that such a comparison will probably show that the USS sequences are just like the rest of the genome, with a few differences here and there. But I've just realized that the analysis might be much easier than I had thought.
The Gibbs motif search provided us with a file containing about 2000 sequences, each about 40 bp long and centered on a USS. In principle it should be straightforward to use BLAST to search the other available H. influenzae genome sequences for matches to each of these USS sequences. The USS sequences are long enough that truly homologous sequences will be found; I expect most of these to be perfectly matched, with the occasional single-base mismatch.
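To make "perfectly matched, with the occasional single-base mismatch" concrete, here is a minimal sketch of what such a search is looking for. It is not BLAST and not how BLAST works internally: it is a naive scan that slides the query along both strands and counts mismatches. The function names, the toy query and the toy genome string are all made up for illustration, and the approach is only practical for tiny inputs.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_near_matches(query, genome, max_mismatches=1):
    """Naively scan both strands of `genome` for near-exact copies of `query`.

    Returns (start, strand, mismatches) tuples, where `start` is the
    0-based position of the window on the forward strand.
    """
    hits = []
    n = len(query)
    for strand, q in (("+", query), ("-", reverse_complement(query))):
        for start in range(len(genome) - n + 1):
            window = genome[start:start + n]
            mismatches = sum(a != b for a, b in zip(q, window))
            if mismatches <= max_mismatches:
                hits.append((start, strand, mismatches))
    return hits


# Toy data (invented, not real USS or genome sequence).
toy_genome = "TTTTACGTTGCAGGGG"
exact_hits = find_near_matches("ACGTTGCA", toy_genome)
one_off_hits = find_near_matches("ACGTAGCA", toy_genome)
```

For about 2000 queries against whole genomes this loop would be far too slow, which is one reason to let BLAST do the real search.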
I think I can easily use my highly developed 'find and replace' MS Word skills to convert the list of USS sequences to the FASTA format needed by BLAST. I don't know how to set up a search that uses multiple input sequences to search a few genomes, but I'm pretty sure that the guy in the next office will be able to help me. I'm not sure what form the results will take, and this may be the complicating factor. The standard BLAST display isn't designed for this kind of analysis. But the output will probably be provided as a text file.
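For anyone who would rather not do the conversion in Word, here is a minimal sketch of the same step in Python. It assumes the sequences are available as a plain list with one sequence per entry; the `to_fasta` name, the `USS_0001`-style record labels and the example sequences are placeholders I've invented, not anything from the actual Gibbs output.

```python
def to_fasta(sequences, prefix="USS"):
    """Return FASTA text with one numbered record per non-blank sequence."""
    # Drop blank entries and normalise case before numbering the records.
    cleaned = [s.strip().upper() for s in sequences if s.strip()]
    records = [
        f">{prefix}_{i:04d}\n{seq}"
        for i, seq in enumerate(cleaned, start=1)
    ]
    return "\n".join(records) + "\n"


# Placeholder sequences (invented), including a blank line to be skipped.
example_sequences = ["acgtacgtacgt", "", "TTGACCAGTTGA"]
fasta_text = to_fasta(example_sequences)
print(fasta_text)
```

Each record is a header line starting with `>` followed by the sequence on the next line, which is the basic layout FASTA readers expect.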
OK, I'm doing a test run, searching for one fake USS sequence against the 13 (wow!) H. influenzae genome sequences in the BLAST microbial genomes database. Result: "No significant similarity found. For reasons why click here." OK, I should be using the 'Search for short nearly exact matches' option rather than regular BLAST. Hmmm... various complications. I think I need to try this using a real USS sequence rather than a made-up one. I'll play around more later.