I asked the guy in the next office for advice about using BLAST to characterize the pattern of variation in and around USS sites in the sequenced H. influenzae genomes. His advice was to set up BLAST on one of our own computers (not a big deal, he said), to create a BLAST database containing all of the ~2000 USS sequences (39 nt each) from the H. influenzae Rd genome, and then to 'query' this database with the genome sequence of each of the other strains in turn. This has the big benefit of creating only one output file for each genome, rather than the one file for each USS site I would have if I used the 39 nt USS sequences as queries.
Finding the downloadable files at NCBI was quite easy, and I now have a BLAST folder in the applications folder of our fast Mac (Stingray). I also had no trouble converting my file of ~2000 USS sequences into FASTA format, with each entry given an identifying number (I don't need that but BLAST does). And I downloaded the genome sequence files. Only 3 other H. influenzae genome sequences turned out to be directly available from GenBank; the others apparently still must be obtained from the various sequencing centers. One of the post-docs may already have these, but if not, I don't think I really need data from more than three anyway.
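For anyone wanting to do the same FASTA conversion, here is a minimal Python sketch of the idea: one numbered header line per USS, followed by the sequence. The function name `to_fasta`, the `USS` prefix and the four-digit numbering are my own illustrative choices, not anything BLAST requires, and the two short sequences in the example are placeholders rather than real 39 nt USS entries.

```python
def to_fasta(sequences, prefix="USS"):
    """Return FASTA text with one numbered record per non-empty sequence.

    `prefix` and the zero-padded numbering are illustrative assumptions;
    any unique identifier after the '>' would serve.
    """
    # Drop blank lines and normalise case before numbering,
    # so the identifiers run 1, 2, 3, ... without gaps.
    cleaned = [s.strip().upper() for s in sequences if s.strip()]
    return "".join(
        f">{prefix}{number:04d}\n{seq}\n"
        for number, seq in enumerate(cleaned, start=1)
    )


if __name__ == "__main__":
    # Placeholder input: in practice this would be the ~2000 lines
    # of the USS file, read with open(...).readlines().
    example = ["aagtgcggt", "", "AAGTGCGGT\n"]
    print(to_fasta(example), end="")
```

Writing the returned text to a file (the name is up to you) gives the input that the database-formatting step expects.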
I also found the instructions for writing the commands for BLAST searches, and for the step that has to come first: using the 'formatdb' program to convert my FASTA file into a recognized BLAST database file. BUT, as usual, I'm stuck at the very basics of using Unix commands in the Mac Terminal interface. I'm hoping one or other of the post-docs will be able to set me straight today.
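As a rough sketch of what the two steps look like, the Python below only assembles the command lines (it does not run BLAST): one 'formatdb' call to build a nucleotide database from the USS FASTA file, then one 'blastall' call per genome, each writing a single output file. The flags are the ones I understand the legacy NCBI programs to use (`-i` input, `-p F` for nucleotide, `-p blastn`, `-d` database, `-o` output, `-m 8` for tabular output) and should be checked against the README that comes with the download. All file names and the helper function names are hypothetical.

```python
import shlex


def formatdb_command(fasta_path):
    """Command to turn a nucleotide FASTA file into a BLAST database."""
    # -p F means "not protein", i.e. a nucleotide database.
    return ["formatdb", "-i", fasta_path, "-p", "F"]


def blastall_command(db_path, genome_path, out_path):
    """Command to query the USS database with one whole genome sequence."""
    return [
        "blastall",
        "-p", "blastn",      # nucleotide query against nucleotide database
        "-d", db_path,       # database built by formatdb
        "-i", genome_path,   # the genome used as the query
        "-o", out_path,      # one output file per genome
        "-m", "8",           # tabular output, easier to parse later
    ]


def shell_line(command):
    """Render a command list as a line that can be pasted into Terminal."""
    return " ".join(shlex.quote(part) for part in command)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    uss_db = "uss.fasta"
    genomes = ["strain_A.fasta", "strain_B.fasta", "strain_C.fasta"]

    print(shell_line(formatdb_command(uss_db)))
    for genome in genomes:
        out_file = genome.replace(".fasta", "_vs_uss.txt")
        print(shell_line(blastall_command(uss_db, genome, out_file)))
```

Printing the commands rather than running them keeps the sketch safe to try; once they look right they can be pasted into Terminal from the folder that holds the files.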