I spent much of Saturday trapped in the swamp of trying to get files correctly formatted for the Gibbs searches. Files contained mysterious invisible characters that the Gibbs program refused to touch but that text editors could neither see nor delete. (And no, it wasn't that the carriage returns were of the nasty Mac type.) FASTA-formatted sequences had ">" characters hidden in what should have been plain DNA sequence. Etc., etc. Sequence 'mask' instructions that had seemed foolproof didn't behave as they had a few days ago.
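Problems like these can be hunted down with a short script instead of a text editor. Here is a minimal sketch in Python. It assumes plain-text FASTA input and assumes the search program accepts only IUPAC nucleotide letters on sequence lines; `clean_fasta` is a hypothetical helper, not part of the Gibbs software. It normalises line endings, drops invisible format and control characters (byte-order marks, zero-width spaces and the like), and reports any ">" or other unexpected character it finds inside a sequence line.

```python
import unicodedata

# Assumption: sequence lines may contain only IUPAC nucleotide codes.
# Lowercase is kept as-is in case it is being used for soft masking.
VALID = set("ACGTNRYKMSWBDHVacgtnrykmswbdhv")


def clean_fasta(text):
    """Return (cleaned_text, problems) for a FASTA-formatted string."""
    # Normalise line endings: Windows \r\n and classic-Mac \r both become \n.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    cleaned, problems = [], []
    for lineno, raw in enumerate(text.split("\n"), start=1):
        # Drop invisible format (Cf) and control (Cc) characters, e.g. a BOM
        # or zero-width space, then trim whitespace (including non-breaking spaces).
        line = "".join(
            ch for ch in raw if unicodedata.category(ch) not in ("Cf", "Cc")
        ).strip()
        if not line:
            continue
        if line.startswith(">"):
            cleaned.append(line)  # header line: keep it
            continue
        if ">" in line:
            # Reported (and removed): it may mean two records were run together.
            problems.append(f"line {lineno}: '>' inside sequence")
        bad = sorted({ch for ch in line if ch not in VALID and ch != ">"})
        if bad:
            problems.append(f"line {lineno}: removed {bad!r}")
        cleaned.append("".join(ch for ch in line if ch in VALID))
    return "\n".join(cleaned) + "\n", problems
```

The list of problems is worth reading before the cleaned file is used: a stray ">" in the middle of a sequence line probably means something upstream went wrong, and deleting it silently would hide that.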
But finally I was able to queue up all the runs of the forward sequences of the Neisseria genomes (including a newly available sequence for N. lactamica) and of the three genomes with Apl-type USSs. And to find the errors in my queue instructions and queue them again... And to queue them again so I'd have duplicate runs. And to get results that were, all but one, USS motifs (and to re-queue that one). And to clean up the sequences and get the logos (thanks, WebLogo!). AND to see that the results are just different enough to be interesting.
Doing the exact same analysis on the reverse complements of these sequences will give me an independent dataset. I found a website that not only converts FASTA sequences into their reverse complements but can handle whole genomes in one go (thanks, BMC!). And now I've queued (in duplicate) all the corresponding reverse-complement searches.
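For anyone who would rather do the conversion locally, here is a minimal Python sketch of a whole-genome FASTA reverse complement. This is not the website's method. Assumptions: the input is plain FASTA, ambiguity codes should be complemented by the standard IUPAC rules, case should be preserved, and output lines are wrapped at 60 bases. The " reverse complement" suffix on headers and the function names are hypothetical choices for illustration.

```python
# IUPAC complement table: A<->T, C<->G, R<->Y, K<->M, B<->V, D<->H;
# S, W and N are their own complements. Lowercase maps to lowercase.
_COMP = str.maketrans("ACGTRYKMSWBDHVNacgtrykmswbdhvn",
                      "TGCAYRMKSWVHDBNtgcayrmkswvhdbn")


def reverse_complement(seq):
    """Complement every base, then reverse the order."""
    return seq.translate(_COMP)[::-1]


def parse_fasta(text):
    """Yield (header, sequence) pairs; lines before the first header are ignored."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reverse_complement_fasta(text, width=60):
    """Reverse-complement every record in a FASTA string, rewrapping to `width`."""
    out = []
    for header, seq in parse_fasta(text):
        rc = reverse_complement(seq)
        out.append(">" + header + " reverse complement")
        out.extend(rc[i:i + width] for i in range(0, len(rc), width))
    return "\n".join(out) + "\n"
```

A cheap sanity check is that applying `reverse_complement` twice returns the original sequence. Each record is joined into one string before conversion, which is fine for bacterial genomes of a few megabases.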