Our visiting grad student is working with Gallibacterium, a Pasteurellacean relative of Haemophilus. To help her optimize transformation we would like to find out about its uptake bias. As a first step, we'd like to find out whether it has repeats in its genome that resemble the known Pasteurellacean uptake signal sequences (USS). Fortunately a Gallibacterium genome sequence is available. I've done this analysis for all the other sequenced Pasteurellacean genomes, so I said I'd do this one too. Should be easy...
My first approach was to give the genome sequence to our Perl program that simulates USS, not because I want to do that, but because the program's first step is to count the numbers of full and partial USS matches in the starting sequence. The program was set up to do that for the H. influenzae USS (AAGTGCGGT), but when it didn't find many of these in the genome I modified it to find the other type of Pasteurellacean USS (ACAAGCGGT). It didn't find many of those either.
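For readers who want to try the counting step on their own, here is a minimal Python sketch of the idea. It is not our Perl program, just an illustration: it slides a window along both strands and tallies full matches and one-mismatch matches to a 9-bp core. The function names and the one-mismatch cutoff for a "partial" match are my assumptions for this example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (non-ACGT becomes N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def count_uss_matches(genome, core, max_mismatches=1):
    """Count windows on both strands that match `core`.

    Returns a dict mapping mismatch count (0 = full match) to the number
    of windows with exactly that many mismatches. The cutoff of one
    mismatch for a "partial" match is an assumption of this sketch.
    Note: a core that equals its own reverse complement would be counted
    once per strand; the two cores mentioned above are not palindromic.
    """
    genome = genome.upper()
    core = core.upper()
    k = len(core)
    counts = {m: 0 for m in range(max_mismatches + 1)}
    for strand in (genome, reverse_complement(genome)):
        for i in range(len(strand) - k + 1):
            window = strand[i:i + k]
            mismatches = sum(1 for a, b in zip(window, core) if a != b)
            if mismatches <= max_mismatches:
                counts[mismatches] += 1
    return counts


if __name__ == "__main__":
    # Toy sequence only; a real run would read the genome from a FASTA file.
    toy = "TTTTAAGTGCGGTTTTT"
    for name, core in (("Hin-type", "AAGTGCGGT"), ("other type", "ACAAGCGGT")):
        print(name, count_uss_matches(toy, core))
```

This brute-force scan is slow for a whole genome in pure Python but fine for a quick check, and swapping in a different core is just a change of argument.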
So, perhaps Gallibacterium has a previously unknown version of the USS. Or perhaps it has an unrelated USS. Or perhaps it doesn't have a USS at all, which would suggest that it has weak or no uptake bias. What was needed was analysis with the Gibbs motif sampler, which would look for any common repeat in the genome. OK, I did lots of those last summer, so I can do it again.
I remembered how to submit a sequence for analysis, but I didn't bother to carefully check what the different settings do before submitting the run. That was stupid, because 36 hours later I've received two emails from the system, telling me that my requested run failed. One says "ERROR:: Mismatched width ranges" and the other "ERROR:: Palandrome (sic) subscript overflow". Guess I'd better buckle down and sort it out.