I've started reanalyzing the old DNA-uptake data (see New bottles for old wine). Yesterday I succeeded in using the Gibbs motif-search software (thank you RSA Tools!) to analyze the sequences from the 1990 paper, and was encouraged when it did find a USS motif in 15 of the 28 sequences. These 15 were fragments that cells had strongly preferred to take up, and the USS motif looks very much like the one derived previously from the whole-genome consensus. This result is very preliminary (I haven't yet kept any notes or done it meticulously), but it suggests that the bias of the uptake machinery does correspond well to the consensus of the genome repeats.
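The Gibbs sampler finds a motif without being told what to look for. A much cruder cross-check is to scan each fragment for an exact copy of a known USS core on either strand. The sketch below assumes the 9-bp *H. influenzae* core AAGTGCGGT; the post doesn't name the sequence, so treat that choice as an assumption. The function names and toy fragments are hypothetical. This is an exact-match scan, not a reimplementation of Gibbs sampling or of RSA Tools.

```python
# Minimal sketch: exact-match scan for a USS core on both strands.
# ASSUMPTION: the core "AAGTGCGGT" (H. influenzae 9-bp USS core) is used for illustration.
# This is NOT Gibbs sampling, which discovers a motif without being given its sequence.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T/N sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_all(seq: str, motif: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) exact match."""
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def uss_hits(seq: str, core: str = "AAGTGCGGT") -> dict[str, list[int]]:
    """Positions of the core on the given strand and of its reverse complement."""
    seq = seq.upper()
    return {
        "forward": find_all(seq, core),
        "reverse": find_all(seq, reverse_complement(core)),
    }


def has_uss(seq: str, core: str = "AAGTGCGGT") -> bool:
    """True if the core occurs on either strand."""
    hits = uss_hits(seq, core)
    return bool(hits["forward"] or hits["reverse"])


if __name__ == "__main__":
    # Hypothetical toy fragments, not data from the 1990 paper.
    fragments = {
        "toy1": "ccttAAGTGCGGTcaatt",
        "toy2": "ggACCGCACTTaa",
        "toy3": "ATATATATAT",
    }
    for name, seq in fragments.items():
        print(name, uss_hits(seq))
```

An exact match misses the degenerate copies that a Gibbs-derived motif would still count. The number of hits is therefore a lower bound on USS-containing fragments, not a substitute for the motif search.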
Today I did the preliminary analysis (this time keeping notes) of the phage-derived sequences from one of the earlier papers (1984). These sequences had not been put into GenBank as a neat set, so I had to download the phage sequence and use a nice shareware DNA-analysis program (Sequence Analysis; thank you Will Gilbert!) to identify the sequences of the five short fragments I will analyze.
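Pulling the fragments out of a downloaded phage sequence comes down to slicing it at each fragment's endpoints and writing the pieces out as FASTA. Here is a minimal sketch. It assumes the endpoints are known as 1-based, inclusive coordinates, which is the GenBank convention. The helper names, the toy genome and the coordinates are hypothetical stand-ins, not the real 1984 phage fragments.

```python
# Minimal sketch: cut named fragments out of a longer sequence and emit plain FASTA.
# ASSUMPTION: coordinates are 1-based and inclusive (GenBank style); names are hypothetical.


def extract_fragments(genome: str, coords: dict[str, tuple[int, int]]) -> dict[str, str]:
    """Slice each fragment from the genome using 1-based, inclusive start/end."""
    out = {}
    for name, (start, end) in coords.items():
        if not (1 <= start <= end <= len(genome)):
            raise ValueError(f"{name}: coordinates {start}-{end} outside 1-{len(genome)}")
        out[name] = genome[start - 1:end].upper()
    return out


def to_fasta(records: dict[str, str], width: int = 60) -> str:
    """Format records as FASTA: a '>' header line, then sequence wrapped at `width`."""
    lines = []
    for name, seq in records.items():
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    # Plain LF line endings, which most command-line and web tools expect.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    toy_genome = "acgtacgtacggttaacc"                # hypothetical stand-in for the phage sequence
    toy_coords = {"fragA": (1, 8), "fragB": (9, 18)}  # hypothetical coordinates
    print(to_fasta(extract_fragments(toy_genome, toy_coords), width=6), end="")
```

Writing the FASTA text from a script also avoids a word processor ever touching the file.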
I still need to deal with an annoying format problem. The motif-search programs accept DNA sequences only in particular formats, of which the simplest is "FASTA". In FASTA, each sequence is preceded by a description (header) line that starts with ">", but for some reason these programs treat the text after my ">"s as sequence. Of course they choke, because that text contains non-sequence characters (anything other than A, G, C, T and N). If I paste FASTA-format sequence in directly from GenBank there's no problem, so I think Word is doing something weird with the ">" character. I need to find a better text editor for Macs (maybe Mi). Unfortunately TextEdit has been 'improved' to the point where it can no longer handle plain text: it insists on saving all files as RTF or HTML.
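One way to see what a program actually receives is to run the saved file through a small FASTA checker before pasting it in. The likely culprits are guesses, not something diagnosed here. One is an invisible byte-order mark in front of the first ">", which stops that line from starting with ">". Another is old-Mac carriage-return line endings. A third is a file that was never really saved as plain text. The sketch below strips a leading byte-order mark, normalizes line endings, and reports the line number and offending characters when sequence lines contain anything other than A, C, G, T or N. `normalize` and `parse_fasta` are hypothetical helper names.

```python
# Minimal sketch of a FASTA checker; helper names are hypothetical.
# ASSUMPTION: sequences should contain only A, C, G, T, N (case-insensitive).

ALLOWED = set("ACGTN")


def normalize(text: str) -> str:
    """Drop a leading byte-order mark and convert CRLF or lone CR line endings to LF."""
    text = text.lstrip("\ufeff")
    return text.replace("\r\n", "\n").replace("\r", "\n")


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Return (header, sequence) pairs, raising ValueError on the first bad line."""
    records = []
    header = None
    chunks: list[str] = []
    for lineno, raw in enumerate(normalize(text).split("\n"), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError(f"line {lineno}: sequence before any '>' header")
            bad = set(line.upper()) - ALLOWED
            if bad:
                raise ValueError(f"line {lineno}: non-sequence characters {sorted(bad)!r}")
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

If the checker reports a problem on line 1 even though the file "looks" fine, the header line is probably not starting with a plain ">" character. That points to how the editor saved the file rather than to the sequences themselves.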