Everything is working.
I still can't get BLAST to attend to matches close to the ends of the 39-nt fragments, so I'm treating any unaligned end positions as a mismatch at the innermost unaligned position and as 'no information' at the positions beyond it. For example, if a sequence matches at positions 4-39, I assume there's a mismatch at position 3 and that I have no information about positions 1 and 2.
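A minimal Python sketch of this end rule, assuming each hit is described by 1-based start and end coordinates on the 39-nt fragment plus the set of mismatched positions inside the aligned region. `position_calls` is a hypothetical helper, not part of BLAST; `None` stands for 'no information'.

```python
FRAGMENT_LENGTH = 39

def position_calls(match_start, match_end, mismatches, length=FRAGMENT_LENGTH):
    """Return one call per fragment position (index 0 = position 1).

    Calls are "match", "mismatch", or None for 'no information'.
    match_start, match_end: 1-based fragment coordinates of the aligned region.
    mismatches: 1-based positions inside the aligned region that differ.
    """
    calls = [None] * length
    for pos in range(match_start, match_end + 1):
        calls[pos - 1] = "mismatch" if pos in mismatches else "match"
    # Left end: the innermost unaligned position counts as a mismatch;
    # positions further out stay None (no information).
    if match_start > 1:
        calls[match_start - 2] = "mismatch"
    # Right end: same rule, applied symmetrically.
    if match_end < length:
        calls[match_end] = "mismatch"
    return calls
```

For the example in the text, `position_calls(4, 39, set())[:4]` gives `[None, None, 'mismatch', 'match']`: positions 1 and 2 carry no information, position 3 is scored as a mismatch, and position 4 onward is aligned.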
I'm searching for the two USS orientations separately, by searching the forward and reverse strands of the query genome in separate runs, and I'm analyzing the two sets of data separately too. So far I've analyzed only the forward searches; once I've analyzed the reverse searches, I'll need to flip their results.
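A minimal Python sketch of the flip, assuming it amounts to reversing coordinates along the 39-nt fragment, so that position p from the reverse-strand analysis corresponds to position 40 - p in the forward orientation. Both function names are hypothetical, and the per-position tallies are assumed to be plain lists whose element 0 holds position 1.

```python
FRAGMENT_LENGTH = 39

def flip_position(pos, length=FRAGMENT_LENGTH):
    """Map a 1-based position from the reverse-strand analysis onto forward coordinates."""
    return length + 1 - pos

def combine_orientations(forward_counts, reverse_counts):
    """Add per-position tallies after reversing the reverse-strand list.

    Both lists are indexed so that element 0 holds position 1.
    """
    if len(forward_counts) != len(reverse_counts):
        raise ValueError("tallies must cover the same fragment length")
    flipped = reverse_counts[::-1]  # element 0 now holds the old last position
    return [f + r for f, r in zip(forward_counts, flipped)]
```

For instance, `flip_position(1)` returns `39`, and `combine_orientations([1, 0, 0], [0, 0, 5])` returns `[6, 0, 0]`. Whether the flipped reverse results should then be pooled with the forward ones or kept as a separate series is a separate choice.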
I'm collecting the output as pairwise comparisons between the query and the USSs, because this format makes it easiest to pull out just the positions of the mismatches, without information about what kind of mismatch each one is.
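A minimal Python sketch of pulling mismatch positions out of one pairwise alignment, assuming the two aligned sequence lines (fragment and hit, gaps shown as `-`) have already been extracted from the BLAST report, along with the 1-based fragment coordinate where the alignment starts. `mismatch_positions` is a hypothetical helper; scoring a gap in the hit as a mismatch at that fragment position is an assumption, not something the post specifies.

```python
def mismatch_positions(fragment_aln, hit_aln, fragment_start):
    """Return the 1-based fragment positions that differ from the aligned hit.

    Only positions are returned, not the kind of mismatch.
    """
    positions = []
    pos = fragment_start
    for f, h in zip(fragment_aln.upper(), hit_aln.upper()):
        if f == "-":
            # Extra base in the hit: there is no fragment position to score.
            continue
        if f != h:
            # Substitution, or a gap in the hit opposite this fragment base.
            positions.append(pos)
        pos += 1
    return positions
```

For example, `mismatch_positions("ACGTACGT", "ACGAACGT", 4)` returns `[7]`, because the fourth aligned column corresponds to fragment position 7.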
I'm doing the analysis by bouncing the file back and forth between Word and Excel a couple of times. First I use Word to insert tabs between the output lines, and then Excel to delete the columns (formerly lines) I don't want, including the actual sequences, and to sort the results by both the match score and the locations of the ends of the matches. Then I use Word to insert tabs between each position of each alignment, and Excel to count the number of mismatches at each position and to graph the results.
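The counting step could also be scripted. Below is a minimal Python sketch, assuming each alignment has already been reduced to a hypothetical record of (score, match start, match end, mismatch positions inside the match) in 1-based fragment coordinates. It applies the end rule described above and also counts how many alignments carry information at each position. Sorting the records, as in the spreadsheet step, helps with inspection but does not change the totals, so the sketch skips it.

```python
from collections import Counter

FRAGMENT_LENGTH = 39

def tally(records, length=FRAGMENT_LENGTH):
    """Return (position, mismatch count, informative count) for every position.

    records: iterable of (score, match_start, match_end, inner_mismatches).
    """
    mismatches = Counter()
    informative = Counter()
    for _score, start, end, inner in records:
        # Informative span: the aligned region plus the innermost unaligned
        # position at each end, which is scored as a mismatch.
        lo = start - 1 if start > 1 else start
        hi = end + 1 if end < length else end
        for pos in range(lo, hi + 1):
            informative[pos] += 1
        mismatches.update(inner)
        if start > 1:
            mismatches[start - 1] += 1
        if end < length:
            mismatches[end + 1] += 1
    return [(pos, mismatches[pos], informative[pos]) for pos in range(1, length + 1)]
```

With `records = [(70, 4, 39, [10]), (78, 1, 39, [])]`, position 1 gets 0 mismatches out of 1 informative alignment, while position 3 gets 1 mismatch out of 2. The resulting table can be graphed with any plotting tool.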
Next post I'll describe the results.