Field of Science

Motif analysis update, part 3: covariation

The third question I asked about the USS motif was whether there is evidence for interactions between positions, i.e. whether the base at one position covaries with the base at another. My query to the EvolDir list turned up three applicable programs. One looked difficult, so I left it as a last resort. A second had been written by a colleague (in Fortran! He's an old-fashioned guy; we were post-docs together). He kindly offered to run our preliminary sequence set for us, and sent back a monster Excel file full of statistical results, with the 24 significant ones highlighted. There's a strong risk of spurious correlations in this kind of analysis, but the ones he found seem likely to be genuine, as almost all of them are between adjacent positions.
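
Neither program's statistic is described here, so what follows is only a minimal sketch of one common way to test for covariation between motif positions, assuming the sites are already aligned as equal-length strings. It computes the mutual information (MI) between every pair of columns, estimates a p-value by shuffling one column (which destroys any association while keeping each column's base frequencies), and applies a Bonferroni correction for the number of pairs tested, which is one guard against the spurious correlations mentioned above. The function names (`mutual_information`, `permutation_p_value`, `scan_pairs`) and the toy sequences are hypothetical and are not taken from either program.

```python
import math
import random
from collections import Counter
from itertools import combinations


def entropy(column):
    """Shannon entropy (bits) of one alignment column."""
    n = len(column)
    return sum((c / n) * math.log2(n / c) for c in Counter(column).values())


def mutual_information(col_a, col_b):
    """Mutual information (bits) between two aligned columns of equal length."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    mi = 0.0
    for (a, b), n_ab in Counter(zip(col_a, col_b)).items():
        # p(a,b) * log2(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (n_ab / n) * math.log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def permutation_p_value(col_a, col_b, n_shuffles=999, seed=0):
    """Fraction of shuffles of col_b whose MI is at least the observed MI.

    Shuffling keeps each column's base composition but breaks any pairing.
    """
    rng = random.Random(seed)
    observed = mutual_information(col_a, col_b)
    shuffled = list(col_b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if mutual_information(col_a, shuffled) >= observed - 1e-12:
            hits += 1
    # The +1 terms count the observed data as one of the permutations.
    return (hits + 1) / (n_shuffles + 1)


def scan_pairs(seqs, n_shuffles=999, alpha=0.05, seed=0):
    """Test every pair of positions (hypothetical helper, not either program).

    Returns (i, j, mi, p, significant) tuples; significance uses a
    Bonferroni cutoff of alpha divided by the number of pairs tested.
    """
    columns = [[s[k] for s in seqs] for k in range(len(seqs[0]))]
    pairs = list(combinations(range(len(columns)), 2))
    cutoff = alpha / len(pairs)
    results = []
    for i, j in pairs:
        mi = mutual_information(columns[i], columns[j])
        p = permutation_p_value(columns[i], columns[j], n_shuffles, seed)
        results.append((i, j, mi, p, p <= cutoff))
    return results


if __name__ == "__main__":
    # Toy sites: positions 0 and 1 covary perfectly (A-T or G-C);
    # position 2 never varies.
    toy = ["ATA"] * 20 + ["GCA"] * 20
    for i, j, mi, p, sig in scan_pairs(toy):
        print(f"positions {i},{j}: MI={mi:.3f} bits, p={p:.3f}, significant={sig}")
```

MI can never exceed the entropy of either column, so a near-invariant consensus position can show little or no covariation however strongly it is constrained; in the toy data the constant third column gives an MI of exactly zero with both other columns. With real data, the smallest attainable p-value is 1/(n_shuffles + 1), so n_shuffles must be at least roughly the number of pairs divided by alpha before any pair can pass the Bonferroni cutoff.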

In the meantime I'd also been trying out a program with a lovely simple web interface. But it found only two covarying positions, and these seemed very weak (i.e. their squares on the matrix were only a tiny bit darker than the background). I was attracted to this web program because its matrix display of the results seemed so intuitive, but I quickly realized that this simplicity was failing to tell me what I needed to know. After a lot of back and forth with a helpful expert (= the person who let his email address be linked to the web page), I now have a folder full of the software and associated files (ReadMe, Help), and can begin working out how to run it myself.

Aaarrgghhhh! It's written in a programming language called GAWK/NAWK. Wikipedia says AWK was a precursor to Perl and runs on Unix; GAWK is GNU AWK. Thanks, that's a big help. Mac OS X 10.4 doesn't have GAWK, just AWK. I hope Westgrid has GAWK.

2 comments:

  1. Is it definitely necessary to use both programs to test for linkage, or could you just use the results that you already have?

    I have been pondering something you mentioned about strong consensus sites effacing evidence of linkage: can we have confidence in linkage searches like this across a sequence where the strength of the motif varies so greatly from position to position?

  2. I think the difference between AWK and GAWK isn't so crucial. Actually, I believe GNU awk is the prevalent implementation of awk nowadays, and it implements all the features of the original awk (either out of the box or with the help of special options), so it shouldn't be a problem.

    I must confess that you have an astounding level of interaction with computers in your research. Here in Russia (at least in the city where I live, ca. 2000 km from Moscow), no head of a medical/biological lab would EVER use blogs, let alone Unix command-line tools.

