The uptake-sequence manuscript we submitted to the new journal Genome Biology and Evolution has been provisionally accepted. One of the reviewers said the following, and I'm wondering if there might be an easy way to do this suggested analysis:
...despite claiming that one of their main goals was to determine whether uptake sequences had an effect on protein and organismal fitness, the authors did not look if these sites are under purifying/diversifying selection. It would be greatly relevant for their question of interest, which is currently only supported by indirect evidence.
The reviewer is absolutely right. We didn't think of doing this analysis, but we should have (thought of it, not necessarily done it).

I don't think our dataset is appropriate for anything more sophisticated than simply calculating dN/dS ratios, and I'm not at all sure it's even suitable for that. I had to start by pulling out my complimentary copy of Freeman and Herron's undergraduate textbook Evolutionary Analysis, which explains how dN/dS ratios and McDonald-Kreitman tests are used to examine DNA sequences for evidence of purifying or diversifying selection on the amino acids they encode. For a pair of aligned DNA sequences, dN/dS compares the rate of differences that change the encoded amino acid (nonsynonymous differences per nonsynonymous site) with the rate of differences that don't (synonymous differences per synonymous site). A ratio well below 1 suggests purifying selection on the protein, and a ratio above 1 suggests diversifying selection. There are lots of programs and web sites that will do this analysis, given pairs of aligned sequences in the appropriate format.
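To make the calculation concrete, here is a minimal Python sketch of one standard way to score a single pairwise alignment: Nei-Gojobori-style counting of synonymous and nonsynonymous sites and differences, followed by a Jukes-Cantor correction. This is an illustrative choice of method, not necessarily what any particular dN/dS program does. It assumes the standard genetic code and in-frame aligned sequences (codons containing gaps or ambiguous bases are skipped), and the function names are hypothetical.

```python
import itertools
import math

# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def synonymous_sites(codon):
    """Synonymous sites in one codon: the fraction of the three possible
    point changes at each position that keep the amino acid (changes to a
    stop codon count as nonsynonymous in this sketch)."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def codon_differences(c1, c2):
    """Synonymous and nonsynonymous differences between two codons,
    averaged over every order in which the differing positions could have
    changed, skipping pathways that pass through a stop codon."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    paths = []
    for order in itertools.permutations(positions):
        current, sd, nd = c1, 0, 0
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[step] == "*":
                break  # intermediate stop codon: discard this pathway
            if CODE[step] == CODE[current]:
                sd += 1
            else:
                nd += 1
            current = step
        else:  # pathway completed without hitting a stop codon
            paths.append((sd, nd))
    if not paths:  # every pathway hits a stop; call all changes nonsynonymous
        return 0.0, float(len(positions))
    return (sum(p[0] for p in paths) / len(paths),
            sum(p[1] for p in paths) / len(paths))


def jukes_cantor(p):
    """Correct a proportion of differing sites for multiple substitutions."""
    if p <= 0:
        return 0.0
    if p >= 0.75:
        return math.inf  # too divergent for the correction to apply
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1, seq2):
    """Return (dN, dS, dN/dS) for two aligned, in-frame coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        # Skip codons with gaps or ambiguous bases, and stop codons.
        if c1 not in CODE or c2 not in CODE:
            continue
        if CODE[c1] == "*" or CODE[c2] == "*":
            continue
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S += s
        N += 3 - s
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    if S == 0 or N == 0:
        raise ValueError("no synonymous or no nonsynonymous sites to compare")
    dN, dS = jukes_cantor(Nd / N), jukes_cantor(Sd / S)
    ratio = dN / dS if dS > 0 else math.nan  # undefined when dS is 0
    return dN, dS, ratio
```

Note that the ratio is undefined when an alignment has no synonymous differences, which will happen often for short or very similar genes; real programs use more elaborate models, but the bookkeeping is the same idea.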

I think that my bioinformatician coauthor has DNA sequences of hundreds of H. influenzae and N. meningitidis genes, each aligned with each of three 'standard' homologs from genomes that don't have uptake sequences. These alignments have been sorted into classes, based on how many uptake sequences the H. influenzae or N. meningitidis gene has (0, 1, 2, 3, >3). I think the appropriate analysis would be to calculate the dN/dS ratio for each alignment, calculate the mean ratio of the three standard alignments of each H. influenzae or N. meningitidis gene, and then calculate the grand mean for all the genes in each class.
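The averaging scheme described above could be scripted along these lines once every alignment has a dN/dS value. This is only a sketch: the tuple layout (gene, uptake-sequence class, homolog, ratio) is a hypothetical format, not how the alignments are actually stored, and alignments whose ratio is undefined (for example, no synonymous differences) are simply skipped.

```python
import math
from collections import defaultdict


def class_means(records):
    """Average dN/dS per gene over its homolog alignments, then per class.

    records: iterable of (gene_id, uptake_class, homolog_id, ratio) tuples,
    one per alignment (hypothetical layout).
    Returns {uptake_class: (grand_mean, number_of_genes)}.
    """
    per_gene = defaultdict(list)
    gene_class = {}
    for gene, uptake_class, _homolog, ratio in records:
        if not math.isfinite(ratio):  # skip undefined ratios (e.g. dS = 0)
            continue
        per_gene[gene].append(ratio)
        gene_class[gene] = uptake_class
    # First average each gene's alignments, so every gene counts once.
    per_class = defaultdict(list)
    for gene, ratios in per_gene.items():
        per_class[gene_class[gene]].append(sum(ratios) / len(ratios))
    # Then take the grand mean of the gene means within each class.
    return {c: (sum(m) / len(m), len(m)) for c, m in per_class.items()}
```

Averaging each gene's homolog comparisons first means every H. influenzae or N. meningitidis gene contributes one value to its class mean, however many of its alignments gave a usable ratio.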

This analysis isn't hard to describe, but it might be harder for my coauthor to automate, depending on the details of how the alignments are formatted and what the dN/dS programs will accept. I'm going to email my former post-doc, who has a lot of sophisticated knowledge about these methods, to ask for her advice.
