I had found that the distribution of coding-region USSs between the six possible reading frames paralleled the abundance of the tripeptides these USSs would encode in the H. influenzae proteome. This suggested that some frames had more USSs than others because USSs in those frames encoded more versatile tripeptides.
I now decided that this analysis needed some controls, so I scored the frequencies of the same USS-encodable tripeptides in the proteomes of several other bacteria: E. coli, B. subtilis, and (because it has the same base composition as H. influenzae) Listeria monocytogenes. The relative proportions of these six tripeptides seemed comparable to those of H. influenzae, but I noticed that the raw numbers were smaller than I would expect given the sizes of the respective proteomes (all the control bacteria have larger genomes than H. influenzae). So I

The results are in the top graph. The blue bars are the H. influenzae tripeptide densities; the other colours are the tripeptide densities of the controls. Note that the H. influenzae proteome has disproportionately high densities of the USS-specifiable tripeptides.
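
For readers who want to try a comparison like this themselves, here is a minimal sketch of one way to score tripeptide densities. It is not the script I used. It assumes "density" means occurrences per 1,000 residues, and the function name `tripeptide_density`, the placeholder tripeptides and the toy sequences are all hypothetical.

```python
from collections import Counter


def tripeptide_density(proteins, tripeptides, per=1000):
    """Return occurrences of each tripeptide per `per` residues.

    Assumption: density is normalized by the total number of residues
    in the proteome, so proteomes of different sizes can be compared.
    """
    wanted = set(tripeptides)
    counts = Counter()
    total_residues = 0
    for seq in proteins:
        seq = seq.upper()
        total_residues += len(seq)
        # Slide a 3-residue window so overlapping occurrences are counted.
        for i in range(len(seq) - 2):
            tri = seq[i:i + 3]
            if tri in wanted:
                counts[tri] += 1
    if total_residues == 0:
        return {t: 0.0 for t in tripeptides}
    return {t: counts[t] * per / total_residues for t in tripeptides}


# Toy sequences and placeholder tripeptides, not real proteome data.
toy_proteome = ["MKCGTALKCG", "TALTAL"]
densities = tripeptide_density(toy_proteome, ["KCG", "TAL"])
```

Running the same function on each organism's protein sequences, with the same list of tripeptides, would give numbers that can be plotted side by side regardless of proteome size.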

So what does this mean? It means that the biased uptake and recombination of USS-containing DNA fragments has not been neutral for the proteome. USSs accumulate not only at protein-coding positions where they don't change the amino acids, but also at positions that wouldn't otherwise have used these amino acids.
My collaborator in Ottawa has already shown that the corresponding 'preferred' USS-encodable tripeptides are overrepresented in proteomes containing USSs. What I've done here is extend that result to show that even the tripeptides we thought were least versatile have become overrepresented in the H. influenzae genome.
Very neat result!