Here's a nifty new result. I was working on the Results section of the USS manuscript revisions, to include a result I posted about last month.
I had found that the distribution of coding-region USSs among the six possible reading frames paralleled the abundance, in the H. influenzae proteome, of the tripeptides these USSs would encode. This suggested that some frames have more USSs than others because the USSs in those frames encode more versatile tripeptides.
I now decided that this analysis needed some controls, so I scored the frequencies of the same USS-encodable tripeptides in the proteomes of several other bacteria: E. coli, B. subtilis, and (because it has the same base composition as H. influenzae) Listeria monocytogenes. The relative proportions of these six tripeptides seemed comparable to those of H. influenzae, but I noticed that the raw numbers were smaller than I would expect given the sizes of the respective proteomes (all the control bacteria have larger genomes than H. influenzae). So I normalized the tripeptide counts by dividing by the total number of amino acids in each proteome, to get what I could call tripeptide density.
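The normalization is simple enough to sketch in a few lines of Python. This is only an illustration, not the script I actually used: the function names and the toy proteome are made up, and I'm assuming that overlapping occurrences of a tripeptide are each counted.

```python
def count_tripeptide(proteins, tripeptide):
    """Count occurrences of a tripeptide across a list of protein sequences.

    Assumption: overlapping occurrences are counted separately.
    """
    total = 0
    for seq in proteins:
        start = seq.find(tripeptide)
        while start != -1:
            total += 1
            # Advance by one residue so overlapping matches are found too.
            start = seq.find(tripeptide, start + 1)
    return total


def tripeptide_density(proteins, tripeptide):
    """Tripeptide count divided by the total number of amino acids."""
    total_aa = sum(len(seq) for seq in proteins)
    if total_aa == 0:
        return 0.0
    return count_tripeptide(proteins, tripeptide) / total_aa


if __name__ == "__main__":
    # Toy "proteome" for illustration only; real input would be every
    # protein sequence from a genome's annotation.
    toy_proteome = ["MSAVSAV", "AAAA"]
    print(tripeptide_density(toy_proteome, "SAV"))
```

Dividing by total amino acids rather than by the number of proteins keeps the measure comparable between proteomes of different sizes, which was the problem with the raw counts.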
The results are in the top graph. The blue bars are the H. influenzae tripeptide densities; the other colours are the tripeptide densities of the controls. Note that the H. influenzae proteome has disproportionately high densities of the USS-specifiable tripeptides compared with the control proteomes.
So the 'control' needs another control: the densities of tripeptides that aren't encodable by USSs. I chose these by taking the same amino acids in backwards order (e.g. VAS instead of SAV). This is a good choice because it doesn't change the abundances of the single amino acids making up the tripeptide. Here's that graph. Again the blue bars are H. influenzae and the other colours are the control proteomes. And note that now the blue bars are nothing special: H. influenzae has about the same densities of these tripeptides as the control proteomes do.
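Here's a rough sketch of the reversed-tripeptide control. Again this is illustrative only: the function names and the toy sequence are hypothetical, overlapping occurrences are assumed to count, and only the SAV/VAS pair mentioned above is used.

```python
def reversed_control(tripeptide):
    """Return the same three amino acids in backwards order (SAV -> VAS)."""
    return tripeptide[::-1]


def density(proteins, tripeptide):
    """Occurrences of the tripeptide per amino acid in the proteome.

    Assumption: overlapping occurrences are counted separately.
    """
    total_aa = sum(len(seq) for seq in proteins)
    if total_aa == 0:
        return 0.0
    k = len(tripeptide)
    count = sum(
        1
        for seq in proteins
        for i in range(len(seq) - k + 1)
        if seq[i:i + k] == tripeptide
    )
    return count / total_aa


def compare_with_controls(proteins, tripeptides):
    """Map each tripeptide to (its density, density of its reversed control)."""
    return {
        tri: (density(proteins, tri), density(proteins, reversed_control(tri)))
        for tri in tripeptides
    }


if __name__ == "__main__":
    # Toy sequence for illustration only, not real proteome data.
    toy_proteome = ["SAVVAS"]
    print(compare_with_controls(toy_proteome, ["SAV"]))
```

Because the reversed tripeptide uses exactly the same three amino acids, any difference between the two densities can't be blamed on the proteome simply being rich or poor in those amino acids.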
So what does this mean? It means that the biased uptake and recombination of USS-containing DNA fragments has not been neutral for the proteome. USSs not only accumulate at protein-coding positions where they don't change the amino acids, they also accumulate at positions that wouldn't otherwise have used these amino acids.
My collaborator in Ottawa has already shown that the corresponding 'preferred' USS-encodable tripeptides are overrepresented in proteomes containing USSs. What I've done here is extend that result to show that even the tripeptides we thought were least versatile have become overrepresented in the H. influenzae proteome.