One of the things it does is show that the densities of uptake sequences in genes do not correlate well with the 18 'COG functional categories' that genes have been assigned to. This is a significant result because a previous paper claimed a strong correlation between high uptake sequence density and assignment to a modified COG functional category containing 'genome maintenance genes'. That claimed correlation was considered to provide strong support for the hypothesis that uptake sequences exist to help bacteria get useful new genes, a hypothesis I think is dead wrong.
Our hypothesis is that the distribution of uptake sequences among different types of genes (with different functions) should only reflect how strongly these genes' sequences are constrained by their functions. Our analysis would be quite a bit more impressive if we showed a positive result: that uptake sequence density in different COG functional categories correlates well with the degree of conservation of the genes in these groups. My first idea was to ask my bioinformatics collaborator to do this analysis. But I suspect it might be a lot of work, because the only COG analysis she's done so far is with the A. pleuropneumoniae genome, whereas we would want the analysis done with the genes in the H. influenzae and/or N. meningitidis COG functional categories, looking at the % identity of the three 'control' homologs we've used for our other analysis.
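To make the proposed comparison concrete, here is a minimal sketch of one way it could be set up. It is not the analysis we have actually run. It assumes each gene has already been reduced to a tuple of (COG category, uptake sequence density, mean % identity to its control homologs), and it uses a Spearman rank correlation across categories, which is my choice for illustration only. All function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def category_means(genes):
    """genes: iterable of (cog_category, uptake_density, percent_identity).

    Returns {category: (mean density, mean % identity)}.
    """
    grouped = defaultdict(list)
    for category, density, identity in genes:
        grouped[category].append((density, identity))
    return {
        cat: (mean(d for d, _ in rows), mean(p for _, p in rows))
        for cat, rows in grouped.items()
    }


def density_vs_conservation(genes):
    """Rank-correlate per-category mean density with mean % identity."""
    means = category_means(genes)
    cats = sorted(means)
    densities = [means[c][0] for c in cats]
    identities = [means[c][1] for c in cats]
    return spearman(densities, identities)
```

Calling `density_vs_conservation(genes)` on a list of such tuples returns a single coefficient between -1 and 1. With only 18 categories the estimate would be noisy, so a per-gene comparison might be worth trying alongside it.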
So I'm wondering whether someone might have already done a version of this analysis. Not with H. influenzae or N. meningitidis, and not with the control homologs we've used, but any general estimate of levels of conservation of genes in the 18 COG functional categories. I searched Google Scholar for papers about COG functional group divergence and found a good review of analyses one can do in the COG framework. This got me hoping that maybe there was a web server that would let me do the analysis myself, but the paper didn't describe anything that would do the job.
But I looked deeper in Google Scholar's hits and found something that looks very promising. It examines rates of sequence evolution across genes in the gamma-proteobacteria. H. influenzae and the control genomes we used with it are all in the gamma-proteobacteria, and I think the paper has looked specifically at the relative rates of evolution of genes in different COG functional categories, so this might be exactly what I'm looking for. The only problem is, the paper appeared in PLoS Genetics, and their site is down right now! I'm trying to read the paper in Google's cached version, but the page is all greyed out and it can't show me the figures. Guess I'll just have to be patient and hope the site is back up soon.