I spent most of yesterday poking around in databases, hoping to get a better understanding of H. influenzae's competence-induced DNA ligase. Specific questions: Are some or all of the H. influenzae alleles non-functional? What other bacteria have homologs of this gene? Are the homologs typically targeted to the periplasm? Is there any other evidence associating them with DNA uptake? What is known about their function? I spent a lot of time chasing red herrings but ended up with some results.
Two homologs have been characterized biochemically, one from H. influenzae and one from Neisseria meningitidis. Both were in the adenylated state when isolated from cells (I think this is common for DNA ligases), and both catalyzed typical ATP-dependent ligation reactions when purified: they could ligate nicked double-stranded DNA but could not join blunt ends. As I've mentioned before, this activity is puzzling for a protein that is predicted not to be cytoplasmic, since there's no ATP in the periplasm.
There has been a concern about the functionality of the H. influenzae alleles, because the original Rd genome annotation listed two ligase-related genes. These turned out to be the result of a frameshift mutation, with the second 'gene' restarting at a downstream in-frame ATG. This was an 'authentic' frameshift present in the Rd genomic DNA, not a sequencing error. The ligase gene had been cloned and the protein shown biochemically to be a functional ATP-dependent DNA ligase, so we initially assumed that the translation machinery just skipped over the frameshift. However, none of the other H. influenzae genome sequences has this frameshift, and when we realized yesterday that the protein tested by the biochemists didn't come from the Rd strain, we started worrying that the Rd allele might be nonfunctional. But the independently determined sequence of another Rd strain also lacks the frameshift, and when the postdoc checked his sequence of our Rd, he found that it lacks the frameshift too. So the mutation must be present only in the Rd stock that TIGR originally sequenced.
The search for homologs was complicated by annotation inconsistencies, but the basic result is that the distribution of this gene is odd. Many but not all Pasteurellaceae have it; to a first approximation, members of the 'Hin-clade' have it and members of the 'Apl-clade' don't. (It's hard to be sure because the genus names are a mess: they don't correspond to the real phylogeny.) The Enterobacteriaceae don't have it. All the Vibrios do, and so do all the Campylobacters. Most or all of the Neisseriaceae have it (Neisseria, Kingella, Eikenella), as do Shewanella and its relatives. So do a few other groups, and a lot of species I've never heard of that aren't on the Wu and Eisen tree.
The red dots show the positions on the tree of the species or groups that have this gene. It's in the epsilon-proteobacteria and in a few families of the gamma-, beta- and delta-proteobacteria. It might also be in a few other groups, or these hits might be misassignments. The next-closest homologs are in phage and eukaryotes; the other bacterial ATP-dependent DNA ligases belong to other ligase groups. Could this distribution have arisen by multiple horizontal transfer events, perhaps mediated by phage? I don't think it can be explained by massive gene loss from a bacterial common ancestor.
The annotation inconsistencies are a pain in the butt. First, VanWagoner et al. called this gene ligA, but that name is widely used for the ubiquitous, distantly homologous NAD-dependent DNA ligase that's essential in all bacteria, and there's no standard name for this ATP-dependent ligase. I guess we should suggest one when we discuss this gene in our knockout-mutant paper (ligB? ligC, for competence? ligK, because it's a k-family ligase like those of eukaryotes?). The problem is made worse because some annotators have erroneously labelled ATP-dependent ligases as NAD-dependent ligases, and I suspect that some bacteria have genus names that don't reflect their true relationships (this is certainly true in the Pasteurellaceae).
There's also uncertainty about the N-terminus of the protein. It's important to have the right start codon because the N-terminal sequence, where any signal peptide would be, determines the localization predictions. Some Pasteurellaceae homologs are predicted to use a start codon 36 codons upstream of the predicted H. influenzae start; H. influenzae can't use this upstream start codon because there's an intervening in-frame stop codon.
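As a toy illustration of the start-codon argument, here is a minimal Python sketch (not the method anyone used to annotate these genes) that asks whether a start codon a given number of codons upstream of an annotated start could extend the reading frame, i.e. whether an in-frame stop codon intervenes. The function name, the invented toy sequences, and the choice to accept ATG, GTG and TTG as starts are all illustrative assumptions, not real H. influenzae or Pasteurellaceae sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Assumption: treat ATG, GTG and TTG as possible bacterial start codons.
START_CODONS = {"ATG", "GTG", "TTG"}


def upstream_start_usable(seq, annotated_start, upstream_codons):
    """Hypothetical helper: can a start codon `upstream_codons` codons upstream
    of `annotated_start` (0-based index of the annotated start codon) extend the ORF?"""
    seq = seq.upper()
    alt_start = annotated_start - 3 * upstream_codons
    if alt_start < 0:
        raise ValueError("upstream start lies outside the supplied sequence")
    # The upstream position must itself be a start codon.
    if seq[alt_start:alt_start + 3] not in START_CODONS:
        return False
    # Scan every codon between the upstream start and the annotated start, in frame.
    for pos in range(alt_start, annotated_start, 3):
        if seq[pos:pos + 3] in STOP_CODONS:
            return False
    return True


# Invented toy sequences: 36 codons precede the annotated ATG at index 108.
open_seq = "ATG" + "GCT" * 35 + "ATG" + "AAA" * 5
blocked_seq = "ATG" + "GCT" * 20 + "TAA" + "GCT" * 14 + "ATG" + "AAA" * 5

print(upstream_start_usable(open_seq, 108, 36))     # True
print(upstream_start_usable(blocked_seq, 108, 36))  # False
```

In a real comparison the sequence would come from each genome around the homolog's predicted start; the sketch only captures the logic of the 'intervening stop codon' reasoning.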
I used PSORTb to look for evidence of targeting to the cell envelope for nine representative proteins. Four were strongly predicted to be cytoplasmic, and all of these are Pasteurellaceae (G. anatis, H. somnus, H. parainfluenzae and A. aphrophilus). The other five were predicted to be targeted to the envelope, but the program could assign a specific location only for Shewanella putrefaciens (cytoplasmic membrane); the other four (H. influenzae, V. cholerae, N. gonorrhoeae and N. meningitidis) could be in either membrane, in the periplasm, or extracellular.
The gene has been knocked out only in H. influenzae, first by VanWagoner et al. and now also by us. The mutants grow normally; VanWagoner et al. reported a small transformation defect, but we saw none, either in our new marked and unmarked mutants or when we introduced their mutation into our KW20 background. My Google searching turned up a poster from a research group at AstraZeneca; they showed that the H. influenzae NAD-dependent ligase is essential, and concluded that its function can't be replaced by the ATP-dependent ligase. But they might have reached a different conclusion if they had overexpressed the ATP-dependent ligase: its gene is under CRP-S regulation, so its levels may be very low in noncompetent cells.
Bottom line: We don't know what function(s) this protein serves in any bacterium and its phylogenetic distribution is weird. Our finding that it's in the H. influenzae competence regulon is the only clue to its function in any species, and figuring out what it contributes to competence would be a big advance. So we're going to try to complete the former undergrad's transformation tests.