In this analysis we'll be directly detecting a combination of (1) the transformation-specific biases of DNA uptake and translocation and (2) the non-specific biases of cytoplasmic nucleases, the RecA (Rec-1) strand annealing machinery, and mismatch repair. We'll be able to subtract the former from the latter, using information we'll get from the uptake and translocation parts of the NIH proposal (these are not in the Genome BC proposal).
RecA and its homologs provide the primary mechanism of homologous recombination in all cellular organisms. This recombination generates most of the new variation that's the raw material for natural selection (acting on mutations - the ultimate source of variation). Recombination is also a very important form of DNA repair; bacterial mutants lacking RecA are about 1000-fold more sensitive to UV radiation and other DNA-damaging agents.
We know almost nothing about the sequence biases of RecA-mediated recombination. Such biases are likely to exist (all proteins that interact with DNA have sequence biases), and they are likely to have very significant long term consequences for genome evolution. The better-characterized biases of DNA repair processes are known to have had big effects on genomes; for example, the dramatic differences in base composition between different species are now though to be almost entirely due to the cumulative effects of minor mutational biases compounded over billions of years of evolution.
RecA promotes recombination by first binding to single-stranded DNA, coating the strand with protein. I don't know whether the DNA is entirely buried within the protein-DNA filament, or whether the RecA molecules leave the DNA partly exposed. (Better find out!) The filament is then able to 'invade' double stranded DNA, separating the strands (?) and moving until it finds a position where the DNA in the filament can form base pairs with one of the strands.
(Break while I quickly read a book chapter about this by Chantal Prevost.)
OK, the ssDNA has its backbone coated with RecA, but its bases are exposed to the aqueous environment and free to interact with external ('incoming') dsDNA. The dsDNA is stretched by its interactions with the RecA-ssDNA filament (by about 40% of its B-form length); this may also open its base pairs for interaction with the based of the ssDNA. But the pairs might not open, and the exposed bases of the ssDNA would instead interact with the sides of the base pairs via the major or minor groove in the double helix. A hybrid model (favoured by Prevost) has the exposed bases pairing with separated As and Ts of the dsDNA, but recognizing the still-intact G-C base pairs from one side. Prevost favours a model in which base pairing interactions between the stretched dsDNA and the ssDNA then form a triple helix (the stretching opens up the double helix, making room for the third strand), which is then converted to a conventional triple helix before the products separate.
With respect to our goal of characterizing the sequence bias over the whole genome, she says:
"Taken together, the available information on sequence recognition does not allow one to ratio the sequence effects on strand exchange or to extract predictive rules for recognition."So there's need for our analysis.