This post will be about how biochemical modifications of DNA can tell us how different positions of the DNA sequence affect uptake by H. influenzae cells.
I'm planning a separate post about how we represent USS consensus sequences, and about how much information we have about the consensus, but for now I'll just present the extended 'consensus' as AAGTGCGGTnnRRWWWWnnnnnnRWWWWW, where 'n' means any base, 'R' means purine (A or G), and 'W' means weak (A or T). But above I've succeeded in including a 'sequence logo' representation of the genome consensus.
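To make the notation concrete, here is a minimal Python sketch that turns the extended consensus into a regular expression and reports where a sequence matches it exactly. The names `IUPAC`, `consensus_to_regex` and `find_uss` are made up for this illustration. Requiring an exact match is a simplifying assumption: real USSs vary, so this is a toy reading of the notation, not a way to score sites.

```python
import re

# Only the codes used in the consensus above; lowercase 'n' is treated as 'N'.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine
    "W": "[AT]",    # weak
    "N": "[ACGT]",  # any base
}

CONSENSUS = "AAGTGCGGTnnRRWWWWnnnnnnRWWWWW"


def consensus_to_regex(consensus):
    """Translate an IUPAC-style consensus string into a compiled regex."""
    return re.compile("".join(IUPAC[ch.upper()] for ch in consensus))


def find_uss(seq, consensus=CONSENSUS):
    """Return 0-based start positions of non-overlapping exact matches."""
    pattern = consensus_to_regex(consensus)
    return [m.start() for m in pattern.finditer(seq.upper())]
```

For example, `find_uss("CCC" + "AAGTGCGGT" + "CC" + "AGATTA" + "CGCGCG" + "GATTAT")` finds one match starting at position 3, and changing a single core base removes it.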
This consensus is mainly based on two kinds of information. The first is sequences that are consistently present in DNA fragments that H. influenzae preferentially takes up. The second is the consensus of repeats in the H. influenzae genome sequence. The consensus is supported by "ethylation-interference" experiments that Ham Smith's lab did about 25 years ago (Danner et al. 1980 Gene 11:311). I'm considering doing (persuading someone else to do?) more of these experiments, and a similar type of experiment called "missing nucleoside" analysis.
Both kinds of experiment work the same way. A short DNA fragment containing a USS (or variant) is radioactively labeled with 32P at one end, and then subjected to a chemical treatment that modifies some of the bases or the phosphates that connect the bases. Competent H. influenzae cells are then allowed to take up (try to take up) the labeled modified DNA. It's expected that some modifications will interfere with (or sometimes enhance) this uptake, so fragments that by chance have modifications that interfere won't be taken up, and fragments that have modifications that enhance will be taken up more efficiently than the rest.
The cells, along with whatever DNA fragments they've taken up, are then washed, and DNA is prepared from them. This DNA will be a mixture of the non-radioactive cell DNA (which we can ignore) and the radioactive DNA they took up. The DNA is then treated with a second chemical that breaks the strands at the sites of modification, heated to separate the two strands, and then run in a gel and exposed to film.
If none of the modifications affect DNA uptake, we expect to see a band for each position of the sequence, as each position was equally likely to be modified and each modified fragment was equally likely to be taken up. (We control for these assumptions by separately running in the gel DNA that was broken without ever having been exposed to cells.) But if modifications at some positions prevent fragments from being taken up, then the bands corresponding to breakage at those positions will be missing from the gel. If some modifications increase uptake, their bands will be stronger than the control bands. Here's a link to a paper that used both techniques to analyze interactions between RNA polymerase and the lac UV5 promoter.
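The band-by-band comparison with the control lane can be sketched in Python as follows. Everything here is an assumption made for illustration: the function name `classify_bands`, the idea that band intensities have already been quantified as one number per position, the cutoffs `low` and `high`, and above all the normalization, which divides by the median ratio on the assumption that most positions have no effect on uptake. Control intensities must be nonzero.

```python
from statistics import median


def classify_bands(uptake, control, low=0.5, high=2.0):
    """Label each position by its uptake/control band ratio.

    uptake, control: band intensities, one value per position, in the
    same order. low/high are arbitrary illustrative cutoffs.
    """
    if len(uptake) != len(control):
        raise ValueError("lanes must cover the same positions")

    raw = [u / c for u, c in zip(uptake, control)]
    # Assumption: most positions are neutral, so the median ratio
    # reflects lane-to-lane differences in the amount of DNA loaded.
    scale = median(raw)

    calls = []
    for position, r in enumerate(raw, start=1):
        ratio = r / scale
        if ratio < low:
            label = "interferes"   # band weak or missing after uptake
        elif ratio > high:
            label = "enhances"     # band stronger than in the control
        else:
            label = "neutral"
        calls.append((position, ratio, label))
    return calls
```

With made-up intensities `uptake = [100, 10, 100, 300, 100]` and `control = [50, 50, 50, 50, 50]`, position 2 is called as interfering and position 4 as enhancing. If most positions in a fragment did affect uptake, the median would no longer be a sensible reference and some other normalization would be needed.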
The analysis called "ethylation interference" uses treatment with the chemical ethylnitrosourea, which puts ethyl groups (CH3CH2-) onto the phosphates that connect the bases. It chooses the phosphates randomly, so every position is equally likely to be modified. But the extent of ethylation should be limited, so that on average each DNA fragment gets modified at only a single random position.
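To get a feel for what "a single modification on average" implies, here is a small Python sketch. It assumes modifications land independently, so that the number per fragment follows a Poisson distribution; that is my simplifying assumption, not something taken from the ethylation papers. The function name `hit_distribution` is made up for the example.

```python
import math


def hit_distribution(mean_hits):
    """Fractions of fragments with no, one, or several modifications,
    assuming the number of hits per fragment is Poisson-distributed."""
    p_none = math.exp(-mean_hits)
    p_one = mean_hits * p_none
    return {
        "none": p_none,
        "one": p_one,
        "multiple": 1.0 - p_none - p_one,
    }
```

Under this assumption, an average of one modification per fragment leaves about 37% of fragments unmodified, about 37% with exactly one modification, and about 26% with two or more. Fragments with several modifications are harder to interpret, because with an end-labeled fragment only the break nearest the label shows up in the gel. Lowering the average to 0.1 makes multiply modified fragments rare (under 0.5%), at the cost of leaving most fragments unmodified.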
Ham Smith's lab did this, and found that ethylation at most of the phosphates in the core interfered with uptake. A few positions outside the core, and one in the core, enhanced uptake. At that time these results were used mainly to confirm that the putative USS sequence repeat did interact with the cell's uptake machinery. The authors also speculated that they revealed the positions important for binding. (Note added later: In the next post, Representation matters, I show a figure from Smith et al. 1995 summarizing their analysis of USS in the H. influenzae genome. The *** above and below their consensus sequence indicate the positions where ethylation changed uptake.)
But now I think we may be able to use this kind of analysis to find out more about what the DNA does during the uptake process. Specifically, I suspect that some of the modifications may affect uptake by making the USS easier or harder to bend or kink. I really need to find out how ethylation is expected to change DNA bendability in general - is the DNA more or less likely to bend at the position that's ethylated?
The other kind of modification, used by "missing nucleoside analysis", treats the labeled fragments with hydroxyl radicals. The radicals destroy the deoxyribose, removing a single nucleoside (base plus deoxyribose) from one or more positions chosen more or less at random in the DNA. This creates a one-base gap in the strand and two extra negative charges (from the exposed phosphates I guess). The double-stranded DNA remains intact except at the site of this gap in one strand. After the cells take up the DNA it doesn't need to be broken, just heated to separate the strands before being run in the gel.
Loss of the nucleoside has two effects. First, any protein side chains that normally interact with that nucleoside make no contact with the DNA; this is likely to weaken the DNA-protein interaction. Second, the missing nucleoside and the broken backbone are likely to make the DNA more flexible and more easily denatured at the site of the gap. I think this might make DNA uptake easier. So, applied to USSs, weaker bands might identify where the DNA contacts the USS-recognizing proteins, and stronger bands might identify sites where bending, kinking or strand separation are needed.