HI0659/HI0660 update

My clever strategy for making a double knockout of the HI0659 and HI0660 genes has been derailed by the absence of plasmid in one of the E. coli strains and the absence of the SpcR resistance cassette from the other strain's plasmid.  I suspect that I've been given the wrong strains...

Publications update

This morning we got an email from Nucleic Acids Research with provisional acceptance of the postdoc's manuscript on Haemophilus influenzae uptake specificity.  The reviews were short and favourable so we should be able to get the revisions done quickly.

My opinion piece on genetics teaching is in press at PLoS Biology.

The visiting grad student's paper on Gallibacterium anatis transformation is in press at Applied and Environmental Microbiology.

The RA's first paper this year (on E. coli competence) is already published in PLoS One.

My short essay on 'Do bacteria have sex' has now appeared in a collection of essays titled Microbes and Evolution: The World that Darwin Never Saw, published by the American Society for Microbiology.  (only $14.95)

The RA's second paper this year, on her H. influenzae competence-gene knockout collection, is under review at the Journal of Bacteriology.  (Update!  Later the same day we received a 'provisional acceptance' email for this too.  One of the reviewers described it as "an exemplary, thorough study that completes what is arguably the first global definition of a complete competence regulon.")

And what about our GFAJ-1 #arseniclife paper?  After receiving largely favourable reviews from Science we submitted the revised manuscript on April 13.  Yes, that's six and a half weeks ago, and they still haven't reached a final decision.  If our email queries had gotten any interesting responses I couldn't tell you about them, because we've been cautioned that correspondence between Science editors and authors is confidential and that alerting the press to a manuscript under review may jeopardize its acceptance.

Yuck!

  

I found this in a small box of antibiotics in the cold room. It's mycophenolic acid that had somehow eaten through the foil ring (blue) that held the bottle's rubber seal in place.  

I had no idea why we would have this chemical, but Wikipedia says it's an inhibitor of de novo purine biosynthesis in eukaryote cells (but maybe not in E. coli?), so maybe we had been planning to try it on a protist or on Haemophilus.  I threw it out of course.

Strategy for making the HI0660/HI0659 double mutant

Here's my plan for making the mutant strain knocked out for both HI0660 and HI0659:

I'll start with the two single-knockout plasmids that the RA made by recombineering.  Both were made from the same parent plasmid containing a chromosomal segment (green) containing both genes and about 500 bp of flanking DNA on each side.  In the left plasmid the HI0660 gene has been replaced by a SpcR cassette (orange).  In the right plasmid the HI0659 gene has been replaced by the same cassette.


I'll cut both plasmids with the same two restriction enzymes.  SpeI cuts in the vector, close to the left end of the insert, and SacII cuts in the SpcR cassette, close to its right end.  Then I'll inactivate the enzymes (with heat or phenol extraction), mix the two digested DNAs, ligate the mixture and transform it into E. coli, selecting for AmpR and maybe SpcR.

The single fragments won't be able to self-ligate because the enzymes give different sticky ends, but 10 different bi-molecular ligation products are possible.  The plasmid I want (A+D) is shown below.  Three of the others won't be able to replicate (A+A, A+C, C+C), and three others will contain inverted duplications of the vector (B+B, B+D, D+D) and thus probably be unstable; they'll also be much larger than A+D and not SpcR. The other unwanted combinations will also be larger than A+D (A+B, C+D and B+C) so I should be able to easily distinguish them.
Once I identify the plasmid I want I'll just transform our wildtype H. influenzae strain with its insert DNA and select for SpcR.  Then I can find out whether deleting HI0660 eliminates the need for HI0659.
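As a quick sanity check on the count of ten, here's a minimal Python sketch that enumerates every unordered pairing of the four fragments (A and B from one digest, C and D from the other) and sorts them into the categories described above.  The category names are just my shorthand for this check.

```python
from itertools import combinations_with_replacement

# Fragment labels as used in the post: each digested plasmid gives two fragments.
fragments = ["A", "B", "C", "D"]

# Every unordered pair, including a fragment joined to a second copy of itself.
products = ["+".join(pair) for pair in combinations_with_replacement(fragments, 2)]

# The fate of each product, as described in the text (labels are shorthand).
categories = {
    "wanted": {"A+D"},
    "cannot replicate": {"A+A", "A+C", "C+C"},
    "inverted vector duplication": {"B+B", "B+D", "D+D"},
    "larger than A+D": {"A+B", "B+C", "C+D"},
}

print(len(products))  # number of possible bimolecular products
for name, members in categories.items():
    print(name, sorted(members))
```

The four categories together account for all ten products, with no pairing left unclassified.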

HI0659 progress and plans

I'm making progress in figuring out how the knockout of gene HI0659 prevents cells from becoming competent.  I don't know the answer yet, but I've ruled out some alternatives.  We know that the mutation blocks both DNA uptake and transformation, so the defect is not at the translocation or recombination steps - the mutant cells must either fail to induce their competence genes correctly or be blocked at some point in DNA uptake (assembly of the uptake machinery or uptake itself).

The first experiments were to see if treatments or mutations that normally induce competence would override the competence defect of the knockout strain (strain RR3112, HI0659::spc, but for convenience here I'll just call it HI0659-).  Competence induction requires that the CRP protein bind its cofactor cyclic AMP (cAMP), which is normally synthesized under competence-inducing conditions, and then induce transcription of competence genes.  Adding cAMP restores competence to cells unable to synthesize it, so I tested whether the HI0659- competence defect was corrected by adding cAMP.  It's not.

I could also test whether the HI0659 mutation interferes with the activity of CRP, by assaying the strain's ability to ferment CRP-regulated sugars.  But I don't need to do that because one of the other experiments I've done (described below) shows that CRP regulation works normally in HI0659 mutants.

I also tested whether competence is restored to the HI0659- strain by mutations that cause expression of competence genes under conditions that normally repress this (hypercompetence mutations).  We have two sets of these mutations, in the sxy gene and in the murE gene.  I made double mutants by transforming these strains with DNA of strain RR3112, selecting for its SpcR cassette, and tested their competence.  They were not competent at all, even after normal MIV induction, so the defect isn't that the competence genes just require stronger-than-normal induction.

The next test asked whether HI0659- cells fail to induce competence genes.  This is a bit odd to think about since HI0659 is itself a competence gene whose transcription is induced by Sxy and CRP+cAMP, but maybe once some HI0659 gene product is made it increases or stabilizes transcription or translation of the other genes, or protects transcripts from degradation.  As I explained here, we have 'reporter' strains that let us detect transcription of the comA and rec2 competence genes because these genes' promoters have been fused to a lacZ gene, whose beta-galactosidase product is easy to detect with a colorimetric assay.

I introduced the HI0659- mutation into four fusion strains, two carrying a comA::lacZ fusion and two carrying a rec2::lacZ fusion, and assayed their production of beta-galactosidase and their competence.  Here are the results:


All the HI0659 mutants have the same beta-galactosidase levels as their HI0659+ parents (yellow bars and tubes).  Only the comA parent was included in this assay (the leftmost column), but you can see the induced and uninduced activities of both parent strains in the previous post.  Importantly, all the HI0659- strains were completely non-transformable (blue bars), confirming that they had replaced their HI0659+ allele with the HI0659- allele.

This result tells us that the HI0659 mutation does not act by interfering with normal transcription or mRNA stability of comA or rec2.  It's possible that it specifically affects the expression of another of the competence genes, but this is unlikely.

We've been preparing to do 'RNA-Seq' analysis of the HI0659 mutant - this analysis uses Illumina or other 'next-gen' sequencing of reverse-transcribed mRNAs to measure the amounts of transcripts present in the cell.  We have been hoping that it would reveal changes in transcription caused by the mutation, but the lacZ fusion results make that unlikely.

The RNA-Seq analysis is expensive and quite a lot of work - should we still do it?  The controls we'd need to do would give us lots of solid information about the regulation of competence, complementing the microarray analysis we did ten years ago.

I have one more analysis to do, suggested by the bioinformatics analysis I did a couple of weeks ago, described here.  The bioinformatics suggested that the HI0660/HI0659 gene pair might be a toxin-antitoxin system (or derived from one), with HI0660 being the 'toxin' and HI0659 the 'antitoxin'.  If so, then HI0659's job is likely to be preventing HI0660 from doing something that prevents competence.  This is consistent with the normal phenotypes of the HI0660::spc and HI0660 unmarked mutants.  They both take up DNA and transform normally, even though the HI0660::spc insertion might be expected to interfere with expression of the downstream HI0659 gene and thus reduce competence.  If HI0659's job is just to stop the HI0660 product from doing something that prevents competence, then the competence defect of the HI0659 mutant should be corrected by adding a HI0660 knockout.

This is simple in principle (just transform the HI0660 cells with HI0659::spc DNA), but complicated by how small the two genes are and how close they are to each other.  We have the E. coli plasmids carrying the mutations, and my plan is to instead construct a new plasmid that's deleted for both genes, with one spcR cassette inserted, and transform wildtype cells with this DNA to get the desired double mutant.  How easy this construction is will depend on whether the HI0660/0659 genes and the spcR cassette have convenient restriction sites, so I'm going to spend this afternoon looking for them.



That's more like it!

I'm using fusions of lacZ to the comA and rec2 competence genes to find out whether the HI0659 mutation acts by blocking competence induction.  The first step was to put the HI0659 mutation (strain RR3112, HI0659::spc) into strains carrying these fusions.  That was easy, because the fusion strains are still transformable (the fusions were introduced as duplications of the chromosomal comA and rec2 genes rather than as replacements), and because I can select for the HI0659 mutation using spectinomycin.

We have two versions of each fusion strain in the freezer - the original strains sent to us by their creator, and derivatives we'd made by transforming the fusions into our standard strain KW20.  I decided to start by using them both, in case anything wonky turned up.

So I made the strains competent and transformed them with RR3112 DNA and, as a control, with our standard MAP7 DNA.  Both transformations worked fine, with transformation frequencies between 10^-3 and 10^-2.  I streaked two colonies from each RR3112 transformation onto chloramphenicol plates to make sure they still had the fusion - only one didn't.  The next steps are to freeze these new strains (in case we want to do more with them), and to make them competent by incubation in MIV starvation medium.  I'll then test the competent cells for transformation (should be negative) and for expression of the lacZ fusions.
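For anyone who hasn't calculated one, a transformation frequency is just transformants per ml divided by total viable cells per ml.  Here's a minimal Python sketch of that arithmetic; the colony counts, dilutions and 0.1 ml plating volume are made-up illustrative values, not numbers from my notebook.

```python
def cfu_per_ml(colonies, dilution, volume_plated_ml=0.1):
    """Colony-forming units per ml of the undiluted culture."""
    return colonies / (dilution * volume_plated_ml)

def transformation_frequency(selective_colonies, selective_dilution,
                             plain_colonies, plain_dilution,
                             volume_plated_ml=0.1):
    """Transformants per ml divided by total viable cells per ml."""
    transformants = cfu_per_ml(selective_colonies, selective_dilution, volume_plated_ml)
    total = cfu_per_ml(plain_colonies, plain_dilution, volume_plated_ml)
    return transformants / total

# Hypothetical plate counts: 150 colonies on the selective plate at a 10^-3
# dilution, and 200 colonies on the plain plate at a 10^-5 dilution.
freq = transformation_frequency(150, 1e-3, 200, 1e-5)
print(f"{freq:.1e}")
```

With these invented counts the frequency falls inside the 10^-3 to 10^-2 range quoted above.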

But first I needed to check that competence induction did induce the fusions in the parent strains.  When I had made these competent (for the RR3112 transformations) I had frozen aliquots of log-phase and competent cells, so I thawed them out and did beta-galactosidase assays on them.  My first set of assays was a complete failure (no yellow colour even after 18 hours!), because I'd used 10% SDS rather than 0.1%, but the second set worked great, with bright yellow colour after 20 minutes.


Here's the graph:


I forgot to label the Y-axis - it's the OD420 reading, indicating the level of expression of the fusion.  (I didn't bother to convert these numbers into Miller units.)  Three of the strains have almost no fusion expression in log phase and high expression after competence induction, which is what we expect of strains with normal competence regulation.  But the fourth strain (878) has high expression in log phase, because it also carries a mutation (murE749) that causes the competence genes to be highly induced even in log phase, giving a 'hypercompetent' phenotype.  I'll include the HI0659 derivative of this strain in my assays as a control.
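For reference, here's how the conversion to Miller units would go, as a minimal Python sketch of the standard formula: 1000 x (OD420 - 1.75 x OD550) / (minutes x ml of culture x OD600).  The readings in the example are invented for illustration; I only recorded OD420 for the graph above.

```python
def miller_units(od420, od550, od600, minutes, culture_ml):
    """Standard Miller-unit formula for a beta-galactosidase assay.

    od420      -- absorbance of the reaction (o-nitrophenol plus scattering)
    od550      -- absorbance at 550 nm, used to correct for light scattering
    od600      -- cell density of the culture that was assayed
    minutes    -- reaction time
    culture_ml -- volume of culture added to the reaction
    """
    return 1000 * (od420 - 1.75 * od550) / (minutes * culture_ml * od600)

# Hypothetical readings for one tube, not values from this experiment.
print(miller_units(od420=0.9, od550=0.02, od600=0.5, minutes=20, culture_ml=0.1))
```

Since all my tubes had the same reaction time and similar cell densities, the raw OD420 values rank the strains the same way the Miller units would.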

Why does white gunk develop on the anode?

 

The gunk is soft, almost-gel-like.  In the photo it's sitting in lumps in the buffer in the bottom tank, but only because I gently scraped it off the anode wire with a spatula after I took the gel plates out of the apparatus.  The anode was clean before I ran the gel.

The gel buffer was TBE with 10 mM MgCl2 added; might this be Mg(OH)2?

Later:  The white gunk is alkaline and dissolves in acid but not alkali.  It's not from the gel.  It also appeared when I ran a test minigel using TAE buffer with 10 mM MgCl2 added.

Why my gels wouldn't set?


This is the expiry date on my bottle of TEMED (the catalyst for acrylamide polymerization).

500 ml?


I think our polypropylene measuring cylinders must shrink with age or autoclaving.  The black lines mark the height of the water when this '500 ml' cylinder is filled with 500 grams of water.  (Well yes, the temperature is only ~20°C, not 25°C).  The markings are off by about 50 ml!
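The temperature quibble really is negligible here.  A minimal Python sketch of the check, using the density of pure water at the two temperatures (about 0.9982 g/ml at 20°C and 0.9970 g/ml at 25°C):

```python
# Density of pure water in g/ml at each temperature (deg C).
WATER_DENSITY_G_PER_ML = {20: 0.9982, 25: 0.9970}

def water_volume_ml(mass_g, temp_c):
    """Volume occupied by a given mass of water at the given temperature."""
    return mass_g / WATER_DENSITY_G_PER_ML[temp_c]

vol_20 = water_volume_ml(500, 20)
vol_25 = water_volume_ml(500, 25)
print(round(vol_20, 1), round(vol_25, 1), round(vol_25 - vol_20, 1))
```

So 500 grams of water is within about 1.5 ml of 500 ml at either temperature, and the 5°C difference shifts the volume by well under 1 ml, nowhere near the ~50 ml error in the markings.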

And the USS is bent at...

So yesterday I poured and ran my first polyacrylamide gel in many many years.  Actually I ran my second gel, because the first one didn't set at all.  I was quite proud of myself for remembering a lot of little tricks, like flushing the wells before loading them and flushing the bubbles out from under the gel.

And the results told me that the USS is bent at the T-tracts.



Above is the gel photo. I tested seven different versions of the USS sequence, each embedded in an otherwise-identical 200 bp fragment.  The white bands in the gel are the positions that the DNAs migrated to during the electrophoresis (6% acrylamide in 0.5 x Tris-Borate buffer, run at 60-70 volts for about 5 hours).  'S' indicates DNAs that I scored as running slower, and 'F' DNAs that ran faster.

Below is the key to the differences between the DNAs:

The first good result is that the randomized-sequence DNA did run faster than the consensus USS, as it had in the former postdoc's experiment five years ago.  That might not look obvious in the gel photo, where the random fragment (in the leftmost lane) is at about the same position as the USS fragment on its right.  But this end lane ran slower than the others because it stayed cooler - I've tried to indicate this by the shape of the blue tracking-dye bands I've drawn onto the gel.

Most of the other DNA fragments ran with the same mobility as the USS, but DNAs 6 and 7 ran faster, like the randomized-sequence DNA.  These are the only two DNAs whose T-tracts are changed: 6 has one T-G substitution in each T-tract, and 7 has these plus the same two outer-core changes as DNA 4.

You may wonder why I didn't run the DNAs for longer, to better resolve the migration differences.  But I expected the DNA to have run much further; the Molecular Cloning manual I was using as a guide said that xylene cyanol (the upper turquoise dye band, labeled 'xc') would migrate at the same rate as a 260 bp DNA fragment in a 5% polyacrylamide gel, and at the same rate as a 160 bp fragment in an 8% gel, so I expected my DNAs to coincide with the xc band.
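Here's the rough reasoning behind that expectation as a minimal Python sketch.  It assumes the size of the fragment that co-migrates with xylene cyanol changes linearly with acrylamide percentage between the manual's two values, which is a crude assumption of mine, not something the manual states.

```python
def comigrating_bp(percent_acrylamide,
                   low=(5.0, 260.0), high=(8.0, 160.0)):
    """Linearly interpolate the fragment size (bp) that runs with xylene cyanol.

    low and high are (gel percentage, fragment size) pairs taken from the
    manual; the linear interpolation between them is only an approximation.
    """
    (p0, bp0), (p1, bp1) = low, high
    return bp0 + (percent_acrylamide - p0) * (bp1 - bp0) / (p1 - p0)

print(round(comigrating_bp(6.0)))  # estimate for the 6% gel used here
```

By this estimate the dye should run with fragments of a bit over 200 bp in a 6% gel, which is about the size of my DNAs.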

The only explanation I can think of is that I put 10 mM MgCl2 into the gel buffer but forgot to also put it in the running buffer in the tanks.  The manual says that even minor differences in ionic strength 'can greatly distort the migration of DNA'.  So I should probably repeat the gel with the correct buffer.  Maybe I'll also try running it in the cold room.

What part of the USS is bent?

For years I've been referring to the gel photos and DNA-structure diagram as evidence that the H. influenzae consensus USS sequence is naturally bent.  The gel images on the left (gel run by a former postdoc) show that a 222 bp fragment containing the consensus USS migrates slower than the control fragment with a randomized-sequence version of the USS.  Slow migration is an established consequence of DNA bending.  The structure diagram on the right (generated by the MDDNA website, which appears to have gone tits-up) compares the same consensus and randomized sequences, and predicts a slight bend at each of the USS T-tracts.


The current postdoc's analysis of uptake bias indicates that the T-tracts play a relatively minor role in uptake specificity, which got me thinking about whether it really is the T-tracts that are bent.  We're in a position to test this, since he's synthesized a number of different variants of the USS for uptake assays, and I can easily repeat the gel analysis to see if any of the changes affect gel migration.

Below is a list of the different USS variants that he has synthesized.  (The DNAs don't contain 'n's; I've just put these in at positions where the bases don't matter for DNA uptake.)


From the top, we have:
  • the consensus USS
  • three different USS with changes in different parts of the outer core of the USS (A4G, T6G, and T11G)
  • a USS with two changes in the outer core (A4G + T11G)
  • a USS with a change at the most important position in the inner core (C8A)
  • a USS with two changes, one in each of its two T-tracts (T17G + T27G)
  • a USS with 4 changes, two in the outer core and one in each T-tract (A4G + T11G + T17G + T27G)
  • a control 'USS' with the same bases as the consensus USS but in randomized order.

These variants let us examine effects of changing the inner core, the outer core and the T-tracts, singly and in combination.  We expect to see slower migration of the consensus fragment than the randomized fragment.  Finding that all the variant fragments have the same mobility as the consensus fragment might not tell us much because the individual changes we're testing might not have dramatic effects (e.g. maybe we'd need to change the whole T-tract, not just one T).  But finding that one or more variants migrates faster than the consensus would be very informative.

The fragments we'll test aren't cloned; the postdoc generates them when needed using PCR and specific primers.  So he's going to make them for me, and clean them up to remove primers and primer-dimers.  Then I'll check their DNA concentrations (because concentration can affect gel migration) and run them all in a gel.

I'll try to dig up the gel details in the former postdoc's notebooks - I'm guessing 1% agarose, TAE buffer, and a fairly high voltage, but she might have done something different.  The gel image was included in our 2007 CIHR grant proposal so I'm guessing she did the experiment in January or February of 2007.  OK, I found the postdoc's notes (December 2006).  The gel was 6% acrylamide, run at 70V, with 10 mM MgCl2 added to the buffer.  Hmm, I wonder if we have acrylamide made up.

She then resequenced both the DNAs to confirm that they were the same lengths (these were cloned fragments).  This seems a bit silly for the PCR fragments we'll test, though it might be worth it if the gel results are interesting.  I can't think of any other way to check that the fragments are the same lengths - I don't know of any gel conditions where bending wouldn't happen or where it wouldn't affect mobility.


Questions about the new 'journal' PeerJ

Yesterday people were tweeting about a new open-access journal called PeerJ, which at present offers a provocative web page and blog but no solid information.


Their website says:
The $99 Sustainable Model
PeerJ is establishing a new sustainability model. Researchers will be able to purchase Lifetime Memberships, starting at just $99, giving them the rights to publish their articles in our peer reviewed journal. All published articles are made freely available to the public. Subscription fees made sense in a pre-Internet world, but now they just slow the progress of science. It's time to change that.
But there's no explanation of what they will offer or how it will be accomplished.  Apparently this will be revealed in a few weeks.  In the meantime viewers are encouraged to build the buzz by pointing more people to the site.

There's a good discussion here about the costs of running a site, comparing it to the ArXiv and to PLoS ONE.  Maybe this could be feasible.

But for me the big issue is the peer review.  Peer review has critical parts that can't be automated or delegated to authors, and if this fails then the journal becomes just another repository for anything researchers want to post.  Somebody has to select the reviewers - authors can be encouraged to suggest them, but peer review that depended entirely on the reviewers suggested by the authors would be close to worthless.  And getting reviewers to agree to take on a manuscript, and to do the reviewing they promised, takes more than automated emails.  And someone with appropriate expertise has to read the reviews and make a decision about whether the manuscript should be published.  Is this all going to be done by volunteers?  Or is this going to be a journal with very low standards?

The soon-to-be-former Publisher of PLoS ONE, Peter Binfield, is apparently behind PeerJ.  He certainly should know what he's doing, but this combination of spin and secrecy is something I expect of companies looking to make a buck out of the unwary.


Update June 13:  The PeerJ project was officially announced yesterday. John Dupuis has a big list of links here.

HI0659/HI0660 bioinformatics

I haven't done my experiment with the HI0659 knockout yet, because I'm waiting for the RA to pop in and get me the correct strain she made (it hadn't been frozen in the lab stocks).  In the interim I've been poking around in the databases to see what I can find.

HI0659 has a helix-turn-helix motif and not much else (it's only ~100 aa long).  HI0660 is about the same length; it has a motif typical of Holliday junction resolvases.

In our paper about the knockout mutants, our summary table says that no homologs of HI0659 have been found, and in the text we say that alleles of HI0660 in other H. influenzae strains often contain deletions and that homologs are missing from most Pasteurellaceae.  I've now found good homologs of both HI0659 and HI0660 in Actinobacillus pleuropneumoniae, but nothing in any other Pasteurellaceae.  Homologs in other bacterial groups are rare - the only genus whose name I recognize is Streptococcus, where several species have homologs of both genes, but many don't.

Here's the Genomes Region Comparison view from the JCVI Comprehensive Microbial Resource.  I included quite a few other Pasteurellacean and Streptococcal species in the search, but only these gave homologs.


A BLAST search with the HI0659 protein sequence turns up homologs in only the same species.  Some of these are annotated as members of an 'Xre-family toxin-antitoxin system' (I think HI0660 is homologous to the toxin component, and HI0659 to the antitoxin component).  HI0660 is also tagged as a member of the Gp49 superfamily (also phage proteins I think).  'Xre family repressors are known to perform a variety of regulatory functions unrelated to toxin-antitoxin systems' (ref).  The same paper suggests that the Tad toxin components might be mRNA-cleaving ribonucleases.  Maybe that's what HI0660 does, and HI0659 is a repressor that prevents it from acting.  If so, and if sxy mRNA was HI0660's target, then the mutant phenotypes would make sense.

Maybe I should transform the HI0659::spc mutation into our lacZ reporter strains (sxy::lacZ, comA::lacZ and rec2::lacZ) and see if it affects their expression.  I think I might have pooh-pooh'd this idea when the postdoc or RA suggested it a few months ago, but now maybe I see the light.  We're going to do RNA-seq on the HI0659 mutant, but if its only job is to repress HI0660 we might miss this. But, if HI0660 is an mRNA-destroying RNase, we might see the loss of other mRNAs.

Back to the bench

I think I'm about done with manuscript writing for a while.
  • The RA's manuscript on natural competence in E. coli has been published in PLoS One.
  • The visiting grad student's paper on natural competence in Gallibacterium anatis is in press at Applied and Environmental Microbiology.
  • The RA submitted her manuscript on all her H. influenzae competence knockout mutants last week; it's now under review at J. Bacteriology.  
  • The post-doc's DNA uptake manuscript will be ready to submit tomorrow or Monday to Nucleic Acids Research.  
  • On Thursday I submitted what I hope are the final revisions for my PLoS Biology opinion piece on genetics teaching.  
  • And, we're still waiting for the word from Science about our revised #arseniclife manuscript (it was sent back to the reviewers).
So it's time to do some experiments.  In fact I hope that the whole summer will be time to do some experiments.  I think I'll start with something easy, checking the knockout mutant of HI0659 for effects of known competence inducers.

HI0659 is a competence-induced gene that is needed for both DNA uptake and transformation.  Lots of genes are needed for these processes, but what's surprising about HI0659 is that the protein it specifies doesn't appear to be part of the DNA uptake machinery.  Not only does it lack any features of known DNA uptake proteins, it's small, cytoplasmic, and has a helix-turn-helix domain (these usually function as DNA binding elements).

I'm going to check whether any of the known competence inducers can turn on competence when HI0659 is knocked out.  Two of the inducers I'll test are genetic - the sxy-1 mutation and the murE749 mutation, both of which cause cells to be competent when they normally wouldn't be (hypercompetent).  The third is the small molecule cyclic AMP.

I'll need to make double mutants (sxy-1 HI0659::spc and murE749 HI0659::spc).  This should be straightforward - I'll just transform the hypercompetent mutants with DNA from HI0659::spc and select for spectinomycin resistance.

First I need to streak the cells out - no time like the present.  Then on Monday I'll do a quick DNA prep of HI0659 and transform the hypercompetent mutants.  No need to make them competent - they'll be competent enough straight from the plate, or in log phase growth.

Monday:  The cells all grew up nicely but I'm not going to proceed until the RA has confirmed (by email, she's on leave) that the HI0659 strain in the freezer stocks is the correct new one, not the incorrect old one.  The label looks old, and there's no note in the strain list saying that the vials of the incorrect strain were discarded and replaced.

A simple bioinformatics experiment, done with Word

One of the results of the postdoc's lovely analysis of DNA uptake specificity is that one block of four positions of the 32 nt 'uptake sequence' is critically important for uptake.  This is 5'-GCGG-3' (and 5'-CCGC-3' in the other strand).

The H. influenzae genome contains 10,044 occurrences of this sequence, but a random sequence of the same length and base composition is expected to contain only 4107.  This suggests that the molecular drive arising from biased DNA uptake may have caused the excess ~6000 occurrences to accumulate in the genome.  We know that about 2000 of them have strong matches to the full uptake sequence motif, but what about the rest?  Might they also have more-or-less-weak matches, because they are under weaker drive?
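For readers who'd rather not count in Word, here's a minimal Python sketch of the two calculations: counting a motif on both strands, and the count expected if bases were drawn independently at given frequencies.  The toy sequence and the uniform base frequencies are placeholders; this doesn't attempt to reproduce the 10,044 or 4107 figures, which need the real genome length and base composition.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def count_motif(seq, motif):
    """Count occurrences of motif in seq, allowing overlapping matches."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count

def expected_count(length, base_freqs, motif):
    """Expected occurrences in a random sequence with independent bases."""
    prob = 1.0
    for base in motif:
        prob *= base_freqs[base]
    return (length - len(motif) + 1) * prob

motif = "GCGG"
toy = "AAGCGGTTCCGCAAGCGG"  # placeholder sequence, not genome data
both_strands = count_motif(toy, motif) + count_motif(toy, reverse_complement(motif))
print(reverse_complement(motif), both_strands)

uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # placeholder frequencies
print(expected_count(1000, uniform, motif))
```

Searching one strand for both GCGG and CCGC is equivalent to searching both strands for GCGG, which is why the two counts are added.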

The postdoc could do a thorough test of this using R, but he's busy with the final polishing of his uptake-motif paper (to be submitted by Monday, we hope).  So I just did a quick test using Word.


I had used Word's Find/Replace function to count the GCGGs and CCGCs, so I did it again, this time highlighting the occurrences.  I copied the sequences around the first 30 GCGGs in the genome sequence, and around 30 CCGCs from the middle of the genome, aligned them by hand, and used WebLogo to look for any patterns in the flanking sequences.

No patterns.
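The copy-and-align-by-hand step could be scripted along these lines.  This is a minimal Python sketch, with a made-up toy sequence and a function layout of my own; it pulls out a fixed-width window around each occurrence and tallies the bases in each column, which is the raw information a sequence logo is built from.

```python
from collections import Counter

def flanking_windows(seq, motif, flank):
    """Return motif occurrences with `flank` bases on each side.

    Occurrences too close to either end of seq to have full flanks are skipped.
    """
    windows = []
    start = seq.find(motif)
    while start != -1:
        left = start - flank
        right = start + len(motif) + flank
        if left >= 0 and right <= len(seq):
            windows.append(seq[left:right])
        start = seq.find(motif, start + 1)
    return windows

def column_counts(windows):
    """Base counts at each aligned position across equal-length windows."""
    return [Counter(column) for column in zip(*windows)]

toy = "TTAGCGGATCCAGCGGAAGG"  # placeholder sequence, not genome data
windows = flanking_windows(toy, "GCGG", flank=2)
print(windows)
for position, counts in enumerate(column_counts(windows)):
    print(position, dict(counts))
```

Windows around CCGC occurrences would need to be reverse-complemented before being pooled with the GCGG windows, so that all the flanking sequences are in the same orientation.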

Interaction effects (yes, again...)

I'm mostly very confident that the ideas I wrote in my last post were correct, but the postdoc still has doubts, and each time I try to explain my position to him I come away with my own little niggling doubts.   So here's a different perspective.

This time I'll start from the biology of DNA uptake and the consequent accumulation of preferred sequences in the genome.


Our computer simulation model showed that the sequences that accumulate in the genome have the same properties as the bias of the uptake machinery.  The model could consider only the base biases of each position individually (there were no between-position interaction effects), but I think we can safely expect that any interaction effects in the uptake bias will also be seen in the sequences that accumulate in the genome.

When we, as researchers, examine the sequence patterns of the preferred-uptake sequences and the overrepresented-in-the-genome sequences, we first compare the single-position pattern and then compare the interaction-effect pattern.  Because the two sets of sequences are the same, we will find the same patterns.
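To make the two kinds of pattern concrete, here's a minimal Python sketch using tiny invented sequence sets.  Single-position patterns are just the base frequencies at each position; for interaction effects I've used mutual information between two positions, which is one possible measure chosen for illustration, not the analysis the postdoc actually did.

```python
from collections import Counter
from math import log2

def position_frequencies(seqs, i):
    """Base frequencies at position i across a set of aligned sequences."""
    counts = Counter(s[i] for s in seqs)
    return {base: n / len(seqs) for base, n in counts.items()}

def mutual_information(seqs, i, j):
    """Mutual information (bits) between positions i and j.

    Zero means the bases at the two positions vary independently.
    """
    n = len(seqs)
    at_i = Counter(s[i] for s in seqs)
    at_j = Counter(s[j] for s in seqs)
    pairs = Counter((s[i], s[j]) for s in seqs)
    total = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        total += p_ab * log2(p_ab / ((at_i[a] / n) * (at_j[b] / n)))
    return total

# Two invented sets with identical single-position frequencies.
linked = ["AA", "AA", "CC", "CC"]       # the two positions always match
independent = ["AA", "AC", "CA", "CC"]  # the two positions vary freely

for name, seqs in (("linked", linked), ("independent", independent)):
    print(name, position_frequencies(seqs, 0), mutual_information(seqs, 0, 1))
```

The single-position comparison and the interaction comparison are separate calculations on the same set of sequences, so two sets that really are the same must agree on both.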

        - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Now let's consider a real research situation, where we don't know anything about the underlying biology. We identify a set of sequences that are preferred by the uptake machinery, and another set that are overrepresented in the genome.  We analyze each set, and find that the single-position patterns are different.


What can we infer about the underlying biology?  We must conclude that the two sets of sequences have different properties, and thus that the sequences preferred by the uptake machinery are not the same as the sequences overrepresented in the genome.  The differences must be due to post-uptake forces that alter the sequences accumulating in the genome.

We might go on to analyze the interaction effects in the uptake set of sequences and in the genomic set of sequences.  These analyses may give us insights into the uptake process and the post-uptake forces, but they won't change the fact that the two sequence sets are different.