Field of Science

Searching for motifs

Well, I don't know why my first attempt at running a Gibbs motif search with the Gallibacterium genome returned errors. The errors it described were ones I hadn't made (as far as I could tell), so I resubmitted the runs and they were fine.

But the grid-computer system was slow to get around to my runs (probably those physicists and meteorologists hogging the system), so I poked around and rediscovered that the Gibbs motif searcher program now also runs on Macs. Luckily one of the post-docs had just come back from a two-week course that required intensive use of Unix, so she was able to dive in and sort out the permissions etc. for me. So now I can run the Gibbs searches on my newish MacBook Pro and on our other fast Mac. And I can still run them on the grid system too.

Results: I can't find anything (i.e. Gibbs can't find anything) in the Gallibacterium genome that looks anything like a typical Pasteurellacean uptake sequence. I've done simple searches where I just asked it to look for an 8-mer motif, and searches where I gave it 'segmentation-mask' prior files telling it the spacing typical of an H. influenzae or A. pleuropneumoniae USS, but it still didn't find anything. Most of the time it can't find anything it even considers to be a motif at all. This may be because the USS motifs are too sparse for it to pick them up, or because Gallibacterium really doesn't have a USS at all.

But I'm trying one more thing - giving it a prior file that specifies not just the spacing but the actual position weight matrix to expect. If this doesn't find anything we may need to do some uptake-specificity experiments.
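For readers unfamiliar with position weight matrices, here is a minimal Python sketch of the idea: build a log-odds matrix from a few aligned sites, then score every window of a sequence against it. It is only an illustration. It is not the Gibbs sampler's prior-file format, the function names (`build_pwm`, `score`, `best_hit`) are hypothetical, the example sites are made-up variants of the H. influenzae core (not real data), and the uniform 0.25 background is an assumption.

```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds position weight matrix from equal-length aligned sites.

    Assumes sequences contain only A, C, G, T and a uniform background.
    """
    width = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm

def score(pwm, window):
    """Sum of per-position log-odds scores for one window."""
    return sum(col[base] for col, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window on one strand."""
    width = len(pwm)
    hits = [(score(pwm, sequence[i:i + width]), i)
            for i in range(len(sequence) - width + 1)]
    return max(hits)

# Made-up aligned sites, for illustration only.
example_sites = ["AAGTGCGGT", "AAGTGCGGT", "AAGTGCGGC", "AAATGCGGT"]
example_pwm = build_pwm(example_sites)
```

A real scan would also score the reverse-complement strand and would compare scores against what a shuffled genome gives, since a sparse motif can be hard to tell apart from chance matches.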

Can I remember how to run a Gibbs motif sampler analysis?

Our visiting grad student is working with Gallibacterium, a Pasteurellacean relative of Haemophilus. To help her optimize transformation we would like to find out about its uptake bias. As a first step, we'd like to find out whether it has repeats in its genome that resemble the known Pasteurellacean uptake signal sequences (USS) - fortunately a Gallibacterium genome sequence is available. I've done this analysis for all the other sequenced Pasteurellacean genomes, so I said I'd do this one too. Should be easy...

My first approach was to give the genome sequence to our Perl program that simulates USS, not because I want to do that, but because the program's first step is to count the numbers of full and partial USS matches in the starting sequence. The program was set up to do that for the H. influenzae USS (AAGTGCGGT), but when it didn't find many of these in the genome I modified it to find the other type of Pasteurellacean USS (ACAAGCGGT). It didn't find many of those either.
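The counting step described above can be sketched in a few lines of Python. This is not the lab's Perl program; it is a hedged illustration that assumes "partial match" means a window with at most one mismatch to the 9-bp core, and the function names (`reverse_complement`, `count_matches`) are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G, T, N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))

def count_matches(genome, motif, max_mismatches=1):
    """Count windows matching the motif on either strand.

    Returns a dict mapping mismatch count -> number of occurrences.
    Assumes the motif is not its own reverse complement (true for both
    USS cores below); a palindromic motif would be counted twice.
    """
    genome = genome.upper()
    targets = (motif, reverse_complement(motif))
    k = len(motif)
    counts = {m: 0 for m in range(max_mismatches + 1)}
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        for target in targets:
            mismatches = sum(a != b for a, b in zip(window, target))
            if mismatches <= max_mismatches:
                counts[mismatches] += 1
    return counts

HIN_USS = "AAGTGCGGT"   # H. influenzae-type core, as given in the text
APL_USS = "ACAAGCGGT"   # the other Pasteurellacean type, as given in the text
```

Calling `count_matches(genome_sequence, HIN_USS)` and then `count_matches(genome_sequence, APL_USS)` gives the full (0-mismatch) and partial (1-mismatch) counts for each USS type; comparing them with counts from a shuffled copy of the genome shows whether the motif is enriched at all.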

So, perhaps Gallibacterium has a previously unknown version of the USS. Or perhaps it has an unrelated USS. Or perhaps it doesn't have a USS at all, which would suggest that it has weak or no uptake bias. What was needed was analysis with the Gibbs motif sampler, which would look for any common repeat in the genome. OK, I did lots of those last summer, so I can do it again.

I remembered how to submit a sequence for analysis, but I didn't bother to carefully check what the different settings do before submitting the run. That was stupid, because 36 hours later I've received two emails from the system, telling me that my requested run failed. One says "ERROR:: Mismatched width ranges" and the other "ERROR:: Palandrome (sic) subscript overflow". Guess I'd better buckle down and sort it out.

CRP manuscript revisions submitted, on to Gallibacterium...

As usual, it took me about 3 tries to get the manuscript resubmitted with all the files in their correct forms. But it's done.

I'm going to try to get back to the bench next week, doing some competence-induction experiments with Gallibacterium, brought to the lab by our visiting grad student. Oh boy!

Manuscript almost ready to go back

Our manuscript about how the CRP proteins of E. coli and H. influenzae differ in their sequence specificity has been provisionally accepted, and the revised version is almost ready to send back to the journal. We weren't able to do the one experiment requested by the editor, but we make what we think is a pretty good argument about why it isn't needed.

The only remaining problem is that some of the figures look a bit weird, I think due to being shuffled between different formats. The dark grey shading in some of the bar-graph bars has turned into a dark grey check pattern. My former-postdoc-coauthor converted the figures into high-resolution PDFs so he could email them to me, but maybe he should instead post them to one of those file-sharing sites where I can download them. I know Google Groups works for this, but I think there are also sites dedicated to this.

The Response to Reviewers letter has been written and revised and polished, so once the figures are sorted out I think I can sit down and do the on-line submission.

Where are they now? Part 2

Two more lines of research that we're no longer working on:

3. When did eukaryote sexual reproduction begin? During the first 10 years that I was working on competence, I fully intended to switch to studying the origins of meiosis in eukaryotes. The plan, and the reasons I set it aside, are explained in this post from last summer. (Fortunately John Logsdon has taken up the torch.)

As the first steps in this project Joel Dacks, then an M.Sc. student in my lab, and I published two papers on the phylogeny of early-diverging eukaryotes. These results have since been confirmed by more detailed analyses, although the deep phylogeny of eukaryotes is still rather obscure.

4. Quorum sensing and/or diffusion sensing:
Most bacteria secrete small more-or-less inert molecules into their micro-environments and monitor the external concentrations of these molecules. When this autoinducer-secretion was first discovered it was proposed to be a means of cell-cell communication, evolved to enable bacteria to monitor the cell density of the population they are living in and to respond with appropriate changes in gene expression. This "quorum sensing" explanation quickly became dogma, despite having serious theoretical/evolutionary problems. In retrospect, this acceptance was partly because there were no alternative explanations for the evolution of autoinducer secretion and sensing, and partly because the idea that bacteria are secretly talking to each other is very appealing.

In 2002 I published an opinion piece (in Trends in Microbiology) proposing a much simpler explanation, that the secreted molecules serve as inexpensive sensors of the diffusional properties of each cell's microenvironment, and thus allow cells to secrete expensive effector molecules (such as degradative enzymes) only when they and their products will not be lost by diffusion. This ‘diffusion sensing’ hypothesis was welcomed by evolutionary biologists but largely ignored by the many researchers actively investigating quorum sensing. My lab initially tried to develop experimental systems to demonstrate that isolated cells use secreted autoinducers for gene regulation, but gave up because of the technical problems of monitoring gene expression at the scale of single isolated cells.

However the paper now gets regular citations in reviews of quorum sensing, and several other research groups have produced evidence validating the importance of diffusion in autoinducer regulation. The latest is a study of Pseudomonas cells on leaves (Dulla and Lindow PNAS 2007), which found that diffusion and other physical factors in cells' microenvironments are major determinants of this regulation. They pointed out that my proposal "has received little attention despite the extensive study of QS in many species", and even quoted approvingly my sentences about what research is needed.

Where are they now?

In the course of updating my CV I've been checking what's become of hypotheses and projects we initiated but are no longer working on. The good news is that all of them are still active areas of research, and the ones I consider most important are getting increasing attention. Here's a quick overview of two of them.

1. Mutation rates in males vs females: In response to a paper reporting that point mutation rates are much higher in males than females (because sequences on X chromosomes evolve more slowly than sequences on Y chromosomes), I used a computer simulation model to show that the excess mutations in male lineages usually canceled out the benefits of sexual recombination for females (Redfield Nature 1994). This paper made a big media splash when it came out; Natalie Angier wrote it up for the New York Times, Jay Leno made a joke about it, and it even got a paragraph in Cosmopolitan! This was partly because the title was full of buzzwords ('sex', 'male', 'female', 'mutation'), and partly because I wrote up a very clear, useful press release.

It didn't make much of a scientific splash, and it hasn't had much impact on subsequent work on the evolution of sex, but the number of citations continues to increase. Many citations are from a European group of theoretical physicists who publish mainly in physics journals, but others are from evolutionary biologists. One 2007 review discusses the implications of my work, referring to it as 'a seminal study' (which I choose to interpret as not just a bad pun).

2. The hotspot paradox: Most meiotic crossing-over happens at chromosomal sites called recombination hotspots; the largest influence on the activity of these sites is the DNA sequence at the site. While I was still a grad student I realized that, over evolutionary time, active hotspot sequences should disappear from genomes, being replaced first by less-active and then by inactive sequences. This is because the mechanism by which hotspots cause recombination also causes more-active hotspot sequences to be physically replaced by less-active sequences. (At that time the genetic evidence for this was strong but little was known about the molecular details.) This creates a paradox, because hotspots have not disappeared (each chromosome has many of them).
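The logic of the paradox can be illustrated with a very simple deterministic recurrence. This is a generic textbook-style sketch in Python, not the model from the papers discussed here: it assumes random mating, one locus with an active and an inactive allele, no selection, and a hypothetical parameter `d` for the fraction by which heterozygotes under-transmit the active allele because it gets replaced by the inactive one.

```python
def next_freq(p, d):
    """Frequency of the active hotspot allele after one generation.

    Homozygotes transmit faithfully; active/inactive heterozygotes
    (frequency 2p(1-p)) transmit the active allele with probability
    (1 - d) / 2 instead of 1/2. This simplifies to p * (1 - d * (1 - p)).
    """
    return p * p + 2 * p * (1 - p) * (1 - d) / 2

def trajectory(p0, d, generations):
    """List of allele frequencies from generation 0 to `generations`."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_freq(freqs[-1], d))
    return freqs
```

With any `d` greater than zero the active allele's frequency falls every generation, however common it starts out, which is why some opposing force is needed to explain why hotspots persist.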

About 10 years later I returned to this problem, using detailed computer simulations to model the evolution of hotspots. We first created a deterministic model of a single hotspot, and showed that none of the forces opposing hotspot elimination (evolutionary benefits of recombination, benefits of correct chromosome segregation, direct fitness benefits of hotspots that also act as promoters, singly or in combination) were strong enough to maintain hotspots against their self-destructive activity. Several years later we created a better, stochastic, model that followed multiple hotspots on a chromosome - this confirmed and strengthened the previous conclusions.

The first paper (Boulton et al, PNAS 1997) was ignored by just about everyone, particularly the molecular biologists whose work might be expected to resolve the paradox. By the time the second paper was published (Pineda-Krch and Redfield 2005), evidence from human genetics had confirmed that the hotspot destruction originally studied in fungi also occurs in humans. Now, the increasing ability to examine individual crossover events at base-pair resolution has focused attention on the paradox, and most papers about hotspots in natural populations (including humans) mention it as a sign that the evolutionary history of recombination hotspots remains perplexing.

I'll write up a couple more of these projects tomorrow.