Field of Science

Two steps forward, one step back?

The work and ideas underlying the US variation manuscript are getting better, but the manuscript itself is getting farther from completion, not closer.

Advance #1: Realizing that I can use a log scale for the x-axis to see if runs have really reached equilibrium.

Retreat (the opposite of advance?) #1: Realizing that runs I had thought were at equilibrium aren't, so that conclusions I had thought were solid are not, so I don't really know what the true results are!

Advance #2: Realizing that I should think of/write about the elevated fragment mutation rates as degrees of divergence.

Advance #3: Remembered that the Introduction says that the goal of the work is to evaluate the drive hypothesis as a suitable null hypothesis for explaining uptake sequence evolution. Our results show that it is; the accumulation of uptake sequences under our model is strong, robust, and has properties resembling those of real uptake sequences.

Progress: Going through the Results section, annotating each paragraph with a summary of all the relevant data, noting whether each dataset is solid equilibrium data (fit to use) or not. This is taking a lot of time, because I have to identify and check out (and sometimes enter and graph) the data, but the results aren't as bad as I had feared. Runs that have already finished fill some of the new-found gaps, and others are already running but won't finish for a few days (some longer). So I'm going to finish this annotation, queue whatever runs I think are still needed, and then maybe spend a few days on the optical tweezers prep work (at long last) while the runs finish.

Two views of the same data

Here is some data from a series of runs where I simultaneously varied the lengths and numbers of the recombining fragments, so that the total amount of recombination remained the same (e.g. 1000 fragments 100 bp long or 50 fragments 2 kb long). I concluded that the runs were close to equilibrium and that the runs that got their recombination with the shortest fragments reached the highest equilibrium score.

But wait! This is the same data, now with the X-axis on a log scale. Now I see something quite different - after the first few cycles, the scores of all the runs are going up at the same rate (same slope), and their rate of increase is very log-linear. None of the runs show any sign of approaching equilibrium (i.e. of leveling off).
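One way to make the leveling-off (or its absence) explicit is to compare slopes on the log axis. Here's a rough sketch in Python of what such a check might look like (the simulations themselves are in Perl; the function names and the 0.2 tolerance are my illustrative choices, not anything from the actual analysis):

```python
import numpy as np

def log_slope(cycles, scores, lo_frac, hi_frac):
    """Least-squares slope of score versus log10(cycle), over a window
    given as fractions of the full span on the log axis."""
    logc = np.log10(np.asarray(cycles, dtype=float))
    scores = np.asarray(scores, dtype=float)
    span = logc[-1] - logc[0]
    mask = (logc >= logc[0] + lo_frac * span) & (logc <= logc[0] + hi_frac * span)
    return np.polyfit(logc[mask], scores[mask], 1)[0]

def looks_at_equilibrium(cycles, scores, tol=0.2):
    """Heuristic equilibrium check: the run has leveled off if the
    log-x slope over its last quarter is small relative to the slope
    over its middle half."""
    mid = log_slope(cycles, scores, 0.25, 0.75)
    late = log_slope(cycles, scores, 0.75, 1.0)
    return abs(late) < tol * abs(mid)
```

A run whose score is still rising log-linearly has the same slope early and late, so it fails this check; a run that has genuinely saturated has a near-zero late slope and passes.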

I had said I would always do runs both from above (a seeded genome) and from below (a random-sequence genome) and take as equilibrium the score the up and down runs converged on. I didn't do that here, but I see I should have.

I don't know whether I've done any runs that went long enough that the score obviously leveled off when plotted on a log scale. If not I should.

***Later: Some of the runs I've done clearly do level off even on the log scale. This is good. But some of the runs that I've been treating as at equilibrium (runs for which I've done an up run but not a down run) haven't leveled off at all, so I'm not justified in making any assumptions about where they'll stop. Time to run them longer, and do some down run counterparts.

A clearer way to look at the mutation rates?

I think some of my problems understanding the effects of mutation rates in our model are created by the words I've been using, words that describe an aspect of what the model does rather than what it represents.

In most of the runs I've done, the genome being followed (the focal genome) mutates at a specified rate in each cycle (usually 0.0001 or 0.00001 mutations per bp), and the DNA fragments that are recombined mutate at a 100-fold higher rate. This is designed to simulate the fragments coming from cells that shared a common ancestor with the focal genome 50 generations ago. This is a biologically reasonable setup.

In the model itself the differences are handled as different mutation-rate settings that are applied to the same mutation-generating subroutine, and that's how I've been thinking of them. But I now think that referring to both types of changes (in the genome and in the fragment) as due to different mutation rates has created problems, causing me to feel that I have to justify using two different rates, and that the model would somehow be simpler or purer if the two rates were the same.

But what if I instead referred to changes in the fragments as being due to 100 generations of divergence, rather than to a 100-fold higher mutation rate? I'm going to try this and see if my thinking and writing become clearer.
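The reframing is arithmetically almost exact: for small per-bp rates, one round of mutation at a 100-fold elevated rate produces nearly the same per-site divergence as 100 generations at the genomic rate. A quick check (illustrative only, ignoring back-mutation):

```python
def divergence_prob(mu, generations):
    """Per-site probability of at least one mutation after `generations`
    rounds at per-bp rate `mu` (back-mutation ignored)."""
    return 1 - (1 - mu) ** generations

# One cycle at a 100-fold elevated fragment rate...
one_shot = divergence_prob(100 * 1e-5, 1)

# ...versus 100 generations of divergence at the genomic rate:
accumulated = divergence_prob(1e-5, 100)
```

The two differ only in the order-µ² terms, which is why "100 generations of divergence" and "a 100-fold higher mutation rate" describe essentially the same amount of sequence change.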

Effects of mutations

One continuing puzzle about the results from the Perl simulation of uptake sequence variation is the independence of equilibrium score and mutation rate. It's true that mutations both create and destroy uptake sequences, but I wouldn't expect the effects to consistently cancel each other out, even though that's what happened in a very simple mathematical model I did years ago.

I suspect that the interactions between mutation and uptake sequence accumulation are more subtle than their independence would seem to suggest. So here are several questions that I think I can quickly answer with more runs:

1. Does changing the ratio of µg to µf change the equilibrium? (I already know there's about a 2-fold equilibrium score difference between 100:1 and 100:100, but I want to test more ratios.)

2. Does changing fragment length but keeping the number of fragments recombined per cycle the same change the equilibrium?

3. Does changing the fragment length but keeping the amount of DNA recombined the same change the equilibrium?

4. When µf = µg, does changing the fragment length but keeping the number of fragments recombined per cycle the same have no effect? Maybe not, because long fragments are more likely to change other US?
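Questions 2 and 3 hold different things constant as fragment length varies; a small sketch of the run settings each would generate (hypothetical parameter names, not the actual simulation's settings file) makes the contrast concrete:

```python
def runs_constant_fragment_number(lengths, n_fragments=100):
    """Question 2: vary fragment length, hold fragments-per-cycle fixed
    (so total bp recombined changes with length)."""
    return [{"frag_len": L, "n_frags": n_fragments,
             "bp_per_cycle": L * n_fragments} for L in lengths]

def runs_constant_total_bp(lengths, total_bp=100_000):
    """Question 3: vary fragment length, hold total bp recombined per
    cycle fixed by adjusting the number of fragments."""
    return [{"frag_len": L, "n_frags": total_bp // L,
             "bp_per_cycle": L * (total_bp // L)} for L in lengths]
```

With a 100 kb total, the second scheme reproduces the earlier series (1000 fragments of 100 bp, or 50 fragments of 2 kb).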

Line breaks in Word 2008

I have to chop a 200 kb file into 20 kb pieces, because the USS position weight matrix I'm using (derived from Gibbs analysis of the H. influenzae genome) is so fastidious (???) that runs take forever. Specifically, a 200 kb simulation that's using a pre-evolved sequence with quite a few uptake sequences already in it has taken 28 days to complete about 3300 cycles, and it's about to exceed its pre-specified time limit (800 hours, about 33 days) and be terminated before it finishes. Terminating prematurely means that it won't report the sequence it has so painstakingly evolved. And I had even given it a tenfold higher mutation rate to help it run faster!

Anyway, my clumsy solution was to chop the 200 kb input sequence into ten 20 kb segments and evolve them all in parallel. Because Word is good with word counts, I opened the sequence file (as a text file) in Word and marked off every 20 kb with a couple of line breaks. Then I opened the file in TextEdit and deleted everything except the last 20 kb to get a test file (no line breaks at all, that I could see). But it generated an 'unrecognized base' error message when I tried to use it, so my first suspicion was that Word had somehow generated a non-Unix line break.

Sure enough, opening the file in Komodo showed that it had. But surprisingly, the problem wasn't a Mac-style line break, but a DOS/Windows line break! Maybe Word 2008 thinks all .txt files are for Windows?
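For what it's worth, both the line-ending check and the chopping can be done without Word. A small Python sketch (hypothetical helper names; Unix uses LF, old Mac used CR, DOS/Windows uses CRLF):

```python
def count_newlines(data: bytes):
    """Tally the three line-ending conventions in a raw byte stream."""
    crlf = data.count(b"\r\n")        # DOS/Windows
    cr = data.count(b"\r") - crlf     # old Mac
    lf = data.count(b"\n") - crlf     # Unix
    return {"windows": crlf, "mac": cr, "unix": lf}

def to_unix(data: bytes) -> bytes:
    """Normalize all line endings to Unix newlines."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

def chop(seq: str, size: int):
    """Split a sequence string into consecutive pieces of at most `size` bp."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]
```

Reading the file in binary mode is the key step: it exposes the stray carriage returns that text editors silently hide.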

My coauthors cut the Gibbs analysis!

Well, not completely: they give it a single paragraph, but without any explanation of what the Gibbs motif sampler does or why that's important. I'm going to expand it back to a few paragraphs (maybe half of its original length). What will this say?

First, that we need to be using uptake sequence datasets that reflect how uptake sequences actually evolve (as motifs). We need information about both the nature of the motif for each genome (as a position weight matrix) and the positions of sequences fitting this motif. The matrix won't necessarily reflect the true biases of the respective uptake systems, but it's the best estimate we have.

Second, that Gibbs analysis found many more positions, but that the matrices based on these gave logos similar to those from previous searches for core-consensus and singly-mismatched uptake sequences. Not surprisingly these logos gave more weight to the flanking sequences that had been omitted from the core-consensus searches.
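For readers unfamiliar with position weight matrices, here's a minimal sketch of how a log-odds matrix scores candidate sites. The counts are made up for a toy 4-position motif - NOT the real Gibbs-derived USS or DUS matrices - and assume equal background base frequencies:

```python
import math

BASES = "ACGT"

# Hypothetical base counts at each of 4 motif positions
# (illustrative only; consensus reads AAGT).
COUNTS = [
    {"A": 90, "C": 4, "G": 3, "T": 3},
    {"A": 88, "C": 5, "G": 4, "T": 3},
    {"A": 3,  "C": 4, "G": 90, "T": 3},
    {"A": 2,  "C": 3, "G": 4,  "T": 91},
]
BACKGROUND = 0.25  # assumed equal base frequencies

def pwm(counts, pseudo=1.0):
    """Convert per-position counts to a log-odds position weight matrix."""
    matrix = []
    for col in counts:
        total = sum(col.values()) + 4 * pseudo
        matrix.append({b: math.log2((col[b] + pseudo) / total / BACKGROUND)
                       for b in BASES})
    return matrix

def score(site, matrix):
    """Sum the per-position log-odds for a candidate site."""
    return sum(col[b] for b, col in zip(site, matrix))
```

Because each position contributes independently, weakly biased flanking positions still shift the total score - which is why motif-based searches recover sites that consensus-matching misses.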

Third, that we did the Gibbs analysis for all genomes with uptake sequences.

Fourth, that we used the datasets to analyze patterns previously reported, and found no evidence of motif differences due to direction of replication or transcription, nor to location in translated vs untranslated regions.

Fifth, the more extensive variation in the Gibbs datasets allowed us to look for covariation between the bases present at different positions of DUS and USS. The only covariation this found was between very close positions, indicating that more distant interactions between bases are unlikely to play roles in uptake.

Time to think about doing an experiment!

OK, the draft NIH proposal is in the hands of the internal reviewers. Early in January they'll give me lots of (I hope harsh) feedback on what needs to be fixed. Till then, I've got two big things on my plate.

One is the US variation manuscript. It isn't exactly on my plate yet, as my coauthors are still assembling their drastic revisions back into a coherent manuscript, but they say I should get it tomorrow. I'm hoping it's now in pretty good shape, with my job now just to polish the writing and the figures so we can get it submitted within a few weeks.

The other is beginning the groundwork for the optical tweezers experiments. Let's see what I can remember about what needs to be done. In general, I'm going to use Bacillus subtilis cells as my positive control, because they're bigger and tougher than H. influenzae. They're also naturally competent, and were successfully used for an optical tweezers study of DNA uptake in 2004.

1. I need to be able to stick cells onto glass coverslips without killing them or otherwise preventing them from taking up DNA. I'll start with Bacillus subtilis, and with poly-L-lysine-coated coverslips. I'll need to make these myself - I have several protocols and several offers of help, but don't know if any of these people will be around over the holidays. The alternative to poly-L-lysine is a silane solution (fancy name) that was used for the B. subtilis experiments. But I don't have a protocol for using this, so it's a bit of a crapshoot. Some of the poly-L-lysine protocols say to pre-clean the coverslips with strong acid (nitric? chromic?) - a researcher down the hall said he might have some (old-tyme labs are good to have around).

2. I need to attach streptavidin to polystyrene beads. I have a protocol, and the streptavidin, and the coupling reagent, and the ready-for-coupling beads (they may be a bit small, 1 µ rather than 2 µ, but they'll do). What I don't have is a good way to test how well the coupling has worked (see below).

3. I need some biotin-conjugated DNA (H. influenzae chromosomal DNA). The research associate made some a while back for a different experiment, but I don't know if there's any left, or where it would be. I could make my own, if I can find the biotin.

4. I need to make the B. subtilis competent. This means that I need to make up the appropriate culture medium and competence-inducing medium (2 kinds, as I recall), and the appropriate agar plates for selecting transformants (so I can test whether they really are competent).

5. Once I have the streptavidin-coated beads and the biotin-coupled DNA, and some competent cells, I can test whether cells will stick to beads that have been incubated with DNA but not to beads without DNA or to DNase-treated beads. If this works I will know that there's streptavidin on the beads and biotin on the DNA and the cells are competent. If it doesn't I'll only know that at least one thing isn't right.

6. At this stage I can also test whether the cells I've stuck onto a coverslip can still bind DNA, by giving them the beads-plus-DNA and seeing if the beads stick to the cells (with the same controls as in step 5). Oh, but first I have to make sure that the competent cells will also stick to the coverslips.

7. Then I can make some competent H. influenzae and try steps 5 and 6 with them (assuming I've been able to stick the H. influenzae cells onto coverslips).

8. After all this is working, I'll be ready to go back to the physics lab and try to measure some forces!

A draft NIH proposal

With the US variation manuscript in the hands of my co-authors, I've spent the past week working on our planned proposal to NIH. It's not due till the beginning of February, but I have some internal reviewers lined up and I promised I'd get them a presentable draft by today. Which I will - this morning I just have to tidy up the last bit (how we'll characterize the transformability QTLs we hope to find) and polish a couple of paragraphs. (Note that this is far in advance of my usual grant-writing schedule - feels good).

We have three Specific Aims: 1. Characterize recombination tracts in individual transformants. 2. Measure recombination efficiencies across the genome, in pooled transformant genomes. 3. Identify loci responsible for transformability differences.

There's still lots of work to be done once I hear back from the reviewers (hopefully in a couple of weeks). The Methods aren't yet as good as they can be (assuming the goals don't change, which they might). The preliminary sequence data needs more analyzing and there are a couple of preliminary experiments we really need to do. The 'grantsmanship' needs LOTS of work (more selling of us, more potential problems and fallbacks). There aren't any references yet. The writing still needs work, but not until after the content is improved.

And I've done nothing about the other parts of the proposal (Budget etc).

In the meantime, my co-authors have sent me their revised draft of the US variation manuscript...

Manuscript progress

The US variation manuscript is in the hands of my co-authors. They've taken a step that I thought might be needed but I had lacked the stamina to undertake - rearranging everything so it starts with the analysis of multiplicatively scored simulations rather than additively scored ones.

My only reason for not wanting to do this was that it would mean setting up and analyzing yet more simulation runs: there were some variables for which I had good data for the additive case (which I was presenting), but less thoroughly done data for the multiplicative case. It's not that I had any reason to doubt that the multiplicative data I had was correct - it just wasn't sufficiently replicated to make good figures.

Luckily, now that my co-authors have dealt with the reorganization, there turn out to be only a few cases needing more work. I think most of these runs are now done, but I still have to analyze the results and generate figures.

But I've promised to have a semi-presentable version of our NIH proposal ready for the internal reviewers by Monday, so the US variation data analysis will have to wait until that's done.

We have a plan (again, but it's a new plan)

Specific Aims, in the form of questions:

Aim 1. Characterizing the genetic consequences of transformation:
A. How do transformant genomes differ from the recipient genome? We want to know the number and length distributions of recombination tracts. This will be answered by sequencing a number of recombinant genomes (20? more?), preferably using multiplexing. We have preliminary data (analysis of four) showing 3% recombination.

B. How much do recombination frequencies vary across the genome? This will be measured by sequencing a large pool of recombinant genomes. The sensitivity of this analysis will be compromised by various factors - some we can control, some we can't.

C. Are these properties consistent across different strains? We should do 2 or more transformants of 86-028NP and of a couple of other transformable strains.

D. How mutagenic is transformation for recombined sequences? For non-recombined sequences? Is mutagenicity eliminated in a mismatch repair mutant? If not, is it due to events during uptake or translocation?
Aim 2. Characterizing the genetic differences that cause strain-to-strain variation in transformability: (The results of Part A will guide design of these experiments.)
A. What loci cause strain 86-028NP to be ~1000-fold less transformable than strain Rd? (Are any of these loci not in the CRP-S regulon?) We will identify these by sequencing Rd recombinants (individually and/or in one or more pools) pre-selected for reduced transformability.

B. What is the effect of each 86-028NP allele on transformability of Rd, and of the corresponding Rd allele on 86-028NP? Are the effects additive? Do some affect uptake and others affect recombination?

C. Are transformation differences in other strains due to the same or different loci? This can be a repeat of the analysis done in Aim 2A. Does each strain have a single primary defect?

D. How have these alleles evolved? Have they been transferred from other strains? Do defective alleles have multiple mutations, suggesting they are old?
In the Approach section we'll explain how we will accomplish these aims, and why we have chosen these methods. In the Significance and Innovation sections we'll need to convince the reader that these aims will make a big difference to our understanding of bacterial variation and evolution.

Aim 1 will provide fundamental information about bacterial recombination (and associated mutation), which will put almost all studies of bacterial evolution on a more solid footing. Aim 2 will help us understand why natural transformation has such dramatic variation in populations of many different bacteria, and thus how natural selection and other processes act on transformability.

Bacillus subtilis

Yesterday I streaked out some old B. subtilis strains that I would like to use as positive controls for the optical-tweezers experiments.

You gotta love bacteria that are still fully viable after 18 years as a slant in the refrigerator (not the freezer, the fridge).

My turn to do lab meeting, yet again!

(I really do need to get some more people into the lab.)

What will I talk about? I wish I'd been doing some experiments. What did I talk about last time? I don't have my notes here, but I think it was the US variation manuscript - I think I went through the figures. Since then I've mostly been working on the NIH proposal, so I guess that's the best thing to talk about.

The Specific Aims keep morphing. Here's what I thought was the stable version:

Aim 1. Characterize the recombinome by sequencing DNA from millions of pooled transformants.

Aim 2. Characterize the biases of the steps that make up the recombinome, by sequencing:
a) millions of chromosomal DNA fragments and degenerate uptake sequences taken up by competent cells,
b) chromosomal DNA fragments taken into the cytoplasm of competent cells,
c) chromosomal DNA recombined by a non-RecA recombinase,
d) DNA of millions of transformants of a strain unable to do mismatch repair.
Aim 3. Map loci responsible for transformation differences between two strains.

But the post-doc's new data suggests high mutation rates in recombined sequences, and this may mean that we should put more emphasis on what he calls the transmission genetics. That is, we should first pin down the general properties of what actually gets recombined. How much DNA is typically replaced? In how many segments? What is their size distribution? Are tracts of donor DNA interrupted by recipient alleles? Do indels recombine cleanly, by homologous recombination in flanking sequences, or do we see non-homologous recombination at one end? What is the frequency of new mutations in recombined DNA (and in the rest of the genome)? This information is best obtained by sequencing the DNA of individual transformants, not a big pool.

This should probably become (part of?) our first Aim. Should it be an Aim in itself? Will we then have too many Aims?

I can see two different directions we could take this: I. We could make a big shift in direction, getting rid of all of Aim 2 and expanding Aim 3 to include more strains. That would give a much more streamlined proposal, one focused on the consequences of recombination and not the component steps. Or II. we could get rid of Aim 3, keeping the focus on the processes that give rise to recombinant genomes. Both are areas I think are really important. Direction I fits well with the work a previous post-doc did, characterizing the variation in DNA uptake and transformability of a wide range of H. influenzae strains. Direction II fits better with my desire to understand the role of uptake sequences, but this goal is really only addressed by Aim 2a, and that's already included in another proposal, the one we submitted to CIHR in September.

In either case we should probably demote Aim 2c to an optional goal, unless we can get the data showing that this alternate recombinase (lambda Red) does indeed work in H. influenzae. That would be a very cool result, but it's not central to this proposal.

Keyboard rehab

I wondered if the death of my Apple aluminum keyboard might be not a direct consequence of the tea I spilled in it, but rather due to starchy goo created when the tea contacted the millions of nanoparticle-sized cracker crumbs that had probably slipped through the narrow gaps surrounding the keys over the last two years.

So I again took the batteries out and washed it for a long time under warm running water, massaging the keys to loosen any crud stuck under them. Then I dried it overnight in the 37°C incubator (the one with a fan that circulates warm dry air).

And voila - it works fine again!

Grant writing

I'm beginning to think that years (decades) of espousing unpopular ideas have left me with an arrogance deficit.

Bacteria moving on surfaces

A colleague just sent me a manuscript about differences between bacterial cells growing in broth (or in soft agar) and the same cells growing on agar plates under conditions where they can use flagella to move across the agar surface. This is important work. When we study cell physiology we usually use cells that are growing in well mixed broth, because this lets us assume that all the cells are experiencing the same conditions (stochastic molecular-level factors excepted).

An aside: For similar reasons we usually use broth cultures in exponential growth, because we expect the physiology of such cells to be independent of the culture density. Once the density gets high enough to affect cell growth, the culture will no longer be growing exponentially, and minor differences in cell density can cause big differences in cell physiology. Unfortunately many microbiologists are very cavalier in their interpretation of 'exponential', and consider any culture whose growth hasn't obviously slowed as still being in log phase.

The usual lab alternative is to grow them on the surfaces of medium solidified with agar. This is quite convenient, as most kinds of bacteria can't move across the usual 1.5% agar, so isolated cells grow into colonies. The density of cells on an agar surface can get very high (a stiff paste of bacteria), because the cells are being fed from below by nutrients diffusing up through the agar.

Depending on the agar concentration, the film of liquid on the surface of the agar may be thick enough (?) to allow bacteria that have flagella to swim along the surface. Because the bacteria often move side-by-side in large groups this behaviour is called 'swarming'. Often swarming on agar surfaces is facilitated by surfactants that the bacteria produce, which reduce the surface tension of the aqueous layer. I've always assumed that bacteria living in soil and on other surfaces produce such surfactants as a way of getting surface-adsorbed nutrients into solution (that's how the surfactants we use in soaps and detergents do their job), but maybe surfactants are also advantageous for moving across surfaces with air-water interfaces, such as damp soil. The side-by-side cell orientation and movement may also be a consequence of surface-tension effects, as illustrated in this sketch.

One commonly observed effect of high density growth on agar is less sensitivity to antibiotics. We and many others have noticed that we need higher antibiotic concentrations on agar plates than in broth cultures (or vice versa, that antibiotic-resistant bacteria die if we grow them in broth at the same antibiotic concentration we used on agar plates). We also directly see density effects in our transformation assays - if we put a high density of antibiotic sensitive cells on a plate, we often see more 'background' growth of the sensitive cells. (Sometimes we see the opposite - resistant cells can't form colonies when they're surrounded by too many dying sensitive cells.)

But why would more dense bacteria be more resistant to an antibiotic? One possibility is that the individual cells aren't more resistant, but because more cells are present, more cells get lucky. If this were true we'd expect the number of cells in the 'background' to be directly proportional to the number of cells plated. A more common interpretation is that the presence of other cells somehow protects cells from the antibiotic. We know that resistant cells can protect sensitive cells from some antibiotics, if the mode of resistance is inactivation of the antibiotic. This is especially powerful if the resistant bacteria secrete an enzyme that inactivates the antibiotic, as is the case with ampicillin. This effect occurs both in broth and on agar plates.

But can sensitive cells protect other sensitive cells? Might dying cells somehow sop up antibiotic, reducing the concentration their neighbours are exposed to? Might an underlying layer of sensitive cells protect the cells above them from antibiotic?

The big problem I see is that bacteria are so small that concentrations will very rapidly equilibrate across them by diffusion. The agar in the plate is about 500 µ thick, and the cells are only about 1 µ thick, so there should be far more antibiotic molecules in the medium than the sensitive cells can bind**. Thus I don't see how a layer of sensitive bacteria could use up enough of the antibiotic to significantly reduce the effective concentration for the cells above. Even if the cell membrane is a barrier to diffusion of the antibiotic, there's going to be enough fluid around the cells for the antibiotic to diffuse through.

** But I haven't done the math. OK, back of the envelope calculation puts the number of molecules of antibiotic at about 10^15/ml (assume a m.w. of 5 x 10^2 and a concentration of 5 µg/ml). The density of cells on top of the agar might be 10^12/ml. If each cell were to bind 1000 molecules of antibiotic (that seems a lot, but maybe it's not), they would together bind up all the antibiotic from an agar layer equivalent to the thickness of the cell layer. But the thickness of even a very thick layer of cells is no more than a few % of the thickness of the agar, so the overall antibiotic concentration would only decrease by a few %.
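The envelope calculation can be written out explicitly. This sketch uses the same assumed numbers as above (5 µg/ml antibiotic, m.w. ~500, 10^12 cells/ml, 1000 molecules bound per cell), plus an assumed 10 µ cell layer, i.e. 2% of the 500 µ agar depth:

```python
AVOGADRO = 6.022e23  # molecules per mole

def antibiotic_molecules_per_ml(conc_ug_per_ml=5.0, mol_weight=500.0):
    """Antibiotic molecules per ml of agar at the assumed concentration."""
    grams = conc_ug_per_ml * 1e-6
    return grams / mol_weight * AVOGADRO

def fraction_depleted(cells_per_ml=1e12, molecules_bound_per_cell=1000,
                      cell_layer_um=10.0, agar_um=500.0):
    """Fraction of the antibiotic in the agar column (per unit area)
    that a surface layer of cells could soak up.  The 10 um cell-layer
    thickness is an assumption, not a measurement."""
    per_ml = antibiotic_molecules_per_ml()
    # thicknesses in cm; the shared area factor cancels out
    bound = cells_per_ml * molecules_bound_per_cell * (cell_layer_um * 1e-4)
    available = per_ml * (agar_um * 1e-4)
    return bound / available
```

With these numbers the antibiotic works out to about 6 x 10^15 molecules/ml, and the cell layer depletes well under 5% of the antibiotic in the column below it - consistent with the conclusion that sensitive cells can't meaningfully shield the cells above them.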

eyboard kaput

or the first time in about 25 years of computer use, 've spilled some liquid tea with milk and sugar into my keyboard. t's one of those lovely tiny apple aluminum wirelss keyboards. t wasn't very much tea, and quickly took out the batteries and rinsed the keyboard very thoroughly with distilled water and dried it overnight in the 378 incubator. ut it's toast. haracters type while holding the shift key down don't appear, and the 's' and 'f' keys tend to get stuck on strings of 'sssss' and 'fffff' showing up randomly. hat's not happening right now, but you can see the effects of the shift problem.

uckily ondon rugs has new ones on sale for 69, so 'll try to get one tonight.

Choosing a topic for a NIH proposal

I'm reading some excellent grant-writing advice from the Nov. 12 issue of the NIH NIAID newsletter. It frames its suggestions with the instructions given to NIH reviewers, emphasizing that the most important thing they're looking for is evidence that the proposed research will have a significant impact on the field:
"Your ultimate task is to judge the likelihood that the proposed research will have an impact on advancing our understanding of the nature and behavior of living systems and the application of that knowledge to extend healthy life and reduce the burdens of illness and disability. Thus, the first assessment should be “Is it worthwhile to carry out the proposed study?”"
This is excellent advice, and I'm going through the draft we have now, identifying places where we can point to impacts in various fields (pathogenesis, recombination, evolution).

But I have a harder time relating to another part of the advice, on how to choose a topic to propose research on. It appears to be directed at people who know they want to get a research grant but don't really care what topic they'll be researching. "I know how to do A, B and C, so I'll propose to find out D."