Not your typical science blog, but an 'open science' research blog. Watch me fumbling my way towards understanding how and why bacteria take up DNA, and getting distracted by other cool questions.
 3.  We could improve this by doing a control experiment to see how big we expect chance effects to be.  One simple control is to use a microarray where both RNAs are from the same treatment. We can then use a scatter plot to compare the scores for each point.  The diagonal line then represents the 1.00 ratio expected in the absence of random factors, and the degree of scatter of the ratios away from 1.00 tells us how big our chance effects typically are.
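Here's a minimal sketch of how the scatter in such a same-vs-same control could be put into numbers. The intensities are made up for illustration, and `log_ratio_spread` and `fold_change_threshold` are hypothetical helper names, not functions from any microarray package. Working in log2 is my assumption too; it just makes a ratio of 1.00 sit at zero so that up and down changes are symmetric.

```python
import math
import statistics


def log_ratio_spread(channel_a, channel_b):
    """Return (mean, standard deviation) of log2(a / b) across all spots."""
    log_ratios = [math.log2(a / b) for a, b in zip(channel_a, channel_b)]
    return statistics.mean(log_ratios), statistics.stdev(log_ratios)


def fold_change_threshold(sd_log2, n_sd=2.0):
    """Fold change that lies n_sd standard deviations away from a ratio of 1.00."""
    return 2 ** (n_sd * sd_log2)


# Hypothetical spot intensities from ONE control array, where both RNA
# samples came from the same treatment (values invented for this example).
control_a = [1200.0, 850.0, 430.0, 2900.0, 610.0, 1500.0]
control_b = [1100.0, 900.0, 400.0, 3100.0, 650.0, 1450.0]

mean_lr, sd_lr = log_ratio_spread(control_a, control_b)
threshold = fold_change_threshold(sd_lr)

print(f"mean log2 ratio: {mean_lr:.3f}, sd: {sd_lr:.3f}")
print(f"chance scatter reaches about {threshold:.2f}-fold at 2 sd")
```

The idea is that a ratio on the real (treatment vs control) array only deserves attention if it lies well outside the spread seen here. The 2 sd cutoff is an arbitrary choice for the sketch, not a recommendation.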
nt we can see significant differences.

"The simplest interpretation of this reduced variation is that chromosomal sites containing USSs tend to recombine with a pool of internalized fragments containing relatively few mutations within the USS, presumably because of selection for strong USS by the DNA uptake machinery at the cell surface."
We don't have any direct evidence of how often cells take up DNA in their natural environment (on respiratory tract epithelium if they're H. influenzae), nor how much of this DNA comes from other H. influenzae cells rather than from unrelated bacteria or the human host, nor how biased each uptake event is, given the pool of available DNAs, nor whether unidentified sequence factors affect the probability that a homologous fragment will replace the chromosomal copy. But the reduced variation may be telling us that the net effect of these unknown factors dramatically reduces the rate at which existing USSs diverge.


 This is why we must ALWAYS do the controls.
yesterday (shown here in a different version, with the blue bars). Both have more mismatches at the edges, and both have none at the central position.
Between-strain variation in nucleotide sequence is greatly reduced at positions that are part of the USS motif. This is clearly seen in the figure to the left, where the blue bars representing the amount of variation for each position are small at the positions where the motif bases are tall (strong consensus). The error bars are the standard deviations of the six datasets (forward and reverse strands of the three readily available genomes). This was about 5000 alignments.
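For anyone wanting to try this kind of per-position comparison, here's a rough sketch of the bookkeeping. It is not the analysis I actually ran: the sequences are toy 9-mers, the two "datasets" stand in for the six real ones, and both function names are hypothetical. It assumes each alignment is an ungapped pair of equal-length strings.

```python
import statistics


def mismatch_fraction_by_position(pairs):
    """pairs: list of (reference, homolog) strings, all the same length.

    Returns, for each position, the fraction of pairs that differ there.
    """
    length = len(pairs[0][0])
    counts = [0] * length
    for ref, hom in pairs:
        if len(ref) != length or len(hom) != length:
            raise ValueError("all sequences must have the same length")
        for i, (r, h) in enumerate(zip(ref, hom)):
            if r != h:
                counts[i] += 1
    return [c / len(pairs) for c in counts]


def mean_and_sd_across_datasets(datasets):
    """Per-position mean and standard deviation of the mismatch fractions."""
    per_dataset = [mismatch_fraction_by_position(d) for d in datasets]
    by_position = list(zip(*per_dataset))
    means = [statistics.mean(p) for p in by_position]
    sds = [statistics.stdev(p) for p in by_position]
    return means, sds


# Toy data: two datasets of two alignments each (invented for illustration).
dataset_1 = [("AAGTGCGGT", "AAGTGCGGT"), ("AAGTGCGGT", "TAGTGCGGT")]
dataset_2 = [("AAGTGCGGT", "AAGTGCGGA"), ("AAGTGCGGT", "AAGTGCGGT")]

means, sds = mean_and_sd_across_datasets([dataset_1, dataset_2])
for position, (m, s) in enumerate(zip(means, sds), start=1):
    print(f"position {position}: mean {m:.2f}, sd {s:.2f}")
```

In this toy case only the two edge positions vary, which is the same shape as the real pattern: mismatches at the edges, none in the middle.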
normalized the tripeptide counts by dividing by the total number of amino acids in each proteome, to get what I could call tripeptide density.
So the 'control' needs another control - the densities of tripeptides that aren't encodable by USSs. I chose these by taking the same amino acids in backwards order (e.g. VAS instead of SAV). This is good because it doesn't change the abundances of the single amino acids making up the tripeptide. Here's that graph. Again the blue bars are H. influenzae and the other colours are the control proteomes. And note that now the blue bars are nothing special - H. influenzae has the same densities of these tripeptides as do the control proteomes.
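The density calculation and the backwards-order control are simple enough to sketch. This is an illustration under my own assumptions, not the script behind the graphs: the toy proteome is invented, overlapping occurrences are counted, and `tripeptide_density` and `reversed_control` are hypothetical names.

```python
def tripeptide_density(proteins, tripeptide):
    """Occurrences of the tripeptide per amino acid in the whole proteome."""
    total_residues = sum(len(p) for p in proteins)
    if total_residues == 0:
        raise ValueError("proteome contains no amino acids")
    count = 0
    for protein in proteins:
        # Slide a 3-residue window along the protein (overlaps allowed).
        for i in range(len(protein) - 2):
            if protein[i:i + 3] == tripeptide:
                count += 1
    return count / total_residues


def reversed_control(tripeptide):
    """Same amino acids in backwards order, e.g. SAV -> VAS."""
    return tripeptide[::-1]


# Toy proteome of two short made-up proteins (15 residues in total).
toy_proteome = ["MSAVKLSAVT", "MVASQ"]

target = "SAV"
control = reversed_control(target)

print(target, tripeptide_density(toy_proteome, target))
print(control, tripeptide_density(toy_proteome, control))
```

Because the reversed tripeptide uses exactly the same three amino acids, any difference in density between the two can't be blamed on the proteome simply having more of those amino acids.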
Commenters are offering help and advice! Damn, I love open science! But I need to clarify what I'm trying to do.
GenBank has genome sequences of 12 other H. influenzae strains (shown on the left). I want to take each of the 2000 Rd sequences and find the sequences of its homologs in the other strains' genomes. If all the genomes have homologs of all the sequences, that will give me, in principle, 24,000 39 bp sequences.
When BLAST agrees to do what I want, it produces alignments like those on the left. The upper one shows an Rd sequence to which all the genomes have perfect matches; the lower one shows an Rd sequence several of whose homologs have a single mismatch to (yikes, is that syntax correct?).
When I first made this plan, the field of deep phylogeny was full of optimism that we would soon have a reliable phylogenetic tree showing the true relationships of all eukaryotes. The figure to the left shows the state of the tree then. The groups called Diplomonads, Trichomonads and Microsporidia were thought to be the first to have branched from the line that eventually led to plants and animals. Microsporidians were known to have (various weird forms of) sexual reproduction, while sex was unknown in Trichomonads and Diplomonads, consistent with an origin of sex between the times of these divergences. The precise branching order wasn't yet firm, but most researchers felt that all we needed was a few more sequences of a few key taxa, and maybe a few refinements to the analytical methods used to infer relationships from sequences.
But the adjacent figure shows the present state of this phylogeny. It's from a recent review by a collaboration of CIfAR researchers, all but one of whom were at last week's meetings (Keeling et al. Trends in Ecology & Evolution 20:670-676). There's a pdf here; I think it's open access.