I've spent so much time fiddling with the MatrixPlot settings to get the best visualization of the correlation analysis that MatrixPlot won't let me submit any more jobs. (Who knew the site had a limit of 50 jobs per 24-hour period?)
I've done the final analysis with a set of 3466 sequences, each 39 bp long and each containing at least a rough match to the USS motif. These were obtained by motif searches told to expect 3000 sites on each strand; 1650 and 1816 were found. 1454 of these contain perfect matches to the 9 bp core consensus, 512 have one-off matches, 557 have two-off matches, and 943 have core sequences that match the consensus at no more than 6 places. I think having this many mismatched sequences gives the analysis the power to detect correlations even between the highly conserved core positions.
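The breakdown by core mismatches can be tallied with a few lines of Python. This is only a sketch under stated assumptions: the core consensus is taken to be AAGTGCGGT (the usual H. influenzae USS core, which the post does not spell out), and `CORE_START`, the position of the core within each 39 bp sequence, is a hypothetical value that would need to match the real alignment.

```python
from collections import Counter

CORE = "AAGTGCGGT"  # assumed 9 bp USS core consensus (not stated in the post)
CORE_START = 15     # hypothetical 0-based offset of the core in each 39 bp sequence


def core_mismatches(seq, core=CORE, start=CORE_START):
    """Count positions where the core window of seq differs from the consensus."""
    window = seq[start:start + len(core)].upper()
    if len(window) != len(core):
        raise ValueError("sequence too short to contain the core window")
    return sum(a != b for a, b in zip(window, core))


def bin_by_mismatches(seqs, core=CORE, start=CORE_START):
    """Tally sequences into 0, 1, 2 and '3+' mismatch bins."""
    counts = Counter()
    for s in seqs:
        n = core_mismatches(s, core, start)
        # '3+' corresponds to cores matching the consensus at no more than 6 places
        counts["3+" if n >= 3 else n] += 1
    return counts
```

Run over the full set, the four bins should add up to the total number of sequences (here 1454 + 512 + 557 + 943 = 3466).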
First, look at the control figure to the left. This shows analysis of 3500 random sequences, each 39 bp long, taken from random segments of both strands of the H. influenzae genome. The bar charts at the top and left can be ignored - they show the 'information' at each position, but the scale for these bars only goes from 0.0 to 0.00 (weird, I know; I guess '0.0' represents zero and '0.00' represents less than 0.01).
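A control set like this one can be drawn along the following lines. This is a sketch, not the procedure actually used for the figure: the function names are made up for illustration, the genome is assumed to be a single plain string of A/C/G/T, and each segment is assumed to come from either strand with equal probability.

```python
import random

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def sample_control(genome, n=3500, length=39, seed=None):
    """Draw n random segments of the given length from both strands of genome."""
    rng = random.Random(seed)
    segments = []
    for _ in range(n):
        start = rng.randrange(len(genome) - length + 1)
        seg = genome[start:start + length]
        # Assumption: pick the forward or reverse strand with equal probability
        if rng.random() < 0.5:
            seg = reverse_complement(seg)
        segments.append(seg)
    return segments
```

Passing a fixed `seed` makes the control set reproducible, which helps when comparing runs with different plot settings.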
It's a bit surprising (to me) that the few scores higher than 0.002 are mostly found between positions separated by 3 (positions 1 and 4, positions 3 and 6, positions 4 and 7, positions 9 and 12, etc.). I suspect this has something to do with the way coding for proteins constrains the genome, but it's not something I'm going to follow up.
Here's the 'experimental' image. It shows significant correlations only between close-neighbour positions, and only between neighbours within each of the two flanking conserved AT segments. I suspect that even these 'significant' correlations are quite weak (the highest correlation score is only 0.107), but I don't understand the analysis well enough to be sure. The documentation is very brief; I may need to send someone an email asking for clarification.
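To get a feel for what a score of this kind measures, here is a minimal sketch of pairwise mutual information between alignment columns. It assumes the MatrixPlot score is a mutual-information-style measure in bits; the tool's exact formula and any corrections it applies are not confirmed here, so the numbers need not match its output.

```python
import math
from collections import Counter


def mutual_information(seqs, i, j):
    """Mutual information (in bits) between columns i and j of aligned sequences."""
    n = len(seqs)
    ci = Counter(s[i] for s in seqs)          # base counts at column i
    cj = Counter(s[j] for s in seqs)          # base counts at column j
    cij = Counter((s[i], s[j]) for s in seqs)  # joint counts for the pair
    mi = 0.0
    for (a, b), nab in cij.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (nab / n) * math.log2(nab * n / (ci[a] * cj[b]))
    return mi


def mi_matrix(seqs):
    """Symmetric matrix of mutual information for every pair of columns."""
    length = len(seqs[0])
    m = [[0.0] * length for _ in range(length)]
    for i in range(length):
        for j in range(i + 1, length):
            m[i][j] = m[j][i] = mutual_information(seqs, i, j)
    return m
```

With four bases, the maximum possible value between two columns is 2 bits, so on this scale a score of 0.107 would be a weak dependence. Mutual information estimated from a finite sample is also biased slightly above zero, which is one reason a random-sequence control is worth having.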
(Here's a logo as a reminder of the motif.)
2 comments:
The CBS tools website is great, but most of their services have limits - not good for people who want to play with whole genomes. Most websites limit input in this way for good reasons - you can't have users submitting all of GenBank to your site!
The MatrixPlot paper says that it's available on request, so they'll license it to you if you email them (should be a contact on the website somewhere). I've obtained SignalP and TMHMM from them before, they're pretty helpful and the standalone software is worth having if you're using it a lot.
Having enough power depends on how much variation is present at each site. Although having that many sequences increases the probability of having variation at each site, it still doesn't tell you that you have enough power at all sites. I don't think it matters anyway. In the 9 bp core, all of the sites are "correlated" with one another and we know this anyway.