A few suggestions from the post-doc ("Try keeping all the BLAST files in the same folder.") got my Unix problems solved. Commenters on my last post have provided lots of suggestions, some of which are useful and some of which address problems I've already solved. Neil has even set up a NodalPoint page about this problem, but I haven't yet figured out how to edit in my comments.
I've now used standalone (local) BLAST to search the database of 2136 different 39 nt segments of the H. influenzae Rd genome with the source genome (Rd) sequence. This is the 'positive control' for the searches I want to do with the non-Rd genome sequences. This worked, and let me do some trials to work out which options I should specify. Then I did the same searches with the three other genomes I have. This showed me other problems, some of which I haven't worked out yet.
I have two big remaining problems.
First, I need to understand BLAST searches well enough to optimize the alignments. I understand that mismatches close to the ends of the sequences will be under-represented, because BLAST reports local alignments: extending an alignment through a mismatch near an end costs more than the few matching bases beyond it add back, so the alignment is simply trimmed before the mismatch. I think this showed up in the search results: instead of alignments with single mismatches near the ends, I got alignments that had been shortened by 4 nt. I may be able to minimize this by setting the options appropriately, but I probably can't eliminate it. Luckily, the first and last 5 nt are the least important for this analysis.
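Here is a minimal Python sketch of why end mismatches vanish. It rests on assumptions I've picked for illustration: no gaps, a +1 match reward and a -3 mismatch penalty (check what your BLAST version actually uses), and BLAST's seeding and X-drop extension reduced to simply keeping the highest-scoring stretch of an ungapped alignment. `best_ungapped_extent` and `with_mismatch` are hypothetical helper names, and the 39-nt sequence is made up.

```python
def best_ungapped_extent(query, subject, match=1, mismatch=-3):
    """Highest-scoring contiguous stretch of an ungapped alignment of two
    equal-length sequences. Returns (start, end, score), end exclusive."""
    best = (0, 0, 0)
    start, score = 0, 0
    for i, (q, s) in enumerate(zip(query, subject)):
        score += match if q == s else mismatch
        if score <= 0:
            # a stretch whose running score falls to zero can't help: restart after it
            start, score = i + 1, 0
        elif score > best[2]:
            best = (start, i + 1, score)
    return best


def with_mismatch(seq, i):
    """Copy of seq with the base at 0-based index i changed (illustration only)."""
    new_base = "C" if seq[i] == "A" else "A"
    return seq[:i] + new_base + seq[i + 1:]


if __name__ == "__main__":
    ref = ("ACGTTGCA" * 5)[:39]  # made-up 39-nt segment
    for pos in range(len(ref)):
        start, end, score = best_ungapped_extent(ref, with_mismatch(ref, pos))
        print(f"mismatch at {pos + 1:2d}: aligned {start + 1}-{end} ({end - start} nt)")
```

With these scores, a single mismatch anywhere from position 5 to 35 leaves the full 39-nt alignment intact, but a mismatch at positions 1-4 or 36-39 is cut off together with everything outside it. A mismatch at position 36, for example, gives a 35-nt alignment, 4 nt short.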
Second, the information I need to get from the non-Rd searches is the locations of mismatches within the 39 nt segments. For this I found that representing the output as pairwise alignments made it easiest to extract the information that specifies the positions of the mismatches within the alignments. This is relatively straightforward (yes Neil, with Word and Excel) provided the mismatches are internal to 39 nt alignments, but it will need some more sophisticated tricks for alignments that are truncated at one end or the other. Another problem for extracting the position info is that half of the 39 nt sequences are in the opposite orientation to the others, so they are reversed in the output.
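A sketch of the position bookkeeping in Python. It assumes each hit has already been reduced to its two aligned strings plus the subject (segment) start and end coordinates, with the genome as the query and the 39-nt segments as the subject, as in the searches described above. For minus-strand hits BLAST reports the subject coordinates in descending order, so counting down from the start coordinate handles the reversed segments, and starting from the reported coordinate handles truncated alignments. `SEGMENT_LEN`, `mismatch_positions` and `unaligned_ends` are hypothetical names.

```python
SEGMENT_LEN = 39  # every database segment is 39 nt


def mismatch_positions(query_aln, sbjct_aln, sbjct_start, sbjct_end):
    """1-based positions, in the segment's own coordinates, of every aligned
    column where the query (genome) and the segment differ."""
    # minus-strand hits have start > end, so step down through the segment
    step = 1 if sbjct_end >= sbjct_start else -1
    pos = sbjct_start
    positions = []
    for q, s in zip(query_aln.upper(), sbjct_aln.upper()):
        if s == "-":
            continue  # gap in the segment: no segment base consumed here
        if q != s:
            positions.append(pos)  # also catches query gaps opposite a segment base
        pos += step
    return sorted(positions)


def unaligned_ends(sbjct_start, sbjct_end, seg_len=SEGMENT_LEN):
    """Segment bases left out of the alignment at the low-coordinate and
    high-coordinate ends; (0, 0) means the whole segment was aligned."""
    lo, hi = sorted((sbjct_start, sbjct_end))
    return lo - 1, seg_len - hi


# Plus-strand hit covering segment bases 1-10, one mismatch at segment base 5
print(mismatch_positions("ACGTTCGTAC", "ACGTACGTAC", 1, 10))   # -> [5]
# Same strings on the minus strand: subject coordinates run from 39 down to 30
print(mismatch_positions("ACGTTCGTAC", "ACGTACGTAC", 39, 30))  # -> [35]
print(unaligned_ends(39, 30))                                  # -> (29, 0)
```

Reporting positions in the segment's own coordinates means reversed and forward segments end up on the same 1-39 scale, and `unaligned_ends` flags the truncated hits whose end bases were never examined.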
The guy in the next office strongly recommended BBEdit, so I bought that. He said it does a lot of the editing chores that I'd otherwise need to write Perl scripts to do. Sounded great. But BBEdit has wandered far from its "Bare Bones Software" (="BB") roots, and learning how to use it will take some time...
Field of Science
Not your typical science blog, but an 'open science' research blog. Watch me fumbling my way towards understanding how and why bacteria take up DNA, and getting distracted by other cool questions.
7 comments:
Hi
Why don't you use Geneious for some of the alignment editing? The free version would be enough for your needs.
Paulo
I'd never heard of it! It might not do what I want for this problem, but I bet it will be very useful for other things we do. I'll download the free version and tell the lab about it.
Thanks.
Geneious is very nice software. It has some problems here and there, but it is solid.
I mentioned it because I guess you use Macs, so BioEdit is not compatible. Another option for Macs is CLC Workbench, but it is not my favourite.
cheers
Good to see progress. I was going to mention Excel as an option. If you have delimited text output, it's easy to open in Excel and sort by various columns. Ultimately it's about getting the job done fast in a way that works for you so if you can avoid scripts/regexes, go for it. I stand by my deep hatred of Word though, for any purpose :)
If you feel like editing the wiki page, great, but no worries if not. You need to log in with the Nodalpoint user/pass; then the edit buttons will appear. Of course, anyone else is welcome to register, create and edit content there.
I'd definitely look at BLAT at some stage, if not for this project. It's far faster than BLAST and very useful for things like whole-genome alignment. I included an example of some output in the BLAT section of the wiki page.
Here's Wikipedia's Smith-Waterman page - not bad.
About the "missing ends" problem: there's an alignment method named "glocal", which tries to find the best local alignments that include the ends. I don't know what's available for Mac in this regard, but try "glocal" as a keyword in your web searches.
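To make the "glocal" idea concrete, here is a minimal Python sketch of a fitting (semi-global) alignment: every base of the short segment must be aligned, but the alignment may start and end anywhere in the longer genome sequence, so a mismatch near a segment end is paid for as a penalty instead of being trimmed away. The scores (+1 match, -3 mismatch, linear gap -5) are illustrative assumptions, not BLAST's settings, and `fitting_alignment_score` is a hypothetical name. This is plain dynamic programming, far too slow for a whole genome but fine for re-checking individual hits.

```python
def fitting_alignment_score(segment, genome, match=1, mismatch=-3, gap=-5):
    """Best score for aligning ALL of `segment` to ANY substring of `genome`."""
    m = len(genome)
    # prev[j]: best score for the segment prefix so far, ending at genome[:j]
    prev = [0] * (m + 1)  # empty segment prefix is free anywhere in the genome
    for i in range(1, len(segment) + 1):
        # column 0: segment bases placed before the genome starts are all gaps
        cur = [prev[0] + gap] + [0] * m
        for j in range(1, m + 1):
            pair = match if segment[i - 1] == genome[j - 1] else mismatch
            cur[j] = max(
                prev[j - 1] + pair,  # segment base aligned to genome base
                prev[j] + gap,       # segment base aligned to a gap
                cur[j - 1] + gap,    # genome base aligned to a gap
            )
        prev = cur
    return max(prev)  # whole segment used; the genome side may end anywhere


ref = ("ACGTTGCA" * 5)[:39]           # made-up 39-nt segment
genome = "TTTT" + ref + "GGGG"        # made-up flanking sequence
seg = ref[:37] + "A" + ref[38:]       # mismatch 3 nt from the end
print(fitting_alignment_score(seg, genome))
```

Here the mismatched segment scores 35 (38 matches, one mismatch) with all 39 bases aligned, whereas a purely local alignment under the same scores would drop the last three bases and the mismatch with them.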
Neil, I'd like to edit the wiki page, as I see this as an experiment in open science. But I don't see any 'edit' buttons even after logging in. Where should they be? What should they look like?
Wiki edit buttons: there should be one at the top left just under the page title, one at the bottom left and a small one for each page section at the right.
If you don't see them, there may be a Nodalpoint permission issue. I changed you to a "registered" user type; if that doesn't help, let me know and I'll check with Greg, the site admin.
Thanks Neil, now I can edit it.