Parsing Blast Results from Your Favorite Database
From GcatWiki
This tutorial demonstrates how to quickly parse the hits from a blastn search against a local BLAST database using a Python script (blastParse.py). The script was developed to identify which pieces (scaffolds/contigs) of an unfinished blueberry genome had the most instances of chloroplast or mitochondrial DNA.
NOTE: This tutorial assumes the user has blast version 2.2.24 installed and has already made their local blast database (on their computer). It is also written for Macintosh users; however, all scripts and tools are Windows compatible or have similar programs for Windows.
- Open Terminal and navigate into the folder containing the BLAST query sequence and the BLAST database using the following Unix commands (cd changes the current folder; ls lists the contents of the current folder)
cd OR ls
- Run your BLAST search using the command below (remember to replace the file names with your own query, database, and output files)
/usr/local/ncbi/blast/bin/blastn -query querySequence.fasta -db dataBase.fasta -outfmt "7 qacc sacc evalue qstart qend sstart send" -out blast_output.txt
- What does this command do? Each "-" introduces an option of the blastn program. The text that follows the option name is the value given to that option.
- -query = query file
- -db = database to search
- -outfmt = output format (7 = tabular with comment lines, followed by the list of columns to report). DO NOT CHANGE THIS UNLESS YOU KNOW HOW TO EDIT blastParse.py
- -out = output file for blast results
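For readers who prefer to launch the search from Python rather than typing it in Terminal, the command above can be assembled as a list of arguments. This is only a minimal sketch: the function names build_blastn_command and run_blastn are hypothetical and not part of blastParse.py, and the default blastn path is simply the one used in this tutorial.

```python
import subprocess

# Column layout used in this tutorial; blastParse.py expects exactly this format.
OUTFMT = "7 qacc sacc evalue qstart qend sstart send"


def build_blastn_command(query, db, out,
                         blastn="/usr/local/ncbi/blast/bin/blastn"):
    """Return the blastn command from the tutorial as a list of arguments.

    Hypothetical helper: only the -query, -db, -outfmt and -out options
    shown in the tutorial are used.
    """
    return [
        blastn,
        "-query", query,      # query file
        "-db", db,            # database to search
        "-outfmt", OUTFMT,    # one list element, so no shell quoting is needed
        "-out", out,          # output file for the BLAST results
    ]


def run_blastn(query, db, out):
    """Run blastn and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_blastn_command(query, db, out), check=True)
```

Calling run_blastn("querySequence.fasta", "dataBase.fasta", "blast_output.txt") should be equivalent to the Terminal command above, assuming blastn is installed at the path shown.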
- Download blastParse.py
- Place blastParse.py into the same folder as your blast results file
- In terminal, navigate into the folder containing blastParse.py and your blast results file
- Run blastParse.py using the command
python blastParse.py
- Follow the prompts of the blastParse.py program
- Results will be saved as tab-delimited data in text files. If you would like to visualize the data, open the files in Excel to make graphs (right click > Open With > Excel).
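To give an idea of what parsing this output involves, here is a minimal sketch that reads the tabular format produced by the blastn command in this tutorial and counts hits per subject sequence. It is not the blastParse.py script itself, whose internals are not described here. The names Hit, parse_hits, count_hits_per_subject and write_counts, the e-value cutoff of 1e-10, and the choice to count by subject accession (sacc) are all assumptions made for illustration.

```python
import csv
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    """One row of '-outfmt "7 qacc sacc evalue qstart qend sstart send"'."""
    qacc: str
    sacc: str
    evalue: float
    qstart: int
    qend: int
    sstart: int
    send: int


def parse_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield a Hit for every data row, skipping '#' comment lines and blanks."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 7:
            raise ValueError(f"expected 7 tab-separated columns, got: {line!r}")
        yield Hit(fields[0], fields[1], float(fields[2]),
                  int(fields[3]), int(fields[4]),
                  int(fields[5]), int(fields[6]))


def count_hits_per_subject(hits: Iterable[Hit],
                           max_evalue: float = 1e-10) -> Counter:
    """Count hits per subject accession, keeping hits at or below max_evalue.

    The cutoff is an illustrative assumption, not a value from the tutorial.
    """
    counts = Counter()
    for hit in hits:
        if hit.evalue <= max_evalue:
            counts[hit.sacc] += 1
    return counts


def write_counts(counts: Counter, path: str) -> None:
    """Save the counts as tab-delimited text, most frequent subject first."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["sacc", "hit_count"])
        writer.writerows(counts.most_common())
```

A typical use would be to open blast_output.txt, pass the file object to parse_hits, and then write the resulting counts to a text file that can be opened in Excel. If the scaffolds are the query rather than the database, count by qacc instead of sacc.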
A video of this tutorial can be found here.