Parsing Blast Results from Your Favorite Database

From GcatWiki

This tutorial demonstrates how to quickly parse the hits from a blastn search against a local BLAST database. A Python script (blastParse.py) is used to parse the data. The script was developed to identify which pieces (scaffolds/contigs) of an unfinished blueberry genome contained the most instances of chloroplast or mitochondrial DNA.

NOTE: This tutorial assumes that BLAST version 2.2.24 is installed and that you have already made a local BLAST database on your computer. It is written for Macintosh users; however, all scripts and tools are Windows compatible or have similar programs for Windows.

  1. Open Terminal and navigate to the folder containing the BLAST query sequence and the BLAST database using the following Unix commands
    cd (change folder) OR ls (list the contents of the current folder)
  2. Run your BLAST search using the command below (remember to replace the file names with your own)
    /usr/local/ncbi/blast/bin/blastn -query querySequence.fasta -db dataBase.fasta -outfmt "7 qacc sacc evalue qstart qend sstart send" -out blast_output.txt
    What does this command do? Each "-" introduces an option. The text that follows an option is the value given to it.
    • -query = file containing the query sequence(s)
    • -db = database to search
    • -outfmt = output format (7 = tabular with comment lines, followed by the columns to report). DO NOT CHANGE THIS UNLESS YOU KNOW HOW TO EDIT blastParse.py
    • -out = output file for blast results
  3. Download blastParse.py
  4. Place blastParse.py into the same folder as your blast results file
  5. In Terminal, navigate to the folder containing blastParse.py and your BLAST results file
  6. Run blastParse.py using the command
    python blastParse.py
  7. Follow the prompts of the BLASTPARSE program
  8. Results will be saved as tab-delimited data in text files. If you would like to visualize the data, open the files in Excel to make graphs (right click > open with > Excel).
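
For readers who want to see what parsing this output format involves, here is a minimal sketch. It is not blastParse.py itself; the function names, the output file layout, and the e-value cutoff of 1e-5 are assumptions made for illustration. It relies only on what the command in step 2 produces: a tab-delimited file whose comment lines start with "#" and whose columns appear in the order given to -outfmt.

```python
import csv
from collections import Counter

# Column order matches the -outfmt string used in step 2.
FIELDS = ["qacc", "sacc", "evalue", "qstart", "qend", "sstart", "send"]


def read_hits(lines):
    """Yield one dict per hit, skipping the '#' comment lines of format 7."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        values = line.split("\t")
        if len(values) != len(FIELDS):
            continue  # skip rows that do not have the expected seven columns
        hit = dict(zip(FIELDS, values))
        hit["evalue"] = float(hit["evalue"])
        for key in ("qstart", "qend", "sstart", "send"):
            hit[key] = int(hit[key])
        yield hit


def count_hits_per_subject(lines, max_evalue=1e-5):
    """Count hits per subject accession (e.g. per scaffold or contig).

    max_evalue is a hypothetical cutoff chosen for this sketch.
    """
    counts = Counter()
    for hit in read_hits(lines):
        if hit["evalue"] <= max_evalue:
            counts[hit["sacc"]] += 1
    return counts


def write_counts(counts, path):
    """Save the counts as tab-delimited text, most frequent subject first."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["sacc", "hits"])
        writer.writerows(counts.most_common())
```

To try it on the file from step 2, open blast_output.txt, pass the open file to count_hits_per_subject, and hand the result to write_counts with an output name of your choice. The resulting text file can be opened in Excel in the same way as the blastParse.py results.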


A video of this tutorial can be found here.