Mastering the Art of NCBI: It's a BLAST
The National Center for Biotechnology Information (NCBI) is an organization founded in 1988 as a national resource available to the public for access to molecular biology information. NCBI creates numerous databases, online tools and research software programs to analyze genomes. The Basic Local Alignment Search Tool (BLAST) is an online tool designed to enable users to rapidly search through nucleotide and protein databases. While the BLAST website is designed for both novice and veteran users, mastering the tool can be daunting. This website provides a step-by-step guide to using BLAST and interpreting your results.
How to Use Basic BLAST - Nucleotide Search:
When you get to the main page, you may notice you have a number of options to choose from:
To search for matching nucleotide sequences in the database, choose:
This link will take you to the page shown below.
In the entry box below Enter Query Sequence there are three possible methods of entry for your search. The first is bare sequence, which is simply the nucleotide sequence (ATCG, etc.) you wish to search for.
The second method uses FASTA format, shown below. This format requires the first line to be a descriptor that begins with a ">" character, followed by a return and the nucleotide sequence. The descriptor can be found on the website where the gene sequence was obtained.
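As a rough illustration of the FASTA layout described above, the Python sketch below splits FASTA text into descriptor and sequence pairs. The function name parse_fasta and the example descriptor are hypothetical and chosen only for this example; they are not part of BLAST.

```python
def parse_fasta(text):
    """Split FASTA text into a list of (descriptor, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new descriptor line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before a '>' descriptor line")
        else:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical record, made up for illustration only.
example = """>example_gene hypothetical descriptor
ATGGCCATTGTAATGGGCCGC
TGAAAGGGTGCCCGATAG
"""
for descriptor, sequence in parse_fasta(example):
    print(descriptor, len(sequence))
```

The sequence lines are joined into one string, so a record pasted with line breaks is read the same way as one pasted on a single line.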
Finally, you may choose to use identifiers such as a gene's Accession Number as the query. It is important that there are no spaces in between letters or numbers, because they will be treated as separate sequences, or BLAST will fail to read them.
Once you have entered your query, you must choose which database you wish to search.
The most widely used database is the Nucleotide Collection (nr/nt), since it encompasses a broad range of nucleotide sequences across all domains. However, you may choose to search another database, depending on your research.
You may wish to restrict your search hits to only those found in certain organisms, or to exclude those found in a certain organism. You may do so by entering the common name, the binomial name or the taxonomic identification. Clicking Exclude excludes hits found in this organism's genome. Furthermore, clicking the + allows you to include or exclude multiple organisms or taxa.
You have the option to further narrow your search using Entrez Query, which limits the search to a subset of the selected BLAST database. This tool uses its own specific syntax, described on the NCBI website. It is a specialized way of narrowing search results and is entirely optional, since the methods already described provide good results.
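To give a feel for what an Entrez Query string looks like, here is a small Python sketch that assembles one from organism names or taxonomy IDs and an optional sequence length range. The helper names organism_term and build_entrez_query are hypothetical. The [Organism] and [SLEN] field tags and the txid prefix are standard Entrez syntax, but check the NCBI help pages before relying on any particular combination.

```python
def organism_term(name_or_taxid):
    """Return an Entrez term for one organism (name) or taxon (numeric ID)."""
    text = str(name_or_taxid).strip()
    if text.isdigit():
        return f"txid{text}[Organism]"
    return f'"{text}"[Organism]'


def build_entrez_query(include=(), exclude=(), min_len=None, max_len=None):
    """Assemble an Entrez query string (hypothetical helper, not an NCBI API)."""
    parts = []
    if include:
        parts.append("(" + " OR ".join(organism_term(o) for o in include) + ")")
    if min_len is not None and max_len is not None:
        parts.append(f"{min_len}:{max_len}[SLEN]")
    if not parts:
        raise ValueError("give at least one organism to include or a length range")
    query = " AND ".join(parts)
    for organism in exclude:
        query += " NOT " + organism_term(organism)
    return query


# Taxonomy ID 9606 is Homo sapiens.
print(build_entrez_query(include=["9606"], exclude=["Mus musculus"]))
```

The printed string can be pasted into the Entrez Query box as typed text; the sketch only builds the string and does not contact NCBI.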
At this point, you need to choose the specificity of your search hits. You have three options: highly similar sequences (megablast), more dissimilar sequences (discontiguous megablast), and somewhat similar sequences (blastn). Megablast is the fastest and returns a small number of very close matches, so it works best when the target is nearly identical to the query. Discontiguous megablast tolerates mismatches in its initial seed and is intended for more divergent sequences, such as cross-species comparisons. Blastn uses a shorter word size, which makes it slower but the most sensitive of the three, so it can also return more distant matches.
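The same three choices exist in NCBI's standalone BLAST+ command-line program blastn, where they are selected with the -task option. The Python sketch below only builds the argument list for such a command and does not run it. The function name blastn_command, the mapping name, and the query file name are hypothetical; running the command for real assumes BLAST+ is installed.

```python
# Web-form choice -> value of the BLAST+ "-task" option.
TASK_FOR_CHOICE = {
    "highly similar": "megablast",
    "more dissimilar": "dc-megablast",
    "somewhat similar": "blastn",
}


def blastn_command(query_file, choice, database="nt", remote=True):
    """Build (but do not run) a blastn argument list for the chosen program."""
    task = TASK_FOR_CHOICE[choice]
    command = ["blastn", "-task", task, "-query", query_file, "-db", database]
    if remote:
        # -remote sends the search to NCBI's servers instead of a local database.
        command.append("-remote")
    return command


# "query.fasta" is a placeholder file name.
print(" ".join(blastn_command("query.fasta", "more dissimilar")))
```

Returning a list rather than a single string makes it straightforward to hand the command to subprocess.run later without worrying about shell quoting.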
At this point, clicking BLAST will take you to some intermediate waiting pages, and then to a page similar to the one below.
The color chart draws each hit as a bar aligned against the query: the length of the bar shows how much of the query sequence the hit matched, and its color indicates the alignment score. The table below it provides descriptive statistics for the results, which can be sorted by clicking the heading of whichever column you wish to sort by. The key values you should look at when searching for a sequence match are Query Coverage, Max Identity, and E-Value. For the first two, you want a high percentage, which corresponds to a high level of matching. The E-Value, or Expected Value, is the number of hits with an equally good or better score that you would expect to find by chance in a database of this size; the closer it is to zero, the less likely it is that the match is due to chance. A good cutoff for a significant match is 0.001; anything smaller than that is a statistically significant match. Examples of good E-values are 2e-98 and 3e-57.
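If you save a results table and want to apply these cutoffs yourself, the filtering step can be sketched in Python as below. The dictionary keys (name, query_cover, identity, evalue) and the three hits are invented for illustration and do not correspond to any particular BLAST output format; only the 0.001 cutoff comes from the text above.

```python
def significant_hits(hits, max_evalue=1e-3, min_coverage=0.0, min_identity=0.0):
    """Keep hits below the E-value cutoff, best (smallest E-value) first."""
    kept = [
        hit for hit in hits
        if hit["evalue"] < max_evalue
        and hit["query_cover"] >= min_coverage
        and hit["identity"] >= min_identity
    ]
    return sorted(kept, key=lambda hit: hit["evalue"])


# Made-up hits; coverage and identity are percentages.
hits = [
    {"name": "hit_A", "query_cover": 100, "identity": 99, "evalue": 2e-98},
    {"name": "hit_B", "query_cover": 35, "identity": 82, "evalue": 0.5},
    {"name": "hit_C", "query_cover": 88, "identity": 91, "evalue": 3e-57},
]
for hit in significant_hits(hits, min_coverage=50):
    print(hit["name"], hit["evalue"])
```

Here hit_B is dropped because its E-value is above the cutoff, and the remaining hits are listed from the smallest E-value upward.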
If you scroll down, NCBI provides detailed information on each hit that was returned, including information on what each hit encodes or what is encoded in that segment of DNA. These descriptions provide links to individual pages for each gene, which may be useful to your investigation.
How to Use Basic BLAST - Protein Search:
To search protein sequences, return to the main page and click:
This will take you to a page laid out like the one you encountered with the nucleotide search. In the Enter Query Sequence box, you may use the same three methods of entry previously described; however, the sequence must use only the amino acid single-letter code.
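A quick way to check a sequence before pasting it in is to look for characters outside the single-letter amino acid code. The Python sketch below does this for the 20 standard residues and, as an assumption made for this example, also accepts X for an unknown residue. The function name invalid_residues is hypothetical.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def invalid_residues(sequence, extra_allowed="X"):
    """Return the sorted characters that are not accepted residue letters."""
    allowed = STANDARD_AMINO_ACIDS | set(extra_allowed)
    # Remove spaces and line breaks, and compare in upper case.
    cleaned = "".join(sequence.split()).upper()
    return sorted(set(cleaned) - allowed)


print(invalid_residues("MKTAYIAKQR QISFVKSHFS"))  # no invalid characters
print(invalid_residues("MKTB1"))                  # B and 1 are reported
```

Note that A, C, G and T are themselves valid amino acid letters, so a nucleotide sequence pasted by mistake passes this check; it only catches characters that cannot appear in a protein sequence at all.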
The search databases you will choose from are also different, because they are protein databases as opposed to nucleotide databases.
Finally, the choice of algorithm is different. The possibilities are blastp (protein-protein BLAST), PSI-BLAST (Position-Specific Iterated BLAST), or PHI-BLAST (Pattern Hit Initiated BLAST). The most commonly used is blastp, which simply matches protein sequences to protein sequences. PSI-BLAST lets the user build a PSSM (position-specific scoring matrix) from the results of the first blastp run and then use it in further rounds of searching. PHI-BLAST performs the search but limits alignments to those that match a pattern in the query sequence. The latter two algorithms are more sophisticated functions of this tool; PHI-BLAST in particular can prove very useful when studying families of proteins or conserved entities.
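PHI-BLAST patterns are written in PROSITE-style syntax, for example [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R. To see what such a pattern would match in your own query before running a search, it can be converted to a regular expression. The Python sketch below handles only a subset of the syntax (single residues, [ ] sets, { } exclusions, x wildcards and repeat counts); the function name prosite_to_regex and the test sequence are made up for this example, and the conversion is an illustration rather than the way PHI-BLAST itself works.

```python
import re

# One pattern element: a residue, [set], {exclusion} or x, with optional (n) or (n,m).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern into a Python regex string."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            core = "."                           # any residue
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"    # any residue except these
        else:
            core = residue                       # literal residue or [set]
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        pieces.append(core)
    return "".join(pieces)


regex = prosite_to_regex("[LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R")
print(regex)
# Invented test sequence containing one occurrence of the pattern.
print(re.search(regex, "AAAMGEKGLAAAAAARTT") is not None)
```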
How to Use Basic BLAST - blastx:
This tool allows you to search protein databases using a translated nucleotide query. In the Enter Query Sequence box, you should enter a nucleotide sequence. BLAST translates it in all six reading frames and searches the protein database with the resulting protein sequences. This is useful when studying genes that have undergone changes in the nucleotide sequence, but have retained their protein identities.
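The translation step can be pictured with a short Python sketch that translates a nucleotide sequence in six reading frames (three on each strand) using the standard genetic code. This is a simplified illustration with hypothetical function names, not the code BLAST runs; codons containing anything other than A, C, G or T are shown as X.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; '*' marks a stop."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Return translations keyed by frame: +1, +2, +3, -1, -2, -3."""
    dna = "".join(dna.split()).upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for frame, protein in six_frame_translation("ATGGCCTAA").items():
    print(frame, protein)
```

Each of the six protein strings is a candidate reading of the query, which is why a translated search can find protein matches even when you do not know which strand or frame encodes the gene.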
How to Use Basic BLAST - tblastn:
This tool allows you to search translated nucleotide databases using a protein query. In the Enter Query Sequence box, you should enter a protein sequence. BLAST translates the nucleotide sequences in the database in all six reading frames and compares those translations with your query, so the hits are nucleotide sequences that could encode a protein similar to the query. This is useful when comparing similar proteins or genes that might be related.
How to Use Basic BLAST - tblastx:
This tool allows you to search translated nucleotide databases using a translated nucleotide query. In the Enter Query Sequence box, you should enter a nucleotide sequence. Both the query and the database sequences are translated in all six reading frames, and the comparison is made at the protein level. This tool is useful when trying to reconcile missing nucleotide sequences or genes in biological pathways. It can be used to find model genes that may be used to "fill in the holes" where information may be missing.
Created by Claudia M. Carcelen, 2009.