Mastering the Art of NCBI: It's a BLAST

The National Center for Biotechnology Information (NCBI) is an organization founded in 1988 as a national resource available to the public for access to molecular biology information. NCBI creates numerous databases, online tools and research software programs to analyze genomes. The Basic Local Alignment Search Tool (BLAST) is an online tool designed to enable users to rapidly search through nucleotide and protein databases. While the website is designed for both novice and veteran users, mastering the tool and the art can be daunting. This page walks through, step by step, how to use BLAST and interpret your results.

How to use Basic BLAST - Nucleotide Search:

When you get to the main page, you will see a number of options to choose from.

To search for matching nucleotide sequences in the database, choose the nucleotide BLAST link.

This link will take you to the nucleotide search page.

The entry box under the heading Enter Query Sequence accepts three possible methods of entry for your search. The first is bare sequence, which refers simply to the nucleotide sequence (ATCG, etc.) you wish to search for.

The second method uses FASTA format. This format requires the first line to be a descriptor beginning with a greater-than sign (>), followed by a line break and the nucleotide sequence. The descriptor can usually be found on the website where the gene sequence was obtained.
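As a rough illustration of the FASTA layout just described, the following Python sketch writes and reads records made of a ">" descriptor line followed by sequence lines. The function names, the 60-character line width, and the example descriptor are assumptions chosen for this example; they are not part of BLAST itself.

```python
def format_fasta(descriptor, sequence, width=60):
    """Return a FASTA record: a '>' descriptor line, then wrapped sequence lines."""
    lines = [">" + descriptor]
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines)


def parse_fasta(text):
    """Return a list of (descriptor, sequence) tuples found in FASTA text."""
    records = []
    descriptor, chunks = None, []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new descriptor line closes the previous record, if any.
            if descriptor is not None:
                records.append((descriptor, "".join(chunks)))
            descriptor, chunks = line[1:], []
        else:
            if descriptor is None:
                raise ValueError("sequence data found before any '>' descriptor line")
            chunks.append(line)
    if descriptor is not None:
        records.append((descriptor, "".join(chunks)))
    return records


# Hypothetical record, invented purely for illustration.
example = format_fasta("example_gene hypothetical descriptor", "ATCG" * 20)
print(example)
```

Pasting the printed text into the query box would correspond to the FASTA entry method; the parser is only there to show that the descriptor and sequence can be separated again.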

Finally, you may use identifiers such as a gene's accession number as the query. Make sure there are no spaces within an identifier; otherwise BLAST may treat the pieces as separate queries or fail to read them.
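The three entry methods can be told apart with a simple heuristic, sketched below in Python. This is only an assumption about how one might pre-check a query before pasting it in; it is not a description of how BLAST itself decides, and the function name and the accession-style string are illustrative.

```python
# IUPAC nucleotide codes (unambiguous and ambiguous), upper case.
NUCLEOTIDE_LETTERS = set("ACGTUNRYKMSWBDHV")


def classify_query(text):
    """Guess which entry method a query uses: 'fasta', 'bare sequence' or 'identifier'."""
    stripped = text.strip()
    if not stripped:
        raise ValueError("empty query")
    if stripped.startswith(">"):
        return "fasta"
    # Remove line breaks and spaces before checking the alphabet.
    compact = "".join(stripped.split()).upper()
    if set(compact) <= NUCLEOTIDE_LETTERS:
        return "bare sequence"
    # Anything else (digits, underscores, dots) is treated as an identifier.
    return "identifier"


print(classify_query("ATCGATCG"))
print(classify_query(">my descriptor\nATCGATCG"))
print(classify_query("NM_000518"))  # accession-style string used as an example
```

Because identifiers contain digits or punctuation, they fall outside the nucleotide alphabet, which is what the last branch relies on.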

Once you have entered your query, you must choose which database you wish to search.

The most widely used database is the Nucleotide Collection (nr/nt), since it encompasses a broad range of nucleotide sequences across all domains. However, you may choose to search another database, depending on your research.

You may wish to restrict your search hits to only those found in certain organisms, or to exclude those found in a certain organism. You may do so by entering the common name, the binomial name or the taxonomic identification. Clicking Exclude excludes hits found in this organism's genome. Furthermore, clicking the + allows you to include or exclude multiple organisms or taxa.

You have the option to further narrow your search using Entrez Query, which limits the search to a subset of the selected BLAST database. This tool uses a specific syntax described on the NCBI website. It is optional, since the methods already described usually provide good results.
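To give a feel for the Entrez Query syntax, the sketch below assembles query strings from terms of the form term[Field] joined by the Boolean operators AND, OR and NOT. The helper names are hypothetical, and the field tags used here ([Organism] and [SLEN] for sequence length) should be checked against the NCBI help pages before relying on them.

```python
def organism_term(name):
    """Build a term that restricts hits to one organism, e.g. "Mus musculus"[Organism]."""
    return f'"{name}"[Organism]'


def length_range_term(shortest, longest):
    """Build a term that restricts hits to a sequence-length range."""
    return f"{shortest}:{longest}[SLEN]"


def combine(operator, *terms):
    """Join terms with a Boolean operator (AND, OR, NOT) inside parentheses."""
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError("operator must be AND, OR or NOT")
    return "(" + f" {operator} ".join(terms) + ")"


# Mouse sequences between 1000 and 2000 bases long.
query = combine("AND", organism_term("Mus musculus"), length_range_term(1000, 2000))
print(query)
```

The resulting string is what would be typed into the Entrez Query box.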

At this point, you need to choose how similar your search hits must be to the query. You have three options: highly similar sequences (megablast), more dissimilar sequences (discontiguous megablast), and somewhat similar sequences (blastn). Megablast is the fastest and returns a small number of near-exact matches. Discontiguous megablast tolerates more mismatches and is intended for more divergent sequences, such as comparisons across species. Blastn is slower but more sensitive, so it can return a greater number of matches that are not as close.

At this point, clicking BLAST will take you through some intermediate waiting pages and then to the results page.
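The same submission can in principle be made programmatically through NCBI's BLAST URL API. The sketch below only assembles the form parameters for a nucleotide search and does not contact the server. The parameter names (CMD, PROGRAM, DATABASE, QUERY, MEGABLAST, ENTREZ_QUERY) and the endpoint are given as I understand the public API and should be verified against NCBI's documentation; the function name is hypothetical, and the discontiguous megablast option is left out.

```python
from urllib.parse import urlencode

# Endpoint of the BLAST URL API (assumed; check the NCBI documentation).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_submit_params(query, database="nt", megablast=True, entrez_query=None):
    """Assemble the form fields for submitting a nucleotide search."""
    params = {
        "CMD": "Put",          # submit a new search
        "PROGRAM": "blastn",   # nucleotide query against nucleotide database
        "DATABASE": database,  # "nt" is the Nucleotide Collection
        "QUERY": query,        # bare sequence, FASTA text or identifier
    }
    if megablast:
        params["MEGABLAST"] = "on"  # highly similar sequences
    if entrez_query:
        params["ENTREZ_QUERY"] = entrez_query
    return params


params = build_submit_params("ATCGATCGATCG", entrez_query='"Mus musculus"[Organism]')
print(BLAST_URL)
print(urlencode(params))  # URL-encoded body that a POST request would carry
```

A real client would send this body in a POST request, read the request ID from the response, and then poll for results; those steps correspond to the waiting pages seen in the browser.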

The results page opens with a color chart, which uses color coding to show how much of the query sequence each result hit matched. The table under the chart provides descriptive information regarding the statistical value of the results. The results can be sorted by clicking the heading of whichever column you wish to sort by. The key values you should look at when searching for a sequence match are Query Coverage, Max Identity, and E-Value. For the first two, you want a high percentage, which corresponds to a high level of matching. The E-Value, or Expected Value, is the number of matches with a score at least this good that you would expect to find by chance in a database of this size, so the closer it is to zero, the less likely the match is due to chance. A good cutoff for a significant match is 0.001; anything smaller than that is a statistically significant match. Examples of good E-values are 2e-98 and 3e-57.
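The screening described above can be sketched in Python as follows. The hit records are invented for illustration, and the field names are assumptions rather than anything BLAST exports under those names. The helper evalue_from_bit_score uses the standard approximation E = m * n * 2^(-S'), where S' is the bit score, m the query length and n the database length; BLAST itself uses effective lengths, so real values will differ somewhat.

```python
def evalue_from_bit_score(bit_score, query_length, database_length):
    """Approximate E-value from a bit score: E = m * n * 2**(-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


def significant_hits(hits, max_evalue=1e-3, min_coverage=0.0, min_identity=0.0):
    """Keep hits below the E-value cutoff and above the percentage floors,
    sorted from most to least significant."""
    kept = [
        hit for hit in hits
        if hit["evalue"] < max_evalue
        and hit["coverage"] >= min_coverage
        and hit["identity"] >= min_identity
    ]
    return sorted(kept, key=lambda hit: hit["evalue"])


# Hypothetical hits; coverage and identity are percentages.
hits = [
    {"id": "hit_B", "evalue": 3e-57, "coverage": 88.0, "identity": 91.0},
    {"id": "hit_C", "evalue": 0.5, "coverage": 12.0, "identity": 100.0},
    {"id": "hit_A", "evalue": 2e-98, "coverage": 100.0, "identity": 99.0},
]

for hit in significant_hits(hits, min_coverage=50.0):
    print(hit["id"], hit["evalue"])

# A bit score of 40 for a 100-base query against 1,000,000 bases gives about 9e-5.
print(evalue_from_bit_score(40, 100, 1_000_000))
```

Note that hit_C is dropped even though its identity is 100%: a short perfect match can easily occur by chance, which is why the E-value is checked alongside the percentages.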

If you scroll down, NCBI provides detailed information on each hit that was returned, including information on what each hit encodes or what is encoded in that segment of DNA. These descriptions provide links to individual pages for each gene, which may be useful to your investigation.

How to use Basic BLAST - Protein Search:

To search protein sequences, return to the main page and click the protein BLAST link.

This will take you to a page laid out like the one you encountered with the nucleotide search. In the Enter Query Sequence box, you may use the same three methods of entry previously described; however, the sequence must use only the single-letter amino acid code.
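A quick way to check that a query uses only the single-letter amino acid code is sketched below in Python. The function name is hypothetical, and the check covers only the 20 standard amino acids; ambiguity codes such as X or B are deliberately left out of this sketch.

```python
# Single-letter codes of the 20 standard amino acids.
AMINO_ACID_LETTERS = set("ACDEFGHIKLMNPQRSTVWY")


def invalid_protein_characters(sequence):
    """Return the sorted characters in the sequence that are not standard
    amino acid letters; an empty list means the sequence passes the check."""
    compact = "".join(sequence.split()).upper()
    return sorted(set(compact) - AMINO_ACID_LETTERS)


print(invalid_protein_characters("MKTAYIAKQR"))
print(invalid_protein_characters("MKT1-AYI"))
```

Keep in mind that A, C, G and T are also valid amino acid letters, so a nucleotide sequence pasted into the protein search by mistake would pass this check.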

The search databases you will choose from are also different, because they are protein databases as opposed to nucleotide databases.

Finally, the algorithm choices are different. The options are blastp (protein-protein BLAST), PSI-BLAST (Position-Specific Iterated BLAST), or PHI-BLAST (Pattern Hit Initiated BLAST). The most commonly used is blastp, which simply matches protein sequences to protein sequences. PSI-BLAST builds a PSSM (position-specific scoring matrix) from the results of the first blastp run and uses it in later search rounds. PHI-BLAST performs the search but limits alignments to those that match a pattern in the query sequence. The latter two are more sophisticated functions of this tool; PHI-BLAST in particular can be very useful when studying families of proteins or conserved motifs.
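PHI-BLAST patterns are written in a PROSITE-style syntax. To illustrate what "matching a pattern in the query sequence" means, the sketch below converts a small subset of that syntax into a Python regular expression and searches a sequence with it. The converter handles only single residues, x for any residue, [..] sets, {..} exclusions and (n) or (n,m) repeat counts; the function name, the pattern and the sequence are invented for this example, and this is not how PHI-BLAST is implemented internally.

```python
import re

# One pattern element: a residue, x, [set] or {excluded set}, with an optional repeat count.
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern such as 'L-x(2)-[ST]' into a regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


pattern = "[LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R"
regex = prosite_to_regex(pattern)
print(regex)

sequence = "MKLGEAGLAAAAAARTT"  # hypothetical protein fragment
hit = re.search(regex, sequence)
print(hit.span() if hit else "no match")
```

Only query sequences that contain the pattern are meaningful input for PHI-BLAST, so a check like this can confirm that the pattern occurs in the query before submitting it.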

Created by Claudia M. Carcelen, 2009.
