Mastering the Art of NCBI: It's a BLAST

From GcatWiki

The National Center for Biotechnology Information (NCBI) is an organization founded in 1988 as a national resource that gives the public access to molecular biology information. NCBI creates numerous databases, online tools and research software programs to analyze genomes. The Basic Local Alignment Search Tool (BLAST) is an online tool designed to enable users to rapidly search through nucleotide and protein databases. While the website is designed for both novice and veteran users, mastering the tool can be daunting. This website provides a step-by-step guide to using BLAST and interpreting your results.


Contents:

  • Nucleotide Search
  • Protein Search
  • blastx
  • tblastn
  • tblastx
  • Aligning Multiple Sequences
  • Saving Your Work

How to Use Basic BLAST - Nucleotide Search:

When you get to the main page, you may notice you have a number of options to choose from:

BLAST Home.png


To search for matching nucleotide sequences in the database, choose: Nucleotide blast.png

This link will take you to the page shown below.

Nucleotide BLAST Entry.png


In the entry box below Enter Query Sequence there are three possible methods of entry for your search. The first is bare sequence, which refers simply to the nucleotide sequence (ATCG, etc.) you wish to search for.

Enter Query Sequence.png

The second method uses FASTA format, shown below. This format requires the first line to be a descriptor beginning with the ">" character, followed by a return and then the nucleotide sequence. The descriptor can be found on the website where the gene sequence was obtained.

FASTA Entry.png
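
To make the FASTA layout concrete, here is a minimal Python sketch that splits FASTA text into descriptor and sequence pairs. The function name parse_fasta and the example records are illustrative assumptions, not part of BLAST or of any NCBI tool.

```python
def parse_fasta(text):
    """Split FASTA text into (descriptor, sequence) pairs.

    Illustrative helper only: each record starts with a '>' descriptor
    line, and the sequence lines that follow are joined together.
    """
    records = []
    descriptor = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new descriptor closes the previous record, if there was one.
            if descriptor is not None:
                records.append((descriptor, "".join(chunks)))
            descriptor = line[1:].strip()
            chunks = []
        elif descriptor is not None:
            chunks.append(line)
    if descriptor is not None:
        records.append((descriptor, "".join(chunks)))
    return records


# Made-up records for illustration.
example = ">seq1 hypothetical gene\nATGGCC\nTTAA\n>seq2\nGGCC"
for name, sequence in parse_fasta(example):
    print(name, len(sequence))
```

Each descriptor line starts a new record, so several sequences can sit in one block of text.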

Finally, you may choose to use identifiers such as a gene's Accession Number as the query. The identifier must not contain spaces between its letters and numbers; otherwise the pieces will be treated as separate sequences, or BLAST will fail to read them.

Identifier Entry.png
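
The three entry methods can be told apart with a few simple checks. The Python sketch below is a rough heuristic written for illustration: the function name classify_query, the accession-like pattern, and the accepted nucleotide letters are all assumptions, and BLAST's own input handling is more permissive than this.

```python
import re

# IUPAC nucleotide letters, including ambiguity codes (assumed alphabet).
NUCLEOTIDE_LETTERS = set("ACGTUNRYKMSWBDHV")

# Simplified accession-like pattern such as NM_000518.5 or U49845.
# This is a hypothetical approximation, not NCBI's actual rule.
ACCESSION_PATTERN = re.compile(r"[A-Za-z]{1,6}_?\d{3,}(\.\d+)?")


def classify_query(text):
    """Guess whether a query is FASTA, an identifier, or a bare sequence."""
    stripped = text.strip()
    if stripped.startswith(">"):
        return "fasta"
    if ACCESSION_PATTERN.fullmatch(stripped):
        return "identifier"
    letters = "".join(stripped.split()).upper()
    if letters and set(letters) <= NUCLEOTIDE_LETTERS:
        return "bare"
    return "unknown"


print(classify_query(">seq1\nATGC"))   # FASTA input
print(classify_query("NM_000518.5"))   # identifier
print(classify_query("NM 000518"))     # the space breaks the identifier
```

In this sketch an identifier that contains a space is no longer recognized, which mirrors the warning above about spaces.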


Once you have entered your query, you must choose which database you wish to search.

Choose Database.png

The most widely used database is the Nucleotide Collection (nr/nt), since it encompasses a broad range of nucleotide sequences across all domains. However, you may choose to search another database, depending on your research.


You may wish to restrict your search hits to only those found in certain organisms, or to exclude those found in a certain organism. You may do so by entering the common name, the binomial name or the taxonomic identification. Clicking Exclude excludes hits found in this organism's genome. Furthermore, clicking the + allows you to include or exclude multiple organisms or taxa.

Choose Organism.png


You have the option to further narrow your search using Entrez Query, which limits the search to a subset of the selected BLAST database. This tool uses its own specific syntax, which is described on the NCBI website. It is a specialized way of narrowing search results and is entirely optional, since the methods already described provide good results.
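
As an illustration of what an Entrez Query restriction can look like, the Python sketch below assembles a query string from organism names and an optional sequence-length range. The helper name build_entrez_query is hypothetical, and the field tags used here ([Organism], [SLEN], all[filter]) are written from memory of the Entrez syntax, so check them against the NCBI help pages before relying on them.

```python
def build_entrez_query(include=(), exclude=(), length_range=None):
    """Assemble an Entrez-style restriction string (illustrative only)."""
    terms = []
    if include:
        # Any of the listed organisms may match.
        organisms = " OR ".join(f'"{name}"[Organism]' for name in include)
        terms.append(f"({organisms})")
    if length_range is not None:
        low, high = length_range
        terms.append(f"{low}:{high}[SLEN]")
    # With no positive terms, start from everything and subtract.
    query = " AND ".join(terms) if terms else "all[filter]"
    for name in exclude:
        query += f' NOT "{name}"[Organism]'
    return query


print(build_entrez_query(include=["Homo sapiens"], length_range=(1000, 2000)))
print(build_entrez_query(exclude=["Mus musculus"]))
```

The resulting string is what you would paste into the Entrez Query box; the Boolean operators are written in capitals.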


At this point, you need to choose the specificity of your search hits. You have three options: highly similar sequences (megablast), more dissimilar sequences (discontiguous megablast), and somewhat similar sequences (blastn). Megablast is the fastest and returns a small number of near-exact matches. Discontiguous megablast tolerates more mismatches and is intended for comparing more divergent sequences, such as those from different species. Blastn is slower but more sensitive, so it returns a larger number of matches that are not as close, including short ones.

Choose Algorithm.png


At this point, clicking BLAST will take you to some intermediate waiting pages, and then to a page similar to the one below.

BLAST Results.png

The color chart uses color coding to show how much of the query sequence each hit matched. The table below it provides statistics for each hit. The results can be sorted by clicking the heading of whichever column you wish to sort by. The key values to look at when searching for a sequence match are Query Coverage, Max Identity, and E-Value. For the first two, a high percentage indicates a close match. The E-Value, or Expected Value, is the number of hits with a score at least this good that you would expect to see by chance in a database of this size, so smaller values indicate more significant matches. A good cutoff for a significant match is 0.001; anything smaller than that is generally treated as a statistically significant match. Examples of good E-values are 2e-98 and 3e-57.

Descriptions Table.png
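
The cutoff described above can be applied to a list of hits in a few lines of Python. This is a sketch with made-up hit records: the Hit class, its field names, and the example values are assumptions for illustration and do not reflect the layout of an actual BLAST download. The second function uses the standard relation between a bit score S' and the E-value, E = m * n * 2^(-S'), where m and n are the query and database lengths (taken here as plain lengths, a simplification).

```python
from dataclasses import dataclass


@dataclass
class Hit:
    accession: str         # hypothetical identifier for the hit
    query_coverage: float  # percent of the query covered by the hit
    identity: float        # percent identity within the alignment
    evalue: float          # expected number of chance hits this good


def significant_hits(hits, max_evalue=1e-3):
    """Keep hits below the E-value cutoff, most significant first."""
    kept = [hit for hit in hits if hit.evalue < max_evalue]
    return sorted(kept, key=lambda hit: hit.evalue)


def evalue_from_bit_score(query_len, db_len, bit_score):
    """E = m * n * 2^(-S') for a bit score S'."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Made-up hits for illustration only.
example_hits = [
    Hit("hit_A", 100.0, 99.0, 2e-98),
    Hit("hit_B", 45.0, 78.0, 0.5),
    Hit("hit_C", 90.0, 92.0, 3e-57),
]

for hit in significant_hits(example_hits):
    print(hit.accession, hit.evalue)
```

Because the E-value grows with the size of the database, the same alignment can be significant in a small database and unremarkable in a large one.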

If you scroll down, NCBI provides detailed information on each hit that was returned, including information on what each hit encodes or what is encoded in that segment of DNA. These descriptions provide links to individual pages for each gene, which may be useful to your investigation.

Sample Hit.png


How to Use Basic BLAST - Protein Search:

To search protein sequences, return to the main page and click: Protein BLAST.png

This will take you to a page much like the one you encountered with the nucleotide search. In the Enter Query Sequence box, you may use the same three methods of entry previously described. However, the sequence must use only the single-letter amino acid code.
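
Before submitting a protein query, it can help to check that it contains only single-letter amino acid codes. The Python sketch below does this; the function name invalid_residues and the exact set of accepted letters (the 20 standard residues plus a few ambiguity and special codes) are assumptions made for illustration.

```python
# The 20 standard amino acids in single-letter code.
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")

# Ambiguity and special codes accepted by this sketch (an assumption):
# B, J, Z, X for ambiguous residues, U and O for rare residues, * for stop.
EXTRA_CODES = set("BJZXUO*")


def invalid_residues(sequence):
    """Return the set of characters that are not amino acid codes."""
    letters = "".join(sequence.split()).upper()
    return set(letters) - STANDARD_RESIDUES - EXTRA_CODES


print(invalid_residues("MKTAYIAKQR"))  # empty set: every letter is valid
print(invalid_residues("MKT1"))        # the digit is flagged
```

Note that A, C, G and T are also valid amino acid codes, so a check like this cannot tell you that a nucleotide sequence was pasted into the protein search by mistake.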


The search databases you will choose from are also different, because they are protein databases as opposed to nucleotide databases.

Choose Protein Database.png

Finally, the algorithm used is slightly different. The possibilities are blastp (protein-protein BLAST), PSI-BLAST (Position-Specific Iterated BLAST), or PHI-BLAST (Pattern Hit Initiated BLAST). The most commonly used is blastp, which simply matches protein sequences to protein sequences. PSI-BLAST lets the user build a PSSM (position-specific scoring matrix) using the results of the first blastp run. PHI-BLAST performs the search but limits alignments to those that match a pattern in the query sequence. The latter two algorithms are more advanced functions of this tool, but PHI-BLAST can prove very useful when studying protein families or conserved motifs.


How to Use Basic BLAST - blastx:

Blastx.png

This tool allows you to search protein databases using a translated nucleotide query. In the Enter Query Sequence box, you should enter a nucleotide sequence. BLAST translates it in all six reading frames and searches the protein database with the resulting protein sequences. This is useful when studying genes that have undergone changes in the nucleotide sequence, but have retained their protein identities.
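
The translation step can be sketched in Python as follows. This is a simplified illustration using the standard genetic code, with hypothetical function names; it is not how BLAST is implemented, and it marks any codon containing a letter other than A, C, G or T as X.

```python
from itertools import product

# Standard genetic code, with codons ordered by the bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become X, stops become *."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Translate three forward frames and three reverse-complement frames."""
    reverse = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


for frame, protein in six_frame_translations("ATGGCC").items():
    print(frame, protein)
```

Searching with all six frames means you do not need to know in advance which strand or reading frame encodes the protein.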


How to Use Basic BLAST - tblastn:

Tblastn.png

This tool allows you to search translated nucleotide databases using a protein query. In the Enter Query Sequence box, you should enter a protein sequence. BLAST translates the nucleotide database in all six reading frames and compares your protein query against the translations, returning the nucleotide sequences that could encode it. This is useful when comparing similar proteins or genes that might be related.


How to Use Basic BLAST - tblastx:

Tblastx.png

This tool allows you to search translated nucleotide databases using a translated nucleotide query. In the Enter Query Sequence box, you should enter a nucleotide sequence. Both the query and the database are translated in all six reading frames, and the comparison is made at the protein level. This tool is useful when trying to reconcile missing nucleotide sequences or genes in biological pathways. It can be used to find model genes that may be used to "fill in the holes" where information may be missing.


How to Align Multiple Sequences Using BLAST:

You may wish to align multiple nucleotide or protein sequences in order to compare their similarities and differences. This tool is useful when comparing genes that encode the same protein product or conserved genes across different organisms. It can be used to compare similar genes and pick out conserved domains within those genes.

To do so, simply check the box next to Align two or more sequences and the page will reload with an additional Enter Query Sequence box. Enter one or more query sequences in the top box and one or more query sequences in the bottom box, and click BLAST. The results will show you how well the queries aligned with each other, if at all.

Align Sequence.png
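
To give a feel for what a local alignment score measures, here is a minimal Python sketch of the Smith-Waterman algorithm. This is a substitute technique chosen for illustration: BLAST itself uses a faster heuristic rather than this exhaustive method, and the function name and the scoring values (match, mismatch, gap) are arbitrary assumptions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between two sequences.

    Illustrative only: BLAST does not use this exhaustive method,
    and the scoring values here are arbitrary.
    """
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0]
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(
                0,                     # start a new local alignment
                diagonal,              # align a[i-1] with b[j-1]
                previous[j] + gap,     # gap in b
                current[j - 1] + gap,  # gap in a
            )
            current.append(score)
            best = max(best, score)
        previous = current
    return best


print(smith_waterman_score("ACGT", "ACGT"))
print(smith_waterman_score("AAAA", "TTTT"))
```

A higher score means the two sequences share a longer or better-matching region; a score of zero means no positions could be aligned under this scoring scheme.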


How to Save Your Strategies and Results:

As your research develops, you will probably want to save your work. While it is recommended that you use a screenshot program to document your results, you may also choose to create an account with NCBI. Registering with NCBI is free, and it is very useful since you can save and keep track of your jobs.

In the top right-hand corner, simply click: Register.png

The link will take you to the following page, and registration only takes a few seconds to complete.

My NCBI.png

Once you have registered and logged in, you can save your strategies by clicking the Save Search Strategies hyperlink at the top of your results page. Doing so saves the search to your NCBI account so that you can access it later. You may also download the search by clicking the Download hyperlink at the top of the results page.


Created by Claudia M. Carcelen, 2009.