TBLASTn and Protein Sequence Analysis
Revision as of 20:00, 22 February 2011
When Nucleotides Don't Work
Occasionally, BLASTing a gene using only its nucleotide sequence doesn't find a hit. This can happen for a number of reasons:
- Lack of conservation between species
- Incomplete genome database
- Rearrangement of introns/exons
Assuming, however, that you're working with a gene that should be highly conserved and that you have accounted for the intron/exon issue, it may be time to take some extra steps to make sure everything is working as it should:
- First -- Check that your database was made correctly.
  - You should have three additional files, created from the original FASTA file when the database was built.
- Second -- Check that your commands are entered correctly.
  - Make sure that you are in the correct directory (e.g., Desktop) and are BLASTing with the correct file name.
  - Sequences being BLASTed should be saved in TextEdit without any extension (such as .txt).
- Third -- Ensure that the database and commands are working.
  - Take a known sequence from the database being BLASTed.
  - Typically, copy a small portion of the genome into a separate text file.
  - Run a BLAST with that sequence against the database it came from.
  - If the sequence doesn't produce a hit in the scaffold you took it from, something is wrong with your database or your commands.
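The first two checks can be partly automated. The sketch below is a hypothetical helper, not part of BLAST: it reports which database files are missing and whether the query file exists in the current directory, is non-empty, and was saved without an extension. It assumes an older-style BLAST nucleotide database, which is stored as three files ending in .nhr, .nin and .nsq; newer BLAST+ releases may write additional files, so adjust the suffix list to match what you actually see. The names "genome" and "query" are placeholders.

```python
from pathlib import Path

# Assumption: an older-style (version 4) BLAST nucleotide database is stored as
# three files sharing one base name. Newer BLAST+ releases may write more files,
# so adjust this tuple to match your own database directory.
NUCLEOTIDE_DB_SUFFIXES = (".nhr", ".nin", ".nsq")


def missing_db_files(db_name: str, directory: str = ".") -> list[str]:
    """Return the expected database files that are missing from `directory`."""
    base = Path(directory)
    return [
        db_name + suffix
        for suffix in NUCLEOTIDE_DB_SUFFIXES
        if not (base / (db_name + suffix)).is_file()
    ]


def query_file_problems(path: str) -> list[str]:
    """List common mistakes with a query file; an empty list means none were found."""
    query = Path(path)
    if not query.is_file():
        # Usually means you are in the wrong directory or mistyped the file name.
        return [f"{path} was not found (current directory: {Path.cwd()})"]
    problems = []
    if query.suffix:
        # The troubleshooting steps advise saving query files without an extension.
        problems.append(f"{path} has an extension ({query.suffix})")
    if not query.read_text().strip():
        problems.append(f"{path} is empty")
    return problems


if __name__ == "__main__":
    # Hypothetical names: replace with your own database and query file.
    print("Missing database files:", missing_db_files("genome"))
    print("Query file problems:", query_file_problems("query"))
```

Run it from the same directory you BLAST from; if the query file is reported as not found, the BLAST command would most likely fail for the same reason.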
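The third check (BLASTing a piece of the genome back against itself) can be sketched the same way. This is a minimal illustration under several assumptions: the BLAST+ blastn program is installed, the database is named "genome" and was built from a file called genome.fasta, and the scaffold name is hypothetical. It also assumes the subject IDs reported in tabular output (-outfmt 6, where column 2 is the subject ID) match the first word of your FASTA headers, which depends on how the database was built.

```python
from pathlib import Path


def read_fasta(path: str) -> dict[str, str]:
    """Parse a FASTA file into {record_id: sequence}; the ID is the first word after '>'."""
    chunks: dict[str, list[str]] = {}
    current = None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line.startswith(">"):
            current = line[1:].split(maxsplit=1)[0] if line[1:].strip() else ""
            chunks[current] = []
        elif line and current is not None:
            chunks[current].append(line)
    return {record_id: "".join(parts) for record_id, parts in chunks.items()}


def write_control_query(genome_fasta: str, scaffold_id: str, start: int,
                        length: int, out_path: str) -> str:
    """Copy `length` bases from `scaffold_id` (0-based `start`) into a new query file."""
    sequence = read_fasta(genome_fasta)[scaffold_id][start:start + length]
    if len(sequence) < length:
        raise ValueError("requested region runs past the end of the scaffold")
    # Header records where the control came from (1-based, inclusive coordinates).
    Path(out_path).write_text(f">control_{scaffold_id}_{start + 1}-{start + length}\n{sequence}\n")
    return sequence


def blastn_command(query: str, db: str, out: str) -> list[str]:
    """Build a BLAST+ blastn command that writes tabular output (-outfmt 6)."""
    return ["blastn", "-query", query, "-db", db, "-out", out, "-outfmt", "6"]


def hit_found(tabular_output: str, scaffold_id: str) -> bool:
    """True if any hit's subject ID (column 2 of -outfmt 6) equals `scaffold_id`."""
    for row in tabular_output.splitlines():
        fields = row.split("\t")
        if len(fields) > 1 and fields[1] == scaffold_id:
            return True
    return False
```

A typical session would call write_control_query("genome.fasta", "scaffold_1", 1000, 500, "control"), run the command with subprocess.run(blastn_command("control", "genome", "control_hits.tsv"), check=True), and then pass the contents of control_hits.tsv to hit_found. If no row names the source scaffold, the database or commands need another look.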
If you've gone through these steps and still can't find the issue, it might be time to move away from nucleotide sequences and try some amino acid sequences.
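Switching to an amino acid query usually means running tblastn, which compares a protein query against a nucleotide database translated in all six reading frames, so the existing nucleotide database can be reused. The sketch below only builds the command; it assumes the BLAST+ tools are installed and on your PATH, and the file names and the E-value cutoff of 1e-5 are placeholders to adjust for your own search.

```python
import shlex


def tblastn_command(protein_query: str, nucleotide_db: str, out_path: str,
                    evalue: float = 1e-5) -> list[str]:
    """Build a BLAST+ tblastn command: protein query vs. translated nucleotide database."""
    return [
        "tblastn",
        "-query", protein_query,   # amino acid sequence file (no extension)
        "-db", nucleotide_db,      # the same nucleotide database used for blastn
        "-evalue", str(evalue),    # assumed cutoff; raise it to allow weaker hits
        "-outfmt", "6",            # tabular output, one hit per line
        "-out", out_path,
    ]


if __name__ == "__main__":
    # Hypothetical file names; prints a command you can paste into Terminal.
    print(shlex.join(tblastn_command("protein_query", "genome", "tblastn_hits.tsv")))
```

With these placeholder names it prints tblastn -query protein_query -db genome -evalue 1e-05 -outfmt 6 -out tblastn_hits.tsv. Passing the list to subprocess.run(..., check=True) runs it directly instead.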
Using tBLASTn
Why BLAST proteins?
Working at the protein level provides a number of benefits. Because most amino acids can be encoded by more than one codon (from a single codon for methionine and tryptophan up to six for leucine, serine, and arginine), a given gene could have a number of different nucleotide sequences that all produce the same amino acid sequence. BLASTing at the protein level therefore allows a flexibility that cannot be achieved with nucleotides alone. Protein BLASTing is also unaffected by any silent mutations (point mutations that do not change the amino acid sequence) present in the genome.
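To make the codon argument concrete, the sketch below uses a hand-written copy of the standard genetic code to translate two made-up coding sequences that differ only by silent changes. Their nucleotide identity is noticeably below 100%, while their translations are identical. The simple percent-identity function is only an illustration, not how BLAST scores alignments.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons are listed in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
# '*' marks the three stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence in frame 1, ignoring any trailing partial codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two equal-length, ungapped sequences."""
    return 100 * sum(x == y for x, y in zip(a, b)) / len(a)


# Two made-up coding sequences differing only by silent changes:
# CTG -> TTA (both leucine) and AAA -> AAG (both lysine).
gene_a = "ATGCTGAAA"
gene_b = "ATGTTAAAG"
print(f"nucleotide identity: {percent_identity(gene_a, gene_b):.1f}%")  # 66.7%
print(f"protein identity: {percent_identity(translate(gene_a), translate(gene_b)):.1f}%")  # 100.0%

# How many codons encode each amino acid (stop codons excluded).
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
print(min(degeneracy.values()), max(degeneracy.values()))  # 1 6
```

The last line confirms the range of codon degeneracy: methionine and tryptophan each have one codon, while leucine, serine, and arginine each have six.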
Conserved Protein Sequences
Even in highly conserved proteins, there will be some variation across species and individuals. Often, however, at least one region of the protein has a function important enough that it has changed very little over evolutionary time. These especially conserved regions can be very useful when trying to identify the gene in a new genome.
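One rough way to locate such a region is to line up the same protein from several species and look for the longest stretch of positions where every sequence agrees. The sketch below assumes the sequences are already aligned to equal length with '-' for gaps (for example, by a multiple sequence alignment tool), and it counts only identical residues, which is a simplification. The three sequences are made-up toy data, not real proteins.

```python
def conserved_columns(aligned: list[str]) -> list[int]:
    """Return 0-based columns where every sequence has the same residue (gaps excluded)."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    return [
        i for i in range(length)
        if aligned[0][i] != "-" and all(seq[i] == aligned[0][i] for seq in aligned)
    ]


def longest_conserved_block(aligned: list[str]) -> tuple[int, int]:
    """Return (start, end) of the longest run of consecutive conserved columns; end is exclusive."""
    best = (0, 0)
    run_start = prev = None
    for col in conserved_columns(aligned):
        if prev is None or col != prev + 1:
            run_start = col  # a gap in conservation starts a new run
        prev = col
        if col + 1 - run_start > best[1] - best[0]:
            best = (run_start, col + 1)
    return best


# Made-up toy alignment of one protein from three species.
alignment = [
    "MSRAEDLWTG",
    "MARAEDIWSG",
    "MTRAEDLWTG",
]
start, end = longest_conserved_block(alignment)
print(conserved_columns(alignment))  # [0, 2, 3, 4, 5, 7, 9]
print(alignment[0][start:end])       # RAED
```

In practice, a conserved stretch found this way could be saved as its own protein query, though very short regions may be too small to produce significant hits on their own.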