Instructions For Tree Construction And Analysis
If you are unfamiliar with the phylogeny technique, please refer to our other page, "Phylogenetic Analysis Made Easy."
The present web-interface constructs phylogenetic trees for families of proteins in order to illustrate any expected evolutionary relationships among their members. The two publicly available software packages employed herein are CLUSTALW and PHYLIP (phylogenetic inference package).
Unfortunately the present interface is unable to perform analysis for nucleic acid sequences.
PHYLIP version 3.573c was used here for phylogenetic analysis. The software package is available at <http://evolution.genetics.washington.edu/phylip.html>.
Return to the Submission page.
For data submission into the present interface, the required file format is Fasta/Pearson. An example of a set of sequences in Fasta format is provided below:
>gi|17017987:30-419 Homo sapiens cytochrome c oxidase subunit Vb (COX5B), mRNA
ATGGCTTCAAGGTTACTTCGCGGAGCTGGAACGCTGGCCGCGCAGGCCCTGAGGGCTCGCGGCCCCAGTG
GCGCGGCCGCGATGCGCTCCATGGCATCTGGAGGTGGTGTTCCCACTGATGAAGAGCAGGCGACTGGGTT
GGAGAGGGAGATCATGCTGGCTGCAAAGAAGGGACTGGACCCATACAATGTACTGGCCCCAAAGGGAGCT
TCAGGCACCAGGGAAGACCCTAATTTAGTCCCCTCCATCTCCAACAAGAGAATAGTAGGCTGCATCTGTG
AAGAGGACAATACCAGCGTCGTCTGGTTTTGGCTGCACAAAGGCGAGGCCCAGCGATGCCCCCGCTGTGG
AGCCCATTACAAGCTGGTGCCCCAGCAGCTGGCACACTGA
>gi|45827791:75-284 Homo sapiens cytochrome c oxidase subunit 8A (COX8A), mRNA
ATGTCCGTCCTGACGCCGCTGCTGCTGCGGGGCTTGACAGGCTCGGCCCGGCGGCTCCCAGTGCCGCGCG
CCAAGATCCATTCGTTGCCGCCGGAGGGGAAGCTTGGGATCATGGAATTGGCCGTTGGGCTTACCTCCTG
CTTCGTGACCTTCCTCCTGCCAGCGGGCTGGATCCTGTCACACCTGGAGACCTACAGGAGGCCAGAGTGA
>gi|18105034:463-702 Homo sapiens cytochrome c oxidase subunit VIIa polypeptide 1 (muscle) (COX7A1), mRNA
ATGCAGGCCCTTCGGGTGTCCCAGGCGCTGATCCGCTCCTTCAGCTCCACCGCCCGGAACCGCTTTCAGA
ACCGAGTGCGCGAGAAACAGAAGCTCTTCCAGGAGGACAATGACATCCCGTTGTACCTGAAGGGCGGCAT
CGTTGACAACATCCTGTACCGAGTGACAATGACGCTGTGTCTGGGCGGCACTGTCTACAGCTTGTACTCC
CTTGGCTGGGCCTCCTTCCCCAGGAATTAA
Note: the present interface treats only the first 10 characters following the greater-than character, '>', as the sequence identification or name. Identification strings longer than 10 characters are truncated. The program then proceeds to the next line to read in the sequence.
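The reading rule in the note above can be sketched in Python. This is only an illustration under stated assumptions, not the interface's actual code: the function name read_fasta and the parameter name_width are hypothetical, and the two records are shortened from the example above.

```python
def read_fasta(text, name_width=10):
    """Parse FASTA text into a list of (name, sequence) pairs.

    name_width=10 mirrors the truncation described in the note;
    the function itself is a hypothetical illustration.
    """
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name = line[1:1 + name_width]  # keep only the first 10 characters
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence may span several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


example = """>gi|17017987:30-419 Homo sapiens cytochrome c oxidase subunit Vb (COX5B), mRNA
ATGGCTTCAAGG
TTACTTCGCGGA
>gi|45827791:75-284 Homo sapiens cytochrome c oxidase subunit 8A (COX8A), mRNA
ATGTCCGTCCTG
"""
records = read_fasta(example)
for name, seq in records:
    print(name, len(seq))
```

Because names are cut to 10 characters, two identifiers that differ only after the tenth character would collapse to the same name, so it is safest to make each identifier unique within its first 10 characters.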
1. Multiple Sequence Alignment
The first step following data retrieval is the execution of a multiple sequence alignment, obtained via CLUSTALW (progressive alignment method). If you have not previously aligned your sequences, you may not skip this step. The purpose of this step is to place the most closely related sequences in the user's data set together prior to initiating tree construction. PHYLIP uses the patterns gleaned from the multiple sequence alignment when building phylogenies.
2. Phylogenetic Method
Analyses in the present interface are rendered according to the distance method. Four programs within PHYLIP are employed here: SEQBOOT, PROTDIST, NEIGHBOR, and CONSENSE.
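[A] Once multiple alignment has been completed, the data set is transmitted to SEQBOOT. SEQBOOT generates multiple resampled arrangements of the alignment (bootstrap replicates, in which alignment columns are drawn at random with replacement), so that each later step can be repeated on many slightly different versions of the data.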
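The resampling idea behind this step can be sketched as follows. This is a minimal illustration of bootstrap resampling of alignment columns, not SEQBOOT's own code; the function name, the toy three-sequence alignment, and the choice of 100 replicates are all assumptions made for the example.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    alignment: dict mapping name -> aligned sequence (all equal length).
    Columns are drawn with replacement, so each replicate has the same
    length as the original but a different mix of columns.
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


# Toy alignment, invented for illustration only.
alignment = {"seqA": "MKV-LS", "seqB": "MRVALS", "seqC": "MKIALT"}
rng = random.Random(42)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
print(replicates[0])
```

Whole columns are resampled rather than individual residues, so the alignment between sequences is preserved in every replicate.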
[B] PROTDIST reads in the data from SEQBOOT and computes a distance score for each pair of protein sequences. This step is the most critical, since no subsequent analysis can be made without a measure of sequence divergence or similarity. A Dayhoff PAM matrix is used for computation of distance scores between pairs of sequences. A distance score reflects the number of single amino acid alterations required in order to generate one sequence from a second sequence.
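To make the idea of a pairwise distance score concrete, the sketch below computes a much simpler measure than the one PROTDIST uses: the uncorrected fraction of differing positions (often called the p-distance). It is a stand-in for illustration only and does not apply the Dayhoff PAM model; the function names and the toy alignment are assumptions.

```python
def p_distance(a, b, gap="-"):
    """Fraction of aligned, ungapped positions at which a and b differ."""
    compared = differing = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differing / compared


def distance_matrix(alignment):
    """Return (names, square matrix of pairwise p-distances)."""
    names = list(alignment)
    matrix = [[p_distance(alignment[i], alignment[j]) for j in names]
              for i in names]
    return names, matrix


# Toy alignment, invented for illustration only.
alignment = {"seqA": "MKV-LS", "seqB": "MRVALS", "seqC": "MKIALT"}
names, matrix = distance_matrix(alignment)
for name, row in zip(names, matrix):
    print(name, ["%.2f" % value for value in row])
```

A model-based distance such as the Dayhoff PAM distance additionally corrects for multiple substitutions at the same site and for the differing likelihood of each amino acid replacement, which this simple count ignores.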
[C] NEIGHBOR implements the Neighbor-joining method (Saitou and Nei 1987) to determine the most reasonable positioning of branches. The two sequences with the smallest distance score, after correcting for each sequence's total distance to all the others, are joined as "neighbors" and will share a node below them (or to their left) in the final tree.
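The pair-selection step of neighbor-joining can be sketched as below. This shows only the standard selection criterion from the published method, Q(i,j) = (n - 2) d(i,j) - sum of row i - sum of row j, and not the full algorithm or NEIGHBOR's own code; the function name and the four-sequence distance matrix are invented for illustration.

```python
def nj_pick_pair(names, d):
    """Return the pair of names that neighbor-joining would join first.

    d is a symmetric distance matrix (list of lists) with zeros on the
    diagonal. The pair minimising Q(i, j) is chosen; ties go to the
    first pair encountered.
    """
    n = len(names)
    totals = [sum(row) for row in d]  # total distance from each sequence
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    return names[i], names[j]


# Hypothetical distances: A and C are the closest pair (3), but B and D
# sit on long branches, so the rate-corrected criterion pairs A with B.
names = ["A", "B", "C", "D"]
d = [
    [0, 5, 3, 6],
    [5, 0, 6, 9],
    [3, 6, 0, 5],
    [6, 9, 5, 0],
]
print(nj_pick_pair(names, d))
```

In this toy matrix the raw smallest distance is between A and C, yet the criterion joins A with B. The correction term is what lets the method cope with lineages that have changed at different rates.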
Note: the user option "Tree" affects the execution of NEIGHBOR. The default setting gives rise to unrooted trees. An unrooted or rooted tree is produced depending upon certain assumptions of the two alternative algorithms that can be used in NEIGHBOR. The Neighbor-joining method does not assume a constant rate of mutation, known as the molecular clock hypothesis, and as such this algorithm always produces unrooted trees.
Alternatively, if the user specifies a rooted tree, then NEIGHBOR implements another algorithm, the unweighted pair group method with arithmetic mean (UPGMA). The UPGMA algorithm assumes a molecular clock and generates rooted trees.
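For comparison, the UPGMA clustering idea can be sketched in a few lines. This is a simplified illustration that returns only the branching order as nested tuples and omits branch lengths; it is not NEIGHBOR's implementation, and the function name and distance matrix are assumptions.

```python
def upgma(names, d):
    """Cluster by UPGMA and return a rooted tree as nested tuples."""
    # Each cluster id maps to (subtree, number of leaves in it).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    dist = {(i, j): d[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        (tree_i, size_i), (tree_j, size_j) = clusters.pop(i), clusters.pop(j)
        merged = {}
        for k in clusters:
            d_ik = dist.pop((min(i, k), max(i, k)))
            d_jk = dist.pop((min(j, k), max(j, k)))
            # Size-weighted mean equals the plain average over all leaf pairs.
            merged[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        del dist[(i, j)]
        dist.update(merged)
        clusters[next_id] = ((tree_i, tree_j), size_i + size_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


# Hypothetical clock-like distances, invented for illustration.
names = ["A", "B", "C", "D"]
d = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma(names, d))
```

Because UPGMA always joins the closest clusters and places the root at the last join, it gives a sensible rooted tree only when distances are roughly clock-like, as they are in this toy matrix.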
[D] Lastly, the branch ordering data is passed to CONSENSE for resampling computations. Any phylogenetic method renders the most likely tree, i.e., those relationships that are most reasonable given the sequence alignments. As such, any single tree is only one of many possible trees that could have arisen over evolutionary time. Resampling methods, therefore, are designed to find the most probable tree among the many possible evolutionary paths that could have generated a given set of proteins. CONSENSE does this by summarizing the trees built from the resampled data sets into a single consensus tree.
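One way to picture the consensus step is to count how often each grouping of sequences appears across the replicate trees. The sketch below does this for trees written as nested tuples and, for simplicity, treats them as rooted. It illustrates the majority-rule idea only and is not CONSENSE's code; the function names and the four replicate trees are invented.

```python
from collections import Counter


def clades(tree):
    """Return (leaf set of tree, list of leaf sets, one per internal node)."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves |= child_leaves
        found.extend(child_found)
    found.append(leaves)
    return leaves, found


def clade_support(trees):
    """Fraction of trees in which each clade (set of leaves) appears."""
    counts = Counter()
    for tree in trees:
        _, found = clades(tree)
        counts.update(set(found))
    return {clade: n / len(trees) for clade, n in counts.items()}


# Four hypothetical replicate trees on the same four sequences.
trees = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    ((("A", "B"), "C"), "D"),
]
support = clade_support(trees)
all_leaves = frozenset("ABCD")
majority = {clade: s for clade, s in support.items()
            if s > 0.5 and clade != all_leaves}
for clade, s in majority.items():
    print(sorted(clade), s)
```

Here only the grouping of A with B appears in more than half of the replicate trees, so it is the only grouping a strict majority-rule summary would retain.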
References
Felsenstein J. (1989). PHYLIP: Phylogeny inference package (version 3.2). Cladistics 5: 164-166.
Higgins DG and Sharp PM. (1988). CLUSTAL: A package for performing multiple sequence alignment on a microcomputer. Gene 73: 237-244.
Higgins DG, Thompson JD, and Gibson TJ. (1996). Using CLUSTAL for multiple sequence alignments. Methods Enzymol. 266: 383-402.
Mount DW. (2001). Bioinformatics: Sequence and genome analysis. Cold Spring Harbor Laboratory Press, 564 pp.
Saitou N and Nei M. (1987). The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4: 406-425.
Created on 22 April 2004, by A. Clement