Difference between revisions of "Gene Annotation Template"

From GcatWiki

Revision as of 20:53, 8 September 2008

Gene Annotation Log - Template

Basic Information:

DNA Coordinates:

DNA Sequence (FASTA format):

Protein Sequence (FASTA format):

Isoelectric Point:
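
The two FASTA fields above expect a header line starting with ">" followed by one or more sequence lines. As a minimal sketch of how such text can be read back in, assuming plain Python with no external libraries (the function name parse_fasta and the example record are hypothetical, not part of the template):

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines are joined without line breaks.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical example record, for illustration only.
example = """>hypothetical_gene_1 example record
ATGGCTAAA
TTTGGGTAA
"""
records = parse_fasta(example)
```

Each tuple holds the header text (without the ">") and the sequence joined into a single string, which is the form most of the tools listed below accept.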



Similarity Data (Sequence-Based):

BLAST Data:
- Gene Product Name:
- Top hit – organism:
- Length, Score, E-value, Identity, Positives and Gaps
NCBI Statistics
- Alignment of Top Hit and Query Sequence
Alignment Scoring
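
The Length, Identity, and Gaps values in the BLAST fields above can be checked by hand against the alignment itself. The following is a rough sketch, assuming the two aligned sequences are equal-length strings with "-" marking gaps (the function name alignment_stats and the example sequences are hypothetical). Positives are not computed here because they depend on the substitution matrix used for the search.

```python
def alignment_stats(query, subject):
    """Count identities and gaps in a pairwise alignment.

    Both strings must be the same length, with '-' used for gap positions.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    length = len(query)
    if length == 0:
        raise ValueError("alignment is empty")
    # An identity is a column where both residues match and neither is a gap.
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    # A gap column has '-' in either sequence.
    gaps = sum(1 for q, s in zip(query, subject) if q == "-" or s == "-")
    return {
        "length": length,
        "identities": identities,
        "gaps": gaps,
        "identity_pct": 100.0 * identities / length,
        "gap_pct": 100.0 * gaps / length,
    }


# Hypothetical aligned pair, for illustration only.
stats = alignment_stats("MKT-AYIAK", "MKTLAY-AR")
```

The percentages are taken over the alignment length, which matches how BLAST writes these values (for example, identities as matches over alignment length).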

CDD: Conserved Domains Database
- Significant COG Hits:
Definition of COG
- Names of COGs:
- Score:
- E-value:
CDD website

PDB: Protein Data Bank
This database provides information about protein structures and can also run a BLAST alignment against the sequences of those structures.
- Significant Structure Hits:
o Length
o Score
o E-value
o Identities
o Positives
o Gaps
o Alignment
PDB website

T-Coffee:
- Multi-Sequence Alignment
T-Coffee website
This is a useful tool, but it is confusing to use.



Cellular Localization Data:

TMHMM:
http://www.cbs.dtu.dk/services/TMHMM-2.0/
This tool predicts the number of transmembrane helices in a protein.
- Number of Predicted TMHs
- Transmembrane Topology graph and comment

SignalP:
http://www.cbs.dtu.dk/services/SignalP/
This tool predicts whether a protein carries a signal peptide and, if so, where the peptide is cleaved.
- Signal Peptide Probability
- Signal Peptide Graph

PSORT:
http://psort.ims.u-tokyo.ac.jp/form.html
This tool predicts the subcellular localization of a protein.
- Cytoplasmic Score:
- Cytoplasmic Membrane Score:
- Periplasmic Score:
- Outer Membrane Score:
- Extracellular Score:
- Final Prediction for Protein Location (of the above listed):

Phobius:
http://phobius.sbc.su.se/
This tool predicts the locations of transmembrane helices and the intervening loop regions.
Note: If the report labels the protein as cytoplasmic or non-cytoplasmic, this only means that no transmembrane helices are predicted. It should not be used as a predictor of location.

- Enter Graph:

Final Hypothesis: Where do you expect to find this protein?



Alternative Open Reading Frames:

Proposed DNA Coordinates:

Reasoning:



Structure-Based Evidence of Function:

Pfam-A:
- Significant Matches:
- Pfam Name:
- Pairwise Alignment:
- HMM logo:
- Key Functional Residues:

PDB:
- Significant Structure Hits:
o Length
o Score
o E-value
o Identities
o Positives
o Gaps
- Alignment:



Pathways:

KEGG:
This website has two tools:

- KEGG Pathway is a collection of pathway maps that represent molecular interaction and reaction networks for:

     1. Metabolism
     2. Genetic Information Processing
     3. Environmental Information Processing
     4. Cellular Processes
     5. Human Diseases

- KEGG Module is a collection of pathway modules, molecular complexes, and other functional units

EcoCyc:
This is a bioinformatics database that describes the genome and the biochemical machinery of E. coli K-12 MG1655. It serves as a reference against which we can compare our findings.

E.C. Number:



Duplication and Degradation:

Paralog:
- Length
- Score
- E-value
- Identity
- Positives
- Gaps

Alignment of Top Hit and Query Sequence:



Evidence of Horizontal Gene Transfer:

Phylogenetic Tree Diagram:

Gene Context:
- Ortholog Neighborhood Region of Organism:
- Examples of Similarities or Differences:
- Comment:

Chromosome Viewer GC Heat Map:
- Characteristic GC% of genome:
- Average GC% of gene:
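
The average GC% of the gene can also be computed directly from the DNA sequence recorded under Basic Information, which is a useful cross-check against the heat map. A minimal sketch, assuming the sequence is a plain string of bases (the function name gc_percent is hypothetical, and ignoring ambiguous bases such as N is a choice made here, not something the template specifies):

```python
def gc_percent(seq):
    """Return the percentage of G and C among the A/C/G/T bases in seq."""
    # Ambiguous symbols such as N are left out of both counts.
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G, or T bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


# Two of the four bases are G or C.
example_gc = gc_percent("ATGC")
```

A gene whose GC% differs markedly from the characteristic GC% of the genome is one of the signals that the heat map is meant to highlight.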



RNA (Rfam):

RNA Family:
Bit Score:

Alignment: