Difference between revisions of "Gene Annotation Template"

From GcatWiki

Revision as of 04:23, 9 September 2008

Gene Annotation Log - Template

Basic Information:

DNA Coordinates:

DNA Sequence (FASTA format):

Protein Sequence (FASTA format):

Isoelectric Point:
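
For the two FASTA fields above, a minimal sketch of how a raw sequence could be turned into a FASTA record before pasting it into the log. The function name `to_fasta`, the header text, and the 60-character line width are assumptions made for illustration; they are not prescribed by the template.

```python
def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Return a FASTA record: a '>' header line, then the sequence wrapped at `width`."""
    # Strip any whitespace or line breaks and normalize to upper case.
    seq = "".join(sequence.split()).upper()
    # Break the sequence into fixed-width lines.
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([">" + header] + lines)


# Hypothetical header and sequence, used only to show the layout.
record = to_fasta("example_gene hypothetical protein", "atggctaaaggtgaactgtaa", width=10)
print(record)
```

The same function works for DNA and protein sequences, since it only wraps text and does not interpret the letters.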



Similarity Data (Sequence-Based):

BLAST Data:
- Gene Product Name:
Note: A protein BLAST is preferable to a nucleotide BLAST.
- Top hit – organism:
- Length, Score, E-value, Identity, Positives and Gaps
NCBI Statistics
- Alignment of Top Hit and Query Sequence
Alignment Scoring
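
To double-check the Length, Identity, and Gaps values copied from a BLAST report, the counts can be recomputed from the two aligned strings. This is a rough sketch, assuming the alignment is given as two equal-length strings with `-` marking gaps; the name `alignment_stats` and the example sequences are hypothetical. Positives are not computed here because they depend on the substitution matrix used by the search.

```python
def alignment_stats(query: str, subject: str) -> dict:
    """Count identities and gap columns in a pairwise alignment."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    length = len(query)
    # A column is an identity when both residues match and neither is a gap.
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    # A column counts as a gap when either sequence has '-'.
    gaps = sum(1 for q, s in zip(query, subject) if q == "-" or s == "-")
    percent = round(100.0 * identities / length, 1) if length else 0.0
    return {
        "length": length,
        "identities": identities,
        "gaps": gaps,
        "percent_identity": percent,
    }


# Hypothetical aligned sequences for illustration only.
stats = alignment_stats("MKT-AYIAKQR", "MKTLAYLAK-R")
print(stats)
```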

CDD: Conserved Domains Database
You must enter a protein sequence to get a result.
- Significant COG Hits:
Definition of COG
- Names of COGs:
- Score:
- E-value:
CDD website

PDB: Protein Data Bank
- Significant Structure Hits:
This database provides information about protein structures and also performs a BLAST alignment.
You must enter the protein sequence to get a result.
o Length
o Score
o E-value
o Identities
o Positives
o Gaps
o Alignment
PDB website

T-Coffee:
- Multi-Sequence Alignment
T-Coffee website
This tool is useful, but its interface can be confusing.



Cellular Localization Data:

TMHMM:
http://www.cbs.dtu.dk/services/TMHMM-2.0/
This tool predicts the number of transmembrane helices in a protein.
- Number of Predicted TMHs
- Transmembrane Topology graph and comment

SignalP:
http://www.cbs.dtu.dk/services/SignalP/
This tool predicts whether or not a protein contains a signal peptide.
- Signal Peptide Probability
- Signal Peptide Graph

PSORT:
http://psort.ims.u-tokyo.ac.jp/form.html
This tool predicts protein localization sites.
- Cytoplasmic Score:
- Cytoplasmic Membrane Score:
- Periplasmic Score:
- Outer Membrane Score:
- Extracellular Score:
- Final Prediction for Protein Location (of the above listed):
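
One simple way to fill in the final prediction is to take the location with the highest score. The sketch below assumes this reading; the function name `final_location`, the optional `cutoff` parameter, and the example scores are all hypothetical, and the tool's own report may apply its own decision rule, which should take precedence.

```python
from typing import Dict, Optional


def final_location(scores: Dict[str, float], cutoff: Optional[float] = None) -> str:
    """Return the highest-scoring location, or 'Unknown' if it falls below `cutoff`."""
    best = max(scores, key=scores.get)
    # The cutoff is an optional, user-chosen threshold (an assumption of this sketch).
    if cutoff is not None and scores[best] < cutoff:
        return "Unknown"
    return best


# Made-up scores, used only to show the expected input shape.
example_scores = {
    "Cytoplasmic": 0.2,
    "Cytoplasmic Membrane": 9.5,
    "Periplasmic": 0.1,
    "Outer Membrane": 0.1,
    "Extracellular": 0.1,
}
print(final_location(example_scores))
```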

Phobius:
http://phobius.sbc.su.se/
This tool predicts the locations of transmembrane helices and the intervening loop regions.
Note: If the report states that the protein is cytoplasmic or non-cytoplasmic, this only means that no transmembrane helices are predicted. It should not be used as a predictor of location.

- Enter Graph:

Final Hypothesis: Where do you expect to find this protein?



Alternative Open Reading Frames:

Proposed DNA Coordinates:

Reasoning:
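
When looking for candidate coordinates, a quick scan for open reading frames can help narrow the options. The following is a simplified sketch, not the method used by any annotation pipeline: it scans only the forward strand, treats ATG, GTG, and TTG as possible starts (an assumption based on common bacterial start codons), and reports the first start before each stop. The name `find_orfs` and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed start codons for this sketch


def find_orfs(dna: str, min_codons: int = 1):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    Coordinates are 1-based and inclusive; `end` includes the stop codon.
    `min_codons` counts codons before the stop codon.
    """
    seq = "".join(dna.split()).upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence one codon at a time in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start + 1, i + 3, frame + 1))
                start = None
    return orfs


# Hypothetical sequence: ATG AAA TAG sits in the third forward frame.
print(find_orfs("CCATGAAATAGGG"))
```

Scanning the reverse strand would require running the same function on the reverse complement and converting the coordinates back.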



Structure-Based Evidence of Function:

Pfam-A:
- Significant Matches:
- Pfam Name:
- Pairwise Alignment:
- HMM logo:
- Key Functional Residues:

PDB:
- Significant Structure Hits:
o Length
o Score
o E-value
o Identities
o Positives
o Gaps
- Alignment:



Pathways:

KEGG:
This website has two tools:

- KEGG Pathway is a collection of pathway maps that represent molecular interaction and reaction networks for:

      - Metabolism
      - Genetic Information Processing
      - Environmental Information Processing
      - Cellular Processes
      - Human Diseases

- KEGG Module is a collection of pathway modules, molecular complexes, and other functional units.

EcoCyc:
This is a bioinformatics database that describes the genome and the biochemical machinery of E. coli K-12 MG1655. It can serve as a reference against which we compare our findings.

E.C. Number:



Duplication and Degradation:


Duplication:
Paralogs are homologous genes within a single species that arose by gene duplication. Through analysis of paralogs, we can determine which genes may have duplicates in our genome.

You can search for paralogs of an individual gene:
Scroll to the bottom of the Gene Detail page.
Under "Homolog Display", you will find a "Homolog selection" drop-down box.
Select "Paralogs / Orthologs."

JGI requests certain information about the top paralog hit:
- Gene Object ID
- Length (bp)
- Score
- E-value
- Identity
- Positives
- Gaps
- Alignment of Top Hit and Query Sequence:
***Alignment Instructions***

Other possible information:
- Number of paralogs above a certain Bit Score.
- How could we measure Degradation?
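
Counting paralogs above a chosen bit score is straightforward once the hits are in a list. This sketch assumes each hit is a `(gene_id, bit_score)` pair typed in by hand from the homolog table; the function name, the gene IDs, and the threshold of 50 are placeholders, not values recommended by JGI.

```python
from typing import List, Tuple


def count_hits_at_or_above(hits: List[Tuple[str, float]], min_bit_score: float) -> int:
    """Count hits whose bit score is at or above the chosen threshold."""
    return sum(1 for _gene_id, bit_score in hits if bit_score >= min_bit_score)


# Made-up paralog hits, used only to show the expected input shape.
example_hits = [("gene_A", 250.0), ("gene_B", 88.5), ("gene_C", 41.2)]
print(count_hits_at_or_above(example_hits, 50.0))
```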



Evidence of Horizontal Gene Transfer:

Phylogenetic Tree Diagram:

Gene Context:
- Ortholog Neighborhood Region of Organism:
- Examples of Similarities or Differences:
- Comment:

Chromosome Viewer GC Heat Map:
- Characteristic GC% of genome:
- Average GC% of gene:
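
The gene's GC% can also be computed directly from its DNA sequence and compared with the genome's characteristic value. A minimal sketch, assuming the sequence is plain text and that ambiguous bases such as N should be left out of the calculation; `gc_percent` is a hypothetical helper name.

```python
def gc_percent(sequence: str) -> float:
    """Return the percentage of G and C among the unambiguous bases (A, C, G, T)."""
    seq = "".join(sequence.split()).upper()
    # Ignore ambiguous symbols such as N so they do not dilute the percentage.
    counted = [base for base in seq if base in "ACGT"]
    if not counted:
        return 0.0
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Hypothetical six-base example: four of the six bases are G or C.
print(round(gc_percent("ATGCGC"), 1))  # 66.7
```

A gene whose GC% differs markedly from the genome's characteristic value may merit a closer look as a horizontal transfer candidate, though the difference alone is not conclusive.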



RNA (Rfam):

RNA Family:
Bit Score:

Alignment: