Halorhabdus utahensis Genome

From GcatWiki
Revision as of 03:18, 23 September 2008 by Pebakke (talk | contribs)

This page will be used by Davidson College students in the Genomics Laboratory course.

RNA Genes

tRNA Genes Check List
rRNA operon
2 misc. RNA genes (short summary list)
References
Gene Annotation Template
General Questions
Page for Annotated Genes

Other Resources

Consensus Shine Dalgarno Excel File for H. utahensis
Tutorials for annotating genomes

  1. Will DeLoache - BioPerl Installation
  2. Max Win - Introduction to Perl for non-programmers (with step-by-step explanations, simple exercises and solutions)
  3. Pallavi - Conserved Domains Database (CDD)
  4. Mary - Protein Data Bank
  5. Laura Voss - Pfam Database

This is a list of glossary words (A - Z):

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

A

Arabidopsis thaliana - the scientific name for the thale cress plant; it was the first plant to have its genome sequenced, and is a model organism for understanding plant biology and genetics (Wikipedia.org, Jay)

B

BAC - bacterial artificial chromosome, a DNA construct used for transforming or cloning segments of DNA and often used in sequencing the genomes of organisms (Wikipedia.org, Jay)

bioinformatics - the multi-disciplinary approach of using biology, computer science and mathematics to solve or better understand biological problems [1] (Matt)

BLAST - (Basic Local Alignment Search Tool) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. [2] (Mary)

BioPerl - a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications such as accessing sequence data from local and remote databases, converting between database formats, manipulating individual sequences, searching for similar sequences, searching for genes and other structures on genomic DNA, and developing machine-readable sequence annotations. [3] (Wikipedia, Max Win)

C

carbon fixation - using carbon dioxide to create organic materials [4] (Samantha)

CDD (Conserved Domains Database)- a database used to identify the conserved domains present in a protein query sequence [5] (Mary)

chaperonin - a protein complex that assists some newly formed polypeptide chains by folding them into their final, functional, three-dimensional form [6] (Matt)

chemotaxis - the process by which cells move toward or away from high concentrations of certain chemicals; it occurs in both unicellular and multicellular organisms. Unicellular organisms use it to avoid toxins or find food, and multicellular organisms use it in tasks such as reproduction [7] (Nick)

chemotaxonomy - the attempt to classify and identify organisms according to demonstrable differences and similarities in their biochemical compositions [8] (Mary)

COG (Clusters of Orthologous Groups) - each COG corresponds to a highly conserved domain and generally consists of either individual orthologous proteins or orthologous groups of paralogs (COG Pallavi)

concatemer - long continuous DNA molecule that contains the same DNA sequence repeated in series [9](Samantha)

contigs (contiguous DNA) - overlapping DNA segments that together form a longer, gapless segment of DNA. (Discovery Genomics, Proteomics and Bioinformatics [10], Max Win)

coverage - refers to the number of times, on average, any piece of DNA in a sequenced genome has been individually sequenced (Lecture, Jay)
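
As a rough illustration of the definition above, average coverage is often estimated as the total number of sequenced bases divided by the length of the genome. The sketch below assumes that all reads have the same length; the function name and the numbers are hypothetical and are not taken from the H. utahensis project.

```python
def average_coverage(num_reads, read_length, genome_length):
    """Estimate average coverage as total sequenced bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return num_reads * read_length / genome_length


# Hypothetical example: 60,000 reads of 500 bases each over a 3,000,000-base genome.
example_coverage = average_coverage(60_000, 500, 3_000_000)
print(f"Average coverage: {example_coverage:.1f}x")
```

This is only an average: individual regions of a genome can be sequenced many more or many fewer times than this figure suggests.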

D

de novo synthesis - the synthesis of complex molecules from simple molecules (e.g. sugars and nucleotides), rather than from recycled molecules; from the Latin for "anew" (literally "from the new") [11] (Matt)

diatom - a major group of eukaryotic algae, and one of the most common types of phytoplankton. A characteristic feature of diatom cells is that they are encased within a unique cell wall made of silica called a frustule. These frustules show a wide diversity in form, but usually consist of two asymmetrical sides with a split between them. [12] (Mary)

dot plot - a graphical display comparing sequence conservation between two genomes, with dots indicating strings of identical bases. (Discovery Genomics, Proteomics and Bioinformatics[13], Max Win)
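
A minimal sketch of how a dot plot can be computed, assuming a simple exact-match rule: a dot is placed wherever a short word from one sequence is identical to a word in the other. The function names, the default word size of 3, and the two example sequences are hypothetical choices for illustration.

```python
def dot_plot(seq_a, seq_b, word_size=3):
    """Return (i, j) positions where words of length word_size are identical."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    # Index every word in seq_b by its start position.
    positions = {}
    for j in range(len(seq_b) - word_size + 1):
        positions.setdefault(seq_b[j:j + word_size], []).append(j)
    dots = []
    for i in range(len(seq_a) - word_size + 1):
        for j in positions.get(seq_a[i:i + word_size], []):
            dots.append((i, j))
    return dots


def render(seq_a, seq_b, word_size=3):
    """Draw the plot as text: rows are seq_a positions, columns are seq_b positions."""
    dots = set(dot_plot(seq_a, seq_b, word_size))
    width = len(seq_b) - word_size + 1
    rows = []
    for i in range(len(seq_a) - word_size + 1):
        rows.append("".join("*" if (i, j) in dots else "." for j in range(width)))
    return "\n".join(rows)


print(render("ACGTACGT", "ACGTTCGT"))
```

Runs of dots along a diagonal indicate stretches that are conserved between the two sequences; a larger word size reduces the number of chance matches.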

E

EC number (Enzyme Commission Number)- a numerical classification scheme for enzymes, based on the chemical reactions they catalyze [14] (Mary)

E-value (Expect value) - When performing a BLAST search, you will obtain an E-value for each sequence that is retrieved. An E-value is the number of hits with a score at least this good that you would expect to find by chance in a database of that size; the closer it is to zero, the less likely it is that the similarity between the two sequences is due to chance. (Discovery Genomics, Proteomics and Bioinformatics[15], Max Win)
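
Strictly speaking, an E-value is an expected number of chance hits rather than a probability. Under the Poisson model used in BLAST statistics, the probability of finding at least one chance hit with a score this good is P = 1 - e^(-E). The short sketch below shows the conversion; the function name is hypothetical.

```python
import math


def p_value_from_e_value(e_value):
    """Probability of at least one chance hit, assuming a Poisson model: 1 - exp(-E)."""
    if e_value < 0:
        raise ValueError("an E-value cannot be negative")
    # -expm1(-E) equals 1 - exp(-E) and stays accurate for very small E.
    return -math.expm1(-e_value)


for e in (1e-10, 0.01, 1.0, 10.0):
    print(f"E = {e:g}  ->  P = {p_value_from_e_value(e):.3g}")
```

For small values (below about 0.01) the E-value and the probability are nearly identical, which is why the two are often used interchangeably.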

F

FASTA format - a text format used to convey either nucleic acid sequences or peptide sequences, in which nucleotides or amino acids are represented by single-letter codes. Each sequence is preceded by a header line that begins with ">" and gives the sequence name and other descriptors. [16] (Nick)
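
A minimal sketch of reading FASTA-formatted text, assuming the usual convention that each record starts with a header line beginning with ">" and that the sequence may be wrapped over several lines. The function name and the two example records are hypothetical; for real work, an established library such as BioPerl or Biopython is a better choice.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up records for illustration only.
example = """>gene_1 hypothetical protein
ATGGCT
AAGTAA
>gene_2
ATGTGA
"""

for name, sequence in parse_fasta(example):
    print(name, len(sequence), sequence)
```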

finished genome - a genome that has been sequenced at least partly by hand, resulting in at least 99.99% sequence accuracy (Lecture, Jay)

G

GC Content - the percentage of bases within a certain sequence of DNA (e.g. a gene or a genome) that are either guanine or cytosine; a higher GC content is characteristic of a coding region of a gene; differences in GC content between a gene and a genome can be used as evidence for horizontal gene transfer [17] (Matt)
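
The calculation described above can be sketched as follows. This example assumes that ambiguous characters such as N should be left out of the total; the function name and the test sequences are hypothetical.

```python
def gc_content(sequence):
    """Return the percentage of unambiguous bases (A, C, G, T) that are G or C."""
    sequence = sequence.upper()
    counted = sum(sequence.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sequence.count("G") + sequence.count("C")
    return 100.0 * gc / counted


print(gc_content("ATGC"))        # half of the bases are G or C
print(gc_content("ggccatnn"))    # lower case and N are tolerated
```

Computing this value for a single gene and for the whole genome makes it possible to compare the two, as suggested in the definition.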

GC-skew – uneven distribution of guanine and cytosine bases between the two strands of DNA where GC base pairs occur. (Discovery Genomics, Proteomics and Bioinformatics[18], Max Win)
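
GC skew is commonly quantified as (G - C) / (G + C), computed over successive windows along one strand. The sketch below assumes non-overlapping windows; the function name, the window size, and the short example sequence are hypothetical.

```python
def gc_skew(sequence, window=1000):
    """Return (G - C) / (G + C) for each non-overlapping window of the sequence."""
    sequence = sequence.upper()
    skews = []
    for start in range(0, len(sequence) - window + 1, window):
        chunk = sequence[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # A window with no G or C has no defined skew; report 0.0 by convention here.
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


# Tiny made-up sequence with a window of 4 bases.
print(gc_skew("GGGCAAAACCCG", window=4))
```

Positive values mean that G outnumbers C on the strand being read, and negative values mean the opposite.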

gene amplification - production of multiple copies of a gene in order to amplify the amount of protein that the gene encodes for [19] [20] (Matt)

gene knockout - a process in which a gene is deactivated within a test organism in order to better understand the function of the gene in that organism [21] (Matt)

gene ontology - a collaborative effort of investigators to unify and standardize terms associated with the role a gene or protein plays in an organism. (Discovery Genomics, Proteomics and Bioinformatics[22], Max Win)

glaucophyte - freshwater algae that have not been studied well [23](Samantha)

H

haemolysin or hemolysin - a chemical produced by bacteria that causes lysis of red blood cells [24] (Nick)

halophile - an organism, most often of the Archaea domain, that lives in environments containing high concentrations of salt [25] (Matt)

haplotype - a collection of alleles that are inherited together (Lecture, Pallavi)

haptophyte - phylum of algae [26](Samantha)

heterokont - major line of eukaryotes consisting of about 10,500 known species, most of which are algae [27](Samantha)

homeobox - a DNA sequence within transcription factor genes; the transcription factors it encodes allow the cell to respond to patterns of development by switching on gene cascades [28](Samantha)

homodimer - a protein made of paired identical polypeptides (Answers.com, Jay)

horizontal gene transfer - DNA transmission between species and incorporation of the DNA into the recipient's genome (horizontal gene transfer Pallavi)

hydrolase - an enzyme that catalyzes hydrolysis, the cleavage of a chemical bond by the addition of water [29] (Nick)

I

ideogram - in genomics, usually describes a stylized representation of a chromosome with banding patterns (Campbell-Heyer Genomics textbook, Jay)

identities - in a BLAST output, the number and fraction of total residues which are identical in a given alignment [www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)
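
A minimal sketch of how the identities figure can be computed from a pairwise alignment, assuming the two aligned strings have equal length, use "-" for gaps, and that the fraction is taken over the full alignment length. The function name and the aligned peptides are hypothetical.

```python
def identities(aligned_query, aligned_subject):
    """Return (identical positions, alignment length, fraction identical)."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, s in zip(aligned_query.upper(), aligned_subject.upper())
        if q == s and q != "-"  # a gap never counts as an identity
    )
    length = len(aligned_query)
    return matches, length, matches / length


count, length, fraction = identities("MKT-AY", "MKSLAY")
print(f"Identities = {count}/{length} ({round(100 * fraction)}%)")
```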

indole - a chemical compound that is produced from the breakdown of tryptophan (indole Pallavi)

inclusion body - a collection of stainable substances, usually proteins, found in either the nucleus or the cytoplasm. Inclusion bodies are often thought to be the result of misfolded viral proteins [30] (Nick)

intron - a region of DNA in a gene that is not part of the final coding sequence for the protein. [31] (Peter)

isoelectric point - the pH at which a molecule carries no net electrical charge [32] (Nick)

isozymes - members of a gene family with very similar cellular roles (Campbell-Heyer Genomics textbook, Jay)

J

K

L

M

motif - a sequence of amino acids or nucleotides that performs a particular role and is often conserved in other species or molecules. (Discovery Genomics, Proteomics and Bioinformatics[33], Max Win)

mycoplasma - genus of bacteria that lack a cell wall [34] (Nick)

N

NORFs (nonannotated open reading frames) - open reading frames that were not considered to be real genes when the genome was annotated. (Discovery Genomics, Proteomics and Bioinformatics[35], Max Win)

nucleomorph - reduced eukaryotic nuclei found in plastids [36](Samantha)

O

open reading frame (ORF) - a segment of DNA that can potentially encode a protein; it begins with a start codon (usually ATG) and continues in the same reading frame to a stop codon ORF (Pallavi)
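
A minimal sketch of an ORF search, under several simplifying assumptions: only the forward strand is scanned, only ATG is accepted as a start codon, and the standard stop codons TAA, TAG and TGA are used. The function name, the minimum length, and the example sequence are hypothetical; real gene finders also check the reverse strand and alternative start codons.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=2):
    """Return (start, end, frame) for forward-strand ORFs.

    start is the 0-based index of the start codon, end is the index just past
    the stop codon, and min_codons counts codons before the stop codon.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(sequence) - 2, 3):
            codon = sequence[pos:pos + 3]
            if start is None and codon == START:
                start = pos  # first start codon opens the ORF
            elif start is not None and codon in STOPS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


dna = "CCATGAAATTTTAGGG"
for start, end, frame in find_orfs(dna):
    print(frame, start, end, dna[start:end])
```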

operon - a segment of DNA involving an operator, promoter, and one or more genes that operate as a single unit during transcription [37] (Nick)

optical mapping - a technique in which the DNA sequences of the organism in question are compared against an ordered map of the restriction sites found within its DNA, in order to place the sequences in the correct order on a chromosome. This methodology gives very detailed haplotype information and allows for the detection of sequence variations across an entire genome optical mapping (Pallavi)

ortholog - one of two or more homologous genes in different species that descend from a single gene in their last common ancestor and diverged through speciation (Lecture, Pallavi)

oxidoreductase - an enzyme that catalyzes redox reactions by transferring electrons from one molecule (the reductant) to another (the oxidant) [38] (Nick)

P

paralog - one of two or more homologous genes within the same genome that arose by gene duplication (Lecture, Pallavi)

p-arm - the shorter arm of a chromosome's two arms separated by the centromere (compare to q-arm, the longer arm) (MedTerms Dictionary, Jay)

plastid - major organelles in plants or algae [39](Samantha)

pleomorphism - the occurrence of two or more structural forms during a life cycle [40] (Mary)

phylogenetic tree - a diagram showing the evolutionary relationships between biological species that are thought to share a common ancestor [41] (Nick)

phylotypes – a term intended to resolve the challenge of “species” when classifying prokaryotes using DNA sequence comparisons. (Discovery Genomics, Proteomics and Bioinformatics[42], Max Win)


positives - in a BLAST output, the number and fraction of residues for which the alignment scores have positive rather than negative values [43] (Mary)

proteome - entire set of proteins expressed by a genome, cell, tissue, or organism. It may refer to expressed proteins under certain conditions [44](Samantha)

pseudogenes - sequences of DNA that look like genes but are most likely not functional, often because they contain many stop codons. A pseudogene may have evolved away from a real gene, or a paralog might have taken its place (Lecture, Pallavi)

Q

q-arm - the longer arm of a chromosome's two arms separated by the centromere (compare to p-arm, the shorter arm) (MedTerms Dictionary, Jay)

R

RAST (Rapid Annotation using Subsystem Technology) - a fully automated service for annotating bacterial and archaeal genomes. It provides high-quality annotations for these genomes across the whole phylogenetic tree. ([45], Max Win)

rDNA - DNA sequences that encode ribosomal RNA. Note that rDNA can also stand for recombinant DNA. (rDNA Pallavi)

retrotransposons - transposable elements that are transcribed into RNA, then reverse-transcribed back into DNA and inserted into the genome at a new position [46](Samantha)

ribonuclease - a nuclease that catalyzes the degradation of RNA into smaller components [47] (Mary)

S

serovar - a subdivision of a species based on the characteristics of its cell surface antigens (serovar Pallavi)

scaffold - a section of a sequenced genome composed of contigs that are in the right order but not necessarily connected (MedTerms Dictionary, Jay)

Shine-Dalgarno sequence - a ribosomal binding site on an mRNA, usually a sequence of about six bases located six or seven bases upstream of the start codon. An anti-Shine-Dalgarno sequence exists on the rRNA in the small subunit of the ribosome; when the two sequences pair, the mRNA is lined up and prepared for translation. (Lecture and Wikipedia article, Laura)
Note: The Shine-Dalgarno consensus sequence for our genome is TAGGAGG.
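
A minimal sketch of how the consensus above might be used when annotating a gene: scan the bases immediately upstream of a candidate start codon for a site that resembles TAGGAGG. The function name, the allowance of one mismatch, and the example upstream sequences are hypothetical; only the consensus itself comes from the note above.

```python
CONSENSUS = "TAGGAGG"  # consensus stated in the note above


def find_sd_sites(upstream, consensus=CONSENSUS, max_mismatches=1):
    """Return (distance, site, mismatches) for windows that resemble the consensus.

    upstream is the DNA immediately 5' of the start codon (start codon excluded);
    distance is the number of bases between the end of the site and the start codon.
    """
    upstream = upstream.upper()
    width = len(consensus)
    hits = []
    for i in range(len(upstream) - width + 1):
        window = upstream[i:i + width]
        mismatches = sum(1 for a, b in zip(window, consensus) if a != b)
        if mismatches <= max_mismatches:
            distance = len(upstream) - (i + width)
            hits.append((distance, window, mismatches))
    return hits


# Made-up upstream region; the start codon would follow directly after it.
print(find_sd_sites("ACGTTAGGAGGTCAACC"))
```

A hit at a distance of roughly six or seven bases supports the chosen start codon; a missing site does not rule the gene out, since not every gene has a recognizable Shine-Dalgarno sequence.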

signal peptide - a short peptide chain that directs the post-translational transport of a protein [48] (Matt)

synteny - a neologism from the Greek for "on the same ribbon". Genes that are syntenic in one species are on the same chromosome; genes that are syntenic across species retain the same order on respective chromosomes as a result of descent from a common ancestor (Answers.com, Jay)

T

transferase - an enzyme that catalyzes the transfer of a functional group from one molecule (the donor) to another (the acceptor) [49] (Matt)

transmembrane helix - a single transmembrane alpha helix of a transmembrane protein, usually about twenty amino acids in length. They are usually predicted by hydrophobicity. [50](Mary)

transposons / transposable elements - DNA sequences that can move around to different positions in a single cell's genome. Transposons can cause mutations and change the length of the genome. [51](Samantha)

transposon mutagenesis - a procedure in which a transposon is inserted into a gene, which inactivates the gene and can lead to the discovery of the phenotype associated with this gene (transposon mutagenesis Pallavi)

tRNA splicing endonuclease - an enzyme that cleaves intervening sequences of precursor tRNA. [52] (Peter)

U

V

W

whole genome shotgun sequencing - a method of sequencing in which DNA is cut into small pieces and cloned into vectors; about 500 bp is then sequenced from both ends of every insert to form mate pairs. Mate pairs rarely overlap, but software uses them to reassemble the sequence. [53](Samantha)

X

xenolog - homologs that are created by horizontal gene transfer between two different species [54] (Matt)

Y

Z




This is a list of the student-created tutorials: