Difference between revisions of "Blueberry Genome Project for Bio343"

From GcatWiki

Revision as of 03:52, 7 February 2011

This page will be used by Davidson College students in the Genomics Laboratory course.

Personal Lab Notebooks

Laura

Lexi

Dylan

Puneet

Leland

Jared

Lauren

William

Team Lab Notebooks

Leland & Will

Dylan & Jared

Lauren & Puneet

Lexi & Laura

Priority List of Topics


Links to Multiple Databases


Papers of Interest


Submitted Course Assignments


Glossary words (A - Z):

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

5' Cap - a methylated guanine nucleotide that is added to the 5' end of a mRNA molecule in eukaryotes. It is added by a 5' to 5' triphosphate linkage, and it gives the mRNA resistance to 5' exonucleases. [1] (Laura M.)

16S rRNA - ribosomal RNA found in the small subunit of prokaryotic ribosomes. rRNA functions in decoding mRNA and interacting with tRNAs in translation. The 16S rRNA gene in particular is well conserved across prokaryotes (a related gene is found in eukaryotic mitochondria) and is often used in comparative genomics when studying phylogeny (Lecture, Olivia)

454 Sequencing - 454 instruments are pyrosequencers that carry out many reactions at a time (parallel sequencing) in wells of a PicoTiter Plate. Beads coated with thousands of homogeneous DNA fragments are added to individual wells on the plate. The DNA fragments are amplified in an oil emulsion mixture with DNA polymerase and primers. dNTPs are sequentially added to the wells one at a time and washed. The process of continuous washing and the sequential addition of dNTPs, DNA polymerase, luciferase, and ATP-sulfurylase explains the high reagent costs of sequencing. ATP-sulfurylase converts the PPi released from each dNTP addition to the complementary strand of the original ssDNA to ATP. ATP fuels luciferase in each well. The light produced is detected with a fluorescence microscope. The current (2009) 454 FLX system has the ability to sequence 100 Mb DNA in 8 hours with an average read of 250 bp and raw accuracy of 99.5%. [2] [3] (Jared)

A

accession number - a unique identifier given to DNA and protein sequences to allow for tracking of sequence information within a single database [4] (Will).

acid invertase- (Lauren)

adsorption - the accumulation of molecules on the surface of a material. This can be part of a lab procedure to purify and isolate a specific portion of a cell or a protein (Wikipedia, Olivia)

alien genes - genes found in a genome that appear to have been inserted into an organism's genome from another species, more than likely through horizontal gene transfer ([1] Campbell, Claudia)

anthocyanins - a member of the flavonoid family that changes color with pH, giving various fruits their coloration. The health benefits of anthocyanin are potentially great, with laboratory results suggesting positive effects against cancer, aging and neurological diseases, inflammation, diabetes, and bacterial infections. It is, however, poorly conserved during digestion and would have to be modified somehow for medicinal use. [5] [6] (Dylan)

antisense (RNA or DNA)-a piece of DNA or RNA that binds to a complementary sequence of DNA or RNA. These segments of genetic material can be used to identify the existence of a disease gene and they can also be used to bind to specific DNA or mRNA sequences to inhibit their function (5 Pallavi).

Apollo - Gene annotation software that allows you to visualize genes you have identified and where they lie within a genome (Lexi).

Arabidopsis thaliana - the scientific name for the thale cress plant; it was the first plant to have its genome sequenced, and is a model organism for understanding plant biology and genetics (Wikipedia.org, Jay)

Archaea - one of the three evolutionary domains. A group of unicellular prokaryotes that were previously grouped with Bacteria, but have some genes and metabolic pathways more similar to eukaryotes, such as those involved in transcription and translation. Many Archaea are extremophiles, such as Halobacteria that thrive in high-salt environments (Lecture, Olivia)

Archaeal rhodopsins - Archaeal rhodopsins are light-sensitive and light-activated transmembrane proteins only found in archaeal plasma membranes. Bacteriorhodopsin (BR) and Halorhodopsin (HR) are both archaeal rhodopsins that are light-driven proton and chloride pumps, respectively, indicating that the functionality of archaeal rhodopsins is diverse [7] (Katie)

B

BAC - bacterial artificial chromosome, a DNA construct used for transforming or cloning segments of DNA and often used to sequence the genetic code of organisms (Wikipedia.org, Jay)

Bacteriorhodopsin- A transmembrane archaeal rhodopsin protein that uses light energy to move protons across membranes, creating an electrochemical gradient that is converted into chemical energy [8] (Katie).

Bacterioruberin - Bacterioruberin is a “carotenoid pigment” found in some halophiles giving them a red color and providing assumed protection from strong sunlight [9]. The structure also plays a stabilizing role in the archaeal rhodopsin proteins [10] (Katie).

bioinformatics - the multi-disciplinary approach of using biology, computer science and mathematics to solve or better understand biological problems [11] (Matt)

BLAST - (Basic Local Alignment Search Tool) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. [12] (Mary)

Bligh-Dyer method- A lipid extraction method that uses chloroform-methanol as a solvent but also includes a re-extraction of the sample, just with chloroform, before evaporation of the solvent to capture more non-polar lipids. [13] The lipid membrane of archaea is extremely unique not only in composition (see Isoprenoid lipids) but also in the archaeal rhodopsins that are scattered among the plasma membrane [14]. In order to study the uniqueness of archaeal membranes one needs to observe the lipids outside of the membrane, which the Bligh-Dyer method accomplishes (Katie)

bioperl- a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications such as accessing sequence data from local and remote databases, transforming database formats, manipulating individual sequences, searching for similar sequences, searching for genes and other structures on genomic DNA, or developing machine-readable sequence annotations. [15] (Wikipedia, Max Win)

bootstrap value - common reliability test of a phylogenetic tree, calculated as a percentage. In generating a phylogenetic tree, the sequences will be resampled, or rerun, multiple times. If a pair of sequences are consistently grouped together for 100 out of 100 resamplings, then the certainty that those sequences are correctly grouped would be very high, and the bootstrap value would be 100. If a pair of samples were grouped together only 50 out of 100 resamplings, the certainty that those sequences are correctly grouped would be lower; the bootstrap value would be 50. On phylogenetic trees, these values may be placed adjacent to the group to which they refer. (Lecture, Olivia)
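
The resampling idea in the bootstrap entry can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how a phylogenetics package computes support: instead of rebuilding a full tree for each replicate, it resamples alignment columns with replacement and asks whether a given pair of sequences is the single closest pair by mismatch count. The function name, the toy alignment, and the "closest pair" criterion are all assumptions made for the example.

```python
import random
from itertools import combinations

def bootstrap_support(alignment, pair, replicates=100, seed=0):
    """Percentage of replicates in which `pair` is the unique closest pair.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    pair: two sequence names whose grouping is being tested.
    """
    rng = random.Random(seed)            # fixed seed keeps the sketch reproducible
    names = list(alignment)
    length = len(alignment[names[0]])
    target = frozenset(pair)
    hits = 0
    for _ in range(replicates):
        # Resample alignment columns with replacement (one bootstrap replicate).
        cols = [rng.randrange(length) for _ in range(length)]
        dist = {}
        for x, y in combinations(names, 2):
            mismatches = sum(alignment[x][c] != alignment[y][c] for c in cols)
            dist[frozenset((x, y))] = mismatches
        best = min(dist.values())
        closest = [p for p, d in dist.items() if d == best]
        if closest == [target]:          # grouped together, with no ties
            hits += 1
    return 100.0 * hits / replicates

# Hypothetical toy alignment: "a" and "b" are identical, "c" and "d" are distant.
toy = {"a": "AAAAAAAA", "b": "AAAAAAAA", "c": "CCCCCCCC", "d": "GGGGGGGG"}
print(bootstrap_support(toy, ("a", "b")))
```

With real data the value usually falls somewhere between 0 and 100, and a tree-building step would replace the closest-pair shortcut used here.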

C

carbon fixation - using carbon dioxide to create organic materials [16] (Samantha)

CCCP - carbonyl cyanide m-chlorophenyl hydrazone; a nitrile ionophore that inhibits oxidative phosphorylation and photophosphorylation. Ionophores are lipid-soluble molecules allowing them to transfer across membranes, creating pores that disrupt transmembrane ion gradients. (Sugiyama 1994 article, Olivia)

cell division control (Cdc) protein - for example, Cdc6 found in Halorhabdus utahensis; protein responsible for activating and maintaining mechanisms of cell division. Cell division control proteins are important in annotation because the presence of a Cdc gene is a good indicator for finding the origin of replication in a circular chromosome. (Bakke et al 2009, Olivia)

CDD (Conserved Domains Database)- a database used to identify the conserved domains present in a protein query sequence [17] (Mary)

cDNA - DNA that is reverse-transcribed from mature mRNA. A cDNA library provides templates for genes that are expressed within an organism. [18]. (Pyfrom)

chaperonin - a protein complex that assists some newly formed polypeptide chains by folding them into their final, functional, three-dimensional form [19] (Matt)

chemoorganotrophic - refers to organisms that obtain energy from oxidation/reduction reactions using organic electron donors (Link, Earthlife Claudia)

chemotaxis - the process in which cells will seek out or flee from a high concentration of certain chemicals and is found in both uni- and multicellular organisms. This process is used to avoid toxins or find food in unicellular organisms or for tasks such as reproduction in multicellular organisms [20] (Nick)

chemotaxonomy - the attempt to classify and identify organisms according to demonstrable differences and similarities in their biochemical compositions [21] (Mary)

chimeric genome - A genome that consists of a mixture of genes from distinct species Baliga et al., 2004 (Karen)

ClustalW - A web-based or command line tool that performs multiple sequence alignments to determine evolutionary relationships between three or more sequences [22] (Will).

COG (Cluster of Orthologous Groups)- corresponds to a highly conserved domain and generally consists of either individual proteins or groups of paralogs (COG Pallavi)

fold coverage - c = (L*N)/G, where L = average read length, N = number of reads, and G = genome size. A higher fold coverage allows for higher final accuracy statistically due to a larger sample size in calculating the mode nucleotide across point polymorphic sites (between reads), e.g. 12X coverage means 12X redundancy of bases, higher base accuracy and higher accuracy of assembly [23] (Jared)
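
As a quick worked example of the coverage formula, the short Python sketch below plugs numbers into c = (L*N)/G. The function name and the sample run (400,000 reads of 250 bp against a 10 Mb genome) are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def fold_coverage(avg_read_length, num_reads, genome_size):
    """Return fold coverage c = (L * N) / G."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return (avg_read_length * num_reads) / genome_size

# Hypothetical run: 400,000 reads averaging 250 bp from a 10 Mb genome.
print(fold_coverage(250, 400_000, 10_000_000))  # 10.0, i.e. 10X coverage
```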

comparative genomics - the study of relationships between genomes of different strains and species. Comparative genomics aims to define similarities and differences in structure and/or function of different proteins, RNAs and regulation between organisms (Wikipedia and Lecture, Olivia)

concatemer - long continuous DNA molecule that contains the same DNA sequence repeated in series [24](Samantha)

congenic - two strains of an organism that are nearly identical, varying only at a single locus (also called coisogenic) [25] (Megan)

consensus sequence - a nucleotide sequence that is common, though not necessarily identical, in different genes and in genes from different organisms that are associated with a particular function. [26] (Megan)

contigs (contiguous DNA)- overlapping DNA segments that as a collection form a longer and gapless segment of DNA. (Discovery Genomics, Proteomics and Bioinformatics [27], Max Win)

controlled vocabulary - a set of terms used to standardize the description of characteristics in organisms' genomes, as designated by the Gene Ontology (GO) project ([1] Campbell, Claudia)

coverage - refers to the number of times, on average, any piece of DNA in a sequenced genome has been individually sequenced (Lecture, Jay)

CPAN (Comprehensive Perl Archive Network) - an archive of over 12,200 modules of software written in Perl, as well as documentation for it. It contains a module called CPAN (or CPAN.pm) which is used as an installer for Perl modules such as BioPerl [28](Will).

Cytogenetics-the study of normal and abnormal chromosomes. This involves studying the causes of chromosomal abnormalities and looking at the structure of chromosomes (7 Pallavi).

D

DCCD - dicyclohexylcarbodiimide; compound that acts as a proton ATPase inhibitor (Sugiyama 1994 article, Olivia)

de novo synthesis - the synthesis of complex molecules from simple molecules (e.g. sugars and nucleotides), rather than from recycled molecules; from the Latin "of the new" [29] (Matt)

dehydrogenase - a type of enzyme that oxidizes a substrate by transferring one or more protons and a pair of electrons to an acceptor. [30] (Peter)

dendrogram - a tree diagram used to illustrate the arrangement of the clusters produced by hierarchical clustering based on the degree of similarity of characteristics. Dendrograms are often used in computational biology to illustrate the grouping of genes or samples. [31](William G.)

deoxyribodipyrimidine photolyase - enzyme which breaks the errant covalent bonds that form pyrimidine dimers. UV light is a common cause of this particular anomaly and causes covalent bonds to form between adjacent pyrimidines. Many archaea and bacteria use deoxyribodipyrimidine photolyases in order to break these bonds and avoid errors during replication or transcription [32]. (Pyfrom)

diatom - a major group of eukaryotic algae, and one of the most common types of phytoplankton. A characteristic feature of diatom cells is that they are encased within a unique cell wall made of silica called a frustule. These frustules show a wide diversity in form, but usually consist of two asymmetrical sides with a split between them. [33] (Mary)

dicotyledon - a group of flowering plants that has two leaves in the embryo of the seed. Most have net-veined leaves, and the vessels in the stem are arranged in a circle near the stem surface. [34] Blueberries are dicotyledons. [35] (Laura M.)

domain (protein) - the structural and functional groups of a protein, which can exist independently of the protein itself. Domains typically perform a specific function, such as binding to promoters or substrates, and many proteins can have one or several domains in common. Evolutionarily-linked proteins are more likely to have domains in common. Domains are used to organize proteins into families. (Wikipedia article, Laura)

dirigent proteins - proteins that control the stereochemistry of a compound synthesized by other enzymes. Ex: In lignin formation, dirigent proteins are suggested to "direct the coupling of two monolignol radicals, producing a dimer with a single regio- and stereo- configuration." [36] (William G.)

dot plot-graphical display comparing sequence conservation between two genomes with dots indicating strings of identical bases. (Discovery Genomics, Proteomics and Bioinformatics[37], Max Win)
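
A dot plot can be computed with a simple word-matching loop. The Python sketch below is a minimal illustration, assuming exact matches of a fixed word length; the function name, the word length of 3, and the toy sequences are hypothetical, and real tools add filtering and handle whole genomes far more efficiently.

```python
def dot_plot(seq_a, seq_b, word=3):
    """Return (i, j) positions where seq_a and seq_b share an identical word.

    Each returned pair is one "dot": the word starting at position i of seq_a
    is identical to the word starting at position j of seq_b.
    """
    dots = []
    for i in range(len(seq_a) - word + 1):
        for j in range(len(seq_b) - word + 1):
            if seq_a[i:i + word] == seq_b[j:j + word]:
                dots.append((i, j))
    return dots

# Hypothetical toy sequences.
print(dot_plot("ACGTAC", "TACG", word=3))  # [(0, 1), (3, 0)]
```

Runs of dots along a diagonal indicate a conserved stretch shared by the two sequences.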

draft genome- a genome that has been sequenced by computers and programs but has not yet been reviewed by humans in order to create a finished genome. Draft genomes usually contain gaps or mistakes due to the limited capacity of the programs used for sequencing (Lecture, Pyfrom).

E

EC number (Enzyme Commission Number)- a numerical classification scheme for enzymes, based on the chemical reactions they catalyze [38] (Mary)

Edman degradation-A method for sequencing amino acids in a peptide chain. It allows the ordered protein sequence to be determined by proceeding from the N-terminus of the chain and piecing together fragmented sequenced chains of a protein [39] (Katie).

Eurosid clade -Lauren

E-value (Expect value)- When performing a BLAST search, you will obtain an E-value for each sequence that is retrieved. An E-value can be thought of as the number of matches with an equal or better score that would be expected by chance; the lower the E-value, the less likely it is that the two sequences are similar by chance. (Discovery Genomics, Proteomics and Bioinformatics[40], Max Win)

epistasis - the interaction between two or more genes to control a single phenotype. Epistasis is not the same as dominance; dominance involves the interaction of two alleles for the same gene, whereas epistasis is the interaction of different genes. [41] (Megan)

Ericaceae - The family of plants that blueberry belongs to. This family includes herbs, subshrubs, shrubs and trees, and grows best in acidic soils Flora of North America (Lexi).

expressed sequence tag (EST) – a short piece (200-500bp) of transcribed cDNA that can be used to determine the position of an expressed gene within the genome [42]. (Pyfrom)

extremophile - an organism that thrives in and may even require physically or geochemically extreme conditions that are detrimental to the majority of life on Earth [43] (Will).

F

FASTA format - a format used to convey either nucleic acid sequences or peptide sequences, in which nucleotides or amino acids are represented by single-letter codes. The sequence name and other descriptors precede the sequence on a header line. [44] (Nick)
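
To make the layout concrete, here is a minimal Python sketch of a FASTA reader. It assumes the usual convention that each record starts with a header line beginning with ">" followed by one or more sequence lines; the function name and the two example records are hypothetical.

```python
def parse_fasta(lines):
    """Return a dict mapping each header (without '>') to its full sequence."""
    records = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            name = line[1:]               # header: sequence name and descriptors
            records[name] = ""
        elif name is not None:
            records[name] += line         # sequence may span several lines
    return records

# Hypothetical example: one nucleotide record and one peptide record.
example = """>seq1 hypothetical blueberry fragment
ATGGCC
TTAA
>seq2
MKV
"""
print(parse_fasta(example.splitlines()))
```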

family (protein) - a group of evolutionarily-related proteins, often with one or several domains in common. Families are organized by domain overlap, structural/functional similarity, and sequence similarity. (Wikipedia article and lecture, Laura)

finished genome - a genome that has been sequenced at least partly by hand, resulting in at least 99.99% sequence accuracy (Lecture, Jay)

Fragaria vesca - Strawberry, a fruit related to blueberry that had its genome sequenced in 2010. Strawberry has a relatively small genome (240 Mb), compared to the 487 Mb genome of the grape, demonstrating that there is great variability in the genomic structure of related species Strawberry Genome Paper Grape Genome Paper (Lexi).

frustule - a hard, porous cell wall made up of silica that makes up the outermost layer of diatoms. These structures have complex and elaborate designs (Wikipedia Claudia)

fusion mRNA-mRNA that results from the transcription of a gene after a chromosomal translocation event. This results in an mRNA sequence that comes from two different genes (Rowley and Blumenthal 2008 Science Pallavi)


Flavonoids - polyphenolic biochemical compounds that have been shown to have antioxidant effects. They are known to be found in fruits, vegetables, olive oil, cocoa and beverages such as tea and red wine. The most common flavonoids include anthocyanins, flavonols, flavones, flavanones, flavan-3-ols, and isoflavones. [45] (Lauren)

G

GAF Domain - A GAF domain is a small-molecule binding unit present in all domains of life. It is a light-responsive domain found in plant and cyanobacterial phytochromes (a pigment photoreceptor used to detect light). This domain plays an important role in an organism's ability to respond to its environment. (Baliga et. al., Molecular Interventions, Ecomii Claudia)

gap - a region of the genome for which no sequence is currently available. Two types of gaps exist: heterochromatic gaps consist largely of highly repetitive sequence (whose exact, non-overlapping sequence is therefore difficult to determine), and euchromatic gaps are more likely to contain genes. [46] (Megan)

GC Content - the percentage of bases within a certain sequence of DNA (e.g. a gene or a genome) that are either guanine or cytosine; a higher GC content is characteristic of a coding region of a gene; differences in GC content between a gene and a genome can be used as evidence for horizontal gene transfer [47] (Matt)
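
GC content is straightforward to compute. The Python sketch below is a minimal illustration that counts only unambiguous A, C, G and T bases; the function name and the choice to ignore other characters (such as N) are assumptions of the example.

```python
def gc_content(seq):
    """Return the percentage of A/C/G/T bases in seq that are G or C."""
    bases = [b for b in seq.upper() if b in "ACGT"]   # ignore N and other symbols
    if not bases:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)

print(gc_content("GGCC"))  # 100.0
print(gc_content("AATT"))  # 0.0
```

Comparing the value for a single gene against the value for the whole genome is one way to flag candidates for horizontal gene transfer.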

GC-skew – uneven distribution of guanine and cytosine bases between the two strands of DNA where GC base pairs occur. (Discovery Genomics, Proteomics and Bioinformatics[48], Max Win)
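
GC skew is commonly quantified as (G - C)/(G + C) along one strand, calculated in consecutive windows. That formula is not given in the entry above, so treat the Python sketch below as one conventional way to compute it; the function name, window size, and toy sequence are hypothetical.

```python
def gc_skew(seq, window=100):
    """Return (G - C) / (G + C) for each consecutive, non-overlapping window."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # A window with no G or C has no defined skew; report 0.0 for it.
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews

# Toy sequence: first window is all G, second window is all C.
print(gc_skew("GGGGCCCC", window=4))  # [1.0, -1.0]
```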

gene amplification - production of multiple copies of a gene in order to amplify the amount of protein that the gene encodes for [49] [50] (Matt)

gene calling - Determining which parts of a sequenced genome represent genes. This process could also be called gene finding. The process is generally fully automated. Magnaporthe grisea Automated Gene Calling(Karen)

gene fusion-occurs when DNA segments of two different genes come together. Can result in hybrid proteins (9 Pallavi)

gene knockout - a process in which a gene is deactivated within a test organism in order to better understand the function of the gene in that organism [51] (Matt)

gene ontology- a collaborative effort of investigators to unify and standardize terms associated with the role a gene or protein plays in an organism. (Discovery Genomics, Proteomics and Bioinformatics[52], Max Win)

gene transfer - the incorporation of a DNA segment into an organism's cells, or DNA. This usually occurs through a vector such as a virus. This method is used in gene therapy. (Genomics.energy.gov Claudia)

genome annotation - the process of attaching biological meaning to sequence data. In other words, genome annotation involves determining where genes are located in a genome and discovering functions of these genes. Genome annotation: from sequence to biology (Karen)

glaucophyte - freshwater algae that have not been studied well [53](Samantha)

H

haemolysin or hemolysin - a chemical produced by bacteria that causes lysis of red blood cells [54] (Nick)

halophile - an organism, most often of the Archaea domain, that lives in environments containing high concentrations of salt [55] (Matt)

haplotype-collection of alleles that travel together (Lecture, Pallavi)

haptophyte - phylum of algae [56](Samantha)

heterokont - major line of eukaryotes consisting of about 10,500 known species, most of which are algae [57](Samantha)

Heterologous -literally meaning, “derived from a different organism,” heterologous refers to the fact that the gene/protein of interest was taken from a different cell type or species than the gene/protein recipient [58]. (Katie)

Hidden Markov Model - a statistical model used in protein recognition databases such as Pfam. A Hidden Markov Model keeps track of several variables and possible variations thereof, such as the possible amino acid sequences that make up a protein domain (since there can be some variance in an amino acid sequence) or the variations in the component sounds that make up a word, and uses those points to match a given sequence to the word, domain, or other complex sequence it most closely matches. An HMM in speech recognition software, for example, can identify that a certain set of sounds make up a certain word, even with the variations in pronunciation and accent that different people will give those sounds. (Wikipedia and lecture, Laura)

hierarchical genome shotgun sequencing - a method for sequencing genomic DNA. Genomic DNA is cut into pieces of about 150 kb and inserted into BAC vectors, which are transformed into E. coli, where they are replicated and stored. The BAC inserts are isolated and mapped to determine the order of each cloned 150 kb fragment. This is referred to as the Golden Tiling Path. Each BAC fragment in the Golden Path is fragmented randomly into smaller pieces and each piece is cloned into a plasmid and sequenced on both strands. These sequences are aligned so that identical sequences are overlapping. These contiguous pieces are then assembled into finished sequence once each strand has been sequenced about 4 times to produce 8X coverage of high quality data [59]. (Pyfrom)

HMM Logo - a graphical representation of an HMM, detailing the possible amino acid sequences, the relative frequencies and probabilities of each amino acid in the sequence, the relative contribution each amino acid has to the overall protein family, and the charge or nature of the amino acids themselves. (How to read HMM Logos, on Pfam, Laura)

homeobox - DNA sequence within transcription factor genes that allows the cell to respond to patterns of development by having the transcription factors switch on gene cascades [60](Samantha)

homodimer - a protein made of paired identical polypeptides (Answers.com, Jay)

horizontal gene transfer-DNA transmission between species and incorporation of the DNA into the recipient's genome (horizontal gene transfer Pallavi)

Hox gene-a gene that contains a homeobox region that is involved in morphogenesis along the cranio-caudal body axis (4 Pallavi)

hydrolase - an enzyme that catalyzes hydrolysis, the cleavage of a chemical bond by the addition of water [61] (Nick)

Hydropathy analysis - This method determines the hydrophobic nature of an amino acid sequence. It uses a window moving through the sequence, summing the Gibbs free energy values for each amino acid and running these values through programs to determine hydrophobic segments. [62] With respect to halophiles, there is evidence to suggest that protein stability, in some cases, may be dependent upon high salt concentrations, and since the hydrophobic nature of proteins increases stability, it is important to be able to measure stability in terms of hydropathy [63] (Katie)
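
The moving-window idea can be sketched in Python. Note the substitution: rather than the Gibbs free energy values mentioned above, this example averages the widely used Kyte-Doolittle hydropathy scale over each window, which is an assumption made for illustration. The function name, window size, and test peptide are likewise hypothetical.

```python
# Kyte-Doolittle hydropathy scale (more positive = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(protein, window=9):
    """Return the mean hydropathy score for every window along the protein."""
    protein = protein.upper()
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the protein length")
    scores = [KYTE_DOOLITTLE[aa] for aa in protein]
    # Slide the window one residue at a time and average the scores inside it.
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

# Hypothetical peptide: a hydrophobic stretch followed by charged residues.
profile = hydropathy_profile("IIIIIKKKKK", window=5)
print(profile[0], profile[-1])
```

Windows with high average scores are candidates for hydrophobic segments such as membrane-spanning regions.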

hypothetical protein - A hypothetical protein is a protein predicted to be encoded by a gene in a genome; it may have a predicted function, but this function has not been experimentally tested or proved. The predicted function is determined by the protein's structural similarities to proteins of known function as well as the protein's sequence makeup. It has no analogs in the protein database. (Web Definitions Claudia)

I

ideogram - in genomics, usually describes a stylized representation of a chromosome with banding patterns (Campbell-Heyer Genomics textbook, Jay)

identities - in a BLAST output, the number and fraction of total residues which are identical in a given alignment [www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)
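
The identities figure can be reproduced by hand from a pairwise alignment. The Python sketch below assumes two already-aligned strings of equal length with "-" marking gaps, and counts against the full alignment length; the function name and toy alignment are hypothetical.

```python
def identities(aligned_a, aligned_b, gap="-"):
    """Return (matches, alignment_length, fraction) for two aligned strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column counts as an identity only if both residues match and neither is a gap.
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    return matches, len(aligned_a), matches / len(aligned_a)

matches, length, fraction = identities("ACGT-A", "ACCTGA")
print(f"Identities = {matches}/{length} ({fraction:.0%})")  # Identities = 4/6 (67%)
```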

Illumina sequencing - Illumina instruments amplify DNA fragments in situ on a flow cell. Fragment colonies are dispersed on the flow cell at a low concentration at first, allowing for non-overlapping fragment colonies. Clusters are promoted by isothermal bridging amplification. The amplification increases the density of these colonies. Fluorescently labeled nucleotides are cyclically washed over the flow cell. These nucleotides are conjugated with reversible terminators so that the four nucleotide bases can be simultaneously incorporated base by base across the flow cell. Laser-induced excitation of the cell allows imaging of the excited fluorophores. The use of a flow cell and reversible terminator allows the Illumina Genome Analyzer to produce 600 Mb of DNA per day with only 36 bp reads. The tradeoff between pyrosequencing methods and the flow cell method is increased throughput for shorter reads. The raw accuracy of the Illumina genome analyzer is over 98.5%. Increased coverage is necessary when using sequencers with high raw error rates. [64] [65] (Jared)

immunoprecipitation - the technique of precipitating a protein out of solution using an antibody that specifically binds to that particular protein. This process can be used to isolate and concentrate a particular protein from a sample containing many thousands of different proteins [66]. (Pyfrom)

indel - term used to describe insertions or deletions within a genome. Since an insertion in one genome is a deletion in another, "indel" is a catch-all term coined to remove the relative subjectivity of determining a mutation as being either an insertion or deletion (Lecture, Pyfrom).

indole-a chemical compound that is produced from the breakdown of tryptophan (indole Pallavi)

inclusion body - Inclusion bodies are collections of stainable substances, usually proteins, that are found either in the nucleus or the cytoplasm. It is thought that these bodies are often the result of viral proteins that misfolded [67] (Nick)

intergenic distance - The distance (in base pairs) between genes wikipedia (Karen)

intron - a region of DNA in a gene that is not part of the final coding sequence for the protein. [68] (Peter)

IS elements - (insertion sequence element) sequences of DNA that can transpose to new positions in the genome. This can cause disruptions in other gene coding regions and major reorganizations of the genome Baliga et al., 2004 (Karen)

isoelectric point - the pH at which a molecule carries no net electrical charge [69] (Nick)

Isoprenoid lipids -lipids made from five carbon isoprene units, also known as isoterpene units which is the organic compound CH2=C(CH3)CH=CH2. [70]. The side chains in phospholipids are built from isoprene instead of fatty acids in archaea, making them isoprenoid lipids [71]. (Katie)

isozymes - members of a gene family with very similar cellular roles (Campbell-Heyer Genomics textbook, Jay)

J

Junk DNA - sections of DNA that do not code for genes, or a label for stretches of DNA for which no function has been identified. Non-coding DNA is often referred to as "junk DNA." [72] (Megan)

K

KEGG (Kyoto Encyclopedia of Genes and Genomes) - a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals. The Pathway database records networks of molecular interactions in the cells, and variants of them specific to particular organisms [73](Will).

kinase - a type of enzyme that transfers a phosphate group from a high-energy donor molecule to a target molecule in a process called phosphorylation. [74] (Peter)

Kozak consensus sequence - a sequence present in eukaryotic mRNA and that is upstream of the start codon, and plays a major role in the initial binding of mRNA to ribosomes that facilitate translation. ([75], Lauren)

Kyte Doolittle Hydropathy plot - Lauren

L

lateral gene transfer - see "horizontal gene transfer" (Pallavi)

lignin - a complex phenolic polymer found in the cell wall of plants. It is important in the stiffness and strength of the plant stem. It also makes the cell wall waterproof, allowing transport of water and solutes through the vascular system. [76] (Laura M.)

Liposome - microscopic fluid filled vesicle whose phospholipid walls are identical to that of the cell membrane and are often used as models for artificial cell membranes, which is useful in studying the uniqueness of archaeal membranes outside of the archaea organism, and drug delivery [1] (Katie).

M

Manatee - a web-based gene evaluation and genome annotation tool that can view, modify, and store annotation for prokaryotic and eukaryotic genomes. This on-going, open source initiative was developed with two missions: first, to give biologists the ability to functionally annotate their genomes using a powerful, stand-alone web application with a robustly designed relational annotation database; and second, to give outside developers the opportunity to contribute their own ideas and requirements to enhance Manatee's ability to accomplish biological goals [77](Will).

metabolism - chemical reactions organisms utilize in order to maintain life. Metabolism can be constructive such as anabolism in which energy is used to create cell components like protein, or it can be destructive such as catabolism where a substance such as sugar is systematically broken down in order to harvest energy for the organism. Wikipedia (Karen)

microsatellites-stretches of repetitive, short DNA segments that can be used to track the inheritance of certain traits within families (3 Pallavi)

minisatellites-segments of DNA that can be used for individual identification (ex. DNA fingerprinting) or in determining relationships between people (ex. paternity cases) (2 Pallavi).

monocotyledon - a group of flowering plants that has one seed-leaf (cotyledon). In most, the leaf veins are parallel, and the vessels in the stem are scattered. [78] (Laura M.)

motif - a sequence of amino acids or nucleotides that performs a particular role and is often conserved in other species or molecules. (Discovery Genomics, Proteomics and Bioinformatics[79], Max Win)

mycoplasma - genus of bacteria that lack a cell wall [80] (Nick)

Myb transcription factors - a family of proteins that regulate gene expression within the cell by binding directly to DNA. Absence of Myb factors has been shown to cause various types of cancer by inhibiting cell division. Myb proteins are identified by a number of imperfect tandem repeats known as the "Myb domain" which serve to identify where the protein binds to the DNA. Myb factors have been linked to various flavonoid pathways within plants. [81] (Dylan)

N

NCBI - (The National Center for Biotechnology Information) is a division of the National Library of Medicine (NLM) in the National Institutes of Health (NIH). This organization seeks to develop and make available information technologies for use in discovering and deciphering the fundamental molecular and genetic processes affecting health and disease. (NCBI Claudia)

Nhx - a family of antiporter proteins in plants responsible for regulating intracellular pH. One member of the family, Nhx1, is a Na+/H+ antiporter. 1 (Lexi)

NORFs (nonannotated open reading frames) - open reading frames that were considered not to be real genes when the genome was annotated. (Discovery Genomics, Proteomics and Bioinformatics[82], Max Win)

nucleolar organizer - the region of a chromosome around which the nucleolus forms after cell division. It contains tandem repeats of rRNA genes, which are transcribed, processed and formed into ribosomes (with the addition of ribosomal proteins) in the nucleolus. [83] [84] (Laura M.)

nucleomorph - reduced eukaryotic nuclei found in plastids [85](Samantha)

O

object-oriented programming - a programming paradigm in which collections of data, associated with operations on that data, are modularly defined and then built upon (CSC 121 Lecture, Will).
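
As a minimal sketch of this idea in Python, the hypothetical classes below (the names Sequence and DNASequence are illustrative, not from the lecture) bundle data with the operations on it, and then build a more specific class on top of a general one:

```python
class Sequence:
    """A named biological sequence: the data plus basic operations on it."""

    def __init__(self, name, residues):
        self.name = name
        self.residues = residues.upper()

    def __len__(self):
        return len(self.residues)


class DNASequence(Sequence):
    """Builds on Sequence by adding DNA-specific operations."""

    _PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

    def gc_content(self):
        # Fraction of bases that are G or C (0.0 for an empty sequence).
        if not self.residues:
            return 0.0
        gc = sum(self.residues.count(base) for base in "GC")
        return gc / len(self.residues)

    def reverse_complement(self):
        # Complement each base, reading the strand in the opposite direction.
        return "".join(self._PAIRS[base] for base in reversed(self.residues))
```

DNASequence inherits the constructor and length behavior from Sequence, so only the DNA-specific methods need to be written.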

open reading frame (ORF) - a segment of DNA that can potentially encode a protein; it begins with a start codon (usually ATG) and ends at an in-frame stop codon. ORF (Pallavi)
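
A minimal Python sketch of ORF finding, assuming an ORF runs from an ATG to the first in-frame stop codon (TAA, TAG, or TGA) and scanning only the forward strand. The function name and the min_codons parameter are illustrative choices, not part of any annotation tool mentioned on this page:

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) for each forward-strand ORF.

    start is inclusive, end is exclusive and includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the sequence one codon at a time in this frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs
```

A real gene finder would also scan the reverse complement and allow alternative start codons; this sketch leaves both out for brevity.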

operon - a segment of DNA involving an operator, promoter, and one or more genes that operate as a single unit during transcription [86] (Nick)

opsin - In eukarya, this is a group of light-sensitive G protein-coupled receptors often found in the retina. In prokaryotes, opsins are used to fix carbon by harvesting energy from light. Additionally, these receptors are independent of any chlorophyll pathway. Wikipedia (Karen)

optical mapping - DNA sequences of the organism in question are compared against a karyotype that specifically looks at restriction sites found within the DNA to correctly order the DNA sequences on a chromosome. This methodology gives very detailed haplotype information and allows for the detection of sequence variations across an entire genome. optical mapping (Pallavi)

origin of replication - the sequence in a genome where DNA replication (in eukaryotes and prokaryotes) or RNA replication (in RNA viruses) is initiated. In eukaryotes there are multiple origins of replication, which speed up replication within the cell. ([87], Lauren)

ortholog - one within a group of DNA sequences each found in separate genomes that look very similar. Orthologs may have an evolutionary relationship, but the term itself does not imply the presence or absence of one. (Lecture, Olivia)

oxidoreductase - an enzyme that catalyzes redox reactions by transferring electrons from one molecule (the reductant) to another (the oxidant) [88] (Nick)

P

palaeo-hexaploidy -Lauren

paralog - one of a group of similar DNA sequences found within the same genome (Lecture, Pallavi)

p-arm - the shorter arm of a chromosome's two arms separated by the centromere (compare to q-arm, the longer arm) (MedTerms Dictionary, Jay)

pectin - a polysaccharide found in and between the cell walls of plants, which helps to keep cells rigid by regulating water flow between cells. It functions as a gelling agent in making fruit jellies and jams. [89] (Laura M.)

peptidyl transferase - an enzymatic part of the ribosome that catalyzes the formation of peptide bonds between amino acids during translation. Peptidyl transferase activity is carried out by rRNA in the large subunit (60S in eukaryotes) of the ribosome. [90] [91] (Laura M.)

Perl - Developed by Larry Wall in 1987, Perl is a high-level programming language used frequently by biologists and bioinformaticists [92] (Will).

periplasmic space - the space between the inner cytoplasmic membrane and external outer membrane in bacteria or archaea. [93] (Peter)

Pfam - a database for protein domain families that matches amino acid sequences or nucleotide sequences to the related group of proteins to which they most likely belong. (Pfam Help, Laura)

plasmid - an extra-chromosomal DNA molecule that is capable of replicating independently of the chromosomal DNA. Commonly found in bacteria and archaea. [94](Peter)

plastid - major organelles in plants or algae [95](Samantha)

pleomorphism - the occurrence of two or more structural forms during a life cycle [96] (Mary)

phylogenetic tree - a diagram showing the evolutionary relationships between biological species that are thought to share a common ancestor [97] (Nick)

phylotypes – a term intended to resolve the challenge of “species” when classifying prokaryotes using DNA sequence comparisons. (Discovery Genomics, Proteomics and Bioinformatics[98], Max Win)

phytanyl lipids - Organically, a phytanyl is a branched-chain hydrocarbon containing 20 carbon atoms [99]. Phytanyl lipids are often found in the membrane of archaea and are thought to contribute to increased membrane stability at high salt concentrations [van de Vossenberg et al. Extremophiles (1999) 3:253-257]. (Katie)

phytochrome - a pigment that acts as a photoreceptor that triggers a response or signaling cascade in many plants and bacterial organisms as well as some animals. It is made up of a chromophore, or a compound that absorbs visible light, which is bound to a protein. Phytochrome is one of the most intensely colored pigments found in nature. This intense pigmentation allows the organism to sense even dim light. (Ecomii, Phytochrome Claudia)

positives - in a BLAST output, the number and fraction of residues for which the alignment scores have positive rather than negative values [100] (Mary)

promoter - a region of DNA that facilitates transcription of a gene; promoters are typically located closely upstream of the gene they regulate [101] (Megan)

proteome - entire set of proteins expressed by a genome, cell, tissue, or organism. It may refer to expressed proteins under certain conditions [102](Samantha)

proton pump - an integral membrane protein capable of transporting protons across a membrane. Mitochondria utilize proton pumps in order to create a proton gradient used for producing ATP. Wikipedia (Karen)

PSORT - a prediction server that judges where a mature protein could be in the cell, based on its transmembrane domains, its predicted mature amino acid composition, and its signal sequences. (PSORT, Laura)

pseudogenes - sequences of DNA that look like genes but most likely contain many stop codons. A pseudogene may have evolved away from a real gene, or a paralog might have taken its place (Lecture, Pallavi)

purine - a category of nitrogenous base consisting of a pyrimidine ring fused to an imidazole ring. Notable purine bases are adenine and guanine. [103] (Peter)

p-value - the probability associated with a statistical test of the difference between populations. Populations are considered significantly different if the associated p-value is small (typically 0.05 or smaller). (Discovery Genomics, Proteomics and Bioinformatics[104], Pyfrom)
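
One way to make the idea concrete is an exact permutation test on the difference between two group means. This is a sketch of one common approach, not necessarily the test used in the cited text; the function name is hypothetical, and enumerating every regrouping is only practical for small samples:

```python
from itertools import combinations


def permutation_p_value(x, y):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(x) + list(y)
    n = len(x)
    m = len(pooled) - n
    observed = abs(sum(x) / n - sum(y) / m)
    total = sum(pooled)
    count = 0
    extreme = 0
    # Try every way of relabeling n of the pooled values as "group x".
    for idx in combinations(range(len(pooled)), n):
        sx = sum(pooled[i] for i in idx)
        diff = abs(sx / n - (total - sx) / m)
        count += 1
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            extreme += 1
    return extreme / count
```

The returned value is the fraction of regroupings whose difference in means is at least as large as the one actually observed.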

pyrosequencing - (figure: Pyro.jpg, image from [105]) (Jared)

pyrimidine - a category of nitrogenous base consisting of a heterocyclic aromatic ring containing two nitrogen atoms at positions 1 and 3 of the six-member ring. Notable pyrimidine bases are cytosine, thymine, and uracil. [106] (Peter)

Q

q-arm - the longer arm of a chromosome's two arms separated by the centromere (compare to p-arm, the shorter arm) (MedTerms Dictionary, Jay)

query sequence - the sequence (whether amino acid or nucleotide) entered into a database’s search function and checked against the database entries. (BLAST on Wikipedia, Laura)

R

RAST (Rapid Annotation using Subsystem Technology) - a fully automated service for annotating bacterial and archaeal genomes. It provides high-quality annotations for these genomes across the whole phylogenetic tree. ([107], Max Win)

rDNA - DNA sequences that encode ribosomal RNA. Note that rDNA can also stand for recombinant DNA. (rDNA Pallavi)

replicon - a region of DNA or RNA that replicates from a single origin of replication [108] (Megan)

repressor - a protein that binds to a section of DNA in order to regulate one or more genes by decreasing the rate of transcription [109] (Megan)

residue (protein) - the remaining portion of an amino acid after a water molecule has been removed and it has been incorporated into a protein. Functional residues, referred to in Pfam, are the residues that perform some specific identifiable function or are part of a domain, and can be conserved across evolutionarily-related proteins. (Pfam Help, Laura)

Resveratrol - a polyphenol compound in the stilbene family, found in grapes, blueberries, and other foods, that has been shown to have cancer-preventive antioxidant, antimutagenic, and anti-inflammatory activity. [110] (Lauren)

retinal - vitamin A aldehyde; a chromophore (colour-producing molecule) that is bound to proteins called opsins. For example, Haloarcula and other halophilic archaea have a light-driven proton pump such as bacteriorhodopsin. This pump contains a reddish-purple retinal that absorbs green visible light. (Wikipedia, Olivia)

retropseudogenes - gene copies that have been reverse-transcribed from mRNA, with the resulting DNA sequence incorporated back into the genome. They are non-functional segments of DNA and can be distinguished from other pseudogenes in that they do not have intron sequences. (1 Pallavi)

retrotransposons - RNA transcribed back into DNA and added into the genome [111](Samantha)

ribonuclease - a nuclease that catalyzes the degradation of RNA into smaller components [112] (Mary)

ribosome binding site (RBS) - short purine-rich sequence found directly (4-8 bp) upstream of the start codon of a protein coding sequence to which ribosomes bind to begin translation. The RBS sequence tends to be species-specific, and the consensus sequence acts as a good indicator of the start site of a gene (Bakke et al 2009 and Lecture, Olivia)

ribozyme - an RNA molecule that acts as an enzyme to catalyze a reaction. Some ribozymes can catalyze self-splicing by folding in order to remove introns without the need for a protein. (Lecture, Olivia)

RNAi (RNA interference) - a process by which short pieces of RNA are used to degrade larger pieces of complementary RNA. It is found in most eukaryotes and is being considered as a possible approach for gene therapy where a reduced gene product would alleviate symptoms [113]. (Pyfrom)

RNA polymerase I - an enzyme in eukaryotic organisms that transcribes the 45S pre-rRNA, which is processed to form the 28S, 18S, and 5.8S rRNA molecules. These forms of RNA account for over 50% of the RNA synthesized in a typical cell. [114] [115] (Laura M.)

RNaseP - a ribozyme that cleaves off a precursor section of RNA from a tRNA molecule. Previously, it was thought that this gene was necessary for life and therefore ubiquitous. However, species of archaea have been discovered that have adapted to life without this ribozyme. Wikipedia; Life without RNaseP (Karen)

S

Serovar - a subdivision of a species based on the characteristics of its cell surface antigens (serovar Pallavi)

sequence-tagged site (STS) - a short (200 to 500 base pair) DNA sequence that has a single occurrence in the genome and whose location and base sequence are known [116]. (Pyfrom)

scaffold - a section of a sequenced genome composed of contigs that are in the right order but not necessarily connected (MedTerms Dictionary, Jay)

Shadow enhancers - secondary enhancers that are thought to be important for natural selection to occur in regulatory DNA segments. They evolve much faster than primary enhancers, which suggests that they are under fewer functional constraints (Wray and Babbit 2008 Science Pallavi)

Shine-Dalgarno sequence - a ribosomal binding site on an mRNA, usually a sequence of about six bases located six or seven bases upstream of the start codon. An anti-Shine-Dalgarno sequence exists on the rRNA in the small subunit of the ribosome; when the two sequences align, the mRNA is lined up and prepared for translation. (Lecture and Wikipedia article, Laura)
Note: The Shine-Dalgarno consensus sequence for our genome is ccGGAGGt.
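
A minimal Python sketch of how one might look for this site upstream of a candidate start codon. It assumes the uppercase core GGAGG of the consensus above is the part to match exactly, and it uses a hypothetical max_gap of 10 bases between the core and the start codon; both choices are illustrative rather than taken from the lecture:

```python
def find_sd_upstream(mrna, start_pos, core="GGAGG", max_gap=10):
    """Return the index of the SD core nearest upstream of start_pos, or None.

    start_pos is the 0-based index of the first base of the start codon.
    max_gap is the largest allowed number of bases between the end of the
    core and the start codon.
    """
    seq = mrna.upper().replace("U", "T")
    window_start = max(0, start_pos - max_gap - len(core))
    region = seq[window_start:start_pos]
    # rfind picks the match closest to the start codon.
    hit = region.rfind(core)
    if hit == -1:
        return None
    return window_start + hit
```

For example, in TTCCGGAGGTAACAAATGAAA the start codon begins at index 15 and the core is found at index 4, leaving a six-base spacer between them.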

Section - a taxonomic term analogous to subgenus. Highbush blueberry belongs to the section Cyanococcus of Vaccinium. (Lexi)

SignalP - a prediction server that judges whether or not a query protein contains a signal peptide. SignalP measures each amino acid against the amino acid sequences of probable signal peptide matches and predicts the cleavage site of the signal peptide. (SignalP Output explained, Laura)

signal peptide - a short peptide chain that directs the post-translational transport of a protein [117] (Matt)

small nuclear ribonucleic acid (snRNA) - small RNA molecules found in the nucleus of eukaryotic cells. They combine with specific proteins (called Sm proteins) to form ribonucleoprotein complexes (snRNPs), which function in removal of introns during RNA splicing. [118] (Laura M.)

Smith-Waterman alignment - A well-known algorithm for determining similar regions between two nucleotide or protein sequences. Instead of looking at the total sequence, the Smith-Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure [119](Will).
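
The dynamic-programming idea can be sketched in a few lines of Python. The scoring values below (match +2, mismatch -1, linear gap -2) are arbitrary assumptions for illustration; real tools use substitution matrices and affine gap penalties:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; scores never drop below zero, which makes it local.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))
```

Because every cell is floored at zero, the best-scoring segment can start and end anywhere in either sequence, which is what distinguishes this from a global alignment.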

SNP (Single Nucleotide Polymorphism) - a DNA sequence variation occurring when a single nucleotide in the genome (or other shared sequence) differs between members of a species (or between paired chromosomes in an individual) [120](Will).
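
As a small illustration, single-base differences between two already-aligned sequences of equal length can be listed as follows. This is a sketch with a hypothetical function name; real SNP calling works from many reads and accounts for sequencing error:

```python
def find_snps(ref, alt):
    """Return (position, ref_base, alt_base) for each single-base difference.

    Both sequences must already be aligned and of equal length; alignment
    gaps ("-") are skipped because they are indels, not SNPs.
    """
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for pos, (r, a) in enumerate(zip(ref.upper(), alt.upper())):
        if r != a and r != "-" and a != "-":
            snps.append((pos, r, a))
    return snps
```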

Stilbenes - polyphenolic compounds that have been the focus of clinical research for cancer prevention. [4] One of the most commonly known stilbenes, resveratrol, has been shown to have anticancer properties and the ability to suppress proliferation of cancer cells. [121] (Lauren)


subject sequence - In BLAST, the sequences retrieved from the database, which are compared for similarity to the query sequence, are considered subject sequences. As a general rule, subject sequences should be longer than the query sequence. BLAST searching (Karen)

symporter - an integral membrane protein that is involved in the movement of two or more different molecules or ions in the same direction across a phospholipid membrane. [122] (Peter)

synteny - a neologism from the Greek for "on the same ribbon". Genes that are syntenic in one species are on the same chromosome; genes that are syntenic across species retain the same order on respective chromosomes as a result of descent from a common ancestor (Answers.com, Jay)

synthetase - a type of enzyme that creates a new covalent bond and requires direct input of energy from a high-energy phosphate. [123] (Peter)

T

tandem array - a series of copies of a gene back-to-back on a chromosome. These genes are transcribed at the same time and ensure that many copies of the gene product are made by the cell. Ribosomal RNA genes are often in tandem arrays. [124] (Laura M.)

tannin - (Lauren)

TATA box - (Leland)

toxicogenomics - a subdiscipline of genomics that deals with gene and protein activity in order to determine how organisms respond to toxins in the environment. This has important implications for research concerning the effects of toxins on genetic material, and how that affects the organism in question (MedTerms, WebDefinitions Claudia).

transcriptome - the set of all mRNA molecules transcribed from a genome [125] (Megan)

transferase - an enzyme that catalyzes the transfer of a functional group from one molecule (the donor) to another (the acceptor) [126] (Matt)

transmembrane helix - a single membrane-spanning alpha helix of a transmembrane protein, usually about twenty amino acids in length. Transmembrane helices are usually predicted by hydrophobicity. [127] (Mary)
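
A minimal sketch of hydrophobicity-based prediction, using the Kyte-Doolittle hydropathy scale averaged over a sliding window. The 19-residue window and the 1.6 threshold are the conventional settings for that scale and are assumptions here, not something specified by the cited source; dedicated predictors use more elaborate models:

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(protein, window=19):
    """Return the mean hydropathy of every window along the protein.

    Raises KeyError if the sequence contains a non-standard residue.
    """
    protein = protein.upper()
    scores = []
    for i in range(len(protein) - window + 1):
        segment = protein[i:i + window]
        scores.append(sum(KD[aa] for aa in segment) / window)
    return scores


def candidate_tm_starts(protein, window=19, threshold=1.6):
    """Return start indices of windows hydrophobic enough to span a membrane."""
    scores = hydropathy_windows(protein, window)
    return [i for i, score in enumerate(scores) if score >= threshold]
```

Consecutive start indices above the threshold usually describe one helix, so a fuller version would merge overlapping windows into a single predicted segment.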

transposons / transposable elements - DNA sequences that can move around to different positions in a single cell's genome. Transposons can cause mutations and change the length of the genome. [128](Samantha)

transposon mutagenesis - a procedure in which a transposon is inserted into a gene, which inactivates the gene and can lead to the discovery of the phenotype associated with this gene (transposon mutagenesis Pallavi)

trans-splicing - fragmented exon sequences fuse to form a mature species of mRNA. This process results in fusion mRNA (8 Pallavi).

tRNA splicing endonuclease - an enzyme that cleaves intervening sequences of precursor tRNA. [129] (Peter)

Tribe - (Lexi)

type strain - an isolated sample of an organism that acts as the reference point for defining that species (Lecture, Olivia)

U

V

Vertical gene transfer - the transmission or absorption of genetic material that is associated with sexual reproduction and, thus, acknowledges species-specific boundaries (6 Pallavi)

Vitis vinifera - the common grapevine (grape), a dicotyledonous plant [130]. (Lauren)

Vaccinium corymbosum - (Leland)

Vaccinium macrocarpon - cranberry, a fruit closely related to the blueberry, belonging to the subgenus (or section) Oxycoccus of Vaccinium (Lexi).

W

whole genome duplication (WGD) - (Lauren)

whole genome shotgun sequencing - a method of sequencing in which DNA is cut into small pieces and cloned into vectors, then about 500 bp is sequenced from both ends of every insert to form mate pairs. Mate pairs rarely overlap, but are used to reassemble the sequence using software. [131] (Samantha)

X

xenobiotic - a substance that is found within an organism that is not normally produced or expected to be found within that organism [132] (Megan)

xenolog - a homolog that is created by horizontal gene transfer between two different species [133] (Matt)

Y

Z