Blueberry Genome Project for Bio343

From GcatWiki

This page will be used by Davidson College students in the Genomics Laboratory course.

GenSAS Blueberry Database

Vaccinium Database

Spring 2014

Broccoli Mineral Final Papers with SSR Primers Included - 2014







Broccoli Mineral Final Oral Presentations - 2014

Broccoli Mineral Accumulation Network Reports - 2014

Blueberry SSR Reports - 2014

KEGG Grape Folate Pathway

How to convert an Excel Sheet into a wiki table. Copy and Paste is all you need.

Wiki Glossary

  • W85-20, it is one of the parents of Jeannie's mapping population
  • Vaccinium corymbosum Encyclopedia of Life
  • Highbush Blueberry Management Book
  • Plants in the same family
  • Transcriptome Analysis of Blueberry using 454 EST Sequencing
  • EBI training online
  • UCSC Genome Browser Training
  • iPlant Tools
  • Taxonomy ID: 69266
  • common name: highbush blueberry
  • common name: American blueberry
  • authority: Vaccinium corymbosum L.
  • Eukaryota; Viridiplantae; Streptophyta; Streptophytina; Embryophyta; Tracheophyta; Euphyllophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons; asterids; Ericales; Ericaceae; Vaccinioideae; Vaccinieae; Vaccinium
  • Arabidopsis thaliana = Eukaryota; Viridiplantae; Streptophyta; Streptophytina; Embryophyta; Tracheophyta; Euphyllophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons; rosids; malvids; Brassicales; Brassicaceae; Camelineae; Arabidopsis
  • Vitis vinifera = Eukaryota; Viridiplantae; Streptophyta; Streptophytina; Embryophyta; Tracheophyta; Euphyllophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons; rosids; rosids incertae sedis; Vitales; Vitaceae; Vitis

FINAL PAPERS with SSR Details in Appendices

Spring 2013 Projects

Cold tolerance Cold Acclimation Paper Media:Blueberry_cold_acclimation.pdf

Drought stress

Dormancy induction

Fruit firmness

Fruit size

Time of fruit ripening

Time of bloom

Disease resistance to fungal diseases

Disease resistance to viral diseases

Heat stress - Malcolm - Media:HeatStress_SSR_Primers.docx

From Jeannie: "All the 454 ESTs are there on the Towson website and in the short read archive of GenBank. All the Sanger ESTs are in the EST database of GenBank."

Example Formatting of SSR primer report

SSR Submission Format.png

Spring 2012

Aaron_D will focus on color of blueberries

Mike_N will focus on timing of blooming

Shamita_P will focus on cold tolerance

Malcolm_C will focus on Allan's list and then chilling requirement

All 4 EST PPT presentations

SSR Guidelines: Ideally, I’d like my primer as close to the gene as possible. The further away you get, the more likely you are to have recombination between the marker and the gene of interest. I also tend to prefer di- and trinucleotide repeats of lengths greater than 5, as these tend to be the most polymorphic among different lines. Total fragment length (both primers plus the sequence between them) is ideally above 100 bp and less than 700 bp. Smaller fragments are hard to score accurately, and fragments longer than 700 bp can’t be scored accurately on automated capillary sequencers due to the limits of the PCR reaction and the lane standards in fragment analysis kits.
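The guidelines above can be turned into a quick screening script. The following is only a minimal sketch, not the method used in the course: it assumes that "lengths greater than 5" means at least 6 copies of the repeat unit, and the function names, thresholds, and example sequence are made up for illustration.

```python
import re

# Assumed reading of the guidelines: at least 6 repeat units, and a
# total fragment length strictly between 100 bp and 700 bp.
MIN_REPEATS = 6
MIN_AMPLICON_BP = 100
MAX_AMPLICON_BP = 700


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, copies) for di- and trinucleotide repeats."""
    seq = seq.upper()
    hits = []
    for motif_len in (2, 3):
        # A motif of motif_len bases followed by min_repeats - 1 or more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAAAAAAAA
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


def amplicon_ok(fragment_length_bp):
    """True if the total fragment length falls inside the preferred window."""
    return MIN_AMPLICON_BP < fragment_length_bp < MAX_AMPLICON_BP


# Made-up example sequence containing (AG)8 and (ATG)6.
example = "TTGCA" + "AG" * 8 + "CCTGA" + "ATG" * 6 + "GGC"
for start, motif, copies in find_ssrs(example):
    print(start, motif, copies)
print(amplicon_ok(250), amplicon_ok(90))
```

A script like this only flags candidate repeats; primer design and distance to the gene of interest still need to be checked separately.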

We thought we might focus on the transposable element family called MITEs. Here are four papers to get us started. In the end, we decided this was not the best use of our time.

  1. MITE-Hunter (Han et al, NAR 2010)
  2. Identification of MITE/siRNA function (Kuang et al, 2008)
  3. Active TE in Rice
  4. amplification of MITEs in rice

Therefore, we have decided to focus on the ESTs which we have not explored at all. Each member of the team is using his or her SSR genes to query the EST database. We can also use the SSR primers to find bigger portions of the scaffolds until we get a hit in the ESTs (if possible). Once we have EST hits, we will download the EST sequences and use those to BLAST against the genome assembly scaffolds again to see if we get more scaffold hits.

After that, we may consider self-infertility, but we'll see if we have time.

Finally, we will schedule a trip to DHMRI in Kannapolis to see the sequencers.

Spring 2011 Personal Lab Notebooks









Team Lab Notebooks

Leland & Will

Dylan & Jared

Lauren & Puneet

Lexi & Laura

Team Foci For Projects

Priority List of Topics

Small-scale Projects

Large-scale Projects
Tutorials, Past and Present

Spring 2011

  1. Laura = rRNA gene identification File:RRNAtutorial.docx
  2. Lexi = find gene structure of orthologs File:Genomics Tutorial.docx
  3. Puneet = tRNAs identification File:Finding tRNAs.docx; powerpoint File:Finding tRNA tutorial.pptx
  4. Leland = Parsing Blast Results from Your Favorite Database
  5. Jared = Potential Gene Across-Species Phylogenetic Analysis with MrBayes
  6. Lauren = how to deal with multi-named genes
  7. Dylan = tBLASTn and Protein Sequence Analysis
  8. Will = File:How to Deal With 3 Partial Genome.docx

Fall 2009

  1. Media:Creation of Sequence Logos Using WebLogo.doc (Katie)
  2. Determining whether genes called in JGI and RAST are identical (Karen)
  3. The Ins and Outs of ClustalW2 (Sarah)
  4. Mastering the Art of NCBI: It's a BLAST (Claudia)
  5. Media:ClustalW_Tutorial.doc - (Olivia, Fall 2009)
  6. Media:KEGG_pathway_tutorial.doc - (Megan)
  7. Olivia - perl script to compare proteomes (links to Katie's and Megan's pages)
  8. Katie - two web pages, one for downloading original perl scripts and one for sample small scale version (convert to fasta and compare proteomes)
    link Proteome Compare
  9. Claudia - How To Find and Format Genome Sequences
  10. Megan - Determining Unique and Conserved Proteins: How to Use Katie's Webpage
  11. Karen - how to deal with output from web pages
  12. Sarah - CRISPR resources

Fall 2008

  1. Will DeLoache - BioPerl Installation
  2. Max Win - Introduction to Perl for non-programmers (with step-by-step explanations, simple exercises, and solutions)
  3. Pallavi - Conserved Domains Database (CDD) Media:CDDtutorial.doc
  4. Mary - Protein Data Bank (PDB) Media:PDB Tutorial.doc
  5. Laura Voss - Pfam Database Pfam Tutorial
  6. Samantha Simpson - NCBI BLAST
  7. Peter Bakke - Media:ShineDalgarnoTutorial.doc
  8. Jay McNair - Origin of Replication Tutorial
  9. Nick Carney - Navigating the JGI Database Media:NavigatingJGItutorial.doc
  10. Matt Lotz - SEED Viewer - Media:SEEDTutorial.doc
  11. Pallavi: I will compare RAST and KEGG in pathway annotations and use Glycolysis/Gluconeogenesis as my example: Media:Pallavitutorial.doc
  12. Matt: WikiPathways Media:WikiPathwaysTutorial2.doc
  13. Mary: ENZYME Media:ENZYME tutorial.doc
  14. Samantha: How To Determine EC Numbers
  15. Nick: Metacyc Media:MetaCyc tutorial.doc
  16. Max: KGML How to color EC numbers in KEGG maps and view it in KGML graph editor
  17. Jay: SEED Scenario Paths (a tool to determine completeness of pathways)
  18. Laura: Pathway Entrances and Exits
  19. Will: Running BLAST Locally
  20. Peter: Exploring Proteases: MEROPS Peptidase Database Tutorial - Media:MEROPStutorial_PB.doc

Links to Multiple Databases

Papers of Interest

Submitted Course Assignments

Glossary words (A - Z):

# A B C D E F G H I J K L M N O P Q R S T U V W X Y Z


5' Cap - a methylated guanine nucleotide that is added to the 5' end of a mRNA molecule in eukaryotes. It is added by a 5' to 5' triphosphate linkage, and it gives the mRNA resistance to 5' exonucleases. [1] (Laura M.)

16S rRNA - ribosomal RNA found in the small subunit of prokaryotic ribosomes. rRNA functions in decoding mRNA and interacting with tRNAs in translation. In particular, the gene for 16S rRNA is well conserved (it is found in prokaryotes and in eukaryotic mitochondria) and is often used in comparative genomics when studying phylogeny (Lecture, Olivia)

454 Sequencing - 454 instruments are pyrosequencers that carry out many reactions at a time (parallel sequencing) in wells of a PicoTiter Plate. Beads coated with thousands of homogeneous DNA fragments are added to individual wells on the plate. The DNA fragments are amplified in an oil emulsion mixture with DNA polymerase and primers. dNTPs are sequentially added to the wells one at a time and washed. The process of continuous washing and the sequential addition of dNTPs, DNA polymerase, luciferase, and ATP-sulfurylase explains the high reagent costs of sequencing. ATP-sulfurylase converts the PPi released from each dNTP addition to the complementary strand of the original ssDNA to ATP. ATP fuels luciferase in each well. The light produced is detected with a fluorescence microscope. The current (2009) 454 FLX system has the ability to sequence 100 Mb DNA in 8 hours with an average read of 250 bp and raw accuracy of 99.5%. [2] [3] (Jared)


abscisic acid- a key regulator of fruit ripening in nonclimacteric fruit, such as blueberries [4] (Stewart Dalton)

accession number - a unique identifier given to DNA and protein sequences to allow for tracking of sequence information within a single database [5] (Will).

acid invertase- an enzyme essential to sucrose metabolism, specifically in fruit, that hydrolyzes sucrose into fructose and glucose. Low levels of acid invertase have been shown to be associated with high levels of intracellular sucrose, and hence, to regulate storage and breakdown of sugar (sucrose) in fruit.[6] (Lauren)

acyltransferases - Enzymes that catalyze the transfer of an acyl group from a donor (such as acetyl CoA) to an acceptor. Activity of these enzymes adds a great deal of diversity to the anthocyanins, flavonoids, and phenolic compounds in Vaccinium corymbosum [7] (Puneet)

adaptors- short DNA sequences that can be attached to DNA fragments and used for amplification and sequencing [8] (Stewart Dalton)

adsorption - the accumulation of molecules on the surface of a material. This can be part of a lab procedure to purify and isolate a specific portion of a cell or a protein (Wikipedia, Olivia)

alien genes - genes found in a genome that appear to have been inserted into an organism's genome from another species, more than likely through horizontal gene transfer ([1] Campbell, Claudia)

alternative splicing - the process by which one gene can be translated into different protein isoforms. This is done by reconnecting the exons of the RNA produced in transcription in multiple ways during RNA splicing. ([9] Dylan)

allele frequency - a measure of the relative frequency of an allele at a genetic locus in a population. Ex: an allele frequency of 0.36 indicates that 36% of all copies of that locus in the population are that allele, and 64% are other alleles. [10] (Erich)
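A small worked example may help separate allele frequency from the fraction of individuals that carry an allele. This is a sketch with made-up genotype counts for 100 hypothetical diploid individuals; the function name is not from any library.

```python
def allele_frequency(genotypes, allele):
    """Fraction of all allele copies at one locus that are `allele`."""
    total_copies = sum(len(g) for g in genotypes)
    allele_copies = sum(g.count(allele) for g in genotypes)
    return allele_copies / total_copies


# Hypothetical population: 16 AA, 40 Aa, and 44 aa individuals.
genotypes = ["AA"] * 16 + ["Aa"] * 40 + ["aa"] * 44

freq_A = allele_frequency(genotypes, "A")
carriers_A = sum("A" in g for g in genotypes) / len(genotypes)

print(round(freq_A, 2))      # 72 of 200 allele copies
print(round(carriers_A, 2))  # 56 of 100 individuals carry at least one A
```

In this made-up population the allele frequency of A is 0.36, yet 56% of individuals carry at least one copy, so the two numbers should not be read interchangeably.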

allogeneic - variation in alleles among members of the same species. ([11] William G.)

aneuploidy - a condition in which an individual has a different number of chromosomes than is typically found in the wild type. For example, sex-chromosome aneuploidy in humans can result in the phenotypes of Turner syndrome (XO) or Klinefelter syndrome (XXY). Griffiths et al, 2000 (Austin)

ANOVA - Analysis of Variance is a statistical test that analyzes the effect of a particular variable on multiple groups of data. By analyzing the variance between the means of each group of data, ANOVA can determine if there is any statistically significant difference between any of the means. [12] [13] (Chadinha)
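The comparison of variance between and within groups can be illustrated with the F statistic of a one-way ANOVA. This is a minimal sketch in plain Python with made-up measurements; it computes only F and leaves out the p-value, which a statistics package would normally provide.

```python
def one_way_anova_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Made-up measurements for three groups.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(one_way_anova_f(groups))
```

A large F means the group means differ more than the scatter within groups would suggest; whether it is statistically significant depends on the degrees of freedom.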

anthocyanidin - a naturally occurring sugar-free plant pigment and a member of the flavonoid family. Anthocyanidins result from the degradation of anthocyanins. pH level affects the color of anthocyanidins. A low pH results in colored anthocyanidins, while higher pH gives anthocyanidins that are without color. [14] [15] (Daniel)

anthocyanins - a member of the flavonoid family that changes color with pH, giving various fruits their coloration. The health benefits of anthocyanin are potentially great, with laboratory results suggesting positive effects against cancer, aging and neurological diseases, inflammation, diabetes, and bacterial infections. It is, however, poorly conserved during digestion and would have to be modified somehow for medicinal use. [16] [17] (Dylan)

antisense (RNA or DNA)-a piece of DNA or RNA that binds to a complementary sequence of DNA or RNA. These segments of genetic material can be used to identify the existence of a disease gene and they can also be used to bind to specific DNA or mRNA sequences to inhibit their function (5 Pallavi).

Apollo - Gene annotation software that allows you to visualize genes you have identified, your annotations for them, and where they lie within a genome Berkeley(Lexi).

Arabidopsis thaliana - the scientific name for the thale cress plant; it was the first plant to have its genome sequenced, and is a model organism for understanding plant biology and genetics (Jay)

Archaea - one of the three evolutionary domains. A group of unicellular prokaryotes that were previously grouped with Bacteria, but have some genes and metabolic pathways more similar to eukaryotes, such as those involved in transcription and translation. Many Archaea are extremophiles, such as Halobacteria that thrive in high-salt environments (Lecture, Olivia)

Archaeal rhodopsins - light-sensitive and light-activated transmembrane proteins found only in archaeal plasma membranes. Bacteriorhodopsin (BR) and Halorhodopsin (HR) are both archaeal rhodopsins; they are light-driven proton and chloride pumps, respectively, indicating that the functionality of archaeal rhodopsins is diverse [18] (Katie)

assembly - the process of taking many short sequences of DNA, often from whole genome shotgun sequencing, and compiling overlapping regions to create a representation of the chromosomes from which the DNA originated. ([19] Mike)

auxins - a type of plant hormone that promotes cell growth and helps determine the shape of a plant. Auxins can positively affect both the vertical formation of the stem and the lateral growth of roots. Auxins also mediate the growth of fruit and the release of seeds. ([20] [21], Daniel)

Avirulence genes - the parasitic counterpart to a resistance, or "R", gene of a host. R genes protect against the gene product of the avirulence, or Avr, gene. ([22] TK)


BAC - bacterial artificial chromosome, a DNA construct used for transforming or cloning segments of DNA and often used to sequence the genetic code of organisms (Jay)

Bacteriorhodopsin- A transmembrane archaeal rhodopsin protein that uses light energy to move protons across membranes, creating an electrochemical gradient that is converted into chemical energy [23] (Katie).

Bacterioruberin - Bacterioruberin is a “carotenoid pigment” found in some halophiles giving them a red color and providing assumed protection from strong sunlight [24]. The structure also plays a stabilizing role in the archaeal rhodopsin proteins [25] (Katie).

bioinformatics - the multi-disciplinary approach of using biology, computer science and mathematics to solve or better understand biological problems [26] (Matt)

bit score - a normalized measurement of the quality of a sequence alignment; the higher the bit score, the less likely it is that the sequence is a random match rather than an authentic homologue to a defined sequence in the database. The calculation of a bit score accounts for the presence of gaps and the number of alignments between the experimental and database sequences. McEntyre and Ostell, 2002 (Austin)

BLAST - (Basic Local Alignment Search Tool) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. [27] (Mary)

Blastula - a hollow sphere of cells formed in the early stages of embryonic development through a series of cell divisions known as cleavage; it then transitions to the gastrula. [28] (William G.)

BLASTx - a BLAST search (see BLAST) in which a nucleotide sequence is entered and translated by BLAST before comparing to a protein database. [29] (Aaron)

Bligh-Dyer method- A lipid extraction method that uses chloroform-methanol as a solvent but also includes a re-extraction of the sample, just with chloroform, before evaporation of the solvent to capture more non-polar lipids. [30] The lipid membrane of archaea is extremely unique not only in composition (see Isoprenoid lipids) but also in the archaeal rhodopsins that are scattered among the plasma membrane [31]. In order to study the uniqueness of archaeal membranes one needs to observe the lipids outside of the membrane, which the Bligh-Dyer method accomplishes (Katie)

bioinformatics - The science of managing and analyzing biological data using advanced computing techniques; bioinformatics is especially important in analyzing genomic research data. ([32] [33] Mike)

bioperl - a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications such as accessing sequence data from local and remote databases, converting between database formats, manipulating individual sequences, searching for similar sequences, searching for genes and other structures on genomic DNA, or developing machine-readable sequence annotations. [34] (Wikipedia, Max Win)

bootstrap value - common reliability test of a phylogenetic tree, calculated as a percentage. In generating a phylogenetic tree, the sequences will be resampled, or rerun, multiple times. If a pair of sequences are consistently grouped together for 100 out of 100 resamplings, then the certainty that those sequences are correctly grouped would be very high, and the bootstrap value would be 100. If a pair of samples were grouped together only 50 out of 100 resamplings, the certainty that those sequences are correctly grouped would be lower; the bootstrap value would be 50. On phylogenetic trees, these values may be placed adjacent to the group to which they refer. (Lecture, Olivia)
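The resampling idea can be sketched in a few lines. This is a deliberately simplified illustration, not a real phylogenetics workflow: instead of rebuilding a full tree for every replicate, it resamples alignment columns with replacement and asks only whether a chosen pair of sequences is still the closest pair by Hamming distance. The alignment, names, and function names are made up.

```python
import random


def hamming(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def bootstrap_support(alignment, pair, replicates=100, seed=0):
    """Percent of column-resampled replicates in which `pair` is the closest pair."""
    rng = random.Random(seed)
    names = sorted(alignment)
    n_cols = len(alignment[names[0]])
    target = tuple(sorted(pair))
    all_pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    supported = 0
    for _ in range(replicates):
        # Resample alignment columns with replacement.
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        resampled = {n: [alignment[n][c] for c in cols] for n in names}
        # Ties go to the first pair in sorted order (a simplification).
        closest = min(all_pairs, key=lambda p: hamming(resampled[p[0]], resampled[p[1]]))
        if closest == target:
            supported += 1
    return 100.0 * supported / replicates


# Made-up alignment: A and B are identical, C differs at every column.
alignment = {"A": "ACGTACGTAC", "B": "ACGTACGTAC", "C": "TGCATGCATG"}
print(bootstrap_support(alignment, ("A", "B")))
```

Because A and B are identical at every column in this toy alignment, every replicate groups them together, which corresponds to a bootstrap value of 100. Real data with conflicting columns would give lower values.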


CAGE - Cap Analysis Gene Expression. A technique for identifying the start sites for transcription and determining the amount of promoter usage in eukaryotic genomes. Small fragments (20-21 nucleotides) from the beginnings of mRNAs are extracted, reverse-transcribed to DNA, PCR amplified, and sequenced. These sequences (called "tags") are compared against a known genome to identify exact transcription start sites. ([35] Dylan)

carbon fixation - using carbon dioxide to create organic materials [36] (Samantha)

CCCP - carbonyl cyanide m-chlorophenyl hydrazone; a nitrile ionophore that inhibits oxidative phosphorylation and photophosphorylation. Ionophores are lipid-soluble molecules allowing them to transfer across membranes, creating pores that disrupt transmembrane ion gradients. (Sugiyama 1994 article, Olivia)

cell division control (Cdc) protein - for example, Cdc6 found in Halorhabdus utahensis; protein responsible for activating and maintaining mechanisms of cell division. Cell division control proteins are important in annotation because the presence of a Cdc gene is a good indicator for finding the origin of replication in a circular chromosome. (Bakke et al 2009, Olivia)

CDD (Conserved Domains Database)- a database used to identify the conserved domains present in a protein query sequence [37] (Mary)

cDNA - DNA that is reverse-transcribed from mature mRNA. A cDNA library provides templates for genes that are expressed within an organism. [38]. (Pyfrom)

centimorgan (cM) - A unit of measure of genetic recombination frequency, and therefore genetic linkage. One centimorgan is equal to a 1% chance that a marker at one genetic locus will be separated from a marker at a second locus due to crossing over in a single generation. In human beings, one centimorgan is equivalent, on average, to about one million base pairs. ([39] [40] Mike)

chaperonin - a protein complex that assists some newly formed polypeptide chains by folding them into their final, functional, three-dimensional form [41] (Matt)

chemoorganotrophic - refers to organisms that obtain energy from oxidation/reduction reactions using organic electron donors (Link, Earthlife Claudia)

chemotaxis - the process in which cells seek out or flee from a high concentration of certain chemicals; it is found in both uni- and multicellular organisms. This process is used to avoid toxins or find food in unicellular organisms, or for tasks such as reproduction in multicellular organisms [42] (Nick)

chemotaxonomy - the attempt to classify and identify organisms according to demonstrable differences and similarities in their biochemical compositions [43] (Mary)

chilling requirement - the minimum time period a fruit bearing plant must spend in cold weather in order to blossom, often expressed in chill hours, which are calculated from duration spent at certain temperatures. ([44] Mike)

chimeric genome - A genome that consists of a mixture of genes from distinct species Baliga et al., 2004 (Karen)

Chloroplast chromosome - circular DNA found in the photosynthesizing organelle (chloroplast) of plants instead of the cell nucleus where most genetic material is located. This genome codes mostly for redox proteins involved in electron transport in photosynthesis. ([45] Dylan)

Circos Plot - A circular representation of the genome(s) for one or more species. It illustrates the extent of gene duplication within the species and/or orthologous sequences between multiple species by connecting lines between regions of the chromosomes that share the same DNA. In many cases, Circos plots give us a "tangible" understanding of where gene duplication may have occurred. For example, in the figure below, the series of red lines that connect portions of chromosomes 17 and 18 show that those regions share the same DNA. On this particular plot, the dark blue coloring on certain areas of all chromosomes signifies the extent to which that genetic region is duplicated in other parts of the genome. (Berger et al, 2011; image courtesy of Imperial College London, Shamita)

cladogram - A visual representation of relatedness among species that shows common ancestry via the formation of branch points on the tree. The species similarity is computationally determined, based on the similarity of their DNA and/or RNA sequences. ([46], image courtesy of Curtis Clark, Shamita)

cloud computing - dividing data processes and inputting parts of these processes into nodes to spread out heavy computational workloads among many computers or sections of computers running simultaneously. Cloud computing has become especially popular in the field of genomics. Assembly algorithms may take days to sort through terabytes of data for a genome with high coverage. One option for external cloud services is Amazon's Elastic Compute Cloud (EC2). A laboratory could also build an internal cloud, linking all computers in the lab together. Ubuntu, an open source, Linux-based operating system, now has cloud support. [47],[48] (Jared)

climacteric/non-climacteric fruit - Some plants are susceptible to the effects of ethylene because it can trigger the maturation of fruit, opening of flower buds, and shedding of leaves. Such plants are referred to as climacteric, because their respiration increases with a concomitant increase in ethylene. Examples include bananas, apples, apricots, and peaches. Other plants, however, exhibit a decrease in respiration rates at fruit maturation and do not respond to an endogenous release of ethylene. Some examples include blueberries, strawberries, and grapes, all referred to as non-climacteric fruit. ([49], Shamita)

ClustalW - A web-based or command line tool that performs multiple sequence alignments to determine evolutionary relationships between three or more sequences [50] (Will).

CNV - copy number variants are a type of genomic variation that results in a varying number of copies of a particular section of the genome, much like a VNTR. The difference between a CNV and a VNTR is the scope: a VNTR will have a varying number of repeats of segment A, whereas a CNV would be a sequence that started with segments A, B, C but became A, A, B, C or A, B, B, C, where a particular segment of a larger segment is copied. [51] (Chadinha)

COG (Cluster of Orthologous Groups)- corresponds to a highly conserved domain and generally consists of either individual proteins or groups of paralogs (COG Pallavi)

comparative genomics - the study of relationships between genomes of different strains and species. Comparative genomics aims to define similarities and differences in structure and/or function of different proteins, RNAs and regulation between organisms (Wikipedia and Lecture, Olivia)

complex traits - traits that are controlled by two or more genes and can be affected by environmental factors; therefore, the rules of Mendelian genetics do not directly apply to these traits [52] (Mark A.)

compression block - a compression block is a segment of repetitive DNA that has been shortened by the software used in shotgun sequencing. SSRs and VNTRs are candidate areas for compression blocks forming. [53] (Chadinha)

concatemer - long continuous DNA molecule that contains the same DNA sequence repeated in series [54](Samantha)

concordance rate - given that one individual has a certain inherited trait, the probability that the second individual in a pair with the first will have that same trait; often the pair described is a set of twins [55] (Mark A.)

congenic - two strains of an organism that are nearly identical, varying only at a single locus (also called coisogenic) [56] (Megan)

consensus sequence - a nucleotide sequence that is common, though not necessarily identical, in different genes and in genes from different organisms that are associated with a particular function. [57] (Megan)

conserved genes - regions of similar or identical sequences within DNA or proteins across species. Sequence conservation generally implies that there is a conserved gene in that location. Highly conserved genes are oftentimes necessary for survival and, therefore, any mutations are eliminated through natural selection. ([58] Dylan)

conserved ortholog set (COS)- a group of genes between different species that throughout evolution are conserved in sequence and copy number. ([59], Catherine Doyle)

contigs (contiguous DNA) - overlapping DNA segments that as a collection form a longer and gapless segment of DNA. (Discovery Genomics, Proteomics and Bioinformatics [60], Max Win)

controlled vocabulary - a set of terms used to standardize the description of characteristics in organisms' genomes, as designated by the Gene Ontology (GO) project ([1] Campbell, Claudia)

coding capacity- is the percentage of RNA-coding DNA present in a given genome (Discovery Genomics, Proteomics and Bioinformatics [61], Catherine Doyle )

coverage - refers to the number of times, on average, any piece of DNA in a sequenced genome has been individually sequenced (Lecture, Jay)
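As a worked example, average coverage can be estimated as the total number of sequenced bases divided by the genome size. The read count and genome size below are hypothetical values chosen only to illustrate the arithmetic.

```python
def average_coverage(num_reads, mean_read_length_bp, genome_size_bp):
    """Average number of times each base of the genome is expected to be sequenced."""
    return num_reads * mean_read_length_bp / genome_size_bp


# Hypothetical run: 400,000 reads averaging 250 bp (100 Mb of sequence)
# against an assumed 500 Mb genome.
print(average_coverage(400_000, 250, 500_000_000))
```

Under these assumptions a single run of that size gives well under 1x coverage, so many such runs would be needed before most of the genome is covered several times over.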

CPAN (Comprehensive Perl Archive Network) - an archive of over 12,200 modules of software written in Perl, as well as documentation for it. It contains a module, also called CPAN, which is used as an installer for Perl modules such as BioPerl [62] (Will).

Cryptochrome - a class of blue-light sensitive flavoproteins found in plants and animals. In plants, these proteins regulate germination and light-dark cycles. ([63] TK)

Cytochrome P450 - a group of enzymes that catalyze the oxidation of organic substances. Plant cytochromes P450 are involved in several different reactions which lead to various hormones, fatty acid conjugates, and defensive compounds. Terpenoids are often substrates for plant CYPs. ([64], Daniel)

Cytogenetics-the study of normal and abnormal chromosomes. This involves studying the causes of chromosomal abnormalities and looking at the structure of chromosomes (7 Pallavi).

cytokinins - plant hormones that promote cell division in both plant shoots and roots. The levels of auxins and cytokinins are in a constant ratio in plants. This ratio is crucial for the proper formation of plants both laterally and vertically. Cytokinins are sometimes used by farmers to generate more crops. ([65] [66], Daniel)


digenic phenotype - phenotype caused by two genes, not one. ([67], Leland)

Disease triangle - a concept applied to plant disease epidemiology that represents the necessity of a pathogen, host, and conducive environment for disease to occur. ([68] TK)

DCCD - dicyclohexylcarbodiimide; compound that acts as a proton ATPase inhibitor (Sugiyama 1994 article, Olivia)

de Bruijn graphs - graphic representations of groups of short letter strands (k-mers). Used in genomic assembly, the graphs consist of rectangles of short nucleotide sequences and their reverse complements. Sequences vertically protruding from these rectangles overlap and share these rectangle base sequences. Arcs connect nodes of linked overlapping sequences.

Zerbino and Birney (2008) developed Velvet, a set of algorithms designed to manipulate these graphs in order to assemble high coverage genomes consisting of short reads. [69] (Jared)
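To make the graph idea concrete, here is a minimal sketch of the common textbook construction, in which every k-mer in a read becomes an arc from its first k-1 bases to its last k-1 bases. It is not Velvet's implementation: it ignores reverse complements, sequencing errors, and coverage, and the function name and reads are made up.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the sorted (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # One arc per k-mer: prefix node -> suffix node.
            graph[kmer[:-1]].add(kmer[1:])
    return {node: sorted(successors) for node, successors in graph.items()}


# Two made-up overlapping reads.
reads = ["ACGT", "CGTA"]
for node, successors in sorted(de_bruijn_graph(reads, 3).items()):
    print(node, "->", successors)
```

Following the arcs from AC through CG and GT to TA spells out ACGTA, the sequence that both reads came from, which is the basic intuition behind assembling short reads with these graphs.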

definition line (DEFLINE) - the first line of the FASTA format that starts with a “greater than” (>) symbol, followed by a description of the submitted sequence. McEntyre and Ostell, 2002 (Austin)
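A short parser shows how the definition line separates records in a FASTA file. This is a minimal sketch; the sequence names and descriptions in the example are hypothetical, and real files may need extra handling (for example, comment lines or very large inputs).

```python
def parse_fasta(text):
    """Return a list of (defline, sequence) pairs from FASTA-formatted text."""
    records = []
    defline, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new definition line closes the previous record.
            if defline is not None:
                records.append((defline, "".join(chunks)))
            defline, chunks = line[1:], []
        elif defline is not None:
            chunks.append(line)
    if defline is not None:
        records.append((defline, "".join(chunks)))
    return records


# Hypothetical two-record file.
example = """>seq1 example EST, made-up description
ACGTACGT
ACGT
>seq2 another made-up record
TTGCA
"""
for defline, sequence in parse_fasta(example):
    print(defline, len(sequence))
```

The sequence of a record may span several lines, so everything between one definition line and the next is joined into a single string.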

degenerate bases- have the ability to form stable base pairs with more than one base [70] (Stewart Dalton)

dehydrogenase - a type of enzyme that oxidizes a substrate by transferring one or more protons and a pair of electrons to an acceptor. [71] (Peter)

dendrogram - a tree diagram used to illustrate the arrangement of the clusters produced by hierarchical clustering based on the degree of similarity of characteristics. Dendrograms are often used in computational biology to illustrate the grouping of genes or samples. [72] (William G.)

de novo synthesis - the synthesis of complex molecules from simple molecules (e.g. sugars and nucleotides), rather than from recycled molecules; from the Latin "of the new" [73] (Matt)

de novo transcriptome assembly - assembly of all mRNA transcripts without a reference genome [74] (Lecture, Mark A.)

deoxyribodipyrimidine photolyase - enzyme which breaks the errant covalent bonds that form pyrimidine dimers. UV light is a common cause of this particular anomaly and causes covalent bonds to form between adjacent pyrimidines. Many archaea and bacteria use deoxyribodipyrimidine photolyases in order to break these bonds and avoid errors during replication or transcription [75]. (Pyfrom)

diatom - a major group of eukaryotic algae, and one of the most common types of phytoplankton. A characteristic feature of diatom cells is that they are encased within a unique cell wall made of silica called a frustule. These frustules show a wide diversity in form, but usually consist of two asymmetrical sides with a split between them. [76] (Mary)

DICER1 - a protein in the RNA induced silencing complex (RISC). DICER cleaves double stranded mRNAs, rendering them untranslatable. The protein belongs to the helicase family. Defects in the enzyme have been implicated in pleuropulmonary blastoma, a developmental cancer of the lungs. [77] (Jared)

dicotyledon - a group of flowering plants that has two leaves in the embryo of the seed. Most have net-veined leaves, and the vessels in the stem are arranged in a circle near the stem surface. [78] Blueberries are dicotyledons. [79] (Laura M.)

diplotype - Diplotype (or haplotype pair) is the subset of every single-locus genotype; both genotype and diplotype represent the types of chromosome pairs in each individual. What allele is to genotype, haplotype is to diplotype [80](Erich)

dirigent proteins - proteins that control the stereochemistry of a compound synthesized by other enzymes. Ex: In lignin formation, dirigent proteins are suggested to "direct the coupling of two monolignol radicals, producing a dimer with a single regio- and stereo- configuration." [81] (William G.)

DNA (deoxyribonucleic acid) - The nucleic acid that forms the basis of the genetic material in most organisms. DNA is composed of the four nitrogenous bases Adenine, Cytosine, Guanine, and Thymine, covalently bonded to a backbone of deoxyribose-phosphate to form a DNA strand. Two complementary strands (where all Gs pair with Cs and As with Ts) form a double helical structure which is held together by hydrogen bonding between the complementary bases. ([82] [83] Mike)

domain (protein) - the structural and functional groups of a protein, which can exist independently of the protein itself. Domains typically perform a specific function, such as binding to promoters or substrates, and many proteins can have one or several domains in common. Evolutionarily-linked proteins are more likely to have domains in common. Domains are used to organize proteins into families. (Wikipedia article, Laura)

dormancy- A temporary period of minimal or no growth which allows plants to survive great variation in temperature or other conditions. Seed dormancy delays germination, and is induced by abscisic acid. Through chemical treatment, it is possible to reverse the effects of dormancy in plants. ([84], Daniel)

dot plot-graphical display comparing sequence conservation between two genomes with dots indicating strings of identical bases. (Discovery Genomics, Proteomics and Bioinformatics[85], Max Win)
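
As a rough illustration of the dot plot idea, the sketch below marks every position pair where two sequences share an identical word. The function name dot_plot and the word parameter are illustrative choices, not part of any cited tool, and real dot plot software uses far faster indexing than this nested loop.

```python
def dot_plot(seq_a, seq_b, word=3):
    """Return (i, j) pairs where seq_a[i:i+word] equals seq_b[j:j+word].

    Each pair corresponds to one dot: i is the position on the first
    sequence, j the position on the second.
    """
    hits = []
    for i in range(len(seq_a) - word + 1):
        for j in range(len(seq_b) - word + 1):
            if seq_a[i:i + word] == seq_b[j:j + word]:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    # Toy sequences, made up for the example.
    for i, j in dot_plot("ACGTACGT", "TTACGTAA", word=4):
        print(i, j)
```

Runs of dots along a diagonal indicate a conserved stretch; a larger word value filters out short chance matches.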

downregulation - when a cell decreases production of a particular cellular component, such as a protein, in response to a stimulus. [86] (Chadinha)

draft genome- a genome that has been sequenced by computers and programs but has not yet been reviewed by humans in order to create a finished genome. Draft genomes usually contain gaps or mistakes due to the limited capacity of the programs used for sequencing (Lecture, Pyfrom).

durable resistance genes - R genes are termed "durable" when the resistance they confer continues to be effective through multiple conditions and generations of offspring, as opposed to resistance that breaks down with widespread use due to pathogen mutations. ([87] TK)


epigenetic regulation - changes in phenotype that are caused by mechanisms other than changes in the DNA sequence. DNA methylation is an example of this. ([88], Leland)

EC number (Enzyme Commission Number)- a numerical classification scheme for enzymes, based on the chemical reactions they catalyze [89] (Mary)

Edman degradation-A method for sequencing amino acids in a peptide chain. It allows the ordered protein sequence to be determined by proceeding from the N-terminus of the chain and piecing together fragmented sequenced chains of a protein [90] (Katie).

E-value (Expect value) - When performing a BLAST search, you will obtain an E-value for each sequence that is retrieved. An E-value is the number of hits with an equal or better score that would be expected by chance in a database of that size; the closer it is to zero, the less likely the similarity is due to chance. (Discovery Genomics, Proteomics and Bioinformatics[91], Max Win)

Endophyte - a bacterium or fungus that lives inside a plant for part or most of the plant's life without causing disease. ([92] TK)

ENZYME - an enzyme database with links to a variety of resources (KEGG, BRENDA, PubMed, etc.) specific to a query. Users can search based on enzyme commission (EC) number, enzyme family, cofactor, and more. [93] (Aaron)

epistasis - the interaction between two or more genes to control a single phenotype. Epistasis is not the same as dominance; dominance involves the interaction of two alleles for the same gene, whereas epistasis is the interaction of different genes. [94] (Megan)

Ericaceae - The family of plants that blueberry belongs to. This family includes herbs, subshrubs, shrubs and trees, and grows best in acidic soils. (Flora of North America, Lexi)

ELIP - Early light induced proteins are stress-induced proteins that respond to light. These proteins are found in plants and algae and are usually localized to the thylakoid membrane. ELIPs are involved in photosynthesis. [95] (Chadinha)

ELSI - A research initiative funded by the US Department of Energy and National Institutes of Health to study the ethical, legal, and social issues (ELSI) brought about by the availability of genetic information. This program dealt with knowledge in both the Human Genome Project and other work of medicinal and health import. ([96] Dylan)

Ethylene - a gas that forms through the Yang Cycle from the breakdown of methionine. Ethylene is produced at a faster rate in rapidly growing cells and affects cell growth/shape.

eugenics - The study of improving a species by artificial selection; the term usually refers to the selective breeding of humans. ([97] [98] Mike)

exon - portions of a nucleic acid sequence represented in mature RNA, as opposed to introns which are spliced out. ( [99] Mike)

expressed sequence tag (EST) – a short piece (200-500bp) of transcribed cDNA that can be used to determine the position of an expressed gene within the genome [100]. (Pyfrom)

external guide sequence (EGS) - a method of inhibiting the expression of genes by the binding of short RNA sequences to mRNAs. The bound mRNAs form a shape resembling tRNA precursors, which RNase P can recognize and cleave. (Discovery Genomics, Proteomics and Bioinformatics[101], Catherine Doyle)

extremophile - an organism that thrives in and may even require physically or geochemically extreme conditions that are detrimental to the majority of life on Earth [102] (Will).


FASTA format - a text format used to convey either nucleic acid sequences or peptide sequences, in which nucleotides or amino acids are represented by single-letter codes. A header line beginning with ">" gives the sequence name and other descriptors and precedes the sequence itself. [103] (Nick)
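
To make the format concrete, here is a minimal sketch of a FASTA reader in Python. The function name parse_fasta and the toy records are assumptions for illustration; established libraries handle many more edge cases.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {name: sequence}.

    The name is the first whitespace-delimited word after '>';
    sequence lines that follow are concatenated until the next header.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


if __name__ == "__main__":
    # Hypothetical records, not real blueberry sequences.
    example = ">seq1 demo nucleotide record\nACGT\nTTGA\n>seq2\nMKV\n"
    print(parse_fasta(example))
```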

family (protein) - a group of evolutionarily-related proteins, often with one or several domains in common. Families are organized by domain overlap, structural/functional similarity, and sequence similarity. (Wikipedia article and lecture, Laura)

finished genome - a genome that has been sequenced at least partly by hand, resulting in at least 99.99% sequence accuracy (Lecture, Jay)

fold coverage - c = (L*N)/G, where L = average read length, N = number of reads, and G = genome size. Higher fold coverage gives statistically higher final accuracy, because more reads are available to call the most common (mode) nucleotide at sites where reads disagree. For example, 12X coverage means 12X redundancy of bases, giving higher base accuracy and higher accuracy of assembly. [104] (Jared)
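
The coverage formula can be checked with a few lines of Python. This is only a sketch: the function name fold_coverage and the read counts in the example are made up for illustration.

```python
def fold_coverage(avg_read_length, num_reads, genome_size):
    """Compute c = (L * N) / G, with all lengths in base pairs."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return avg_read_length * num_reads / genome_size


if __name__ == "__main__":
    # Hypothetical run: 6 million reads of 100 bp over a 50 Mb genome.
    print(fold_coverage(100, 6_000_000, 50_000_000))  # 12.0, i.e. 12X
```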

Fragaria vesca - Strawberry, a fruit related to blueberry that had its genome sequenced in 2010. Strawberry has a relatively small genome (240 Mb), compared to the 487 Mb genome of the grape, demonstrating that there is great variability in the genomic structure of related species. (Strawberry Genome Paper, Grape Genome Paper, Lexi)

frustule - a hard, porous cell wall made up of silica that makes up the outermost layer of diatoms. These structures have complex and elaborate designs (Wikipedia Claudia)

fusion mRNA-mRNA that results from the transcription of a gene after a chromosomal translocation event. This results in an mRNA sequence that comes from two different genes (Rowley and Blumenthal 2008 Science Pallavi)

Flavonoids - polyphenolic biochemical compounds that have been shown to have antioxidant effects. They are found in fruits, vegetables, olive oil, cocoa and beverages such as tea and red wine. The most common flavonoids include anthocyanins, flavonols, flavones, flavanones, flavan-3-ols, and isoflavones. [105] (Lauren)


GAF Domain - A GAF domain is a small-molecule binding unit present in all domains of life. It is a light-responsive domain found in plant and cyanobacterial phytochromes (a pigment photoreceptor used to detect light). This domain plays an important role in an organism's ability to respond to its environment. (Baliga et al., Molecular Interventions, Ecomii, Claudia)

gap - a region of the genome for which no sequence is currently available. Two types of gaps exist: heterochromatic gaps consist largely of highly repetitive sequence (whose exact non-overlapping sequence is therefore difficult to determine), and euchromatic gaps are more likely to contain genes. [106] (Megan)

gap penalty - The penalty applied due to gap(s) during sequence alignment, necessary to see similarities between sequences that would otherwise be considered radically dissimilar. Gaps arise during sequence comparison due to insertions or deletions. Gap penalties are usually subtracted from a cumulative score being determined by an optimization algorithm that attempts to maximize that score. A higher gap penalty will cause less favorable characters to be aligned, to avoid creating as many gaps. ([107] Mike)
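
The effect of the gap penalty on an optimization algorithm can be sketched with a small global alignment scorer in the style of Needleman-Wunsch. This is an illustrative assumption, not the method of the cited source: the scoring values (match +1, mismatch -1) and the function name are arbitrary, and the gap penalty is linear (each gap position costs the same).

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of a and b with a linear gap penalty.

    `gap` is a negative number added once for every gap position,
    which is the same as subtracting a positive penalty.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # A mild penalty lets gaps shift the sequences into register;
    # a harsh penalty makes the ungapped, all-mismatch alignment win.
    print(global_alignment_score("ACGT", "CGTA", gap=-1))
    print(global_alignment_score("ACGT", "CGTA", gap=-5))
```

With the mild penalty the best alignment uses two gaps to line up CGT; with the harsh penalty the algorithm prefers to align four mismatched characters rather than open any gap.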

GC Content - the percentage of bases within a certain sequence of DNA (e.g. a gene or a genome) that are either guanine or cytosine; a higher GC content is characteristic of a coding region of a gene; differences in GC content between a gene and a genome can be used as evidence for horizontal gene transfer [108] (Matt)
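
As a small worked example of the definition, the sketch below computes GC content as a percentage. The function name gc_content is illustrative, and ignoring characters other than A, C, G and T (such as N) is a design assumption rather than a universal convention.

```python
def gc_content(seq):
    """Percentage of A/C/G/T bases in seq that are G or C."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        return 0.0
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


if __name__ == "__main__":
    print(gc_content("ATGCGC"))  # toy sequence, made up for the example
```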

GC-skew – uneven distribution of guanine and cytosine bases between the two strands of DNA where GC base pairs occur. (Discovery Genomics, Proteomics and Bioinformatics[109], Max Win)
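
GC-skew is commonly quantified along one strand as (G - C) / (G + C), often in consecutive windows. The sketch below assumes that formula; the function names and the non-overlapping window scheme are illustrative choices.

```python
def gc_skew(seq):
    """(G - C) / (G + C) for one strand; 0.0 if the strand has no G or C."""
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def windowed_skew(seq, window):
    """GC-skew for consecutive, non-overlapping windows along seq."""
    return [gc_skew(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, window)]


if __name__ == "__main__":
    # Toy sequence: G-rich first half, C-rich second half.
    print(windowed_skew("GGGGCCCC", 4))
```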

gene amplification - production of multiple copies of a gene in order to amplify the amount of protein that the gene encodes for [110] [111] (Matt)

gene calling - Determining which parts of a sequenced genome represent genes. This process could also be called gene finding. The process is generally fully automated. Magnaporthe grisea Automated Gene Calling(Karen)

gene cluster - genes that encode similar or related products that are located close together on the genome [112] (Mark A.)

gene fusion-occurs when DNA segments of two different genes come together. Can result in hybrid proteins (9 Pallavi)

gene family - a group of genes that participate in the same processes and share similar characteristics such as DNA sequence, structure, and function. ([113], Catherine Doyle)

Gene gun - A gene gun, or "biolistic gene delivery system," is a device designed for delivering genetic material by injecting a heavy metal particle coated with plasmid DNA at high speed. ([114] TK)

gene knockdown - similar to gene knockout, this technique involves the reduction of expression through use of complementary DNA or RNA that lasts only a short period of time before returning to normal. [115] (William G.)

gene knockout - a process in which a gene is deactivated within a test organism in order to better understand the function of the gene in that organism [116] (Matt)

gene mutation- a permanent change to the base pair sequence of a gene [117] (Stewart Dalton)

Gene Network - A network shows the interactions among parts of a whole and can be applied to any level of biology, from the genetic to the ecosystem level. Within the study of genomics, networks are typically represented as gene regulatory networks, which show how genes, transcripts and proteins interact to regulate a particular pathway. Institute for Systems Biology(Lexi)

gene ontology - a collaborative effort of investigators to unify and standardize terms associated with the role a gene or protein plays in an organism. (Discovery Genomics, Proteomics and Bioinformatics[118], Max Win)

gene patent - In genetics, a patent applies to a particular gene sequence discovery and reserves rights to it and any process involved in obtaining or using the gene product for the individual or group responsible for the discovery. ([119] Dylan)

gene transfer - the incorporation of a DNA segment into an organism's cells, or DNA. This usually occurs through a vector such as a virus. This method is used in gene therapy. (Claudia)

gene density- the number of genes per million base pairs [120] (Stewart Dalton)

gene library- a collection of genomic material that represents the genetic sequence of a particular organism. This collection is usually stored in microorganisms via cloning [121] [122] (Stewart Dalton)

Gene-for-gene resistance - A theory of parasite-host interaction that states that a host's ability to resist disease and a parasite's ability to cause disease are controlled by a matching pair of genes: the host's resistance (R) gene and the parasite's avirulence (Avr) gene. (TK)

Genome - The full set of an organism's hereditary information. The genome is encoded as either DNA or RNA and includes both genes and non-coding regions. Wikipedia article (Puneet)

genome annotation - the process of attaching biological meaning to sequence data. In other words, genome annotation involves determining where genes are located in a genome and discovering functions of these genes. Genome annotation: from sequence to biology (Karen)

genome coverage- also known as "depth of coverage", genome coverage refers to determining the base pair sequence of a section of the genome multiple times for accuracy in order to reassemble DNA fragments to their original order [123] (Stewart Dalton)

genomic islands - mobile or immobile DNA segments that differ between orthologs. [124] (Catherine Doyle)

GenSAS - Genome Sequence Annotation Server for automated annotation of whole genomes. ([125] TK)

germination - a period of growth during which a plant begins to sprout from a seed. Germination is directly affected by temperature, light, water, and oxygen. Gibberellins also promote germination. Other forms of germination occur in bacteria and fungi. ([126], Daniel)

gibberellins (GA) - a plant hormone that promotes germination and breaks dormancy. Gibberellins reverse the inhibiting effects of abscisic acid. ([127], Daniel)

glaucophyte - freshwater algae that have not been studied well [128](Samantha)

GLIMMER - a system that recognizes coding regions of DNA in order to find genes in microbial DNA. ([129], Catherine Doyle)

gynandromorph - organisms that contain both male and female cells and thereby express both male and female characteristics. [130] (William G.)


haemolysin or hemolysin - a chemical produced by bacteria that causes lysis of red blood cells [131] (Nick)

halophile - an organism, most often of the Archaea domain, that lives in environments containing high concentrations of salt [132] (Matt)

haplogroup - branches on the ancestry tree of Homo sapiens that reflect early migrations. Geneticists differentiate these groups by examining variations in mtDNA (origins of mother) and the Y chromosome (origins of father) [133] (Jared)

haplotype-collection of alleles that travel together (Lecture, Pallavi)

haptophyte - phylum of algae [134](Samantha)

hemizygous - the state of having unpaired gene(s). A common example is male humans: every male has a single X chromosome, so many of its genes are unpaired. Males are hemizygous for every unpaired gene on their X chromosome. [135] (Chadinha)

heterokont - major line of eukaryotes consisting of about 10,500 known species, most of which are algae [136](Samantha)

heterodimer - a protein complex of two distinct, unique macromolecules [137] (Mark A.)

Heterologous - literally meaning “derived from a different organism,” heterologous refers to the fact that the gene/protein of interest was taken from a different cell type or species than the gene/protein recipient [138]. (Katie)

Heterosis - the improved or increased function of any biological quality in a hybrid offspring. ( [139] Mike)

Hidden Markov Model - a statistical model used in protein recognition databases such as Pfam. A Hidden Markov Model keeps track of several variables and possible variations thereof, such as the possible amino acid sequences that make up a protein domain (since there can be some variance in an amino acid sequence) or the variations in the component sounds that make up a word, and uses those points to match a given sequence to the word, domain, or other complex sequence it most closely matches. An HMM in speech recognition software, for example, can identify that a certain set of sounds make up a certain word, even with the variations in pronunciation and accent that different people will give those sounds. (Wikipedia and lecture, Laura)

hierarchical genome shotgun sequencing - a method for sequencing genomic DNA. Genomic DNA is cut into pieces of about 150 kb and inserted into BAC vectors, which are transformed into E. coli, where they are replicated and stored. The BAC inserts are isolated and mapped to determine the order of each cloned 150 kb fragment. This is referred to as the Golden Tiling Path. Each BAC fragment in the Golden Path is fragmented randomly into smaller pieces, and each piece is cloned into a plasmid and sequenced on both strands. These sequences are aligned so that identical sequences overlap. These contiguous pieces are then assembled into finished sequence once each strand has been sequenced about 4 times to produce 8X coverage of high quality data [140]. (Pyfrom)

High Throughput Biology (Sequencing, Genomics, etc) - Method of biology which utilizes new technologies to collect and analyze large volumes of data through biochemical manipulations of large numbers of samples 1 (Lexi)

HMM Logo - a graphical representation of an HMM, detailing the possible amino acid sequences, the relative frequencies and probabilities of each amino acid in the sequence, the relative contribution each amino acid has to the overall protein family, and the charge or nature of the amino acids themselves. (How to read HMM Logos, on Pfam, Laura)

homeobox - DNA sequence within transcription factor genes that allow the cell to respond to patterns of development by having the transcription factors switch on gene cascades [141](Samantha)

homodimer - a protein made of paired identical polypeptides (Jay)

Homolog - Protein or gene that is derived from a common ancestor (Lecture; Wikipedia article) (Puneet)

horizontal gene transfer-DNA transmission between species and incorporation of the DNA into the recipient's genome (horizontal gene transfer Pallavi)

Hox gene-a gene that contains a homeobox region that is involved in morphogenesis along the cranio-caudal body axis (4 Pallavi)

HMP - Human Microbiome Project; a genome initiative that aims to identify and characterize the microorganisms that inhabit the human body. ([142] TK)

hydrolase - an enzyme that catalyzes hydrolysis, the cleavage of a chemical bond by the addition of water; the resulting products often take part in subsequent reactions [143] (Nick)

Hydropathy analysis - This method determines the hydrophobic nature of an amino acid sequence. It uses a window moving through the sequence, summing the Gibbs free energy values for each amino acid and running these values through programs to determine hydrophobic segments. [144] With respect to halophiles, there is evidence to suggest that protein stability, in some cases, may be dependent upon high salt concentrations, and since the hydrophobic nature of proteins increases stability, it is important to be able to measure stability in terms of hydropathy [145] (Katie)

hypothetical protein - A hypothetical protein is a protein predicted from a gene in a genome. It has a predicted function, but this function has not been experimentally tested or proved. The predicted function is determined by the protein's structural similarities to proteins of known function as well as the protein's sequence makeup. It has no analogs in the protein database. (Web Definitions, Claudia)


inducer - a molecule that amplifies gene expression. ([146], Leland)

ideogram - in genomics, usually describes a stylized representation of a chromosome with banding patterns (Campbell-Heyer Genomics textbook, Jay)

identities - in a BLAST output, the number and fraction of total residues which are identical in a given alignment (Mary)

Illumina sequencing - Illumina instruments amplify DNA fragments in situ on a flow cell. Fragment colonies are dispersed on the flow cell at a low concentration at first, allowing for non-overlapping fragment colonies. Clusters are promoted by isothermal bridging amplification. The amplification increases the density of these colonies. Fluorescently labeled nucleotides are cyclically washed over the flow cell. These nucleotides are conjugated with reversible terminators so that the four nucleotide bases can be simultaneously incorporated base by base across the flow cell. Laser-induced excitation of the cell allows imaging of the excited fluorophores. The use of a flow cell and reversible terminator allows the Illumina Genome Analyzer to produce 600 Mb of DNA per day with only 36 bp reads. The tradeoff between pyrosequencing methods and the flow cell method is increased throughput for shorter reads. The raw accuracy of the Illumina Genome Analyzer is over 98.5%. Increased coverage is necessary when using sequencers with high raw error rates. [147] [148] (Jared)

immunoprecipitation - the technique of precipitating a protein out of solution using an antibody that specifically binds to that particular protein. This process can be used to isolate and concentrate a particular protein from a sample containing many thousands of different proteins [149]. (Pyfrom)

imprinting - a genetic phenomenon by which certain genes are expressed depending on the parent of origin. For the vast majority of autosomal genes, expression occurs from both alleles simultaneously. However a small proportion of genes are imprinted, meaning that gene expression occurs from only one allele (which came from a specific parent). For example, in humans, the gene encoding Insulin-like growth factor 2 (IGF2/Igf2) is only expressed from the allele inherited from the father. ([150] Mike)

indel - term used to describe insertions or deletions within a genome. Since an insertion in one genome is a deletion in another, "indel" is a catch-all term coined to remove the relative subjectivity of classifying a mutation as either an insertion or a deletion (Lecture, Pyfrom).

indole - a chemical compound that is produced from the breakdown of tryptophan (indole, Pallavi)

inclusion body - Inclusion bodies are collections of stainable substances, usually proteins, that are found either in the nucleus or the cytoplasm. It is thought that these bodies are often the result of viral proteins that misfolded [151] (Nick)

in situ hybridization - a process that detects a DNA sequence in a bacterial or eukaryotic cell from the binding of a complementary DNA or RNA probe. Fluorescence in situ hybridization (FISH) involves fluorescently labeling the DNA or RNA probe, which can then be used to map the presence of specific genes or DNA regions. Pevsner, 2009 McEntyre and Ostell, 2002 (Austin)

integrated microbial genome (IMG) system - a data management and analysis tool covering genomes, genes, and functions for microbial genomes of all three domains of life ([152], Catherine Doyle)

intergenic distance - The distance (in base pairs) between genes wikipedia (Karen)

intron - a region of DNA in a gene that is not part of the final coding sequence for the protein. [153] (Peter)

ion torrent - a high-throughput DNA sequencing technology. A plate of pH sensors is placed under a well array containing DNA and all machinery required for replication. Each well is given a small amount of one nucleotide type. If the nucleotide is added to the DNA, a proton is released as a natural byproduct. The change in pH is detected and recorded. If the nucleotide is repeated in the DNA sequence and multiple bases of the same nucleotide are added, the resulting change in pH is greater and recorded as a larger pH shift. Because each well is independently monitored, they can contain different strands of DNA. Thus, the parallel processing capabilities for this DNA sequencing method are massive. [154] (Aaron)

IS elements - (insertion sequence element) sequences of DNA that can transpose to new positions in the genome. This can cause disruptions in other gene coding regions and major reorganizations of the genome Baliga et al., 2004 (Karen)

isoelectric point - the pH at which a molecule carries no net electrical charge [155] (Nick)

Isoprenoid lipids - lipids made from five-carbon isoprene units, also known as isoterpene units; isoprene is the organic compound CH2=C(CH3)CH=CH2 [156]. In archaea, the side chains of phospholipids are built from isoprene instead of fatty acids, making them isoprenoid lipids [157]. (Katie)

isozymes - members of a gene family with very similar cellular roles (Campbell-Heyer Genomics textbook, Jay)


Junk DNA - sections of DNA that do not code for genes, or a label for stretches of DNA for which no function has been identified. Non-coding DNA is often referred to as "junk DNA." [158] (Megan)


karyotype – a magnified photograph of an organism’s chromosomes, detailing the quantity and magnitude of each chromosome. At a rudimentary level, karyotypes can be used to detect large parental chromosomal recombination, which can potentially result in disease. Pevsner, 2009 Image courtesy of the National Human Genome Research Institute (Austin)

KEGG (Kyoto Encyclopedia of Genes and Genomes) - a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals. The Pathway database records networks of molecular interactions in the cells, and variants of them specific to particular organisms [159](Will).

kinase - a type of enzyme that transfers a phosphate group from a high-energy donor molecule to a target molecule in a process called phosphorylation. [160] (Peter)

Kozak consensus sequence - a sequence present in eukaryotic mRNA, beginning upstream of the start codon, that plays a major role in the initial binding of mRNA to the ribosomes that carry out translation. [161] (Lauren)

Kyte Doolittle Hydropathy plot - a plot used to determine the hydrophobic character of an amino acid sequence. Peaks higher than 1.6 on the plot suggest that the sequence in question contains hydrophobic regions and is possibly localized within or around a membrane. Peaks lower than 1.6 suggest that the amino acid sequence does not have a membrane-spanning domain. [162] (Lauren)
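
A Kyte-Doolittle profile is a sliding-window average of per-residue hydropathy values. The sketch below uses the published Kyte-Doolittle scale; the function names, the default window size, and the toy peptide are assumptions for illustration (a window of about 19 residues is the usual choice when looking for membrane-spanning segments).

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    scores = [KD_SCALE[residue] for residue in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def has_hydrophobic_peak(seq, window=19, threshold=1.6):
    """True if any window average exceeds the threshold."""
    return any(value > threshold for value in hydropathy_profile(seq, window))


if __name__ == "__main__":
    # Made-up peptide: a leucine/isoleucine-rich core flanked by charged residues.
    toy = "KKDE" + "LILVLLAIVLLGIAFLLVA" + "RRKE"
    print(has_hydrophobic_peak(toy))
```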


lateral gene transfer - see "horizontal gene transfer" (Pallavi)

leucine-rich repeat - a protein motif that consists of repeating sections of 20-30 amino acids that contain a high number of leucines (a highly hydrophobic amino acid) that fold to form an α/β horseshoe; this region of the protein is often associated with protein-protein interactions [163] (Mark A.)

lignin - a complex phenolic polymer (not a protein) found in the cell wall of plants. It is important in the stiffness and strength of the plant stem. It also makes the cell wall waterproof, allowing transport of water and solutes through the vascular system. [164] (Laura M.)

linkage groups- Genes that are often inherited as a single unit are said to form a linkage group and share an extremely low rate of recombination. ([165], Shamita)

Liposome - a microscopic fluid-filled vesicle whose phospholipid walls are identical to those of the cell membrane. Liposomes are often used as models for artificial cell membranes, which is useful for studying the uniqueness of archaeal membranes outside of the archaeal organism, and for drug delivery [1] (Katie).

long terminal repeat (LTR) retrotransposons - retrotransposons whose coding region is flanked by repeating sequences of DNA that range from 100 bp to several kb long; can be associated with specific genes [166] (Mark A.)


Manatee - a web-based gene evaluation and genome annotation tool that can view, modify, and store annotation for prokaryotic and eukaryotic genomes. This ongoing, open-source initiative was developed with two missions: first, to give biologists the ability to functionally annotate their genomes using a powerful, stand-alone web application with a robustly designed relational annotation database; and second, to give outside developers the opportunity to contribute their own ideas and requirements to enhance Manatee's ability to accomplish biological goals [167](Will).

mapping bin - a single region of a chromosomal reference map used in mapping studies. Bins are defined in relation to molecular markers (e.g. SSRs). [168] [169] (Aaron)

marker assisted selection - a process whereby a marker, in our case genetic, is used for indirect selection of a genetic determinant or determinants of a trait of interest. ( [170] Mike)

metabolism - chemical reactions organisms utilize in order to maintain life. Metabolism can be constructive such as anabolism in which energy is used to create cell components like protein, or it can be destructive such as catabolism where a substance such as sugar is systematically broken down in order to harvest energy for the organism. Wikipedia (Karen)

methylation - the addition of methyl groups to DNA. When DNA is methylated, proteins (like transcription factors) can no longer bind to it. This is important to genomics because methylation is a way to activate or inactivate genes throughout the genome. A methylome is a complete description of the methylation status of a genome. (Discovering Genomics, Proteomics, & Bioinformatics pg 57, Leland)

metabolome - The complete set of small molecule metabolites (e.g. intermediates, products, etc.) found within an organism. The metabolome gives one an idea of the mechanisms underlying various metabolic pathways in an organism [171] (Puneet)

Metacyc - A database of metabolic pathways similar to KEGG. It can also be used to search for compounds, genes/proteins/RNAs, and reactions. [172] (Daniel)

microsatellites-stretches of repetitive, short DNA segments that can be used to track the inheritance of certain traits within families (3 Pallavi)

minisatellites-segments of DNA that can be used for individual identification (ex. DNA fingerprinting) or in determining relationships between people (ex. paternity cases) (2 Pallavi).

monocotyledon - a group of flowering plants that has one seed-leaf (cotyledon). In most, the leaf veins are parallel, and the vessels in the stem are scattered. [173] (Laura M.)

monosomy - only one copy of a chromosome is present instead of two (typically found in pairs, ex. humans). [174] (William G.)

mosaicism - the presence of two or more genetically different populations of cells that originated from the same zygote. Earliest examples involved the transplantation of a blastula stage embryo from one genetic background into another of a different genetic background. This allowed for expanding study of genes early in development. [175] (William G.)

motif - a sequence of amino acids or nucleotides that performs a particular role and is often conserved in other species or molecules. (Discovery Genomics, Proteomics and Bioinformatics[176], Max Win)

mycoplasma - genus of bacteria that lack a cell wall [177] (Nick)

Myb transcription factors - a family of proteins that regulate gene expression within the cell by binding directly to DNA. Absence of Myb factors has been shown to cause various types of cancer by inhibiting cell division. Myb proteins are identified by a number of imperfect tandem repeats known as the "Myb domain" which serve to identify where the protein binds to the DNA. Myb factors have been linked to various flavonoid pathways within plants. [178] (Dylan)


Nanopore sequencing - a sequencing technology that measures changes in electric current when a single nucleotide passes through a nanopore, a piece of silicon containing a 1 nanometer hole. This process is still undergoing research for commercial viability. (Wikipedia, Austin)

NextGen sequencing - a newer sequencing technology in which small fragments of DNA are identified from signals emitted as each fragment is re-synthesized from a DNA template strand. ([179], Catherine Doyle)

NCBI - (The National Center for Biotechnology Information) is a division of the National Library of Medicine (NLM) in the National Institutes of Health (NIH). This organization seeks to develop and make available information technologies for use in discovering and deciphering the fundamental molecular and genetic processes affecting health and disease. (NCBI Claudia)

Nhx - Family of antiporter proteins in plants responsible for regulating intracellular pH. One member of the family, Nhx1, is a Na+/H+ antiporter. 1 (Lexi)

NORFs (nonannotated open reading frames) - an open reading frame that was considered not to be a real gene when the genome was annotated. (Discovery Genomics, Proteomics and Bioinformatics[180], Max Win)

nucleolar organizer - the region of a chromosome around which the nucleolus forms after cell division. It contains tandem repeats of rRNA genes, which are transcribed, processed and formed into ribosomes (with the addition of ribosomal proteins) in the nucleolus. [181] [182] (Laura M.)

nucleomorph - reduced eukaryotic nuclei found in plastids [183](Samantha)

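N50 - a length-weighted median of contig (or supercontig) sizes: the length such that contigs of that length or longer contain at least half of all assembled base pairs; NOTE: the "50" is normally in superscript [184] (Lecture, Mark A.)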
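
The N50 calculation can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the function name `n50` and the example contig lengths are hypothetical, and it uses the common convention of sorting contigs from longest to shortest and reporting the length at which the running total first reaches half of the assembly.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (in base pairs)."""
    lengths = sorted(contig_lengths, reverse=True)  # longest contig first
    half_total = sum(lengths) / 2                   # half of all assembled bases
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            # Contigs at least this long hold half (or more) of the assembly.
            return length
    raise ValueError("contig_lengths must not be empty")


# Hypothetical contig sizes, for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))
```

Note that the result depends on total assembled length, so N50 values are only comparable between assemblies of similar size.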


object-oriented programming - a programming paradigm in which collections of data, associated with operations on that data, are modularly defined and then built upon (CSC 121 Lecture, Will).

oligonucleotide - a short nucleic acid sequence (typically 50 or fewer bases) that is used as a DNA synthesis primer. They are formed from individual nucleotides to allow creation of any sequence necessary. Oligonucleotides are used in a number of procedures, including DNA microarrays, Southern blots, ASO analysis, fluorescent in situ hybridization (FISH), and the synthesis of artificial genes. ([185] Dylan)

ohnology - paralogous genes originating from a whole genome duplication. These genes are important to genomic analysis because they provide a series of genes that have all been diverging for the same amount of time since the duplication event. ([186] Dylan)

OLAP - Online Analytical Processing is used to answer multi-dimensional analytical queries. OLAP gathers data in a hypercube, from which three different operations can be carried out by the user: consolidation (aggregation of data in one or more dimensions), drill-down (analyzing the details of each component part), and slicing and dicing (analyzing a particular set of data from different viewpoints). [187] (Chadinha)

Online Mendelian Inheritance in Man (OMIM) - a database detailing the genetic elements of all known diseases. On the whole, the database encompasses more than 12,000 genes. OMIM (Austin)

open reading frame (ORF) - a segment of DNA that can potentially encode a protein; it begins with a start codon (usually ATG) and ends at an in-frame stop codon. ORF (Pallavi)
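
A simple ORF scan can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `find_orfs` and the `min_codons` parameter are hypothetical, only the forward strand is scanned, only ATG is treated as a start codon, and the three standard stop codons (TAA, TAG, TGA) are assumed. Real gene finders also scan the reverse complement and use organism-specific signals.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for forward-strand ORFs.

    start/end are 0-based, end-exclusive positions in dna.
    min_codons counts the start and stop codons as well.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):                      # three forward reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # first start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None                    # look for the next ORF in frame
    return orfs


print(find_orfs("CCATGAAATGATT"))
```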

operon - a segment of DNA involving an operator, promoter, and one or more genes that operate as a single unit during transcription [188] (Nick)

opsin - In eukarya, this is a group of light sensitive G protein-coupled receptors often found in the retina. In prokaryotes, opsins are used to fix carbon by harvesting energy from light. Additionally, these receptors are independent of any chlorophyll pathway Wikipedia (Karen)

optical mapping-DNA sequences of the organism in question are compared against a karyotype that specifically looks at restriction sites found within the DNA to correctly order the DNA sequences on a chromosome. This methodology gives very detailed haplotype information and allows for the detection of sequence variations across an entire genome optical mapping (Pallavi)

origin of replication - the sequence in a genome where DNA replication (in eukaryotes and prokaryotes) or RNA replication (in RNA viruses) is initiated. In eukaryotes there are multiple origins of replication, which speeds up replication within the cell. ([189], Lauren)

Origin Recognition Complex Subunit - 6-subunit DNA binding complex that binds in an ATP-dependent manner to the origin of replication. ([190], Catherine Doyle )

ortholog - one within a group of DNA sequences each found in separate genomes that look very similar. Orthologs may have an evolutionary relationship, but the term itself does not imply the presence or absence of one. (Lecture, Olivia)

osmolyte/osmoprotectant - Any compound that protects cells from desiccation by maintaining a high intracellular osmolality (osmotic concentration)[191](Erich)

oxidoreductase - an enzyme that catalyzes redox reactions by transferring electrons from one molecule (the reductant) to another (the oxidant) [192] (Nick)


polymerase chain reaction (PCR) - A technique used to amplify specific segments of DNA. The technique can be used to detect and amplify trace amounts of DNA into millions of copies. In a genomics setting, PCR has been adapted to quickly identify the species of an organism by using species-specific primers. ([193] and Discovering Genomics, Proteomics, & Bioinformatics pg 146, Leland)

penetrance - refers to varying degrees of phenotypic expression of a gene. A gene with high penetrance always expresses the same phenotype. ([194], Leland)

paired ends- also known as mate ends, paired ends are two ends of the same DNA molecule (Lecture) (Stewart Dalton)

paralog - one of a set of very similar (not necessarily identical) DNA sequences found within a single species (Lecture, Pallavi)

p-arm - the shorter arm of a chromosome's two arms separated by the centromere (compare to q-arm, the longer arm) (MedTerms Dictionary, Jay)

PDB - A database of proteins whose 3D structure has been ascertained. Includes various info about the structure and function of a protein. It's a great resource to check if you need more information about a protein. [195] (Erich)

pectin - a polysaccharide found in and between the cell walls of plants, which helps to keep cells rigid by regulating water flow between cells. It functions as a gelling agent in making fruit jellies and jams. [196] (Laura M.)

peptidyl transferase - an enzymatic part of the ribosome that catalyzes the formation of peptide bonds between amino acids during translation. Peptidyl transferase activity is carried out by rRNA in the large subunit (60S in eukaryotes) of the ribosome. [197] [198] (Laura M.)

Perl - Developed by Larry Wall in 1987, Perl is a high-level programming language used frequently by biologists and bioinformaticists [199] (Will).

periplasmic space - the space between the inner cytoplasmic membrane and external outer membrane in bacteria or archaea. [200] (Peter)

Pfam - a database for protein domain families that matches amino acid sequences or nucleotide sequences to the related group of proteins to which they most likely belong. (Pfam Help, Laura)

pharmacogenomics - how inherited genetic variations and the resulting genomic interactions alter the intended effects and side effects of drugs. Discovering Genomics, Proteomics, & Bioinformatics pg 333 (Jared)

phenylpropanoids - Plant-derived organic compounds derived from the amino acid phenylalanine. Phenylpropanoids are involved in a variety of essential functions such as plant defense, plant pollinator reactions, etc. [201] They potentially may be related to dietary health benefits seen in blueberries, as well. (Puneet)

phyletic pattern - a series of binary sequences from multiple sequenced genomes that detail specific genes. A “1” means that the particular gene is present in the genome, whereas a “0” means that the gene is absent. Mushegian, 2007 (Austin)

phylogenetic tree - a diagram showing the evolutionary relationships between biological species that are thought to share a common ancestor [202] (Nick)

phylotypes – a term intended to resolve the challenge of “species” when classifying prokaryotes using DNA sequence comparisons. (Discovery Genomics, Proteomics and Bioinformatics[203], Max Win)

phytanyl lipids - Organically, a phytanyl is a branched-chain hydrocarbon containing 20 carbon atoms [204]. Phytanyl lipids are often found in the membrane of archaea and are thought to contribute to increased membrane stability at high salt concentrations [van de Vossenberg et al. Extremophiles (1999) 3:253-257]. (Katie)

phytochrome - a pigment that acts as a photoreceptor that triggers a response or signaling cascade in many plants and bacterial organisms as well as some animals. It is made up of a chromophore, or a compound that absorbs visible light, which is bound to a protein. Phytochrome is one of the most intensely colored pigments found in nature. This intense pigmentation allows the organism to sense even dim light. (Ecomii, Phytochrome Claudia)

plasmid - an extra-chromosomal DNA molecule that is capable of replicating independently of the chromosomal DNA. Commonly found in bacteria and archaea. [205](Peter)

plastid - major organelles in plants or algae [206](Samantha)

pleiotropy - the phenomenon in which a single gene influences many different physical traits, such as multiple disease symptoms. [207] (William G.)

polycistronic mRNA - mRNA that, when translated, can lead to multiple gene products due to their multiple ORFs with short untranslated regions between them [208] (Mark A.)

pleomorphism - the occurrence of two or more structural forms during a life cycle [209] (Mary)

polymorphism - A type of genetic variation that occurs at the same locus between individuals of the same species. The variants produced by a polymorphism constitute different alleles of that locus. Some examples of common polymorphisms include SNPs (single nucleotide polymorphisms) and RFLPs (restriction fragment length polymorphisms). ([210], Shamita)

polyploid - cells and organisms containing more than two homologous sets of chromosomes. ([211], Daniel)

Populus trichocarpa - Also known as the California poplar, Populus is a deciduous broadleaf tree species often used as a model organism in plant biology. Its genome was published in 2006. [212] (Puneet)

positives - in a BLAST output, the number and fraction of residues for which the alignment scores have positive rather than negative values [213] (Mary)

primer - A short oligonucleotide that provides a free 3’ hydroxyl binding site for DNA or RNA polymerase in order to initiate DNA or RNA synthesis ( [214] [215] Mike)

promoter - a region of DNA that facilitates transcription of a gene; promoters are typically located closely upstream of the gene they regulate [216] (Megan)

proteasome - A cellular protein complex responsible for the degradation of unneeded or damaged proteins. Usually recycles proteins tagged with ubiquitin, breaking their peptide bonds and cleaving them into short 7-8 amino acid bits. [217] (Erich)

proteome - entire set of proteins expressed by a genome, cell, tissue, or organism. It may refer to expressed proteins under certain conditions [218](Samantha)

protein mass fingerprinting (PMF)- a method to identify observed proteins in electrophoresis gels by matching masses of peptides contained in proteins to those of known proteins in a database.([219]; Discovery Genomics, Proteomics and Bioinformatics[220], Catherine Doyle)

proteogenomics - an emerging method of genome annotation that utilizes proteomic techniques such as mass spectrometry to better understand and annotate genes. ([221] TK)

proton pump - an integral membrane protein capable of transporting protons across a membrane. Mitochondria utilize proton pumps in order to create a proton gradient used for producing ATP. Wikipedia (Karen)

PSORT - a prediction server that judges where a mature protein could be in the cell, based on its transmembrane domains, its predicted mature amino acid composition, and its signal sequences. (PSORT, Laura)

pseudogenes-A sequence of DNA that looks like a gene, but most likely contains many stop codons. It may have evolved away from a real gene or a paralog might have taken its place (Lecture, Pallavi)

pseudochromosome - a chromosome comprised of contigs from a genome whose sequence is unfinished. [222] (Aaron)

purine - a category of nitrogenous base consisting of a pyrimidine ring fused to an imidazole ring. Notable purine bases are adenine and guanine. [223] (Peter)

p-value - probability associated with a statistical test of the difference between populations. Populations are considered significantly different if the associated p-value is small (typically 0.05 or smaller). (Discovering Genomics, Proteomics and Bioinformatics[224], Pyfrom)

pyrimidine - a category of nitrogenous base consisting of a heterocyclic aromatic ring containing two nitrogen atoms at positions 1 and 3 of the six-member ring. Notable pyrimidine bases are cytosine, thymine, and uracil. [225] (Peter)

pyrosequencing - [Figure: Pyro.jpg, diagram of the pyrosequencing process] (image from [226]) (Jared)

pyrogram trace- The final step in the DNA sequencing technique of pyrosequencing. In this step, the nucleotide sequence is identified by the signal peaks generated by the pyrogram trace program. [227] (Stewart Dalton)


q-arm - the longer arm of a chromosome's two arms separated by the centromere (compare to p-arm, the shorter arm) (MedTerms Dictionary, Jay)

Quantitative polymerase chain reaction (Q-PCR) - A method that serves to amplify and quantify the amount of a DNA sequence in a sample. There are many variations of the method, but in Q-PCR, DNA polymerase produces a complementary DNA strand that binds to the template. Every time a replication event occurs on a specific sequence, a unit of fluorescence specific to that fragment is produced. The intensity of fluorescence is detected, which allows us to determine the amount of a specific sequence of DNA within a sample. (USCM Webpage, Shamita)

quantitative trait loci (QTL) - multiple loci that together affect a trait that can be quantified phenotypically; the trait varies in degree depending on the loci involved (Campbell & Heyer, 2007, Shamita)

query sequence - the sequence (whether amino acid or nucleotide) entered into a database’s search function and checked against the database entries. (BLAST on Wikipedia, Laura)


RAST - (Rapid Annotation using Subsystem Technology)- a fully-automated service for annotating bacterial and archaeal genomes. It provides high quality genome annotations for these genomes across the whole phylogenetic tree. ([228], Max Win)

rDNA-These are DNA sequences that encode for ribosomal RNA. Note that rDNA can also stand for recombinant DNA. (rDNA Pallavi)

reference genome - a genome that represents a standard for a species' genome, but it is not necessarily a "normal" example. The reference genome is used as a common point for comparisons among the variations that exist within the population. (Campbell & Heyer, 2007 [229] Mike)

refseq - Short for reference sequence. An NCBI project to create a database with a reference sequence for every 'central dogma' molecule - DNA, RNA, and proteins. [230](Erich).

replicon - a region of DNA or RNA that replicates from a single origin of replication [231] (Megan)

repressor - a protein that binds to a section of DNA in order to regulate one or more genes by decreasing the rate of transcription [232] (Megan)

residue (protein) - the remaining portion of an amino acid after a water molecule has been removed and it has been incorporated into a protein. Functional residues, referred to in Pfam, are the residues that perform some specific identifiable function or are part of a domain, and can be conserved across evolutionarily-related proteins. (Pfam Help, Laura)

resistance (R) genes - specific genes in a plant that, when expressed, increase resistance to certain pathogens; the product of the R gene recognizes the product of the pathogen's specific avirulence (Avr) gene, initiating a defense cascade [233] (Mark A.)

restriction enzyme - a protein that is commonly used in PCR and electrophoresis that cuts DNA at specific nucleotide sequence sites. Depending on the restriction enzyme and sequence selected, the number of restriction-enzyme cutting sites can vary from once every hundred base pairs to once every hundred thousand base pairs. Pevsner, 2009 (Austin)
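
Locating cut sites and predicting fragment sizes can be sketched as below. This is an illustrative example rather than a lab protocol: the function names `find_sites` and `fragment_lengths` and the test sequence are hypothetical, the molecule is assumed to be linear, and the default values use the EcoRI recognition site GAATTC, which is cut after the first G.

```python
def find_sites(dna, site="GAATTC"):
    """Return 0-based start positions of every occurrence of site in dna."""
    dna, site = dna.upper(), site.upper()
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)   # allow overlapping occurrences
    return positions


def fragment_lengths(dna, site="GAATTC", cut_offset=1):
    """Return fragment sizes after cutting a linear molecule at every site.

    cut_offset is the number of bases from the start of the site to the
    cut on the top strand (1 for EcoRI, which cuts G^AATTC).
    """
    cuts = [p + cut_offset for p in find_sites(dna, site)]
    bounds = [0] + cuts + [len(dna)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


example = "AAGAATTCTTTTGAATTCGG"   # hypothetical sequence with two EcoRI sites
print(find_sites(example))
print(fragment_lengths(example))
```

Differences in these fragment lengths between individuals are what an RFLP analysis detects.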

Resveratrol - part of the stilbene family, a polyphenol compound found in grapes, blueberries, and other foods that has been shown to have cancer-preventive, antioxidant, antimutagenic, and anti-inflammatory activity. [234] (Lauren)

retinal - vitamin A aldehyde; a chromophore (colour-producing molecule) that is bound to proteins called opsins. For example, Haloarcula and other halophilic archaea have a light-driven proton pump such as bacteriorhodopsin. This pump contains a reddish-purple retinal that absorbs green visible light. (Wikipedia, Olivia)

retropseudogenes-these are genes that have been reverse-transcribed from mRNA and the resulting DNA sequence is incorporated back into the genome. They are non-functional segments of DNA and can be distinguished from pseudogenes in that they do not have intron sequences. (1 Pallavi)

retrotransposons - RNA transcribed back into DNA and added into the genome [235](Samantha)

Reverse Genetics - determining function from a known gene sequence. Traditional genetics works in the opposite direction. [236] (David)

RFLP - Restriction fragment length polymorphism. A type of polymorphism detectable in a genome by the size differences in DNA fragments generated by restriction enzyme analysis. [237] (Erich)

ribonuclease - a nuclease that catalyzes the degradation of RNA into smaller components [238] (Mary)

ribosome binding site (RBS) - short purine-rich sequence found directly (4-8 bp) upstream of the start codon of a protein coding sequence to which ribosomes bind to begin translation. The RBS sequence tends to be species-specific, and the consensus sequence acts as a good indicator of the start site of a gene (Bakke et al 2009 and Lecture, Olivia)

riboswitch - a 5' portion of mRNA that can regulate its own transcription. [239] (David)

ribozyme - an RNA molecule that acts as an enzyme to catalyze a reaction. Some ribozymes can catalyze self-splicing by folding in order to remove introns without the need for a protein. (Lecture, Olivia)

RNA (Ribonucleic Acid) - A category of nucleic acids in which the component sugar is ribose and which is built from four nucleotides containing the bases cytosine, uracil, guanine, and adenine. Three major types of RNA are messenger RNA (mRNA), transfer RNA (tRNA) and ribosomal RNA (rRNA). RNAs are essential to all known forms of life. ([240] [241] Mike)

RNAi (RNA interference) - a process by which short pieces of RNA are used to degrade larger pieces of complementary RNA. It is found in many eukaryotes and is being considered as a possible approach for gene therapy where a reduced gene product would alleviate symptoms [242]. (Pyfrom)

RNA polymerase I - an enzyme in eukaryotic organisms that transcribes the 45S pre-rRNA, which is processed to form the 28S, 18S, and 5.8S rRNA molecules. These forms of RNA account for over 50% of the RNA synthesized in a typical cell. [243] [244] (Laura M.)

RNaseP - a ribozyme that cleaves off a precursor section of RNA from a tRNA molecule. Previously, it was thought that this gene was necessary for life and therefore ubiquitous. However, species of archaea have been discovered that have adapted to life without this ribozyme. Wikipedia; Life without RNaseP (Karen)


SAGE - Serial Analysis of Gene Expression. A technique for identifying and quantifying mRNA transcripts from eukaryotic genomes. This method is based on isolating and amplifying short sequence tags (~14 bp) from individual mRNAs into longer DNA molecules that are subsequently sequenced. A tag’s gene origin is determined via mapping of the tag to a reference genome. [245] (Erich)

Serovar-a subdivision of a species based on the characteristics of their cell surface antigens (serovar Pallavi)

sequence tag site (STS) - A sequence-tagged site (or STS) is a short (200 to 500 base pair) DNA sequence that has a single occurrence in the genome and whose location and base sequence are known [246]. (Pyfrom)

scaffold - a section of a sequenced genome composed of contigs that are in the right order but not necessarily connected (MedTerms Dictionary, Jay)

Section - A taxonomic term analogous to subgenus. Highbush blueberry belongs to the Cyanococcus section of Vaccinium (Personal Communication, Grant Proposal). (Lexi)

Shadow enhancers - secondary enhancers that are thought to be important for natural selection to occur in regulatory DNA segments. They evolve much faster than primary enhancers, which suggests that they are under fewer functional constraints (Wray and Babbit 2008 Science Pallavi)

Shine-Dalgarno sequence - A ribosomal binding site on an mRNA, usually a sequence of about six bases located six or seven bases upstream of the start codon. An anti-Shine-Dalgarno sequence exists on the rRNA in the small subunit of the ribosome; when the two sequences pair, the mRNA is lined up and prepared for translation. (Lecture and Wikipedia article, Laura)
Note: The Shine-Dalgarno consensus sequence for our genome is ccGGAGGt.
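
Checking a candidate start codon for an upstream Shine-Dalgarno site can be sketched as below. This is a hypothetical helper, not the class's annotation pipeline: the function name `sd_spacing`, the 20-base search window, and the example sequence are assumptions, and only the upper-case core GGAGG of the consensus noted above is matched exactly.

```python
def sd_spacing(dna, start_codon_index, core="GGAGG", window=20):
    """Return the number of bases between the core motif and the start codon.

    Searches the window bases immediately upstream of start_codon_index for
    the motif closest to the start codon; returns None if it is absent.
    """
    upstream = dna[max(0, start_codon_index - window):start_codon_index].upper()
    pos = upstream.rfind(core)              # occurrence nearest the start codon
    if pos == -1:
        return None
    return len(upstream) - (pos + len(core))


# Hypothetical sequence: the core motif followed by a 7-base spacer and ATG.
example = "TTTCCGGAGGTAACGTTATGAAA"
print(sd_spacing(example, example.find("ATG")))
```

A spacing close to the six or seven bases mentioned in the definition supports the start-codon call; a missing motif or a very different spacing suggests looking at alternative start codons.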

SignalP - a prediction server that judges whether or not a query protein contains a signal peptide. SignalP measures each amino acid against the amino acid sequences of probable signal peptide matches and predicts the cleavage site of the signal peptide. (SignalP Output explained, Laura)

signal peptide - a short peptide chain that directs the post-translational transport of a protein [247] (Matt)

simple sequence repeat (SSR) - short, repetitive fragments of DNA that display a polymorphism in length, giving rise to allele variation in SSRs between individuals within a species. Also see microsatellite.(Soybean and Alfalfa Research Lab Shamita)
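
Candidate SSRs can be located with a regular expression, as sketched below. This is an illustration only: the function name `find_ssrs`, the thresholds (repeat units of 2 to 6 bases, at least 4 tandem copies), and the example sequence are assumptions, and dedicated SSR tools apply more careful rules for imperfect and compound repeats.

```python
import re


def find_ssrs(dna, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, repeat_count) for perfect tandem repeats in dna."""
    # Group 1 is the repeat unit (shortest unit tried first); the
    # backreference \1 must then match at least min_repeats - 1 more copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(dna.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue                        # skip single-base runs such as AAAA
        repeats = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, repeats))
    return hits


# Hypothetical sequence containing (AG)5 and (TTC)4.
example = "TT" + "AG" * 5 + "CGA" + "TTC" * 4 + "GG"
print(find_ssrs(example))
```

Primers for genotyping an SSR would then be designed in the unique sequence flanking each reported position.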

singleton - a segment of DNA with no overlapping sequences so it cannot be connected to other segments. [248] (Aaron)

small nuclear ribonucleic acid (snRNA) - small RNA molecules found in the nucleus of eukaryotic cells. They combine with specific proteins (called Sm proteins) to form ribonucleoprotein complexes (snRNPs), which function in removal of introns during RNA splicing. [249] (Laura M.)

Smith-Waterman alignment - A well-known algorithm for determining similar regions between two nucleotide or protein sequences. Instead of looking at the total sequence, the Smith-Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure [250](Will).
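
A compact version of the algorithm can be sketched as follows. The scoring values (match +2, mismatch -1, linear gap penalty -1) and the function name `smith_waterman` are assumptions chosen for illustration; production tools use substitution matrices and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]   # scoring matrix, first row/col = 0
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never drop below zero, which is what makes it local.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("ACGT", "ACT"))
```

Because every cell of the matrix is filled, the running time grows with the product of the two sequence lengths, which is why heuristic tools such as BLAST are used for database-scale searches.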

SNP (Single Nucleotide Polymorphism) - a DNA sequence variation occurring when a single nucleotide in the genome (or other shared sequence) differs between members of a species (or between paired chromosomes in an individual) [251](Will).

SOAPdenovo - a package of algorithms developed by BGI for short-read de novo assembly of human-sized genomes. [252] (Jared)

Solanum lycopersicum - Commonly referred to as the tomato, Solanum lycopersicum is an effective model system for testing the functionality of various genes through transformation e.g. via agrobacteria (lecture) (Puneet)

SOLiD - a high-throughput DNA sequencing technology. The DNA sample is cleaved into fragments of a specific length. The fragments are attached to beads, which are then covalently bound to a glass slide. DNA ligase, a universal primer, and a collection of fluorescent dinucleotide probes (all 16 possible nucleotide combinations) are introduced to the beads. The appropriate probe is ligated and fluorescence is measured. The fluorescent dye is cleaved and the next probe is added. This process is repeated in 5 reading frames, offset by one base. [253] (Aaron)

spliceosome - a spliceosome is a protein and snRNA complex that removes introns from RNA before translation. [254] (Chadinha)

Stilbenes - polyphenolic compounds that have been the focus of clinical research for cancer prevention. [4] One of the most commonly known stilbenes, resveratrol, has been shown to have anticancer properties and the ability to suppress proliferation of cancer cells. [255] (Lauren)

subject sequence - In BLAST, the sequences retrieved from the database, which are compared for similarity to the query sequence, are considered subject sequences. As a general rule, subject sequences should be longer than the query sequence. BLAST searching (Karen)

subtracted cDNA library - The genetic library that results from a comparison of two different expression conditions (i.e., two different tissues of an organism, two different species, or two different physical environments). The library is produced by gathering all expressed mRNAs from the two conditions and constructing cDNAs from those mRNAs. Then, each set of cDNAs is mixed with the mRNAs from the opposite expression condition to observe whether mRNA-cDNA complexes form. If some cDNAs from condition 1 fail to bind to the mRNAs from condition 2, it is assumed that those cDNAs are uniquely expressed in condition 1. The resulting unique cDNAs form a "subtracted" cDNA library. (PubMed: Subtracted cDNA Library, Shamita)

sucrose synthase - an enzyme essential to sucrose metabolism in fruits that catalyzes the reversible conversion of UDP-glucose and fructose into sucrose and UDP. Loss or reduction of sucrose synthase has been shown to reduce intracellular sugars and slow growth rates in fruits. [256] (Lauren)

supercontig - an as-yet uncommon term used to describe contigs with a known order in which gaps prevent the creation of a scaffold. ([257] and lecture) (Aaron)

symporter - an integral membrane protein that is involved in movement of two or more different molecules or ions across a phospholipid membrane. [258] (Peter)

syngenic - members of the same species that are genetically identical. [259] (William G.)

synteny - a neologism from the Greek for "on the same ribbon". Genes that are syntenic in one species are on the same chromosome; genes that are syntenic across species retain the same order on respective chromosomes as a result of descent from a common ancestor (Jay)

synthetase - a type of enzyme that creates a new covalent bond and requires direct input of energy from a high-energy phosphate. [260] (Peter)

Systems Biology - An emerging school of biology which utilizes high throughput data collection and analysis to study biological systems in a complex, integrated way that accounts for interactions within and among all levels of the system. The availability of full genome sequences has been crucial to the growth of this field. Institute for Systems Biology (Lexi)


t-test - a statistical test for determining the statistical relationship between experimental and hypothetical values. (Discovering Genomics, Proteomics, & Bioinformatics p. 434 David)

tandem array - a series of copies of a gene back-to-back on a chromosome. These genes are transcribed at the same time and ensure that many copies of the gene product are made by the cell. Ribosomal RNA genes are often in tandem arrays. [261] (Laura M.)

tannin - a polyphenol molecule found in nuts, coffee, and fruits such as pomegranates, grapes, blueberries and cranberries that aids in the ripening of fruit and the aging process of wine. [262] (Lauren)

TATA box - a DNA sequence often found in promoters of archaea and eukaryotes. Useful in identifying possible promoter regions, and thereby genes after these regions. ([263], Leland)

taxonomy identifier (taxID) - a unique numerical identification for each member of the taxonomy database, such as species, a genus, or a family. Homo sapiens, for instance, has the taxID 9606. McEntyre and Ostell, 2002 (Austin)

tBLASTn - a BLAST search (see BLAST) in which a protein sequence is entered and compared to the translated nucleotide database. [264] (Aaron)

tBLASTx - a BLAST search (see BLAST) in which a nucleotide sequence is entered, translated by the search engine in all six reading frames, and compared to the translated nucleotide database. [265] (Aaron)

terpenoid - a type of naturally occurring organic chemicals often used for their aromatic qualities. Terpenoids are also used in pharmaceuticals and flavoring. Several terpenoids are substrates for plant Cytochrome P450. ([266][267], Daniel)

Threshold Stimulation - the level of stimulation necessary to activate a toggle switch. (Discovering Genomics, Proteomics, & Bioinformatics p. 433 David)

transcription factors - proteins that bind to specific sequences of DNA and regulate transcription (and thus expression). In genomics this concept is important because it means more variation can be produced with fewer genes (different combinations can be on or off). ([268], Leland)

toxicogenomics - a subdiscipline of genomics that deals with gene and protein activity in order to determine how organisms respond to toxins in the environment. This has important implications for research concerning the effects of toxins on genetic material, and how that affects the organism in question (MedTerms, WebDefinitions Claudia).

transcriptome - the set of all mRNA molecules transcribed from a genome [269] (Megan)

transferase - an enzyme that catalyzes the transfer of a functional group from one molecule (the donor) to another (the acceptor) [270] (Matt)

transgenesis - The introduction of exogenous DNA into a cell. Typically, this term refers to the introduction of a gene into an embryo or other eukaryotic cell. [271] (Erich).

Transmembrane Domain - the portion of a membrane protein which passes through the phospholipid bilayer. This section is typically hydrophobic and approximately 20 amino acids in length. (Discovering Genomics, Proteomics, & Bioinformatics p. 433 David)

transmembrane helix - a single transmembrane alpha helix of a transmembrane protein, usually about twenty amino acids in length. They are usually predicted by hydrophobicity. [272](Mary)

transposons / transposable elements - DNA sequences that can move around to different positions in a single cell's genome. Transposons can cause mutations and change the length of the genome. [273](Samantha)

transposon mutagenesis - a procedure in which a transposon is inserted into a gene, which inactivates the gene and can lead to the discovery of the phenotype associated with this gene (transposon mutagenesis Pallavi)

trans-splicing - fragmented exon sequences fuse to form a mature species of mRNA. This process results in fusion mRNA (8 Pallavi).

tRNADB-CE - The tRNA gene database curated by experts is composed of 927 complete and 1301 draft genomes of Bacteria and Archaea, 171 complete virus genomes, 121 complete chloroplast genomes, 12 complete eukaryote (Plant and Fungi) genomes as of 2011. Inputs in this database were generated using tRNAscan-SE, a computer program widely used for tRNA gene searches, in combination with ARAGORN and tRNAfinder. [274](Puneet)

tRNA scan-SE - Supported by the Lowe lab, tRNA scan-SE is an online tool used to identify tRNA genes in DNA sequences. tRNA scan-SE can identify 99-100% of tRNA genes in a DNA sequence giving less than one false positive per 15 gigabases. [275] (Puneet)

tRNA splicing endonuclease - an enzyme that cleaves intervening sequences of precursor tRNA. [276] (Peter)

Tribe - Taxonomic term that ranks between a subfamily and a genus Wikipedia (Lexi)

type strain - an isolated sample of an organism that acts as the reference point for defining that species (Lecture, Olivia)

two-dimensional gel electrophoresis - proteomics method that separates proteins based on two criteria ("dimensions"): isoelectric point and molecular weight. Spots from gels can be sequenced. (Discovering Genomics, Proteomics, & Bioinformatics p. 434 David)

two-component systems - a complex of paired proteins used by cells for environmental detection. One element senses outside stimuli whilst the other is involved in intracellular signalling. (Discovering Genomics, Proteomics, & Bioinformatics p. 434 David)


ubiquitin - a small regulatory protein that is attached to other proteins. Often deactivates them by changing their structure, and often tags them for degradation by the proteasome. [277] (Erich)

ultracontig- similar to a supercontig, an ultracontig refers to an ordered set of scaffolds based on evidence such as ESTs, SNPs and other genetic markers [278] (Stewart Dalton)

Uniprot - an international database of protein structure, sequence, and function. [279] (David)

upregulation - when a cell increases production of a particular cellular component, such as a protein, in response to a stimulus. [280] (Chadinha)

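UTR - untranslated region; the UTR refers to a portion of an mRNA that is not translated by the ribosome. Examples are the 5' UTR, upstream of the start codon, and the 3' UTR, downstream of the stop codon. Introns are not UTRs, because they are spliced out before the mature mRNA reaches the ribosome. [281] (Chadinha)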

upstream - a direction on strands of nucleic acid relative to the start site. For example, the start codon is upstream of the stop codon. (Discovering Genomics, Proteomics, & Bioinformatics, p. 434 David)


Variable number tandem repeats (VNTRs)- locations in the genome that exhibit base pairs that occur in tandem repeats. Number of repeats varies between individuals. The collection of VNTRs across the genome is often referred to as one's genetic fingerprint, because the combination of tandem copy numbers is unique for each person. ([282], Shamita)

Vertical gene transfer - the transmission of genetic material from parent to offspring through reproduction, which therefore respects species boundaries (6 Pallavi)

Vitis vinifera - the grape or grapevine, a dicotyledonous plant. Like the blueberry it is a core eudicot, but grape belongs to the plant family Vitaceae whereas blueberry belongs to Ericaceae. Grapes range in color from purple to red to black, are commonly used to make wine, and have been shown to exhibit antioxidant properties. [283], [284] (Lauren)

Vaccinium - a genus of shrubs in the family Ericaceae. Its fruits include the cranberry, blueberry, bilberry, lingonberry, and huckleberry; these fruits have health-promoting properties, most likely due to their anthocyanin, flavonoid, and phenylpropanoid content. The plants typically grow in acidic soil. [Wikipedia article] (Puneet)

Vaccinium corymbosum - the Northern highbush blueberry plant, native to eastern North America. This genome was the basis of the Spring Genomics 2011 class. ([285], Leland)

Vaccinium macrocarpon - the cranberry, a fruit closely related to the blueberry that belongs to the subgenus (or section) Oxycoccus of Vaccinium (Lexi).


whole genome duplication (WGD) - an evolutionary event in which a species' entire genome is duplicated, allowing for gene innovation and genome diversity. Duplication events contribute to paralogs within species and orthologs between species, which allow evolutionary relationships to be traced. [286] (Lauren)

whole genome shotgun sequencing - a method of sequencing in which DNA is cut into small pieces and cloned into vectors, then about 500 bp is sequenced from each end of every cloned insert to form mate pairs. The two reads of a mate pair rarely overlap each other, but software uses them to reassemble the sequence. [287] (Samantha)
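A small simulation can make the mate-pair idea concrete. The sketch below is an assumption-laden toy, not a real sequencing pipeline: the function names, the random 2,000 bp "genome", and the scaled-down fragment and read lengths (300 bp and 50 bp rather than roughly 500 bp reads) are all invented for illustration, and no assembly step is performed.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def make_mate_pairs(genome, n_fragments, fragment_len, read_len, seed=0):
    """Cut random fragments from genome and read both ends of each one.
    Returns a list of (fragment_start, forward_read, reverse_read)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_fragments):
        start = rng.randrange(len(genome) - fragment_len + 1)
        fragment = genome[start:start + fragment_len]
        forward_read = fragment[:read_len]
        # The second read comes from the opposite strand of the far end.
        reverse_read = reverse_complement(fragment[-read_len:])
        pairs.append((start, forward_read, reverse_read))
    return pairs

# Hypothetical toy genome of random bases
genome_rng = random.Random(1)
toy_genome = "".join(genome_rng.choice("ACGT") for _ in range(2000))

pairs = make_mate_pairs(toy_genome, n_fragments=5, fragment_len=300, read_len=50)
for start, fwd, rev in pairs:
    print(start, fwd[:10], rev[:10])
```

Because the two reads of a pair come from opposite ends of the same fragment, their known approximate spacing gives assembly software a constraint for ordering and orienting the reads.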

Western Blot - a method in which proteins are separated by size using SDS-PAGE, transferred to a special type of paper (called a membrane), and probed with an antibody. Western blots are used to determine the molecular weight, tissue distribution, and relative amount of the protein of interest. (Discovering Genomics, Proteomics, & Bioinformatics, p. 434, David)


xenobiotic - a substance that is found within an organism that is not normally produced or expected to be found within that organism [288] (Megan)

xenolog - homologs that are created by horizontal gene transfer between two different species [289] (Matt)


Yeast Artificial Chromosome (YAC) - an artificial chromosome used as a vector to clone or hold (as in a DNA library) DNA inserts from 150 kb to 1.5 Mb in size. (Discovering Genomics, Proteomics, & Bioinformatics, p. 50, Leland)

Yeast two-hybrid (Y2H) - a proteomics method to detect protein-protein interactions. The protein of interest is used as bait to "fish out" proteins that bind to it (called prey). Variations of this method have been produced for mammalian and bacterial cells as well. (Discovering Genomics, Proteomics, & Bioinformatics, p. 434, David)