Halorhabdus utahensis Genome (GcatWiki, revision of 2008-11-11T15:42:32Z)<p>Lavoss: /* Pathway Tutorials */</p>
<hr />
<div>This page will be used by Davidson College students in the [http://www.bio.davidson.edu/Courses/Bio343/LabMethods.html Genomics Laboratory course].<br />
__NOTOC__<br />
== Links to Multiple Databases ==<br />
*[http://imgweb.jgi-psf.org/cgi-bin/img_edu_v260/main.cgi?section=TaxonDetail&page=taxonDetail&taxon_oid=2500575004 JGI IMG EDU] <br> public access <br> *[[Media:JGIAnnotation.xls|JGI Annotation Excel Spreadsheet]]<br />
*[http://www.tigr.org/tigr-scripts/prok_manatee/shared/login.cgi Manatee at JCVI] <br> use the davidson number sent by email as username and password (database is nthu01 - this is case sensitive) <br> *[[Media:ManateeAnnotation.xls|Manatee Annotation Excel Spreadsheet]]<br />
*[http://rast.nmpdr.org/ SEED view via RAST] <br> use the username and password combination sent to you by SEED <br> *[[Media:RastAnnotation.xls|RAST Annotation Excel Spreadsheet]]<br> *[http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=18261238 RAST Publication in PubMed]<br />
*[http://www.genome.jp/kegg/kaas/ KEGG]<br> We can submit our genes to KEGG to have it mapped out, but SEED and Manatee may already do this. Do we want to ask them to upload it into their database? <br><br />
*[http://wishart.biology.ualberta.ca/basys/cache/135af8726ad6f61ec4c5f1e9c4d139ac/index.html BASYs]<br><br><br />
*[http://gcat.davidson.edu/Registry/compare/ Pairwise comparisons of All Three Annotations]<br />
<br />
<br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/JGI_5contigs.txt JGI Full genome, 5 separate contigs & 3.1 Mbp, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/JGI2500575004_genes.txt JGI gene DNA sequences, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/JGI2500575004_genes.xls JGI gene annotations, Excel] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/JGI2500575004_proteins.txt JGI protein sequences, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/h_utahensis_merged.txt JCVI Full genome, 5 contigs fused, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/h_utahensis_ORFs.txt JCVI gene sequences, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/h_utahensis_proteins.txt JCVI protein sequences, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/GeneLengths.xls 3-way comparison, Excel] <br><br />
[[Venn_diagrams]] Venn diagram of 3-way comparison<br />
<br />
<br><br />
<br />
== RNA Genes ==<br />
<br />
*[[tRNA Genes Check List]]<br><br />
*[[rRNA operon]]<br><br />
*[[2 misc. RNA genes]] (short summary list)<br><br />
*[[Missing tRNA-trp gene found]]<br><br />
<br />
== Other Resources ==<br />
*[[Consensus Shine Dalgarno]] Excel File for ''H. utahensis'' <br><br />
*[[References]]<br><br />
*[[Gene Annotation Template]]<br><br />
*[[General Questions]]<br><br />
*[[Page for Annotated Genes]]<br><br />
*[http://www.bio.davidson.edu/courses/genomics/2008/Win/ec/ Search EC number in RAST, JGI or Manatee] <br><br />
*[http://gcat.davidson.edu/Wideloache/Webfiles/ecNumBlast.html Blast an EC number against the H. utahensis genome]<br><br />
*[http://gcat.davidson.edu/Wideloache/Webfiles/AnnotationSearcher.html Perform a text-based search of the Rast, JGI, and Manatee protein calls]<br><br />
<br />
== Research Questions ==<br />
#How do the three systems compare for finding ORFs and RNA genes?<br />
#Is there a pattern of missed genes for any of the 3 sites? <br />
#Do the three systems differ in their ability to find good start codons and Shine-Dalgarno sequences? [We need a standard set of genes for comparison. Only highly conserved or a range of genes?]<br />
# Were Shine-Dalgarno sequences calculated for our species or default values used? If default, what sequence?<br />
#Can we fill any holes in their automated annotation? Is there a mechanism for users to add in genes?<br />
#How do the 3 sites compare for ease of use?<br />
#What are the strengths and weaknesses of each system? What did they publish as their special features and how do we see these working?<br />
#How does each of the 3 sites compare for pathway detection and visualization? <br />
#Do they find the origin of replication? Can we find it? <br />
<br />
* How do the 3 systems compare when one system calls a gene hypothetical and another assigns it a function? How can the calls vary, and which system gets closer to correct (however you define that, possibly by date of matched entry)? (Pallavi and Mary)<br />
* Why did one system call a gene when the other two did not? (Matt and Lara)<br />
* How do the 3 sites compare for ease of use? What are the strengths and weaknesses of each system? What did they publish as their special features and how do we see these working? (Samantha and Nick)<br />
* Where is the origin of replication and did the 3 systems attempt to identify this?<br />
* Did the 3 systems utilize Shine-Dalgarno sequences to help them call start codons? Did they utilize our species's consensus Shine-Dalgarno? (Peter)<br />
* We need to fill in the [[Venn diagrams]] for our 3-way comparison. Let's compare the size of ORFs and generate a [[Gene Length Histograms|graph comparing the distributions]] for all 3. (Max and Will - they also take requests). <br />
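
A minimal Python sketch of the counting behind a three-way Venn diagram, assuming each system's gene calls have been reduced to (start, stop, strand) tuples. The coordinates below are made-up placeholders, not real ''H. utahensis'' calls, and exact-coordinate matching is a simplification: two calls that differ only in their start codon would be counted as different genes.

```python
def venn_regions(jgi, rast, manatee):
    """Return the sizes of the seven regions of a three-set Venn diagram."""
    return {
        "jgi_only": len(jgi - rast - manatee),
        "rast_only": len(rast - jgi - manatee),
        "manatee_only": len(manatee - jgi - rast),
        "jgi_rast": len((jgi & rast) - manatee),
        "jgi_manatee": len((jgi & manatee) - rast),
        "rast_manatee": len((rast & manatee) - jgi),
        "all_three": len(jgi & rast & manatee),
    }


# Hypothetical gene calls: (start, stop, strand)
jgi_calls = {(100, 400, "+"), (500, 900, "-"), (1200, 1500, "+")}
rast_calls = {(100, 400, "+"), (500, 900, "-"), (2000, 2300, "+")}
manatee_calls = {(100, 400, "+"), (1200, 1500, "+"), (3000, 3600, "-")}

regions = venn_regions(jgi_calls, rast_calls, manatee_calls)
for name, count in regions.items():
    print(name, count)
```

The seven counts always add up to the number of distinct calls across all three systems, which is a quick sanity check on the diagram.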
<br />
<hr><br />
=Our Favorites=<br />
== My favorite genes==<br />
*Pallavi - Monooxygenase vs. Peroxiredoxin<br />
<br />
*Mary - JGI gene 2500588521 (922976...924046) [[Media:My favorite gene.ppt]]<br />
<br />
*Max - [http://app.sliderocket.com/app/FullPlayer.aspx?id=f2058b94-845f-4a11-94eb-142f251a7fea JGI gene 2500587636 (2-1849)]<br />
<br />
*Samantha - JGI gene 2500575882 (80504-80878) [[Media:Earl.ppt]]<br />
<br />
*Nick - JGI gene 2300587691 (69942...72866) [[Media:Gene presentation.ppt]]<br />
<br />
*Will - JGI gene 2500590430 (2847205..2854335)<br />
<br />
*Jay - JGI gene 2500588397 (806410..807321) [http://www.bio.davidson.edu/courses/genomics/2008/McNair/Fav_Gene/FavoriteGenePresentation.pptx Co/Zn/Cd PowerPoint]<br />
<br />
*Matt - Transcriptional Regulator nrdR (3109722..3110204 + 7274..7765)<br />
<br />
*Peter - tRNA intron endonuclease [[Media:TRNAtrpintronendonuclease.ppt]]<br />
<br />
*Laura - 16S Small ribosomal subunit, JGI gene 2500590728 (2397347..2398825)<br />
<br />
== My Favorite Pathways==<br />
Pallavi - Carbohydrate Metabolism, specifically glycolysis<br />
<br />
Jay - Membrane Transport<br />
<br />
Will - Signal Transduction<br />
<br />
Max - Energy<br />
<br />
Samantha - Purine Metabolism!!!<br />
<br />
Laura - Amino Acid Biosynthesis<br />
<br />
''Suggestions by Kjeld''<br><br />
'''[[Cellulase]]''' by Pallavi<br><br />
I think it would be very interesting to look for genes involved in cellulose degradation: endocellulases, exocellulases (=cellobiohydrolases) and β-glucosidases.<br />
Many cellulose degraders produce a range of each type. A cellulolytic system able to function at 4.6 M NaCl is an interesting one. We either did not observe (or did not look for) cellulose degradation. However, these systems are normally inducible and you need to test several substrates and inducers. It would be nice to have a compilation of putative “cellulase” genes.<br />
There are several good recent reviews on cellulases (also mentioning E.C. numbers and enzyme families) that your students could consult.<br />
<br />
'''[[Chitinase]]''' by Matt<br><br />
Apparently you detected a chitinase, but according to our records it does not grow on N-acetyl-glucosamine, which is somewhat strange. It grows on glucose though. <br />
<br />
'''[[Lipases]]''' by Mary<br><br />
Lipases (/esterases) would also be interesting to look for – some lipases have important industrial applications.<br />
<br />
'''[[Amylases]]''' by Samantha<br><br />
We did not observe growth on starch. Did you find any “amylase-coding genes”?<br />
<br />
'''[[Xylose (glucose) isomerase)]]''' by Nick<br><br />
An enzyme of great commercial value. <br />
<br />
'''[[Amino acids]]''' led by Laura and assisted by Max, Jay, Nick and Samantha<br><br />
According to our records AX-2 is able to grow in a “defined medium”. This is at variance with your “holes” for synthesis of amino acids. However, there could have been some “carry over” of amino acids when inoculating a culture grown in complex medium (e.g. containing yeast extract). However, we are normally aware of this problem and do repeated culturing to dilute out potential growth factors present in yeast extract.<br />
<br />
'''[[Proteases]]''' by Peter<br><br />
We did not detect protease activity – albeit only checking a few substrates.<br />
<br />
'''[[Protein Export]]''' by Malcolm <br><br />
We need to know how these proteins might reach outside the cell which is where the food would be and thus the digestive enzymes or importers need to reach the outside world or the cell membrane.<br />
<br />
= Student-created tutorials: =<br />
== Tutorials for Annotating Genomes ==<br />
<br />
# Will DeLoache - [http://www.bio.davidson.edu/courses/genomics/2008/DeLoache/BioPerlTutorial/BioPerl.htm BioPerl Installation] <br><br />
# Max Win - [http://www.bio.davidson.edu/courses/genomics/2008/Win/perl.html Introduction to Perl for non-programmers (with step-by-step explanations, simple exercises and solutions)]<br><br />
# Pallavi - Conserved Domains Database (CDD) [[Media:CDDtutorial.doc]] <br><br />
# Mary - Protein Data Bank (PDB) [[Media:PDB Tutorial.doc]] <br><br />
# Laura Voss - Pfam Database [http://www.bio.davidson.edu/Courses/Bio343/Pfam_tutorial.doc Pfam Tutorial] <br><br />
# Samantha Simpson - [http://www.bio.davidson.edu/courses/genomics/2008/Simpson/Tutorial.html NCBI BLAST]<br><br />
# Peter Bakke - [[Media:ShineDalgarnoTutorial.doc]]<br><br />
# Jay McNair - [http://www.bio.davidson.edu/courses/genomics/2008/McNair/Origin_Tutorial/OriginTutorial.doc Origin of Replication Tutorial]<br><br />
# Nick Carney - Navigating the JGI Database [[Media:NavigatingJGItutorial.doc]]<br><br />
# Matt Lotz - SEED Viewer - [[Media:SEEDTutorial.doc]] <br><br />
== Pathway Tutorials==<br />
[http://www.pathguide.org/ Pathguide] - a possible source of tutorials and extensive information<br />
<br />
[http://www.bigre.ulb.ac.be/Users/didier/pathfinding/ Shortest Path Tool]<br />
<hr><br />
*Pallavi: I will compare RAST and KEGG in pathway annotations and use Glycolysis/Gluconeogenesis as my example: [[Media:Pallavitutorial.doc]]<br />
<br />
*Matt: WikiPathways<br />
<br />
*Mary: ENZYME<br />
<br />
*Samantha: [http://www.bio.davidson.edu/courses/genomics/2008/Simpson/Tutorial2.html How To Determine EC Numbers]<br><br />
<br />
*Nick: Cell Circuits<br />
<br />
*Max: KGML (how to edit the KEGG map)<br />
<br />
*Jay: SEED Scenario Paths (a tool to determine completeness of pathways)<br />
<br />
*Laura: Pathway Entrances and Exits<br />
<br />
=Glossary words (A - Z):=<br />
[[#A| A ]] [[#B| B ]] [[#C| C ]] [[#D| D ]] [[#E| E ]] [[#F| F ]] [[#G| G ]] [[#H| H ]] [[#I| I ]] [[#J| J ]] [[#K| K ]] [[#L| L ]] [[#M| M ]] [[#N| N ]] [[#O| O ]] [[#P| P ]] [[#Q| Q ]] [[#R| R ]] [[#S| S ]] [[#T| T ]] [[#U| U ]] [[#V| V ]] [[#W| W ]] [[#X| X ]] [[#Y| Y ]] [[#Z| Z ]] <br />
<br />
== A ==<br />
'''Accession Number''' - a unique identifier given to DNA and protein sequences to allow for tracking of sequence information within a single database [http://en.wikipedia.org/wiki/Accession_number_(bioinformatics)] (Will).<br />
<br />
'''Antisense (RNA or DNA)'''-a piece of DNA or RNA that binds to a complementary sequence of DNA or RNA. These segments of genetic material can be used to identify the existence of a disease gene and they can also be used to bind to specific DNA or mRNA sequences to inhibit their function ([http://biotech.fyicenter.com/glossary/Bioinformatics_Glossary.html 5] Pallavi).<br />
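
As an illustration of what "complementary" means here, the following Python sketch computes the reverse complement of a DNA sequence, which is the sequence an antisense DNA strand would need in order to pair with it. The function name and the example sequence are illustrative only.

```python
# Translation table for the four DNA bases (upper and lower case)
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


sense = "ATGGCC"
antisense = reverse_complement(sense)
print(sense, antisense)
```

Applying the function twice returns the original sequence.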
<br />
'''<i>Arabidopsis thaliana</i>''' - the scientific name for the thale cress plant; it was the first plant to have its genome sequenced, and is a model organism for understanding plant biology and genetics ([http://en.wikipedia.org/wiki/Thale_cress Wikipedia.org], Jay)<br />
<br />
== B ==<br />
'''BAC''' - <i>b</i>acterial <i>a</i>rtificial <i>c</i>hromosome, a DNA construct used for transforming or cloning segments of DNA and often used to sequence the genetic code of organisms ([http://en.wikipedia.org/wiki/Bacterial_artificial_chromosome Wikipedia.org], Jay)<br />
<br />
'''bioinformatics''' - the multi-disciplinary approach of using biology, computer science and mathematics to solve or better understand biological problems [http://en.wikipedia.org/wiki/Bioinformatics] (Matt)<br />
<br />
'''BLAST''' - (Basic Local Alignment Search Tool) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. [http://blast.ncbi.nlm.nih.gov/Blast.cgi] (Mary)<br />
<br />
'''bioperl'''- a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications such as accessing sequence data from local and remote databases, transforming database formats, manipulating individual sequences, searching for similar sequences, searching for genes and other structures on genomic DNA, or developing machine-readable sequence annotations. [http://en.wikipedia.org/wiki/BioPerl] (Wikipedia, Max Win)<br />
<br />
== C ==<br />
'''carbon fixation''' - using carbon dioxide to create organic materials [http://en.wikipedia.org/wiki/Carbon_fixation] (Samantha)<BR><br />
<br />
'''CDD''' (Conserved Domains Database)- a database used to identify the conserved domains present in a protein query sequence [http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml] (Mary)<br />
<br />
'''chaperonin''' - a protein complex that assists some newly formed polypeptide chains by folding them into their final, functional, three-dimensional form [http://en.wikipedia.org/wiki/Chaperonins] (Matt)<br />
<br />
'''chemotaxis''' - the process in which cells seek out or flee from a high concentration of certain chemicals; it is found in both uni- and multicellular organisms. Unicellular organisms use this process to avoid toxins or find food, and multicellular organisms use it for tasks such as reproduction [http://en.wikipedia.org/wiki/Chemotaxis] (Nick)<br />
<br />
'''chemotaxonomy''' - the attempt to classify and identify organisms according to demonstrable differences and similarities in their biochemical compositions [http://en.wikipedia.org/wiki/Chemotaxonomy] (Mary)<br />
<br />
'''ClustalW''' - A web-based or command line tool that performs multiple sequence alignments to determine evolutionary relationships between three or more sequences [http://en.wikipedia.org/wiki/Clustal] (Will).<br />
<br />
'''COG''' (Cluster of Orthologous Groups)- corresponds to a highly conserved domain and generally consists of either individual proteins or groups of paralogs ([http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml COG] Pallavi) <br><br />
<br />
'''concatemer''' - long continuous DNA molecule that contains the same DNA sequence repeated in series [http://en.wikipedia.org/wiki/Concatemer](Samantha)<BR><br />
<br />
'''contigs''' (contiguous DNA)- overlapping DNA segments that as a collection form a longer and gapless segment of DNA. (Discovery Genomics, Proteomics and Bioinformatics [http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''coverage''' - refers to the number of times, on average, any piece of DNA in a sequenced genome has been individually sequenced (Lecture, Jay)<br />
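
A rough way to estimate average coverage is total sequenced bases divided by genome length. The Python sketch below assumes a 3.1 Mbp genome (the size quoted for the JGI contigs on this page); the read count and read length are hypothetical numbers chosen only to show the arithmetic.

```python
def mean_coverage(num_reads, mean_read_length, genome_length):
    """Average number of times each base is expected to be sequenced."""
    return num_reads * mean_read_length / genome_length


# Hypothetical sequencing run: 40,000 reads averaging 700 bp each
coverage = mean_coverage(num_reads=40_000, mean_read_length=700,
                         genome_length=3_100_000)
print(f"about {coverage:.1f}x coverage")
```

This is an average only; real coverage varies from region to region.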
<br />
'''CPAN (Comprehensive Perl Archive Network)''' - an archive of over 12,200 modules of software written in Perl, as well as documentation for it. It contains a module called CPAN (or CPAN.pm) which is used as an installer for Perl modules such as BioPerl [http://en.wikipedia.org/wiki/CPAN](Will).<br />
<br />
'''Cytogenetics'''-the study of normal and abnormal chromosomes. This involves studying the causes of chromosomal abnormalities and looking at the structure of chromosomes ([http://www.vivo.colostate.edu/hbooks/genetics/medgen/chromo/index.html 7] Pallavi).<br />
<br />
== D ==<br />
'''''de novo'' synthesis''' - the synthesis of complex molecules from simple molecules (e.g. sugars and nucleotides), rather than from recycled molecules; from the Latin for "from the new" or "anew" [http://en.wikipedia.org/wiki/De_novo_synthesis] (Matt)<br />
<br />
'''dehydrogenase''' - a type of enzyme that oxidizes a substrate by transferring one or more protons and a pair of electrons to an acceptor. [http://en.wikipedia.org/wiki/Dehydrogenase] (Peter)<br />
<br />
'''diatom''' - a major group of eukaryotic algae, and one of the most common types of phytoplankton. A characteristic feature of diatom cells is that they are encased within a unique cell wall made of silica called a frustule. These frustules show a wide diversity in form, but usually consist of two asymmetrical sides with a split between them. [http://en.wikipedia.org/wiki/Diatom] (Mary)<br />
<br />
'''domain (protein)''' - the structural and functional groups of a protein, which can exist independently of the protein itself. Domains typically perform a specific function, such as binding to promoters or substrates, and many proteins can have one or several domains in common. Evolutionarily-linked proteins are more likely to have domains in common. Domains are used to organize proteins into families. ([http://en.wikipedia.org/wiki/Domain_(protein) Wikipedia article], Laura)<br />
<br />
'''dot plot'''-graphical display comparing sequence conservation between two genomes with dots indicating strings of identical bases. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
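
A minimal Python sketch of how the dots can be computed, assuming a dot is placed wherever a short word of fixed length is identical in both sequences. The word length and the toy sequences are illustrative choices; whole-genome tools use longer words and faster indexing.

```python
def dot_plot(seq_a, seq_b, word=3):
    """Return (i, j) pairs where seq_a[i:i+word] equals seq_b[j:j+word]."""
    # Index every word of seq_b by its starting position
    index = {}
    for j in range(len(seq_b) - word + 1):
        index.setdefault(seq_b[j:j + word], []).append(j)
    # Look up every word of seq_a in that index
    dots = []
    for i in range(len(seq_a) - word + 1):
        for j in index.get(seq_a[i:i + word], []):
            dots.append((i, j))
    return dots


dots = dot_plot("ACGTAC", "ACGTAC", word=3)
print(dots)
```

Comparing a sequence with itself puts every dot on the main diagonal unless the sequence contains repeats, which show up as off-diagonal dots.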
<br />
== E ==<br />
<br />
'''EC number''' (Enzyme Commission Number)- a numerical classification scheme for enzymes, based on the chemical reactions they catalyze [http://en.wikipedia.org/wiki/EC_number] (Mary)<br />
<br />
'''E-value''' (Expect value)- When performing a BLAST search, you will obtain an E-value for each sequence that is retrieved. An E-value is the number of hits with an equal or better score that would be expected by chance in a database of the size searched; for small values it approximates the probability that the match arose by chance. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
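
In the Karlin-Altschul statistics that BLAST is built on, the E-value for a raw alignment score S is E = K·m·n·e^(−λS), where m is the query length and n is the database length. The Python sketch below shows the arithmetic; the values of K and λ depend on the scoring matrix and gap penalties, and the numbers used here are illustrative placeholders rather than values taken from any particular search.

```python
import math


def expect_value(raw_score, query_len, db_len, lam, k):
    """Karlin-Altschul expected number of chance hits scoring >= raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


# Illustrative parameters only (assumed, not from a real BLAST report)
LAMBDA, K = 0.318, 0.134

weak = expect_value(raw_score=40, query_len=300, db_len=2_000_000,
                    lam=LAMBDA, k=K)
strong = expect_value(raw_score=120, query_len=300, db_len=2_000_000,
                      lam=LAMBDA, k=K)
print(f"score 40: E = {weak:.3g}; score 120: E = {strong:.3g}")
```

The E-value falls exponentially as the score rises and grows in proportion to the database size, which is why the same alignment can look less significant against a larger database.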
<br />
'''Extremophile''' - an organism that thrives in and may even require physically or geochemically extreme conditions that are detrimental to the majority of life on Earth [http://en.wikipedia.org/wiki/Extremophile] (Will).<br />
<br />
== F ==<br />
<br />
'''FASTA format''' - a format used to convey either nucleic acid sequences or peptide sequences, in which nucleotides or amino acids are represented by single-letter codes. The sequence name and other descriptors precede the sequence on a header line that begins with ">". [http://en.wikipedia.org/wiki/FASTA_format] (Nick)<br><br />
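
A minimal Python sketch of reading FASTA text, assuming each record starts with a header line beginning with ">" and that the first word of the header is the sequence name. The two records below are made up for illustration; for real work a library such as BioPerl or Biopython is a better choice.

```python
def parse_fasta(text):
    """Return a dict mapping each sequence name to its full sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequence may span several lines
    return {n: "".join(parts) for n, parts in records.items()}


example = """>gene1 hypothetical protein
ATGC
GGTA
>gene2
TTAA
"""
sequences = parse_fasta(example)
print(sequences)
```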
<br />
'''family (protein)''' - a group of evolutionarily-related proteins, often with one or several domains in common. Families are organized by domain overlap, structural/functional similarity, and sequence similarity. ([http://en.wikipedia.org/wiki/Protein_family Wikipedia article] and lecture, Laura)<br />
<br />
'''finished genome''' - a genome that has been sequenced at least partly by hand, resulting in at least 99.99% sequence accuracy (Lecture, Jay)<br><br />
<br />
'''fusion mRNA'''-mRNA that results from the transcription of a gene after a chromosomal translocation event. This results in an mRNA sequence that comes from two different genes (Rowley and Blumenthal 2008 ''Science'' Pallavi)<br />
<br />
== G ==<br />
<br />
'''GC Content''' - the percentage of bases within a certain sequence of DNA (e.g. a gene or a genome) that are either guanine or cytosine; a higher GC content is characteristic of a coding region of a gene; differences in GC content between a gene and a genome can be used as evidence for horizontal gene transfer [http://en.wikipedia.org/wiki/GC-content] (Matt)<br><br />
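
The percentage can be computed directly from a sequence string, as in this short Python sketch. The example sequences are arbitrary, and ambiguous bases such as N are simply counted in the total length here, which is a simplifying assumption.

```python
def gc_content(seq):
    """Return the percentage of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


print(gc_content("ATGC"))
print(gc_content("ggccat"))
```

Comparing the value for a single gene with the value for the whole genome is the basis of the horizontal gene transfer check mentioned in the definition.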
<br />
'''GC-skew''' – uneven distribution of guanine and cytosine bases between the two strands of DNA where GC base pairs occur. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
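
GC skew is commonly calculated per window as (G − C) / (G + C) along one strand. The Python sketch below uses non-overlapping windows and also accumulates the skew; in many bacterial chromosomes the cumulative skew changes direction near the origin of replication, but whether that holds for ''H. utahensis'' is an open question rather than something this sketch assumes. The window size and test sequence are arbitrary.

```python
def gc_skew(seq, window=1000):
    """Return (G - C) / (G + C) for each non-overlapping window of seq."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative(values):
    """Running total of the per-window skews."""
    total, out = 0.0, []
    for v in values:
        total += v
        out.append(total)
    return out


skews = gc_skew("GGGGCCCCGGGC", window=4)
print(skews, cumulative(skews))
```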
<br />
'''gene amplification''' - production of multiple copies of a gene in order to amplify the amount of protein that the gene encodes for [http://www.medterms.com/script/main/art.asp?articlekey=13537] [http://www.answers.com/topic/gene-amplification] (Matt)<br />
<br />
'''gene fusion'''-occurs when DNA segments of two different genes come together. Can result in hybrid proteins ([http://www.biochem.northwestern.edu/holmgren/Glossary/Definitions/Def-G/gene_fusion.html 9] Pallavi)<br />
<br />
'''gene knockout''' - a process in which a gene is deactivated within a test organism in order to better understand the function of the gene in that organism [http://en.wikipedia.org/wiki/Gene_knockout] (Matt)<br />
<br />
'''gene ontology'''- a collaborative effort of investigators to unify and standardize terms associated with the role a gene or protein plays in an organism. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''glaucophyte''' - freshwater algae that have not been studied well [http://en.wikipedia.org/wiki/Glaucophyte](Samantha)<br><br />
<br />
== H ==<br />
<br />
'''haemolysin or hemolysin''' - a substance produced by a bacterium that causes lysis of red blood cells [http://en.wikipedia.org/wiki/Hemolysis_(microbiology)] (Nick)<br />
<br />
'''halophile''' - an organism, most often of the Archaea domain, that lives in environments containing high concentrations of salt [http://en.wikipedia.org/wiki/Halophile] (Matt)<br />
<br />
'''haplotype'''-collection of alleles that travel together (Lecture, Pallavi)<br />
<br />
'''haptophyte''' - phylum of algae [http://en.wikipedia.org/wiki/Haptophyte](Samantha)<br />
<br />
'''heterokont''' - major line of eukaryotes consisting of about 10,500 known species, most of which are algae [http://en.wikipedia.org/wiki/Heterokont](Samantha)<br />
<br />
'''Hidden Markov Model''' - a statistical model used in protein recognition databases such as Pfam. A Hidden Markov Model keeps track of several variables and possible variations thereof, such as the possible amino acid sequences that make up a protein domain (since there can be some variance in an amino acid sequence) or the variations in the component sounds that make up a word, and uses those points to match a given sequence to the word, domain, or other complex sequence it most closely matches. An HMM in speech recognition software, for example, can identify that a certain set of sounds make up a certain word, even with the variations in pronunciation and accent that different people will give those sounds. ([http://en.wikipedia.org/wiki/Hidden_Markov_Model Wikipedia] and lecture, Laura) <br />
<br />
'''HMM Logo''' - a graphical representation of an HMM, detailing the possible amino acid sequences, the relative frequencies and probabilities of each amino acid in the sequence, the relative contribution each amino acid has to the overall protein family, and the charge or nature of the amino acids themselves. ([http://www.sanger.ac.uk/Software/analysis/logomat-m/help.shtml How to read HMM Logos, on Pfam], Laura)<br />
<br />
'''homeobox''' - DNA sequence within transcription factor genes that allows the cell to respond to patterns of development by having the transcription factors switch on gene cascades [http://en.wikipedia.org/wiki/Homeobox](Samantha)<br />
<br />
'''homodimer''' - a protein made of paired identical polypeptides ([http://www.answers.com/topic/homodimer Answers.com], Jay)<br />
<br />
'''horizontal gene transfer'''-DNA transmission between species and incorporation of the DNA into the recipient's genome ([http://www.csrees.usda.gov/nea/biotech/res/biotechnology_res_glossary.html horizontal gene transfer] Pallavi)<br />
<br />
'''''Hox'' gene'''-a gene that contains a homeobox region that is involved in morphogenesis along the cranio-caudal body axis ([http://www.uprightape.net/UA_Glossary.html 4] Pallavi)<br />
<br />
'''hydrolase''' - an enzyme that catalyzes hydrolysis, the cleavage of a chemical bond by the addition of water [http://en.wikipedia.org/wiki/Hydrolase] (Nick)<br />
<br />
== I ==<br />
<br />
'''ideogram''' - in genomics, usually describes a stylized representation of a chromosome with banding patterns (Campbell-Heyer Genomics textbook, Jay)<br />
<br />
'''identities''' - in a BLAST output, the number and fraction of total residues which are identical in a given alignment [http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)<br />
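
A small Python sketch of how the identities figure can be counted from two aligned sequences of equal length, assuming "-" marks a gap and that the fraction is taken over the full alignment length including gap columns. The aligned strings are invented for illustration.

```python
def identities(aligned_a, aligned_b):
    """Return (matches, alignment_length, fraction) for two aligned strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0, 0, 0.0
    matches = sum(1 for x, y in zip(aligned_a, aligned_b)
                  if x == y and x != "-")
    return matches, len(aligned_a), matches / len(aligned_a)


matches, length, fraction = identities("MKT-LV", "MRTALV")
print(f"Identities = {matches}/{length} ({fraction:.0%})")
```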
<br />
'''indole'''-a chemical compound that is produced from the break down of tryptophan ([http://medical-dictionary.thefreedictionary.com/indole indole] Pallavi)<br />
<br />
'''inclusion body''' - Inclusion bodies are collections of stainable substances, usually proteins, that are found either in the nucleus or the cytoplasm. It is thought that these bodies are often the result of viral proteins that misfolded [http://en.wikipedia.org/wiki/Inclusion_body] (Nick)<br />
<br />
'''intron''' - a region of DNA in a gene that is not part of the final coding sequence for the protein. [http://en.wikipedia.org/wiki/Intron] (Peter)<br />
<br />
'''isoelectric point''' - the pH at which a molecule carries no net electrical charge [http://en.wikipedia.org/wiki/Isoelectric_point] (Nick)<br />
<br />
'''isozymes''' - members of a gene family with very similar cellular roles (Campbell-Heyer Genomics textbook, Jay)<br />
<br />
== J ==<br />
<br />
== K ==<br />
'''KEGG (Kyoto Encyclopedia of Genes and Genomes)''' - a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals. The Pathway database records networks of molecular interactions in the cells, and variants of them specific to particular organisms [http://en.wikipedia.org/wiki/KEGG](Will).<br />
<br />
'''kinase''' - a type of enzyme that transfers a phosphate group from a high-energy donor molecule to a target molecule in a process called phosphorylation. [http://en.wikipedia.org/wiki/Kinase] (Peter)<br />
<br />
== L ==<br />
<br />
== M ==<br />
'''Manatee''' - a web-based gene evaluation and genome annotation tool that can view, modify, and store annotation for prokaryotic and eukaryotic genomes. This ongoing, open-source initiative was developed with two missions: first, to let biologists functionally annotate their genomes using a powerful, stand-alone web application with a robustly designed relational annotation database; and second, to invite outside developers to contribute their own ideas and requirements to enhance Manatee's ability to accomplish biological goals [http://manatee.sourceforge.net/](Will). <br><br />
<br />
'''microsatellites'''-stretches of repetitive, short DNA segments that can be used to track the inheritance of certain traits within families ([http://www.clanlindsay.com/genetic_dna_glossary.htm 3] Pallavi)<br />
<br />
'''minisatellites'''-segments of DNA that can be used for individual identification (ex. DNA fingerprinting) or in determining relationships between people (ex. paternity cases) ([http://www.clanlindsay.com/genetic_dna_glossary.htm 2] Pallavi).<br />
<br />
'''motif''' - a sequence of amino acids or nucleotides that performs a particular role and is often conserved in other species or molecules. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''mycoplasma''' - genus of bacteria that lack a cell wall [http://en.wikipedia.org/wiki/Mycoplasma] (Nick)<br />
<br />
== N ==<br />
<br />
'''NORFs''' (nonannotated open reading frames) - open reading frames that were considered not to be real genes when the genome was annotated. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''nucleomorph''' - reduced eukaryotic nuclei found in plastids [http://en.wikipedia.org/wiki/Nucleomorph](Samantha)<br />
<br />
== O ==<br />
'''object-oriented programming''' - a programming paradigm in which collections of data, associated with operations on that data, are modularly defined and then built upon (CSC 121 Lecture, Will). <br />
<br />
'''open reading frame (ORF)'''-a segment of DNA that can potentially encode a protein; it begins with a start codon (usually ATG) and ends with a stop codon [http://www.fao.org/DOCREP/003/X3910E/X3910E18.htm ORF] (Pallavi)<br />
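
A minimal Python sketch of an ORF scan, under simplifying assumptions: only the forward strand is searched, only ATG counts as a start, and the stop codons are TAA, TAG and TGA. Real gene callers also search the reverse strand and consider alternative start codons, so this is not how JGI, RAST or Manatee call genes. The test sequence is made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end) pairs, 0-based and end-exclusive, for each ORF.

    An ORF runs from an ATG through the first in-frame stop codon;
    min_codons counts the codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


dna = "CCATGAAATGATT"
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])
```

Raising min_codons filters out the many short ORFs that occur by chance in any long sequence.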
<br />
'''operon''' - a segment of DNA involving an operator, promoter, and one or more genes that operate as a single unit during transcription [http://en.wikipedia.org/wiki/Operon] (Nick)<br />
<br />
'''optical mapping'''-DNA sequences of the organism in question are compared against a karyotype that specifically looks at restriction sites found within the DNA to correctly order the DNA sequences on a chromosome. This methodology gives very detailed haplotype information and allows for the detection of sequence variations across an entire genome [http://www.geocities.com/bioinformaticsweb/genomicglossary.html optical mapping] (Pallavi)<br />
<br />
'''ortholog'''-genes in different species that descend from a single gene in their last common ancestor; orthologs usually look very similar and often retain the same function (Lecture, Pallavi)<br />
<br />
'''oxidoreductase''' - an enzyme that catalyzes redox reactions by transferring electrons from one molecule (the reductant) to another (the oxidant) [http://en.wikipedia.org/wiki/Oxidoreductase] (Nick)<br />
<br />
== P ==<br />
<br />
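'''paralog'''-genes within a single species that arose by duplication of an ancestral gene; the copies are similar in sequence but not necessarily identical, and may diverge in function (Lecture, Pallavi)<br />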
<br />
'''p-arm''' - the shorter arm of a chromosome's two arms separated by the centromere (compare to q-arm, the longer arm) ([http://www.medterms.com/script/main/art.asp?articlekey=4715 MedTerms Dictionary], Jay)<br />
<br />
'''Perl''' - Developed by Larry Wall in 1987, Perl is a [http://en.wikipedia.org/wiki/High-level_programming_language high-level programming language] used frequently by biologists and bioinformaticists [http://en.wikipedia.org/wiki/Perl] (Will). <br />
<br />
'''periplasmic space''' - the space between the inner cytoplasmic membrane and external outer membrane in bacteria or archaea. [http://en.wikipedia.org/wiki/Periplasmic_space] (Peter)<br />
<br />
'''Pfam''' - a database for protein domain families that matches amino acid sequences or nucleotide sequences to the related group of proteins to which they most likely belong. ([http://pfam.sanger.ac.uk/help Pfam Help], Laura)<br />
<br />
'''plasmid''' - an extra-chromosomal DNA molecule that is capable of replicating independently of the chromosomal DNA. Commonly found in bacteria and archaea. [http://en.wikipedia.org/wiki/Plasmid](Peter)<br />
<br />
'''plastid''' - major organelles in plants or algae [http://en.wikipedia.org/wiki/Plastid](Samantha)<br />
<br />
'''pleomorphism''' - the occurrence of two or more structural forms during a life cycle [http://en.wikipedia.org/wiki/Pleomorphism] (Mary)<br />
<br />
'''phylogenetic tree''' - a diagram showing the evolutionary relationships between biological species that are thought to share a common ancestor [http://en.wikipedia.org/wiki/Phylogenetic_tree] (Nick)<br />
<br />
'''phylotypes''' – a term intended to resolve the challenge of “species” when classifying prokaryotes using DNA sequence comparisons. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''positives''' - in a BLAST output, the number and fraction of residues for which the alignment scores have positive rather than negative values [http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)<br />
<br />
'''proteome''' - entire set of proteins expressed by a genome, cell, tissue, or organism. It may refer to expressed proteins under certain conditions [http://en.wikipedia.org/wiki/Proteome](Samantha)<br />
<br />
'''PSORT''' - a prediction server that judges where a mature protein could be in the cell, based on its transmembrane domains, its predicted mature amino acid composition, and its signal sequences. ([http://psort.ims.u-tokyo.ac.jp/form.html PSORT], Laura)<br />
<br />
'''pseudogenes''' - DNA sequences that look like genes but are non-functional, often because they contain premature stop codons. A pseudogene may have diverged from a functional gene, or a paralog might have taken over its function (Lecture, Pallavi)<br />
<br />
'''purine''' - a category of nitrogenous base consisting of a pyrimidine ring fused to an imidazole ring. Notable purine bases are adenine and guanine. [http://en.wikipedia.org/wiki/Purine] (Peter)<br />
<br />
'''pyrimidine''' - a category of nitrogenous base consisting of a heterocyclic aromatic ring containing two nitrogen atoms at positions 1 and 3 of the six-member ring. Notable pyrimidine bases are cytosine, thymine, and uracil. [http://en.wikipedia.org/wiki/Pyrimidine] (Peter)<br />
<br />
== Q ==<br />
<br />
'''q-arm''' - the longer arm of a chromosome's two arms separated by the centromere (compare to p-arm, the shorter arm) ([http://www.medterms.com/script/main/art.asp?articlekey=5152 MedTerms Dictionary], Jay)<br><br />
<br />
'''query sequence''' - the sequence (whether amino acid or nucleotide) entered into a database’s search function and checked against the database entries. ([http://en.wikipedia.org/wiki/BLAST BLAST on Wikipedia], Laura)<br />
<br />
== R ==<br />
<br />
'''RAST''' - (Rapid Annotation using Subsystem Technology)- a fully-automated service for annotating bacterial and archaeal genomes. It provides high quality genome annotations for these genomes across the whole phylogenetic tree. ([http://rast.nmpdr.org/], Max Win)<br />
<br />
'''rDNA''' - DNA sequences that encode ribosomal RNA. Note that rDNA can also stand for recombinant DNA. ([http://en.wikipedia.org/wiki/Ribosomal_DNA rDNA] Pallavi)<br />
<br />
'''residue (protein)''' - the remaining portion of an amino acid after a water molecule has been removed and it has been incorporated into a protein. Functional residues, referred to in Pfam, are the residues that perform some specific identifiable function or are part of a domain, and can be conserved across evolutionarily-related proteins. ([http://pfam.sanger.ac.uk/help Pfam Help], Laura) <br><br />
<br />
'''retropseudogenes''' - gene copies that have been reverse-transcribed from mRNA, with the resulting DNA sequence incorporated back into the genome. They are non-functional segments of DNA and can be distinguished from other pseudogenes in that they do not have intron sequences. ([http://genome.cshlp.org/cgi/content/full/10/5/672 1] Pallavi)<br />
<br />
'''retrotransposons''' - transposable elements that move by being transcribed into RNA, reverse-transcribed back into DNA, and inserted at a new position in the genome [http://en.wikipedia.org/wiki/Retrotransposon](Samantha)<br />
<br />
'''ribonuclease''' - a nuclease that catalyzes the degradation of RNA into smaller components [http://en.wikipedia.org/wiki/Ribonuclease] (Mary)<br />
<br />
== S ==<br />
'''Serovar'''-a subdivision of a species based on the characteristics of their cell surface antigens ([http://www.biology-online.org/dictionary/Serovar serovar] Pallavi)<br />
<br />
'''scaffold''' - a section of a sequenced genome composed of contigs that are in the right order but not necessarily connected ([http://www.medterms.com/script/main/art.asp?articlekey=25223 MedTerms Dictionary], Jay)<br />
<br />
'''"Shadow enhancers"''' - secondary enhancers that are thought to be important for natural selection to occur in regulatory DNA segments. They evolve much faster than primary enhancers, which suggests that they are under fewer functional constraints (Wray and Babbitt 2008 ''Science'' Pallavi)<br />
<br />
'''Shine-Dalgarno sequence''' - A ribosomal binding site on an mRNA, usually a sequence of about six bases located about six or seven bases upstream of the start codon. An anti-Shine-Dalgarno sequence exists on the rRNA in the small subunit of the ribosome; when the two sequences pair, the mRNA is lined up and prepared for translation. (Lecture and [http://en.wikipedia.org/wiki/Shine-dalgarno Wikipedia article], Laura)<br><br />
Note: The Shine-Dalgarno consensus sequence for our genome is ccGGAGGt.<br />
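<br />
A minimal sketch of how the definition above could be checked for a single gene: look in a short window upstream of the start codon for the conserved core of the consensus. The function name, the 20-base window, and the choice of GGAGG (the upper-case core of ccGGAGGt) as the search motif are assumptions made for illustration, not part of any of the three annotation systems.<br />

```python
from typing import Optional


def find_shine_dalgarno(mrna: str, start_index: int,
                        core: str = "GGAGG", window: int = 20) -> Optional[int]:
    """Return the number of bases between the end of the core motif and the
    start codon, or None if the motif is absent from the upstream window.

    start_index is the 0-based position of the first base of the start codon.
    """
    seq = mrna.upper().replace("U", "T")      # accept RNA or DNA letters
    upstream_start = max(0, start_index - window)
    upstream = seq[upstream_start:start_index]
    pos = upstream.rfind(core)                # motif occurrence closest to the start codon
    if pos == -1:
        return None
    return len(upstream) - (pos + len(core))


# Hypothetical sequence: the start codon ATG begins at index 16.
example = "TTCCGGAGGTAACGTCATGAAA"
print(find_shine_dalgarno(example, 16))  # spacing of 7 bases
```

A spacing far outside the six-or-seven-base range described above may suggest that a different start codon should be considered.<br />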
<br />
'''SignalP''' - a prediction server that judges whether or not a query protein contains a signal peptide. SignalP scores each amino acid in the query against patterns learned from known signal peptides and predicts the cleavage site of the signal peptide. ([http://www.cbs.dtu.dk/services/SignalP-3.0/output.php SignalP Output explained], Laura)<br />
<br />
'''signal peptide''' - a short peptide chain that directs the post-translational transport of a protein [http://en.wikipedia.org/wiki/Signal_peptide] (Matt)<br />
<br />
'''Smith-Waterman alignment''' - A well-known algorithm for determining similar regions between two nucleotide or protein sequences. Instead of looking at the total sequence, the Smith-Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure [http://en.wikipedia.org/wiki/Smith_waterman](Will).<br />
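<br />
The idea of comparing segments of all possible lengths can be sketched with the standard dynamic-programming recurrence, in which every cell is floored at zero so that an alignment may start and end anywhere. This version returns only the best local score and uses a simple linear gap penalty; the scoring values (match 2, mismatch -1, gap -2) are arbitrary assumptions, and real tools use substitution matrices and affine gap penalties.<br />

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b."""
    cols = len(b) + 1
    prev = [0] * cols          # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap          # gap in b
            left = curr[j - 1] + gap    # gap in a
            curr[j] = max(0, diag, up, left)   # 0 lets a new local alignment start here
            best = max(best, curr[j])
        prev = curr
    return best


# The shared segment "ACG" gives three matches at 2 points each.
print(smith_waterman_score("TTTACGTTT", "GGACGGG"))  # 6
```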
<br />
'''SNP (Single Nucleotide Polymorphism)''' - a DNA sequence variation occurring when a single nucleotide in the genome (or other shared sequence) differs between members of a species (or between paired chromosomes in an individual) [http://en.wikipedia.org/wiki/Single_nucleotide_polymorphism](Will).<br />
<br />
'''symporter''' - an integral membrane protein that moves two or more different molecules or ions across a phospholipid membrane in the same direction. [http://en.wikipedia.org/wiki/Symporter] (Peter)<br />
<br />
'''synteny''' - a neologism from the Greek for "on the same ribbon". Genes that are syntenic in one species are on the same chromosome; genes that are syntenic across species retain the same order on respective chromosomes as a result of descent from a common ancestor ([http://www.answers.com/synteny Answers.com], Jay)<br />
<br />
'''synthetase''' - a type of enzyme that creates a new covalent bond and requires direct input of energy from a high-energy phosphate. [http://books.google.com/books?id=bB8XnCykRmIC&pg=PA522&lpg=PA522&dq=%22synthetase+is+an+enzyme%22&source=web&ots=wkws4ksMsg&sig=zWLkDIk7T78hcf9S84nWs3u5Apw&hl=en&sa=X&oi=book_result&resnum=9&ct=result] (Peter)<br />
<br />
== T ==<br />
'''transferase''' - an enzyme that catalyzes the transfer of a functional group from one molecule (the donor) to another (the acceptor) [http://en.wikipedia.org/wiki/Transferase] (Matt)<br />
<br />
'''transmembrane helix''' - a single transmembrane alpha helix of a transmembrane protein, usually about twenty amino acids in length. They are usually predicted by hydrophobicity. [http://en.wikipedia.org/wiki/Transmembrane_domain](Mary)<br />
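<br />
Prediction by hydrophobicity can be illustrated with a sliding-window average over a hydropathy scale. The sketch below uses the Kyte-Doolittle scale with a 19-residue window and a cutoff of 1.6; the window, the cutoff, and the function name are assumptions chosen for illustration, and dedicated predictors use more elaborate models.<br />

```python
from typing import List

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein: str, window: int = 19,
                        cutoff: float = 1.6) -> List[int]:
    """Return 0-based start positions of windows whose mean hydropathy
    is at least `cutoff` (candidate transmembrane segments)."""
    seq = protein.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        # Unknown letters (e.g. X) are scored as 0.0 in this sketch.
        mean = sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in chunk) / window
        if mean >= cutoff:
            hits.append(start)
    return hits


# Hypothetical protein: a 20-residue leucine stretch flanked by aspartates.
toy = "D" * 10 + "L" * 20 + "D" * 10
print(hydrophobic_windows(toy))
```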
<br />
'''transposons / transposable elements''' - DNA sequences that can move around to different positions in a single cell's genome. Transposons can cause mutations and change the length of the genome. [http://en.wikipedia.org/wiki/Transposon](Samantha)<br />
<br />
'''Transposon Mutagenesis'''-a procedure in which a transposon is inserted into a gene, which inactivates the gene and can lead to the discovery of the phenotype associated with this gene ([http://cancerweb.ncl.ac.uk/cgi-bin/omd?transposon+mutagenesis transposon mutagenesis] Pallavi)<br />
<br />
'''Trans-splicing''' - exons from two different primary RNA transcripts are joined to form a single mature mRNA. This process results in fusion mRNA ([http://www.representinggenes.org/Glossary.html 8] Pallavi).<br />
<br />
'''tRNA splicing endonuclease''' - an enzyme that cleaves intervening sequences of precursor tRNA. [http://cancerweb.ncl.ac.uk/cgi-bin/omd?splicing+endonuclease] (Peter)<br><br />
<br />
== U ==<br />
<br />
== V ==<br />
'''Vertical gene transfer'''-the transmission or absorption of genetic material that is associated with sexual reproduction and, thus, acknowledges species-specific boundaries ([http://www.gmo-compass.org/eng/glossary/#G 6] Pallavi)<br />
<br />
== W ==<br />
<br />
'''whole genome shotgun sequencing''' - a method of sequencing where DNA is cut into small pieces and cloned into vectors, then about 500 bp are sequenced from both ends of every insert to form mate pairs. Mate pairs rarely overlap, but are used to reassemble the sequence using software. [http://en.wikipedia.org/wiki/Whole_genome_shotgun](Samantha)<br />
<br><br />
<br />
== X ==<br />
'''xenolog''' - homologs that are created by horizontal gene transfer between two different species [http://en.wikipedia.org/wiki/Xenolog#Xenology] (Matt)<br><br />
<br />
== Y ==<br />
<br />
== Z ==<br />
<br />
<BR></div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Halorhabdus_utahensis_Genome&diff=7188Halorhabdus utahensis Genome2008-11-11T15:25:40Z<p>Lavoss: /* My Favorite Pathways */</p>
<hr />
<div>This page will be used by Davidson College students in the [http://www.bio.davidson.edu/Courses/Bio343/LabMethods.html Genomics Laboratory course].<br />
__NOTOC__<br />
== Links to Multiple Databases ==<br />
*[http://imgweb.jgi-psf.org/cgi-bin/img_edu_v260/main.cgi?section=TaxonDetail&page=taxonDetail&taxon_oid=2500575004 JGI IMG EDU] <br> public access <br> *[[Media:JGIAnnotation.xls|JGI Annotation Excel Spreadsheet]]<br />
*[http://www.tigr.org/tigr-scripts/prok_manatee/shared/login.cgi Manatee at JCVI] <br> use the davidson number sent by email as username and password (database is nthu01 - this is case sensitive) <br> *[[Media:ManateeAnnotation.xls|Manatee Annotation Excel Spreadsheet]]<br />
*[http://rast.nmpdr.org/ SEED view via RAST] <br> use the username and password combination sent to you by SEED <br> *[[Media:RastAnnotation.xls|RAST Annotation Excel Spreadsheet]]<br> *[http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=18261238 RAST Publication in PubMed]<br />
*[http://www.genome.jp/kegg/kaas/ KEGG]<br> We can submit our genes to KEGG to have it mapped out, but SEED and Manatee may already do this. Do we want to ask them to upload it into their database? <br><br />
*[http://wishart.biology.ualberta.ca/basys/cache/135af8726ad6f61ec4c5f1e9c4d139ac/index.html BASYs]<br><br><br />
*[http://gcat.davidson.edu/Registry/compare/ Pairwise comparisons of All Three Annotations]<br />
<br />
<br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/JGI_5contigs.txt JGI Full genome, 5 separate contigs & 3.1 Mbp, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/JGI2500575004_genes.txt JGI gene DNA sequences, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/JGI2500575004_genes.xls JGI gene annotations, Excel] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/JGI2500575004_proteins.txt JGI protein sequences, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/h_utahensis_merged.txt JCVI Full genome, 5 contigs fused, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/h_utahensis_ORFs.txt JCVI gene sequences, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/h_utahensis_proteins.txt JCVI protein sequences, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/GeneLengths.xls 3-way comparison, Excel] <br><br />
[[Venn_diagrams]] Venn diagram of 3-way comparison<br />
<br />
<br><br />
<br />
== RNA Genes ==<br />
<br />
*[[tRNA Genes Check List]]<br><br />
*[[rRNA operon]]<br><br />
*[[2 misc. RNA genes]] (short summary list)<br><br />
*[[Missing tRNA-trp gene found]]<br><br />
<br />
== Other Resources ==<br />
*[[Consensus Shine Dalgarno]] Excel File for ''H. utahensis'' <br><br />
*[[References]]<br><br />
*[[Gene Annotation Template]]<br><br />
*[[General Questions]]<br><br />
*[[Page for Annotated Genes]]<br><br />
*[http://www.bio.davidson.edu/courses/genomics/2008/Win/ec/ Search EC number in RAST, JGI or Manatee] <br><br />
*[http://gcat.davidson.edu/Wideloache/Webfiles/ecNumBlast.html Blast an EC number against the H. utahensis genome]<br><br />
*[http://gcat.davidson.edu/Wideloache/Webfiles/AnnotationSearcher.html Perform a text-based search of the Rast, JGI, and Manatee protein calls]<br><br />
<br />
== Research Questions ==<br />
#How do the three systems compare for finding ORFs and RNA genes?<br />
#Is there a pattern of missed genes for any of the 3 sites? <br />
#Do the three systems differ in their ability to find good start codons and Shine-Dalgarno sequences? [We need a standard set of genes for comparison. Only highly conserved or a range of genes?]<br />
# Were Shine-Dalgarno sequences calculated for our species or default values used? If default, what sequence?<br />
#Can we fill any holes in their automated annotation? Is there a mechanism for users to add in genes?<br />
#How do the 3 sites compare for ease of use?<br />
#What are the strengths and weaknesses of each system? What did they publish as their special features and how do we see these working?<br />
#How does each of the 3 sites compare for pathway detection and visualization? <br />
#Do they find the origin of replication? Can we find it? <br />
<br />
* How do the 3 systems compare when one calls a gene hypothetical and another calls it a functional protein? How can they vary and which is getting it closer to correct (however you define that, possibly by date of matched entry)? (Pallavi and Mary)<br />
* Why did one system call a gene when the other two did not? (Matt and Lara)<br />
* How do the 3 sites compare for ease of use? What are the strengths and weaknesses of each system? What did they publish as their special features and how do we see these working? (Samantha and Nick)<br />
* Where is the origin of replication and did the 3 systems attempt to identify this?<br />
* Did the 3 systems utilize Shine-Dalgarno sequences to help them call start codons? Did they utilize our species's consensus Shine-Dalgarno? (Peter)<br />
* We need to fill in the [[Venn diagrams]] for our 3-way comparison. Let's compare the size of ORFs and generate a [[Gene Length Histograms|graph comparing the distributions]] for all 3. (Max and Will - they also take requests). <br />
<br />
<hr><br />
=Our Favorites=<br />
== My favorite genes==<br />
*Pallavi - Monooxygenase vs. Peroxiredoxin<br />
<br />
*Mary - JGI gene 2500588521 (922976...924046) [[Media:My favorite gene.ppt]]<br />
<br />
*Max - [http://app.sliderocket.com/app/FullPlayer.aspx?id=f2058b94-845f-4a11-94eb-142f251a7fea JGI gene 2500587636 (2-1849)]<br />
<br />
*Samantha - JGI gene 2500575882 (80504-80878) [[Media:Earl.ppt]]<br />
<br />
*Nick - JGI gene 2300587691 (69942...72866) [[Media:Gene presentation.ppt]]<br />
<br />
*Will - JGI gene 2500590430 (2847205..2854335)<br />
<br />
*Jay - JGI gene 2500588397 (806410..807321) [http://www.bio.davidson.edu/courses/genomics/2008/McNair/Fav_Gene/FavoriteGenePresentation.pptx Co/Zn/Cd PowerPoint]<br />
<br />
*Matt - Transcriptional Regulator nrdR (3109722..3110204 + 7274..7765)<br />
<br />
*Peter - tRNA intron endonuclease [[Media:TRNAtrpintronendonuclease.ppt]]<br />
<br />
*Laura - 16S Small ribosomal subunit, JGI gene 2500590728 (2397347..2398825)<br />
<br />
== My Favorite Pathways==<br />
Pallavi - Carbohydrate Metabolism<br />
<br />
Jay - Membrane Transport<br />
<br />
Will - Signal Transduction<br />
<br />
Max - Energy<br />
<br />
Samantha - Purine Metabolism!!!<br />
<br />
Laura - Amino Acid Biosynthesis<br />
<br />
''Suggestions by Kjeld''<br><br />
'''[[Cellulase]]''' by Pallavi<br><br />
I think it would be very interesting to look for genes involved in cellulose degradation: endocellulases, exocellulases (=cellobiohydrolases) and b-glucosidases.<br />
Many cellulose degraders produce a range of each type. A cellulolytic system able to function at 4.6 M NaCl is an interesting one. We either did not observe or did not look for cellulose degradation. However, these systems are normally inducible and you need to test several substrates and inducers. It would be nice to have a compilation of putative “cellulase” genes.<br />
There are several good recent reviews on cellulases (also mentioning E.C. numbers and enzyme families) that your students could consult.<br />
<br />
'''[[Chitinase]]''' by Matt<br><br />
Apparently you detected a chitinase, but according to our records it does not grow on N-acetyl-glucosamine, which is somewhat strange. It grows on glucose though. <br />
<br />
'''[[Lipases]]''' by Mary<br><br />
Lipases (/esterases) would also be interesting to look for – some lipases have important industrial applications.<br />
<br />
'''[[Amylases]]''' by Samantha<br><br />
We did not observe growth on starch. Did you find any “amylase-coding genes”?<br />
<br />
'''[[Xylose (glucose) isomerase)]]''' by Nick<br><br />
An enzyme of great commercial value. <br />
<br />
'''[[Amino acids]]''' led by Laura and assisted by Max, Jay, Nick and Samantha<br><br />
According to our records AX-2 is able to grow in a “defined medium”. This is at variance with your “holes” for synthesis of amino acids. However, there could have been some “carry over” of amino acids when inoculating a culture grown in complex medium (e.g. containing yeast extract). However, we are normally aware of this problem and do repeated culturing to dilute out potential growth factors present in yeast extract.<br />
<br />
'''[[Proteases]]''' by Peter<br><br />
We did not detect protease activity – albeit only checking a few substrates.<br />
<br />
'''[[Protein Export]]''' by Malcolm <br><br />
We need to know how these proteins might reach outside the cell which is where the food would be and thus the digestive enzymes or importers need to reach the outside world or the cell membrane.<br />
<br />
= Student-created tutorials: =<br />
== Tutorials for Annotating Genomes ==<br />
<br />
# Will DeLoache - [http://www.bio.davidson.edu/courses/genomics/2008/DeLoache/BioPerlTutorial/BioPerl.htm BioPerl Installation] <br><br />
# Max Win - [http://www.bio.davidson.edu/courses/genomics/2008/Win/perl.html Introduction to Perl for non-programmers.(with step by step explanations,simple exercises and solutions)]<br><br />
# Pallavi - Conserved Domains Database (CDD) [[Media:CDDtutorial.doc]] <br><br />
# Mary - Protein Data Bank (PDB) [[Media:PDB Tutorial.doc]] <br><br />
# Laura Voss - Pfam Database [http://www.bio.davidson.edu/Courses/Bio343/Pfam_tutorial.doc Pfam Tutorial] <br><br />
# Samantha Simpson - [http://www.bio.davidson.edu/courses/genomics/2008/Simpson/Tutorial.html NCBI BLAST]<br><br />
# Peter Bakke - [[Media:ShineDalgarnoTutorial.doc]]<br><br />
# Jay McNair - [http://www.bio.davidson.edu/courses/genomics/2008/McNair/Origin_Tutorial/OriginTutorial.doc Origin of Replication Tutorial]<br><br />
# Nick Carney - Navigating the JGI Database [[Media:NavigatingJGItutorial.doc]]<br><br />
# Matt Lotz - SEED Viewer - [[Media:SEEDTutorial.doc]] <br><br />
== Pathway Tutorials==<br />
[http://www.pathguide.org/ Pathguide] - a possible source of tutorials and extensive information<br />
<br />
[http://www.bigre.ulb.ac.be/Users/didier/pathfinding/ Shortest Path Tool]<br />
<hr><br />
*Pallavi: I will compare RAST and KEGG in pathway annotations and use Glycolysis/Gluconeogenesis as my example<br />
<br />
*Matt: WikiPathways<br />
<br />
*Mary: ENZYME<br />
<br />
*Samantha: [http://www.bio.davidson.edu/courses/genomics/2008/Simpson/Tutorial2.html How To Determine EC Numbers]<br><br />
<br />
*Nick: Cell Circuits<br />
<br />
*Max: KGML (how to edit the KEGG map)<br />
<br />
=Glossary words (A - Z):=<br />
[[#A| A ]] [[#B| B ]] [[#C| C ]] [[#D| D ]] [[#E| E ]] [[#F| F ]] [[#G| G ]] [[#H| H ]] [[#I| I ]] [[#J| J ]] [[#K| K ]] [[#L| L ]] [[#M| M ]] [[#N| N ]] [[#O| O ]] [[#P| P ]] [[#Q| Q ]] [[#R| R ]] [[#S| S ]] [[#T| T ]] [[#U| U ]] [[#V| V ]] [[#W| W ]] [[#X| X ]] [[#Y| Y ]] [[#Z| Z ]] <br />
<br />
== A ==<br />
'''Accession Number''' - a unique identifier given to DNA and protein sequences to allow for tracking of sequence information within a single database [http://en.wikipedia.org/wiki/Accession_number_(bioinformatics)] (Will).<br />
<br />
'''Antisense (RNA or DNA)'''-a piece of DNA or RNA that binds to a complementary sequence of DNA or RNA. These segments of genetic material can be used to identify the existence of a disease gene and they can also be used to bind to specific DNA or mRNA sequences to inhibit their function ([http://biotech.fyicenter.com/glossary/Bioinformatics_Glossary.html 5] Pallavi).<br />
<br />
'''<i>Arabidopsis thaliana</i>''' - the scientific name for the thale cress plant; it was the first plant to have its genome sequenced, and is a model organism for understanding plant biology and genetics ([http://en.wikipedia.org/wiki/Thale_cress Wikipedia.org], Jay)<br />
<br />
== B ==<br />
'''BAC''' - <i>b</i>acterial <i>a</i>rtificial <i>c</i>hromosome, a DNA construct used for transforming or cloning segments of DNA and often used to sequence the genetic code of organisms ([http://en.wikipedia.org/wiki/Bacterial_artificial_chromosome Wikipedia.org], Jay)<br />
<br />
'''bioinformatics''' - the multi-disciplinary approach of using biology, computer science and mathematics to solve or better understand biological problems [http://en.wikipedia.org/wiki/Bioinformatics] (Matt)<br />
<br />
'''BLAST''' - (Basic Local Alignment Search Tool) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. [http://blast.ncbi.nlm.nih.gov/Blast.cgi] (Mary)<br />
<br />
'''bioperl''' - a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications such as accessing sequence data from local and remote databases, converting between database formats, manipulating individual sequences, searching for similar sequences, searching for genes and other structures on genomic DNA, or developing machine-readable sequence annotations. [http://en.wikipedia.org/wiki/BioPerl] (Wikipedia, Max Win)<br />
<br />
== C ==<br />
'''carbon fixation''' - using carbon dioxide to create organic materials [http://en.wikipedia.org/wiki/Carbon_fixation] (Samantha)<BR><br />
<br />
'''CDD''' (Conserved Domains Database)- a database used to identify the conserved domains present in a protein query sequence [http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml] (Mary)<br />
<br />
'''chaperonin''' - a protein complex that assists some newly formed polypeptide chains by folding them into their final, functional, three-dimensional form [http://en.wikipedia.org/wiki/Chaperonins] (Matt)<br />
<br />
'''chemotaxis''' - the process by which cells move toward or away from high concentrations of certain chemicals; it is found in both uni- and multicellular organisms. Unicellular organisms use it to avoid toxins or find food, and multicellular organisms use it for tasks such as reproduction [http://en.wikipedia.org/wiki/Chemotaxis] (Nick)<br />
<br />
'''chemotaxonomy''' - the attempt to classify and identify organisms according to demonstrable differences and similarities in their biochemical compositions [http://en.wikipedia.org/wiki/Chemotaxonomy] (Mary)<br />
<br />
'''ClustalW''' - A web-based or command line tool that performs multiple sequence alignments to determine evolutionary relationships between three or more sequences [http://en.wikipedia.org/wiki/Clustal] (Will).<br />
<br />
'''COG''' (Clusters of Orthologous Groups)- each COG corresponds to a highly conserved domain and generally consists of either individual proteins or groups of paralogs ([http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml COG] Pallavi) <br><br />
<br />
'''concatemer''' - long continuous DNA molecule that contains the same DNA sequence repeated in series [http://en.wikipedia.org/wiki/Concatemer](Samantha)<BR><br />
<br />
'''contigs''' (contiguous DNA)- overlapping DNA segments that as a collection form a longer and gapless segment of DNA. (Discovery Genomics, Proteomics and Bioinformatics [http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''coverage''' - refers to the number of times, on average, any piece of DNA in a sequenced genome has been individually sequenced (Lecture, Jay)<br />
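<br />
As a rough illustration of the definition above, average coverage can be estimated as the total number of sequenced bases divided by the genome size. The function name and the read counts below are hypothetical; only the 3.1 Mbp genome size comes from this page.<br />

```python
def average_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Estimate average coverage as (reads x read length) / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


# Hypothetical run: 50,000 reads of 500 bp over a 3.1 Mbp genome.
print(round(average_coverage(50_000, 500, 3_100_000), 1))  # about 8x coverage
```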
<br />
'''CPAN (Comprehensive Perl Archive Network)''' - an archive of over 12,200 modules of software written in Perl, as well as documentation for it. It contains a module called CPAN (or CPAN.pm) which is used as an installer for Perl modules such as BioPerl [http://en.wikipedia.org/wiki/CPAN](Will).<br />
<br />
'''Cytogenetics'''-the study of normal and abnormal chromosomes. This involves studying the causes of chromosomal abnormalities and looking at the structure of chromosomes ([http://www.vivo.colostate.edu/hbooks/genetics/medgen/chromo/index.html 7] Pallavi).<br />
<br />
== D ==<br />
'''''de novo'' synthesis''' - the synthesis of complex molecules from simple molecules (e.g. sugars and nucleotides), rather than from recycled molecules; from the Latin for "anew" or "from the new" [http://en.wikipedia.org/wiki/De_novo_synthesis] (Matt)<br />
<br />
'''dehydrogenase''' - a type of enzyme that oxidizes a substrate by transferring one or more protons and a pair of electrons to an acceptor. [http://en.wikipedia.org/wiki/Dehydrogenase] (Peter)<br />
<br />
'''diatom''' - a major group of eukaryotic algae, and one of the most common types of phytoplankton. A characteristic feature of diatom cells is that they are encased within a unique cell wall made of silica called a frustule. These frustules show a wide diversity in form, but usually consist of two asymmetrical sides with a split between them. [http://en.wikipedia.org/wiki/Diatom] (Mary)<br />
<br />
'''domain (protein)''' - the structural and functional groups of a protein, which can exist independently of the protein itself. Domains typically perform a specific function, such as binding to promoters or substrates, and many proteins can have one or several domains in common. Evolutionarily-linked proteins are more likely to have domains in common. Domains are used to organize proteins into families. ([http://en.wikipedia.org/wiki/Domain_(protein) Wikipedia article], Laura)<br />
<br />
'''dot plot'''-graphical display comparing sequence conservation between two genomes with dots indicating strings of identical bases. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
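<br />
The dots in such a plot can be computed by recording every pair of positions at which the two sequences share an identical string of bases. The sketch below returns the coordinates rather than drawing them; the function name and the default word length of 3 are assumptions for illustration (genome-scale plots use much longer words).<br />

```python
from typing import Dict, List, Tuple


def dot_plot(seq_a: str, seq_b: str, word: int = 3) -> List[Tuple[int, int]]:
    """Return (i, j) pairs where seq_a[i:i+word] equals seq_b[j:j+word]."""
    # Index every word of seq_b by its start positions.
    positions: Dict[str, List[int]] = {}
    for j in range(len(seq_b) - word + 1):
        positions.setdefault(seq_b[j:j + word], []).append(j)

    dots = []
    for i in range(len(seq_a) - word + 1):
        for j in positions.get(seq_a[i:i + word], []):
            dots.append((i, j))
    return dots


# Two consecutive dots on a diagonal indicate a conserved stretch.
print(dot_plot("ACGTAC", "CGTA"))  # [(1, 0), (2, 1)]
```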
<br />
== E ==<br />
<br />
'''EC number''' (Enzyme Commission Number)- a numerical classification scheme for enzymes, based on the chemical reactions they catalyze [http://en.wikipedia.org/wiki/EC_number] (Mary)<br />
<br />
'''E-value''' (Expect value)- When performing a BLAST search, you will obtain an E-value for each sequence that is retrieved. An E-value can be thought of as the number of hits with an equal or better score that you would expect to find by chance in a database of that size; the closer it is to zero, the more significant the match. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
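<br />
To make the dependence on database size concrete, the sketch below uses the relation between E-value and bit score given in NCBI's BLAST statistics notes, E = m x n x 2^(-S'), where m is the query length, n is the database length, and S' is the bit score. The function name is hypothetical, and BLAST itself uses corrected "effective" lengths, so treat this as an approximation.<br />

```python
def expect_value(bit_score: float, query_length: int, database_length: int) -> float:
    """Approximate E-value from a bit score: E = m * n * 2**(-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


# The same bit score becomes less significant as the database grows.
small_db = expect_value(40.0, 300, 1_000_000)
large_db = expect_value(40.0, 300, 2_000_000)
print(small_db, large_db)
```

Doubling the database length doubles the E-value, which is why the same alignment can look less significant when searched against a larger database.<br />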
<br />
'''Extremophile''' - an organism that thrives in and may even require physically or geochemically extreme conditions that are detrimental to the majority of life on Earth [http://en.wikipedia.org/wiki/Extremophile] (Will).<br />
<br />
== F ==<br />
<br />
'''FASTA format''' - a text format used to convey either nucleic acid sequences or peptide sequences, in which nucleotides or amino acids are represented by single-letter codes. A header line beginning with ">" carries the sequence name and other descriptors and precedes the sequence itself. [http://en.wikipedia.org/wiki/FASTA_format] (Nick)<br><br />
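<br />
A minimal sketch of reading this format: header lines start with ">" and every following line belongs to that record until the next header. The function name and the example records are hypothetical, and the identifier is assumed to be the first word of the header.<br />

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into a dict of {identifier: sequence}."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""   # identifier = first word of header
            records[name] = []
        elif name is not None:
            records[name].append(line)           # sequence may span several lines
    return {n: "".join(parts) for n, parts in records.items()}


example = ">gene1 hypothetical protein\nATGC\nGGTA\n>gene2\nTTAA\n"
print(parse_fasta(example))  # {'gene1': 'ATGCGGTA', 'gene2': 'TTAA'}
```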
<br />
'''family (protein)''' - a group of evolutionarily-related proteins, often with one or several domains in common. Families are organized by domain overlap, structural/functional similarity, and sequence similarity. ([http://en.wikipedia.org/wiki/Protein_family Wikipedia article] and lecture, Laura)<br />
<br />
'''finished genome''' - a genome that has been sequenced at least partly by hand, resulting in at least 99.99% sequence accuracy (Lecture, Jay)<br><br />
<br />
'''fusion mRNA'''-mRNA that results from the transcription of a gene after a chromosomal translocation event. This results in an mRNA sequence that comes from two different genes (Rowley and Blumenthal 2008 ''Science'' Pallavi)<br />
<br />
== G ==<br />
<br />
'''GC Content''' - the percentage of bases within a certain sequence of DNA (e.g. a gene or a genome) that are either guanine or cytosine; a higher GC content is characteristic of a coding region of a gene; differences in GC content between a gene and a genome can be used as evidence for horizontal gene transfer [http://en.wikipedia.org/wiki/GC-content] (Matt)<br><br />
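<br />
The percentage described above can be computed directly from a sequence. In this sketch the function name is hypothetical, and ambiguous characters such as N are assumed to be excluded from both the count and the total.<br />

```python
def gc_content(sequence: str) -> float:
    """Return the percentage of bases in a DNA sequence that are G or C."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        return 0.0
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# The Shine-Dalgarno consensus from this page: 6 of its 8 bases are G or C.
print(gc_content("ccGGAGGt"))  # 75.0
```

Comparing the value for a single gene with the value for the whole genome is the kind of check that can hint at horizontal gene transfer.<br />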
<br />
'''GC-skew''' – uneven distribution of guanine and cytosine bases between the two strands of DNA where GC base pairs occur. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
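<br />
GC-skew is commonly quantified along one strand as (G - C) / (G + C) in consecutive windows, and the position where its sign changes is often used as a clue to the origin of replication. The sketch below assumes that formula; the function name and window sizes are hypothetical.<br />

```python
from typing import List


def gc_skew(sequence: str, window: int = 1000, step: int = 1000) -> List[float]:
    """Return (G - C) / (G + C) for each window along one DNA strand."""
    seq = sequence.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Windows with no G or C get a skew of 0.0 to avoid dividing by zero.
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


print(gc_skew("GGGCAAAACCCG", window=4, step=4))  # [0.5, 0.0, -0.5]
```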
<br />
'''gene amplification''' - production of multiple copies of a gene in order to amplify the amount of protein that the gene encodes [http://www.medterms.com/script/main/art.asp?articlekey=13537] [http://www.answers.com/topic/gene-amplification] (Matt)<br />
<br />
'''gene fusion'''-occurs when DNA segments of two different genes come together. Can result in hybrid proteins ([http://www.biochem.northwestern.edu/holmgren/Glossary/Definitions/Def-G/gene_fusion.html 9] Pallavi)<br />
<br />
'''gene knockout''' - a process in which a gene is deactivated within a test organism in order to better understand the function of the gene in that organism [http://en.wikipedia.org/wiki/Gene_knockout] (Matt)<br />
<br />
'''gene ontology''' - a collaborative effort of investigators to unify and standardize terms associated with the role a gene or protein plays in an organism. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''glaucophyte''' - freshwater algae that have not been studied well [http://en.wikipedia.org/wiki/Glaucophyte](Samantha)<br><br />
<br />
== H ==<br />
<br />
'''haemolysin or hemolysin''' - a chemical produced by bacteria that causes lysis of red blood cells [http://en.wikipedia.org/wiki/Hemolysis_(microbiology)] (Nick)<br />
<br />
'''halophile''' - an organism, most often of the Archaea domain, that lives in environments containing high concentrations of salt [http://en.wikipedia.org/wiki/Halophile] (Matt)<br />
<br />
'''haplotype''' - a collection of alleles at linked loci that are inherited together (Lecture, Pallavi)<br />
<br />
'''haptophyte''' - phylum of algae [http://en.wikipedia.org/wiki/Haptophyte](Samantha)<br />
<br />
'''heterokont''' - major line of eukaryotes consisting of about 10,500 known species, most of which are algae [http://en.wikipedia.org/wiki/Heterokont](Samantha)<br />
<br />
'''Hidden Markov Model''' - a statistical model used in protein recognition databases such as Pfam. A Hidden Markov Model keeps track of several variables and possible variations thereof, such as the possible amino acid sequences that make up a protein domain (since there can be some variance in an amino acid sequence) or the variations in the component sounds that make up a word, and uses those points to match a given sequence to the word, domain, or other complex sequence it most closely matches. An HMM in speech recognition software, for example, can identify that a certain set of sounds make up a certain word, even with the variations in pronunciation and accent that different people will give those sounds. ([http://en.wikipedia.org/wiki/Hidden_Markov_Model Wikipedia] and lecture, Laura) <br />
<br />
'''HMM Logo''' - a graphical representation of an HMM, detailing the possible amino acid sequences, the relative frequencies and probabilities of each amino acid in the sequence, the relative contribution each amino acid has to the overall protein family, and the charge or nature of the amino acids themselves. ([http://www.sanger.ac.uk/Software/analysis/logomat-m/help.shtml How to read HMM Logos, on Pfam], Laura)<br />
<br />
'''homeobox''' - DNA sequence within transcription factor genes that allow the cell to respond to patterns of development by having the transcription factors switch on gene cascades [http://en.wikipedia.org/wiki/Homeobox](Samantha)<br />
<br />
'''homodimer''' - a protein made of paired identical polypeptides ([http://www.answers.com/topic/homodimer Answers.com], Jay)<br />
<br />
'''horizontal gene transfer'''-DNA transmission between species and incorporation of the DNA into the recipient's genome ([http://www.csrees.usda.gov/nea/biotech/res/biotechnology_res_glossary.html horizontal gene transfer] Pallavi)<br />
<br />
'''''Hox'' gene'''-a gene that contains a homeobox region that is involved in morphogenesis along the cranio-caudal body axis ([http://www.uprightape.net/UA_Glossary.html 4] Pallavi)<br />
<br />
'''hydrolase''' - an enzyme that catalyzes hydrolysis, the cleavage of a chemical bond by the addition of water [http://en.wikipedia.org/wiki/Hydrolase] (Nick)<br />
<br />
== I ==<br />
<br />
'''ideogram''' - in genomics, usually describes a stylized representation of a chromosome with banding patterns (Campbell-Heyer Genomics textbook, Jay)<br />
<br />
'''identities''' - in a BLAST output, the number and fraction of total residues which are identical in a given alignment [http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)<br />
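As a rough sketch of how an identities figure is obtained, the function below counts identical residues in two already-aligned sequences. It assumes gaps are written as "-" and that the denominator is the full alignment length, gaps included; the function name and example sequences are hypothetical.

```python
def identities(aligned_query, aligned_subject):
    """Return (identical positions, alignment length) for two aligned sequences."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    # A position counts only when both residues match and neither is a gap.
    matches = sum(1 for q, s in zip(aligned_query, aligned_subject)
                  if q == s and q != "-")
    return matches, len(aligned_query)


matches, length = identities("MKT-LV", "MRTALV")
print(f"Identities = {matches}/{length}")
```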
<br />
'''indole'''-a chemical compound that is produced from the break down of tryptophan ([http://medical-dictionary.thefreedictionary.com/indole indole] Pallavi)<br />
<br />
'''inclusion body''' - Inclusion bodies are collections of stainable substances, usually proteins, that are found either in the nucleus or the cytoplasm. It is thought that these bodies are often the result of viral proteins that misfolded [http://en.wikipedia.org/wiki/Inclusion_body] (Nick)<br />
<br />
'''intron''' - a region of DNA in a gene that is not part of the final coding sequence for the protein. [http://en.wikipedia.org/wiki/Intron] (Peter)<br />
<br />
'''isoelectric point''' - the pH at which a molecule carries no net electrical charge [http://en.wikipedia.org/wiki/Isoelectric_point] (Nick)<br />
<br />
'''isozymes''' - members of a gene family with very similar cellular roles (Campbell-Heyer Genomics textbook, Jay)<br />
<br />
== J ==<br />
<br />
== K ==<br />
'''KEGG (Kyoto Encyclopedia of Genes and Genomes)''' - a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals. The Pathway database records networks of molecular interactions in the cells, and variants of them specific to particular organisms [http://en.wikipedia.org/wiki/KEGG](Will).<br />
<br />
'''kinase''' - a type of enzyme that transfers a phosphate group from a high-energy donor molecule to a target molecule in a process called phosphorylation. [http://en.wikipedia.org/wiki/Kinase] (Peter)<br />
<br />
== L ==<br />
<br />
== M ==<br />
'''Manatee''' - a web-based gene evaluation and genome annotation tool that can view, modify, and store annotation for prokaryotic and eukaryotic genomes. This ongoing, open source initiative was developed with two missions: first, to give biologists the ability to functionally annotate their genomes using a powerful, stand-alone web application with a robustly designed relational annotation database; and second, to give outside developers the opportunity to contribute their own ideas and requirements to enhance Manatee's ability to accomplish biological goals [http://manatee.sourceforge.net/](Will). <br><br />
<br />
'''microsatellites'''-stretches of repetitive, short DNA segments that can be used to track the inheritance of certain traits within families ([http://www.clanlindsay.com/genetic_dna_glossary.htm 3] Pallavi)<br />
<br />
'''minisatellites'''-segments of DNA that can be used for individual identification (ex. DNA fingerprinting) or in determining relationships between people (ex. paternity cases) ([http://www.clanlindsay.com/genetic_dna_glossary.htm 2] Pallavi).<br />
<br />
'''motif''' - a sequence of amino acids or nucleotides that performs a particular role and is often conserved in other species or molecules. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''mycoplasma''' - genus of bacteria that lack a cell wall [http://en.wikipedia.org/wiki/Mycoplasma] (Nick)<br />
<br />
== N ==<br />
<br />
'''NORFs''' (nonannotated open reading frames) - open reading frames that were considered not to be real genes when the genome was annotated. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''nucleomorph''' - reduced eukaryotic nuclei found in plastids [http://en.wikipedia.org/wiki/Nucleomorph](Samantha)<br />
<br />
== O ==<br />
'''object-oriented programming''' - a programming paradigm in which collections of data, associated with operations on that data, are modularly defined and then built upon (CSC 121 Lecture, Will). <br />
<br />
'''open reading frame (ORF)''' - a segment of DNA that can potentially encode a protein; it begins with a start codon (usually ATG) and ends with a stop codon [http://www.fao.org/DOCREP/003/X3910E/X3910E18.htm ORF] (Pallavi)<br />
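A minimal sketch of ORF finding, under simplifying assumptions: it scans only the forward strand, accepts only ATG as a start codon, and reports the first stop codon in frame. Real gene callers also search the reverse strand and alternative start codons. The function name and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return sorted (start, end, frame) tuples for forward-strand ORFs.

    start is 0-based, end is exclusive and includes the stop codon.
    min_codons counts every codon from ATG through the stop codon.
    """
    seq = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


for start, end, frame in find_orfs("CCATGAAATGATT"):
    print(start, end, frame)
```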
<br />
'''operon''' - a segment of DNA involving an operator, promoter, and one or more genes that operate as a single unit during transcription [http://en.wikipedia.org/wiki/Operon] (Nick)<br />
<br />
'''optical mapping'''-DNA sequences of the organism in question are compared against a karyotype that specifically looks at restriction sites found within the DNA to correctly order the DNA sequences on a chromosome. This methodology gives very detailed haplotype information and allows for the detection of sequence variations across an entire genome [http://www.geocities.com/bioinformaticsweb/genomicglossary.html optical mapping] (Pallavi)<br />
<br />
'''ortholog''' - genes in different species that have similar sequences because they descended from a single gene in a common ancestor (Lecture, Pallavi)<br />
<br />
'''oxidoreductase''' - an enzyme that catalyzes redox reactions by transferring electrons from one molecule (the reductant) to another (the oxidant) [http://en.wikipedia.org/wiki/Oxidoreductase] (Nick)<br />
<br />
== P ==<br />
<br />
'''paralog''' - similar genes within the same species that arose by duplication of an ancestral gene (Lecture, Pallavi)<br />
<br />
'''p-arm''' - the shorter arm of a chromosome's two arms separated by the centromere (compare to q-arm, the longer arm) ([http://www.medterms.com/script/main/art.asp?articlekey=4715 MedTerms Dictionary], Jay)<br />
<br />
'''Perl''' - Developed by Larry Wall in 1987, Perl is a [http://en.wikipedia.org/wiki/High-level_programming_language high-level programming language] used frequently by biologists and bioinformaticists [http://en.wikipedia.org/wiki/Perl] (Will). <br />
<br />
'''periplasmic space''' - the space between the inner cytoplasmic membrane and external outer membrane in bacteria or archaea. [http://en.wikipedia.org/wiki/Periplasmic_space] (Peter)<br />
<br />
'''Pfam''' - a database for protein domain families that matches amino acid sequences or nucleotide sequences to the related group of proteins to which they most likely belong. ([http://pfam.sanger.ac.uk/help Pfam Help], Laura)<br />
<br />
'''plasmid''' - an extra-chromosomal DNA molecule that is capable of replicating independently of the chromosomal DNA. Commonly found in bacteria and archaea. [http://en.wikipedia.org/wiki/Plasmid](Peter)<br />
<br />
'''plastid''' - major organelles in plants or algae [http://en.wikipedia.org/wiki/Plastid](Samantha)<br />
<br />
'''pleomorphism''' - the occurrence of two or more structural forms during a life cycle [http://en.wikipedia.org/wiki/Pleomorphism] (Mary)<br />
<br />
'''phylogenetic tree''' - a diagram showing the evolutionary relationships between biological species that are thought to share a common ancestor [http://en.wikipedia.org/wiki/Phylogenetic_tree] (Nick)<br />
<br />
'''phylotypes''' – a term intended to resolve the challenge of “species” when classifying prokaryotes using DNA sequence comparisons. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''positives''' - in a BLAST output, the number and fraction of residues for which the alignment scores have positive rather than negative values [http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)<br />
<br />
'''proteome''' - entire set of proteins expressed by a genome, cell, tissue, or organism. It may refer to expressed proteins under certain conditions [http://en.wikipedia.org/wiki/Proteome](Samantha)<br />
<br />
'''PSORT''' - a prediction server that judges where a mature protein could be in the cell, based on its transmembrane domains, its predicted mature amino acid composition, and its signal sequences. ([http://psort.ims.u-tokyo.ac.jp/form.html PSORT], Laura)<br />
<br />
'''pseudogenes''' - sequences of DNA that look like genes but most likely contain many stop codons. A pseudogene may have evolved away from a real gene, or a paralog might have taken its place (Lecture, Pallavi)<br />
<br />
'''purine''' - a category of nitrogenous base consisting of a pyrimidine ring fused to an imidazole ring. Notable purine bases are adenine and guanine. [http://en.wikipedia.org/wiki/Purine] (Peter)<br />
<br />
'''pyrimidine''' - a category of nitrogenous base consisting of a heterocyclic aromatic ring containing two nitrogen atoms at positions 1 and 3 of the six-member ring. Notable pyrimidine bases are cytosine, thymine, and uracil. [http://en.wikipedia.org/wiki/Pyrimidine] (Peter)<br />
<br />
== Q ==<br />
<br />
'''q-arm''' - the longer arm of a chromosome's two arms separated by the centromere (compare to p-arm, the shorter arm) ([http://www.medterms.com/script/main/art.asp?articlekey=5152 MedTerms Dictionary], Jay)<br><br />
<br />
'''query sequence''' - the sequence (whether amino acid or nucleotide) entered into a database’s search function and checked against the database entries. ([http://en.wikipedia.org/wiki/BLAST BLAST on Wikipedia], Laura)<br />
<br />
== R ==<br />
<br />
'''RAST''' - (Rapid Annotation using Subsystem Technology)- a fully-automated service for annotating bacterial and archaeal genomes. It provides high quality genome annotations for these genomes across the whole phylogenetic tree. ([http://rast.nmpdr.org/], Max Win)<br />
<br />
'''rDNA'''-These are DNA sequences that encode for ribosomal RNA. Note that rDNA can also stand for recombinant DNA. ([http://en.wikipedia.org/wiki/Ribosomal_DNA rDNA] Pallavi)<br />
<br />
'''residue (protein)''' - the remaining portion of an amino acid after a water molecule has been removed and it has been incorporated into a protein. Functional residues, referred to in Pfam, are the residues that perform some specific identifiable function or are part of a domain, and can be conserved across evolutionarily-related proteins. ([http://pfam.sanger.ac.uk/help Pfam Help], Laura) <br><br />
<br />
'''retropseudogenes'''-these are genes that have been reverse-transcribed from mRNA and the resulting DNA sequence is incorporated back into the genome. They are non-functional segments of DNA and can be distinguished from pseudogenes in that they do not have intron sequences. ([http://genome.cshlp.org/cgi/content/full/10/5/672 1] Pallavi)<br />
<br />
'''retrotransposons''' - RNA transcribed back into DNA and added into the genome [http://en.wikipedia.org/wiki/Retrotransposon](Samantha)<br />
<br />
'''ribonuclease''' - a nuclease that catalyzes the degradation of RNA into smaller components [http://en.wikipedia.org/wiki/Ribonuclease] (Mary)<br />
<br />
== S ==<br />
'''Serovar'''-a subdivision of a species based on the characteristics of their cell surface antigens ([http://www.biology-online.org/dictionary/Serovar serovar] Pallavi)<br />
<br />
'''scaffold''' - a section of a sequenced genome composed of contigs that are in the right order but not necessarily connected ([http://www.medterms.com/script/main/art.asp?articlekey=25223 MedTerms Dictionary], Jay)<br />
<br />
'''"Shadow enhancers"'''-secondary enhancers that are thought to be important for natural selection to occur in regulatory DNA segments. They evolve much faster than primary enhancers, which suggests that they are under fewer functional constraints (Wray and Babbit 2008 ''Science'' Pallavi)<br />
<br />
'''Shine-Dalgarno sequence''' - A ribosomal binding site on an mRNA, usually a sequence of about six bases located six or seven bases upstream of the start codon. An anti-Shine-Dalgarno sequence exists on the rRNA in the small subunit of the ribosome; when the two sequences pair, the mRNA is lined up and prepared for translation. (Lecture and [http://en.wikipedia.org/wiki/Shine-dalgarno Wikipedia article], Laura)<br><br />
Note: The Shine-Dalgarno consensus sequence for our genome is ccGGAGGt.<br />
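One way to look for a Shine-Dalgarno site is to search a short window upstream of a start codon for the motif. The sketch below assumes the core motif is GGAGG (the uppercase part of the consensus above), requires an exact match, and uses a 20-base window; the function name, window size, and example sequence are hypothetical.

```python
def find_sd_sites(dna, start_index, motif="GGAGG", window=20):
    """Return 0-based positions of motif within `window` bases upstream of the start codon."""
    seq = dna.upper()
    upstream_start = max(0, start_index - window)
    upstream = seq[upstream_start:start_index]
    hits = []
    pos = upstream.find(motif)
    while pos != -1:
        hits.append(upstream_start + pos)  # convert back to whole-sequence coordinates
        pos = upstream.find(motif, pos + 1)
    return hits


dna = "TTCCGGAGGTAACGTATGAAA"  # made-up sequence with ATG at index 15
for hit in find_sd_sites(dna, dna.find("ATG")):
    spacing = dna.find("ATG") - (hit + len("GGAGG"))
    print(f"motif at {hit}, {spacing} bases before the start codon")
```

Exact matching is a deliberate simplification; real sites often differ from the consensus by a base or two, so a scoring approach would be more sensitive.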
<br />
'''SignalP''' - a prediction server that judges whether or not a query protein contains a signal peptide. SignalP measures each amino acid against the amino acid sequences of probable signal peptide matches and predicts the cleavage site of the signal peptide. ([http://www.cbs.dtu.dk/services/SignalP-3.0/output.php SignalP Output explained], Laura)<br />
<br />
'''signal peptide''' - a short peptide chain that directs the post-translational transport of a protein [http://en.wikipedia.org/wiki/Signal_peptide] (Matt)<br />
<br />
'''Smith-Waterman alignment''' - A well-known algorithm for determining similar regions between two nucleotide or protein sequences. Instead of looking at the total sequence, the Smith-Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure [http://en.wikipedia.org/wiki/Smith_waterman](Will).<br />
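The core of the algorithm is a scoring matrix in which every cell is floored at zero, so an alignment can start and end anywhere. The sketch below returns only the best local score and omits the traceback that recovers the aligned segments; the match, mismatch, and gap values are arbitrary assumptions, not standard parameters.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            # Flooring at zero is what makes the alignment local.
            score[i][j] = max(0, diag, up, left)
            best = max(best, score[i][j])
    return best


print(smith_waterman_score("GGAGG", "CCGGAGGT"))
```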
<br />
'''SNP (Single Nucleotide Polymorphism)''' - a DNA sequence variation occurring when a single nucleotide in the genome (or other shared sequence) differs between members of a species (or between paired chromosomes in an individual) [http://en.wikipedia.org/wiki/Single_nucleotide_polymorphism](Will).<br />
<br />
'''symporter''' - an integral membrane protein that moves two or more different molecules or ions across a phospholipid membrane in the same direction. [http://en.wikipedia.org/wiki/Symporter] (Peter)<br />
<br />
'''synteny''' - a neologism from the Greek for "on the same ribbon". Genes that are syntenic in one species are on the same chromosome; genes that are syntenic across species retain the same order on respective chromosomes as a result of descent from a common ancestor ([http://www.answers.com/synteny Answers.com], Jay)<br />
<br />
'''synthetase''' - a type of enzyme that creates a new covalent bond and requires direct input of energy from a high-energy phosphate. [http://books.google.com/books?id=bB8XnCykRmIC&pg=PA522&lpg=PA522&dq=%22synthetase+is+an+enzyme%22&source=web&ots=wkws4ksMsg&sig=zWLkDIk7T78hcf9S84nWs3u5Apw&hl=en&sa=X&oi=book_result&resnum=9&ct=result] (Peter)<br />
<br />
== T ==<br />
'''transferase''' - an enzyme that catalyzes the transfer of a functional group from one molecule (the donor) to another (the acceptor) [http://en.wikipedia.org/wiki/Transferase] (Matt)<br />
<br />
'''transmembrane helix''' - a single transmembrane alpha helix of a transmembrane protein, usually about twenty amino acids in length. They are usually predicted by hydrophobicity. [http://en.wikipedia.org/wiki/Transmembrane_domain](Mary)<br />
<br />
'''transposons / transposable elements''' - DNA sequences that can move around to different positions in a single cell's genome. Transposons can cause mutations and change the length of the genome. [http://en.wikipedia.org/wiki/Transposon](Samantha)<br />
<br />
'''Transposon Mutagenesis'''-a procedure in which a transposon is inserted into a gene, which inactivates the gene and can lead to the discovery of the phenotype associated with this gene ([http://cancerweb.ncl.ac.uk/cgi-bin/omd?transposon+mutagenesis transposon mutagenesis] Pallavi)<br />
<br />
'''Trans-splicing'''-fragmented exon sequences fuse to form a mature species of mRNA. This process results in fusion mRNA ([http://www.representinggenes.org/Glossary.html 8] Pallavi).<br />
<br />
'''tRNA splicing endonuclease''' - an enzyme that cleaves intervening sequences of precursor tRNA. [http://cancerweb.ncl.ac.uk/cgi-bin/omd?splicing+endonuclease] (Peter)<br><br />
<br />
== U ==<br />
<br />
== V ==<br />
'''Vertical gene transfer'''-the transmission or absorption of genetic material that is associated with sexual reproduction and, thus, acknowledges species-specific boundaries ([http://www.gmo-compass.org/eng/glossary/#G 6] Pallavi)<br />
<br />
== W ==<br />
<br />
'''whole genome shotgun sequencing''' - a method of sequencing where DNA is cut into small pieces and cloned into vectors, then both ends of every vector are sequenced in about 500 bps to form mate pairs. Mate pairs rarely overlap, but are used to reassemble the sequence using software. [http://en.wikipedia.org/wiki/Whole_genome_shotgun](Samantha)<br />
<br><br />
<br />
== X ==<br />
'''xenolog''' - homologs that are created by horizontal gene transfer between two different species [http://en.wikipedia.org/wiki/Xenolog#Xenology] (Matt)<br><br />
<br />
== Y ==<br />
<br />
== Z ==<br />
<br />
<BR></div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Halorhabdus_utahensis_Genome&diff=6787Halorhabdus utahensis Genome2008-10-09T10:36:24Z<p>Lavoss: /* My favorite genes */</p>
<hr />
<div>This page will be used by Davidson College students in the [http://www.bio.davidson.edu/Courses/Bio343/LabMethods.html Genomics Laboratory course].<br />
__NOTOC__<br />
== Links to Multiple Databases ==<br />
*[http://imgweb.jgi-psf.org/cgi-bin/img_edu_v260/main.cgi?section=TaxonDetail&page=taxonDetail&taxon_oid=2500575004 JGI IMG EDU] <br> public access <br> *[[Media:JGIAnnotation.xls|JGI Annotation Excel Spreadsheet]]<br />
*[http://www.tigr.org/tigr-scripts/prok_manatee/shared/login.cgi Manatee at JCVI] <br> use the davidson number sent by email as username and password (database is nthu01 - this is case sensitive) <br> *[[Media:ManateeAnnotation.xls|Manatee Annotation Excel Spreadsheet]]<br />
*[http://rast.nmpdr.org/ SEED view via RAST] <br> use the username and password combination sent to you by SEED <br> *[[Media:RastAnnotation.xls|RAST Annotation Excel Spreadsheet]]<br> *[http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=18261238 RAST Publication in PubMed]<br />
*[http://www.genome.jp/kegg/kaas/ KEGG]<br> We can submit our genes to KEGG to have it mapped out, but SEED and Manatee may already do this. Do we want to ask them to upload it into their database? <br><br><br />
*[http://gcat.davidson.edu/Registry/compare/ Pairwise comparisons of All Three Annotations]<br />
<br />
<br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/JGI_5contigs.txt JGI Full genome, 5 separate contigs & 3.1 Mbp, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/JGI2500575004_genes.txt JGI gene DNA sequences, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/JGI2500575004_genes.xls JGI gene annotations, Excel] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/JGI2500575004_proteins.txt JGI protein sequences, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/h_utahensis_merged.txt JCVI Full genome, 5 contigs fused, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/h_utahensis_ORFs.txt JCVI gene sequences, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/h_utahensis_proteins.txt JCVI protein sequences, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/GeneLengths.xls 3-way comparison, Excel] <br><br />
[[Venn_diagrams]] Venn diagram of 3-way comparison<br />
<br />
<br><br />
<br />
== RNA Genes ==<br />
<br />
*[[tRNA Genes Check List]]<br><br />
*[[rRNA operon]]<br><br />
*[[2 misc. RNA genes]] (short summary list)<br><br />
*[[Missing tRNA-trp gene found]]<br><br />
<br />
== Other Resources ==<br />
*[[Consensus Shine Dalgarno]] Excel File for ''H. utahensis'' <br><br />
*[[References]]<br><br />
*[[Gene Annotation Template]]<br><br />
*[[General Questions]]<br><br />
*[[Page for Annotated Genes]]<br><br />
<br />
== Tutorials for Annotating Genomes ==<br />
<br />
# Will DeLoache - [http://www.bio.davidson.edu/courses/genomics/2008/DeLoache/BioPerlTutorial/BioPerl.htm BioPerl Installation] <br><br />
# Max Win - [http://www.bio.davidson.edu/courses/genomics/2008/Win/perl.html Introduction to Perl for non-programmers.(with step by step explanations,simple exercises and solutions)]<br><br />
# Pallavi - Conserved Domains Database (CDD) [[Media:CDDtutorial.doc]] <br><br />
# Mary - Protein Data Bank (PDB) [[Media:PDB Tutorial.doc]] <br><br />
# Laura Voss - Pfam Database [http://www.bio.davidson.edu/Courses/Bio343/Pfam_tutorial.doc Pfam Tutorial] <br><br />
# Samantha Simpson - [http://www.bio.davidson.edu/courses/genomics/2008/Simpson/Tutorial.html NCBI BLAST]<br><br />
# Peter Bakke - [[Media:ShineDalgarnoTutorial.doc]]<br><br />
# Jay McNair - [http://www.bio.davidson.edu/courses/genomics/2008/McNair/OriginTutorial.doc Origin of Replication Tutorial]<br><br />
# Nick Carney - Navigating the JGI Database [[Media:NavigatingJGItutorial.doc]]<br><br />
# Matt Lotz - SEED Viewer - [[Media:SEEDTutorial.doc]] <br><br />
<br />
== Research Questions ==<br />
#How do the three systems compare for finding ORFs and RNA genes?<br />
#Is there a pattern of missed genes for any of the 3 sites? <br />
#Do the three systems differ in their ability to find good start codons and Shine-Dalgarno sequences? [We need a standard set of genes for comparison. Only highly conserved or a range of genes?]<br />
# Were Shine-Dalgarno sequences calculated for our species or default values used? If default, what sequence?<br />
#Can we fill any holes in their automated annotation? Is there a mechanism for users to add in genes?<br />
#How do the 3 sites compare for ease of use?<br />
#What are the strengths and weaknesses of each system? What did they publish as their special features and how do we see these working?<br />
#How does each of the 3 sites compare for pathway detection and visualization? <br />
#Do they find the origin of replication? Can we find it? <br />
<br />
* How do the 3 systems compare when one gene is called hypothetical and the other calls it a functional protein? How can they vary and who is getting it closer to correct (however you define that, possibly by date of matched entry: Pallavi and Mary)<br />
* Why did one system call a gene when the other two did not? (Matt and Lara)<br />
* How do the 3 sites compare for ease of use? What are the strengths and weaknesses of each system? What did they publish as their special features and how do we see these working? (Samantha and Nick)<br />
* Where is the origin of replication and did the 3 systems attempt to identify this?<br />
* Did the 3 systems utilize Shine-Dalgarno sequences to help them call start codons? Did they utilize our species's consensus Shine-Dalgarno? (Peter)<br />
* We need to fill in the [[Venn diagrams]] for our 3-way comparison. Let's compare the size of ORFs and generate a [[Gene Length Histograms|graph comparing the distributions]] for all 3. (Max and Will - they also take requests). <br />
<br />
<br />
<br />
<hr><br />
== My favorite genes==<br />
Pallavi - Monooxygenase vs. Peroxiredoxin<br />
<br />
Mary - JGI gene 2500588521 (922976...924046)<br />
<br />
Max - JGI gene 2500587636 (2-1849)<br />
<br />
Samantha - JGI gene 2500575882 (80504-80878)<br />
<br />
Nick - JGI gene 2300587691 (69942...72866)<br />
<br />
Will - JGI gene 2500590430 (2847205..2854335)<br />
<br />
Jay - JGI gene 2500588397 (806410..807321) [http://www.bio.davidson.edu/courses/genomics/2008/McNair/FavoriteGenePresentation.pptx Co/Zn/Cd PowerPoint]<br />
<br />
Matt - Transcriptional Regulator nrdR (3109722..3110204 + 7274..7765)<br />
<br />
Peter - tRNA intron endonuclease [[Media:TRNAtrpintronendonuclease.ppt]]<br />
<br />
Laura - 16S Small ribosomal subunit, JGI gene 2500590728 (2397347..2398825)<br />
<br />
== This is a list of glossary words (A - Z): ==<br />
[[#A| A ]] [[#B| B ]] [[#C| C ]] [[#D| D ]] [[#E| E ]] [[#F| F ]] [[#G| G ]] [[#H| H ]] [[#I| I ]] [[#J| J ]] [[#K| K ]] [[#L| L ]] [[#M| M ]] [[#N| N ]] [[#O| O ]] [[#P| P ]] [[#Q| Q ]] [[#R| R ]] [[#S| S ]] [[#T| T ]] [[#U| U ]] [[#V| V ]] [[#W| W ]] [[#X| X ]] [[#Y| Y ]] [[#Z| Z ]] <br />
<br />
== A ==<br />
'''Accession Number''' - a unique identifier given to DNA and protein sequences to allow for tracking of sequence information within a single database [http://en.wikipedia.org/wiki/Accession_number_(bioinformatics)] (Will).<br />
<br />
'''Antisense (RNA or DNA)'''-a piece of DNA or RNA that binds to a complementary sequence of DNA or RNA. These segments of genetic material can be used to identify the existence of a disease gene and they can also be used to bind to specific DNA or mRNA sequences to inhibit their function ([http://biotech.fyicenter.com/glossary/Bioinformatics_Glossary.html 5] Pallavi).<br />
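The sequence that pairs with a given DNA strand is its reverse complement, which a short function can compute. This sketch handles only the four standard DNA bases and leaves any other character unchanged; the function name is hypothetical.

```python
# Translation table mapping each base to its complement (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))
```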
<br />
'''<i>Arabidopsis thaliana</i>''' - the scientific name for the thale cress plant; it was the first plant to have its genome sequenced, and is a model organism for understanding plant biology and genetics ([http://en.wikipedia.org/wiki/Thale_cress Wikipedia.org], Jay)<br />
<br />
== B ==<br />
'''BAC''' - <i>b</i>acterial <i>a</i>rtificial <i>c</i>hromosome, a DNA construct used for transforming or cloning segments of DNA and often used to sequence the genetic code of organisms ([http://en.wikipedia.org/wiki/Bacterial_artificial_chromosome Wikipedia.org], Jay)<br />
<br />
'''bioinformatics''' - the multi-disciplinary approach of using biology, computer science and mathematics to solve or better understand biological problems [http://en.wikipedia.org/wiki/Bioinformatics] (Matt)<br />
<br />
'''BLAST''' - (Basic Local Alignment Search Tool) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. [http://blast.ncbi.nlm.nih.gov/Blast.cgi] (Mary)<br />
<br />
'''bioperl''' - a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications such as accessing sequence data from local and remote databases, converting between database formats, manipulating individual sequences, searching for similar sequences, searching for genes and other structures on genomic DNA, or developing machine-readable sequence annotations. [http://en.wikipedia.org/wiki/BioPerl] (Wikipedia, Max Win)<br />
<br />
== C ==<br />
'''carbon fixation''' - using carbon dioxide to create organic materials [http://en.wikipedia.org/wiki/Carbon_fixation] (Samantha)<BR><br />
<br />
'''CDD''' (Conserved Domains Database)- a database used to identify the conserved domains present in a protein query sequence [http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml] (Mary)<br />
<br />
'''chaperonin''' - a protein complex that assists some newly formed polypeptide chains by folding them into their final, functional, three-dimensional form [http://en.wikipedia.org/wiki/Chaperonins] (Matt)<br />
<br />
'''chemotaxis''' - the process by which cells move toward or away from a high concentration of certain chemicals; it is found in both uni- and multicellular organisms. Unicellular organisms use it to avoid toxins or find food, and multicellular organisms use it for tasks such as reproduction [http://en.wikipedia.org/wiki/Chemotaxis] (Nick)<br />
<br />
'''chemotaxonomy''' - the attempt to classify and identify organisms according to demonstrable differences and similarities in their biochemical compositions [http://en.wikipedia.org/wiki/Chemotaxonomy] (Mary)<br />
<br />
'''ClustalW''' - A web-based or command line tool that performs multiple sequence alignments to determine evolutionary relationships between three or more sequences [http://en.wikipedia.org/wiki/Clustal] (Will).<br />
<br />
'''COG''' (Cluster of Orthologous Groups)- corresponds to a highly conserved domain and generally consists of either individual proteins or groups of paralogs ([http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml COG] Pallavi) <br><br />
<br />
'''concatemer''' - long continuous DNA molecule that contains the same DNA sequence repeated in series [http://en.wikipedia.org/wiki/Concatemer](Samantha)<BR><br />
<br />
'''contigs''' (contiguous DNA) - overlapping DNA segments that together form a longer, gapless segment of DNA. (Discovery Genomics, Proteomics and Bioinformatics [http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''coverage''' - refers to the number of times, on average, any piece of DNA in a sequenced genome has been individually sequenced (Lecture, Jay)<br />
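Average coverage can be estimated as the total number of sequenced bases divided by the genome length. In the sketch below, the 3.1 Mbp genome size comes from this page and the 500 bp read length from the shotgun sequencing entry; the read count is an invented number chosen only to make the arithmetic easy.

```python
def average_coverage(num_reads, read_length, genome_length):
    """Return the average number of times each base was sequenced."""
    return num_reads * read_length / genome_length


# Hypothetical run: 62,000 reads of 500 bp over a 3.1 Mbp genome.
print(average_coverage(62_000, 500, 3_100_000))
```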
<br />
'''CPAN (Comprehensive Perl Archive Network)''' - an archive of over 12,200 modules of software written in Perl, as well as documentation for it. It contains a module called CPAN (or CPAN.pm) which is used as an installer for Perl modules such as BioPerl [http://en.wikipedia.org/wiki/CPAN](Will).<br />
<br />
'''Cytogenetics'''-the study of normal and abnormal chromosomes. This involves studying the causes of chromosomal abnormalities and looking at the structure of chromosomes ([http://www.vivo.colostate.edu/hbooks/genetics/medgen/chromo/index.html 7] Pallavi).<br />
<br />
== D ==<br />
'''''de novo'' synthesis''' - the synthesis of complex molecules from simple molecules (e.g. sugars and nucleotides), rather than from recycled molecules; from the Latin for "from the new" [http://en.wikipedia.org/wiki/De_novo_synthesis] (Matt)<br />
<br />
'''dehydrogenase''' - a type of enzyme that oxidizes a substrate by transferring one or more protons and a pair of electrons to an acceptor. [http://en.wikipedia.org/wiki/Dehydrogenase] (Peter)<br />
<br />
'''diatom''' - a major group of eukaryotic algae, and one of the most common types of phytoplankton. A characteristic feature of diatom cells is that they are encased within a unique cell wall made of silica called a frustule. These frustules show a wide diversity in form, but usually consist of two asymmetrical sides with a split between them. [http://en.wikipedia.org/wiki/Diatom] (Mary)<br />
<br />
'''domain (protein)''' - the structural and functional groups of a protein, which can exist independently of the protein itself. Domains typically perform a specific function, such as binding to promoters or substrates, and many proteins can have one or several domains in common. Evolutionarily-linked proteins are more likely to have domains in common. Domains are used to organize proteins into families. ([http://en.wikipedia.org/wiki/Domain_(protein) Wikipedia article], Laura)<br />
<br />
'''dot plot'''-graphical display comparing sequence conservation between two genomes with dots indicating strings of identical bases. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
== E ==<br />
<br />
'''EC number''' (Enzyme Commission Number)- a numerical classification scheme for enzymes, based on the chemical reactions they catalyze [http://en.wikipedia.org/wiki/EC_number] (Mary)<br />
<br />
'''E-value''' (Expect value)- When performing a BLAST search, you will obtain an E-value for each sequence that is retrieved. An E-value is the number of hits with an equal or better score that would be expected to occur by chance in a database of that size, so the closer the E-value is to zero, the less likely it is that the match arose by chance. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
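<br />
As a rough illustration, the standard relationship between a bit score and an E-value is E = m × n × 2<sup>−S'</sup>, where S' is the bit score, m is the query length, and n is the database length. The sketch below uses raw lengths for simplicity; BLAST itself uses corrected "effective" lengths, so its reported values will differ somewhat. The function name is hypothetical.<br />
<br />
```python
def expect_value(bit_score, query_len, db_len):
    """Approximate E-value from a bit score: E = m * n * 2**(-S')."""
    # Raw lengths are an assumption; BLAST applies length corrections.
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Each additional bit of score halves the expected number of chance hits.
    for score in (20, 40, 60):
        print(score, expect_value(score, 300, 3_100_000))
```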
<br />
'''Extremophile''' - an organism that thrives in and may even require physically or geochemically extreme conditions that are detrimental to the majority of life on Earth [http://en.wikipedia.org/wiki/Extremophile] (Will).<br />
<br />
== F ==<br />
<br />
'''FASTA format''' - a text format used to convey either nucleic acid sequences or peptide sequences, in which nucleotides or amino acids are represented by single-letter codes. Each sequence is preceded by a header line that begins with ">" and gives the sequence name and other descriptors. [http://en.wikipedia.org/wiki/FASTA_format] (Nick)<br><br />
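<br />
A minimal sketch of reading FASTA text, assuming each record starts with a ">" header line and that the first word of the header is the sequence name. The function name <code>parse_fasta</code> and the example records are made up for illustration.<br />
<br />
```python
def parse_fasta(text):
    """Return a dict mapping each sequence name to its full sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: the first word after ">" is used as the name.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            # Sequence lines may be wrapped, so collect the pieces.
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


if __name__ == "__main__":
    example = ">gene1 hypothetical protein\nATGC\nGGTA\n>gene2\nMKV\n"
    print(parse_fasta(example))
```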
<br />
'''family (protein)''' - a group of evolutionarily-related proteins, often with one or several domains in common. Families are organized by domain overlap, structural/functional similarity, and sequence similarity. ([http://en.wikipedia.org/wiki/Protein_family Wikipedia article] and lecture, Laura)<br />
<br />
'''finished genome''' - a genome that has been sequenced at least partly by hand, resulting in at least 99.99% sequence accuracy (Lecture, Jay)<br><br />
<br />
'''fusion mRNA'''-mRNA that results from the transcription of a gene after a chromosomal translocation event. This results in an mRNA sequence that comes from two different genes (Rowley and Blumenthal 2008 ''Science'' Pallavi)<br />
<br />
== G ==<br />
<br />
'''GC Content''' - the percentage of bases within a certain sequence of DNA (e.g. a gene or a genome) that are either guanine or cytosine; a higher GC content is characteristic of a coding region of a gene; differences in GC content between a gene and a genome can be used as evidence for horizontal gene transfer [http://en.wikipedia.org/wiki/GC-content] (Matt)<br><br />
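<br />
A minimal sketch of the calculation, assuming that ambiguous characters such as N should be left out of the total. The function name is illustrative.<br />
<br />
```python
def gc_content(seq):
    """Return the percentage of A/C/G/T bases that are G or C."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0  # empty or fully ambiguous sequence
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


if __name__ == "__main__":
    print(gc_content("GGCCAATT"))  # half of the bases are G or C
```

Comparing this value for a single gene against the value for the whole genome is one way to look for the differences mentioned above.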
<br />
'''GC-skew''' – uneven distribution of guanine and cytosine bases between the two strands of DNA where GC base pairs occur. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
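<br />
GC skew is commonly calculated on one strand as (G − C) / (G + C) in consecutive windows. The sketch below assumes non-overlapping windows; the function name and window size are illustrative. In many bacterial genomes the sign of the skew changes near the origin and terminus of replication, although whether that holds for a given genome has to be checked.<br />
<br />
```python
def gc_skew(seq, window=1000):
    """Return (G - C) / (G + C) for each non-overlapping window of seq."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # A window with no G or C at all is reported as zero skew.
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


if __name__ == "__main__":
    print(gc_skew("GGGGCCCC", window=4))
```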
<br />
'''gene amplification''' - production of multiple copies of a gene in order to amplify the amount of protein that the gene encodes for [http://www.medterms.com/script/main/art.asp?articlekey=13537] [http://www.answers.com/topic/gene-amplification] (Matt)<br />
<br />
'''gene fusion'''-occurs when DNA segments of two different genes come together. Can result in hybrid proteins ([http://www.biochem.northwestern.edu/holmgren/Glossary/Definitions/Def-G/gene_fusion.html 9] Pallavi)<br />
<br />
'''gene knockout''' - a process in which a gene is deactivated within a test organism in order to better understand the function of the gene in that organism [http://en.wikipedia.org/wiki/Gene_knockout] (Matt)<br />
<br />
'''gene ontology'''- a collaborative effort of investigators to unify and standardize terms associated with the role a gene or protein plays in an organism. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''glaucophyte''' - freshwater algae that have not been studied well [http://en.wikipedia.org/wiki/Glaucophyte](Samantha)<br><br />
<br />
== H ==<br />
<br />
'''haemolysin or hemolysin''' - a substance produced by a bacterium that causes lysis of red blood cells [http://en.wikipedia.org/wiki/Hemolysis_(microbiology)] (Nick)<br />
<br />
'''halophile''' - an organism, most often of the Archaea domain, that lives in environments containing high concentrations of salt [http://en.wikipedia.org/wiki/Halophile] (Matt)<br />
<br />
'''haplotype'''-collection of alleles that travel together (Lecture, Pallavi)<br />
<br />
'''haptophyte''' - phylum of algae [http://en.wikipedia.org/wiki/Haptophyte](Samantha)<br />
<br />
'''heterokont''' - major line of eukaryotes consisting of about 10,500 known species, most of which are algae [http://en.wikipedia.org/wiki/Heterokont](Samantha)<br />
<br />
'''Hidden Markov Model''' - a statistical model used in protein recognition databases such as Pfam. A Hidden Markov Model keeps track of several variables and possible variations thereof, such as the possible amino acid sequences that make up a protein domain (since there can be some variance in an amino acid sequence) or the variations in the component sounds that make up a word, and uses those points to match a given sequence to the word, domain, or other complex sequence it most closely matches. An HMM in speech recognition software, for example, can identify that a certain set of sounds make up a certain word, even with the variations in pronunciation and accent that different people will give those sounds. ([http://en.wikipedia.org/wiki/Hidden_Markov_Model Wikipedia] and lecture, Laura) <br />
<br />
'''HMM Logo''' - a graphical representation of an HMM, detailing the possible amino acid sequences, the relative frequencies and probabilities of each amino acid in the sequence, the relative contribution each amino acid has to the overall protein family, and the charge or nature of the amino acids themselves. ([http://www.sanger.ac.uk/Software/analysis/logomat-m/help.shtml How to read HMM Logos, on Pfam], Laura)<br />
<br />
'''homeobox''' - a DNA sequence of about 180 base pairs, found within transcription factor genes, that encodes a DNA-binding domain (the homeodomain); these transcription factors switch on gene cascades that allow the cell to respond to patterns of development [http://en.wikipedia.org/wiki/Homeobox](Samantha)<br />
<br />
'''homodimer''' - a protein made of paired identical polypeptides ([http://www.answers.com/topic/homodimer Answers.com], Jay)<br />
<br />
'''horizontal gene transfer'''-DNA transmission between species and incorporation of the DNA into the recipient's genome ([http://www.csrees.usda.gov/nea/biotech/res/biotechnology_res_glossary.html horizontal gene transfer] Pallavi)<br />
<br />
'''''Hox'' gene'''-a gene that contains a homeobox region that is involved in morphogenesis along the cranio-caudal body axis ([http://www.uprightape.net/UA_Glossary.html 4] Pallavi)<br />
<br />
'''hydrolase''' - an enzyme that catalyzes hydrolysis, the cleavage of a chemical bond by the addition of water, which splits the substrate into two products that often take part in subsequent reactions [http://en.wikipedia.org/wiki/Hydrolase] (Nick)<br />
<br />
== I ==<br />
<br />
'''ideogram''' - in genomics, usually describes a stylized representation of a chromosome with banding patterns (Campbell-Heyer Genomics textbook, Jay)<br />
<br />
'''identities''' - in a BLAST output, the number and fraction of total residues which are identical in a given alignment [http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)<br />
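<br />
A minimal sketch of how identities could be counted, assuming two already-aligned sequences of equal length in which "-" marks a gap. The function name is hypothetical; BLAST reports the same two numbers in the form "Identities = 4/5 (80%)".<br />
<br />
```python
def identities(aligned_a, aligned_b):
    """Return (identical columns, alignment length) for two aligned strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # A column counts only if both residues match and neither is a gap.
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return matches, len(aligned_a)


if __name__ == "__main__":
    same, total = identities("ACG-T", "ACGAT")
    print(f"Identities = {same}/{total} ({100 * same // total}%)")
```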
<br />
'''indole'''-a chemical compound that is produced from the break down of tryptophan ([http://medical-dictionary.thefreedictionary.com/indole indole] Pallavi)<br />
<br />
'''inclusion body''' - Inclusion bodies are collections of stainable substances, usually proteins, that are found either in the nucleus or the cytoplasm. It is thought that these bodies are often the result of viral proteins that misfolded [http://en.wikipedia.org/wiki/Inclusion_body] (Nick)<br />
<br />
'''intron''' - a region of DNA in a gene that is not part of the final coding sequence for the protein. [http://en.wikipedia.org/wiki/Intron] (Peter)<br />
<br />
'''isoelectric point''' - the pH at which a molecule carries no net electrical charge [http://en.wikipedia.org/wiki/Isoelectric_point] (Nick)<br />
<br />
'''isozymes''' - members of a gene family with very similar cellular roles (Campbell-Heyer Genomics textbook, Jay)<br />
<br />
== J ==<br />
<br />
== K ==<br />
'''KEGG (Kyoto Encyclopedia of Genes and Genomes)''' - a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals. The Pathway database records networks of molecular interactions in the cells, and variants of them specific to particular organisms [http://en.wikipedia.org/wiki/KEGG](Will).<br />
<br />
'''kinase''' - a type of enzyme that transfers a phosphate group from a high-energy donor molecule to a target molecule in a process called phosphorylation. [http://en.wikipedia.org/wiki/Kinase] (Peter)<br />
<br />
== L ==<br />
<br />
== M ==<br />
'''Manatee''' - a web-based gene evaluation and genome annotation tool that can view, modify, and store annotation for prokaryotic and eukaryotic genomes. This on-going, open source initiative was developed with two missions: first, to give biologists the ability to functionally annotate their genomes using a powerful, stand-alone web application with a robustly designed relational annotation database; and second, to give outside developers the opportunity to contribute their own ideas and requirements to enhance Manatee's ability to accomplish biological goals [http://manatee.sourceforge.net/](Will). <br><br />
<br />
'''microsatellites'''-stretches of repetitive, short DNA segments that can be used to track the inheritance of certain traits within families ([http://www.clanlindsay.com/genetic_dna_glossary.htm 3] Pallavi)<br />
<br />
'''minisatellites'''-segments of DNA that can be used for individual identification (ex. DNA fingerprinting) or in determining relationships between people (ex. paternity cases) ([http://www.clanlindsay.com/genetic_dna_glossary.htm 2] Pallavi).<br />
<br />
'''motif''' - a sequence of amino acids or nucleotides that performs a particular role and is often conserved in other species or molecules. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''mycoplasma''' - genus of bacteria that lack a cell wall [http://en.wikipedia.org/wiki/Mycoplasma] (Nick)<br />
<br />
== N ==<br />
<br />
'''NORFs''' (nonannotated open reading frames) - open reading frames that were considered not to be real genes when the genome was annotated. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''nucleomorph''' - reduced eukaryotic nuclei found in plastids [http://en.wikipedia.org/wiki/Nucleomorph](Samantha)<br />
<br />
== O ==<br />
'''object-oriented programming''' - a programming paradigm in which collections of data, associated with operations on that data, are modularly defined and then built upon (CSC 121 Lecture, Will). <br />
<br />
'''open reading frame (ORF)'''-a segment of DNA that can potentially encode a protein; it begins with a start codon (usually ATG) and continues in the same reading frame until a stop codon [http://www.fao.org/DOCREP/003/X3910E/X3910E18.htm ORF] (Pallavi)<br />
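<br />
A minimal sketch of an ORF search, assuming only ATG as the start codon, only the three forward reading frames, and the standard stop codons TAA, TAG, and TGA. Real gene callers also scan the reverse strand and accept alternative start codons such as GTG and TTG. The function name and the <code>min_codons</code> parameter are illustrative.<br />
<br />
```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1):
    """Return (start, end) slices of forward-strand ORFs, stop codon included."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                # Number of codons before the stop codon.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    dna = "CCATGAAATAGCC"
    for begin, end in find_orfs(dna):
        print(begin, end, dna[begin:end])
```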
<br />
'''operon''' - a segment of DNA involving an operator, promoter, and one or more genes that operate as a single unit during transcription [http://en.wikipedia.org/wiki/Operon] (Nick)<br />
<br />
'''optical mapping'''-DNA sequences of the organism in question are compared against a karyotype that specifically looks at restriction sites found within the DNA to correctly order the DNA sequences on a chromosome. This methodology gives very detailed haplotype information and allows for the detection of sequence variations across an entire genome [http://www.geocities.com/bioinformaticsweb/genomicglossary.html optical mapping] (Pallavi)<br />
<br />
'''ortholog'''-genes in different species that look very similar because they descended from a single gene in the species' last common ancestor; orthologs usually retain the same function (Lecture, Pallavi)<br />
<br />
'''oxidoreductase''' - an enzyme that catalyzes redox reactions by transferring electrons from one molecule (the reductant) to another (the oxidant) [http://en.wikipedia.org/wiki/Oxidoreductase] (Nick)<br />
<br />
== P ==<br />
<br />
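'''paralog'''-similar (not necessarily identical) genes within a single species that arose by duplication of an ancestral gene and may since have diverged in function (Lecture, Pallavi)<br />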
<br />
'''p-arm''' - the shorter arm of a chromosome's two arms separated by the centromere (compare to q-arm, the longer arm) ([http://www.medterms.com/script/main/art.asp?articlekey=4715 MedTerms Dictionary], Jay)<br />
<br />
'''Perl''' - Developed by Larry Wall in 1987, Perl is a [http://en.wikipedia.org/wiki/High-level_programming_language high-level programming language] used frequently by biologists and bioinformaticists [http://en.wikipedia.org/wiki/Perl] (Will). <br />
<br />
'''periplasmic space''' - the space between the inner cytoplasmic membrane and external outer membrane in bacteria or archaea. [http://en.wikipedia.org/wiki/Periplasmic_space] (Peter)<br />
<br />
'''Pfam''' - a database for protein domain families that matches amino acid sequences or nucleotide sequences to the related group of proteins to which they most likely belong. ([http://pfam.sanger.ac.uk/help Pfam Help], Laura)<br />
<br />
'''plasmid''' - an extra-chromosomal DNA molecule that is capable of replicating independently of the chromosomal DNA. Commonly found in bacteria and archaea. [http://en.wikipedia.org/wiki/Plasmid](Peter)<br />
<br />
'''plastid''' - major organelles in plants or algae [http://en.wikipedia.org/wiki/Plastid](Samantha)<br />
<br />
'''pleomorphism''' - the occurrence of two or more structural forms during a life cycle [http://en.wikipedia.org/wiki/Pleomorphism] (Mary)<br />
<br />
'''phylogenetic tree''' - a diagram showing the evolutionary relationships between biological species that are thought to share a common ancestor [http://en.wikipedia.org/wiki/Phylogenetic_tree] (Nick)<br />
<br />
'''phylotypes''' – a term intended to resolve the challenge of “species” when classifying prokaryotes using DNA sequence comparisons. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''positives''' - in a BLAST output, the number and fraction of residues for which the alignment scores have positive rather than negative values [http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)<br />
<br />
'''proteome''' - entire set of proteins expressed by a genome, cell, tissue, or organism. It may refer to expressed proteins under certain conditions [http://en.wikipedia.org/wiki/Proteome](Samantha)<br />
<br />
'''PSORT''' - a prediction server that judges where a mature protein could be in the cell, based on its transmembrane domains, its predicted mature amino acid composition, and its signal sequences. ([http://psort.ims.u-tokyo.ac.jp/form.html PSORT], Laura)<br />
<br />
'''pseudogenes'''-A sequence of DNA that looks like a gene, but most likely contains many stop codons. It may have evolved away from a real gene or a paralog might have taken its place (Lecture, Pallavi)<br />
<br />
'''purine''' - a category of nitrogenous base consisting of a pyrimidine ring fused to an imidazole ring. Notable purine bases are adenine and guanine. [http://en.wikipedia.org/wiki/Purine] (Peter)<br />
<br />
'''pyrimidine''' - a category of nitrogenous base consisting of a heterocyclic aromatic ring containing two nitrogen atoms at positions 1 and 3 of the six-member ring. Notable pyrimidine bases are cytosine, thymine, and uracil. [http://en.wikipedia.org/wiki/Pyrimidine] (Peter)<br />
<br />
== Q ==<br />
<br />
'''q-arm''' - the longer arm of a chromosome's two arms separated by the centromere (compare to p-arm, the shorter arm) ([http://www.medterms.com/script/main/art.asp?articlekey=5152 MedTerms Dictionary], Jay)<br><br />
<br />
'''query sequence''' - the sequence (whether amino acid or nucleotide) entered into a database’s search function and checked against the database entries. ([http://en.wikipedia.org/wiki/BLAST BLAST on Wikipedia], Laura)<br />
<br />
== R ==<br />
<br />
'''RAST''' - (Rapid Annotation using Subsystem Technology)- a fully-automated service for annotating bacterial and archaeal genomes. It provides high quality genome annotations for these genomes across the whole phylogenetic tree. ([http://rast.nmpdr.org/], Max Win)<br />
<br />
'''rDNA'''-These are DNA sequences that encode for ribosomal RNA. Note that rDNA can also stand for recombinant DNA. ([http://en.wikipedia.org/wiki/Ribosomal_DNA rDNA] Pallavi)<br />
<br />
'''residue (protein)''' - the remaining portion of an amino acid after a water molecule has been removed and it has been incorporated into a protein. Functional residues, referred to in Pfam, are the residues that perform some specific identifiable function or are part of a domain, and can be conserved across evolutionarily-related proteins. ([http://pfam.sanger.ac.uk/help Pfam Help], Laura) <br><br />
<br />
'''retropseudogenes'''-these are genes that have been reverse-transcribed from mRNA and the resulting DNA sequence is incorporated back into the genome. They are non-functional segments of DNA and can be distinguished from pseudogenes in that they do not have intron sequences. ([http://genome.cshlp.org/cgi/content/full/10/5/672 1] Pallavi)<br />
<br />
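'''retrotransposons''' - genetic elements that copy themselves through an RNA intermediate: the element is transcribed into RNA, which is reverse-transcribed back into DNA and added into the genome at a new position [http://en.wikipedia.org/wiki/Retrotransposon](Samantha)<br />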
<br />
'''ribonuclease''' - a nuclease that catalyzes the degradation of RNA into smaller components [http://en.wikipedia.org/wiki/Ribonuclease] (Mary)<br />
<br />
== S ==<br />
'''Serovar'''-a subdivision of a species based on the characteristics of their cell surface antigens ([http://www.biology-online.org/dictionary/Serovar serovar] Pallavi)<br />
<br />
'''scaffold''' - a section of a sequenced genome composed of contigs that are in the right order but not necessarily connected ([http://www.medterms.com/script/main/art.asp?articlekey=25223 MedTerms Dictionary], Jay)<br />
<br />
'''"Shadow enhancers"'''-secondary enhancers that are thought to be important for natural selection to occur in regulatory DNA segments. They evolve much faster than primary enhancers, which suggests that they are under fewer functional constraints (Wray and Babbit 2008 ''Science'' Pallavi)<br />
<br />
'''Shine-Dalgarno sequence''' - A ribosomal binding site on an mRNA, usually a sequence of about six bases located six or seven bases upstream of the start codon. An anti-Shine-Dalgarno sequence exists on the rRNA in the small subunit of the ribosome; when the two sequences base-pair, the mRNA is lined up and prepared for translation. (Lecture and [http://en.wikipedia.org/wiki/Shine-dalgarno Wikipedia article], Laura)<br><br />
Note: The Shine-Dalgarno consensus sequence for our genome is ccGGAGGt.<br />
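<br />
A minimal sketch of how a start codon could be checked for an upstream Shine-Dalgarno site, assuming the capitalized core GGAGG of the consensus above and a search window of 20 bases upstream of the start codon. The function name, the window size, and the example sequence are illustrative assumptions, not a description of how any of the three annotation systems work.<br />
<br />
```python
def best_sd_site(seq, start, core="GGAGG", upstream=20):
    """Find the window upstream of `start` that best matches `core`.

    Returns (matching bases, window position, spacing to start codon),
    or None if there is no room for a window upstream of the start codon.
    """
    seq = seq.upper()
    lowest = max(0, start - upstream)
    best = None
    for i in range(lowest, start - len(core) + 1):
        window = seq[i:i + len(core)]
        score = sum(1 for x, y in zip(window, core) if x == y)
        spacing = start - (i + len(core))  # bases between site and start codon
        if best is None or score > best[0]:
            best = (score, i, spacing)
    return best


if __name__ == "__main__":
    dna = "TTTTCCGGAGGTAAACATATGAAA"
    print(best_sd_site(dna, dna.find("ATG")))
```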
<br />
'''SignalP''' - a prediction server that judges whether or not a query protein contains a signal peptide. SignalP measures each amino acid against the amino acid sequences of probable signal peptide matches and predicts the cleavage site of the signal peptide. ([http://www.cbs.dtu.dk/services/SignalP-3.0/output.php SignalP Output explained], Laura)<br />
<br />
'''signal peptide''' - a short peptide chain that directs the post-translational transport of a protein [http://en.wikipedia.org/wiki/Signal_peptide] (Matt)<br />
<br />
'''Smith-Waterman alignment''' - A well-known algorithm for determining similar regions between two nucleotide or protein sequences. Instead of looking at the total sequence, the Smith-Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure [http://en.wikipedia.org/wiki/Smith_waterman](Will).<br />
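<br />
A minimal sketch of the algorithm, assuming a simple scoring scheme (match +2, mismatch −1, linear gap penalty −2) rather than the substitution matrices and affine gap penalties used by real tools. The function name and scoring values are illustrative.<br />
<br />
```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] is the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at zero lets an alignment start anywhere.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TTACGTAA", "GGACGTCC"))
```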
<br />
'''SNP (Single Nucleotide Polymorphism)''' - a DNA sequence variation occurring when a single nucleotide in the genome (or other shared sequence) differs between members of a species (or between paired chromosomes in an individual) [http://en.wikipedia.org/wiki/Single_nucleotide_polymorphism](Will).<br />
<br />
'''symporter''' - an integral membrane protein that is involved in movement of two or more different molecules or ions across a phospholipid membrane in the same direction. [http://en.wikipedia.org/wiki/Symporter] (Peter)<br />
<br />
'''synteny''' - a neologism from the Greek for "on the same ribbon". Genes that are syntenic in one species are on the same chromosome; genes that are syntenic across species retain the same order on respective chromosomes as a result of descent from a common ancestor ([http://www.answers.com/synteny Answers.com], Jay)<br />
<br />
'''synthetase''' - a type of enzyme that creates a new covalent bond and requires direct input of energy from a high-energy phosphate. [http://books.google.com/books?id=bB8XnCykRmIC&pg=PA522&lpg=PA522&dq=%22synthetase+is+an+enzyme%22&source=web&ots=wkws4ksMsg&sig=zWLkDIk7T78hcf9S84nWs3u5Apw&hl=en&sa=X&oi=book_result&resnum=9&ct=result] (Peter)<br />
<br />
== T ==<br />
'''transferase''' - an enzyme that catalyzes the transfer of a functional group from one molecule (the donor) to another (the acceptor) [http://en.wikipedia.org/wiki/Transferase] (Matt)<br />
<br />
'''transmembrane helix''' - a single transmembrane alpha helix of a transmembrane protein, usually about twenty amino acids in length. They are usually predicted by hydrophobicity. [http://en.wikipedia.org/wiki/Transmembrane_domain](Mary)<br />
<br />
'''transposons / transposable elements''' - DNA sequences that can move around to different positions in a single cell's genome. Transposons can cause mutations and change the length of the genome. [http://en.wikipedia.org/wiki/Transposon](Samantha)<br />
<br />
'''Transposon Mutagenesis'''-a procedure in which a transposon is inserted into a gene, which inactivates the gene and can lead to the discovery of the phenotype associated with this gene ([http://cancerweb.ncl.ac.uk/cgi-bin/omd?transposon+mutagenesis transposon mutagenesis] Pallavi)<br />
<br />
'''Trans-splicing'''-fragmented exon sequences fuse to form a mature species of mRNA. This process results in fusion mRNA ([http://www.representinggenes.org/Glossary.html 8] Pallavi).<br />
<br />
'''tRNA splicing endonuclease''' - an enzyme that cleaves intervening sequences of precursor tRNA. [http://cancerweb.ncl.ac.uk/cgi-bin/omd?splicing+endonuclease] (Peter)<br><br />
<br />
== U ==<br />
<br />
== V ==<br />
'''Vertical gene transfer'''-the transmission or absorption of genetic material that is associated with sexual reproduction and, thus, acknowledges species-specific boundaries ([http://www.gmo-compass.org/eng/glossary/#G 6] Pallavi)<br />
<br />
== W ==<br />
<br />
'''whole genome shotgun sequencing''' - a method of sequencing where DNA is cut into small pieces and cloned into vectors, then about 500 bp is sequenced from each end of every cloned insert to form mate pairs. Mate pairs rarely overlap, but are used to reassemble the sequence using software. [http://en.wikipedia.org/wiki/Whole_genome_shotgun](Samantha)<br />
<br><br />
<br />
== X ==<br />
'''xenolog''' - homologs that are created by horizontal gene transfer between two different species [http://en.wikipedia.org/wiki/Xenolog#Xenology] (Matt)<br><br />
<br />
== Y ==<br />
<br />
== Z ==<br />
<br />
<BR><br />
<HR><br />
<HR><br />
<br />
== This is a list of the student-created tutorials: ==</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Halorhabdus_utahensis_Genome&diff=6765Halorhabdus utahensis Genome2008-10-08T09:25:51Z<p>Lavoss: /* Tutorials for Annotating Genomes */</p>
<hr />
<div>This page will be used by Davidson College students in the [http://www.bio.davidson.edu/Courses/Bio343/LabMethods.html Genomics Laboratory course].<br />
__NOTOC__<br />
== Links to Multiple Databases ==<br />
*[http://imgweb.jgi-psf.org/cgi-bin/img_edu_v260/main.cgi?section=TaxonDetail&page=taxonDetail&taxon_oid=2500575004 JGI IMG EDU] <br> public access <br> *[[Media:JGIAnnotation.xls|JGI Annotation Excel Spreadsheet]]<br />
*[http://www.tigr.org/tigr-scripts/prok_manatee/shared/login.cgi Manatee at JCVI] <br> use the davidson number sent by email as username and password (database is nthu01 - this is case sensitive) <br> *[[Media:ManateeAnnotation.xls|Manatee Annotation Excel Spreadsheet]]<br />
*[http://rast.nmpdr.org/ SEED view via RAST] <br> use the username and password combination sent to you by SEED <br> *[[Media:RastAnnotation.xls|RAST Annotation Excel Spreadsheet]]<br> *[http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=18261238 RAST Publication in PubMed]<br />
*[http://www.genome.jp/kegg/kaas/ KEGG]<br> We can submit our genes to KEGG to have it mapped out, but SEED and Manatee may already do this. Do we want to ask them to upload it into their database? <br><br><br />
*[http://gcat.davidson.edu/Registry/compare/ Pairwise comparisons of All Three Annotations]<br />
<br />
<br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/JGI_5contigs.txt JGI Full genome, 5 separate contigs & 3.1 Mbp, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/JGI2500575004_genes.txt JGI gene DNA sequences, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/JGI2500575004_genes.xls JGI gene annotations, Excel] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/JGI2500575004_proteins.txt JGI protein sequences, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/h_utahensis_merged.txt JCVI Full genome, 5 contigs fused, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/h_utahensis_ORFs.txt JCVI gene sequences, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/h_utahensis_proteins.txt JCVI protein sequences, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/GeneLengths.xls 3-way comparison, Excel] <br><br />
[[Venn_diagrams]] Venn diagram of 3-way comparison<br />
<br />
<br><br />
<br />
== RNA Genes ==<br />
<br />
*[[tRNA Genes Check List]]<br><br />
*[[rRNA operon]]<br><br />
*[[2 misc. RNA genes]] (short summary list)<br><br />
*[[Missing tRNA-trp gene found]]<br><br />
<br />
== Other Resources ==<br />
*[[Consensus Shine Dalgarno]] Excel File for ''H. utahensis'' <br><br />
*[[References]]<br><br />
*[[Gene Annotation Template]]<br><br />
*[[General Questions]]<br><br />
*[[Page for Annotated Genes]]<br><br />
<br />
== Tutorials for Annotating Genomes ==<br />
<br />
# Will DeLoache- [http://www.bio.davidson.edu/courses/genomics/2008/DeLoache/BioPerlTutorial/BioPerl.htm BioPerl Installation] <br><br />
# Max Win- [http://www.bio.davidson.edu/courses/genomics/2008/Win/perl.html Introduction to Perl for non-programmers (with step-by-step explanations, simple exercises, and solutions)]<br><br />
<br />
# Pallavi-Conserved Domains Database (CDD) [[Media:CDDtutorial.doc]] <br><br />
# Mary- Protein Data Bank (PDB) [[Media:PDB Tutorial.doc]] <br><br />
# Laura Voss - Pfam Database [[Media:Pfam_tutorial.doc]] <br><br />
# Samantha Simpson - [http://www.bio.davidson.edu/courses/genomics/2008/Simpson/Tutorial.html NCBI BLAST]<br><br />
# Peter Bakke - [[Media:ShineDalgarnoTutorial.doc]]<br><br />
# Jay McNair - [http://www.bio.davidson.edu/courses/genomics/2008/McNair/OriginTutorial.doc Origin of Replication Tutorial]<br><br />
# Nick Carney - Navigating the JGI Database [[Media:NavigatingJGItutorial.doc]]<br><br />
# Matt Lotz - SEED Viewer - [[Media:SEEDTutorial.doc]] <br><br />
<br />
== Research Questions ==<br />
#How do the three systems compare for finding ORFs and RNA genes?<br />
#Is there a pattern of missed genes for any of the 3 sites? <br />
#Do the three systems differ in their ability to find good start codons and Shine-Dalgarno sequences? [We need a standard set of genes for comparison. Only highly conserved or a range of genes?]<br />
# Were Shine-Dalgarno sequences calculated for our species or default values used? If default, what sequence?<br />
#Can we fill any holes in their automated annotation? Is there a mechanism for users to add in genes?<br />
#How do the 3 sites compare for ease of use?<br />
#What are the strengths and weaknesses of each system? What did they publish as their special features and how do we see these working?<br />
#How does each of the 3 sites compare for pathway detection and visualization? <br />
#Do they find the origin of replication? Can we find it? <br />
<br />
* How do the 3 systems compare when one system calls a gene hypothetical and another calls it a functional protein? How can they vary and which one is getting it closer to correct (however you define that, possibly by date of matched entry)? (Pallavi and Mary)<br />
* Why did one system call a gene when the other two did not? (Matt and Lara)<br />
* How do the 3 sites compare for ease of use? What are the strengths and weaknesses of each system? What did they publish as their special features and how do we see these working? (Samantha and Nick)<br />
* Where is the origin of replication and did the 3 systems attempt to identify this?<br />
* Did the 3 systems utilize Shine-Dalgarno sequences to help them call start codons? Did they utilize our species's consensus Shine-Dalgarno? (Peter)<br />
* We need to fill in the [[Venn diagrams]] for our 3-way comparison. Let's compare the size of ORFs and generate a [[Gene Length Histograms|graph comparing the distributions]] for all 3. (Max and Will - they also take requests). <br />
<br />
<br />
<br />
<hr><br />
== My favorite genes==<br />
Pallavi-Monooxygenase vs. Peroxiredoxin<br />
<br />
Mary- JGI gene 2500588521 (922976...924046)<br />
<br />
Max - JGI gene 2500587636 (2-1849)<br />
<br />
Samantha - JGI gene 2500575882 (80504-80878)<br />
<br />
Nick - JGI gene 2300587691 (69942...72866)<br />
<br />
Will - JGI gene 2500590430 (2847205..2854335)<br />
<br />
== This is a list of glossary words (A - Z): ==<br />
[[#A| A ]] [[#B| B ]] [[#C| C ]] [[#D| D ]] [[#E| E ]] [[#F| F ]] [[#G| G ]] [[#H| H ]] [[#I| I ]] [[#J| J ]] [[#K| K ]] [[#L| L ]] [[#M| M ]] [[#N| N ]] [[#O| O ]] [[#P| P ]] [[#Q| Q ]] [[#R| R ]] [[#S| S ]] [[#T| T ]] [[#U| U ]] [[#V| V ]] [[#W| W ]] [[#X| X ]] [[#Y| Y ]] [[#Z| Z ]] <br />
<br />
== A ==<br />
'''Accession Number''' - a unique identifier given to DNA and protein sequences to allow for tracking of sequence information within a single database [http://en.wikipedia.org/wiki/Accession_number_(bioinformatics)] (Will).<br />
<br />
'''Antisense (RNA or DNA)'''-a piece of DNA or RNA that binds to a complementary sequence of DNA or RNA. These segments of genetic material can be used to identify the existence of a disease gene and they can also be used to bind to specific DNA or mRNA sequences to inhibit their function ([http://biotech.fyicenter.com/glossary/Bioinformatics_Glossary.html 5] Pallavi).<br />
<br />
'''<i>Arabidopsis thaliana</i>''' - the scientific name for the thale cress plant; it was the first plant to have its genome sequenced, and is a model organism for understanding plant biology and genetics ([http://en.wikipedia.org/wiki/Thale_cress Wikipedia.org], Jay)<br />
<br />
== B ==<br />
'''BAC''' - <i>b</i>acterial <i>a</i>rtificial <i>c</i>hromosome, a DNA construct used for transforming or cloning segments of DNA and often used to sequence the genetic code of organisms ([http://en.wikipedia.org/wiki/Bacterial_artificial_chromosome Wikipedia.org], Jay)<br />
<br />
'''bioinformatics''' - the multi-disciplinary approach of using biology, computer science and mathematics to solve or better understand biological problems [http://en.wikipedia.org/wiki/Bioinformatics] (Matt)<br />
<br />
'''BLAST''' - (Basic Local Alignment Search Tool) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. [http://blast.ncbi.nlm.nih.gov/Blast.cgi] (Mary)<br />
<br />
'''bioperl'''- a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications such as accessing sequence data from local and remote databases, converting between database file formats, manipulating individual sequences, searching for similar sequences, searching for genes and other structures on genomic DNA, or developing machine-readable sequence annotations. [http://en.wikipedia.org/wiki/BioPerl] (Wikipedia, Max Win)<br />
<br />
== C ==<br />
'''carbon fixation''' - using carbon dioxide to create organic materials [http://en.wikipedia.org/wiki/Carbon_fixation] (Samantha)<BR><br />
<br />
'''CDD''' (Conserved Domains Database)- a database used to identify the conserved domains present in a protein query sequence [http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml] (Mary)<br />
<br />
'''chaperonin''' - a protein complex that assists some newly formed polypeptide chains by folding them into their final, functional, three-dimensional form [http://en.wikipedia.org/wiki/Chaperonins] (Matt)<br />
<br />
'''chemotaxis''' - the process in which cells move toward or away from high concentrations of certain chemicals; it is found in both uni- and multicellular organisms. Unicellular organisms use it to avoid toxins or find food, and multicellular organisms use it for tasks such as reproduction [http://en.wikipedia.org/wiki/Chemotaxis] (Nick)<br />
<br />
'''chemotaxonomy''' - the attempt to classify and identify organisms according to demonstrable differences and similarities in their biochemical compositions [http://en.wikipedia.org/wiki/Chemotaxonomy] (Mary)<br />
<br />
'''ClustalW''' - A web-based or command line tool that performs multiple sequence alignments to determine evolutionary relationships between three or more sequences [http://en.wikipedia.org/wiki/Clustal] (Will).<br />
<br />
'''COG''' (Clusters of Orthologous Groups)- each COG corresponds to a highly conserved domain and generally consists of either individual proteins or groups of paralogs ([http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml COG] Pallavi) <br><br />
<br />
'''concatemer''' - long continuous DNA molecule that contains the same DNA sequence repeated in series [http://en.wikipedia.org/wiki/Concatemer](Samantha)<BR><br />
<br />
'''contigs''' (contiguous DNA)- overlapping DNA segments that, as a collection, form a longer and gapless segment of DNA. (Discovery Genomics, Proteomics and Bioinformatics [http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''coverage''' - refers to the number of times, on average, any piece of DNA in a sequenced genome has been individually sequenced (Lecture, Jay)<br />
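Example: a minimal Python sketch of the usual back-of-the-envelope estimate, coverage = (number of reads × read length) / genome length. The function name and the read numbers are hypothetical; only the 3.1 Mbp genome size comes from this page.

```python
def average_coverage(num_reads: int, read_length: int, genome_length: int) -> float:
    """Average number of times each base is expected to have been sequenced."""
    return num_reads * read_length / genome_length


# Hypothetical run: 62,000 reads of 500 bp each against a 3.1 Mbp genome.
fold = average_coverage(62_000, 500, 3_100_000)
print(f"{fold:.1f}x coverage")
```

This is an average only; real coverage varies from region to region of the genome.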
<br />
'''CPAN (Comprehensive Perl Archive Network)''' - an archive of over 12,200 modules of software written in Perl, as well as documentation for it. It contains a module called CPAN (or CPAN.pm) which is used as an installer for Perl modules such as BioPerl [http://en.wikipedia.org/wiki/CPAN](Will).<br />
<br />
'''Cytogenetics'''-the study of normal and abnormal chromosomes. This involves studying the causes of chromosomal abnormalities and looking at the structure of chromosomes ([http://www.vivo.colostate.edu/hbooks/genetics/medgen/chromo/index.html 7] Pallavi).<br />
<br />
== D ==<br />
'''''de novo'' synthesis''' - the synthesis of complex molecules from simple molecules (e.g. sugars and nucleotides), rather than from recycled molecules; from the Latin for "from the new" [http://en.wikipedia.org/wiki/De_novo_synthesis] (Matt)<br />
<br />
'''dehydrogenase''' - a type of enzyme that oxidizes a substrate by transferring one or more protons and a pair of electrons to an acceptor. [http://en.wikipedia.org/wiki/Dehydrogenase] (Peter)<br />
<br />
'''diatom''' - a major group of eukaryotic algae, and one of the most common types of phytoplankton. A characteristic feature of diatom cells is that they are encased within a unique cell wall made of silica called a frustule. These frustules show a wide diversity in form, but usually consist of two asymmetrical sides with a split between them. [http://en.wikipedia.org/wiki/Diatom] (Mary)<br />
<br />
'''domain (protein)''' - the structural and functional groups of a protein, which can exist independently of the protein itself. Domains typically perform a specific function, such as binding to promoters or substrates, and many proteins can have one or several domains in common. Evolutionarily-linked proteins are more likely to have domains in common. Domains are used to organize proteins into families. ([http://en.wikipedia.org/wiki/Domain_(protein) Wikipedia article], Laura)<br />
<br />
'''dot plot'''-graphical display comparing sequence conservation between two genomes with dots indicating strings of identical bases. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
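Example: a minimal Python sketch of a text dot plot, assuming exact matches of a fixed word size between two short sequences. The function name, the word size, and the toy sequences are hypothetical; real dot plot tools handle whole genomes and both strands.

```python
from typing import List


def dot_plot(seq_a: str, seq_b: str, word: int = 3) -> List[str]:
    """Return one text row per position in seq_a.

    A '*' marks a position where the next `word` bases of seq_a and seq_b
    are identical; a '.' marks everything else.
    """
    rows = []
    for i in range(len(seq_a) - word + 1):
        row = ""
        for j in range(len(seq_b) - word + 1):
            row += "*" if seq_a[i:i + word] == seq_b[j:j + word] else "."
        rows.append(row)
    return rows


# Two identical sequences give an unbroken diagonal.
for line in dot_plot("ACGT", "ACGT", word=2):
    print(line)
```

Conserved regions show up as diagonal runs of dots; breaks or shifts in the diagonal suggest insertions, deletions, or rearrangements.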
<br />
== E ==<br />
<br />
'''EC number''' (Enzyme Commission Number)- a numerical classification scheme for enzymes, based on the chemical reactions they catalyze [http://en.wikipedia.org/wiki/EC_number] (Mary)<br />
<br />
'''E-value''' (Expect value)- When performing a BLAST search, you will obtain an E-value for each sequence that is retrieved. An E-value is the number of matches with an equal or better score that would be expected to occur by chance in a database of that size; the closer it is to zero, the less likely it is that the two sequences are similar by chance. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
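Example: a minimal Python sketch of the standard relationship between a bit score S' and the expect value, E = m × n × 2^(−S'), where m is the query length and n is the total database length. The function name and the numbers are hypothetical, and BLAST itself uses length-corrected "effective" values of m and n, so this only approximates what BLAST reports.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits scoring at least `bit_score`."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical search: a 300-residue query against 1,000,000 residues.
for score in (20, 40, 60):
    print(score, expect_value(score, 300, 1_000_000))
```

Each additional bit of score halves the E-value, which is why strong hits have E-values very close to zero.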
<br />
'''Extremophile''' - an organism that thrives in and may even require physically or geochemically extreme conditions that are detrimental to the majority of life on Earth [http://en.wikipedia.org/wiki/Extremophile] (Will).<br />
<br />
== F ==<br />
<br />
'''FASTA format''' - a format used to convey either nucleic acid sequences or peptide sequences, in which base pairs or amino acids are represented by single-letter codes. The sequence name and other descriptors often precede the amino acid sequence. [http://en.wikipedia.org/wiki/FASTA_format] (Nick)<br><br />
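Example: a minimal Python sketch of reading FASTA text, assuming each record starts with a ">" header line followed by one or more sequence lines. The function name and the two toy records are hypothetical; for real work a library such as BioPerl or Biopython is the safer choice.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return a {header: sequence} dictionary from FASTA-formatted lines."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequences may span several lines
    return records


example = """>contig1 hypothetical example
ATGGCC
GGAGGT
>contig2
TTAACC
""".splitlines()

records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```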
<br />
'''family (protein)''' - a group of evolutionarily-related proteins, often with one or several domains in common. Families are organized by domain overlap, structural/functional similarity, and sequence similarity. ([http://en.wikipedia.org/wiki/Protein_family Wikipedia article] and lecture, Laura)<br />
<br />
'''finished genome''' - a genome that has been sequenced at least partly by hand, resulting in at least 99.99% sequence accuracy (Lecture, Jay)<br><br />
<br />
'''fusion mRNA'''-mRNA that results from the transcription of a gene after a chromosomal translocation event. This results in an mRNA sequence that comes from two different genes (Rowley and Blumenthal 2008 ''Science'' Pallavi)<br />
<br />
== G ==<br />
<br />
'''GC Content''' - the percentage of bases within a certain sequence of DNA (e.g. a gene or a genome) that are either guanine or cytosine; a higher GC content is characteristic of a coding region of a gene; differences in GC content between a gene and a genome can be used as evidence for horizontal gene transfer [http://en.wikipedia.org/wiki/GC-content] (Matt)<br><br />
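Example: a minimal Python sketch of the calculation, assuming the sequence is a plain string of A, C, G, and T. The function name and the test sequences are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Percentage of bases in `seq` that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


print(gc_content("ATGCGCGGAT"))
```

Running the same function on one gene and on the whole genome gives the two numbers to compare when looking for evidence of horizontal gene transfer.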
<br />
'''GC-skew''' – uneven distribution of guanine and cytosine bases between the two strands of DNA where GC base pairs occur. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
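Example: a minimal Python sketch of the commonly used skew measure, (G − C) / (G + C), calculated in consecutive non-overlapping windows along one strand. The function name, the window size, and the toy sequence are hypothetical.

```python
from typing import List


def gc_skew(seq: str, window: int) -> List[float]:
    """Return (G - C) / (G + C) for each full, non-overlapping window."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # A window with no G or C has no defined skew; report 0.0 for it.
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


print(gc_skew("GGGCCCCC", window=4))
```

In many bacterial genomes the sign of the skew changes near the origin and terminus of replication, so a plot of these values along the chromosome is one way to look for the origin. Whether this works for a given archaeal genome has to be checked case by case.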
<br />
'''gene amplification''' - production of multiple copies of a gene in order to amplify the amount of protein that the gene encodes for [http://www.medterms.com/script/main/art.asp?articlekey=13537] [http://www.answers.com/topic/gene-amplification] (Matt)<br />
<br />
'''gene fusion'''-occurs when DNA segments of two different genes come together. Can result in hybrid proteins ([http://www.biochem.northwestern.edu/holmgren/Glossary/Definitions/Def-G/gene_fusion.html 9] Pallavi)<br />
<br />
'''gene knockout''' - a process in which a gene is deactivated within a test organism in order to better understand the function of the gene in that organism [http://en.wikipedia.org/wiki/Gene_knockout] (Matt)<br />
<br />
'''gene ontology'''- a collaborative effort of investigators to unify and standardize terms associated with the role a gene or protein plays in an organism. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''glaucophyte''' - freshwater algae that have not been studied well [http://en.wikipedia.org/wiki/Glaucophyte](Samantha)<br><br />
<br />
== H ==<br />
<br />
'''haemolysin or hemolysin''' - a chemical produced by a bacterium that causes lysis of red blood cells [http://en.wikipedia.org/wiki/Hemolysis_(microbiology)] (Nick)<br />
<br />
'''halophile''' - an organism, most often of the Archaea domain, that lives in environments containing high concentrations of salt [http://en.wikipedia.org/wiki/Halophile] (Matt)<br />
<br />
'''haplotype'''-collection of alleles that travel together (Lecture, Pallavi)<br />
<br />
'''haptophyte''' - phylum of algae [http://en.wikipedia.org/wiki/Haptophyte](Samantha)<br />
<br />
'''heterokont''' - major line of eukaryotes consisting of about 10,500 known species, most of which are algae [http://en.wikipedia.org/wiki/Heterokont](Samantha)<br />
<br />
'''Hidden Markov Model''' - a statistical model used in protein recognition databases such as Pfam. A Hidden Markov Model keeps track of several variables and possible variations thereof, such as the possible amino acid sequences that make up a protein domain (since there can be some variance in an amino acid sequence) or the variations in the component sounds that make up a word, and uses those points to match a given sequence to the word, domain, or other complex sequence it most closely matches. An HMM in speech recognition software, for example, can identify that a certain set of sounds make up a certain word, even with the variations in pronunciation and accent that different people will give those sounds. ([http://en.wikipedia.org/wiki/Hidden_Markov_Model Wikipedia] and lecture, Laura) <br />
<br />
'''HMM Logo''' - a graphical representation of an HMM, detailing the possible amino acid sequences, the relative frequencies and probabilities of each amino acid in the sequence, the relative contribution each amino acid has to the overall protein family, and the charge or nature of the amino acids themselves. ([http://www.sanger.ac.uk/Software/analysis/logomat-m/help.shtml How to read HMM Logos, on Pfam], Laura)<br />
<br />
'''homeobox''' - DNA sequence within transcription factor genes that allow the cell to respond to patterns of development by having the transcription factors switch on gene cascades [http://en.wikipedia.org/wiki/Homeobox](Samantha)<br />
<br />
'''homodimer''' - a protein made of paired identical polypeptides ([http://www.answers.com/topic/homodimer Answers.com], Jay)<br />
<br />
'''horizontal gene transfer'''-DNA transmission between species and incorporation of the DNA into the recipient's genome ([http://www.csrees.usda.gov/nea/biotech/res/biotechnology_res_glossary.html horizontal gene transfer] Pallavi)<br />
<br />
'''''Hox'' gene'''-a gene that contains a homeobox region that is involved in morphogenesis along the cranio-caudal body axis ([http://www.uprightape.net/UA_Glossary.html 4] Pallavi)<br />
<br />
'''hydrolase''' - an enzyme that catalyzes hydrolysis, the cleavage of a chemical bond by the addition of a water molecule, whose H and OH groups are added to the two resulting fragments [http://en.wikipedia.org/wiki/Hydrolase] (Nick)<br />
<br />
== I ==<br />
<br />
'''ideogram''' - in genomics, usually describes a stylized representation of a chromosome with banding patterns (Campbell-Heyer Genomics textbook, Jay)<br />
<br />
'''identities''' - in a BLAST output, the number and fraction of total residues which are identical in a given alignment [http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)<br />
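Example: a minimal Python sketch of counting identities, assuming the two sequences have already been aligned to the same length and that "-" marks a gap. The function name and the toy alignment are hypothetical.

```python
from typing import Tuple


def identities(aligned_a: str, aligned_b: str) -> Tuple[int, int]:
    """Return (identical positions, alignment length) for two aligned rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return matches, len(aligned_a)


same, total = identities("MKT-LV", "MRTALV")
print(f"Identities = {same}/{total} ({100 * same // total}%)")
```

Gap columns count toward the alignment length but never toward the number of identities.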
<br />
'''indole'''-a chemical compound that is produced from the break down of tryptophan ([http://medical-dictionary.thefreedictionary.com/indole indole] Pallavi)<br />
<br />
'''inclusion body''' - Inclusion bodies are collections of stainable substances, usually proteins, that are found either in the nucleus or the cytoplasm. It is thought that these bodies are often the result of viral proteins that misfolded [http://en.wikipedia.org/wiki/Inclusion_body] (Nick)<br />
<br />
'''intron''' - a region of DNA in a gene that is not part of the final coding sequence for the protein. [http://en.wikipedia.org/wiki/Intron] (Peter)<br />
<br />
'''isoelectric point''' - the pH at which a molecule carries no net electrical charge [http://en.wikipedia.org/wiki/Isoelectric_point] (Nick)<br />
<br />
'''isozymes''' - members of a gene family with very similar cellular roles (Campbell-Heyer Genomics textbook, Jay)<br />
<br />
== J ==<br />
<br />
== K ==<br />
'''KEGG (Kyoto Encyclopedia of Genes and Genomes)''' - a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals. The Pathway database records networks of molecular interactions in the cells, and variants of them specific to particular organisms [http://en.wikipedia.org/wiki/KEGG](Will).<br />
<br />
'''kinase''' - a type of enzyme that transfers a phosphate group from a high-energy donor molecule to a target molecule in a process called phosphorylation. [http://en.wikipedia.org/wiki/Kinase] (Peter)<br />
<br />
== L ==<br />
<br />
== M ==<br />
'''Manatee''' - a web-based gene evaluation and genome annotation tool that can view, modify, and store annotation for prokaryotic and eukaryotic genomes. This on-going, open source initiative was developed with two missions: first, to give biologists the ability to functionally annotate their genomes using a powerful, stand-alone web application with a robustly designed relational annotation database; and second, to give outside developers the opportunity to contribute their own ideas and requirements to enhance Manatee's ability to accomplish biological goals [http://manatee.sourceforge.net/](Will). <br><br />
<br />
'''microsatellites'''-stretches of repetitive, short DNA segments that can be used to track the inheritance of certain traits within families ([http://www.clanlindsay.com/genetic_dna_glossary.htm 3] Pallavi)<br />
<br />
'''minisatellites'''-segments of DNA that can be used for individual identification (ex. DNA fingerprinting) or in determining relationships between people (ex. paternity cases) ([http://www.clanlindsay.com/genetic_dna_glossary.htm 2] Pallavi).<br />
<br />
'''motif''' - a sequence of amino acids or nucleotides that performs a particular role and is often conserved in other species or molecules. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''mycoplasma''' - genus of bacteria that lack a cell wall [http://en.wikipedia.org/wiki/Mycoplasma] (Nick)<br />
<br />
== N ==<br />
<br />
'''NORFs''' (nonannotated open reading frames) - open reading frames that were considered not to be real genes when the genome was annotated. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''nucleomorph''' - reduced eukaryotic nuclei found in plastids [http://en.wikipedia.org/wiki/Nucleomorph](Samantha)<br />
<br />
== O ==<br />
'''object-oriented programming''' - a programming paradigm in which collections of data, associated with operations on that data, are modularly defined and then built upon (CSC 121 Lecture, Will). <br />
<br />
'''open reading frame (ORF)'''-a segment of DNA that can potentially encode a protein; it begins with a start codon (usually ATG) and continues in the same reading frame to a stop codon [http://www.fao.org/DOCREP/003/X3910E/X3910E18.htm ORF] (Pallavi)<br />
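Example: a minimal Python sketch of an ORF scan, assuming ATG as the only start codon, the three standard stop codons, and the forward strand only. The function name, the minimum length, and the toy sequence are hypothetical; annotation pipelines also scan the reverse strand and consider alternative start codons.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) positions of ORFs in the three forward frames.

    `start` is the 0-based index of the A in ATG and `end` is one past the
    last base of the stop codon, so seq[start:end] is the whole ORF.
    `min_codons` counts codons from the start codon up to, but not
    including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this frame opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


dna = "CCATGAAATAGCC"
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])
```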
<br />
'''operon''' - a segment of DNA involving an operator, promoter, and one or more genes that operate as a single unit during transcription [http://en.wikipedia.org/wiki/Operon] (Nick)<br />
<br />
'''optical mapping'''-DNA sequences of the organism in question are compared against a karyotype that specifically looks at restriction sites found within the DNA to correctly order the DNA sequences on a chromosome. This methodology gives very detailed haplotype information and allows for the detection of sequence variations across an entire genome [http://www.geocities.com/bioinformaticsweb/genomicglossary.html optical mapping] (Pallavi)<br />
<br />
'''ortholog'''-genes in different species that look very similar because they descend from a single gene in the last common ancestor of those species (Lecture, Pallavi)<br />
<br />
'''oxidoreductase''' - an enzyme that catalyzes redox reactions by transferring electrons from one molecule (the reductant) to another (the oxidant) [http://en.wikipedia.org/wiki/Oxidoreductase] (Nick)<br />
<br />
== P ==<br />
<br />
'''paralog'''-similar DNA sequences within a species that arose by duplication of an ancestral gene (Lecture, Pallavi)<br />
<br />
'''p-arm''' - the shorter arm of a chromosome's two arms separated by the centromere (compare to q-arm, the longer arm) ([http://www.medterms.com/script/main/art.asp?articlekey=4715 MedTerms Dictionary], Jay)<br />
<br />
'''Perl''' - Developed by Larry Wall in 1987, Perl is a [http://en.wikipedia.org/wiki/High-level_programming_language high-level programming language] used frequently by biologists and bioinformaticists [http://en.wikipedia.org/wiki/Perl] (Will). <br />
<br />
'''periplasmic space''' - the space between the inner cytoplasmic membrane and external outer membrane in bacteria or archaea. [http://en.wikipedia.org/wiki/Periplasmic_space] (Peter)<br />
<br />
'''Pfam''' - a database for protein domain families that matches amino acid sequences or nucleotide sequences to the related group of proteins to which they most likely belong. ([http://pfam.sanger.ac.uk/help Pfam Help], Laura)<br />
<br />
'''plasmid''' - an extra-chromosomal DNA molecule that is capable of replicating independently of the chromosomal DNA. Commonly found in bacteria and archaea. [http://en.wikipedia.org/wiki/Plasmid](Peter)<br />
<br />
'''plastid''' - major organelles in plants or algae [http://en.wikipedia.org/wiki/Plastid](Samantha)<br />
<br />
'''pleomorphism''' - the occurrence of two or more structural forms during a life cycle [http://en.wikipedia.org/wiki/Pleomorphism] (Mary)<br />
<br />
'''phylogenetic tree''' - a diagram showing the evolutionary relationships between biological species that are thought to share a common ancestor [http://en.wikipedia.org/wiki/Phylogenetic_tree] (Nick)<br />
<br />
'''phylotypes''' – a term intended to resolve the challenge of “species” when classifying prokaryotes using DNA sequence comparisons. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''positives''' - in a BLAST output, the number and fraction of residues for which the alignment scores have positive rather than negative values [http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)<br />
<br />
'''proteome''' - entire set of proteins expressed by a genome, cell, tissue, or organism. It may refer to expressed proteins under certain conditions [http://en.wikipedia.org/wiki/Proteome](Samantha)<br />
<br />
'''PSORT''' - a prediction server that judges where a mature protein could be in the cell, based on its transmembrane domains, its predicted mature amino acid composition, and its signal sequences. ([http://psort.ims.u-tokyo.ac.jp/form.html PSORT], Laura)<br />
<br />
'''pseudogenes'''-sequences of DNA that look like genes, but most likely contain many stop codons. A pseudogene may have evolved away from a real gene, or a paralog might have taken its place (Lecture, Pallavi)<br />
<br />
'''purine''' - a category of nitrogenous base consisting of a pyrimidine ring fused to an imidazole ring. Notable purine bases are adenine and guanine. [http://en.wikipedia.org/wiki/Purine] (Peter)<br />
<br />
'''pyrimidine''' - a category of nitrogenous base consisting of a heterocyclic aromatic ring containing two nitrogen atoms at positions 1 and 3 of the six-member ring. Notable pyrimidine bases are cytosine, thymine, and uracil. [http://en.wikipedia.org/wiki/Pyrimidine] (Peter)<br />
<br />
== Q ==<br />
<br />
'''q-arm''' - the longer arm of a chromosome's two arms separated by the centromere (compare to p-arm, the shorter arm) ([http://www.medterms.com/script/main/art.asp?articlekey=5152 MedTerms Dictionary], Jay)<br><br />
<br />
'''query sequence''' - the sequence (whether amino acid or nucleotide) entered into a database’s search function and checked against the database entries. ([http://en.wikipedia.org/wiki/BLAST BLAST on Wikipedia], Laura)<br />
<br />
== R ==<br />
<br />
'''RAST''' - (Rapid Annotation using Subsystem Technology)- a fully-automated service for annotating bacterial and archaeal genomes. It provides high quality genome annotations for these genomes across the whole phylogenetic tree. ([http://rast.nmpdr.org/], Max Win)<br />
<br />
'''rDNA'''-These are DNA sequences that encode for ribosomal RNA. Note that rDNA can also stand for recombinant DNA. ([http://en.wikipedia.org/wiki/Ribosomal_DNA rDNA] Pallavi)<br />
<br />
'''residue (protein)''' - the remaining portion of an amino acid after a water molecule has been removed and it has been incorporated into a protein. Functional residues, referred to in Pfam, are the residues that perform some specific identifiable function or are part of a domain, and can be conserved across evolutionarily-related proteins. ([http://pfam.sanger.ac.uk/help Pfam Help], Laura) <br><br />
<br />
'''retropseudogenes'''-these are genes that have been reverse-transcribed from mRNA and the resulting DNA sequence is incorporated back into the genome. They are non-functional segments of DNA and can be distinguished from pseudogenes in that they do not have intron sequences. ([http://genome.cshlp.org/cgi/content/full/10/5/672 1] Pallavi)<br />
<br />
'''retrotransposons''' - RNA transcribed back into DNA and added into the genome [http://en.wikipedia.org/wiki/Retrotransposon](Samantha)<br />
<br />
'''ribonuclease''' - a nuclease that catalyzes the degradation of RNA into smaller components [http://en.wikipedia.org/wiki/Ribonuclease] (Mary)<br />
<br />
== S ==<br />
'''Serovar'''-a subdivision of a species based on the characteristics of their cell surface antigens ([http://www.biology-online.org/dictionary/Serovar serovar] Pallavi)<br />
<br />
'''scaffold''' - a section of a sequenced genome composed of contigs that are in the right order but not necessarily connected ([http://www.medterms.com/script/main/art.asp?articlekey=25223 MedTerms Dictionary], Jay)<br />
<br />
'''"Shadow enhancers"'''-secondary enhancers that are thought to be important for natural selection to occur in regulatory DNA segments. They evolve much faster than primary enhancers, which suggests that they are under fewer functional constraints (Wray and Babbit 2008 ''Science'' Pallavi)<br />
<br />
'''Shine-Dalgarno sequence''' - A ribosomal binding site on an mRNA, usually a sequence of about six bases located six or seven bases upstream of the start codon. An anti-Shine-Dalgarno sequence exists on the rRNA in the small subunit of the ribosome; when the two sequences align, the mRNA is lined up and prepared for translation. (Lecture and [http://en.wikipedia.org/wiki/Shine-dalgarno Wikipedia article], Laura)<br><br />
Note: The Shine-Dalgarno consensus sequence for our genome is ccGGAGGt.<br />
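Example: a minimal Python sketch of looking for a Shine-Dalgarno site upstream of a start codon. It assumes an exact match to the core GGAGG (the capitalized part of the consensus above) within 20 bases of the start codon. The function name, the search window, and the toy sequence are hypothetical, and real sites often differ from the consensus by a base or two.

```python
from typing import Optional


def find_sd_spacer(
    seq: str, start_codon_pos: int, core: str = "GGAGG", max_upstream: int = 20
) -> Optional[int]:
    """Return the number of bases between the core and the start codon.

    Searches the `max_upstream` bases before `start_codon_pos` (0-based
    index of the first base of the start codon) for the match closest to
    the start codon. Returns None if the core is not found.
    """
    seq = seq.upper()
    window_start = max(0, start_codon_pos - max_upstream)
    upstream = seq[window_start:start_codon_pos]
    hit = upstream.rfind(core)
    if hit == -1:
        return None
    core_end = window_start + hit + len(core)
    return start_codon_pos - core_end


# Toy mRNA-like sequence (written as DNA) with ATG at index 15.
print(find_sd_spacer("TTCCGGAGGTAACGTATGAAA", 15))
```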
<br />
'''SignalP''' - a prediction server that judges whether or not a query protein contains a signal peptide. SignalP measures each amino acid against the amino acid sequences of probable signal peptide matches and predicts the cleavage site of the signal peptide. ([http://www.cbs.dtu.dk/services/SignalP-3.0/output.php SignalP Output explained], Laura)<br />
<br />
'''signal peptide''' - a short peptide chain that directs the post-translational transport of a protein [http://en.wikipedia.org/wiki/Signal_peptide] (Matt)<br />
<br />
'''Smith-Waterman alignment''' - A well-known algorithm for determining similar regions between two nucleotide or protein sequences. Instead of looking at the total sequence, the Smith-Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure [http://en.wikipedia.org/wiki/Smith_waterman](Will).<br />
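Example: a minimal Python sketch of the Smith-Waterman scoring matrix, assuming a simple scheme of +2 for a match, -1 for a mismatch, and -2 per gap position. The function name and the scoring values are hypothetical; protein searches normally use a substitution matrix such as BLOSUM62 and separate gap-open and gap-extend penalties. The sketch returns only the best local score, not the alignment itself.

```python
def smith_waterman_score(
    a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2
) -> int:
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # scores[i][j] is the best score of a local alignment ending at a[i-1], b[j-1].
    scores = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = scores[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch
            )
            scores[i][j] = max(
                0,                      # start a new local alignment here
                diagonal,               # align a[i-1] with b[j-1]
                scores[i - 1][j] + gap,  # gap in b
                scores[i][j - 1] + gap,  # gap in a
            )
            best = max(best, scores[i][j])
    return best


print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Resetting negative scores to zero is what makes the alignment local: a poorly matching stretch cannot drag down the score of a good region next to it.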
<br />
'''SNP (Single Nucleotide Polymorphism)''' - a DNA sequence variation occurring when a single nucleotide in the genome (or other shared sequence) differs between members of a species (or between paired chromosomes in an individual) [http://en.wikipedia.org/wiki/Single_nucleotide_polymorphism](Will).<br />
<br />
'''symporter''' - an integral membrane protein that is involved in movement of two or more different molecules or ions across a phospholipid membrane. [http://en.wikipedia.org/wiki/Symporter] (Peter)<br />
<br />
'''synteny''' - a neologism from the Greek for "on the same ribbon". Genes that are syntenic in one species are on the same chromosome; genes that are syntenic across species retain the same order on respective chromosomes as a result of descent from a common ancestor ([http://www.answers.com/synteny Answers.com], Jay)<br />
<br />
'''synthetase''' - a type of enzyme that creates a new covalent bond and requires direct input of energy from a high-energy phosphate. [http://books.google.com/books?id=bB8XnCykRmIC&pg=PA522&lpg=PA522&dq=%22synthetase+is+an+enzyme%22&source=web&ots=wkws4ksMsg&sig=zWLkDIk7T78hcf9S84nWs3u5Apw&hl=en&sa=X&oi=book_result&resnum=9&ct=result] (Peter)<br />
<br />
== T ==<br />
'''transferase''' - an enzyme that catalyzes the transfer of a functional group from one molecule (the donor) to another (the acceptor) [http://en.wikipedia.org/wiki/Transferase] (Matt)<br />
<br />
'''transmembrane helix''' - a single transmembrane alpha helix of a transmembrane protein, usually about twenty amino acids in length. They are usually predicted by hydrophobicity. [http://en.wikipedia.org/wiki/Transmembrane_domain](Mary)<br />
<br />
'''transposons / transposable elements''' - DNA sequences that can move around to different positions in a single cell's genome. Transposons can cause mutations and change the length of the genome. [http://en.wikipedia.org/wiki/Transposon](Samantha)<br />
<br />
'''Transposon Mutagenesis'''-a procedure in which a transposon is inserted into a gene, which inactivates the gene and can lead to the discovery of the phenotype associated with this gene ([http://cancerweb.ncl.ac.uk/cgi-bin/omd?transposon+mutagenesis transposon mutagenesis] Pallavi)<br />
<br />
'''Trans-splicing'''-fragmented exon sequences fuse to form a mature species of mRNA. This process results in fusion mRNA ([http://www.representinggenes.org/Glossary.html 8] Pallavi).<br />
<br />
'''tRNA splicing endonuclease''' - an enzyme that cleaves intervening sequences of precursor tRNA. [http://cancerweb.ncl.ac.uk/cgi-bin/omd?splicing+endonuclease] (Peter)<br><br />
<br />
== U ==<br />
<br />
== V ==<br />
'''Vertical gene transfer'''-the transmission or absorption of genetic material that is associated with sexual reproduction and, thus, acknowledges species-specific boundaries ([http://www.gmo-compass.org/eng/glossary/#G 6] Pallavi)<br />
<br />
== W ==<br />
<br />
'''whole genome shotgun sequencing''' - a method of sequencing where DNA is cut into small pieces and cloned into vectors, then both ends of every insert are sequenced for about 500 bp each to form mate pairs. Mate pairs rarely overlap, but are used to reassemble the sequence using software. [http://en.wikipedia.org/wiki/Whole_genome_shotgun](Samantha)<br />
<br><br />
<br />
== X ==<br />
'''xenolog''' - homologs that are created by horizontal gene transfer between two different species [http://en.wikipedia.org/wiki/Xenolog#Xenology] (Matt)<br><br />
<br />
== Y ==<br />
<br />
== Z ==<br />
<br />
<BR><br />
<HR><br />
<HR><br />
<br />
== This is a list of the student-created tutorials: ==</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Halorhabdus_utahensis_Genome&diff=6680Halorhabdus utahensis Genome2008-10-02T14:07:35Z<p>Lavoss: /* S */</p>
<hr />
<div>This page will be used by Davidson College students in the [http://www.bio.davidson.edu/Courses/Bio343/LabMethods.html Genomics Laboratory course].<br />
<br />
== Links to Multiple Databases ==<br />
*[http://imgweb.jgi-psf.org/cgi-bin/img_edu_v260/main.cgi?section=TaxonDetail&page=taxonDetail&taxon_oid=2500575004 JGI IMG EDU] <br> public access <br> *[[Media:JGIAnnotation.xls|JGI Annotation Excel Spreadsheet]]<br />
*[http://www.tigr.org/tigr-scripts/prok_manatee/shared/login.cgi Manatee at JCVI] <br> use the davidson number sent by email as username and password (database is nthu01 - this is case sensitive) <br> *[[Media:ManateeAnnotation.xls|Manatee Annotation Excel Spreadsheet]]<br />
*[http://rast.nmpdr.org/ SEED view via RAST] <br> use the username and password combination sent to you by SEED <br> *[[Media:RastAnnotation.xls|RAST Annotation Excel Spreadsheet]]<br> *[http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=18261238 RAST Publication in PubMed]<br />
*[http://www.genome.jp/kegg/kaas/ KEGG]<br> We can submit our genes to KEGG to have it mapped out, but SEED and Manatee may already do this. Do we want to ask them to upload it into their database? <br><br><br />
*[http://gcat.davidson.edu/Registry/compare/ Pairwise comparisons of All Three Annotations]<br />
<br />
<br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/JGI_5contigs.txt JGI Full genome, 5 separate contigs & 3.1 Mbp, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/JGI2500575004_genes.txt JGI gene DNA sequences, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/JGI2500575004_genes.xls JGI gene annotations, Excel] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/JGI2500575004_proteins.txt JGI protein sequences, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/h_utahensis_merged.txt JCVI Full genome, 5 contigs fused, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/h_utahensis_ORFs.txt JCVI gene sequences, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/h_utahensis_proteins.txt JCVI protein sequences, FASTA] <br><br />
[http://www.bio.davidson.edu/Courses/Bio343/sequences/GeneLengths.xls 3-way comparison, Excel] <br><br />
[[Venn_diagrams]] Venn diagram of 3-way comparison<br />
<br />
<br><br />
<br />
== RNA Genes ==<br />
<br />
*[[tRNA Genes Check List]]<br><br />
*[[rRNA operon]]<br><br />
*[[2 misc. RNA genes]] (short summary list)<br><br />
*[[Missing tRNA-trp gene found]]<br><br />
<br />
== Other Resources ==<br />
*[[Consensus Shine Dalgarno]] Excel File for ''H. utahensis'' <br><br />
*[[References]]<br><br />
*[[Gene Annotation Template]]<br><br />
*[[General Questions]]<br><br />
*[[Page for Annotated Genes]]<br><br />
<br />
== Tutorials for Annotating Genomes ==<br />
<br />
# Will DeLoache- BioPerl Installation <br><br />
# Max Win- Introduction to Perl for non-programmers (with step-by-step explanations, simple exercises, and solutions)<br><br />
# Pallavi-Conserved Domains Database (CDD) <br><br />
# Mary- Protein Data Bank <br><br />
# Laura Voss - Pfam Database <br><br />
# Samantha Simpson - NCBI Blast (protein, nucleotide, and blast2) <br><br />
# Peter Bakke - Finding species-specific Shine-Dalgarno sequence<br><br />
# Jay McNair - How to determine the origin of replication<br><br />
<br />
== Research Questions ==<br />
#How do the three systems compare for finding ORFs and RNA genes?<br />
#Is there a pattern of missed genes for any of the 3 sites? <br />
#Do the three systems differ in their ability to find good start codons and Shine-Dalgarno sequences? [We need a standard set of genes for comparison. Only highly conserved or a range of genes?]<br />
# Were Shine-Dalgarno sequences calculated for our species or default values used? If default, what sequence?<br />
#Can we fill any holes in their automated annotation? Is there a mechanism for users to add in genes?<br />
#How do the 3 sites compare for ease of use?<br />
#What are the strengths and weaknesses of each system? What did they publish as their special features and how do we see these working?<br />
#How does each of the 3 sites compare for pathway detection and visualization? <br />
#Do they find the origin of replication? Can we find it? <br />
<br />
* How do the 3 systems compare when one system calls a gene hypothetical and another assigns it a function? How can the calls vary, and which system is closer to correct (however you define that, possibly by date of matched entry)? (Pallavi and Mary)<br />
* Why did one system call a gene when the other two did not? (Matt and Lara)<br />
* How do the 3 sites compare for ease of use? What are the strengths and weaknesses of each system? What did they publish as their special features and how do we see these working? (Samantha and Nick)<br />
* Where is the origin of replication and did the 3 systems attempt to identify this?<br />
* Did the 3 systems utilize Shine-Dalgarno sequences to help them call start codons? Did they utilize our species's consensus Shine-Dalgarno? (Peter)<br />
* We need to fill in the [[Venn diagrams]] for our 3-way comparison. Let's compare the size of ORFs and generate a [[Gene Length Histograms|graph comparing the distributions]] for all 3. (Max and Will - they also take requests). <br />
<br />
<br />
<br />
<hr><br />
<br />
== This is a list of glossary words (A - Z): ==<br />
[[#A| A ]] [[#B| B ]] [[#C| C ]] [[#D| D ]] [[#E| E ]] [[#F| F ]] [[#G| G ]] [[#H| H ]] [[#I| I ]] [[#J| J ]] [[#K| K ]] [[#L| L ]] [[#M| M ]] [[#N| N ]] [[#O| O ]] [[#P| P ]] [[#Q| Q ]] [[#R| R ]] [[#S| S ]] [[#T| T ]] [[#U| U ]] [[#V| V ]] [[#W| W ]] [[#X| X ]] [[#Y| Y ]] [[#Z| Z ]] <br />
<br />
== A ==<br />
'''Accession Number''' - a unique identifier given to DNA and protein sequences to allow for tracking of sequence information within a single database [http://en.wikipedia.org/wiki/Accession_number_(bioinformatics)] (Will).<br />
<br />
'''<i>Arabidopsis thaliana</i>''' - the scientific name for the thale cress plant; it was the first plant to have its genome sequenced, and is a model organism for understanding plant biology and genetics ([http://en.wikipedia.org/wiki/Thale_cress Wikipedia.org], Jay)<br />
<br />
== B ==<br />
'''BAC''' - <i>b</i>acterial <i>a</i>rtificial <i>c</i>hromosome, a DNA construct used for transforming or cloning segments of DNA and often used to sequence the genetic code of organisms ([http://en.wikipedia.org/wiki/Bacterial_artificial_chromosome Wikipedia.org], Jay)<br />
<br />
'''bioinformatics''' - the multi-disciplinary approach of using biology, computer science and mathematics to solve or better understand biological problems [http://en.wikipedia.org/wiki/Bioinformatics] (Matt)<br />
<br />
'''BLAST''' - (Basic Local Alignment Search Tool) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. [http://blast.ncbi.nlm.nih.gov/Blast.cgi] (Mary)<br />
<br />
'''bioperl''' - a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications such as accessing sequence data from local and remote databases, converting between database formats, manipulating individual sequences, searching for similar sequences, searching for genes and other structures on genomic DNA, or developing machine-readable sequence annotations. [http://en.wikipedia.org/wiki/BioPerl] (Wikipedia, Max Win)<br />
<br />
== C ==<br />
'''carbon fixation''' - using carbon dioxide to create organic materials [http://en.wikipedia.org/wiki/Carbon_fixation] (Samantha)<BR><br />
<br />
'''CDD''' (Conserved Domains Database)- a database used to identify the conserved domains present in a protein query sequence [http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml] (Mary)<br />
<br />
'''chaperonin''' - a protein complex that assists some newly formed polypeptide chains by folding them into their final, functional, three-dimensional form [http://en.wikipedia.org/wiki/Chaperonins] (Matt)<br />
<br />
'''chemotaxis''' - the process by which cells move toward or away from high concentrations of certain chemicals; it occurs in both uni- and multicellular organisms. Unicellular organisms use it to avoid toxins or find food, and multicellular organisms use it for tasks such as reproduction [http://en.wikipedia.org/wiki/Chemotaxis] (Nick)<br />
<br />
'''chemotaxonomy''' - the attempt to classify and identify organisms according to demonstrable differences and similarities in their biochemical compositions [http://en.wikipedia.org/wiki/Chemotaxonomy] (Mary)<br />
<br />
'''ClustalW''' - A web-based or command line tool that performs multiple sequence alignments to determine evolutionary relationships between three or more sequences [http://en.wikipedia.org/wiki/Clustal] (Will).<br />
<br />
'''COG''' (Clusters of Orthologous Groups) - corresponds to a highly conserved domain and generally consists of either individual proteins or groups of paralogs ([http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml COG] Pallavi) <br><br />
<br />
'''concatemer''' - long continuous DNA molecule that contains the same DNA sequence repeated in series [http://en.wikipedia.org/wiki/Concatemer](Samantha)<BR><br />
<br />
'''contigs''' (contiguous DNA) - overlapping DNA segments that, as a collection, form a longer and gapless segment of DNA. (Discovery Genomics, Proteomics and Bioinformatics [http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''coverage''' - refers to the number of times, on average, any piece of DNA in a sequenced genome has been individually sequenced (Lecture, Jay)<br />
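<br />
As a rough illustration of the coverage definition above, average coverage can be estimated as (number of reads × read length) / genome size. The sketch below assumes reads of uniform length; the function name and the read counts are hypothetical, and only the 3.1 Mbp genome size comes from this page.<br />

```python
def average_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base is sequenced: (N * L) / G."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


# Hypothetical run: 62,000 reads of 500 bp over a 3.1 Mbp genome.
print(average_coverage(62_000, 500, 3_100_000))  # 10.0
```

Real reads vary in length, so summing the actual read lengths would give a better estimate than multiplying by an average.<br />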
<br />
'''CPAN (Comprehensive Perl Archive Network)''' - an archive of over 12,200 modules of software written in Perl, as well as documentation for it. It contains a module called CPAN (or CPAN.pm) which is used as an installer for Perl modules such as BioPerl [http://en.wikipedia.org/wiki/CPAN](Will).<br />
<br />
== D ==<br />
'''''de novo'' synthesis''' - the synthesis of complex molecules from simple molecules (e.g. sugars and nucleotides), rather than from recycled molecules; from the Latin for "from the new" or "anew" [http://en.wikipedia.org/wiki/De_novo_synthesis] (Matt)<br />
<br />
'''dehydrogenase''' - a type of enzyme that oxidizes a substrate by transferring one or more protons and a pair of electrons to an acceptor. [http://en.wikipedia.org/wiki/Dehydrogenase] (Peter)<br />
<br />
'''diatom''' - a major group of eukaryotic algae, and one of the most common types of phytoplankton. A characteristic feature of diatom cells is that they are encased within a unique cell wall made of silica called a frustule. These frustules show a wide diversity in form, but usually consist of two asymmetrical sides with a split between them. [http://en.wikipedia.org/wiki/Diatom] (Mary)<br />
<br />
'''domain (protein)''' - the structural and functional groups of a protein, which can exist independently of the protein itself. Domains typically perform a specific function, such as binding to promoters or substrates, and many proteins can have one or several domains in common. Evolutionarily-linked proteins are more likely to have domains in common. Domains are used to organize proteins into families. ([http://en.wikipedia.org/wiki/Domain_(protein) Wikipedia article], Laura)<br />
<br />
'''dot plot'''-graphical display comparing sequence conservation between two genomes with dots indicating strings of identical bases. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
== E ==<br />
<br />
'''EC number''' (Enzyme Commission Number)- a numerical classification scheme for enzymes, based on the chemical reactions they catalyze [http://en.wikipedia.org/wiki/EC_number] (Mary)<br />
<br />
'''E-value''' (Expect value) - When performing a BLAST search, you will obtain an E-value for each sequence that is retrieved. An E-value is the number of hits with an equal or better score that you would expect to find by chance in a database of the same size; the lower the E-value, the less likely it is that the match is due to chance. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
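<br />
Strictly speaking, an E-value is an expected count of chance hits rather than a probability. If chance hits are assumed to follow a Poisson distribution, the probability of seeing at least one such hit is P = 1 - e^(-E), which is nearly equal to E when E is small. The function name below is hypothetical and the sketch is only meant to show that relationship.<br />

```python
import math


def evalue_to_pvalue(e_value: float) -> float:
    """Probability of at least one chance hit, given the expected count E.

    Assumes chance hits are Poisson-distributed: P = 1 - exp(-E).
    """
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for very small E.
    return -math.expm1(-e_value)


for e in (1e-10, 0.01, 1.0, 10.0):
    print(e, evalue_to_pvalue(e))
```

For the tiny E-values typical of strong BLAST matches the two numbers are practically the same, which is why E-values are often read informally as probabilities.<br />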
<br />
'''Extremophile''' - an organism that thrives in and may even require physically or geochemically extreme conditions that are detrimental to the majority of life on Earth [http://en.wikipedia.org/wiki/Extremophile] (Will).<br />
<br />
== F ==<br />
<br />
'''FASTA format''' - a text format used to convey either nucleic acid sequences or peptide sequences, in which nucleotides or amino acids are represented by single-letter codes. A header line beginning with ">" gives the sequence name and other descriptors and precedes the sequence itself. [http://en.wikipedia.org/wiki/FASTA_format] (Nick)<br><br />
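<br />
To make the FASTA layout concrete, here is a minimal parsing sketch. It assumes the usual convention that each record starts with a ">" header line followed by one or more sequence lines; the function name and the two example records are made up for illustration. For real work, a tested library parser (such as the one in BioPerl) is a better choice.<br />

```python
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {sequence name: sequence} from the lines of a FASTA file."""
    records: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The name is the first word of the header; the rest is description.
            words = line[1:].split()
            name = words[0] if words else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequence may span several lines
    return {n: "".join(parts) for n, parts in records.items()}


example = """>gene1 hypothetical protein
ATGGCC
TTAA
>gene2
ATGTGA
"""
print(parse_fasta(example.splitlines()))
```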
<br />
'''family (protein)''' - a group of evolutionarily-related proteins, often with one or several domains in common. Families are organized by domain overlap, structural/functional similarity, and sequence similarity. ([http://en.wikipedia.org/wiki/Protein_family Wikipedia article] and lecture, Laura)<br />
<br />
'''finished genome''' - a genome that has been sequenced at least partly by hand, resulting in at least 99.99% sequence accuracy (Lecture, Jay)<br><br />
<br />
== G ==<br />
<br />
'''GC Content''' - the percentage of bases within a certain sequence of DNA (e.g. a gene or a genome) that are either guanine or cytosine; a higher GC content is characteristic of a coding region of a gene; differences in GC content between a gene and a genome can be used as evidence for horizontal gene transfer [http://en.wikipedia.org/wiki/GC-content] (Matt)<br><br />
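<br />
The GC content definition above translates directly into a few lines of code. This is a minimal sketch; the function name is hypothetical, and the choice to ignore ambiguous bases such as N is an assumption rather than something stated on this page.<br />

```python
def gc_content(seq: str) -> float:
    """Percentage of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0  # empty or fully ambiguous sequence
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


print(gc_content("ATGC"))    # 50.0
print(gc_content("ggccNN"))  # 100.0
```

Comparing the value for a single gene with the value for the whole genome is one way to look for the horizontal gene transfer evidence mentioned in the definition.<br />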
<br />
'''GC-skew''' – uneven distribution of guanine and cytosine bases between the two strands of DNA where GC base pairs occur. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
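<br />
GC-skew is commonly quantified per window as (G - C) / (G + C), counted on one strand. The sketch below uses that formula with non-overlapping windows; the function name and window size are hypothetical choices. In many bacterial genomes the sign of the skew changes near the origin and terminus of replication, which is why the measure is relevant to the origin-of-replication question on this page, although whether this holds for ''H. utahensis'' is not something this sketch can tell you.<br />

```python
from typing import List


def gc_skew(seq: str, window: int) -> List[float]:
    """(G - C) / (G + C) for each non-overlapping window along one strand."""
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # A window with no G or C has no defined skew; report 0.0.
        values.append((g - c) / (g + c) if g + c else 0.0)
    return values


print(gc_skew("GGGCATCCCG", 5))  # [0.5, -0.5]
```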
<br />
'''gene amplification''' - production of multiple copies of a gene in order to increase the amount of protein that the gene encodes [http://www.medterms.com/script/main/art.asp?articlekey=13537] [http://www.answers.com/topic/gene-amplification] (Matt)<br />
<br />
'''gene knockout''' - a process in which a gene is deactivated within a test organism in order to better understand the function of the gene in that organism [http://en.wikipedia.org/wiki/Gene_knockout] (Matt)<br />
<br />
'''gene ontology''' - a collaborative effort of investigators to unify and standardize terms associated with the role a gene or protein plays in an organism. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''glaucophyte''' - freshwater algae that have not been studied well [http://en.wikipedia.org/wiki/Glaucophyte](Samantha)<br><br />
<br />
== H ==<br />
<br />
'''haemolysin or hemolysin''' - a substance produced by a bacterium that causes lysis of red blood cells [http://en.wikipedia.org/wiki/Hemolysis_(microbiology)] (Nick)<br />
<br />
'''halophile''' - an organism, most often of the Archaea domain, that lives in environments containing high concentrations of salt [http://en.wikipedia.org/wiki/Halophile] (Matt)<br />
<br />
'''haplotype'''-collection of alleles that travel together (Lecture, Pallavi)<br />
<br />
'''haptophyte''' - phylum of algae [http://en.wikipedia.org/wiki/Haptophyte](Samantha)<br />
<br />
'''heterokont''' - major line of eukaryotes consisting of about 10,500 known species, most of which are algae [http://en.wikipedia.org/wiki/Heterokont](Samantha)<br />
<br />
'''Hidden Markov Model''' - a statistical model used in protein recognition databases such as Pfam. A Hidden Markov Model keeps track of several variables and possible variations thereof, such as the possible amino acid sequences that make up a protein domain (since there can be some variance in an amino acid sequence) or the variations in the component sounds that make up a word, and uses those points to match a given sequence to the word, domain, or other complex sequence it most closely matches. An HMM in speech recognition software, for example, can identify that a certain set of sounds make up a certain word, even with the variations in pronunciation and accent that different people will give those sounds. ([http://en.wikipedia.org/wiki/Hidden_Markov_Model Wikipedia] and lecture, Laura) <br />
<br />
'''HMM Logo''' - a graphical representation of an HMM, detailing the possible amino acid sequences, the relative frequencies and probabilities of each amino acid in the sequence, the relative contribution each amino acid has to the overall protein family, and the charge or nature of the amino acids themselves. ([http://www.sanger.ac.uk/Software/analysis/logomat-m/help.shtml How to read HMM Logos, on Pfam], Laura)<br />
<br />
'''homeobox''' - DNA sequence within transcription factor genes that allow the cell to respond to patterns of development by having the transcription factors switch on gene cascades [http://en.wikipedia.org/wiki/Homeobox](Samantha)<br />
<br />
'''homodimer''' - a protein made of paired identical polypeptides ([http://www.answers.com/topic/homodimer Answers.com], Jay)<br />
<br />
'''horizontal gene transfer'''-DNA transmission between species and incorporation of the DNA into the recipient's genome ([http://www.csrees.usda.gov/nea/biotech/res/biotechnology_res_glossary.html horizontal gene transfer] Pallavi)<br />
<br />
'''hydrolase''' - an enzyme that catalyzes hydrolysis, the cleavage of a chemical bond by the addition of a water molecule [http://en.wikipedia.org/wiki/Hydrolase] (Nick)<br />
<br />
== I ==<br />
<br />
'''ideogram''' - in genomics, usually describes a stylized representation of a chromosome with banding patterns (Campbell-Heyer Genomics textbook, Jay)<br />
<br />
'''identities''' - in a BLAST output, the number and fraction of total residues which are identical in a given alignment [http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)<br />
<br />
'''indole'''-a chemical compound that is produced from the break down of tryptophan ([http://medical-dictionary.thefreedictionary.com/indole indole] Pallavi)<br />
<br />
'''inclusion body''' - Inclusion bodies are collections of stainable substances, usually proteins, that are found either in the nucleus or the cytoplasm. It is thought that these bodies are often the result of viral proteins that misfolded [http://en.wikipedia.org/wiki/Inclusion_body] (Nick)<br />
<br />
'''intron''' - a region of DNA in a gene that is not part of the final coding sequence for the protein. [http://en.wikipedia.org/wiki/Intron] (Peter)<br />
<br />
'''isoelectric point''' - the pH at which a molecule carries no net electrical charge [http://en.wikipedia.org/wiki/Isoelectric_point] (Nick)<br />
<br />
'''isozymes''' - members of a gene family with very similar cellular roles (Campbell-Heyer Genomics textbook, Jay)<br />
<br />
== J ==<br />
<br />
== K ==<br />
'''KEGG (Kyoto Encyclopedia of Genes and Genomes)''' - a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals. The Pathway database records networks of molecular interactions in the cells, and variants of them specific to particular organisms [http://en.wikipedia.org/wiki/KEGG](Will).<br />
<br />
'''kinase''' - a type of enzyme that transfers a phosphate group from a high-energy donor molecule to a target molecule in a process called phosphorylation. [http://en.wikipedia.org/wiki/Kinase] (Peter)<br />
<br />
== L ==<br />
<br />
== M ==<br />
'''Manatee''' - a web-based gene evaluation and genome annotation tool that can view, modify, and store annotation for prokaryotic and eukaryotic genomes. This ongoing, open-source initiative was developed with two missions: first, to give biologists the ability to functionally annotate their genomes using a powerful, stand-alone web application with a robustly designed relational annotation database; and second, to give outside developers the opportunity to contribute their own ideas and requirements to enhance Manatee's ability to accomplish biological goals [http://manatee.sourceforge.net/](Will).<br />
<br />
'''motif''' - a sequence of amino acids or nucleotides that performs a particular role and is often conserved in other species or molecules. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''mycoplasma''' - genus of bacteria that lack a cell wall [http://en.wikipedia.org/wiki/Mycoplasma] (Nick)<br />
<br />
== N ==<br />
<br />
'''NORFs''' (nonannotated open reading frames) - open reading frames that were considered not to be real genes when the genome was annotated. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''nucleomorph''' - reduced eukaryotic nuclei found in plastids [http://en.wikipedia.org/wiki/Nucleomorph](Samantha)<br />
<br />
== O ==<br />
'''object-oriented programming''' - a programming paradigm in which collections of data, associated with operations on that data, are modularly defined and then built upon (CSC 121 Lecture, Will). <br />
<br />
'''open reading frame (ORF)''' - a segment of DNA that can potentially encode a protein; it begins with a start codon (usually ATG) and ends with a stop codon [http://www.fao.org/DOCREP/003/X3910E/X3910E18.htm ORF] (Pallavi)<br />
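<br />
A naive ORF scan can be sketched as follows. The assumptions here are deliberate simplifications and are not how JGI, Manatee, or RAST call genes: only ATG is treated as a start codon, only the forward strand is scanned, and the stop codons are those of the standard genetic code. The function name and the <code>min_codons</code> parameter are hypothetical.<br />

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(seq: str, min_codons: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, 0-based with end exclusive, for forward-strand ORFs.

    min_codons counts the codons from the ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # include the stop codon
                start = None
    return orfs


dna = "CCATGAAATAGGG"
for begin, end in find_orfs(dna):
    print(begin, end, dna[begin:end])  # 2 11 ATGAAATAG
```

To cover the reverse strand, the same function could be run on the reverse complement of the sequence.<br />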
<br />
'''operon''' - a segment of DNA involving an operator, promoter, and one or more genes that operate as a single unit during transcription [http://en.wikipedia.org/wiki/Operon] (Nick)<br />
<br />
'''optical mapping'''-DNA sequences of the organism in question are compared against a karyotype that specifically looks at restriction sites found within the DNA to correctly order the DNA sequences on a chromosome. This methodology gives very detailed haplotype information and allows for the detection of sequence variations across an entire genome [http://www.geocities.com/bioinformaticsweb/genomicglossary.html optical mapping] (Pallavi)<br />
<br />
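'''ortholog''' - one of a set of homologous genes in different species that descend from a single gene in the last common ancestor and diverged through speciation; orthologs often look very similar and usually retain the same function (Lecture, Pallavi)<br />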
<br />
'''oxidoreductase''' - an enzyme that catalyzes redox reactions by transferring electrons from one molecule (the reductant) to another (the oxidant) [http://en.wikipedia.org/wiki/Oxidoreductase] (Nick)<br />
<br />
== P ==<br />
<br />
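'''paralog''' - one of a set of homologous genes within the same genome that arose by gene duplication and may since have diverged in sequence and function (Lecture, Pallavi)<br />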
<br />
'''p-arm''' - the shorter arm of a chromosome's two arms separated by the centromere (compare to q-arm, the longer arm) ([http://www.medterms.com/script/main/art.asp?articlekey=4715 MedTerms Dictionary], Jay)<br />
<br />
'''Perl''' - Developed by Larry Wall in 1987, Perl is a [http://en.wikipedia.org/wiki/High-level_programming_language high-level programming language] used frequently by biologists and bioinformaticists [http://en.wikipedia.org/wiki/Perl] (Will). <br />
<br />
'''periplasmic space''' - the space between the inner cytoplasmic membrane and external outer membrane in bacteria or archaea. [http://en.wikipedia.org/wiki/Periplasmic_space] (Peter)<br />
<br />
'''Pfam''' - a database for protein domain families that matches amino acid sequences or nucleotide sequences to the related group of proteins to which they most likely belong. ([http://pfam.sanger.ac.uk/help Pfam Help], Laura)<br />
<br />
'''plasmid''' - an extra-chromosomal DNA molecule that is capable of replicating independently of the chromosomal DNA. Commonly found in bacteria and archaea. [http://en.wikipedia.org/wiki/Plasmid](Peter)<br />
<br />
'''plastid''' - major organelles in plants or algae [http://en.wikipedia.org/wiki/Plastid](Samantha)<br />
<br />
'''pleomorphism''' - the occurrence of two or more structural forms during a life cycle [http://en.wikipedia.org/wiki/Pleomorphism] (Mary)<br />
<br />
'''phylogenetic tree''' - a diagram showing the evolutionary relationships between biological species that are thought to share a common ancestor [http://en.wikipedia.org/wiki/Phylogenetic_tree] (Nick)<br />
<br />
'''phylotypes''' – a term intended to resolve the challenge of “species” when classifying prokaryotes using DNA sequence comparisons. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''positives''' - in a BLAST output, the number and fraction of residues for which the alignment scores have positive rather than negative values [http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)<br />
<br />
'''proteome''' - entire set of proteins expressed by a genome, cell, tissue, or organism. It may refer to expressed proteins under certain conditions [http://en.wikipedia.org/wiki/Proteome](Samantha)<br />
<br />
'''PSORT''' - a prediction server that judges where a mature protein could be in the cell, based on its transmembrane domains, its predicted mature amino acid composition, and its signal sequences. ([http://psort.ims.u-tokyo.ac.jp/form.html PSORT], Laura)<br />
<br />
'''pseudogenes''' - sequences of DNA that look like genes but most likely contain many stop codons. A pseudogene may have evolved away from a real gene, or a paralog might have taken its place (Lecture, Pallavi)<br />
<br />
'''purine''' - a category of nitrogenous base consisting of a pyrimidine ring fused to an imidazole ring. Notable purine bases are adenine and guanine. [http://en.wikipedia.org/wiki/Purine] (Peter)<br />
<br />
'''pyrimidine''' - a category of nitrogenous base consisting of a heterocyclic aromatic ring containing two nitrogen atoms at positions 1 and 3 of the six-member ring. Notable pyrimidine bases are cytosine, thymine, and uracil. [http://en.wikipedia.org/wiki/Pyrimidine] (Peter)<br />
<br />
== Q ==<br />
<br />
'''q-arm''' - the longer arm of a chromosome's two arms separated by the centromere (compare to p-arm, the shorter arm) ([http://www.medterms.com/script/main/art.asp?articlekey=5152 MedTerms Dictionary], Jay)<br><br />
<br />
'''query sequence''' - the sequence (whether amino acid or nucleotide) entered into a database’s search function and checked against the database entries. ([http://en.wikipedia.org/wiki/BLAST BLAST on Wikipedia], Laura)<br />
<br />
== R ==<br />
<br />
'''RAST''' - (Rapid Annotation using Subsystem Technology)- a fully-automated service for annotating bacterial and archaeal genomes. It provides high quality genome annotations for these genomes across the whole phylogenetic tree. ([http://rast.nmpdr.org/], Max Win)<br />
<br />
'''rDNA'''-These are DNA sequences that encode for ribosomal RNA. Note that rDNA can also stand for recombinant DNA. ([http://en.wikipedia.org/wiki/Ribosomal_DNA rDNA] Pallavi)<br />
<br />
'''residue (protein)''' - the remaining portion of an amino acid after a water molecule has been removed and it has been incorporated into a protein. Functional residues, referred to in Pfam, are the residues that perform some specific identifiable function or are part of a domain, and can be conserved across evolutionarily-related proteins. ([http://pfam.sanger.ac.uk/help Pfam Help], Laura)<br />
<br />
'''retrotransposons''' - genetic elements that copy themselves through an RNA intermediate, which is reverse-transcribed into DNA and inserted into the genome [http://en.wikipedia.org/wiki/Retrotransposon](Samantha)<br />
<br />
'''ribonuclease''' - a nuclease that catalyzes the degradation of RNA into smaller components [http://en.wikipedia.org/wiki/Ribonuclease] (Mary)<br />
<br />
== S ==<br />
'''Serovar'''-a subdivision of a species based on the characteristics of their cell surface antigens ([http://www.biology-online.org/dictionary/Serovar serovar] Pallavi)<br />
<br />
'''scaffold''' - a section of a sequenced genome composed of contigs that are in the right order but not necessarily connected ([http://www.medterms.com/script/main/art.asp?articlekey=25223 MedTerms Dictionary], Jay)<br />
<br />
'''Shine-Dalgarno sequence''' - A ribosomal binding site on an mRNA, usually a sequence of about six bases located six or seven bases upstream of the start codon. An anti-Shine-Dalgarno sequence exists on the rRNA in the small subunit of the ribosome; when the two sequences pair, the mRNA is lined up and prepared for translation. (Lecture and [http://en.wikipedia.org/wiki/Shine-dalgarno Wikipedia article], Laura)<br><br />
Note: The Shine-Dalgarno consensus sequence for our genome is ccGGAGGt.<br />
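<br />
One simple way to check a called start codon against the consensus is to look for the core of the motif a short distance upstream. The sketch below assumes that the uppercase GGAGG in the consensus ccGGAGGt is the conserved core and that a 20-base upstream window is enough; both are illustrative assumptions, and the function name and test sequence are hypothetical. Exact matching is also a simplification, since real sites often differ from the consensus by a base or two.<br />

```python
from typing import Optional


def find_sd_upstream(seq: str, start_pos: int,
                     core: str = "GGAGG", window: int = 20) -> Optional[int]:
    """Spacing (in bases) between the nearest upstream core match and the start codon.

    start_pos is the 0-based index of the first base of the start codon.
    Returns None when the core does not occur in the upstream window.
    """
    seq = seq.upper()
    lo = max(0, start_pos - window)
    upstream = seq[lo:start_pos]
    idx = upstream.rfind(core)  # match closest to the start codon
    if idx == -1:
        return None
    return start_pos - (lo + idx + len(core))


# Made-up sequence: consensus, five spacer bases, then ATG.
dna = "TTTCCGGAGGTAAACAATGAAA"
print(find_sd_upstream(dna, dna.index("ATG")))  # 6
```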
<br />
'''SignalP''' - a prediction server that judges whether or not a query protein contains a signal peptide. SignalP scores each amino acid position against models of known signal peptides and predicts the cleavage site of the signal peptide. ([http://www.cbs.dtu.dk/services/SignalP-3.0/output.php SignalP Output explained], Laura)<br />
<br />
'''signal peptide''' - a short peptide chain that directs the post-translational transport of a protein [http://en.wikipedia.org/wiki/Signal_peptide] (Matt)<br />
<br />
'''Smith-Waterman alignment''' - A well-known algorithm for determining similar regions between two nucleotide or protein sequences. Instead of looking at the total sequence, the Smith-Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure [http://en.wikipedia.org/wiki/Smith_waterman](Will).<br />
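<br />
The core of the Smith-Waterman algorithm is a dynamic-programming table in which every cell is floored at zero, so that an alignment can start and end anywhere in either sequence. The sketch below computes only the best local alignment score, without the traceback that recovers the alignment itself. The scoring values (match +2, mismatch -1, linear gap -2) and the function name are assumptions chosen for illustration; real tools use substitution matrices and affine gap penalties.<br />

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # gap in b
            left = H[i][j - 1] + gap  # gap in a
            H[i][j] = max(0, diag, up, left)  # the 0 lets an alignment restart
            best = max(best, H[i][j])
    return best


print(smith_waterman_score("XXACGTXX", "YYACGTYY"))  # 8
```

The table has one cell per pair of positions, so time and memory grow with the product of the two sequence lengths. That cost is why heuristic tools such as BLAST are used for database searches.<br />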
<br />
'''SNP (Single Nucleotide Polymorphism)''' - a DNA sequence variation occurring when a single nucleotide in the genome (or other shared sequence) differs between members of a species (or between paired chromosomes in an individual) [http://en.wikipedia.org/wiki/Single_nucleotide_polymorphism](Will).<br />
<br />
'''symporter''' - an integral membrane protein that is involved in movement of two or more different molecules or ions across a phospholipid membrane. [http://en.wikipedia.org/wiki/Symporter] (Peter)<br />
<br />
'''synteny''' - a neologism from the Greek for "on the same ribbon". Genes that are syntenic in one species are on the same chromosome; genes that are syntenic across species retain the same order on respective chromosomes as a result of descent from a common ancestor ([http://www.answers.com/synteny Answers.com], Jay)<br />
<br />
'''synthetase''' - a type of enzyme that creates a new covalent bond and requires direct input of energy from a high-energy phosphate. [http://books.google.com/books?id=bB8XnCykRmIC&pg=PA522&lpg=PA522&dq=%22synthetase+is+an+enzyme%22&source=web&ots=wkws4ksMsg&sig=zWLkDIk7T78hcf9S84nWs3u5Apw&hl=en&sa=X&oi=book_result&resnum=9&ct=result] (Peter)<br />
<br />
== T ==<br />
'''transferase''' - an enzyme that catalyzes the transfer of a functional group from one molecule (the donor) to another (the acceptor) [http://en.wikipedia.org/wiki/Transferase] (Matt)<br />
<br />
'''transmembrane helix''' - a single transmembrane alpha helix of a transmembrane protein, usually about twenty amino acids in length. They are usually predicted by hydrophobicity. [http://en.wikipedia.org/wiki/Transmembrane_domain](Mary)<br />
<br />
'''transposons / transposable elements''' - DNA sequences that can move around to different positions in a single cell's genome. Transposons can cause mutations and change the length of the genome. [http://en.wikipedia.org/wiki/Transposon](Samantha)<br />
<br />
'''Transposon Mutagenesis'''-a procedure in which a transposon is inserted into a gene, which inactivates the gene and can lead to the discovery of the phenotype associated with this gene ([http://cancerweb.ncl.ac.uk/cgi-bin/omd?transposon+mutagenesis transposon mutagenesis] Pallavi)<br />
<br />
'''tRNA splicing endonuclease''' - an enzyme that cleaves intervening sequences of precursor tRNA. [http://cancerweb.ncl.ac.uk/cgi-bin/omd?splicing+endonuclease] (Peter)<br><br />
<br />
== U ==<br />
<br />
== V ==<br />
<br />
== W ==<br />
<br />
'''whole genome shotgun sequencing''' - a method of sequencing in which DNA is cut into small pieces and cloned into vectors; both ends of every cloned piece are then sequenced for about 500 bp each to form mate pairs. Mate pairs rarely overlap, but are used to reassemble the sequence using software. [http://en.wikipedia.org/wiki/Whole_genome_shotgun](Samantha)<br />
<br><br />
<br />
== X ==<br />
'''xenolog''' - homologs that are created by horizontal gene transfer between two different species [http://en.wikipedia.org/wiki/Xenolog#Xenology] (Matt)<br><br />
<br />
== Y ==<br />
<br />
== Z ==<br />
<br />
<BR><br />
<HR><br />
<HR><br />
<br />
== This is a list of the student-created tutorials: ==</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Halorhabdus_utahensis_Genome&diff=6606Halorhabdus utahensis Genome2008-09-25T03:26:57Z<p>Lavoss: /* P */</p>
<hr />
<div>This page will be used by Davidson College students in the [http://www.bio.davidson.edu/Courses/Bio343/LabMethods.html Genomics Laboratory course].<br />
<br />
== Links to Multiple Databases ==<br />
*[http://imgweb.jgi-psf.org/cgi-bin/img_edu_v260/main.cgi?section=TaxonDetail&page=taxonDetail&taxon_oid=2500575004 JGI IMG EDU] <br> public access <br> *[[Media:JGIAnnotation.xls|JGI Annotation Excel Spreadsheet]]<br />
*[http://www.tigr.org/tigr-scripts/prok_manatee/shared/login.cgi Manatee at JCVI] <br> use the davidson number sent by email as username and password (database is nthu01 - this is case sensitive) <br> *[[Media:ManateeAnnotation.xls|Manatee Annotation Excel Spreadsheet]]<br />
*[http://rast.nmpdr.org/ SEED view via RAST] <br> use the username and password combination sent to you by SEED <br> *[[Media:RastAnnotation.xls|RAST Annotation Excel Spreadsheet]]<br />
*[http://www.genome.jp/kegg/kaas/ KEGG]<br> We can submit our genes to KEGG to have it mapped out, but SEED and Manatee may already do this. Do we want to ask them to upload it into their database? <br />
<br><br />
<br />
== RNA Genes ==<br />
<br />
*[[tRNA Genes Check List]]<br><br />
*[[rRNA operon]]<br><br />
*[[2 misc. RNA genes]] (short summary list)<br><br />
*[[Missing tRNA-trp gene found]]<br><br />
<br />
== Other Resources ==<br />
*[[Consensus Shine Dalgarno]] Excel File for ''H. utahensis'' <br><br />
*[[References]]<br><br />
*[[Gene Annotation Template]]<br><br />
*[[General Questions]]<br><br />
*[[Page for Annotated Genes]]<br><br />
<br />
== Tutorials for Annotating Genomes ==<br />
<br />
# Will DeLoache - BioPerl Installation <br><br />
# Max Win - Introduction to Perl for non-programmers (with step-by-step explanations, simple exercises, and solutions)<br><br />
# Pallavi - Conserved Domains Database (CDD) <br><br />
# Mary - Protein Data Bank <br><br />
# Laura Voss - Pfam Database <br><br />
# Samantha Simpson - NCBI Blast (protein, nucleotide, and blast2) <br><br />
# Peter Bakke - Finding species-specific Shine-Dalgarno sequence<br><br />
<br />
== Research Questions ==<br />
#How do the three systems compare for finding ORFs and RNA genes?<br />
#Is there a pattern of missed genes for any of the 3 sites? <br />
#Do the three systems differ in their ability to find good start codons and Shine-Dalgarno sequences? [We need a standard set of genes for comparison. Only highly conserved or a range of genes?]<br />
# Were Shine-Dalgarno sequences calculated for our species or default values used? If default, what sequence?<br />
#Can we fill any holes in their automated annotation? Is there a mechanism for users to add in genes?<br />
#How do the 3 sites compare for ease of use?<br />
#What are the strengths and weaknesses of each system? What did they publish as their special features and how do we see these working?<br />
#How does each of the 3 sites compare for pathway detection and visualization? <br />
#Do they find the origin of replication? Can we find it? <br />
<br />
<hr><br />
<br />
== This is a list of glossary words (A - Z): ==<br />
[[#A| A ]] [[#B| B ]] [[#C| C ]] [[#D| D ]] [[#E| E ]] [[#F| F ]] [[#G| G ]] [[#H| H ]] [[#I| I ]] [[#J| J ]] [[#K| K ]] [[#L| L ]] [[#M| M ]] [[#N| N ]] [[#O| O ]] [[#P| P ]] [[#Q| Q ]] [[#R| R ]] [[#S| S ]] [[#T| T ]] [[#U| U ]] [[#V| V ]] [[#W| W ]] [[#X| X ]] [[#Y| Y ]] [[#Z| Z ]] <br />
<br />
== A ==<br />
'''Accession Number''' - a unique identifier given to DNA and protein sequences to allow for tracking of sequence information within a single database [http://en.wikipedia.org/wiki/Accession_number_(bioinformatics)] (Will).<br />
<br />
'''<i>Arabidopsis thaliana</i>''' - the scientific name for the thale cress plant; it was the first plant to have its genome sequenced, and is a model organism for understanding plant biology and genetics ([http://en.wikipedia.org/wiki/Thale_cress Wikipedia.org], Jay)<br />
<br />
== B ==<br />
'''BAC''' - <i>b</i>acterial <i>a</i>rtificial <i>c</i>hromosome, a DNA construct used for transforming or cloning segments of DNA and often used to sequence the genetic code of organisms ([http://en.wikipedia.org/wiki/Bacterial_artificial_chromosome Wikipedia.org], Jay)<br />
<br />
'''bioinformatics''' - the multi-disciplinary approach of using biology, computer science and mathematics to solve or better understand biological problems [http://en.wikipedia.org/wiki/Bioinformatics] (Matt)<br />
<br />
'''BLAST''' - (Basic Local Alignment Search Tool) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. [http://blast.ncbi.nlm.nih.gov/Blast.cgi] (Mary)<br />
<br />
'''bioperl'''- a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications such as accessing sequence data from local and remote databases, converting between database formats, manipulating individual sequences, searching for similar sequences, searching for genes and other structures on genomic DNA, and developing machine-readable sequence annotations. [http://en.wikipedia.org/wiki/BioPerl] (Wikipedia, Max Win)<br />
<br />
== C ==<br />
'''carbon fixation''' - using carbon dioxide to create organic materials [http://en.wikipedia.org/wiki/Carbon_fixation] (Samantha)<BR><br />
<br />
'''CDD''' (Conserved Domains Database)- a database used to identify the conserved domains present in a protein query sequence [http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml] (Mary)<br />
<br />
'''chaperonin''' - a protein complex that assists some newly formed polypeptide chains by folding them into their final, functional, three-dimensional form [http://en.wikipedia.org/wiki/Chaperonins] (Matt)<br />
<br />
'''chemotaxis''' - the process by which cells move toward or away from a high concentration of certain chemicals; it is found in both uni- and multicellular organisms. Unicellular organisms use it to find food or avoid toxins, and multicellular organisms use it for tasks such as reproduction [http://en.wikipedia.org/wiki/Chemotaxis] (Nick)<br />
<br />
'''chemotaxonomy''' - the attempt to classify and identify organisms according to demonstrable differences and similarities in their biochemical compositions [http://en.wikipedia.org/wiki/Chemotaxonomy] (Mary)<br />
<br />
'''ClustalW''' - A web-based or command line tool that performs multiple sequence alignments to determine evolutionary relationships between three or more sequences [http://en.wikipedia.org/wiki/Clustal] (Will).<br />
<br />
'''COG''' (Clusters of Orthologous Groups)- each COG corresponds to a highly conserved domain and generally consists of either individual proteins or groups of paralogs ([http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml COG] Pallavi) <br><br />
<br />
'''concatemer''' - long continuous DNA molecule that contains the same DNA sequence repeated in series [http://en.wikipedia.org/wiki/Concatemer](Samantha)<BR><br />
<br />
'''contigs''' (contiguous DNA)- overlapping DNA segments that as a collection form a longer and gapless segment of DNA. (Discovery Genomics, Proteomics and Bioinformatics [http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''coverage''' - refers to the number of times, on average, any piece of DNA in a sequenced genome has been individually sequenced (Lecture, Jay)<br />
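<br />
As a rough illustration of this definition, average coverage is often estimated as (number of reads × read length) / genome length. The sketch below assumes that simple formula; the read count and read length are hypothetical values chosen for the example, and only the 3.1 Mbp genome size comes from this page.<br />

```python
def mean_coverage(num_reads: int, read_length: int, genome_length: int) -> float:
    """Estimate average coverage as (reads * read length) / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return num_reads * read_length / genome_length


# Hypothetical run: 50,000 reads of 500 bp each over a 3.1 Mbp genome.
estimate = mean_coverage(num_reads=50_000, read_length=500, genome_length=3_100_000)
print(f"approximate coverage: {estimate:.2f}x")
```

This estimate is only an average; real coverage varies from region to region.<br />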
<br />
'''CPAN (Comprehensive Perl Archive Network)''' - an archive of over 12,200 modules of software written in Perl, as well as documentation for it. It contains a module called CPAN (or CPAN.pm) which is used as an installer for Perl modules such as BioPerl [http://en.wikipedia.org/wiki/CPAN](Will).<br />
<br />
== D ==<br />
'''''de novo'' synthesis''' - the synthesis of complex molecules from simple molecules (e.g. sugars and nucleotides), rather than from recycled molecules; from the Latin for "anew" (literally "from the new") [http://en.wikipedia.org/wiki/De_novo_synthesis] (Matt)<br />
<br />
'''dehydrogenase''' - a type of enzyme that oxidizes a substrate by transferring one or more protons and a pair of electrons to an acceptor. [http://en.wikipedia.org/wiki/Dehydrogenase] (Peter)<br />
<br />
'''diatom''' - a major group of eukaryotic algae, and one of the most common types of phytoplankton. A characteristic feature of diatom cells is that they are encased within a unique cell wall made of silica called a frustule. These frustules show a wide diversity in form, but usually consist of two asymmetrical sides with a split between them. [http://en.wikipedia.org/wiki/Diatom] (Mary)<br />
<br />
'''domain (protein)''' - the structural and functional groups of a protein, which can exist independently of the protein itself. Domains typically perform a specific function, such as binding to promoters or substrates, and many proteins can have one or several domains in common. Evolutionarily-linked proteins are more likely to have domains in common. Domains are used to organize proteins into families. ([http://en.wikipedia.org/wiki/Domain_(protein) Wikipedia article], Laura)<br />
<br />
'''dot plot'''-graphical display comparing sequence conservation between two genomes with dots indicating strings of identical bases. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
== E ==<br />
<br />
'''EC number''' (Enzyme Commission Number)- a numerical classification scheme for enzymes, based on the chemical reactions they catalyze [http://en.wikipedia.org/wiki/EC_number] (Mary)<br />
<br />
'''E-value''' (Expect value)- When performing a BLAST search, you will obtain an E-value for each sequence that is retrieved. An E-value is the number of hits with a score at least this good that would be expected by chance in a database of the same size; the closer it is to zero, the less likely it is that the similarity between the two sequences is due to chance. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''Extremophile''' - an organism that thrives in and may even require physically or geochemically extreme conditions that are detrimental to the majority of life on Earth [http://en.wikipedia.org/wiki/Extremophile] (Will).<br />
<br />
== F ==<br />
<br />
'''FASTA format''' - a format used to convey either nucleic acid sequences or peptide sequences, in which base pairs or amino acids are represented by single-letter codes. The sequence name and other descriptors often precede the amino acid sequence. [http://en.wikipedia.org/wiki/FASTA_format] (Nick)<br><br />
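<br />
To make the format concrete, here is a minimal sketch of a FASTA reader. It assumes the common convention that each record begins with a ">" header line followed by one or more sequence lines; the function name and the two example records are made up for illustration and are not taken from our genome files.<br />

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each header (the text after '>') to its concatenated sequence."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped, so append each one to the current record.
            records[header] += line.upper()
    return records


# Hypothetical two-record example.
example = """>gene_1 hypothetical protein
ATGGCTAAA
TTTTAA
>gene_2 example RNA gene
GCGGATTTA
""".splitlines()

records = parse_fasta(example)
for name, sequence in records.items():
    print(name, len(sequence))
```

For real work, an established parser such as the one in BioPerl is a safer choice than a hand-written reader.<br />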
<br />
'''family (protein)''' - a group of evolutionarily-related proteins, often with one or several domains in common. Families are organized by domain overlap, structural/functional similarity, and sequence similarity. ([http://en.wikipedia.org/wiki/Protein_family Wikipedia article] and lecture, Laura)<br />
<br />
'''finished genome''' - a genome that has been sequenced at least partly by hand, resulting in at least 99.99% sequence accuracy (Lecture, Jay)<br><br />
<br />
== G ==<br />
<br />
'''GC Content''' - the percentage of bases within a certain sequence of DNA (e.g. a gene or a genome) that are either guanine or cytosine; a higher GC content is characteristic of a coding region of a gene; differences in GC content between a gene and a genome can be used as evidence for horizontal gene transfer [http://en.wikipedia.org/wiki/GC-content] (Matt)<br><br />
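<br />
The percentage described above can be computed directly from a sequence string. This is a small sketch, assuming that only A, C, G, and T should be counted (ambiguous bases such as N are ignored); the function name is our own.<br />

```python
def gc_content(seq: str) -> float:
    """Return the percentage of unambiguous bases that are G or C."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0  # avoid dividing by zero for empty or all-N input
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


print(f"{gc_content('ATGCGC'):.1f}% GC")
```

Comparing this value for a single gene against the value for the whole genome is one way to flag candidates for horizontal gene transfer.<br />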
<br />
'''GC-skew''' – uneven distribution of guanine and cytosine bases between the two strands of DNA where GC base pairs occur. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
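<br />
GC-skew is commonly quantified on one strand as (G − C) / (G + C), calculated in windows along the sequence. The sketch below assumes that formula and non-overlapping windows; the window size and test sequence are arbitrary. In many bacterial genomes the skew changes sign near the origin of replication, but whether that pattern holds for ''H. utahensis'' is an open question here, not something this code establishes.<br />

```python
from typing import List


def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C) for one stretch of sequence."""
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)


def skew_by_window(seq: str, window: int) -> List[float]:
    """Compute GC-skew for consecutive, non-overlapping windows."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        gc_skew(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, window)
    ]


# Toy sequence: the first window is G-rich, the second is C-rich.
print(skew_by_window("GGGCCCCG", 4))
```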
<br />
'''gene amplification''' - production of multiple copies of a gene in order to amplify the amount of protein that the gene encodes for [http://www.medterms.com/script/main/art.asp?articlekey=13537] [http://www.answers.com/topic/gene-amplification] (Matt)<br />
<br />
'''gene knockout''' - a process in which a gene is deactivated within a test organism in order to better understand the function of the gene in that organism [http://en.wikipedia.org/wiki/Gene_knockout] (Matt)<br />
<br />
'''gene ontology'''- a collaborative effort of investigators to unify and standardize terms associated with the role a gene or protein plays in an organism. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''glaucophyte''' - freshwater algae that have not been studied well [http://en.wikipedia.org/wiki/Glaucophyte](Samantha)<br><br />
<br />
== H ==<br />
<br />
'''haemolysin or hemolysin''' - a chemical produced by a bacterium that causes lysis of red blood cells [http://en.wikipedia.org/wiki/Hemolysis_(microbiology)] (Nick)<br />
<br />
'''halophile''' - an organism, most often of the Archaea domain, that lives in environments containing high concentrations of salt [http://en.wikipedia.org/wiki/Halophile] (Matt)<br />
<br />
'''haplotype'''-collection of alleles that travel together (Lecture, Pallavi)<br />
<br />
'''haptophyte''' - phylum of algae [http://en.wikipedia.org/wiki/Haptophyte](Samantha)<br />
<br />
'''heterokont''' - major line of eukaryotes consisting of about 10,500 known species, most of which are algae [http://en.wikipedia.org/wiki/Heterokont](Samantha)<br />
<br />
'''Hidden Markov Model''' - a statistical model used in protein recognition databases such as Pfam. A Hidden Markov Model keeps track of several variables and possible variations thereof, such as the possible amino acid sequences that make up a protein domain (since there can be some variance in an amino acid sequence) or the variations in the component sounds that make up a word, and uses those points to match a given sequence to the word, domain, or other complex sequence it most closely matches. An HMM in speech recognition software, for example, can identify that a certain set of sounds make up a certain word, even with the variations in pronunciation and accent that different people will give those sounds. ([http://en.wikipedia.org/wiki/Hidden_Markov_Model Wikipedia] and lecture, Laura) <br />
<br />
'''HMM Logo''' - a graphical representation of an HMM, detailing the possible amino acid sequences, the relative frequencies and probabilities of each amino acid in the sequence, the relative contribution each amino acid has to the overall protein family, and the charge or nature of the amino acids themselves. ([http://www.sanger.ac.uk/Software/analysis/logomat-m/help.shtml How to read HMM Logos, on Pfam], Laura)<br />
<br />
'''homeobox''' - DNA sequence within transcription factor genes that allows the cell to respond to patterns of development by having the transcription factors switch on gene cascades [http://en.wikipedia.org/wiki/Homeobox](Samantha)<br />
<br />
'''homodimer''' - a protein made of paired identical polypeptides ([http://www.answers.com/topic/homodimer Answers.com], Jay)<br />
<br />
'''horizontal gene transfer'''-DNA transmission between species and incorporation of the DNA into the recipient's genome ([http://www.csrees.usda.gov/nea/biotech/res/biotechnology_res_glossary.html horizontal gene transfer] Pallavi)<br />
<br />
'''hydrolase''' - an enzyme that catalyzes hydrolysis, the cleavage of a chemical bond by the addition of water [http://en.wikipedia.org/wiki/Hydrolase] (Nick)<br />
<br />
== I ==<br />
<br />
'''ideogram''' - in genomics, usually describes a stylized representation of a chromosome with banding patterns (Campbell-Heyer Genomics textbook, Jay)<br />
<br />
'''identities''' - in a BLAST output, the number and fraction of total residues which are identical in a given alignment [http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)<br />
<br />
'''indole'''-a chemical compound that is produced from the break down of tryptophan ([http://medical-dictionary.thefreedictionary.com/indole indole] Pallavi)<br />
<br />
'''inclusion body''' - Inclusion bodies are collections of stainable substances, usually proteins, that are found either in the nucleus or the cytoplasm. It is thought that these bodies are often the result of viral proteins that misfolded [http://en.wikipedia.org/wiki/Inclusion_body] (Nick)<br />
<br />
'''intron''' - a region of DNA in a gene that is not part of the final coding sequence for the protein. [http://en.wikipedia.org/wiki/Intron] (Peter)<br />
<br />
'''isoelectric point''' - the pH at which a molecule carries no net electrical charge [http://en.wikipedia.org/wiki/Isoelectric_point] (Nick)<br />
<br />
'''isozymes''' - members of a gene family with very similar cellular roles (Campbell-Heyer Genomics textbook, Jay)<br />
<br />
== J ==<br />
<br />
== K ==<br />
'''KEGG (Kyoto Encyclopedia of Genes and Genomes)''' - a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals. The Pathway database records networks of molecular interactions in the cells, and variants of them specific to particular organisms [http://en.wikipedia.org/wiki/KEGG](Will).<br />
<br />
'''kinase''' - a type of enzyme that transfers a phosphate group from a high-energy donor molecule to a target molecule in a process called phosphorylation. [http://en.wikipedia.org/wiki/Kinase] (Peter)<br />
<br />
== L ==<br />
<br />
== M ==<br />
'''Manatee''' - a web-based gene evaluation and genome annotation tool that can view, modify, and store annotation for prokaryotic and eukaryotic genomes. This ongoing, open-source initiative was developed with two missions: first, to give biologists the ability to functionally annotate their genomes using a powerful, stand-alone web application with a robustly designed relational annotation database; and second, to give outside developers the opportunity to contribute their own ideas and requirements to enhance Manatee's ability to accomplish biological goals [http://manatee.sourceforge.net/](Will).<br />
<br />
'''motif''' - a sequence of amino acids or nucleotides that performs a particular role and is often conserved in other species or molecules. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''mycoplasma''' - genus of bacteria that lack a cell wall [http://en.wikipedia.org/wiki/Mycoplasma] (Nick)<br />
<br />
== N ==<br />
<br />
'''NORFs''' (nonannotated open reading frames) - a NORF is an open reading frame that was considered not to be a real gene when the genome was annotated. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''nucleomorph''' - reduced eukaryotic nuclei found in plastids [http://en.wikipedia.org/wiki/Nucleomorph](Samantha)<br />
<br />
== O ==<br />
'''object-oriented programming''' - a programming paradigm in which collections of data, associated with operations on that data, are modularly defined and then built upon (CSC 121 Lecture, Will). <br />
<br />
'''open reading frame (ORF)'''-a segment of DNA that can potentially encode a protein; it begins with a start codon (usually ATG) and continues in the same reading frame until a stop codon [http://www.fao.org/DOCREP/003/X3910E/X3910E18.htm ORF] (Pallavi)<br />
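<br />
A simple way to see what an ORF is in practice is to scan a sequence codon by codon. The sketch below is a deliberately minimal ORF finder and rests on several assumptions: it reads only the three forward frames (the reverse strand is not scanned), accepts only ATG as a start codon even though prokaryotes also use alternatives such as GTG and TTG, and uses the standard stop codons TAA, TAG, and TGA. The function name and the minimum-length default are our own choices.<br />

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) positions of forward-strand ORFs.

    Positions are 0-based and end is exclusive; the stop codon is included.
    min_codons counts codons from the start codon up to, but not including,
    the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open an ORF at the first ATG in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close the ORF and look for the next start
    return sorted(orfs)


toy = "CCATGAAATTTTAGGG"
for begin, end in find_orfs(toy):
    print(begin, end, toy[begin:end])
```

Annotation systems use far more evidence than this (codon usage, ribosomal binding sites, similarity to known genes), so results from a scan like this are only candidates.<br />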
<br />
'''operon''' - a segment of DNA involving an operator, promoter, and one or more genes that operate as a single unit during transcription [http://en.wikipedia.org/wiki/Operon] (Nick)<br />
<br />
'''optical mapping'''-DNA sequences of the organism in question are compared against a karyotype that specifically looks at restriction sites found within the DNA to correctly order the DNA sequences on a chromosome. This methodology gives very detailed haplotype information and allows for the detection of sequence variations across an entire genome [http://www.geocities.com/bioinformaticsweb/genomicglossary.html optical mapping] (Pallavi)<br />
<br />
'''ortholog'''-homologous genes in different species that descend from a single gene in their last common ancestor (they diverged through speciation) and often retain the same function (Lecture, Pallavi)<br />
<br />
'''oxidoreductase''' - an enzyme that catalyzes redox reactions by transferring electrons from one molecule (the reductant) to another (the oxidant) [http://en.wikipedia.org/wiki/Oxidoreductase] (Nick)<br />
<br />
== P ==<br />
<br />
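'''paralog'''-homologous genes within the same genome that arose by gene duplication; their sequences are similar but need not be identical, and their functions may diverge (Lecture, Pallavi)<br />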
<br />
'''p-arm''' - the shorter arm of a chromosome's two arms separated by the centromere (compare to q-arm, the longer arm) ([http://www.medterms.com/script/main/art.asp?articlekey=4715 MedTerms Dictionary], Jay)<br />
<br />
'''Perl''' - Developed by Larry Wall in 1987, Perl is a [http://en.wikipedia.org/wiki/High-level_programming_language high-level programming language] used frequently by biologists and bioinformaticists [http://en.wikipedia.org/wiki/Perl] (Will). <br />
<br />
'''periplasmic space''' - the space between the inner cytoplasmic membrane and external outer membrane in bacteria or archaea. [http://en.wikipedia.org/wiki/Periplasmic_space] (Peter)<br />
<br />
'''Pfam''' - a database for protein domain families that matches amino acid sequences or nucleotide sequences to the related group of proteins to which they most likely belong. ([http://pfam.sanger.ac.uk/help Pfam Help], Laura)<br />
<br />
'''plasmid''' - an extra-chromosomal DNA molecule that is capable of replicating independently of the chromosomal DNA. Commonly found in bacteria and archaea. [http://en.wikipedia.org/wiki/Plasmid](Peter)<br />
<br />
'''plastid''' - major organelles in plants or algae [http://en.wikipedia.org/wiki/Plastid](Samantha)<br />
<br />
'''pleomorphism''' - the occurrence of two or more structural forms during a life cycle [http://en.wikipedia.org/wiki/Pleomorphism] (Mary)<br />
<br />
'''phylogenetic tree''' - a diagram showing the evolutionary relationships between biological species that are thought to share a common ancestor [http://en.wikipedia.org/wiki/Phylogenetic_tree] (Nick)<br />
<br />
'''phylotypes''' – a term intended to resolve the challenge of “species” when classifying prokaryotes using DNA sequence comparisons. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''positives''' - in a BLAST output, the number and fraction of residues for which the alignment scores have positive rather than negative values [http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)<br />
<br />
'''proteome''' - entire set of proteins expressed by a genome, cell, tissue, or organism. It may refer to expressed proteins under certain conditions [http://en.wikipedia.org/wiki/Proteome](Samantha)<br />
<br />
'''PSORT''' - a prediction server that judges where a mature protein could be in the cell, based on its transmembrane domains, its predicted mature amino acid composition, and its signal sequences. ([http://psort.ims.u-tokyo.ac.jp/form.html PSORT], Laura)<br />
<br />
'''pseudogene'''-a sequence of DNA that looks like a gene but is no longer functional, most likely because it contains many stop codons. It may have evolved away from a real gene, or a paralog might have taken its place (Lecture, Pallavi)<br />
<br />
'''purine''' - a category of nitrogenous base consisting of a pyrimidine ring fused to an imidazole ring. Notable purine bases are adenine and guanine. [http://en.wikipedia.org/wiki/Purine] (Peter)<br />
<br />
'''pyrimidine''' - a category of nitrogenous base consisting of a heterocyclic aromatic ring containing two nitrogen atoms at positions 1 and 3 of the six-member ring. Notable pyrimidine bases are cytosine, thymine, and uracil. [http://en.wikipedia.org/wiki/Pyrimidine] (Peter)<br />
<br />
== Q ==<br />
<br />
'''q-arm''' - the longer arm of a chromosome's two arms separated by the centromere (compare to p-arm, the shorter arm) ([http://www.medterms.com/script/main/art.asp?articlekey=5152 MedTerms Dictionary], Jay)<br><br />
<br />
'''query sequence''' - the sequence (whether amino acid or nucleotide) entered into a database’s search function and checked against the database entries. ([http://en.wikipedia.org/wiki/BLAST BLAST on Wikipedia], Laura)<br />
<br />
== R ==<br />
<br />
'''RAST''' - (Rapid Annotation using Subsystem Technology)- a fully-automated service for annotating bacterial and archaeal genomes. It provides high quality genome annotations for these genomes across the whole phylogenetic tree. ([http://rast.nmpdr.org/], Max Win)<br />
<br />
'''rDNA'''-These are DNA sequences that encode for ribosomal RNA. Note that rDNA can also stand for recombinant DNA. ([http://en.wikipedia.org/wiki/Ribosomal_DNA rDNA] Pallavi)<br />
<br />
'''residue (protein)''' - the remaining portion of an amino acid after a water molecule has been removed and it has been incorporated into a protein. Functional residues, referred to in Pfam, are the residues that perform some specific identifiable function or are part of a domain, and can be conserved across evolutionarily-related proteins. ([http://pfam.sanger.ac.uk/help Pfam Help], Laura)<br />
<br />
'''retrotransposons''' - genetic elements that copy themselves through an RNA intermediate, which is reverse-transcribed back into DNA and inserted into the genome [http://en.wikipedia.org/wiki/Retrotransposon](Samantha)<br />
<br />
'''ribonuclease''' - a nuclease that catalyzes the degradation of RNA into smaller components [http://en.wikipedia.org/wiki/Ribonuclease] (Mary)<br />
<br />
== S ==<br />
'''Serovar'''-a subdivision of a species based on the characteristics of their cell surface antigens ([http://www.biology-online.org/dictionary/Serovar serovar] Pallavi)<br />
<br />
'''scaffold''' - a section of a sequenced genome composed of contigs that are in the right order but not necessarily connected ([http://www.medterms.com/script/main/art.asp?articlekey=25223 MedTerms Dictionary], Jay)<br />
<br />
'''Shine-Dalgarno sequence''' - A ribosomal binding site on an mRNA, usually a sequence of about six bases located roughly six or seven bases upstream of the start codon. An anti-Shine-Dalgarno sequence exists on the rRNA in the small subunit of the ribosome; when the two sequences pair, the mRNA is lined up and prepared for translation. (Lecture and [http://en.wikipedia.org/wiki/Shine-dalgarno Wikipedia article], Laura)<br><br />
Note: The Shine-Dalgarno consensus sequence for our genome is TAGGAGG.<br />
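<br />
One way to check a predicted start codon is to look for a sequence resembling this consensus just upstream of it. The sketch below is one possible approach, not the method used by any of the three annotation systems: it slides the consensus TAGGAGG along a window upstream of the start codon, counts matching bases without allowing gaps, and reports the best match and its spacing from the start codon. The function names, the 20-base window, and the test sequence are all illustrative assumptions.<br />

```python
from typing import Optional, Tuple


def best_sd_match(upstream: str, consensus: str = "TAGGAGG") -> Tuple[int, int]:
    """Return (matching bases, offset) for the best ungapped match.

    An offset of -1 means the upstream region is shorter than the consensus
    or no base matched at all.
    """
    upstream = upstream.upper()
    best = (0, -1)
    for i in range(len(upstream) - len(consensus) + 1):
        window = upstream[i:i + len(consensus)]
        score = sum(a == b for a, b in zip(window, consensus))
        if score > best[0]:  # keep the first window with the highest score
            best = (score, i)
    return best


def sd_spacing(seq: str, start_index: int, consensus: str = "TAGGAGG",
               window: int = 20) -> Tuple[int, Optional[int]]:
    """Return (matching bases, bases between the motif and the start codon)."""
    region_start = max(0, start_index - window)
    matches, offset = best_sd_match(seq[region_start:start_index], consensus)
    if offset < 0:
        return matches, None
    motif_end = region_start + offset + len(consensus)
    return matches, start_index - motif_end


# Hypothetical sequence: consensus, a 6-base spacer, then an ATG start codon.
leader = "CCC" + "TAGGAGG" + "AAAAAA"
gene = leader + "ATGGCT"
print(sd_spacing(gene, start_index=len(leader)))
```

A real gene will rarely match all seven bases, so in practice a threshold on the number of matches has to be chosen and justified.<br />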
<br />
'''SignalP''' - a prediction server that judges whether or not a query protein contains a signal peptide. SignalP measures each amino acid against the amino acid sequences of probable signal peptide matches and predicts the cleavage site of the signal peptide. ([http://www.cbs.dtu.dk/services/SignalP-3.0/output.php SignalP Output explained], Laura)<br />
<br />
'''signal peptide''' - a short peptide chain that directs the post-translational transport of a protein [http://en.wikipedia.org/wiki/Signal_peptide] (Matt)<br />
<br />
'''Smith-Waterman alignment''' - A well-known algorithm for determining similar regions between two nucleotide or protein sequences. Instead of looking at the total sequence, the Smith-Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure [http://en.wikipedia.org/wiki/Smith_waterman](Will).<br />
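<br />
The core of the algorithm is a scoring table in which each cell holds the best score of any local alignment ending at that pair of positions, with scores never allowed to drop below zero. The sketch below computes only the best local score (it does not trace back the aligned segments) and assumes a simple linear gap penalty; the match, mismatch, and gap values are arbitrary illustrative choices, not the parameters of any particular tool.<br />

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # table[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    table = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = table[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            table[i][j] = max(
                0,                      # start a fresh alignment here
                diagonal,               # align a[i-1] with b[j-1]
                table[i - 1][j] + gap,  # gap in b
                table[i][j - 1] + gap,  # gap in a
            )
            best = max(best, table[i][j])
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

The zero floor is what makes the alignment local: a poorly matching region resets the score instead of dragging down a good match elsewhere.<br />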
<br />
'''SNP (Single Nucleotide Polymorphism)''' - a DNA sequence variation occurring when a single nucleotide in the genome (or other shared sequence) differs between members of a species (or between paired chromosomes in an individual) [http://en.wikipedia.org/wiki/Single_nucleotide_polymorphism](Will).<br />
<br />
'''symporter''' - an integral membrane protein that is involved in movement of two or more different molecules or ions across a phospholipid membrane in the same direction. [http://en.wikipedia.org/wiki/Symporter] (Peter)<br />
<br />
'''synteny''' - a neologism from the Greek for "on the same ribbon". Genes that are syntenic in one species are on the same chromosome; genes that are syntenic across species retain the same order on respective chromosomes as a result of descent from a common ancestor ([http://www.answers.com/synteny Answers.com], Jay)<br />
<br />
'''synthetase''' - a type of enzyme that creates a new covalent bond and requires direct input of energy from a high-energy phosphate. [http://books.google.com/books?id=bB8XnCykRmIC&pg=PA522&lpg=PA522&dq=%22synthetase+is+an+enzyme%22&source=web&ots=wkws4ksMsg&sig=zWLkDIk7T78hcf9S84nWs3u5Apw&hl=en&sa=X&oi=book_result&resnum=9&ct=result] (Peter)<br />
<br />
== T ==<br />
'''transferase''' - an enzyme that catalyzes the transfer of a functional group from one molecule (the donor) to another (the acceptor) [http://en.wikipedia.org/wiki/Transferase] (Matt)<br />
<br />
'''transmembrane helix''' - a single transmembrane alpha helix of a transmembrane protein, usually about twenty amino acids in length. They are usually predicted by hydrophobicity. [http://en.wikipedia.org/wiki/Transmembrane_domain](Mary)<br />
<br />
'''transposons / transposable elements''' - DNA sequences that can move around to different positions in a single cell's genome. Transposons can cause mutations and change the length of the genome. [http://en.wikipedia.org/wiki/Transposon](Samantha)<br />
<br />
'''Transposon Mutagenesis'''-a procedure in which a transposon is inserted into a gene, which inactivates the gene and can lead to the discovery of the phenotype associated with this gene ([http://cancerweb.ncl.ac.uk/cgi-bin/omd?transposon+mutagenesis transposon mutagenesis] Pallavi)<br />
<br />
'''tRNA splicing endonuclease''' - an enzyme that cleaves intervening sequences of precursor tRNA. [http://cancerweb.ncl.ac.uk/cgi-bin/omd?splicing+endonuclease] (Peter)<br><br />
<br />
== U ==<br />
<br />
== V ==<br />
<br />
== W ==<br />
<br />
'''whole genome shotgun sequencing''' - a method of sequencing where DNA is cut into small pieces and cloned into vectors, then about 500 bp is sequenced from both ends of every cloned insert to form mate pairs. Mate pairs rarely overlap, but are used to reassemble the sequence using software. [http://en.wikipedia.org/wiki/Whole_genome_shotgun](Samantha)<br />
<br><br />
<br />
== X ==<br />
'''xenolog''' - homologs that are created by horizontal gene transfer between two different species [http://en.wikipedia.org/wiki/Xenolog#Xenology] (Matt)<br><br />
<br />
== Y ==<br />
<br />
== Z ==<br />
<br />
<BR><br />
<HR><br />
<HR><br />
<br />
== This is a list of the student-created tutorials: ==</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Halorhabdus_utahensis_Genome&diff=6605Halorhabdus utahensis Genome2008-09-25T03:09:20Z<p>Lavoss: /* S */</p>
<hr />
<div>This page will be used by Davidson College students in the [http://www.bio.davidson.edu/Courses/Bio343/LabMethods.html Genomics Laboratory course].<br />
<br />
== Links to Multiple Databases ==<br />
*[http://imgweb.jgi-psf.org/cgi-bin/img_edu_v260/main.cgi?section=TaxonDetail&page=taxonDetail&taxon_oid=2500575004 JGI IMG EDU] <br> public access <br> *[[Media:JGIAnnotation.xls|JGI Annotation Excel Spreadsheet]]<br />
*[http://www.tigr.org/tigr-scripts/prok_manatee/shared/login.cgi Manatee at JCVI] <br> use the davidson number sent by email as username and password (database is nthu01 - this is case sensitive) <br> *[[Media:ManateeAnnotation.xls|Manatee Annotation Excel Spreadsheet]]<br />
*[http://rast.nmpdr.org/ SEED view via RAST] <br> use the username and password combination sent to you by SEED <br> *[[Media:RastAnnotation.xls|RAST Annotation Excel Spreadsheet]]<br />
*[http://www.genome.jp/kegg/kaas/ KEGG]<br> We can submit our genes to KEGG to have them mapped out, but SEED and Manatee may already do this. Do we want to ask them to upload the genome into their database? <br />
<br><br />
<br />
== RNA Genes ==<br />
<br />
*[[tRNA Genes Check List]]<br><br />
*[[rRNA operon]]<br><br />
*[[2 misc. RNA genes]] (short summary list)<br><br />
*[[Missing tRNA-trp gene found]]<br><br />
<br />
== Other Resources ==<br />
*[[Consensus Shine Dalgarno]] Excel File for ''H. utahensis'' <br><br />
*[[References]]<br><br />
*[[Gene Annotation Template]]<br><br />
*[[General Questions]]<br><br />
*[[Page for Annotated Genes]]<br><br />
<br />
== Tutorials for Annotating Genomes ==<br />
<br />
# Will DeLoache- BioPerl Installation <br><br />
# Max Win- Introduction to Perl for non-programmers (with step-by-step explanations, simple exercises, and solutions)<br><br />
# Pallavi-Conserved Domains Database (CDD) <br><br />
# Mary- Protein Data Bank <br><br />
# Laura Voss - Pfam Database <br><br />
# Samantha Simpson - NCBI Blast (protein, nucleotide, and blast2) <br><br />
# Peter Bakke - Finding species-specific Shine-Dalgarno sequence<br><br />
<br />
== Research Questions ==<br />
#How do the three systems compare for finding ORFs and RNA genes?<br />
#Is there a pattern of missed genes for any of the 3 sites? <br />
#Do the three systems differ in their ability to find good start codons and Shine-Dalgarno sequences? [We need a standard set of genes for comparison. Only highly conserved or a range of genes?]<br />
# Were Shine-Dalgarno sequences calculated for our species or default values used? If default, what sequence?<br />
#Can we fill any holes in their automated annotation? Is there a mechanism for users to add in genes?<br />
#How do the 3 sites compare for ease of use?<br />
#What are the strengths and weaknesses of each system? What did they publish as their special features and how do we see these working?<br />
#How does each of the 3 sites compare for pathway detection and visualization? <br />
#Do they find the origin of replication? Can we find it? <br />
<br />
<hr><br />
<br />
== This is a list of glossary words (A - Z): ==<br />
[[#A| A ]] [[#B| B ]] [[#C| C ]] [[#D| D ]] [[#E| E ]] [[#F| F ]] [[#G| G ]] [[#H| H ]] [[#I| I ]] [[#J| J ]] [[#K| K ]] [[#L| L ]] [[#M| M ]] [[#N| N ]] [[#O| O ]] [[#P| P ]] [[#Q| Q ]] [[#R| R ]] [[#S| S ]] [[#T| T ]] [[#U| U ]] [[#V| V ]] [[#W| W ]] [[#X| X ]] [[#Y| Y ]] [[#Z| Z ]] <br />
<br />
== A ==<br />
'''Accession Number''' - a unique identifier given to DNA and protein sequences to allow for tracking of sequence information within a single database [http://en.wikipedia.org/wiki/Accession_number_(bioinformatics)] (Will).<br />
<br />
'''<i>Arabidopsis thaliana</i>''' - the scientific name for the thale cress plant; it was the first plant to have its genome sequenced, and is a model organism for understanding plant biology and genetics ([http://en.wikipedia.org/wiki/Thale_cress Wikipedia.org], Jay)<br />
<br />
== B ==<br />
'''BAC''' - <i>b</i>acterial <i>a</i>rtificial <i>c</i>hromosome, a DNA construct used for transforming or cloning segments of DNA and often used to sequence the genetic code of organisms ([http://en.wikipedia.org/wiki/Bacterial_artificial_chromosome Wikipedia.org], Jay)<br />
<br />
'''bioinformatics''' - the multi-disciplinary approach of using biology, computer science and mathematics to solve or better understand biological problems [http://en.wikipedia.org/wiki/Bioinformatics] (Matt)<br />
<br />
'''BLAST''' - (Basic Local Alignment Search Tool) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. [http://blast.ncbi.nlm.nih.gov/Blast.cgi] (Mary)<br />
<br />
'''BioPerl''' - a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications such as accessing sequence data from local and remote databases, converting between database formats, manipulating individual sequences, searching for similar sequences, searching for genes and other structures on genomic DNA, or developing machine-readable sequence annotations. [http://en.wikipedia.org/wiki/BioPerl] (Wikipedia, Max Win)<br />
<br />
== C ==<br />
'''carbon fixation''' - using carbon dioxide to create organic materials [http://en.wikipedia.org/wiki/Carbon_fixation] (Samantha)<BR><br />
<br />
'''CDD''' (Conserved Domains Database)- a database used to identify the conserved domains present in a protein query sequence [http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml] (Mary)<br />
<br />
'''chaperonin''' - a protein complex that assists some newly formed polypeptide chains by folding them into their final, functional, three-dimensional form [http://en.wikipedia.org/wiki/Chaperonins] (Matt)<br />
<br />
'''chemotaxis''' - the process by which cells move toward or away from high concentrations of certain chemicals; it occurs in both unicellular and multicellular organisms. Unicellular organisms use it to avoid toxins or find food, and multicellular organisms use it in tasks such as reproduction [http://en.wikipedia.org/wiki/Chemotaxis] (Nick)<br />
<br />
'''chemotaxonomy''' - the attempt to classify and identify organisms according to demonstrable differences and similarities in their biochemical compositions [http://en.wikipedia.org/wiki/Chemotaxonomy] (Mary)<br />
<br />
'''ClustalW''' - A web-based or command line tool that performs multiple sequence alignments to determine evolutionary relationships between three or more sequences [http://en.wikipedia.org/wiki/Clustal] (Will).<br />
<br />
'''COG''' (Clusters of Orthologous Groups) - each COG corresponds to a highly conserved domain and generally consists of either individual proteins or groups of paralogs ([http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml COG] Pallavi) <br><br />
<br />
'''concatemer''' - long continuous DNA molecule that contains the same DNA sequence repeated in series [http://en.wikipedia.org/wiki/Concatemer](Samantha)<BR><br />
<br />
'''contigs''' (contiguous DNA) - overlapping DNA segments that, as a collection, form a longer and gapless segment of DNA. (Discovery Genomics, Proteomics and Bioinformatics [http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''coverage''' - refers to the number of times, on average, any piece of DNA in a sequenced genome has been individually sequenced (Lecture, Jay)<br />
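<br />
The definition above can be turned into a quick estimate: average coverage is the total number of sequenced bases divided by the genome length. The sketch below is only an illustration; the function name and the read counts are hypothetical, and the 3.1 Mbp figure is the genome size quoted for the JGI contigs on this page.<br />

```python
def average_coverage(num_reads, read_length, genome_length):
    """Average coverage = total sequenced bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return num_reads * read_length / genome_length


if __name__ == "__main__":
    # Hypothetical run: 50,000 reads of 500 bp each over a 3.1 Mbp genome.
    print(round(average_coverage(50_000, 500, 3_100_000), 2))
```

This is an average only; individual regions of a real genome may be sequenced far more or far less often than the average suggests.<br />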
<br />
'''CPAN (Comprehensive Perl Archive Network)''' - an archive of over 12,200 modules of software written in Perl, as well as documentation for it. It contains a module called CPAN (or CPAN.pm) which is used as an installer for Perl modules such as BioPerl [http://en.wikipedia.org/wiki/CPAN](Will).<br />
<br />
== D ==<br />
'''''de novo'' synthesis''' - the synthesis of complex molecules from simple molecules (e.g. sugars and nucleotides), rather than from recycled molecules; from the Latin for "anew" (literally "from the new") [http://en.wikipedia.org/wiki/De_novo_synthesis] (Matt)<br />
<br />
'''dehydrogenase''' - a type of enzyme that oxidizes a substrate by transferring one or more protons and a pair of electrons to an acceptor. [http://en.wikipedia.org/wiki/Dehydrogenase] (Peter)<br />
<br />
'''diatom''' - a major group of eukaryotic algae, and one of the most common types of phytoplankton. A characteristic feature of diatom cells is that they are encased within a unique cell wall made of silica called a frustule. These frustules show a wide diversity in form, but usually consist of two asymmetrical sides with a split between them. [http://en.wikipedia.org/wiki/Diatom] (Mary)<br />
<br />
'''domain (protein)''' - the structural and functional groups of a protein, which can exist independently of the protein itself. Domains typically perform a specific function, such as binding to promoters or substrates, and many proteins can have one or several domains in common. Evolutionarily-linked proteins are more likely to have domains in common. Domains are used to organize proteins into families. ([http://en.wikipedia.org/wiki/Domain_(protein) Wikipedia article], Laura)<br />
<br />
'''dot plot'''-graphical display comparing sequence conservation between two genomes with dots indicating strings of identical bases. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
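<br />
A minimal sketch of how a dot plot can be computed, assuming exact matches of short words (here 3 bases by default) count as "strings of identical bases". The function names <code>dot_plot</code> and <code>render</code> are made up for this example and are not part of any course software.<br />

```python
def dot_plot(seq_a, seq_b, word=3):
    """Return the set of (i, j) where seq_a[i:i+word] == seq_b[j:j+word]."""
    # Index every word of seq_b by its start positions.
    index = {}
    for j in range(len(seq_b) - word + 1):
        index.setdefault(seq_b[j:j + word], []).append(j)
    # Look up every word of seq_a in that index.
    dots = set()
    for i in range(len(seq_a) - word + 1):
        for j in index.get(seq_a[i:i + word], []):
            dots.add((i, j))
    return dots


def render(seq_a, seq_b, dots):
    """Draw the plot as text: rows follow seq_a, columns follow seq_b."""
    rows = []
    for i in range(len(seq_a)):
        rows.append("".join("*" if (i, j) in dots else "."
                            for j in range(len(seq_b))))
    return "\n".join(rows)


if __name__ == "__main__":
    a, b = "ACGTACGT", "TTACGTAA"
    print(render(a, b, dot_plot(a, b)))
```

Conserved regions show up as diagonal runs of dots; real genome comparisons use longer words and tolerate mismatches.<br />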
<br />
== E ==<br />
<br />
'''EC number''' (Enzyme Commission Number)- a numerical classification scheme for enzymes, based on the chemical reactions they catalyze [http://en.wikipedia.org/wiki/EC_number] (Mary)<br />
<br />
'''E-value''' (Expect value) - When performing a BLAST search, you will obtain an E-value for each sequence that is retrieved. An E-value is the number of matches with a score at least that good that you would expect to find by chance in a database of that size; the closer the E-value is to zero, the less likely the match is due to chance. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
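<br />
As a rough illustration of where an E-value comes from, the sketch below uses the standard relationship between a BLAST bit score S' and the size of the search space, E = m × n × 2<sup>-S'</sup>, where m is the query length and n is the database length. The function name is hypothetical, and BLAST itself uses corrected "effective" lengths, so treat the numbers as approximate.<br />

```python
def expected_hits(query_len, db_len, bit_score):
    """Approximate E-value: E = m * n * 2**(-bit_score)."""
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Hypothetical search: a 300-residue query against 1,000,000 residues.
    for score in (20, 40, 80):
        print(score, expected_hits(300, 1_000_000, score))
```

Each extra bit of score halves the E-value, and searching a database twice as large doubles it.<br />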
<br />
'''Extremophile''' - an organism that thrives in and may even require physically or geochemically extreme conditions that are detrimental to the majority of life on Earth [http://en.wikipedia.org/wiki/Extremophile] (Will).<br />
<br />
== F ==<br />
<br />
'''FASTA format''' - a text format used to convey either nucleic acid sequences or peptide sequences, in which nucleotides or amino acids are represented by single-letter codes. Each sequence is preceded by a header line that begins with ">" and gives the sequence name and other descriptors. [http://en.wikipedia.org/wiki/FASTA_format] (Nick)<br><br />
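<br />
A minimal sketch of reading FASTA text, assuming the usual convention that each record starts with a header line beginning with ">" and that the first word of the header is the sequence name. The function name <code>parse_fasta</code> is made up for this example; BioPerl and similar libraries provide more complete readers.<br />

```python
def parse_fasta(text):
    """Return {name: sequence} for every record in FASTA-formatted text."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            # Use the first word of the header as the record name.
            name = header.split()[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    # Sequence lines are wrapped, so join them back together.
    return {n: "".join(parts) for n, parts in records.items()}


if __name__ == "__main__":
    example = ">gene1 hypothetical protein\nATGAAA\nTTTTAG\n>gene2\nATGCCCTGA\n"
    for name, seq in parse_fasta(example).items():
        print(name, len(seq), seq)
```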
<br />
'''family (protein)''' - a group of evolutionarily-related proteins, often with one or several domains in common. Families are organized by domain overlap, structural/functional similarity, and sequence similarity. ([http://en.wikipedia.org/wiki/Protein_family Wikipedia article] and lecture, Laura)<br />
<br />
'''finished genome''' - a genome that has been sequenced at least partly by hand, resulting in at least 99.99% sequence accuracy (Lecture, Jay)<br><br />
<br />
== G ==<br />
<br />
'''GC Content''' - the percentage of bases within a certain sequence of DNA (e.g. a gene or a genome) that are either guanine or cytosine; a higher GC content is characteristic of a coding region of a gene; differences in GC content between a gene and a genome can be used as evidence for horizontal gene transfer [http://en.wikipedia.org/wiki/GC-content] (Matt)<br><br />
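<br />
The percentage described above is simple to compute. The sketch below is an illustration with a hypothetical function name; it assumes ambiguous characters such as N should be left out of the total.<br />

```python
def gc_content(seq):
    """Percentage of A/C/G/T bases in seq that are G or C."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


if __name__ == "__main__":
    # Compare a made-up gene fragment with a made-up background sequence.
    print(gc_content("ATGGCGCGCTAA"))
    print(gc_content("ATATTTAAATGC"))
```

Running the same function on a single gene and on the whole genome gives the two numbers to compare when looking for evidence of horizontal gene transfer.<br />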
<br />
'''GC-skew''' – uneven distribution of guanine and cytosine bases between the two strands of DNA where GC base pairs occur. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
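<br />
GC-skew is commonly calculated in windows along one strand as (G - C) / (G + C). The sketch below assumes that formula and non-overlapping windows by default; the function name and window size are arbitrary choices for illustration. In many bacterial genomes the skew changes sign near the origin of replication, but whether that holds for our genome is something to test rather than assume.<br />

```python
def gc_skew(seq, window=1000, step=None):
    """Return [(window_start, skew)] with skew = (G - C) / (G + C)."""
    step = step or window
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # A window with no G or C has no defined skew; report 0.0.
        skews.append((start, (g - c) / (g + c) if g + c else 0.0))
    return skews


if __name__ == "__main__":
    toy = "GGGGAC" * 5 + "CCCCAG" * 5
    for start, skew in gc_skew(toy, window=10):
        print(start, round(skew, 2))
```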
<br />
'''gene amplification''' - production of multiple copies of a gene in order to amplify the amount of protein that the gene encodes for [http://www.medterms.com/script/main/art.asp?articlekey=13537] [http://www.answers.com/topic/gene-amplification] (Matt)<br />
<br />
'''gene knockout''' - a process in which a gene is deactivated within a test organism in order to better understand the function of the gene in that organism [http://en.wikipedia.org/wiki/Gene_knockout] (Matt)<br />
<br />
'''gene ontology''' - a collaborative effort of investigators to unify and standardize terms associated with the role a gene or protein plays in an organism. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''glaucophyte''' - freshwater algae that have not been studied well [http://en.wikipedia.org/wiki/Glaucophyte](Samantha)<br><br />
<br />
== H ==<br />
<br />
'''haemolysin or hemolysin''' - a chemical produced by a bacterium that causes lysis of red blood cells [http://en.wikipedia.org/wiki/Hemolysis_(microbiology)] (Nick)<br />
<br />
'''halophile''' - an organism, most often of the Archaea domain, that lives in environments containing high concentrations of salt [http://en.wikipedia.org/wiki/Halophile] (Matt)<br />
<br />
'''haplotype'''-collection of alleles that travel together (Lecture, Pallavi)<br />
<br />
'''haptophyte''' - phylum of algae [http://en.wikipedia.org/wiki/Haptophyte](Samantha)<br />
<br />
'''heterokont''' - major line of eukaryotes consisting of about 10,500 known species, most of which are algae [http://en.wikipedia.org/wiki/Heterokont](Samantha)<br />
<br />
'''Hidden Markov Model''' - a statistical model used in protein recognition databases such as Pfam. A Hidden Markov Model keeps track of several variables and possible variations thereof, such as the possible amino acid sequences that make up a protein domain (since there can be some variance in an amino acid sequence) or the variations in the component sounds that make up a word, and uses those points to match a given sequence to the word, domain, or other complex sequence it most closely matches. An HMM in speech recognition software, for example, can identify that a certain set of sounds make up a certain word, even with the variations in pronunciation and accent that different people will give those sounds. ([http://en.wikipedia.org/wiki/Hidden_Markov_Model Wikipedia] and lecture, Laura) <br />
<br />
'''HMM Logo''' - a graphical representation of an HMM, detailing the possible amino acid sequences, the relative frequencies and probabilities of each amino acid in the sequence, the relative contribution each amino acid has to the overall protein family, and the charge or nature of the amino acids themselves. ([http://www.sanger.ac.uk/Software/analysis/logomat-m/help.shtml How to read HMM Logos, on Pfam], Laura)<br />
<br />
'''homeobox''' - DNA sequence within transcription factor genes that allow the cell to respond to patterns of development by having the transcription factors switch on gene cascades [http://en.wikipedia.org/wiki/Homeobox](Samantha)<br />
<br />
'''homodimer''' - a protein made of paired identical polypeptides ([http://www.answers.com/topic/homodimer Answers.com], Jay)<br />
<br />
'''horizontal gene transfer'''-DNA transmission between species and incorporation of the DNA into the recipient's genome ([http://www.csrees.usda.gov/nea/biotech/res/biotechnology_res_glossary.html horizontal gene transfer] Pallavi)<br />
<br />
'''hydrolase''' - an enzyme that catalyzes hydrolysis, the cleavage of a chemical bond by the addition of water [http://en.wikipedia.org/wiki/Hydrolase] (Nick)<br />
<br />
== I ==<br />
<br />
'''ideogram''' - in genomics, usually describes a stylized representation of a chromosome with banding patterns (Campbell-Heyer Genomics textbook, Jay)<br />
<br />
'''identities''' - in a BLAST output, the number and fraction of total residues which are identical in a given alignment [http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)<br />
<br />
'''indole'''-a chemical compound that is produced from the break down of tryptophan ([http://medical-dictionary.thefreedictionary.com/indole indole] Pallavi)<br />
<br />
'''inclusion body''' - Inclusion bodies are collections of stainable substances, usually proteins, that are found either in the nucleus or the cytoplasm. It is thought that these bodies are often the result of viral proteins that misfolded [http://en.wikipedia.org/wiki/Inclusion_body] (Nick)<br />
<br />
'''intron''' - a region of DNA in a gene that is not part of the final coding sequence for the protein. [http://en.wikipedia.org/wiki/Intron] (Peter)<br />
<br />
'''isoelectric point''' - the pH at which a molecule carries no net electrical charge [http://en.wikipedia.org/wiki/Isoelectric_point] (Nick)<br />
<br />
'''isozymes''' - members of a gene family with very similar cellular roles (Campbell-Heyer Genomics textbook, Jay)<br />
<br />
== J ==<br />
<br />
== K ==<br />
'''KEGG (Kyoto Encyclopedia of Genes and Genomes)''' - a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals. The Pathway database records networks of molecular interactions in the cells, and variants of them specific to particular organisms [http://en.wikipedia.org/wiki/KEGG](Will).<br />
<br />
'''kinase''' - a type of enzyme that transfers a phosphate group from a high-energy donor molecule to a target molecule in a process called phosphorylation. [http://en.wikipedia.org/wiki/Kinase] (Peter)<br />
<br />
== L ==<br />
<br />
== M ==<br />
'''Manatee''' - a web-based gene evaluation and genome annotation tool that can view, modify, and store annotation for prokaryotic and eukaryotic genomes. This ongoing, open-source initiative was developed with two missions: first, to let biologists functionally annotate their genomes using a powerful, stand-alone web application with a robustly designed relational annotation database; and second, to invite outside developers to contribute their own ideas and requirements to enhance Manatee's ability to accomplish biological goals [http://manatee.sourceforge.net/](Will).<br />
<br />
'''motif''' - a sequence of amino acids or nucleotides that performs a particular role and is often conserved in other species or molecules. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''mycoplasma''' - genus of bacteria that lack a cell wall [http://en.wikipedia.org/wiki/Mycoplasma] (Nick)<br />
<br />
== N ==<br />
<br />
'''NORFs''' (nonannotated open reading frames) - open reading frames that were considered not to be real genes when the genome was annotated. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''nucleomorph''' - reduced eukaryotic nuclei found in plastids [http://en.wikipedia.org/wiki/Nucleomorph](Samantha)<br />
<br />
== O ==<br />
'''object-oriented programming''' - a programming paradigm in which collections of data, associated with operations on that data, are modularly defined and then built upon (CSC 121 Lecture, Will). <br />
<br />
'''open reading frame (ORF)''' - a segment of DNA that can potentially encode a protein; it begins with a start codon (usually ATG) and ends at a stop codon [http://www.fao.org/DOCREP/003/X3910E/X3910E18.htm ORF] (Pallavi)<br />
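<br />
A minimal sketch of an ORF scan, assuming the standard stop codons TAA, TAG, and TGA and ATG as the only start codon. It checks the three forward reading frames only; the function name and the <code>min_codons</code> cutoff are hypothetical, and real gene finders also scan the reverse strand and weigh alternative start codons.<br />

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2, start_codons=("ATG",)):
    """Return (start, end, frame) for forward-strand ORFs.

    start is the index of the start codon, end is exclusive and includes
    the stop codon, and min_codons counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in start_codons:
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    dna = "CCATGAAATTTTAGGG"
    for start, end, frame in find_orfs(dna):
        print(frame, start, end, dna[start:end])
```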
<br />
'''operon''' - a segment of DNA involving an operator, promoter, and one or more genes that operate as a single unit during transcription [http://en.wikipedia.org/wiki/Operon] (Nick)<br />
<br />
'''optical mapping'''-DNA sequences of the organism in question are compared against a karyotype that specifically looks at restriction sites found within the DNA to correctly order the DNA sequences on a chromosome. This methodology gives very detailed haplotype information and allows for the detection of sequence variations across an entire genome [http://www.geocities.com/bioinformaticsweb/genomicglossary.html optical mapping] (Pallavi)<br />
<br />
'''ortholog''' - genes in different species that look very similar because they descend from a single gene in the species' last common ancestor; they usually retain the same function (Lecture, Pallavi)<br />
<br />
'''oxidoreductase''' - an enzyme that catalyzes redox reactions by transferring electrons from one molecule (the reductant) to another (the oxidant) [http://en.wikipedia.org/wiki/Oxidoreductase] (Nick)<br />
<br />
== P ==<br />
<br />
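'''paralog''' - similar genes within a single species that arose by duplication of an ancestral gene; after duplication the copies can diverge in sequence and function (Lecture, Pallavi)<br />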
<br />
'''p-arm''' - the shorter arm of a chromosome's two arms separated by the centromere (compare to q-arm, the longer arm) ([http://www.medterms.com/script/main/art.asp?articlekey=4715 MedTerms Dictionary], Jay)<br />
<br />
'''Perl''' - Developed by Larry Wall in 1987, Perl is a [http://en.wikipedia.org/wiki/High-level_programming_language high-level programming language] used frequently by biologists and bioinformaticists [http://en.wikipedia.org/wiki/Perl] (Will). <br />
<br />
'''periplasmic space''' - the space between the inner cytoplasmic membrane and external outer membrane in bacteria or archaea. [http://en.wikipedia.org/wiki/Periplasmic_space] (Peter)<br />
<br />
'''Pfam''' - a database for protein domain families that matches amino acid sequences or nucleotide sequences to the related group of proteins to which they most likely belong. ([http://pfam.sanger.ac.uk/help Pfam Help], Laura)<br />
<br />
'''plasmid''' - an extra-chromosomal DNA molecule that is capable of replicating independently of the chromosomal DNA. Commonly found in bacteria and archaea. [http://en.wikipedia.org/wiki/Plasmid](Peter)<br />
<br />
'''plastid''' - major organelles in plants or algae [http://en.wikipedia.org/wiki/Plastid](Samantha)<br />
<br />
'''pleomorphism''' - the occurrence of two or more structural forms during a life cycle [http://en.wikipedia.org/wiki/Pleomorphism] (Mary)<br />
<br />
'''phylogenetic tree''' - a diagram showing the evolutionary relationships between biological species that are thought to share a common ancestor [http://en.wikipedia.org/wiki/Phylogenetic_tree] (Nick)<br />
<br />
'''phylotypes''' – a term intended to resolve the challenge of “species” when classifying prokaryotes using DNA sequence comparisons. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''positives''' - in a BLAST output, the number and fraction of residues for which the alignment scores have positive rather than negative values [http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)<br />
<br />
'''proteome''' - entire set of proteins expressed by a genome, cell, tissue, or organism. It may refer to expressed proteins under certain conditions [http://en.wikipedia.org/wiki/Proteome](Samantha)<br />
<br />
'''pseudogenes''' - A sequence of DNA that looks like a gene, but most likely contains many stop codons. It may have evolved away from a real gene or a paralog might have taken its place (Lecture, Pallavi)<br />
<br />
'''purine''' - a category of nitrogenous base consisting of a pyrimidine ring fused to an imidazole ring. Notable purine bases are adenine and guanine. [http://en.wikipedia.org/wiki/Purine] (Peter)<br />
<br />
'''pyrimidine''' - a category of nitrogenous base consisting of a heterocyclic aromatic ring containing two nitrogen atoms at positions 1 and 3 of the six-member ring. Notable pyrimidine bases are cytosine, thymine, and uracil. [http://en.wikipedia.org/wiki/Pyrimidine] (Peter)<br />
<br />
== Q ==<br />
<br />
'''q-arm''' - the longer arm of a chromosome's two arms separated by the centromere (compare to p-arm, the shorter arm) ([http://www.medterms.com/script/main/art.asp?articlekey=5152 MedTerms Dictionary], Jay)<br><br />
<br />
'''query sequence''' - the sequence (whether amino acid or nucleotide) entered into a database’s search function and checked against the database entries. ([http://en.wikipedia.org/wiki/BLAST BLAST on Wikipedia], Laura)<br />
<br />
== R ==<br />
<br />
'''RAST''' - (Rapid Annotation using Subsystem Technology)- a fully-automated service for annotating bacterial and archaeal genomes. It provides high quality genome annotations for these genomes across the whole phylogenetic tree. ([http://rast.nmpdr.org/], Max Win)<br />
<br />
'''rDNA'''-These are DNA sequences that encode for ribosomal RNA. Note that rDNA can also stand for recombinant DNA. ([http://en.wikipedia.org/wiki/Ribosomal_DNA rDNA] Pallavi)<br />
<br />
'''residue (protein)''' - the remaining portion of an amino acid after a water molecule has been removed and it has been incorporated into a protein. Functional residues, referred to in Pfam, are the residues that perform some specific identifiable function or are part of a domain, and can be conserved across evolutionarily-related proteins. ([http://pfam.sanger.ac.uk/help Pfam Help], Laura)<br />
<br />
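'''retrotransposons''' - genetic elements that copy themselves through an RNA intermediate: the element is transcribed into RNA, reverse-transcribed back into DNA, and the new copy is inserted into the genome [http://en.wikipedia.org/wiki/Retrotransposon](Samantha)<br />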
<br />
'''ribonuclease''' - a nuclease that catalyzes the degradation of RNA into smaller components [http://en.wikipedia.org/wiki/Ribonuclease] (Mary)<br />
<br />
== S ==<br />
'''Serovar'''-a subdivision of a species based on the characteristics of their cell surface antigens ([http://www.biology-online.org/dictionary/Serovar serovar] Pallavi)<br />
<br />
'''scaffold''' - a section of a sequenced genome composed of contigs that are in the right order but not necessarily connected ([http://www.medterms.com/script/main/art.asp?articlekey=25223 MedTerms Dictionary], Jay)<br />
<br />
'''Shine-Dalgarno sequence''' - A ribosomal binding site on an mRNA, usually a sequence of about six bases located six or seven bases upstream of the start codon. An anti-Shine-Dalgarno sequence exists on the rRNA in the small subunit of the ribosome; when the two sequences pair, the mRNA is lined up and prepared for translation. (Lecture and [http://en.wikipedia.org/wiki/Shine-dalgarno Wikipedia article], Laura)<br><br />
Note: The Shine-Dalgarno consensus sequence for our genome is TAGGAGG.<br />
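<br />
One way to look for a Shine-Dalgarno site is to slide the consensus along the region just upstream of a start codon and keep the best match. The sketch below assumes the TAGGAGG consensus noted above; the function name, the 20-base search window, and the simple count of matching bases are illustrative choices, not the method used by any of the three annotation systems.<br />

```python
def best_sd_match(seq, start_pos, consensus="TAGGAGG", upstream=20):
    """Find the best consensus match upstream of the start codon.

    Returns (matches, spacer, site), where matches is the number of bases
    identical to the consensus and spacer is the number of bases between
    the site and the start codon. Returns None if the region is too short.
    """
    seq = seq.upper()
    k = len(consensus)
    region_start = max(0, start_pos - upstream)
    best = None
    for i in range(region_start, start_pos - k + 1):
        site = seq[i:i + k]
        matches = sum(a == b for a, b in zip(site, consensus))
        spacer = start_pos - (i + k)
        if best is None or matches > best[0]:
            best = (matches, spacer, site)
    return best


if __name__ == "__main__":
    # Made-up sequence: a perfect site, a 6-base spacer, then ATG.
    dna = "CCC" + "TAGGAGG" + "AAAAAA" + "ATGAAA"
    print(best_sd_match(dna, dna.index("ATG")))
```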
<br />
'''SignalP''' - a prediction server that judges whether or not a query protein contains a signal peptide. SignalP measures each amino acid against the amino acid sequences of probable signal peptide matches and predicts the cleavage site of the signal peptide. ([http://www.cbs.dtu.dk/services/SignalP-3.0/output.php SignalP Output explained], Laura)<br />
<br />
'''signal peptide''' - a short peptide chain that directs the post-translational transport of a protein [http://en.wikipedia.org/wiki/Signal_peptide] (Matt)<br />
<br />
'''Smith-Waterman alignment''' - A well-known algorithm for determining similar regions between two nucleotide or protein sequences. Instead of looking at the total sequence, the Smith-Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure [http://en.wikipedia.org/wiki/Smith_waterman](Will).<br />
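<br />
A minimal sketch of the Smith-Waterman algorithm, assuming a simple scoring scheme (match +2, mismatch -1, linear gap penalty -2). These scores and the function name are illustrative; real tools use substitution matrices such as BLOSUM62 and separate gap-open and gap-extend penalties.<br />

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    # Fill the matrix; scores never drop below zero, which makes it local.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TTGACGTCC", "AAGACGTGG"))
```

Because every cell is floored at zero, poorly matching flanks are dropped and only the best-matching segment is reported, which is what distinguishes a local alignment from a global one.<br />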
<br />
'''SNP (Single Nucleotide Polymorphism)''' - a DNA sequence variation occurring when a single nucleotide in the genome (or other shared sequence) differs between members of a species (or between paired chromosomes in an individual) [http://en.wikipedia.org/wiki/Single_nucleotide_polymorphism](Will).<br />
<br />
'''symporter''' - an integral membrane protein that is involved in movement of two or more different molecules or ions across a phospholipid membrane in the same direction. [http://en.wikipedia.org/wiki/Symporter] (Peter)<br />
<br />
'''synteny''' - a neologism from the Greek for "on the same ribbon". Genes that are syntenic in one species are on the same chromosome; genes that are syntenic across species retain the same order on respective chromosomes as a result of descent from a common ancestor ([http://www.answers.com/synteny Answers.com], Jay)<br />
<br />
'''synthetase''' - a type of enzyme that creates a new covalent bond and requires direct input of energy from a high-energy phosphate. [http://books.google.com/books?id=bB8XnCykRmIC&pg=PA522&lpg=PA522&dq=%22synthetase+is+an+enzyme%22&source=web&ots=wkws4ksMsg&sig=zWLkDIk7T78hcf9S84nWs3u5Apw&hl=en&sa=X&oi=book_result&resnum=9&ct=result] (Peter)<br />
<br />
== T ==<br />
'''transferase''' - an enzyme that catalyzes the transfer of a functional group from one molecule (the donor) to another (the acceptor) [http://en.wikipedia.org/wiki/Transferase] (Matt)<br />
<br />
'''transmembrane helix''' - a single transmembrane alpha helix of a transmembrane protein, usually about twenty amino acids in length. They are usually predicted by hydrophobicity. [http://en.wikipedia.org/wiki/Transmembrane_domain](Mary)<br />
<br />
'''transposons / transposable elements''' - DNA sequences that can move around to different positions in a single cell's genome. Transposons can cause mutations and change the length of the genome. [http://en.wikipedia.org/wiki/Transposon](Samantha)<br />
<br />
'''Transposon Mutagenesis'''-a procedure in which a transposon is inserted into a gene, which inactivates the gene and can lead to the discovery of the phenotype associated with this gene ([http://cancerweb.ncl.ac.uk/cgi-bin/omd?transposon+mutagenesis transposon mutagenesis] Pallavi)<br />
<br />
'''tRNA splicing endonuclease''' - an enzyme that cleaves intervening sequences of precursor tRNA. [http://cancerweb.ncl.ac.uk/cgi-bin/omd?splicing+endonuclease] (Peter)<br><br />
<br />
== U ==<br />
<br />
== V ==<br />
<br />
== W ==<br />
<br />
'''whole genome shotgun sequencing''' - a method of sequencing where DNA is cut into small pieces and cloned into vectors, then both ends of every cloned insert are sequenced for about 500 bp each to form mate pairs. The two reads of a mate pair rarely overlap each other, but software uses them to reassemble the sequence. [http://en.wikipedia.org/wiki/Whole_genome_shotgun](Samantha)<br />
<br><br />
<br />
== X ==<br />
'''xenolog''' - homologs that are created by horizontal gene transfer between two different species [http://en.wikipedia.org/wiki/Xenolog#Xenology] (Matt)<br><br />
<br />
== Y ==<br />
<br />
== Z ==<br />
<br />
<BR><br />
<HR><br />
<HR><br />
<br />
== This is a list of the student-created tutorials: ==</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Halorhabdus_utahensis_Genome&diff=6604Halorhabdus utahensis Genome2008-09-25T03:07:47Z<p>Lavoss: /* H */</p>
<hr />
<div>This page will be used by Davidson College students in the [http://www.bio.davidson.edu/Courses/Bio343/LabMethods.html Genomics Laboratory course].<br />
<br />
== Links to Multiple Databases ==<br />
*[http://imgweb.jgi-psf.org/cgi-bin/img_edu_v260/main.cgi?section=TaxonDetail&page=taxonDetail&taxon_oid=2500575004 JGI IMG EDU] <br> public access <br> *[[Media:JGIAnnotation.xls|JGI Annotation Excel Spreadsheet]]<br />
*[http://www.tigr.org/tigr-scripts/prok_manatee/shared/login.cgi Manatee at JCVI] <br> use the davidson number sent by email as username and password (database is nthu01 - this is case sensitive) <br> *[[Media:ManateeAnnotation.xls|Manatee Annotation Excel Spreadsheet]]<br />
*[http://rast.nmpdr.org/ SEED view via RAST] <br> use the username and password combination sent to you by SEED <br> *[[Media:RastAnnotation.xls|RAST Annotation Excel Spreadsheet]]<br />
*[http://www.genome.jp/kegg/kaas/ KEGG]<br> We can submit our genes to KEGG to have it mapped out, but SEED and Manatee may already do this. Do we want to ask them to upload it into their database? <br />
<br><br />
<br />
== RNA Genes ==<br />
<br />
*[[tRNA Genes Check List]]<br><br />
*[[rRNA operon]]<br><br />
*[[2 misc. RNA genes]] (short summary list)<br><br />
*[[Missing tRNA-trp gene found]]<br><br />
<br />
== Other Resources ==<br />
*[[Consensus Shine Dalgarno]] Excel File for ''H. utahensis'' <br><br />
*[[References]]<br><br />
*[[Gene Annotation Template]]<br><br />
*[[General Questions]]<br><br />
*[[Page for Annotated Genes]]<br><br />
<br />
== Tutorials for Annotating Genomes ==<br />
<br />
# Will DeLoache- BioPerl Installation <br><br />
# Max Win- Introduction to Perl for non-programmers (with step-by-step explanations, simple exercises, and solutions)<br><br />
# Pallavi-Conserved Domains Database (CDD) <br><br />
# Mary- Protein Data Bank <br><br />
# Laura Voss - Pfam Database <br><br />
# Samantha Simpson - NCBI Blast (protein, nucleotide, and blast2) <br><br />
# Peter Bakke - Finding species-specific Shine-Dalgarno sequence<br><br />
<br />
== Research Questions ==<br />
#How do the three systems compare for finding ORFs and RNA genes?<br />
#Is there a pattern of missed genes for any of the 3 sites? <br />
#Do the three systems differ in their ability to find good start codons and Shine-Dalgarno sequences? [We need a standard set of genes for comparison. Only highly conserved or a range of genes?]<br />
# Were Shine-Dalgarno sequences calculated for our species or default values used? If default, what sequence?<br />
#Can we fill any holes in their automated annotation? Is there a mechanism for users to add in genes?<br />
#How do the 3 sites compare for ease of use?<br />
#What are the strengths and weaknesses of each system? What did they publish as their special features and how do we see these working?<br />
#How does each of the 3 sites compare for pathway detection and visualization? <br />
#Do they find the origin of replication? Can we find it? <br />
<br />
<hr><br />
<br />
== This is a list of glossary words (A - Z): ==<br />
[[#A| A ]] [[#B| B ]] [[#C| C ]] [[#D| D ]] [[#E| E ]] [[#F| F ]] [[#G| G ]] [[#H| H ]] [[#I| I ]] [[#J| J ]] [[#K| K ]] [[#L| L ]] [[#M| M ]] [[#N| N ]] [[#O| O ]] [[#P| P ]] [[#Q| Q ]] [[#R| R ]] [[#S| S ]] [[#T| T ]] [[#U| U ]] [[#V| V ]] [[#W| W ]] [[#X| X ]] [[#Y| Y ]] [[#Z| Z ]] <br />
<br />
== A ==<br />
'''Accession Number''' - a unique identifier given to DNA and protein sequences to allow for tracking of sequence information within a single database [http://en.wikipedia.org/wiki/Accession_number_(bioinformatics)] (Will).<br />
<br />
'''<i>Arabidopsis thaliana</i>''' - the scientific name for the thale cress plant; it was the first plant to have its genome sequenced, and is a model organism for understanding plant biology and genetics ([http://en.wikipedia.org/wiki/Thale_cress Wikipedia.org], Jay)<br />
<br />
== B ==<br />
'''BAC''' - <i>b</i>acterial <i>a</i>rtificial <i>c</i>hromosome, a DNA construct used for transforming or cloning segments of DNA and often used to sequence the genetic code of organisms ([http://en.wikipedia.org/wiki/Bacterial_artificial_chromosome Wikipedia.org], Jay)<br />
<br />
'''bioinformatics''' - the multi-disciplinary approach of using biology, computer science and mathematics to solve or better understand biological problems [http://en.wikipedia.org/wiki/Bioinformatics] (Matt)<br />
<br />
'''BLAST''' - (Basic Local Alignment Search Tool) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. [http://blast.ncbi.nlm.nih.gov/Blast.cgi] (Mary)<br />
<br />
'''BioPerl''' - a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications such as accessing sequence data from local and remote databases, converting between database formats, manipulating individual sequences, searching for similar sequences, searching for genes and other structures on genomic DNA, or developing machine-readable sequence annotations. [http://en.wikipedia.org/wiki/BioPerl] (Wikipedia, Max Win)<br />
<br />
== C ==<br />
'''carbon fixation''' - using carbon dioxide to create organic materials [http://en.wikipedia.org/wiki/Carbon_fixation] (Samantha)<BR><br />
<br />
'''CDD''' (Conserved Domains Database)- a database used to identify the conserved domains present in a protein query sequence [http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml] (Mary)<br />
<br />
'''chaperonin''' - a protein complex that assists some newly formed polypeptide chains by folding them into their final, functional, three-dimensional form [http://en.wikipedia.org/wiki/Chaperonins] (Matt)<br />
<br />
'''chemotaxis''' - the process by which cells move toward or away from high concentrations of certain chemicals; it occurs in both unicellular and multicellular organisms. Unicellular organisms use it to avoid toxins or find food, and multicellular organisms use it in tasks such as reproduction [http://en.wikipedia.org/wiki/Chemotaxis] (Nick)<br />
<br />
'''chemotaxonomy''' - the attempt to classify and identify organisms according to demonstrable differences and similarities in their biochemical compositions [http://en.wikipedia.org/wiki/Chemotaxonomy] (Mary)<br />
<br />
'''ClustalW''' - A web-based or command line tool that performs multiple sequence alignments to determine evolutionary relationships between three or more sequences [http://en.wikipedia.org/wiki/Clustal] (Will).<br />
<br />
'''COG''' (Clusters of Orthologous Groups)- each COG corresponds to a highly conserved domain and generally consists of either individual proteins or groups of paralogs from at least three lineages ([http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml COG] Pallavi) <br><br />
<br />
'''concatemer''' - long continuous DNA molecule that contains the same DNA sequence repeated in series [http://en.wikipedia.org/wiki/Concatemer](Samantha)<BR><br />
<br />
'''contigs''' (contiguous DNA)- overlapping DNA segments that together form a longer, gapless segment of DNA. (Discovery Genomics, Proteomics and Bioinformatics [http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''coverage''' - refers to the number of times, on average, any piece of DNA in a sequenced genome has been individually sequenced (Lecture, Jay)<br />
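<br />
The definition above can be turned into simple arithmetic: average coverage is the total number of sequenced bases divided by the genome length. The sketch below is only an illustration; the function name and the read counts are hypothetical, and the 3.1 Mbp figure is the genome size quoted on this page.<br />

```python
def average_coverage(num_reads: int, read_length: int, genome_length: int) -> float:
    """Average coverage = total sequenced bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return num_reads * read_length / genome_length


if __name__ == "__main__":
    # Hypothetical run: 50,000 reads of 500 bp over a 3.1 Mbp genome.
    print(round(average_coverage(50_000, 500, 3_100_000), 2))
```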
<br />
'''CPAN (Comprehensive Perl Archive Network)''' - an archive of over 12,200 modules of software written in Perl, as well as documentation for it. It contains a module called CPAN (or CPAN.pm) which is used as an installer for Perl modules such as BioPerl [http://en.wikipedia.org/wiki/CPAN](Will).<br />
<br />
== D ==<br />
'''''de novo'' synthesis''' - the synthesis of complex molecules from simple molecules (e.g. sugars and nucleotides), rather than from recycled molecules; from the Latin for "from the new" or "anew" [http://en.wikipedia.org/wiki/De_novo_synthesis] (Matt)<br />
<br />
'''dehydrogenase''' - a type of enzyme that oxidizes a substrate by transferring one or more protons and a pair of electrons to an acceptor. [http://en.wikipedia.org/wiki/Dehydrogenase] (Peter)<br />
<br />
'''diatom''' - a major group of eukaryotic algae, and one of the most common types of phytoplankton. A characteristic feature of diatom cells is that they are encased within a unique cell wall made of silica called a frustule. These frustules show a wide diversity in form, but usually consist of two asymmetrical sides with a split between them. [http://en.wikipedia.org/wiki/Diatom] (Mary)<br />
<br />
'''domain (protein)''' - the structural and functional groups of a protein, which can exist independently of the protein itself. Domains typically perform a specific function, such as binding to promoters or substrates, and many proteins can have one or several domains in common. Evolutionarily-linked proteins are more likely to have domains in common. Domains are used to organize proteins into families. ([http://en.wikipedia.org/wiki/Domain_(protein) Wikipedia article], Laura)<br />
<br />
'''dot plot'''-graphical display comparing sequence conservation between two genomes with dots indicating strings of identical bases. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
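<br />
A dot plot can be sketched in a few lines of code. The example below is a minimal illustration, not how any particular genome browser draws one: it marks a "*" wherever a short window of one sequence is identical to a window of the other. The function name and the default window size are assumptions made for this example.<br />

```python
def dot_plot(seq_a: str, seq_b: str, window: int = 3) -> list[str]:
    """Return one text row per window of seq_a; '*' marks identical windows in seq_b."""
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = ""
        for j in range(len(seq_b) - window + 1):
            same = seq_a[i:i + window] == seq_b[j:j + window]
            row += "*" if same else "."
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Conserved regions show up as diagonal runs of '*'.
    for line in dot_plot("ACGTACGT", "ACGTTCGT", window=3):
        print(line)
```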
<br />
== E ==<br />
<br />
'''EC number''' (Enzyme Commission Number)- a numerical classification scheme for enzymes, based on the chemical reactions they catalyze [http://en.wikipedia.org/wiki/EC_number] (Mary)<br />
<br />
'''E-value''' (Expect value)- When performing a BLAST search, you will obtain an E-value for each sequence that is retrieved. An E-value is the number of matches with an equal or better score that would be expected to occur by chance in a database of that size; the closer the E-value is to zero, the less likely it is that the similarity between the two sequences is due to chance. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
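<br />
BLAST reports a bit score next to each E-value, and in Karlin-Altschul statistics the two are related by E = m × n × 2^(-S'), where S' is the bit score and m and n are the query and database lengths. The sketch below only illustrates that relationship; the function and parameter names are assumptions, and real BLAST uses corrected "effective" lengths, so its reported values will differ somewhat.<br />

```python
def expect_value(bit_score: float, query_length: int, database_length: int) -> float:
    """Expected number of chance hits scoring at least bit_score (E = m * n * 2**-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Hypothetical search: a 300-residue query against 1,000,000 database residues.
    for score in (20, 40, 60):
        print(score, expect_value(score, 300, 1_000_000))
```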
<br />
'''Extremophile''' - an organism that thrives in and may even require physically or geochemically extreme conditions that are detrimental to the majority of life on Earth [http://en.wikipedia.org/wiki/Extremophile] (Will).<br />
<br />
== F ==<br />
<br />
'''FASTA format''' - a text format used to convey either nucleic acid sequences or peptide sequences, in which nucleotides or amino acids are represented by single-letter codes. A header line beginning with ">" gives the sequence name and other descriptors and precedes the sequence itself. [http://en.wikipedia.org/wiki/FASTA_format] (Nick)<br><br />
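<br />
As an illustration of the format, the sketch below reads FASTA text into a dictionary keyed by sequence name. It is a minimal example rather than a replacement for BioPerl or similar libraries; the function name is hypothetical and the example records are made up.<br />

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each record name (first word after '>') to its concatenated sequence."""
    records: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequence may span several lines
    return {n: "".join(parts) for n, parts in records.items()}


if __name__ == "__main__":
    example = ">gene1 hypothetical protein\nATGGCC\nTTTTAA\n>gene2\nATGTAG\n"
    print(parse_fasta(example))
```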
<br />
'''family (protein)''' - a group of evolutionarily-related proteins, often with one or several domains in common. Families are organized by domain overlap, structural/functional similarity, and sequence similarity. ([http://en.wikipedia.org/wiki/Protein_family Wikipedia article] and lecture, Laura)<br />
<br />
'''finished genome''' - a genome that has been sequenced at least partly by hand, resulting in at least 99.99% sequence accuracy (Lecture, Jay)<br><br />
<br />
== G ==<br />
<br />
'''GC Content''' - the percentage of bases within a certain sequence of DNA (e.g. a gene or a genome) that are either guanine or cytosine; a higher GC content is characteristic of a coding region of a gene; differences in GC content between a gene and a genome can be used as evidence for horizontal gene transfer [http://en.wikipedia.org/wiki/GC-content] (Matt)<br><br />
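<br />
The percentage described above is straightforward to compute. The following is a minimal sketch with a hypothetical function name; it counts G and C in a DNA string and ignores case.<br />

```python
def gc_content(seq: str) -> float:
    """Percentage of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


if __name__ == "__main__":
    print(gc_content("ATGCGCGCTA"))
```

Comparing the value for a single gene with the value for the whole genome is one way to look for the differences mentioned in the definition.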
<br />
'''GC-skew''' – uneven distribution of guanine and cytosine bases between the two strands of DNA where GC base pairs occur. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
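<br />
GC-skew is commonly calculated along one strand as (G - C) / (G + C) in consecutive windows. The sketch below assumes that formula and non-overlapping windows; the function name and window size are choices made for this example. In many bacterial genomes the sign of the skew changes near the origin of replication, which is why the measure is often plotted when looking for an origin.<br />

```python
def gc_skew(seq: str, window: int = 1000) -> list[float]:
    """Return (G - C) / (G + C) for each non-overlapping window of seq."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # Windows with no G or C get a skew of 0.0 to avoid dividing by zero.
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


if __name__ == "__main__":
    print(gc_skew("GGGCAACC", window=4))
```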
<br />
'''gene amplification''' - production of multiple copies of a gene in order to amplify the amount of protein that the gene encodes for [http://www.medterms.com/script/main/art.asp?articlekey=13537] [http://www.answers.com/topic/gene-amplification] (Matt)<br />
<br />
'''gene knockout''' - a process in which a gene is deactivated within a test organism in order to better understand the function of the gene in that organism [http://en.wikipedia.org/wiki/Gene_knockout] (Matt)<br />
<br />
'''gene ontology'''- a collaborative effort of investigators to unify and standardize terms associated with the role a gene or protein plays in an organism. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''glaucophyte''' - freshwater algae that have not been studied well [http://en.wikipedia.org/wiki/Glaucophyte](Samantha)<br><br />
<br />
== H ==<br />
<br />
'''haemolysin or hemolysin''' - a substance produced by bacteria that causes lysis of red blood cells [http://en.wikipedia.org/wiki/Hemolysis_(microbiology)] (Nick)<br />
<br />
'''halophile''' - an organism, most often of the Archaea domain, that lives in environments containing high concentrations of salt [http://en.wikipedia.org/wiki/Halophile] (Matt)<br />
<br />
'''haplotype'''-collection of alleles that travel together (Lecture, Pallavi)<br />
<br />
'''haptophyte''' - phylum of algae [http://en.wikipedia.org/wiki/Haptophyte](Samantha)<br />
<br />
'''heterokont''' - major line of eukaryotes consisting of about 10,500 known species, most of which are algae [http://en.wikipedia.org/wiki/Heterokont](Samantha)<br />
<br />
'''Hidden Markov Model''' - a statistical model used in protein recognition databases such as Pfam. A Hidden Markov Model keeps track of several variables and possible variations thereof, such as the possible amino acid sequences that make up a protein domain (since there can be some variance in an amino acid sequence) or the variations in the component sounds that make up a word, and uses those points to match a given sequence to the word, domain, or other complex sequence it most closely matches. An HMM in speech recognition software, for example, can identify that a certain set of sounds make up a certain word, even with the variations in pronunciation and accent that different people will give those sounds. ([http://en.wikipedia.org/wiki/Hidden_Markov_Model Wikipedia] and lecture, Laura) <br />
<br />
'''HMM Logo''' - a graphical representation of an HMM, detailing the possible amino acid sequences, the relative frequencies and probabilities of each amino acid in the sequence, the relative contribution each amino acid has to the overall protein family, and the charge or nature of the amino acids themselves. ([http://www.sanger.ac.uk/Software/analysis/logomat-m/help.shtml How to read HMM Logos, on Pfam], Laura)<br />
<br />
'''homeobox''' - a DNA sequence within transcription factor genes that allows the cell to respond to patterns of development by having the transcription factors switch on gene cascades [http://en.wikipedia.org/wiki/Homeobox](Samantha)<br />
<br />
'''homodimer''' - a protein made of paired identical polypeptides ([http://www.answers.com/topic/homodimer Answers.com], Jay)<br />
<br />
'''horizontal gene transfer'''-DNA transmission between species and incorporation of the DNA into the recipient's genome ([http://www.csrees.usda.gov/nea/biotech/res/biotechnology_res_glossary.html horizontal gene transfer] Pallavi)<br />
<br />
'''hydrolase''' - an enzyme that catalyzes hydrolysis, the cleavage of a chemical bond by the addition of water [http://en.wikipedia.org/wiki/Hydrolase] (Nick)<br />
<br />
== I ==<br />
<br />
'''ideogram''' - in genomics, usually describes a stylized representation of a chromosome with banding patterns (Campbell-Heyer Genomics textbook, Jay)<br />
<br />
'''identities''' - in a BLAST output, the number and fraction of total residues that are identical in a given alignment [http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)<br />
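<br />
To make the count concrete, the sketch below computes identities for two already-aligned sequences of equal length, with "-" marking gaps. It assumes that the fraction is taken over the full alignment length, gaps included; the function name is hypothetical.<br />

```python
def identities(aligned_a: str, aligned_b: str) -> tuple[int, int, float]:
    """Return (identical positions, alignment length, percent identity)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0, 0, 0.0
    # A gap character never counts as an identity.
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return same, len(aligned_a), 100.0 * same / len(aligned_a)


if __name__ == "__main__":
    print(identities("ACG-T", "ACGAT"))
```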
<br />
'''indole'''-a chemical compound that is produced from the break down of tryptophan ([http://medical-dictionary.thefreedictionary.com/indole indole] Pallavi)<br />
<br />
'''inclusion body''' - Inclusion bodies are collections of stainable substances, usually proteins, that are found either in the nucleus or the cytoplasm. It is thought that these bodies are often the result of viral proteins that misfolded [http://en.wikipedia.org/wiki/Inclusion_body] (Nick)<br />
<br />
'''intron''' - a region of DNA in a gene that is not part of the final coding sequence for the protein. [http://en.wikipedia.org/wiki/Intron] (Peter)<br />
<br />
'''isoelectric point''' - the pH at which a molecule carries no net electrical charge [http://en.wikipedia.org/wiki/Isoelectric_point] (Nick)<br />
<br />
'''isozymes''' - members of a gene family with very similar cellular roles (Campbell-Heyer Genomics textbook, Jay)<br />
<br />
== J ==<br />
<br />
== K ==<br />
'''KEGG (Kyoto Encyclopedia of Genes and Genomes)''' - a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals. The Pathway database records networks of molecular interactions in the cells, and variants of them specific to particular organisms [http://en.wikipedia.org/wiki/KEGG](Will).<br />
<br />
'''kinase''' - a type of enzyme that transfers a phosphate group from a high-energy donor molecule to a target molecule in a process called phosphorylation. [http://en.wikipedia.org/wiki/Kinase] (Peter)<br />
<br />
== L ==<br />
<br />
== M ==<br />
'''Manatee''' - a web-based gene evaluation and genome annotation tool that can view, modify, and store annotation for prokaryotic and eukaryotic genomes. This ongoing, open-source initiative was developed with two missions: first, to let biologists functionally annotate their genomes using a powerful, stand-alone web application with a robustly designed relational annotation database; and second, to invite outside developers to contribute their own ideas and requirements to enhance Manatee's ability to accomplish biological goals [http://manatee.sourceforge.net/](Will).<br />
<br />
'''motif''' - a sequence of amino acids or nucleotides that performs a particular role and is often conserved in other species or molecules. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''mycoplasma''' - genus of bacteria that lack a cell wall [http://en.wikipedia.org/wiki/Mycoplasma] (Nick)<br />
<br />
== N ==<br />
<br />
'''NORFs''' (nonannotated open reading frames) - open reading frames that were considered not to be real genes when the genome was annotated. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''nucleomorph''' - reduced eukaryotic nuclei found in plastids [http://en.wikipedia.org/wiki/Nucleomorph](Samantha)<br />
<br />
== O ==<br />
'''object-oriented programming''' - a programming paradigm in which collections of data, associated with operations on that data, are modularly defined and then built upon (CSC 121 Lecture, Will). <br />
<br />
'''open reading frame (ORF)'''-a segment of DNA that can potentially encode a protein; it begins with a start codon (usually ATG) and ends with a stop codon [http://www.fao.org/DOCREP/003/X3910E/X3910E18.htm ORF] (Pallavi)<br />
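<br />
A simple ORF search can be sketched as a scan of each reading frame for an ATG followed, in the same frame, by a stop codon. The example below is a simplified illustration, not the method used by JGI, Manatee, or RAST: it scans only the forward strand, treats ATG as the only start codon, and uses the standard stop codons TAA, TAG, and TGA. The function name and the minimum-length parameter are hypothetical.<br />

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of forward-strand ORFs, end exclusive.

    min_codons counts the codons before the stop codon, including the ATG.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


if __name__ == "__main__":
    dna = "CCATGAAATAGCC"
    for begin, end in find_orfs(dna):
        print(begin, end, dna[begin:end])
```

A fuller search would also scan the reverse complement and allow alternative start codons.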
<br />
'''operon''' - a segment of DNA involving an operator, promoter, and one or more genes that operate as a single unit during transcription [http://en.wikipedia.org/wiki/Operon] (Nick)<br />
<br />
'''optical mapping'''-DNA sequences of the organism in question are compared against a karyotype that specifically looks at restriction sites found within the DNA to correctly order the DNA sequences on a chromosome. This methodology gives very detailed haplotype information and allows for the detection of sequence variations across an entire genome [http://www.geocities.com/bioinformaticsweb/genomicglossary.html optical mapping] (Pallavi)<br />
<br />
'''ortholog'''-homologous genes in different species that diverged from a single ancestral gene through speciation and often retain the same function (Lecture, Pallavi)<br />
<br />
'''oxidoreductase''' - an enzyme that catalyzes redox reactions by transferring electrons from one molecule (the reductant) to another (the oxidant) [http://en.wikipedia.org/wiki/Oxidoreductase] (Nick)<br />
<br />
== P ==<br />
<br />
'''paralog'''-homologous genes within a single genome that arose by gene duplication and may have diverged in sequence and function (Lecture, Pallavi)<br />
<br />
'''p-arm''' - the shorter arm of a chromosome's two arms separated by the centromere (compare to q-arm, the longer arm) ([http://www.medterms.com/script/main/art.asp?articlekey=4715 MedTerms Dictionary], Jay)<br />
<br />
'''Perl''' - Developed by Larry Wall in 1987, Perl is a [http://en.wikipedia.org/wiki/High-level_programming_language high-level programming language] used frequently by biologists and bioinformaticists [http://en.wikipedia.org/wiki/Perl] (Will). <br />
<br />
'''periplasmic space''' - the space between the inner cytoplasmic membrane and external outer membrane in bacteria or archaea. [http://en.wikipedia.org/wiki/Periplasmic_space] (Peter)<br />
<br />
'''Pfam''' - a database for protein domain families that matches amino acid sequences or nucleotide sequences to the related group of proteins to which they most likely belong. ([http://pfam.sanger.ac.uk/help Pfam Help], Laura)<br />
<br />
'''plasmid''' - an extra-chromosomal DNA molecule that is capable of replicating independently of the chromosomal DNA. Commonly found in bacteria and archaea. [http://en.wikipedia.org/wiki/Plasmid](Peter)<br />
<br />
'''plastid''' - major organelles in plants or algae [http://en.wikipedia.org/wiki/Plastid](Samantha)<br />
<br />
'''pleomorphism''' - the occurrence of two or more structural forms during a life cycle [http://en.wikipedia.org/wiki/Pleomorphism] (Mary)<br />
<br />
'''phylogenetic tree''' - a diagram showing the evolutionary relationships between biological species that are thought to share a common ancestor [http://en.wikipedia.org/wiki/Phylogenetic_tree] (Nick)<br />
<br />
'''phylotypes''' – a term intended to resolve the challenge of “species” when classifying prokaryotes using DNA sequence comparisons. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''positives''' - in a BLAST output, the number and fraction of residues for which the alignment scores have positive rather than negative values [http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)<br />
<br />
'''proteome''' - entire set of proteins expressed by a genome, cell, tissue, or organism. It may refer to expressed proteins under certain conditions [http://en.wikipedia.org/wiki/Proteome](Samantha)<br />
<br />
'''pseudogenes'''-DNA sequences that look like genes but most likely contain many stop codons. A pseudogene may have evolved away from a real gene, or a paralog might have taken its place (Lecture, Pallavi)<br />
<br />
'''purine''' - a category of nitrogenous base consisting of a pyrimidine ring fused to an imidazole ring. Notable purine bases are adenine and guanine. [http://en.wikipedia.org/wiki/Purine] (Peter)<br />
<br />
'''pyrimidine''' - a category of nitrogenous base consisting of a heterocyclic aromatic ring containing two nitrogen atoms at positions 1 and 3 of the six-member ring. Notable pyrimidine bases are cytosine, thymine, and uracil. [http://en.wikipedia.org/wiki/Pyrimidine] (Peter)<br />
<br />
== Q ==<br />
<br />
'''q-arm''' - the longer arm of a chromosome's two arms separated by the centromere (compare to p-arm, the shorter arm) ([http://www.medterms.com/script/main/art.asp?articlekey=5152 MedTerms Dictionary], Jay)<br><br />
<br />
'''query sequence''' - the sequence (whether amino acid or nucleotide) entered into a database’s search function and checked against the database entries. ([http://en.wikipedia.org/wiki/BLAST BLAST on Wikipedia], Laura)<br />
<br />
== R ==<br />
<br />
'''RAST''' (Rapid Annotation using Subsystem Technology) - a fully automated service for annotating bacterial and archaeal genomes. It provides high-quality annotations for these genomes across the whole phylogenetic tree. ([http://rast.nmpdr.org/], Max Win)<br />
<br />
'''rDNA'''-These are DNA sequences that encode for ribosomal RNA. Note that rDNA can also stand for recombinant DNA. ([http://en.wikipedia.org/wiki/Ribosomal_DNA rDNA] Pallavi)<br />
<br />
'''residue (protein)''' - the remaining portion of an amino acid after a water molecule has been removed and it has been incorporated into a protein. Functional residues, referred to in Pfam, are the residues that perform some specific identifiable function or are part of a domain, and can be conserved across evolutionarily-related proteins. ([http://pfam.sanger.ac.uk/help Pfam Help], Laura)<br />
<br />
'''retrotransposons''' - genetic elements that copy themselves through an RNA intermediate, which is reverse-transcribed back into DNA and inserted into the genome [http://en.wikipedia.org/wiki/Retrotransposon](Samantha)<br />
<br />
'''ribonuclease''' - a nuclease that catalyzes the degradation of RNA into smaller components [http://en.wikipedia.org/wiki/Ribonuclease] (Mary)<br />
<br />
== S ==<br />
'''Serovar'''-a subdivision of a species based on the characteristics of their cell surface antigens ([http://www.biology-online.org/dictionary/Serovar serovar] Pallavi)<br />
<br />
'''scaffold''' - a section of a sequenced genome composed of contigs that are in the right order but not necessarily connected ([http://www.medterms.com/script/main/art.asp?articlekey=25223 MedTerms Dictionary], Jay)<br />
<br />
'''Shine-Dalgarno sequence''' - A ribosomal binding site on an mRNA, usually a sequence of about six bases located about six or seven bases upstream of the start codon. An anti-Shine-Dalgarno sequence exists on the rRNA in the small subunit of the ribosome; when the two sequences align, the mRNA is lined up and prepared for translation. (Lecture and [http://en.wikipedia.org/wiki/Shine-dalgarno Wikipedia article], Laura)<br><br />
Note: The Shine-Dalgarno consensus sequence for our genome is TAGGAGG.<br />
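<br />
One way to check a predicted start codon is to look for the closest match to the consensus in the bases just upstream of it. The sketch below is an illustration only: it uses the TAGGAGG consensus from the note above, while the function name, the 20-base search window, and the scoring by simple position matches are assumptions made for this example.<br />

```python
from typing import Optional

CONSENSUS = "TAGGAGG"  # consensus quoted in the note above


def best_sd_match(seq: str, start_codon: int, window: int = 20,
                  motif: str = CONSENSUS) -> Optional[tuple[int, int]]:
    """Find the best motif match in the window upstream of start_codon.

    Returns (spacing, matches): spacing is the number of bases between the
    end of the site and the start codon, matches is how many positions
    agree with the motif. Returns None if the upstream region is too short.
    """
    seq = seq.upper()
    lowest = max(0, start_codon - window)
    best = None
    for pos in range(lowest, start_codon - len(motif) + 1):
        site = seq[pos:pos + len(motif)]
        matches = sum(a == b for a, b in zip(site, motif))
        spacing = start_codon - (pos + len(motif))
        if best is None or matches > best[1]:
            best = (spacing, matches)
    return best


if __name__ == "__main__":
    # Made-up sequence: consensus, six spacer bases, then ATG at index 15.
    dna = "CC" + "TAGGAGG" + "AAAAAA" + "ATG" + "CCC"
    print(best_sd_match(dna, 15))
```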
<br />
'''signal peptide''' - a short peptide chain that directs the post-translational transport of a protein [http://en.wikipedia.org/wiki/Signal_peptide] (Matt)<br />
<br />
'''Smith-Waterman alignment''' - A well-known algorithm for determining similar regions between two nucleotide or protein sequences. Instead of looking at the total sequence, the Smith-Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure [http://en.wikipedia.org/wiki/Smith_waterman](Will).<br />
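<br />
The algorithm can be sketched as a dynamic-programming table in which every cell holds the best score of a local alignment ending at that pair of positions, and scores are never allowed to fall below zero. The example below is a minimal version with a linear gap penalty; the scoring values (match +2, mismatch -1, gap -2) and the function name are assumptions chosen for illustration, not the parameters of any particular tool.<br />

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, str, str]:
    """Return (best score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the table; a cell is reset to 0 when every option is negative.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TTACGTTT", "GGACGTGG"))
```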
<br />
'''SNP (Single Nucleotide Polymorphism)''' - a DNA sequence variation occurring when a single nucleotide in the genome (or other shared sequence) differs between members of a species (or between paired chromosomes in an individual) [http://en.wikipedia.org/wiki/Single_nucleotide_polymorphism](Will).<br />
<br />
'''symporter''' - an integral membrane protein that moves two or more different molecules or ions across a phospholipid membrane in the same direction. [http://en.wikipedia.org/wiki/Symporter] (Peter)<br />
<br />
'''synteny''' - a neologism from the Greek for "on the same ribbon". Genes that are syntenic in one species are on the same chromosome; genes that are syntenic across species retain the same order on respective chromosomes as a result of descent from a common ancestor ([http://www.answers.com/synteny Answers.com], Jay)<br />
<br />
'''synthetase''' - a type of enzyme that creates a new covalent bond and requires direct input of energy from a high-energy phosphate. [http://books.google.com/books?id=bB8XnCykRmIC&pg=PA522&lpg=PA522&dq=%22synthetase+is+an+enzyme%22&source=web&ots=wkws4ksMsg&sig=zWLkDIk7T78hcf9S84nWs3u5Apw&hl=en&sa=X&oi=book_result&resnum=9&ct=result] (Peter)<br />
<br />
== T ==<br />
'''transferase''' - an enzyme that catalyzes the transfer of a functional group from one molecule (the donor) to another (the acceptor) [http://en.wikipedia.org/wiki/Transferase] (Matt)<br />
<br />
'''transmembrane helix''' - a single transmembrane alpha helix of a transmembrane protein, usually about twenty amino acids in length. They are usually predicted by hydrophobicity. [http://en.wikipedia.org/wiki/Transmembrane_domain](Mary)<br />
<br />
'''transposons / transposable elements''' - DNA sequences that can move around to different positions in a single cell's genome. Transposons can cause mutations and change the length of the genome. [http://en.wikipedia.org/wiki/Transposon](Samantha)<br />
<br />
'''Transposon Mutagenesis'''-a procedure in which a transposon is inserted into a gene, which inactivates the gene and can lead to the discovery of the phenotype associated with this gene ([http://cancerweb.ncl.ac.uk/cgi-bin/omd?transposon+mutagenesis transposon mutagenesis] Pallavi)<br />
<br />
'''tRNA splicing endonuclease''' - an enzyme that cleaves intervening sequences of precursor tRNA. [http://cancerweb.ncl.ac.uk/cgi-bin/omd?splicing+endonuclease] (Peter)<br><br />
<br />
== U ==<br />
<br />
== V ==<br />
<br />
== W ==<br />
<br />
'''whole genome shotgun sequencing''' - a method of sequencing where DNA is cut into small pieces and cloned into vectors, then roughly 500 bp are read from both ends of each cloned insert to form mate pairs. Mate pairs rarely overlap, but are used to reassemble the sequence using software. [http://en.wikipedia.org/wiki/Whole_genome_shotgun](Samantha)<br />
<br><br />
<br />
== X ==<br />
'''xenolog''' - homologs that are created by horizontal gene transfer between two different species [http://en.wikipedia.org/wiki/Xenolog#Xenology] (Matt)<br><br />
<br />
== Y ==<br />
<br />
== Z ==<br />
<br />
<BR><br />
<HR><br />
<HR><br />
<br />
== This is a list of the student-created tutorials: ==</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Halorhabdus_utahensis_Genome&diff=6603Halorhabdus utahensis Genome2008-09-25T03:05:32Z<p>Lavoss: /* Q */</p>
<hr />
<div>This page will be used by Davidson College students in the [http://www.bio.davidson.edu/Courses/Bio343/LabMethods.html Genomics Laboratory course].<br />
<br />
== Links to Multiple Databases ==<br />
*[http://imgweb.jgi-psf.org/cgi-bin/img_edu_v260/main.cgi?section=TaxonDetail&page=taxonDetail&taxon_oid=2500575004 JGI IMG EDU] <br> public access <br> *[[Media:JGIAnnotation.xls|JGI Annotation Excel Spreadsheet]]<br />
*[http://www.tigr.org/tigr-scripts/prok_manatee/shared/login.cgi Manatee at JCVI] <br> use the davidson number sent by email as username and password (database is nthu01 - this is case sensitive) <br> *[[Media:ManateeAnnotation.xls|Manatee Annotation Excel Spreadsheet]]<br />
*[http://rast.nmpdr.org/ SEED view via RAST] <br> use the username and password combination sent to you by SEED <br> *[[Media:RastAnnotation.xls|RAST Annotation Excel Spreadsheet]]<br />
*[http://www.genome.jp/kegg/kaas/ KEGG]<br> We can submit our genes to KEGG to have it mapped out, but SEED and Manatee may already do this. Do we want to ask them to upload it into their database? <br />
<br><br />
<br />
== RNA Genes ==<br />
<br />
*[[tRNA Genes Check List]]<br><br />
*[[rRNA operon]]<br><br />
*[[2 misc. RNA genes]] (short summary list)<br><br />
*[[Missing tRNA-trp gene found]]<br><br />
<br />
== Other Resources ==<br />
*[[Consensus Shine Dalgarno]] Excel File for ''H. utahensis'' <br><br />
*[[References]]<br><br />
*[[Gene Annotation Template]]<br><br />
*[[General Questions]]<br><br />
*[[Page for Annotated Genes]]<br><br />
<br />
== Tutorials for Annotating Genomes ==<br />
<br />
# Will DeLoache- BioPerl Installation <br><br />
# Max Win- Introduction to Perl for non-programmers (with step-by-step explanations, simple exercises and solutions)<br>
# Pallavi-Conserved Domains Database (CDD) <br><br />
# Mary- Protein Data Bank <br><br />
# Laura Voss - Pfam Database <br><br />
# Samantha Simpson - NCBI Blast (protein, nucleotide, and blast2) <br><br />
# Peter Bakke - Finding species-specific Shine-Dalgarno sequence<br><br />
<br />
== Research Questions ==<br />
#How do the three systems compare for finding ORFs and RNA genes?<br />
#Is there a pattern of missed genes for any of the 3 sites? <br />
#Do the three systems differ in their ability to find good start codons and Shine-Dalgarno sequences? [We need a standard set of genes for comparison. Only highly conserved or a range of genes?]<br />
# Were Shine-Dalgarno sequences calculated for our species or default values used? If default, what sequence?<br />
#Can we fill any holes in their automated annotation? Is there a mechanism for users to add in genes?<br />
#How do the 3 sites compare for ease of use?<br />
#What are the strengths and weaknesses of each system? What did they publish as their special features and how do we see these working?<br />
#How do the 3 sites compare for pathway detection and visualization? <br />
#Do they find the origin of replication? Can we find it? <br />
<br />
<hr><br />
<br />
== This is a list of glossary words (A - Z): ==<br />
[[#A| A ]] [[#B| B ]] [[#C| C ]] [[#D| D ]] [[#E| E ]] [[#F| F ]] [[#G| G ]] [[#H| H ]] [[#I| I ]] [[#J| J ]] [[#K| K ]] [[#L| L ]] [[#M| M ]] [[#N| N ]] [[#O| O ]] [[#P| P ]] [[#Q| Q ]] [[#R| R ]] [[#S| S ]] [[#T| T ]] [[#U| U ]] [[#V| V ]] [[#W| W ]] [[#X| X ]] [[#Y| Y ]] [[#Z| Z ]] <br />
<br />
== A ==<br />
'''Accession Number''' - a unique identifier given to DNA and protein sequences to allow for tracking of sequence information within a single database [http://en.wikipedia.org/wiki/Accession_number_(bioinformatics)] (Will).<br />
<br />
'''<i>Arabidopsis thaliana</i>''' - the scientific name for the thale cress plant; it was the first plant to have its genome sequenced, and is a model organism for understanding plant biology and genetics ([http://en.wikipedia.org/wiki/Thale_cress Wikipedia.org], Jay)<br />
<br />
== B ==<br />
'''BAC''' - <i>b</i>acterial <i>a</i>rtificial <i>c</i>hromosome, a DNA construct used for transforming or cloning segments of DNA and often used to sequence the genomes of organisms ([http://en.wikipedia.org/wiki/Bacterial_artificial_chromosome Wikipedia.org], Jay)<br />
<br />
'''bioinformatics''' - the multi-disciplinary approach of using biology, computer science and mathematics to solve or better understand biological problems [http://en.wikipedia.org/wiki/Bioinformatics] (Matt)<br />
<br />
'''BLAST''' - (Basic Local Alignment Search Tool) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. [http://blast.ncbi.nlm.nih.gov/Blast.cgi] (Mary)<br />
<br />
'''BioPerl''' - a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications such as accessing sequence data from local and remote databases, converting between database formats, manipulating individual sequences, searching for similar sequences, searching for genes and other structures in genomic DNA, and producing machine-readable sequence annotations. [http://en.wikipedia.org/wiki/BioPerl] (Wikipedia, Max Win)<br />
<br />
== C ==<br />
'''carbon fixation''' - using carbon dioxide to create organic materials [http://en.wikipedia.org/wiki/Carbon_fixation] (Samantha)<BR><br />
<br />
'''CDD''' (Conserved Domains Database)- a database used to identify the conserved domains present in a protein query sequence [http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml] (Mary)<br />
<br />
'''chaperonin''' - a protein complex that assists some newly formed polypeptide chains by folding them into their final, functional, three-dimensional form [http://en.wikipedia.org/wiki/Chaperonins] (Matt)<br />
<br />
'''chemotaxis''' - the process by which cells move toward or away from high concentrations of certain chemicals; it occurs in both uni- and multicellular organisms. Unicellular organisms use it to avoid toxins or find food, while multicellular organisms use it for tasks such as reproduction [http://en.wikipedia.org/wiki/Chemotaxis] (Nick)<br />
<br />
'''chemotaxonomy''' - the attempt to classify and identify organisms according to demonstrable differences and similarities in their biochemical compositions [http://en.wikipedia.org/wiki/Chemotaxonomy] (Mary)<br />
<br />
'''ClustalW''' - A web-based or command line tool that performs multiple sequence alignments to determine evolutionary relationships between three or more sequences [http://en.wikipedia.org/wiki/Clustal] (Will).<br />
<br />
'''COG''' (Clusters of Orthologous Groups)- each COG corresponds to a highly conserved domain and generally consists of either individual proteins or groups of paralogs from at least three lineages ([http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml COG] Pallavi) <br><br />
<br />
'''concatemer''' - long continuous DNA molecule that contains the same DNA sequence repeated in series [http://en.wikipedia.org/wiki/Concatemer](Samantha)<BR><br />
<br />
'''contigs''' (contiguous DNA)- overlapping DNA segments that together form a longer, gapless segment of DNA. (Discovery Genomics, Proteomics and Bioinformatics [http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''coverage''' - refers to the number of times, on average, any piece of DNA in a sequenced genome has been individually sequenced (Lecture, Jay)<br />
<br />
'''CPAN (Comprehensive Perl Archive Network)''' - an archive of over 12,200 modules of software written in Perl, as well as documentation for it. It contains a module called CPAN (or CPAN.pm) which is used as an installer for Perl modules such as BioPerl [http://en.wikipedia.org/wiki/CPAN](Will).<br />
<br />
== D ==<br />
'''''de novo'' synthesis''' - the synthesis of complex molecules from simple molecules (e.g. sugars and nucleotides), rather than from recycled molecules; from the Latin for "from the new" or "anew" [http://en.wikipedia.org/wiki/De_novo_synthesis] (Matt)<br />
<br />
'''dehydrogenase''' - a type of enzyme that oxidizes a substrate by transferring one or more protons and a pair of electrons to an acceptor. [http://en.wikipedia.org/wiki/Dehydrogenase] (Peter)<br />
<br />
'''diatom''' - a major group of eukaryotic algae, and one of the most common types of phytoplankton. A characteristic feature of diatom cells is that they are encased within a unique cell wall made of silica called a frustule. These frustules show a wide diversity in form, but usually consist of two asymmetrical sides with a split between them. [http://en.wikipedia.org/wiki/Diatom] (Mary)<br />
<br />
'''domain (protein)''' - the structural and functional groups of a protein, which can exist independently of the protein itself. Domains typically perform a specific function, such as binding to promoters or substrates, and many proteins can have one or several domains in common. Evolutionarily-linked proteins are more likely to have domains in common. Domains are used to organize proteins into families. ([http://en.wikipedia.org/wiki/Domain_(protein) Wikipedia article], Laura)<br />
<br />
'''dot plot'''-graphical display comparing sequence conservation between two genomes with dots indicating strings of identical bases. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
== E ==<br />
<br />
'''EC number''' (Enzyme Commission Number)- a numerical classification scheme for enzymes, based on the chemical reactions they catalyze [http://en.wikipedia.org/wiki/EC_number] (Mary)<br />
<br />
'''E-value''' (Expect value) - When performing a BLAST search, you will obtain an E-value for each sequence that is retrieved. An E-value is the number of hits with an equal or better score that would be expected by chance in a database of that size; the closer it is to zero, the less likely the match is due to chance. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
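<br />
A minimal sketch of how an E-value relates to a bit score, assuming the standard Karlin-Altschul relationship E = m × n × 2^(−S'), where S' is the bit score, m the query length and n the total database length. The function name is hypothetical, and real BLAST runs use adjusted "effective" lengths, so reported values will differ somewhat.<br />

```python
def expect_value(bit_score: float, query_length: int, database_length: int) -> float:
    """Expected number of chance hits scoring at least bit_score.

    Uses E = m * n * 2**(-S'), with raw (not effective) lengths.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


# Each extra bit of score halves the expected number of chance hits.
print(expect_value(40.0, 300, 1_000_000))
print(expect_value(41.0, 300, 1_000_000))
```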
<br />
'''Extremophile''' - an organism that thrives in and may even require physically or geochemically extreme conditions that are detrimental to the majority of life on Earth [http://en.wikipedia.org/wiki/Extremophile] (Will).<br />
<br />
== F ==<br />
<br />
'''FASTA format''' - a text format used to convey either nucleic acid sequences or peptide sequences, in which nucleotides or amino acids are represented by single-letter codes. A header line beginning with ">" gives the sequence name and other descriptors and precedes the sequence itself. [http://en.wikipedia.org/wiki/FASTA_format] (Nick)<br><br />
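<br />
To make the layout concrete, here is a small sketch of a FASTA reader in plain Python. The function name and the example records are made up for illustration; BioPerl and similar toolkits provide more complete parsers.<br />

```python
def parse_fasta(text: str) -> dict:
    """Return {header: sequence} for every record in a FASTA-formatted string."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


example = ">gene1 hypothetical protein\nATGC\nGGTA\n>gene2\nTTAA\n"
for name, seq in parse_fasta(example).items():
    print(name, len(seq))
```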
<br />
'''family (protein)''' - a group of evolutionarily-related proteins, often with one or several domains in common. Families are organized by domain overlap, structural/functional similarity, and sequence similarity. ([http://en.wikipedia.org/wiki/Protein_family Wikipedia article] and lecture, Laura)<br />
<br />
'''finished genome''' - a genome that has been sequenced at least partly by hand, resulting in at least 99.99% sequence accuracy (Lecture, Jay)<br><br />
<br />
== G ==<br />
<br />
'''GC Content''' - the percentage of bases within a certain sequence of DNA (e.g. a gene or a genome) that are either guanine or cytosine; a higher GC content is characteristic of a coding region of a gene; differences in GC content between a gene and a genome can be used as evidence for horizontal gene transfer [http://en.wikipedia.org/wiki/GC-content] (Matt)<br><br />
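<br />
The percentage described above can be computed directly from a sequence string. This is a minimal sketch with a hypothetical function name; it counts only G and C and treats every other character (including ambiguity codes such as N) as non-GC.<br />

```python
def gc_content(seq: str) -> float:
    """Percentage of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


print(gc_content("ATGCGGTA"))
```

Comparing the value for one gene with the value for the whole genome is one simple way to flag candidates for horizontal gene transfer.<br />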
<br />
'''GC-skew''' – uneven distribution of guanine and cytosine bases between the two strands of DNA where GC base pairs occur. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
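<br />
GC-skew is commonly quantified on one strand as (G − C) / (G + C), calculated in windows along the sequence. The sketch below assumes that formula and non-overlapping windows; the function name and window size are illustrative choices, not something prescribed by the course materials.<br />

```python
def gc_skew(seq: str, window: int) -> list:
    """(G - C) / (G + C) for each non-overlapping window of the given size."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Windows with no G or C get a skew of 0.0 to avoid dividing by zero.
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


print(gc_skew("GGGCAAAACCCG", 4))
```

In many bacterial genomes the sign of the skew changes near the origin and terminus of replication, so plots of this value are often used as a first hint when looking for an origin.<br />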
<br />
'''gene amplification''' - production of multiple copies of a gene in order to amplify the amount of protein that the gene encodes for [http://www.medterms.com/script/main/art.asp?articlekey=13537] [http://www.answers.com/topic/gene-amplification] (Matt)<br />
<br />
'''gene knockout''' - a process in which a gene is deactivated within a test organism in order to better understand the function of the gene in that organism [http://en.wikipedia.org/wiki/Gene_knockout] (Matt)<br />
<br />
'''gene ontology''' - a collaborative effort of investigators to unify and standardize terms associated with the role a gene or protein plays in an organism. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''glaucophyte''' - freshwater algae that have not been studied well [http://en.wikipedia.org/wiki/Glaucophyte](Samantha)<br><br />
<br />
== H ==<br />
<br />
'''haemolysin or hemolysin''' - a substance produced by a bacterium that causes lysis of red blood cells [http://en.wikipedia.org/wiki/Hemolysis_(microbiology)] (Nick)<br />
<br />
'''halophile''' - an organism, most often of the Archaea domain, that lives in environments containing high concentrations of salt [http://en.wikipedia.org/wiki/Halophile] (Matt)<br />
<br />
'''haplotype'''-collection of alleles that travel together (Lecture, Pallavi)<br />
<br />
'''haptophyte''' - phylum of algae [http://en.wikipedia.org/wiki/Haptophyte](Samantha)<br />
<br />
'''heterokont''' - major line of eukaryotes consisting of about 10,500 known species, most of which are algae [http://en.wikipedia.org/wiki/Heterokont](Samantha)<br />
<br />
'''Hidden Markov Model''' - a statistical model used in protein recognition databases such as Pfam. A Hidden Markov Model keeps track of several variables and possible variations thereof, such as the possible amino acid sequences that make up a protein domain (since there can be some variance in an amino acid sequence) or the variations in the component sounds that make up a word, and uses those points to match a given sequence to the word, domain, or other complex sequence it most closely matches. An HMM in speech recognition software, for example, can identify that a certain set of sounds make up a certain word, even with the variations in pronunciation and accent that different people will give those sounds. ([http://en.wikipedia.org/wiki/Hidden_Markov_Model Wikipedia] and lecture, Laura) <br />
<br />
'''homeobox''' - a DNA sequence within transcription factor genes that allows the cell to respond to patterns of development by having the transcription factors switch on gene cascades [http://en.wikipedia.org/wiki/Homeobox](Samantha)<br />
<br />
'''homodimer''' - a protein made of paired identical polypeptides ([http://www.answers.com/topic/homodimer Answers.com], Jay)<br />
<br />
'''horizontal gene transfer'''-DNA transmission between species and incorporation of the DNA into the recipient's genome ([http://www.csrees.usda.gov/nea/biotech/res/biotechnology_res_glossary.html horizontal gene transfer] Pallavi)<br />
<br />
'''hydrolase''' - an enzyme that catalyzes hydrolysis, the cleavage of a chemical bond by the addition of water [http://en.wikipedia.org/wiki/Hydrolase] (Nick)<br />
<br />
== I ==<br />
<br />
'''ideogram''' - in genomics, usually describes a stylized representation of a chromosome with banding patterns (Campbell-Heyer Genomics textbook, Jay)<br />
<br />
'''identities''' - in a BLAST output, the number and fraction of total residues which are identical in a given alignment [http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)<br />
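<br />
The count reported as "identities" can be reproduced from a pairwise alignment by comparing it column by column. This is a sketch with a hypothetical function name and toy sequences; it assumes both aligned strings have the same length and use "-" for gaps.<br />

```python
def count_identities(aligned_a: str, aligned_b: str) -> tuple:
    """Return (identical columns, alignment length) for two aligned strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return matches, len(aligned_a)


same, total = count_identities("MKT-AL", "MRTQAL")
print(f"Identities = {same}/{total} ({100 * same // total}%)")
```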
<br />
'''indole'''-a chemical compound that is produced from the break down of tryptophan ([http://medical-dictionary.thefreedictionary.com/indole indole] Pallavi)<br />
<br />
'''inclusion body''' - Inclusion bodies are collections of stainable substances, usually proteins, that are found either in the nucleus or the cytoplasm. It is thought that these bodies are often the result of viral proteins that misfolded [http://en.wikipedia.org/wiki/Inclusion_body] (Nick)<br />
<br />
'''intron''' - a region of DNA in a gene that is not part of the final coding sequence for the protein. [http://en.wikipedia.org/wiki/Intron] (Peter)<br />
<br />
'''isoelectric point''' - the pH at which a molecule carries no net electrical charge [http://en.wikipedia.org/wiki/Isoelectric_point] (Nick)<br />
<br />
'''isozymes''' - members of a gene family with very similar cellular roles (Campbell-Heyer Genomics textbook, Jay)<br />
<br />
== J ==<br />
<br />
== K ==<br />
'''KEGG (Kyoto Encyclopedia of Genes and Genomes)''' - a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals. The Pathway database records networks of molecular interactions in the cells, and variants of them specific to particular organisms [http://en.wikipedia.org/wiki/KEGG](Will).<br />
<br />
'''kinase''' - a type of enzyme that transfers a phosphate group from a high-energy donor molecule to a target molecule in a process called phosphorylation. [http://en.wikipedia.org/wiki/Kinase] (Peter)<br />
<br />
== L ==<br />
<br />
== M ==<br />
'''Manatee''' - a web-based gene evaluation and genome annotation tool that can view, modify, and store annotation for prokaryotic and eukaryotic genomes. This ongoing, open-source initiative was developed with two missions: first, to let biologists functionally annotate their genomes using a powerful, stand-alone web application with a robustly designed relational annotation database; and second, to give outside developers the opportunity to contribute their own ideas and requirements to enhance Manatee's ability to accomplish biological goals [http://manatee.sourceforge.net/](Will).<br />
<br />
'''motif''' - a sequence of amino acids or nucleotides that performs a particular role and is often conserved in other species or molecules. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''mycoplasma''' - genus of bacteria that lack a cell wall [http://en.wikipedia.org/wiki/Mycoplasma] (Nick)<br />
<br />
== N ==<br />
<br />
'''NORFs''' (nonannotated open reading frames) - open reading frames that were considered not to be real genes when the genome was annotated. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''nucleomorph''' - reduced eukaryotic nuclei found in plastids [http://en.wikipedia.org/wiki/Nucleomorph](Samantha)<br />
<br />
== O ==<br />
'''object-oriented programming''' - a programming paradigm in which collections of data, associated with operations on that data, are modularly defined and then built upon (CSC 121 Lecture, Will). <br />
<br />
'''open reading frame (ORF)''' - a segment of DNA that can potentially encode a protein; it begins with a start codon (usually ATG) and runs, in the same reading frame, to the next stop codon [http://www.fao.org/DOCREP/003/X3910E/X3910E18.htm ORF] (Pallavi)<br />
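<br />
A simplified sketch of ORF finding follows. It scans only the three forward reading frames of one strand, accepts only ATG as a start codon, and uses the standard stop codons; the function name and the min_codons parameter are illustrative. Real gene callers also scan the reverse strand and allow alternative start codons.<br />

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 1) -> list:
    """Return (start, end) index pairs of ATG...stop ORFs on the forward strand.

    start is the index of the A in ATG; end is one past the stop codon,
    so seq[start:end] is the full ORF including the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Count codons from the start codon up to (not including) the stop.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


dna = "CCATGAAATAGGG"
for begin, end in find_orfs(dna):
    print(begin, end, dna[begin:end])
```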
<br />
'''operon''' - a segment of DNA involving an operator, promoter, and one or more genes that operate as a single unit during transcription [http://en.wikipedia.org/wiki/Operon] (Nick)<br />
<br />
'''optical mapping'''-DNA sequences of the organism in question are compared against a karyotype that specifically looks at restriction sites found within the DNA to correctly order the DNA sequences on a chromosome. This methodology gives very detailed haplotype information and allows for the detection of sequence variations across an entire genome [http://www.geocities.com/bioinformaticsweb/genomicglossary.html optical mapping] (Pallavi)<br />
<br />
'''ortholog''' - homologous genes in different species that descend from a single gene in their last common ancestor and diverged through speciation (Lecture, Pallavi)<br />
<br />
'''oxidoreductase''' - an enzyme that catalyzes redox reactions by transferring electrons from one molecule (the reductant) to another (the oxidant) [http://en.wikipedia.org/wiki/Oxidoreductase] (Nick)<br />
<br />
== P ==<br />
<br />
'''paralog''' - homologous genes within the same genome that arose by gene duplication; paralogs are similar but not necessarily identical (Lecture, Pallavi)<br />
<br />
'''p-arm''' - the shorter arm of a chromosome's two arms separated by the centromere (compare to q-arm, the longer arm) ([http://www.medterms.com/script/main/art.asp?articlekey=4715 MedTerms Dictionary], Jay)<br />
<br />
'''Perl''' - Developed by Larry Wall in 1987, Perl is a [http://en.wikipedia.org/wiki/High-level_programming_language high-level programming language] used frequently by biologists and bioinformaticists [http://en.wikipedia.org/wiki/Perl] (Will). <br />
<br />
'''periplasmic space''' - the space between the inner cytoplasmic membrane and external outer membrane in bacteria or archaea. [http://en.wikipedia.org/wiki/Periplasmic_space] (Peter)<br />
<br />
'''Pfam''' - a database for protein domain families that matches amino acid sequences or nucleotide sequences to the related group of proteins to which they most likely belong. ([http://pfam.sanger.ac.uk/help Pfam Help], Laura)<br />
<br />
'''plasmid''' - an extra-chromosomal DNA molecule that is capable of replicating independently of the chromosomal DNA. Commonly found in bacteria and archaea. [http://en.wikipedia.org/wiki/Plasmid](Peter)<br />
<br />
'''plastid''' - major organelles in plants or algae [http://en.wikipedia.org/wiki/Plastid](Samantha)<br />
<br />
'''pleomorphism''' - the occurrence of two or more structural forms during a life cycle [http://en.wikipedia.org/wiki/Pleomorphism] (Mary)<br />
<br />
'''phylogenetic tree''' - a diagram showing the evolutionary relationships between biological species that are thought to share a common ancestor [http://en.wikipedia.org/wiki/Phylogenetic_tree] (Nick)<br />
<br />
'''phylotypes''' – a term intended to resolve the challenge of “species” when classifying prokaryotes using DNA sequence comparisons. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''positives''' - in a BLAST output, the number and fraction of residues for which the alignment scores have positive rather than negative values [http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)<br />
<br />
'''proteome''' - entire set of proteins expressed by a genome, cell, tissue, or organism. It may refer to expressed proteins under certain conditions [http://en.wikipedia.org/wiki/Proteome](Samantha)<br />
<br />
'''pseudogenes''' - DNA sequences that look like genes but are most likely nonfunctional, often because they contain many stop codons. A pseudogene may have evolved away from a real gene, or a paralog might have taken its place (Lecture, Pallavi)<br />
<br />
'''purine''' - a category of nitrogenous base consisting of a pyrimidine ring fused to an imidazole ring. Notable purine bases are adenine and guanine. [http://en.wikipedia.org/wiki/Purine] (Peter)<br />
<br />
'''pyrimidine''' - a category of nitrogenous base consisting of a heterocyclic aromatic ring containing two nitrogen atoms at positions 1 and 3 of the six-member ring. Notable pyrimidine bases are cytosine, thymine, and uracil. [http://en.wikipedia.org/wiki/Pyrimidine] (Peter)<br />
<br />
== Q ==<br />
<br />
'''q-arm''' - the longer arm of a chromosome's two arms separated by the centromere (compare to p-arm, the shorter arm) ([http://www.medterms.com/script/main/art.asp?articlekey=5152 MedTerms Dictionary], Jay)<br><br />
<br />
'''query sequence''' - the sequence (whether amino acid or nucleotide) entered into a database’s search function and checked against the database entries. ([http://en.wikipedia.org/wiki/BLAST BLAST on Wikipedia], Laura)<br />
<br />
== R ==<br />
<br />
'''RAST''' - (Rapid Annotation using Subsystem Technology)- a fully-automated service for annotating bacterial and archaeal genomes. It provides high quality genome annotations for these genomes across the whole phylogenetic tree. ([http://rast.nmpdr.org/], Max Win)<br />
<br />
'''rDNA'''-These are DNA sequences that encode for ribosomal RNA. Note that rDNA can also stand for recombinant DNA. ([http://en.wikipedia.org/wiki/Ribosomal_DNA rDNA] Pallavi)<br />
<br />
'''residue (protein)''' - the remaining portion of an amino acid after a water molecule has been removed and it has been incorporated into a protein. Functional residues, referred to in Pfam, are the residues that perform some specific identifiable function or are part of a domain, and can be conserved across evolutionarily-related proteins. ([http://pfam.sanger.ac.uk/help Pfam Help], Laura)<br />
<br />
'''retrotransposons''' - genetic elements that copy themselves through an RNA intermediate, which is reverse-transcribed into DNA and inserted back into the genome [http://en.wikipedia.org/wiki/Retrotransposon](Samantha)<br />
<br />
'''ribonuclease''' - a nuclease that catalyzes the degradation of RNA into smaller components [http://en.wikipedia.org/wiki/Ribonuclease] (Mary)<br />
<br />
== S ==<br />
'''Serovar'''-a subdivision of a species based on the characteristics of their cell surface antigens ([http://www.biology-online.org/dictionary/Serovar serovar] Pallavi)<br />
<br />
'''scaffold''' - a section of a sequenced genome composed of contigs that are in the right order but not necessarily connected ([http://www.medterms.com/script/main/art.asp?articlekey=25223 MedTerms Dictionary], Jay)<br />
<br />
'''Shine-Dalgarno sequence''' - A ribosomal binding site on an mRNA, usually a sequence of about six nucleotides located six or seven nucleotides upstream of the start codon. An anti-Shine-Dalgarno sequence exists on the rRNA in the small subunit of the ribosome; when the two sequences pair, the mRNA is lined up and prepared for translation. (Lecture and [http://en.wikipedia.org/wiki/Shine-dalgarno Wikipedia article], Laura)<br><br />
Note: The Shine-Dalgarno consensus sequence for our genome is TAGGAGG.<br />
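<br />
One simple way to check a predicted start codon is to look for the consensus just upstream of it. The sketch below assumes an exact string match to TAGGAGG within a 20-nucleotide upstream window; the function name, window size and test sequence are hypothetical, and real Shine-Dalgarno sites often match the consensus only partially.<br />

```python
def find_shine_dalgarno(seq: str, start_codon_pos: int,
                        motif: str = "TAGGAGG", window: int = 20):
    """Look for motif in the window upstream of a start codon.

    Returns (motif_position, spacing) where spacing is the number of
    nucleotides between the end of the motif and the start codon,
    or None if the motif is not found.
    """
    upstream_start = max(0, start_codon_pos - window)
    upstream = seq[upstream_start:start_codon_pos].upper()
    idx = upstream.rfind(motif)  # the match closest to the start codon
    if idx == -1:
        return None
    spacing = len(upstream) - (idx + len(motif))
    return upstream_start + idx, spacing


dna = "AAAA" + "TAGGAGG" + "TCATCA" + "ATG" + "AAA"
print(find_shine_dalgarno(dna, dna.index("ATG")))
```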
<br />
'''signal peptide''' - a short peptide chain that directs the post-translational transport of a protein [http://en.wikipedia.org/wiki/Signal_peptide] (Matt)<br />
<br />
'''Smith-Waterman alignment''' - A well-known algorithm for determining similar regions between two nucleotide or protein sequences. Instead of looking at the total sequence, the Smith-Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure [http://en.wikipedia.org/wiki/Smith_waterman](Will).<br />
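<br />
A compact sketch of the Smith-Waterman scoring recurrence is shown here. It returns only the best local alignment score, uses a linear gap penalty, and the match, mismatch and gap values are arbitrary illustrative choices rather than a standard scoring matrix.<br />

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets an alignment restart anywhere, which makes it local.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Recovering the aligned segments themselves would require a traceback from the highest-scoring cell, which is omitted to keep the example short.<br />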
<br />
'''SNP (Single Nucleotide Polymorphism)''' - a DNA sequence variation occurring when a single nucleotide in the genome (or other shared sequence) differs between members of a species (or between paired chromosomes in an individual) [http://en.wikipedia.org/wiki/Single_nucleotide_polymorphism](Will).<br />
<br />
'''symporter''' - an integral membrane protein that is involved in movement of two or more different molecules or ions across a phospholipid membrane in the same direction. [http://en.wikipedia.org/wiki/Symporter] (Peter)<br />
<br />
'''synteny''' - a neologism from the Greek for "on the same ribbon". Genes that are syntenic in one species are on the same chromosome; genes that are syntenic across species retain the same order on respective chromosomes as a result of descent from a common ancestor ([http://www.answers.com/synteny Answers.com], Jay)<br />
<br />
'''synthetase''' - a type of enzyme that creates a new covalent bond and requires direct input of energy from a high-energy phosphate. [http://books.google.com/books?id=bB8XnCykRmIC&pg=PA522&lpg=PA522&dq=%22synthetase+is+an+enzyme%22&source=web&ots=wkws4ksMsg&sig=zWLkDIk7T78hcf9S84nWs3u5Apw&hl=en&sa=X&oi=book_result&resnum=9&ct=result] (Peter)<br />
<br />
== T ==<br />
'''transferase''' - an enzyme that catalyzes the transfer of a functional group from one molecule (the donor) to another (the acceptor) [http://en.wikipedia.org/wiki/Transferase] (Matt)<br />
<br />
'''transmembrane helix''' - a single transmembrane alpha helix of a transmembrane protein, usually about twenty amino acids in length. They are usually predicted by hydrophobicity. [http://en.wikipedia.org/wiki/Transmembrane_domain](Mary)<br />
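<br />
Prediction by hydrophobicity is often illustrated with a sliding-window hydropathy average. The sketch below assumes the Kyte-Doolittle hydropathy scale, a 19-residue window, and the commonly cited cutoff of 1.6; the function names are hypothetical, and dedicated predictors use more elaborate models.<br />

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(protein: str, window: int = 19) -> list:
    """Mean hydropathy of every window of the given length."""
    protein = protein.upper()
    scores = []
    for i in range(len(protein) - window + 1):
        segment = protein[i:i + window]
        scores.append(sum(KYTE_DOOLITTLE[aa] for aa in segment) / window)
    return scores


def candidate_tm_starts(protein: str, window: int = 19,
                        threshold: float = 1.6) -> list:
    """Start positions of windows whose mean hydropathy exceeds the threshold."""
    scores = hydropathy_windows(protein, window)
    return [i for i, score in enumerate(scores) if score > threshold]


toy = "D" * 5 + "L" * 19 + "D" * 5  # artificial hydrophobic stretch
print(candidate_tm_starts(toy))
```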
<br />
'''transposons / transposable elements''' - DNA sequences that can move around to different positions in a single cell's genome. Transposons can cause mutations and change the length of the genome. [http://en.wikipedia.org/wiki/Transposon](Samantha)<br />
<br />
'''Transposon Mutagenesis'''-a procedure in which a transposon is inserted into a gene, which inactivates the gene and can lead to the discovery of the phenotype associated with this gene ([http://cancerweb.ncl.ac.uk/cgi-bin/omd?transposon+mutagenesis transposon mutagenesis] Pallavi)<br />
<br />
'''tRNA splicing endonuclease''' - an enzyme that cleaves intervening sequences of precursor tRNA. [http://cancerweb.ncl.ac.uk/cgi-bin/omd?splicing+endonuclease] (Peter)<br><br />
<br />
== U ==<br />
<br />
== V ==<br />
<br />
== W ==<br />
<br />
'''whole genome shotgun sequencing''' - a method of sequencing where DNA is cut into small pieces and cloned into vectors, then about 500 bp are read from both ends of every cloned insert to form mate pairs. Mate pairs rarely overlap, but are used to reassemble the sequence using software. [http://en.wikipedia.org/wiki/Whole_genome_shotgun](Samantha)<br />
<br><br />
<br />
== X ==<br />
'''xenolog''' - homologs that are created by horizontal gene transfer between two different species [http://en.wikipedia.org/wiki/Xenolog#Xenology] (Matt)<br><br />
<br />
== Y ==<br />
<br />
== Z ==<br />
<br />
<BR><br />
<HR><br />
<HR><br />
<br />
== This is a list of the student-created tutorials: ==</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Halorhabdus_utahensis_Genome&diff=6602Halorhabdus utahensis Genome2008-09-25T03:03:34Z<p>Lavoss: /* P */</p>
<hr />
<div>This page will be used by Davidson College students in the [http://www.bio.davidson.edu/Courses/Bio343/LabMethods.html Genomics Laboratory course].<br />
<br />
== Links to Multiple Databases ==<br />
*[http://imgweb.jgi-psf.org/cgi-bin/img_edu_v260/main.cgi?section=TaxonDetail&page=taxonDetail&taxon_oid=2500575004 JGI IMG EDU] <br> public access <br> *[[Media:JGIAnnotation.xls|JGI Annotation Excel Spreadsheet]]<br />
*[http://www.tigr.org/tigr-scripts/prok_manatee/shared/login.cgi Manatee at JCVI] <br> use the davidson number sent by email as username and password (database is nthu01 - this is case sensitive) <br> *[[Media:ManateeAnnotation.xls|Manatee Annotation Excel Spreadsheet]]<br />
*[http://rast.nmpdr.org/ SEED view via RAST] <br> use the username and password combination sent to you by SEED <br> *[[Media:RastAnnotation.xls|RAST Annotation Excel Spreadsheet]]<br />
*[http://www.genome.jp/kegg/kaas/ KEGG]<br> We can submit our genes to KEGG to have it mapped out, but SEED and Manatee may already do this. Do we want to ask them to upload it into their database? <br />
<br><br />
<br />
== RNA Genes ==<br />
<br />
*[[tRNA Genes Check List]]<br><br />
*[[rRNA operon]]<br><br />
*[[2 misc. RNA genes]] (short summary list)<br><br />
*[[Missing tRNA-trp gene found]]<br><br />
<br />
== Other Resources ==<br />
*[[Consensus Shine Dalgarno]] Excel File for ''H. utahensis'' <br><br />
*[[References]]<br><br />
*[[Gene Annotation Template]]<br><br />
*[[General Questions]]<br><br />
*[[Page for Annotated Genes]]<br><br />
<br />
== Tutorials for Annotating Genomes ==<br />
<br />
# Will DeLoache- BioPerl Installation <br><br />
# Max Win- Introduction to Perl for non-programmers (with step-by-step explanations, simple exercises and solutions)<br><br />
# Pallavi-Conserved Domains Database (CDD) <br><br />
# Mary- Protein Data Bank <br><br />
# Laura Voss - Pfam Database <br><br />
# Samantha Simpson - NCBI Blast (protein, nucleotide, and blast2) <br><br />
# Peter Bakke - Finding species-specific Shine-Dalgarno sequence<br><br />
<br />
== Research Questions ==<br />
#How do the three systems compare for finding ORFs and RNA genes?<br />
#Is there a pattern of missed genes for any of the 3 sites? <br />
#Do the three systems differ in their ability to find good start codons and Shine-Dalgarno sequences? [We need a standard set of genes for comparison. Only highly conserved or a range of genes?]<br />
# Were Shine-Dalgarno sequences calculated for our species or default values used? If default, what sequence?<br />
#Can we fill any holes in their automated annotation? Is there a mechanism for users to add in genes?<br />
#How do the 3 sites compare for ease of use?<br />
#What are the strengths and weaknesses of each system? What did they publish as their special features and how do we see these working?<br />
#How does each of the 3 sites compare for pathway detection and visualization? <br />
#Do they find the origin of replication? Can we find it? <br />
<br />
<hr><br />
<br />
== This is a list of glossary words (A - Z): ==<br />
[[#A| A ]] [[#B| B ]] [[#C| C ]] [[#D| D ]] [[#E| E ]] [[#F| F ]] [[#G| G ]] [[#H| H ]] [[#I| I ]] [[#J| J ]] [[#K| K ]] [[#L| L ]] [[#M| M ]] [[#N| N ]] [[#O| O ]] [[#P| P ]] [[#Q| Q ]] [[#R| R ]] [[#S| S ]] [[#T| T ]] [[#U| U ]] [[#V| V ]] [[#W| W ]] [[#X| X ]] [[#Y| Y ]] [[#Z| Z ]] <br />
<br />
== A ==<br />
'''Accession Number''' - a unique identifier given to DNA and protein sequences to allow for tracking of sequence information within a single database [http://en.wikipedia.org/wiki/Accession_number_(bioinformatics)] (Will).<br />
<br />
'''<i>Arabidopsis thaliana</i>''' - the scientific name for the thale cress plant; it was the first plant to have its genome sequenced, and is a model organism for understanding plant biology and genetics ([http://en.wikipedia.org/wiki/Thale_cress Wikipedia.org], Jay)<br />
<br />
== B ==<br />
'''BAC''' - <i>b</i>acterial <i>a</i>rtificial <i>c</i>hromosome, a DNA construct used for transforming or cloning segments of DNA and often used to sequence the genetic code of organisms ([http://en.wikipedia.org/wiki/Bacterial_artificial_chromosome Wikipedia.org], Jay)<br />
<br />
'''bioinformatics''' - the multi-disciplinary approach of using biology, computer science and mathematics to solve or better understand biological problems [http://en.wikipedia.org/wiki/Bioinformatics] (Matt)<br />
<br />
'''BLAST''' - (Basic Local Alignment Search Tool) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. [http://blast.ncbi.nlm.nih.gov/Blast.cgi] (Mary)<br />
<br />
'''bioperl''' - a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications such as accessing sequence data from local and remote databases, converting between database formats, manipulating individual sequences, searching for similar sequences, searching for genes and other structures on genomic DNA, or developing machine-readable sequence annotations. [http://en.wikipedia.org/wiki/BioPerl] (Wikipedia, Max Win)<br />
<br />
== C ==<br />
'''carbon fixation''' - using carbon dioxide to create organic materials [http://en.wikipedia.org/wiki/Carbon_fixation] (Samantha)<BR><br />
<br />
'''CDD''' (Conserved Domains Database)- a database used to identify the conserved domains present in a protein query sequence [http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml] (Mary)<br />
<br />
'''chaperonin''' - a protein complex that assists some newly formed polypeptide chains by folding them into their final, functional, three-dimensional form [http://en.wikipedia.org/wiki/Chaperonins] (Matt)<br />
<br />
'''chemotaxis''' - the process in which cells seek out or flee from a high concentration of certain chemicals; it is found in both uni- and multicellular organisms. Unicellular organisms use this process to avoid toxins or find food, and multicellular organisms use it for tasks such as reproduction [http://en.wikipedia.org/wiki/Chemotaxis] (Nick)<br />
<br />
'''chemotaxonomy''' - the attempt to classify and identify organisms according to demonstrable differences and similarities in their biochemical compositions [http://en.wikipedia.org/wiki/Chemotaxonomy] (Mary)<br />
<br />
'''ClustalW''' - A web-based or command line tool that performs multiple sequence alignments to determine evolutionary relationships between three or more sequences [http://en.wikipedia.org/wiki/Clustal] (Will).<br />
<br />
'''COG''' (Clusters of Orthologous Groups) - a group of orthologous proteins, or orthologous sets of paralogs, from at least three lineages; each COG is taken to correspond to an ancient conserved domain ([http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml COG] Pallavi) <br><br />
<br />
'''concatemer''' - long continuous DNA molecule that contains the same DNA sequence repeated in series [http://en.wikipedia.org/wiki/Concatemer](Samantha)<BR><br />
<br />
'''contigs''' (contiguous DNA) - overlapping DNA segments that, as a collection, form a longer and gapless segment of DNA. (Discovery Genomics, Proteomics and Bioinformatics [http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''coverage''' - refers to the number of times, on average, any piece of DNA in a sequenced genome has been individually sequenced (Lecture, Jay)<br />
<br />
'''CPAN (Comprehensive Perl Archive Network)''' - an archive of over 12,200 modules of software written in Perl, as well as documentation for it. It contains a module called CPAN (or CPAN.pm) which is used as an installer for Perl modules such as BioPerl [http://en.wikipedia.org/wiki/CPAN](Will).<br />
<br />
== D ==<br />
'''''de novo'' synthesis''' - the synthesis of complex molecules from simple molecules (e.g. sugars and nucleotides), rather than from recycled molecules; from the Latin for "from the new" (anew) [http://en.wikipedia.org/wiki/De_novo_synthesis] (Matt)<br />
<br />
'''dehydrogenase''' - a type of enzyme that oxidizes a substrate by transferring one or more protons and a pair of electrons to an acceptor. [http://en.wikipedia.org/wiki/Dehydrogenase] (Peter)<br />
<br />
'''diatom''' - a major group of eukaryotic algae, and one of the most common types of phytoplankton. A characteristic feature of diatom cells is that they are encased within a unique cell wall made of silica called a frustule. These frustules show a wide diversity in form, but usually consist of two asymmetrical sides with a split between them. [http://en.wikipedia.org/wiki/Diatom] (Mary)<br />
<br />
'''domain (protein)''' - the structural and functional groups of a protein, which can exist independently of the protein itself. Domains typically perform a specific function, such as binding to promoters or substrates, and many proteins can have one or several domains in common. Evolutionarily-linked proteins are more likely to have domains in common. Domains are used to organize proteins into families. ([http://en.wikipedia.org/wiki/Domain_(protein) Wikipedia article], Laura)<br />
<br />
'''dot plot'''-graphical display comparing sequence conservation between two genomes with dots indicating strings of identical bases. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
== E ==<br />
<br />
'''EC number''' (Enzyme Commission Number)- a numerical classification scheme for enzymes, based on the chemical reactions they catalyze [http://en.wikipedia.org/wiki/EC_number] (Mary)<br />
<br />
'''E-value''' (Expect value) - When performing a BLAST search, you will obtain an E-value for each sequence that is retrieved. An E-value is the number of hits with an equal or better score that would be expected by chance in a database of that size; the closer it is to zero, the less likely the match is due to chance. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''Extremophile''' - an organism that thrives in and may even require physically or geochemically extreme conditions that are detrimental to the majority of life on Earth [http://en.wikipedia.org/wiki/Extremophile] (Will).<br />
<br />
== F ==<br />
<br />
'''FASTA format''' - a text format used to convey either nucleic acid sequences or peptide sequences, in which nucleotides or amino acids are represented by single-letter codes. A header line beginning with ">" gives the sequence name and other descriptors and precedes the sequence itself. [http://en.wikipedia.org/wiki/FASTA_format] (Nick)<br><br />
<br />
'''family (protein)''' - a group of evolutionarily-related proteins, often with one or several domains in common. Families are organized by domain overlap, structural/functional similarity, and sequence similarity. ([http://en.wikipedia.org/wiki/Protein_family Wikipedia article] and lecture, Laura)<br />
<br />
'''finished genome''' - a genome that has been sequenced at least partly by hand, resulting in at least 99.99% sequence accuracy (Lecture, Jay)<br><br />
<br />
== G ==<br />
<br />
'''GC Content''' - the percentage of bases within a certain sequence of DNA (e.g. a gene or a genome) that are either guanine or cytosine; a higher GC content is characteristic of a coding region of a gene; differences in GC content between a gene and a genome can be used as evidence for horizontal gene transfer [http://en.wikipedia.org/wiki/GC-content] (Matt)<br><br />
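<br />
As a small worked example of the GC content definition, the sketch below counts G and C bases and reports them as a percentage of the sequence length. The function name <code>gc_content</code> is an assumption for this example.

```python
def gc_content(seq):
    """Percentage of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)
```

Comparing <code>gc_content(gene)</code> with <code>gc_content(genome)</code> is one simple way to flag genes whose composition differs noticeably from the rest of the genome.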
<br />
'''GC-skew''' – uneven distribution of guanine and cytosine bases between the two strands of DNA where GC base pairs occur. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
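<br />
GC-skew is commonly quantified on one strand as (G - C) / (G + C) within a sliding window. The sketch below assumes that formula; the function name <code>gc_skew</code> and the default window size are choices made for this example.

```python
def gc_skew(seq, window=1000, step=None):
    """Return (G - C) / (G + C) for each window along one strand of seq."""
    step = step or window  # non-overlapping windows by default
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # A window with no G or C has no defined skew; report 0.0.
        values.append((g - c) / (g + c) if g + c else 0.0)
    return values
```

In many bacterial genomes the sign of the skew changes near the origin and terminus of replication, which is why a plot of these values is often used when looking for the origin.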
<br />
'''gene amplification''' - production of multiple copies of a gene in order to amplify the amount of protein that the gene encodes for [http://www.medterms.com/script/main/art.asp?articlekey=13537] [http://www.answers.com/topic/gene-amplification] (Matt)<br />
<br />
'''gene knockout''' - a process in which a gene is deactivated within a test organism in order to better understand the function of the gene in that organism [http://en.wikipedia.org/wiki/Gene_knockout] (Matt)<br />
<br />
'''gene ontology'''- a collaborative effort of investigators to unify and standardize terms associated with the role a gene or protein plays in an organism. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''glaucophyte''' - freshwater algae that have not been studied well [http://en.wikipedia.org/wiki/Glaucophyte](Samantha)<br><br />
<br />
== H ==<br />
<br />
'''haemolysin or hemolysin''' - a substance produced by a bacterium that causes lysis of red blood cells [http://en.wikipedia.org/wiki/Hemolysis_(microbiology)] (Nick)<br />
<br />
'''halophile''' - an organism, most often of the Archaea domain, that lives in environments containing high concentrations of salt [http://en.wikipedia.org/wiki/Halophile] (Matt)<br />
<br />
'''haplotype'''-collection of alleles that travel together (Lecture, Pallavi)<br />
<br />
'''haptophyte''' - phylum of algae [http://en.wikipedia.org/wiki/Haptophyte](Samantha)<br />
<br />
'''heterokont''' - major line of eukaryotes consisting of about 10,500 known species, most of which are algae [http://en.wikipedia.org/wiki/Heterokont](Samantha)<br />
<br />
'''Hidden Markov Model''' - a statistical model used in protein recognition databases such as Pfam. A Hidden Markov Model keeps track of several variables and possible variations thereof, such as the possible amino acid sequences that make up a protein domain (since there can be some variance in an amino acid sequence) or the variations in the component sounds that make up a word, and uses those points to match a given sequence to the word, domain, or other complex sequence it most closely matches. An HMM in speech recognition software, for example, can identify that a certain set of sounds make up a certain word, even with the variations in pronunciation and accent that different people will give those sounds. ([http://en.wikipedia.org/wiki/Hidden_Markov_Model Wikipedia] and lecture, Laura) <br />
<br />
'''homeobox''' - DNA sequence within transcription factor genes that allow the cell to respond to patterns of development by having the transcription factors switch on gene cascades [http://en.wikipedia.org/wiki/Homeobox](Samantha)<br />
<br />
'''homodimer''' - a protein made of paired identical polypeptides ([http://www.answers.com/topic/homodimer Answers.com], Jay)<br />
<br />
'''horizontal gene transfer'''-DNA transmission between species and incorporation of the DNA into the recipient's genome ([http://www.csrees.usda.gov/nea/biotech/res/biotechnology_res_glossary.html horizontal gene transfer] Pallavi)<br />
<br />
'''hydrolase''' - an enzyme that catalyzes hydrolysis, the cleavage of a chemical bond by the addition of water [http://en.wikipedia.org/wiki/Hydrolase] (Nick)<br />
<br />
== I ==<br />
<br />
'''ideogram''' - in genomics, usually describes a stylized representation of a chromosome with banding patterns (Campbell-Heyer Genomics textbook, Jay)<br />
<br />
'''identities''' - in a BLAST output, the number and fraction of total residues which are identical in a given alignment [http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)<br />
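<br />
The identities figure can be reproduced by hand from the two aligned rows of an alignment. The sketch below assumes both rows have the same length and use "-" for gaps; the function name <code>identities</code> is made up for this example.

```python
def identities(aligned_a, aligned_b):
    """Return (matches, alignment_length, fraction) for two aligned rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have the same length")
    if not aligned_a:
        return 0, 0, 0.0
    # A column counts as an identity when both residues match and neither is a gap.
    matches = sum(1 for x, y in zip(aligned_a, aligned_b)
                  if x == y and x != "-")
    return matches, len(aligned_a), matches / len(aligned_a)
```

Gap columns count toward the alignment length but never toward the matches, which is why a gapped alignment can show a lower percentage than the ungapped part alone would suggest.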
<br />
'''indole'''-a chemical compound that is produced from the break down of tryptophan ([http://medical-dictionary.thefreedictionary.com/indole indole] Pallavi)<br />
<br />
'''inclusion body''' - Inclusion bodies are collections of stainable substances, usually proteins, that are found either in the nucleus or the cytoplasm. It is thought that these bodies are often the result of viral proteins that misfolded [http://en.wikipedia.org/wiki/Inclusion_body] (Nick)<br />
<br />
'''intron''' - a region of DNA in a gene that is not part of the final coding sequence for the protein. [http://en.wikipedia.org/wiki/Intron] (Peter)<br />
<br />
'''isoelectric point''' - the pH at which a molecule carries no net electrical charge [http://en.wikipedia.org/wiki/Isoelectric_point] (Nick)<br />
<br />
'''isozymes''' - members of a gene family with very similar cellular roles (Campbell-Heyer Genomics textbook, Jay)<br />
<br />
== J ==<br />
<br />
== K ==<br />
'''KEGG (Kyoto Encyclopedia of Genes and Genomes)''' - a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals. The Pathway database records networks of molecular interactions in the cells, and variants of them specific to particular organisms [http://en.wikipedia.org/wiki/KEGG](Will).<br />
<br />
'''kinase''' - a type of enzyme that transfers a phosphate group from a high-energy donor molecule to a target molecule in a process called phosphorylation. [http://en.wikipedia.org/wiki/Kinase] (Peter)<br />
<br />
== L ==<br />
<br />
== M ==<br />
'''Manatee''' - a web-based gene evaluation and genome annotation tool that can view, modify, and store annotation for prokaryotic and eukaryotic genomes. This ongoing, open-source initiative was developed with two missions: first, to give biologists the ability to functionally annotate their genomes using a powerful, stand-alone web application with a robustly designed relational annotation database; and second, to give outside developers the opportunity to contribute their own ideas and requirements to enhance Manatee's ability to accomplish biological goals [http://manatee.sourceforge.net/](Will).<br />
<br />
'''motif''' - a sequence of amino acids or nucleotides that performs a particular role and is often conserved in other species or molecules. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''mycoplasma''' - genus of bacteria that lack a cell wall [http://en.wikipedia.org/wiki/Mycoplasma] (Nick)<br />
<br />
== N ==<br />
<br />
'''NORFs''' (nonannotated open reading frames) - open reading frames that were considered not to be real genes when the genome was annotated. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''nucleomorph''' - reduced eukaryotic nuclei found in plastids [http://en.wikipedia.org/wiki/Nucleomorph](Samantha)<br />
<br />
== O ==<br />
'''object-oriented programming''' - a programming paradigm in which collections of data, associated with operations on that data, are modularly defined and then built upon (CSC 121 Lecture, Will). <br />
<br />
'''open reading frame (ORF)'''-a segment of DNA that can potentially encode for a protein and it begins with a start codon (usually ATG) [http://www.fao.org/DOCREP/003/X3910E/X3910E18.htm ORF] (Pallavi)<br />
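<br />
A very simple ORF scan can be sketched as follows. This example assumes ATG as the only start codon, the three standard stop codons, and the forward strand only; the name <code>find_orfs</code> and the <code>min_codons</code> parameter are made up for illustration, and real gene finders also check the reverse strand and alternative start codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1):
    """Return (start, end) pairs for forward-strand ORFs; seq[start:end] includes the stop."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                # Count the codons before the stop, including the ATG.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs
```

Raising <code>min_codons</code> (for example to 100) filters out the many short ORFs that occur by chance in any genome.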
<br />
'''operon''' - a segment of DNA involving an operator, promoter, and one or more genes that operate as a single unit during transcription [http://en.wikipedia.org/wiki/Operon] (Nick)<br />
<br />
'''optical mapping'''-DNA sequences of the organism in question are compared against a karyotype that specifically looks at restriction sites found within the DNA to correctly order the DNA sequences on a chromosome. This methodology gives very detailed haplotype information and allows for the detection of sequence variations across an entire genome [http://www.geocities.com/bioinformaticsweb/genomicglossary.html optical mapping] (Pallavi)<br />
<br />
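'''ortholog''' - genes in different species that descend from a single gene in their last common ancestor and are separated by a speciation event; orthologs usually have similar sequences and often retain the same function (Lecture, Pallavi)<br />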
<br />
'''oxidoreductase''' - an enzyme that catalyzes redox reactions by transferring electrons from one molecule (the reductant) to another (the oxidant) [http://en.wikipedia.org/wiki/Oxidoreductase] (Nick)<br />
<br />
== P ==<br />
<br />
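'''paralog''' - genes within the same genome that arose from a gene duplication event; paralogs have similar sequences and may diverge in function over time (Lecture, Pallavi)<br />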
<br />
'''p-arm''' - the shorter arm of a chromosome's two arms separated by the centromere (compare to q-arm, the longer arm) ([http://www.medterms.com/script/main/art.asp?articlekey=4715 MedTerms Dictionary], Jay)<br />
<br />
'''Perl''' - Developed by Larry Wall in 1987, Perl is a [http://en.wikipedia.org/wiki/High-level_programming_language high-level programming language] used frequently by biologists and bioinformaticists [http://en.wikipedia.org/wiki/Perl] (Will). <br />
<br />
'''periplasmic space''' - the space between the inner cytoplasmic membrane and external outer membrane in bacteria or archaea. [http://en.wikipedia.org/wiki/Periplasmic_space] (Peter)<br />
<br />
'''Pfam''' - a database for protein domain families that matches amino acid sequences or nucleotide sequences to the related group of proteins to which they most likely belong. ([http://pfam.sanger.ac.uk/help Pfam Help], Laura)<br />
<br />
'''plasmid''' - an extra-chromosomal DNA molecule that is capable of replicating independently of the chromosomal DNA. Commonly found in bacteria and archaea. [http://en.wikipedia.org/wiki/Plasmid](Peter)<br />
<br />
'''plastid''' - major organelles in plants or algae [http://en.wikipedia.org/wiki/Plastid](Samantha)<br />
<br />
'''pleomorphism''' - the occurrence of two or more structural forms during a life cycle [http://en.wikipedia.org/wiki/Pleomorphism] (Mary)<br />
<br />
'''phylogenetic tree''' - a diagram showing the evolutionary relationships between biological species that are thought to share a common ancestor [http://en.wikipedia.org/wiki/Phylogenetic_tree] (Nick)<br />
<br />
'''phylotypes''' – a term intended to resolve the challenge of “species” when classifying prokaryotes using DNA sequence comparisons. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''positives''' - in a BLAST output, the number and fraction of residues for which the alignment scores have positive rather than negative values [http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)<br />
<br />
'''proteome''' - entire set of proteins expressed by a genome, cell, tissue, or organism. It may refer to expressed proteins under certain conditions [http://en.wikipedia.org/wiki/Proteome](Samantha)<br />
<br />
'''pseudogenes''' - sequences of DNA that look like genes but are no longer functional, often because they contain premature stop codons. A pseudogene may have evolved away from a real gene, or a paralog might have taken its place (Lecture, Pallavi)<br />
<br />
'''purine''' - a category of nitrogenous base consisting of a pyrimidine ring fused to an imidazole ring. Notable purine bases are adenine and guanine. [http://en.wikipedia.org/wiki/Purine] (Peter)<br />
<br />
'''pyrimidine''' - a category of nitrogenous base consisting of a heterocyclic aromatic ring containing two nitrogen atoms at positions 1 and 3 of the six-member ring. Notable pyrimidine bases are cytosine, thymine, and uracil. [http://en.wikipedia.org/wiki/Pyrimidine] (Peter)<br />
<br />
== Q ==<br />
<br />
'''q-arm''' - the longer arm of a chromosome's two arms separated by the centromere (compare to p-arm, the shorter arm) ([http://www.medterms.com/script/main/art.asp?articlekey=5152 MedTerms Dictionary], Jay)<br />
<br />
== R ==<br />
<br />
'''RAST''' - (Rapid Annotation using Subsystem Technology)- a fully-automated service for annotating bacterial and archaeal genomes. It provides high quality genome annotations for these genomes across the whole phylogenetic tree. ([http://rast.nmpdr.org/], Max Win)<br />
<br />
'''rDNA'''-These are DNA sequences that encode for ribosomal RNA. Note that rDNA can also stand for recombinant DNA. ([http://en.wikipedia.org/wiki/Ribosomal_DNA rDNA] Pallavi)<br />
<br />
'''residue (protein)''' - the remaining portion of an amino acid after a water molecule has been removed and it has been incorporated into a protein. Functional residues, referred to in Pfam, are the residues that perform some specific identifiable function or are part of a domain, and can be conserved across evolutionarily-related proteins. ([http://pfam.sanger.ac.uk/help Pfam Help], Laura)<br />
<br />
'''retrotransposons''' - RNA transcribed back into DNA and added into the genome [http://en.wikipedia.org/wiki/Retrotransposon](Samantha)<br />
<br />
'''ribonuclease''' - a nuclease that catalyzes the degradation of RNA into smaller components [http://en.wikipedia.org/wiki/Ribonuclease] (Mary)<br />
<br />
== S ==<br />
'''Serovar'''-a subdivision of a species based on the characteristics of their cell surface antigens ([http://www.biology-online.org/dictionary/Serovar serovar] Pallavi)<br />
<br />
'''scaffold''' - a section of a sequenced genome composed of contigs that are in the right order but not necessarily connected ([http://www.medterms.com/script/main/art.asp?articlekey=25223 MedTerms Dictionary], Jay)<br />
<br />
'''Shine-Dalgarno sequence''' - A ribosomal binding site on an mRNA, usually a sequence of about six bases located about six or seven bases upstream of the start codon. An anti-Shine-Dalgarno sequence exists on the rRNA in the small subunit of the ribosome; when the two sequences pair, the mRNA is lined up and prepared for translation. (Lecture and [http://en.wikipedia.org/wiki/Shine-dalgarno Wikipedia article], Laura)<br><br />
Note: The Shine-Dalgarno consensus sequence for our genome is TAGGAGG.<br />
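<br />
One way to check a predicted start codon is to look for the consensus just upstream of it. The sketch below scans a window upstream of a given start position for the position that best matches TAGGAGG. The function name <code>best_sd_match</code>, the 20-base window, and the simple mismatch count are assumptions for this example, not the method used by any of the annotation systems.

```python
def best_sd_match(seq, start, motif="TAGGAGG", upstream=20):
    """Return (mismatches, position, spacing) of the best motif match
    in the `upstream` bases before `start`, or None if the window is too short."""
    seq = seq.upper()
    lo = max(0, start - upstream)
    best = None
    # The motif must end at or before the start codon.
    for i in range(lo, start - len(motif) + 1):
        window = seq[i:i + len(motif)]
        mismatches = sum(1 for a, b in zip(window, motif) if a != b)
        spacing = start - (i + len(motif))  # bases between motif and start codon
        if best is None or mismatches < best[0]:
            best = (mismatches, i, spacing)
    return best
```

A result with few mismatches and a spacing of roughly six or seven bases supports the chosen start codon; many mismatches suggest looking at alternative start sites.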
<br />
'''signal peptide''' - a short peptide chain that directs the post-translational transport of a protein [http://en.wikipedia.org/wiki/Signal_peptide] (Matt)<br />
<br />
'''Smith-Waterman alignment''' - A well-known algorithm for determining similar regions between two nucleotide or protein sequences. Instead of looking at the total sequence, the Smith-Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure [http://en.wikipedia.org/wiki/Smith_waterman](Will).<br />
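<br />
The sketch below is a compact version of the Smith-Waterman dynamic programming recurrence with a linear gap penalty. The scoring values (match 2, mismatch -1, gap -2) and the function name <code>smith_waterman</code> are assumptions chosen for illustration; protein alignments would normally use a substitution matrix such as BLOSUM62 instead.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never drop below zero, which is what makes the alignment local.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

The table has one cell per pair of positions, so time and memory grow with the product of the two sequence lengths. This is why BLAST uses faster heuristics for database searches.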
<br />
'''SNP (Single Nucleotide Polymorphism)''' - a DNA sequence variation occurring when a single nucleotide in the genome (or other shared sequence) differs between members of a species (or between paired chromosomes in an individual) [http://en.wikipedia.org/wiki/Single_nucleotide_polymorphism](Will).<br />
<br />
'''symporter''' - an integral membrane protein that is involved in movement of two or more different molecules or ions across a phospholipid membrane. [http://en.wikipedia.org/wiki/Symporter] (Peter)<br />
<br />
'''synteny''' - a neologism from the Greek for "on the same ribbon". Genes that are syntenic in one species are on the same chromosome; genes that are syntenic across species retain the same order on respective chromosomes as a result of descent from a common ancestor ([http://www.answers.com/synteny Answers.com], Jay)<br />
<br />
'''synthetase''' - a type of enzyme that creates a new covalent bond and requires direct input of energy from a high-energy phosphate. [http://books.google.com/books?id=bB8XnCykRmIC&pg=PA522&lpg=PA522&dq=%22synthetase+is+an+enzyme%22&source=web&ots=wkws4ksMsg&sig=zWLkDIk7T78hcf9S84nWs3u5Apw&hl=en&sa=X&oi=book_result&resnum=9&ct=result] (Peter)<br />
<br />
== T ==<br />
'''transferase''' - an enzyme that catalyzes the transfer of a functional group from one molecule (the donor) to another (the acceptor) [http://en.wikipedia.org/wiki/Transferase] (Matt)<br />
<br />
'''transmembrane helix''' - a single transmembrane alpha helix of a transmembrane protein, usually about twenty amino acids in length. They are usually predicted by hydrophobicity. [http://en.wikipedia.org/wiki/Transmembrane_domain](Mary)<br />
<br />
'''transposons / transposable elements''' - DNA sequences that can move around to different positions in a single cell's genome. Transposons can cause mutations and change the length of the genome. [http://en.wikipedia.org/wiki/Transposon](Samantha)<br />
<br />
'''Transposon Mutagenesis'''-a procedure in which a transposon is inserted into a gene, which inactivates the gene and can lead to the discovery of the phenotype associated with this gene ([http://cancerweb.ncl.ac.uk/cgi-bin/omd?transposon+mutagenesis transposon mutagenesis] Pallavi)<br />
<br />
'''tRNA splicing endonuclease''' - an enzyme that cleaves intervening sequences of precursor tRNA. [http://cancerweb.ncl.ac.uk/cgi-bin/omd?splicing+endonuclease] (Peter)<br><br />
<br />
== U ==<br />
<br />
== V ==<br />
<br />
== W ==<br />
<br />
'''whole genome shotgun sequencing''' - a method of sequencing where DNA is cut into small pieces and cloned into vectors, then both ends of every insert are sequenced for about 500 bp each to form mate pairs. The two reads of a mate pair rarely overlap each other, but their known spacing and orientation are used by software to reassemble the sequence. [http://en.wikipedia.org/wiki/Whole_genome_shotgun](Samantha)<br />
<br><br />
<br />
== X ==<br />
'''xenolog''' - homologs that are created by horizontal gene transfer between two different species [http://en.wikipedia.org/wiki/Xenolog#Xenology] (Matt)<br><br />
<br />
== Y ==<br />
<br />
== Z ==<br />
<br />
<BR><br />
<HR><br />
<HR><br />
<br />
== This is a list of the student-created tutorials: ==</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Halorhabdus_utahensis_Genome&diff=6601Halorhabdus utahensis Genome2008-09-25T03:01:36Z<p>Lavoss: /* R */</p>
<hr />
<div>This page will be used by Davidson College students in the [http://www.bio.davidson.edu/Courses/Bio343/LabMethods.html Genomics Laboratory course].<br />
<br />
== Links to Multiple Databases ==<br />
*[http://imgweb.jgi-psf.org/cgi-bin/img_edu_v260/main.cgi?section=TaxonDetail&page=taxonDetail&taxon_oid=2500575004 JGI IMG EDU] <br> public access <br> *[[Media:JGIAnnotation.xls|JGI Annotation Excel Spreadsheet]]<br />
*[http://www.tigr.org/tigr-scripts/prok_manatee/shared/login.cgi Manatee at JCVI] <br> use the davidson number sent by email as username and password (database is nthu01 - this is case sensitive) <br> *[[Media:ManateeAnnotation.xls|Manatee Annotation Excel Spreadsheet]]<br />
*[http://rast.nmpdr.org/ SEED view via RAST] <br> use the username and password combination sent to you by SEED <br> *[[Media:RastAnnotation.xls|RAST Annotation Excel Spreadsheet]]<br />
*[http://www.genome.jp/kegg/kaas/ KEGG]<br> We can submit our genes to KEGG to have it mapped out, but SEED and Manatee may already do this. Do we want to ask them to upload it into their database? <br />
<br><br />
<br />
== RNA Genes ==<br />
<br />
*[[tRNA Genes Check List]]<br><br />
*[[rRNA operon]]<br><br />
*[[2 misc. RNA genes]] (short summary list)<br><br />
*[[Missing tRNA-trp gene found]]<br><br />
<br />
== Other Resources ==<br />
*[[Consensus Shine Dalgarno]] Excel File for ''H. utahensis'' <br><br />
*[[References]]<br><br />
*[[Gene Annotation Template]]<br><br />
*[[General Questions]]<br><br />
*[[Page for Annotated Genes]]<br><br />
<br />
== Tutorials for Annotating Genomes ==<br />
<br />
# Will DeLoache- BioPerl Installation <br><br />
# Max Win- Introduction to Perl for non-programmers (with step-by-step explanations, simple exercises, and solutions)<br><br />
# Pallavi-Conserved Domains Database (CDD) <br><br />
# Mary- Protein Data Bank <br><br />
# Laura Voss - Pfam Database <br><br />
# Samantha Simpson - NCBI Blast (protein, nucleotide, and blast2) <br><br />
# Peter Bakke - Finding species-specific Shine-Dalgarno sequence<br><br />
<br />
== Research Questions ==<br />
#How do the three systems compare for finding ORFs and RNA genes?<br />
#Is there a pattern of missed genes for any of the 3 sites? <br />
#Do the three systems differ in their ability to find good start codons and Shine-Dalgarno sequences? [We need a standard set of genes for comparison. Only highly conserved or a range of genes?]<br />
# Were Shine-Dalgarno sequences calculated for our species or default values used? If default, what sequence?<br />
#Can we fill any holes in their automated annotation? Is there a mechanism for users to add in genes?<br />
#How do the 3 sites compare for ease of use?<br />
#What are the strengths and weaknesses of each system? What did they publish as their special features and how do we see these working?<br />
#How do the 3 sites compare for pathway detection and visualization? <br />
#Do they find the origin of replication? Can we find it? <br />
<br />
<hr><br />
<br />
== This is a list of glossary words (A - Z): ==<br />
[[#A| A ]] [[#B| B ]] [[#C| C ]] [[#D| D ]] [[#E| E ]] [[#F| F ]] [[#G| G ]] [[#H| H ]] [[#I| I ]] [[#J| J ]] [[#K| K ]] [[#L| L ]] [[#M| M ]] [[#N| N ]] [[#O| O ]] [[#P| P ]] [[#Q| Q ]] [[#R| R ]] [[#S| S ]] [[#T| T ]] [[#U| U ]] [[#V| V ]] [[#W| W ]] [[#X| X ]] [[#Y| Y ]] [[#Z| Z ]] <br />
<br />
== A ==<br />
'''Accession Number''' - a unique identifier given to DNA and protein sequences to allow for tracking of sequence information within a single database [http://en.wikipedia.org/wiki/Accession_number_(bioinformatics)] (Will).<br />
<br />
'''<i>Arabidopsis thaliana</i>''' - the scientific name for the thale cress plant; it was the first plant to have its genome sequenced, and is a model organism for understanding plant biology and genetics ([http://en.wikipedia.org/wiki/Thale_cress Wikipedia.org], Jay)<br />
<br />
== B ==<br />
'''BAC''' - <i>b</i>acterial <i>a</i>rtificial <i>c</i>hromosome, a DNA construct used for transforming or cloning segments of DNA and often used to sequence the genetic code of organisms ([http://en.wikipedia.org/wiki/Bacterial_artificial_chromosome Wikipedia.org], Jay)<br />
<br />
'''bioinformatics''' - the multi-disciplinary approach of using biology, computer science and mathematics to solve or better understand biological problems [http://en.wikipedia.org/wiki/Bioinformatics] (Matt)<br />
<br />
'''BLAST''' - (Basic Local Alignment Search Tool) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. [http://blast.ncbi.nlm.nih.gov/Blast.cgi] (Mary)<br />
<br />
'''bioperl'''- a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications such as accessing sequence data from local and remote databases, transforming formats of database, manipulating individual sequences, searching for similar sequences, searching for genes and other structures on genomic DNA, or developing a machine readable sequence annotations. [http://en.wikipedia.org/wiki/BioPerl] (Wikipedia, Max Win)<br />
<br />
== C ==<br />
'''carbon fixation''' - using carbon dioxide to create organic materials [http://en.wikipedia.org/wiki/Carbon_fixation] (Samantha)<BR><br />
<br />
'''CDD''' (Conserved Domains Database)- a database used to identify the conserved domains present in a protein query sequence [http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml] (Mary)<br />
<br />
'''chaperonin''' - a protein complex that assists some newly formed polypeptide chains by folding them into their final, functional, three-dimensional form [http://en.wikipedia.org/wiki/Chaperonins] (Matt)<br />
<br />
'''chemotaxis''' - the process in which cells seek out or flee from a high concentration of certain chemicals; it is found in both uni- and multicellular organisms. Unicellular organisms use this process to avoid toxins or find food, while multicellular organisms use it for tasks such as reproduction [http://en.wikipedia.org/wiki/Chemotaxis] (Nick)<br />
<br />
'''chemotaxonomy''' - the attempt to classify and identify organisms according to demonstrable differences and similarities in their biochemical compositions [http://en.wikipedia.org/wiki/Chemotaxonomy] (Mary)<br />
<br />
'''ClustalW''' - A web-based or command line tool that performs multiple sequence alignments to determine evolutionary relationships between three or more sequences [http://en.wikipedia.org/wiki/Clustal] (Will).<br />
<br />
'''COG''' (Cluster of Orthologous Groups)- corresponds to a highly conserved domain and generally consists of either individual proteins or groups of paralogs ([http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml COG] Pallavi) <br><br />
<br />
'''concatemer''' - long continuous DNA molecule that contains the same DNA sequence repeated in series [http://en.wikipedia.org/wiki/Concatemer](Samantha)<BR><br />
<br />
'''contigs''' (contiguous DNA)- overlapping DNA segments that as a collection form a longer and gapless segment of DNA. (Discovery Genomics, Proteomics and Bioinformatics [http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''coverage''' - refers to the number of times, on average, any piece of DNA in a sequenced genome has been individually sequenced (Lecture, Jay)<br />
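<br />
Average coverage can be estimated as the total number of bases sequenced divided by the genome size. In the sketch below, the function name <code>coverage</code> and the example read count are hypothetical; the 500 bp read length and 3.1 Mbp genome size are taken from elsewhere on this page.

```python
def coverage(num_reads, read_len, genome_size):
    """Average number of times each base of the genome has been sequenced."""
    return num_reads * read_len / genome_size


# Hypothetical example: 62,000 reads of 500 bp over a 3.1 Mbp genome.
example = coverage(62_000, 500, 3_100_000)
```

With these made-up numbers the result is 10.0, usually written as 10x coverage.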
<br />
'''CPAN (Comprehensive Perl Archive Network)''' - an archive of over 12,200 modules of software written in Perl, as well as documentation for it. It contains a module called CPAN (or CPAN.pm) which is used as an installer for Perl modules such as BioPerl [http://en.wikipedia.org/wiki/CPAN](Will).<br />
<br />
== D ==<br />
'''''de novo'' synthesis''' - the synthesis of complex molecules from simple molecules (e.g. sugars and nucleotides), rather than from recycled molecules; from the Latin "of the new" [http://en.wikipedia.org/wiki/De_novo_synthesis] (Matt)<br />
<br />
'''dehydrogenase''' - a type of enzyme that oxidizes a substrate by transferring one or more protons and a pair of electrons to an acceptor. [http://en.wikipedia.org/wiki/Dehydrogenase] (Peter)<br />
<br />
'''diatom''' - a major group of eukaryotic algae, and one of the most common types of phytoplankton. A characteristic feature of diatom cells is that they are encased within a unique cell wall made of silica called a frustule. These frustules show a wide diversity in form, but usually consist of two asymmetrical sides with a split between them. [http://en.wikipedia.org/wiki/Diatom] (Mary)<br />
<br />
'''domain (protein)''' - the structural and functional groups of a protein, which can exist independently of the protein itself. Domains typically perform a specific function, such as binding to promoters or substrates, and many proteins can have one or several domains in common. Evolutionarily-linked proteins are more likely to have domains in common. Domains are used to organize proteins into families. ([http://en.wikipedia.org/wiki/Domain_(protein) Wikipedia article], Laura)<br />
<br />
'''dot plot'''-graphical display comparing sequence conservation between two genomes with dots indicating strings of identical bases. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
== E ==<br />
<br />
'''EC number''' (Enzyme Commission Number)- a numerical classification scheme for enzymes, based on the chemical reactions they catalyze [http://en.wikipedia.org/wiki/EC_number] (Mary)<br />
<br />
'''E-value''' (Expect value)- When performing a BLAST search, you will obtain an E-value for each sequence that is retrieved. An E-value is the number of hits with a score at least this good that would be expected by chance in a database of the same size; for small values it is close to the probability that the match arose by chance. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''Extremophile''' - an organism that thrives in and may even require physically or geochemically extreme conditions that are detrimental to the majority of life on Earth [http://en.wikipedia.org/wiki/Extremophile] (Will).<br />
<br />
== F ==<br />
<br />
'''FASTA format''' - a text format used to convey either nucleic acid sequences or peptide sequences, in which nucleotides or amino acids are represented by single-letter codes. The sequence name and other descriptors precede the sequence on a header line that begins with ">". [http://en.wikipedia.org/wiki/FASTA_format] (Nick)<br><br />
<br />
'''family (protein)''' - a group of evolutionarily-related proteins, often with one or several domains in common. Families are organized by domain overlap, structural/functional similarity, and sequence similarity. ([http://en.wikipedia.org/wiki/Protein_family Wikipedia article] and lecture, Laura)<br />
<br />
'''finished genome''' - a genome that has been sequenced at least partly by hand, resulting in at least 99.99% sequence accuracy (Lecture, Jay)<br><br />
<br />
== G ==<br />
<br />
'''GC Content''' - the percentage of bases within a certain sequence of DNA (e.g. a gene or a genome) that are either guanine or cytosine; a higher GC content is characteristic of a coding region of a gene; differences in GC content between a gene and a genome can be used as evidence for horizontal gene transfer [http://en.wikipedia.org/wiki/GC-content] (Matt)<br><br />
<br />
'''GC-skew''' – uneven distribution of guanine and cytosine bases between the two strands of DNA where GC base pairs occur. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''gene amplification''' - production of multiple copies of a gene in order to amplify the amount of protein that the gene encodes for [http://www.medterms.com/script/main/art.asp?articlekey=13537] [http://www.answers.com/topic/gene-amplification] (Matt)<br />
<br />
'''gene knockout''' - a process in which a gene is deactivated within a test organism in order to better understand the function of the gene in that organism [http://en.wikipedia.org/wiki/Gene_knockout] (Matt)<br />
<br />
'''gene ontology'''- a collaborative effort of investigators to unify and standardize terms associated with the role a gene or protein plays in an organism. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''glaucophyte''' - freshwater algae that have not been studied well [http://en.wikipedia.org/wiki/Glaucophyte](Samantha)<br><br />
<br />
== H ==<br />
<br />
'''haemolysin or hemolysin''' - a substance produced by a bacterium that causes lysis of red blood cells [http://en.wikipedia.org/wiki/Hemolysis_(microbiology)] (Nick)<br />
<br />
'''halophile''' - an organism, most often of the Archaea domain, that lives in environments containing high concentrations of salt [http://en.wikipedia.org/wiki/Halophile] (Matt)<br />
<br />
'''haplotype'''-collection of alleles that travel together (Lecture, Pallavi)<br />
<br />
'''haptophyte''' - phylum of algae [http://en.wikipedia.org/wiki/Haptophyte](Samantha)<br />
<br />
'''heterokont''' - major line of eukaryotes consisting of about 10,500 known species, most of which are algae [http://en.wikipedia.org/wiki/Heterokont](Samantha)<br />
<br />
'''Hidden Markov Model''' - a statistical model used in protein recognition databases such as Pfam. A Hidden Markov Model keeps track of several variables and possible variations thereof, such as the possible amino acid sequences that make up a protein domain (since there can be some variance in an amino acid sequence) or the variations in the component sounds that make up a word, and uses those points to match a given sequence to the word, domain, or other complex sequence it most closely matches. An HMM in speech recognition software, for example, can identify that a certain set of sounds make up a certain word, even with the variations in pronunciation and accent that different people will give those sounds. ([http://en.wikipedia.org/wiki/Hidden_Markov_Model Wikipedia] and lecture, Laura) <br />
<br />
'''homeobox''' - DNA sequence within transcription factor genes that allow the cell to respond to patterns of development by having the transcription factors switch on gene cascades [http://en.wikipedia.org/wiki/Homeobox](Samantha)<br />
<br />
'''homodimer''' - a protein made of paired identical polypeptides ([http://www.answers.com/topic/homodimer Answers.com], Jay)<br />
<br />
'''horizontal gene transfer'''-DNA transmission between species and incorporation of the DNA into the recipient's genome ([http://www.csrees.usda.gov/nea/biotech/res/biotechnology_res_glossary.html horizontal gene transfer] Pallavi)<br />
<br />
'''hydrolase''' - an enzyme that catalyzes hydrolysis, the cleavage of a chemical bond by the addition of water [http://en.wikipedia.org/wiki/Hydrolase] (Nick)<br />
<br />
== I ==<br />
<br />
'''ideogram''' - in genomics, usually describes a stylized representation of a chromosome with banding patterns (Campbell-Heyer Genomics textbook, Jay)<br />
<br />
'''identities''' - in a BLAST output, the number and fraction of total residues which are identical in a given alignment [http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)<br />
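The identities figure can be recomputed from any gapped pairwise alignment. The following is a minimal sketch, assuming the two aligned strings have equal length and use "-" for gaps; the function name <code>identities</code> is hypothetical and not part of BLAST.<br />

```python
def identities(aligned_query, aligned_subject):
    """Return (identical, alignment_length, fraction) for two gapped,
    equal-length alignment strings. Gap columns never count as identical."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(
        q == s and q != "-"
        for q, s in zip(aligned_query, aligned_subject)
    )
    length = len(aligned_query)
    fraction = identical / length if length else 0.0
    return identical, length, fraction


# Example: four identical columns out of six aligned columns
count, length, fraction = identities("MKV-LA", "MRVQLA")
```

The denominator here is the full alignment length, gaps included, which is how the fraction is reported in the sketch.<br />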
<br />
'''indole'''-a chemical compound that is produced from the break down of tryptophan ([http://medical-dictionary.thefreedictionary.com/indole indole] Pallavi)<br />
<br />
'''inclusion body''' - Inclusion bodies are collections of stainable substances, usually proteins, that are found either in the nucleus or the cytoplasm. It is thought that these bodies are often the result of viral proteins that misfolded [http://en.wikipedia.org/wiki/Inclusion_body] (Nick)<br />
<br />
'''intron''' - a region of DNA in a gene that is not part of the final coding sequence for the protein. [http://en.wikipedia.org/wiki/Intron] (Peter)<br />
<br />
'''isoelectric point''' - the pH at which a molecule carries no net electrical charge [http://en.wikipedia.org/wiki/Isoelectric_point] (Nick)<br />
<br />
'''isozymes''' - members of a gene family with very similar cellular roles (Campbell-Heyer Genomics textbook, Jay)<br />
<br />
== J ==<br />
<br />
== K ==<br />
'''KEGG (Kyoto Encyclopedia of Genes and Genomes)''' - a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals. The Pathway database records networks of molecular interactions in the cells, and variants of them specific to particular organisms [http://en.wikipedia.org/wiki/KEGG](Will).<br />
<br />
'''kinase''' - a type of enzyme that transfers a phosphate group from a high-energy donor molecule to a target molecule in a process called phosphorylation. [http://en.wikipedia.org/wiki/Kinase] (Peter)<br />
<br />
== L ==<br />
<br />
== M ==<br />
'''Manatee''' - a web-based gene evaluation and genome annotation tool that can view, modify, and store annotation for prokaryotic and eukaryotic genomes. This ongoing, open-source initiative has two goals: first, to let biologists functionally annotate their genomes using a stand-alone web application backed by a relational annotation database; and second, to invite outside developers to contribute their own ideas and requirements to extend what Manatee can do [http://manatee.sourceforge.net/](Will).<br />
<br />
'''motif''' - a sequence of amino acids or nucleotides that performs a particular role and is often conserved in other species or molecules. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''mycoplasma''' - genus of bacteria that lack a cell wall [http://en.wikipedia.org/wiki/Mycoplasma] (Nick)<br />
<br />
== N ==<br />
<br />
'''NORFs''' (nonannotated open reading frame) - an open reading frame that was considered not to be a real gene when the genome was annotated. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''nucleomorph''' - reduced eukaryotic nuclei found in plastids [http://en.wikipedia.org/wiki/Nucleomorph](Samantha)<br />
<br />
== O ==<br />
'''object-oriented programming''' - a programming paradigm in which collections of data, associated with operations on that data, are modularly defined and then built upon (CSC 121 Lecture, Will). <br />
<br />
'''open reading frame (ORF)''' - a segment of DNA that can potentially encode a protein; it begins with a start codon (usually ATG) and ends with a stop codon [http://www.fao.org/DOCREP/003/X3910E/X3910E18.htm ORF] (Pallavi)<br />
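As an illustration of the ORF entry above, here is a minimal sketch that scans the three forward reading frames of a DNA string. It assumes ATG as the only start codon and ignores the reverse strand; real gene callers do considerably more. The name <code>find_orfs</code> and the <code>min_codons</code> cutoff are hypothetical choices for the example.<br />

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2, start_codon="ATG"):
    """Return sorted (start, end) index pairs of forward-strand ORFs.

    'end' is exclusive and includes the stop codon. Only the first start
    codon before each stop is reported in a given frame.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == start_codon:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Example: one short ORF (ATG AAA TAG) in the third frame
orfs = find_orfs("CCATGAAATAGCC")
```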
<br />
'''operon''' - a segment of DNA involving an operator, promoter, and one or more genes that operate as a single unit during transcription [http://en.wikipedia.org/wiki/Operon] (Nick)<br />
<br />
'''optical mapping''' - a technique that builds ordered restriction maps from single DNA molecules; the positions of restriction sites in these maps are used to correctly order DNA sequences on a chromosome. This methodology gives very detailed haplotype information and allows for the detection of sequence variations across an entire genome [http://www.geocities.com/bioinformaticsweb/genomicglossary.html optical mapping] (Pallavi)<br />
<br />
'''ortholog''' - homologous genes in different species that descend from a single gene in the last common ancestor and diverged through speciation; their sequences are typically very similar (Lecture, Pallavi)<br />
<br />
'''oxidoreductase''' - an enzyme that catalyzes redox reactions by transferring electrons from one molecule (the reductant) to another (the oxidant) [http://en.wikipedia.org/wiki/Oxidoreductase] (Nick)<br />
<br />
== P ==<br />
<br />
'''paralog''' - homologous genes within the same genome that arose by gene duplication; their sequences are similar but usually not identical (Lecture, Pallavi)<br />
<br />
'''p-arm''' - the shorter arm of a chromosome's two arms separated by the centromere (compare to q-arm, the longer arm) ([http://www.medterms.com/script/main/art.asp?articlekey=4715 MedTerms Dictionary], Jay)<br />
<br />
'''Perl''' - Developed by Larry Wall in 1987, Perl is a [http://en.wikipedia.org/wiki/High-level_programming_language high-level programming language] used frequently by biologists and bioinformaticists [http://en.wikipedia.org/wiki/Perl] (Will). <br />
<br />
'''periplasmic space''' - the space between the inner cytoplasmic membrane and external outer membrane in bacteria or archaea. [http://en.wikipedia.org/wiki/Periplasmic_space] (Peter)<br />
<br />
'''plasmid''' - an extra-chromosomal DNA molecule that is capable of replicating independently of the chromosomal DNA. Commonly found in bacteria and archaea. [http://en.wikipedia.org/wiki/Plasmid](Peter)<br />
<br />
'''plastid''' - major organelles in plants or algae [http://en.wikipedia.org/wiki/Plastid](Samantha)<br />
<br />
'''pleomorphism''' - the occurrence of two or more structural forms during a life cycle [http://en.wikipedia.org/wiki/Pleomorphism] (Mary)<br />
<br />
'''phylogenetic tree''' - a diagram showing the evolutionary relationships between biological species that are thought to share a common ancestor [http://en.wikipedia.org/wiki/Phylogenetic_tree] (Nick)<br />
<br />
'''phylotypes''' – a term intended to resolve the challenge of “species” when classifying prokaryotes using DNA sequence comparisons. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''positives''' - in a BLAST output, the number and fraction of residues for which the alignment scores have positive rather than negative values [http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)<br />
<br />
'''proteome''' - entire set of proteins expressed by a genome, cell, tissue, or organism. It may refer to expressed proteins under certain conditions [http://en.wikipedia.org/wiki/Proteome](Samantha)<br />
<br />
'''pseudogenes''' - A sequence of DNA that looks like a gene, but most likely contains many stop codons. It may have evolved away from a real gene or a paralog might have taken its place (Lecture, Pallavi)<br />
<br />
'''purine''' - a category of nitrogenous base consisting of a pyrimidine ring fused to an imidazole ring. Notable purine bases are adenine and guanine. [http://en.wikipedia.org/wiki/Purine] (Peter)<br />
<br />
'''pyrimidine''' - a category of nitrogenous base consisting of a heterocyclic aromatic ring containing two nitrogen atoms at positions 1 and 3 of the six-member ring. Notable pyrimidine bases are cytosine, thymine, and uracil. [http://en.wikipedia.org/wiki/Pyrimidine] (Peter)<br />
<br />
== Q ==<br />
<br />
'''q-arm''' - the longer arm of a chromosome's two arms separated by the centromere (compare to p-arm, the shorter arm) ([http://www.medterms.com/script/main/art.asp?articlekey=5152 MedTerms Dictionary], Jay)<br />
<br />
== R ==<br />
<br />
'''RAST''' - (Rapid Annotation using Subsystem Technology)- a fully-automated service for annotating bacterial and archaeal genomes. It provides high quality genome annotations for these genomes across the whole phylogenetic tree. ([http://rast.nmpdr.org/], Max Win)<br />
<br />
'''rDNA'''-These are DNA sequences that encode for ribosomal RNA. Note that rDNA can also stand for recombinant DNA. ([http://en.wikipedia.org/wiki/Ribosomal_DNA rDNA] Pallavi)<br />
<br />
'''residue (protein)''' - the remaining portion of an amino acid after a water molecule has been removed and it has been incorporated into a protein. Functional residues, referred to in Pfam, are the residues that perform some specific identifiable function or are part of a domain, and can be conserved across evolutionarily-related proteins. ([http://pfam.sanger.ac.uk/help Pfam Help], Laura)<br />
<br />
'''retrotransposons''' - RNA transcribed back into DNA and added into the genome [http://en.wikipedia.org/wiki/Retrotransposon](Samantha)<br />
<br />
'''ribonuclease''' - a nuclease that catalyzes the degradation of RNA into smaller components [http://en.wikipedia.org/wiki/Ribonuclease] (Mary)<br />
<br />
== S ==<br />
'''Serovar'''-a subdivision of a species based on the characteristics of their cell surface antigens ([http://www.biology-online.org/dictionary/Serovar serovar] Pallavi)<br />
<br />
'''scaffold''' - a section of a sequenced genome composed of contigs that are in the right order but not necessarily connected ([http://www.medterms.com/script/main/art.asp?articlekey=25223 MedTerms Dictionary], Jay)<br />
<br />
'''Shine-Dalgarno sequence''' - A ribosomal binding site on an mRNA, usually a sequence of about six bases located about six or seven bases upstream of the start codon. An anti-Shine-Dalgarno sequence exists on the rRNA in the small subunit of the ribosome; when the two sequences pair, the mRNA is lined up and prepared for translation. (Lecture and [http://en.wikipedia.org/wiki/Shine-dalgarno Wikipedia article], Laura)<br><br />
Note: The Shine-Dalgarno consensus sequence for our genome is TAGGAGG.<br />
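One way to look for a Shine-Dalgarno site is to slide the consensus along the region just upstream of a start codon and keep the best-matching position. The sketch below assumes the TAGGAGG consensus from the note above, a simple count of matching bases as the score, and a 20-base search window; the window size and the function name <code>best_sd_match</code> are assumptions made for illustration.<br />

```python
SD_CONSENSUS = "TAGGAGG"  # consensus quoted in the note above


def best_sd_match(dna, start_codon_pos, window=20, consensus=SD_CONSENSUS):
    """Return (matches, position) for the best consensus match that lies
    entirely within 'window' bases upstream of the start codon.

    Position is None when no candidate site shares any base with the consensus.
    """
    dna = dna.upper()
    lo = max(0, start_codon_pos - window)
    best = (0, None)
    for i in range(lo, start_codon_pos - len(consensus) + 1):
        site = dna[i:i + len(consensus)]
        matches = sum(a == b for a, b in zip(site, consensus))
        if matches > best[0]:
            best = (matches, i)
    return best


# Example: a perfect site ending six bases before the ATG at index 16
hit = best_sd_match("CCCTAGGAGGCCCCCCATGAAA", 16)
```

Counting matches treats every position equally; a position weight matrix built from the genome's own sites would be a natural refinement.<br />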
<br />
'''signal peptide''' - a short peptide chain that directs the post-translational transport of a protein [http://en.wikipedia.org/wiki/Signal_peptide] (Matt)<br />
<br />
'''Smith-Waterman alignment''' - A well-known algorithm for determining similar regions between two nucleotide or protein sequences. Instead of looking at the total sequence, the Smith-Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure [http://en.wikipedia.org/wiki/Smith_waterman](Will).<br />
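The dynamic-programming idea behind Smith-Waterman can be sketched in a few lines. This version returns only the best local alignment score, uses a linear gap penalty, and keeps two rows of the matrix at a time; the scoring values (match 2, mismatch -1, gap -2) are arbitrary assumptions for the example, not values taken from any particular tool.<br />

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Each cell holds the best score of any local alignment ending at that
    pair of positions; scores are floored at zero so an alignment can
    start anywhere.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Example: the shared segment ACG scores 3 matches x 2 = 6
score = smith_waterman_score("TTACGTT", "GGACGGG")
```

Recovering the aligned segments themselves would require keeping the full matrix and tracing back from the highest-scoring cell.<br />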
<br />
'''SNP (Single Nucleotide Polymorphism)''' - a DNA sequence variation occurring when a single nucleotide in the genome (or other shared sequence) differs between members of a species (or between paired chromosomes in an individual) [http://en.wikipedia.org/wiki/Single_nucleotide_polymorphism](Will).<br />
<br />
'''symporter''' - an integral membrane protein that is involved in movement of two or more different molecules or ions across a phospholipid membrane in the same direction. [http://en.wikipedia.org/wiki/Symporter] (Peter)<br />
<br />
'''synteny''' - a neologism from the Greek for "on the same ribbon". Genes that are syntenic in one species are on the same chromosome; genes that are syntenic across species retain the same order on respective chromosomes as a result of descent from a common ancestor ([http://www.answers.com/synteny Answers.com], Jay)<br />
<br />
'''synthetase''' - a type of enzyme that creates a new covalent bond and requires direct input of energy from a high-energy phosphate. [http://books.google.com/books?id=bB8XnCykRmIC&pg=PA522&lpg=PA522&dq=%22synthetase+is+an+enzyme%22&source=web&ots=wkws4ksMsg&sig=zWLkDIk7T78hcf9S84nWs3u5Apw&hl=en&sa=X&oi=book_result&resnum=9&ct=result] (Peter)<br />
<br />
== T ==<br />
'''transferase''' - an enzyme that catalyzes the transfer of a functional group from one molecule (the donor) to another (the acceptor) [http://en.wikipedia.org/wiki/Transferase] (Matt)<br />
<br />
'''transmembrane helix''' - a single transmembrane alpha helix of a transmembrane protein, usually about twenty amino acids in length. They are usually predicted by hydrophobicity. [http://en.wikipedia.org/wiki/Transmembrane_domain](Mary)<br />
<br />
'''transposons / transposable elements''' - DNA sequences that can move around to different positions in a single cell's genome. Transposons can cause mutations and change the length of the genome. [http://en.wikipedia.org/wiki/Transposon](Samantha)<br />
<br />
'''Transposon Mutagenesis'''-a procedure in which a transposon is inserted into a gene, which inactivates the gene and can lead to the discovery of the phenotype associated with this gene ([http://cancerweb.ncl.ac.uk/cgi-bin/omd?transposon+mutagenesis transposon mutagenesis] Pallavi)<br />
<br />
'''tRNA splicing endonuclease''' - an enzyme that cleaves intervening sequences of precursor tRNA. [http://cancerweb.ncl.ac.uk/cgi-bin/omd?splicing+endonuclease] (Peter)<br><br />
<br />
== U ==<br />
<br />
== V ==<br />
<br />
== W ==<br />
<br />
'''whole genome shotgun sequencing''' - a method of sequencing in which DNA is cut into small pieces and cloned into vectors, then about 500 bp is sequenced from each end of every insert to form mate pairs. The two reads of a mate pair rarely overlap, but their known pairing is used by software to reassemble the sequence. [http://en.wikipedia.org/wiki/Whole_genome_shotgun](Samantha)<br />
<br><br />
<br />
== X ==<br />
'''xenolog''' - homologs that are created by horizontal gene transfer between two different species [http://en.wikipedia.org/wiki/Xenolog#Xenology] (Matt)<br><br />
<br />
== Y ==<br />
<br />
== Z ==<br />
<br />
<BR><br />
<HR><br />
<HR><br />
<br />
== This is a list of the student-created tutorials: ==</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Halorhabdus_utahensis_Genome&diff=6600Halorhabdus utahensis Genome2008-09-25T02:59:46Z<p>Lavoss: /* F */</p>
<hr />
<div>This page will be used by Davidson College students in the [http://www.bio.davidson.edu/Courses/Bio343/LabMethods.html Genomics Laboratory course].<br />
<br />
== Links to Multiple Databases ==<br />
*[http://imgweb.jgi-psf.org/cgi-bin/img_edu_v260/main.cgi?section=TaxonDetail&page=taxonDetail&taxon_oid=2500575004 JGI IMG EDU] <br> public access <br> *[[Media:JGIAnnotation.xls|JGI Annotation Excel Spreadsheet]]<br />
*[http://www.tigr.org/tigr-scripts/prok_manatee/shared/login.cgi Manatee at JCVI] <br> use the davidson number sent by email as username and password (database is nthu01 - this is case sensitive) <br> *[[Media:ManateeAnnotation.xls|Manatee Annotation Excel Spreadsheet]]<br />
*[http://rast.nmpdr.org/ SEED view via RAST] <br> use the username and password combination sent to you by SEED <br> *[[Media:RastAnnotation.xls|RAST Annotation Excel Spreadsheet]]<br />
*[http://www.genome.jp/kegg/kaas/ KEGG]<br> We can submit our genes to KEGG to have it mapped out, but SEED and Manatee may already do this. Do we want to ask them to upload it into their database? <br />
<br><br />
<br />
== RNA Genes ==<br />
<br />
*[[tRNA Genes Check List]]<br><br />
*[[rRNA operon]]<br><br />
*[[2 misc. RNA genes]] (short summary list)<br><br />
*[[Missing tRNA-trp gene found]]<br><br />
<br />
== Other Resources ==<br />
*[[Consensus Shine Dalgarno]] Excel File for ''H. utahensis'' <br><br />
*[[References]]<br><br />
*[[Gene Annotation Template]]<br><br />
*[[General Questions]]<br><br />
*[[Page for Annotated Genes]]<br><br />
<br />
== Tutorials for Annotating Genomes ==<br />
<br />
# Will DeLoache- BioPerl Installation <br><br />
# Max Win- Introduction to Perl for non-programmers (with step-by-step explanations, simple exercises, and solutions)<br><br />
# Pallavi-Conserved Domains Database (CDD) <br><br />
# Mary- Protein Data Bank <br><br />
# Laura Voss - Pfam Database <br><br />
# Samantha Simpson - NCBI Blast (protein, nucleotide, and blast2) <br><br />
# Peter Bakke - Finding species-specific Shine-Dalgarno sequence<br><br />
<br />
== Research Questions ==<br />
#How do the three systems compare for finding ORFs and RNA genes?<br />
#Is there a pattern of missed genes for any of the 3 sites? <br />
#Do the three systems differ in their ability to find good start codons and Shine-Dalgarno sequences? [We need a standard set of genes for comparison. Only highly conserved or a range of genes?]<br />
# Were Shine-Dalgarno sequences calculated for our species or default values used? If default, what sequence?<br />
#Can we fill any holes in their automated annotation? Is there a mechanism for users to add in genes?<br />
#How do the 3 sites compare for ease of use?<br />
#What are the strengths and weaknesses of each system? What did they publish as their special features and how do we see these working?<br />
#How does each of the 3 sites compare for pathway detection and visualization? <br />
#Do they find the origin of replication? Can we find it? <br />
<br />
<hr><br />
<br />
== This is a list of glossary words (A - Z): ==<br />
[[#A| A ]] [[#B| B ]] [[#C| C ]] [[#D| D ]] [[#E| E ]] [[#F| F ]] [[#G| G ]] [[#H| H ]] [[#I| I ]] [[#J| J ]] [[#K| K ]] [[#L| L ]] [[#M| M ]] [[#N| N ]] [[#O| O ]] [[#P| P ]] [[#Q| Q ]] [[#R| R ]] [[#S| S ]] [[#T| T ]] [[#U| U ]] [[#V| V ]] [[#W| W ]] [[#X| X ]] [[#Y| Y ]] [[#Z| Z ]] <br />
<br />
== A ==<br />
'''Accession Number''' - a unique identifier given to DNA and protein sequences to allow for tracking of sequence information within a single database [http://en.wikipedia.org/wiki/Accession_number_(bioinformatics)] (Will).<br />
<br />
'''<i>Arabidopsis thaliana</i>''' - the scientific name for the thale cress plant; it was the first plant to have its genome sequenced, and is a model organism for understanding plant biology and genetics ([http://en.wikipedia.org/wiki/Thale_cress Wikipedia.org], Jay)<br />
<br />
== B ==<br />
'''BAC''' - <i>b</i>acterial <i>a</i>rtificial <i>c</i>hromosome, a DNA construct used for transforming or cloning segments of DNA and often used to sequence the genetic code of organisms ([http://en.wikipedia.org/wiki/Bacterial_artificial_chromosome Wikipedia.org], Jay)<br />
<br />
'''bioinformatics''' - the multi-disciplinary approach of using biology, computer science and mathematics to solve or better understand biological problems [http://en.wikipedia.org/wiki/Bioinformatics] (Matt)<br />
<br />
'''BLAST''' - (Basic Local Alignment Search Tool) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. [http://blast.ncbi.nlm.nih.gov/Blast.cgi] (Mary)<br />
<br />
'''bioperl'''- a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications such as accessing sequence data from local and remote databases, transforming formats of database, manipulating individual sequences, searching for similar sequences, searching for genes and other structures on genomic DNA, or developing a machine readable sequence annotations. [http://en.wikipedia.org/wiki/BioPerl] (Wikipedia, Max Win)<br />
<br />
== C ==<br />
'''carbon fixation''' - using carbon dioxide to create organic materials [http://en.wikipedia.org/wiki/Carbon_fixation] (Samantha)<BR><br />
<br />
'''CDD''' (Conserved Domains Database)- a database used to identify the conserved domains present in a protein query sequence [http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml] (Mary)<br />
<br />
'''chaperonin''' - a protein complex that assists some newly formed polypeptide chains by folding them into their final, functional, three-dimensional form [http://en.wikipedia.org/wiki/Chaperonins] (Matt)<br />
<br />
'''chemotaxis''' - the process by which cells move toward or away from high concentrations of certain chemicals; it is found in both uni- and multicellular organisms. Unicellular organisms use it to avoid toxins or find food, while multicellular organisms use it for tasks such as reproduction [http://en.wikipedia.org/wiki/Chemotaxis] (Nick)<br />
<br />
'''chemotaxonomy''' - the attempt to classify and identify organisms according to demonstrable differences and similarities in their biochemical compositions [http://en.wikipedia.org/wiki/Chemotaxonomy] (Mary)<br />
<br />
'''ClustalW''' - A web-based or command line tool that performs multiple sequence alignments to determine evolutionary relationships between three or more sequences [http://en.wikipedia.org/wiki/Clustal] (Will).<br />
<br />
'''COG''' (Cluster of Orthologous Groups)- corresponds to a highly conserved domain and generally consists of either individual proteins or groups of paralogs ([http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml COG] Pallavi) <br><br />
<br />
'''concatemer''' - long continuous DNA molecule that contains the same DNA sequence repeated in series [http://en.wikipedia.org/wiki/Concatemer](Samantha)<BR><br />
<br />
'''contigs''' (contiguous DNA)- overlapping DNA segments that as a collection form a longer and gapless segment of DNA. (Discovery Genomics, Proteomics and Bioinformatics [http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''coverage''' - refers to the number of times, on average, any piece of DNA in a sequenced genome has been individually sequenced (Lecture, Jay)<br />
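Average coverage follows directly from the number of reads, their length, and the genome size. The sketch below assumes every read has the same length; the read count and read length in the example are made-up illustrative values, and only the 3.1 Mbp genome size comes from this page.<br />

```python
def average_coverage(num_reads, read_length, genome_size):
    """Return the average number of times each base is sequenced,
    assuming all reads have the same length."""
    return num_reads * read_length / genome_size


# Example: 62,000 hypothetical reads of 500 bp over a 3.1 Mbp genome
fold = average_coverage(62_000, 500, 3_100_000)
```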
<br />
'''CPAN (Comprehensive Perl Archive Network)''' - an archive of over 12,200 modules of software written in Perl, as well as documentation for it. It contains a module called CPAN (or CPAN.pm) which is used as an installer for Perl modules such as BioPerl [http://en.wikipedia.org/wiki/CPAN](Will).<br />
<br />
== D ==<br />
'''''de novo'' synthesis''' - the synthesis of complex molecules (e.g. nucleotides) from simple precursor molecules, rather than from recycled molecules; from the Latin for "anew" [http://en.wikipedia.org/wiki/De_novo_synthesis] (Matt)<br />
<br />
'''dehydrogenase''' - a type of enzyme that oxidizes a substrate by transferring one or more protons and a pair of electrons to an acceptor. [http://en.wikipedia.org/wiki/Dehydrogenase] (Peter)<br />
<br />
'''diatom''' - a major group of eukaryotic algae, and one of the most common types of phytoplankton. A characteristic feature of diatom cells is that they are encased within a unique cell wall made of silica called a frustule. These frustules show a wide diversity in form, but usually consist of two asymmetrical sides with a split between them. [http://en.wikipedia.org/wiki/Diatom] (Mary)<br />
<br />
'''domain (protein)''' - the structural and functional groups of a protein, which can exist independently of the protein itself. Domains typically perform a specific function, such as binding to promoters or substrates, and many proteins can have one or several domains in common. Evolutionarily-linked proteins are more likely to have domains in common. Domains are used to organize proteins into families. ([http://en.wikipedia.org/wiki/Domain_(protein) Wikipedia article], Laura)<br />
<br />
'''dot plot'''-graphical display comparing sequence conservation between two genomes with dots indicating strings of identical bases. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
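A dot plot can be computed by recording every pair of positions where the two sequences share an identical word. This is a minimal sketch that assumes exact word matches on the forward strand only; the function name <code>dot_plot</code> and the default word length of 3 are hypothetical choices.<br />

```python
def dot_plot(seq_x, seq_y, word=3):
    """Return the set of (i, j) positions where seq_x[i:i+word]
    equals seq_y[j:j+word]; each pair is one dot on the plot."""
    index = {}
    for j in range(len(seq_y) - word + 1):
        index.setdefault(seq_y[j:j + word], []).append(j)
    return {
        (i, j)
        for i in range(len(seq_x) - word + 1)
        for j in index.get(seq_x[i:i + word], [])
    }


# Example: a sequence against itself gives dots along the main diagonal
dots = dot_plot("ACGTAC", "ACGTAC")
```

Conserved regions show up as diagonal runs of dots; longer words reduce background noise at the cost of sensitivity.<br />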
<br />
== E ==<br />
<br />
'''EC number''' (Enzyme Commission Number)- a numerical classification scheme for enzymes, based on the chemical reactions they catalyze [http://en.wikipedia.org/wiki/EC_number] (Mary)<br />
<br />
'''E-value''' (Expect value)- When performing a BLAST search, you will obtain an E-value for each sequence that is retrieved. An E-value is the number of alignments scoring at least as well as the observed one that would be expected by chance in a database of the same size; the closer it is to zero, the less likely the match is due to chance. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
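For reference, the E-value of an ungapped local alignment is usually written with the Karlin-Altschul relation below, where <i>S</i> is the raw alignment score, <i>m</i> and <i>n</i> are the query and database lengths, and <i>K</i> and λ are statistical parameters that depend on the scoring system. The second expression converts an E-value into the probability <i>P</i> of seeing at least one chance alignment with that score or better.<br />

```latex
E = K \, m \, n \, e^{-\lambda S}, \qquad P = 1 - e^{-E}
```

For small values, <i>P</i> and <i>E</i> are nearly equal, which is why the E-value is often read informally as a probability.<br />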
<br />
'''Extremophile''' - an organism that thrives in and may even require physically or geochemically extreme conditions that are detrimental to the majority of life on Earth [http://en.wikipedia.org/wiki/Extremophile] (Will).<br />
<br />
== F ==<br />
<br />
'''FASTA format''' - a format used to convey either nucleic acid sequences or peptide sequences, in which base pairs or amino acids are represented by single-letter codes. The sequence name and other descriptors often precede the amino acid sequence. [http://en.wikipedia.org/wiki/FASTA_format] (Nick)<br><br />
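A FASTA file can be read with a few lines of code: each record starts with a header line beginning with ">" and continues with one or more sequence lines. The following is a minimal sketch with a hypothetical function name; for real work a library such as BioPerl or Biopython is a better choice.<br />

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text.

    Sequence lines belonging to one record are joined; blank lines and any
    text before the first header are ignored.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Example with two made-up records
records = parse_fasta(">contig1 demo\nACGT\nTTGA\n>contig2\nGG\n")
```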
<br />
'''family (protein)''' - a group of evolutionarily-related proteins, often with one or several domains in common. Families are organized by domain overlap, structural/functional similarity, and sequence similarity. ([http://en.wikipedia.org/wiki/Protein_family Wikipedia article] and lecture, Laura)<br />
<br />
'''finished genome''' - a genome that has been sequenced at least partly by hand, resulting in at least 99.99% sequence accuracy (Lecture, Jay)<br><br />
<br />
== G ==<br />
<br />
'''GC Content''' - the percentage of bases within a certain sequence of DNA (e.g. a gene or a genome) that are either guanine or cytosine; a higher GC content is characteristic of a coding region of a gene; differences in GC content between a gene and a genome can be used as evidence for horizontal gene transfer [http://en.wikipedia.org/wiki/GC-content] (Matt)<br><br />
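GC content is simple to compute from a sequence string. This sketch assumes that only unambiguous bases (A, C, G, T) should count toward the total, so characters such as N are ignored; the function name is hypothetical.<br />

```python
def gc_content(dna):
    """Return the percentage of unambiguous bases that are G or C."""
    dna = dna.upper()
    total = sum(dna.count(base) for base in "ACGT")
    if total == 0:
        return 0.0
    return 100.0 * (dna.count("G") + dna.count("C")) / total


# Example: four of the eight bases are G or C
percent = gc_content("GGCCAATT")
```

Comparing the value for a single gene against the value for the whole genome is one way to flag candidates for horizontal gene transfer.<br />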
<br />
'''GC-skew''' – uneven distribution of guanine and cytosine bases between the two strands of DNA where GC base pairs occur. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
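GC-skew is commonly quantified on one strand as (G - C) / (G + C), computed in windows along the sequence. The sketch below uses non-overlapping windows; the default window size of 1000 bases and the function name are assumptions for illustration.<br />

```python
def gc_skew(dna, window=1000):
    """Return a list of (G - C) / (G + C) values, one per
    non-overlapping window; a trailing partial window is dropped."""
    dna = dna.upper()
    skews = []
    for start in range(0, len(dna) - window + 1, window):
        chunk = dna[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


# Example with a tiny window so the arithmetic is easy to follow
values = gc_skew("GGGCAAAACCCG", window=4)
```

In many bacterial genomes the sign of the skew changes near the origin and terminus of replication, so a plot of these values may help when looking for an origin; whether that holds for a given archaeal genome has to be checked case by case.<br />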
<br />
'''gene amplification''' - production of multiple copies of a gene in order to amplify the amount of protein that the gene encodes for [http://www.medterms.com/script/main/art.asp?articlekey=13537] [http://www.answers.com/topic/gene-amplification] (Matt)<br />
<br />
'''gene knockout''' - a process in which a gene is deactivated within a test organism in order to better understand the function of the gene in that organism [http://en.wikipedia.org/wiki/Gene_knockout] (Matt)<br />
<br />
'''gene ontology''' - a collaborative effort of investigators to unify and standardize the terms used to describe the role a gene or protein plays in an organism. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''glaucophyte''' - freshwater algae that have not been studied well [http://en.wikipedia.org/wiki/Glaucophyte](Samantha)<br><br />
<br />
== H ==<br />
<br />
'''haemolysin or hemolysin''' - a chemical produced by a bacterium that causes lysis of red blood cells [http://en.wikipedia.org/wiki/Hemolysis_(microbiology)] (Nick)<br />
<br />
'''halophile''' - an organism, most often of the Archaea domain, that lives in environments containing high concentrations of salt [http://en.wikipedia.org/wiki/Halophile] (Matt)<br />
<br />
'''haplotype''' - a set of alleles at linked loci that tend to be inherited together (Lecture, Pallavi)<br />
<br />
'''haptophyte''' - phylum of algae [http://en.wikipedia.org/wiki/Haptophyte](Samantha)<br />
<br />
'''heterokont''' - major line of eukaryotes consisting of about 10,500 known species, most of which are algae [http://en.wikipedia.org/wiki/Heterokont](Samantha)<br />
<br />
'''Hidden Markov Model''' - a statistical model used in protein recognition databases such as Pfam. A Hidden Markov Model keeps track of several variables and possible variations thereof, such as the possible amino acid sequences that make up a protein domain (since there can be some variance in an amino acid sequence) or the variations in the component sounds that make up a word, and uses those points to match a given sequence to the word, domain, or other complex sequence it most closely matches. An HMM in speech recognition software, for example, can identify that a certain set of sounds make up a certain word, even with the variations in pronunciation and accent that different people will give those sounds. ([http://en.wikipedia.org/wiki/Hidden_Markov_Model Wikipedia] and lecture, Laura) <br />
<br />
'''homeobox''' - DNA sequence within transcription factor genes that allow the cell to respond to patterns of development by having the transcription factors switch on gene cascades [http://en.wikipedia.org/wiki/Homeobox](Samantha)<br />
<br />
'''homodimer''' - a protein made of paired identical polypeptides ([http://www.answers.com/topic/homodimer Answers.com], Jay)<br />
<br />
'''horizontal gene transfer'''-DNA transmission between species and incorporation of the DNA into the recipient's genome ([http://www.csrees.usda.gov/nea/biotech/res/biotechnology_res_glossary.html horizontal gene transfer] Pallavi)<br />
<br />
'''hydrolase''' - an enzyme that catalyzes hydrolysis, the cleavage of a chemical bond by the addition of water [http://en.wikipedia.org/wiki/Hydrolase] (Nick)<br />
<br />
== I ==<br />
<br />
'''ideogram''' - in genomics, usually describes a stylized representation of a chromosome with banding patterns (Campbell-Heyer Genomics textbook, Jay)<br />
<br />
'''identities''' - in a BLAST output, the number and fraction of total residues which are identical in a given alignment [http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)<br />
<br />
'''indole'''-a chemical compound that is produced from the break down of tryptophan ([http://medical-dictionary.thefreedictionary.com/indole indole] Pallavi)<br />
<br />
'''inclusion body''' - Inclusion bodies are collections of stainable substances, usually proteins, that are found either in the nucleus or the cytoplasm. It is thought that these bodies are often the result of viral proteins that misfolded [http://en.wikipedia.org/wiki/Inclusion_body] (Nick)<br />
<br />
'''intron''' - a region of DNA in a gene that is not part of the final coding sequence for the protein. [http://en.wikipedia.org/wiki/Intron] (Peter)<br />
<br />
'''isoelectric point''' - the pH at which a molecule carries no net electrical charge [http://en.wikipedia.org/wiki/Isoelectric_point] (Nick)<br />
<br />
'''isozymes''' - members of a gene family with very similar cellular roles (Campbell-Heyer Genomics textbook, Jay)<br />
<br />
== J ==<br />
<br />
== K ==<br />
'''KEGG (Kyoto Encyclopedia of Genes and Genomes)''' - a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals. The Pathway database records networks of molecular interactions in the cells, and variants of them specific to particular organisms [http://en.wikipedia.org/wiki/KEGG](Will).<br />
<br />
'''kinase''' - a type of enzyme that transfers a phosphate group from a high-energy donor molecule to a target molecule in a process called phosphorylation. [http://en.wikipedia.org/wiki/Kinase] (Peter)<br />
<br />
== L ==<br />
<br />
== M ==<br />
'''Manatee''' - a web-based gene evaluation and genome annotation tool that can view, modify, and store annotation for prokaryotic and eukaryotic genomes. This ongoing, open-source initiative has two goals: first, to let biologists functionally annotate their genomes using a stand-alone web application backed by a relational annotation database; and second, to invite outside developers to contribute their own ideas and requirements to extend what Manatee can do [http://manatee.sourceforge.net/](Will).<br />
<br />
'''motif''' - a sequence of amino acids or nucleotides that performs a particular role and is often conserved in other species or molecules. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''mycoplasma''' - genus of bacteria that lack a cell wall [http://en.wikipedia.org/wiki/Mycoplasma] (Nick)<br />
<br />
== N ==<br />
<br />
'''NORFs''' (nonannotated open reading frames) - open reading frames that were considered not to be real genes when the genome was annotated. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''nucleomorph''' - reduced eukaryotic nuclei found in plastids [http://en.wikipedia.org/wiki/Nucleomorph](Samantha)<br />
<br />
== O ==<br />
'''object-oriented programming''' - a programming paradigm in which collections of data, associated with operations on that data, are modularly defined and then built upon (CSC 121 Lecture, Will). <br />
<br />
'''open reading frame (ORF)'''-a segment of DNA that can potentially encode a protein; it begins with a start codon (usually ATG) and continues in the same reading frame until a stop codon [http://www.fao.org/DOCREP/003/X3910E/X3910E18.htm ORF] (Pallavi)<br />
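<br />
The ORF definition can be made concrete with a small scan of the three forward reading frames. This is only a minimal sketch: the function name <code>find_orfs</code> and the <code>min_codons</code> cutoff are illustrative assumptions, only ATG is treated as a start codon (prokaryotes also use alternative starts such as GTG and TTG), and the reverse strand is ignored.<br />

```python
START = "ATG"                      # assumption: only the most common start codon
STOPS = {"TAA", "TAG", "TGA"}      # the three standard stop codons


def find_orfs(dna, min_codons=2):
    """Return sorted (start, end) positions of forward-strand ORFs.

    `start` is the index of the A in ATG; `end` is exclusive and
    includes the stop codon. `min_codons` (a hypothetical cutoff)
    counts the codons before the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence one codon at a time within this frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGGG"))
```

Real gene callers also weigh evidence such as ribosomal binding sites and codon usage, so a scan like this only lists candidates.<br />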
<br />
'''operon''' - a segment of DNA involving an operator, promoter, and one or more genes that operate as a single unit during transcription [http://en.wikipedia.org/wiki/Operon] (Nick)<br />
<br />
'''optical mapping'''-DNA sequences of the organism in question are compared against a karyotype that specifically looks at restriction sites found within the DNA to correctly order the DNA sequences on a chromosome. This methodology gives very detailed haplotype information and allows for the detection of sequence variations across an entire genome [http://www.geocities.com/bioinformaticsweb/genomicglossary.html optical mapping] (Pallavi)<br />
<br />
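'''ortholog'''-a gene in one species that is homologous to a gene in another species because both descend from a single gene in their last common ancestor; orthologs usually look very similar and often retain the same function (Lecture, Pallavi)<br />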
<br />
'''oxidoreductase''' - an enzyme that catalyzes redox reactions by transferring electrons from one molecule (the reductant) to another (the oxidant) [http://en.wikipedia.org/wiki/Oxidoreductase] (Nick)<br />
<br />
== P ==<br />
<br />
'''paralog'''-homologous DNA sequences within the same genome that arose by gene duplication; they are similar but not necessarily identical (Lecture, Pallavi)<br />
<br />
'''p-arm''' - the shorter arm of a chromosome's two arms separated by the centromere (compare to q-arm, the longer arm) ([http://www.medterms.com/script/main/art.asp?articlekey=4715 MedTerms Dictionary], Jay)<br />
<br />
'''Perl''' - Developed by Larry Wall in 1987, Perl is a [http://en.wikipedia.org/wiki/High-level_programming_language high-level programming language] used frequently by biologists and bioinformaticists [http://en.wikipedia.org/wiki/Perl] (Will). <br />
<br />
'''periplasmic space''' - the space between the inner cytoplasmic membrane and external outer membrane in bacteria or archaea. [http://en.wikipedia.org/wiki/Periplasmic_space] (Peter)<br />
<br />
'''plasmid''' - an extra-chromosomal DNA molecule that is capable of replicating independently of the chromosomal DNA. Commonly found in bacteria and archaea. [http://en.wikipedia.org/wiki/Plasmid](Peter)<br />
<br />
'''plastid''' - major organelles in plants or algae [http://en.wikipedia.org/wiki/Plastid](Samantha)<br />
<br />
'''pleomorphism''' - the occurrence of two or more structural forms during a life cycle [http://en.wikipedia.org/wiki/Pleomorphism] (Mary)<br />
<br />
'''phylogenetic tree''' - a diagram showing the evolutionary relationships between biological species that are thought to share a common ancestor [http://en.wikipedia.org/wiki/Phylogenetic_tree] (Nick)<br />
<br />
'''phylotypes''' – a term intended to resolve the challenge of “species” when classifying prokaryotes using DNA sequence comparisons. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''positives''' - in a BLAST output, the number and fraction of residues for which the alignment scores have positive rather than negative values [http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)<br />
<br />
'''proteome''' - entire set of proteins expressed by a genome, cell, tissue, or organism. It may refer to expressed proteins under certain conditions [http://en.wikipedia.org/wiki/Proteome](Samantha)<br />
<br />
'''pseudogenes'''-sequences of DNA that look like genes but most likely contain many stop codons and no longer encode a functional product. A pseudogene may have evolved away from a real gene, or a paralog might have taken its place (Lecture, Pallavi)<br />
<br />
'''purine''' - a category of nitrogenous base consisting of a pyrimidine ring fused to an imidazole ring. Notable purine bases are adenine and guanine. [http://en.wikipedia.org/wiki/Purine] (Peter)<br />
<br />
'''pyrimidine''' - a category of nitrogenous base consisting of a heterocyclic aromatic ring containing two nitrogen atoms at positions 1 and 3 of the six-member ring. Notable pyrimidine bases are cytosine, thymine, and uracil. [http://en.wikipedia.org/wiki/Pyrimidine] (Peter)<br />
<br />
== Q ==<br />
<br />
'''q-arm''' - the longer arm of a chromosome's two arms separated by the centromere (compare to p-arm, the shorter arm) ([http://www.medterms.com/script/main/art.asp?articlekey=5152 MedTerms Dictionary], Jay)<br />
<br />
== R ==<br />
<br />
'''RAST''' - (Rapid Annotation using Subsystem Technology)- a fully-automated service for annotating bacterial and archaeal genomes. It provides high quality genome annotations for these genomes across the whole phylogenetic tree. ([http://rast.nmpdr.org/], Max Win)<br />
<br />
'''rDNA'''-These are DNA sequences that encode for ribosomal RNA. Note that rDNA can also stand for recombinant DNA. ([http://en.wikipedia.org/wiki/Ribosomal_DNA rDNA] Pallavi)<br />
<br />
'''retrotransposons''' - transposable elements that copy themselves through an RNA intermediate, which is reverse-transcribed back into DNA and inserted into the genome [http://en.wikipedia.org/wiki/Retrotransposon](Samantha)<br />
<br />
'''ribonuclease''' - a nuclease that catalyzes the degradation of RNA into smaller components [http://en.wikipedia.org/wiki/Ribonuclease] (Mary)<br />
<br />
== S ==<br />
'''Serovar'''-a subdivision of a species based on the characteristics of their cell surface antigens ([http://www.biology-online.org/dictionary/Serovar serovar] Pallavi)<br />
<br />
'''scaffold''' - a section of a sequenced genome composed of contigs that are in the right order but not necessarily connected ([http://www.medterms.com/script/main/art.asp?articlekey=25223 MedTerms Dictionary], Jay)<br />
<br />
'''Shine-Dalgarno sequence''' - A ribosomal binding site on an mRNA, usually a sequence of about six bases located six or seven bases upstream of the start codon. An anti-Shine-Dalgarno sequence exists on the rRNA in the small subunit of the ribosome; when the two sequences pair, the mRNA is lined up and prepared for translation. (Lecture and [http://en.wikipedia.org/wiki/Shine-dalgarno Wikipedia article], Laura)<br><br />
Note: The Shine-Dalgarno consensus sequence for our genome is TAGGAGG.<br />
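<br />
One way to check a candidate start codon against the consensus above is to slide the consensus along the bases just upstream of it and keep the best-matching window. The sketch below assumes the upstream region has already been cut out as a string; the function name <code>best_sd_match</code> and the simple match-count score are illustrative assumptions, not the method used by any of the annotation sites.<br />

```python
def best_sd_match(upstream, consensus="TAGGAGG"):
    """Return (matches, offset) for the window most similar to the consensus.

    `upstream` is the DNA immediately 5' of a start codon. `matches` is
    the number of positions identical to the consensus; `offset` is the
    window's start index, or -1 if no position matched at all.
    """
    upstream = upstream.upper()
    best = (0, -1)
    for i in range(len(upstream) - len(consensus) + 1):
        window = upstream[i:i + len(consensus)]
        matches = sum(a == b for a, b in zip(window, consensus))
        if matches > best[0]:      # keep the first window with the top score
            best = (matches, i)
    return best


if __name__ == "__main__":
    # Hypothetical upstream region containing a perfect match at offset 2.
    print(best_sd_match("CCTAGGAGGACATTC"))
```

A perfect match is rare in practice, so a cutoff on the number of matching positions would have to be chosen by hand.<br />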
<br />
'''signal peptide''' - a short peptide chain that directs the post-translational transport of a protein [http://en.wikipedia.org/wiki/Signal_peptide] (Matt)<br />
<br />
'''Smith-Waterman alignment''' - A well-known algorithm for determining similar regions between two nucleotide or protein sequences. Instead of looking at the total sequence, the Smith-Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure [http://en.wikipedia.org/wiki/Smith_waterman](Will).<br />
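<br />
The core of Smith-Waterman is a dynamic-programming table in which every cell holds the best score of a local alignment ending at that pair of positions, and no cell is allowed to drop below zero. The sketch below computes only the best score, not the alignment itself; the scoring values (match 2, mismatch -1, gap -2) are arbitrary assumptions chosen for illustration, whereas real tools use substitution matrices and affine gap penalties.<br />

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            # The floor of 0 lets an alignment restart anywhere,
            # which is what makes the alignment local.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTTACGTTT", "GGACGGG"))
```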
<br />
'''SNP (Single Nucleotide Polymorphism)''' - a DNA sequence variation occurring when a single nucleotide in the genome (or other shared sequence) differs between members of a species (or between paired chromosomes in an individual) [http://en.wikipedia.org/wiki/Single_nucleotide_polymorphism](Will).<br />
<br />
'''symporter''' - an integral membrane protein that moves two or more different molecules or ions across a phospholipid membrane in the same direction. [http://en.wikipedia.org/wiki/Symporter] (Peter)<br />
<br />
'''synteny''' - a neologism from the Greek for "on the same ribbon". Genes that are syntenic in one species are on the same chromosome; genes that are syntenic across species retain the same order on respective chromosomes as a result of descent from a common ancestor ([http://www.answers.com/synteny Answers.com], Jay)<br />
<br />
'''synthetase''' - a type of enzyme that creates a new covalent bond and requires direct input of energy from a high-energy phosphate. [http://books.google.com/books?id=bB8XnCykRmIC&pg=PA522&lpg=PA522&dq=%22synthetase+is+an+enzyme%22&source=web&ots=wkws4ksMsg&sig=zWLkDIk7T78hcf9S84nWs3u5Apw&hl=en&sa=X&oi=book_result&resnum=9&ct=result] (Peter)<br />
<br />
== T ==<br />
'''transferase''' - an enzyme that catalyzes the transfer of a functional group from one molecule (the donor) to another (the acceptor) [http://en.wikipedia.org/wiki/Transferase] (Matt)<br />
<br />
'''transmembrane helix''' - a single transmembrane alpha helix of a transmembrane protein, usually about twenty amino acids in length. They are usually predicted by hydrophobicity. [http://en.wikipedia.org/wiki/Transmembrane_domain](Mary)<br />
<br />
'''transposons / transposable elements''' - DNA sequences that can move around to different positions in a single cell's genome. Transposons can cause mutations and change the length of the genome. [http://en.wikipedia.org/wiki/Transposon](Samantha)<br />
<br />
'''Transposon Mutagenesis'''-a procedure in which a transposon is inserted into a gene, which inactivates the gene and can lead to the discovery of the phenotype associated with this gene ([http://cancerweb.ncl.ac.uk/cgi-bin/omd?transposon+mutagenesis transposon mutagenesis] Pallavi)<br />
<br />
'''tRNA splicing endonuclease''' - an enzyme that cleaves intervening sequences of precursor tRNA. [http://cancerweb.ncl.ac.uk/cgi-bin/omd?splicing+endonuclease] (Peter)<br><br />
<br />
== U ==<br />
<br />
== V ==<br />
<br />
== W ==<br />
<br />
'''whole genome shotgun sequencing''' - a method of sequencing where DNA is cut into small pieces and cloned into vectors, then about 500 bp is sequenced from each end of every cloned insert to form mate pairs. Mate pairs rarely overlap, but are used to reassemble the sequence using software. [http://en.wikipedia.org/wiki/Whole_genome_shotgun](Samantha)<br />
<br><br />
<br />
== X ==<br />
'''xenolog''' - homologs that are created by horizontal gene transfer between two different species [http://en.wikipedia.org/wiki/Xenolog#Xenology] (Matt)<br><br />
<br />
== Y ==<br />
<br />
== Z ==<br />
<br />
<BR><br />
<HR><br />
<HR><br />
<br />
== This is a list of the student-created tutorials: ==</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Halorhabdus_utahensis_Genome&diff=6599Halorhabdus utahensis Genome2008-09-25T02:57:09Z<p>Lavoss: /* D */</p>
<hr />
<div>This page will be used by Davidson College students in the [http://www.bio.davidson.edu/Courses/Bio343/LabMethods.html Genomics Laboratory course].<br />
<br />
== Links to Multiple Databases ==<br />
*[http://imgweb.jgi-psf.org/cgi-bin/img_edu_v260/main.cgi?section=TaxonDetail&page=taxonDetail&taxon_oid=2500575004 JGI IMG EDU] <br> public access <br> *[[Media:JGIAnnotation.xls|JGI Annotation Excel Spreadsheet]]<br />
*[http://www.tigr.org/tigr-scripts/prok_manatee/shared/login.cgi Manatee at JCVI] <br> use the davidson number sent by email as username and password (database is nthu01 - this is case sensitive) <br> *[[Media:ManateeAnnotation.xls|Manatee Annotation Excel Spreadsheet]]<br />
*[http://rast.nmpdr.org/ SEED view via RAST] <br> use the username and password combination sent to you by SEED <br> *[[Media:RastAnnotation.xls|RAST Annotation Excel Spreadsheet]]<br />
*[http://www.genome.jp/kegg/kaas/ KEGG]<br> We can submit our genes to KEGG to have it mapped out, but SEED and Manatee may already do this. Do we want to ask them to upload it into their database? <br />
<br><br />
<br />
== RNA Genes ==<br />
<br />
*[[tRNA Genes Check List]]<br><br />
*[[rRNA operon]]<br><br />
*[[2 misc. RNA genes]] (short summary list)<br><br />
*[[Missing tRNA-trp gene found]]<br><br />
<br />
== Other Resources ==<br />
*[[Consensus Shine Dalgarno]] Excel File for ''H. utahensis'' <br><br />
*[[References]]<br><br />
*[[Gene Annotation Template]]<br><br />
*[[General Questions]]<br><br />
*[[Page for Annotated Genes]]<br><br />
<br />
== Tutorials for Annotating Genomes ==<br />
<br />
# Will DeLoache- BioPerl Installation <br><br />
# Max Win- Introduction to Perl for non-programmers (with step-by-step explanations, simple exercises, and solutions)<br><br />
# Pallavi-Conserved Domains Database (CDD) <br><br />
# Mary- Protein Data Bank <br><br />
# Laura Voss - Pfam Database <br><br />
# Samantha Simpson - NCBI Blast (protein, nucleotide, and blast2) <br><br />
# Peter Bakke - Finding species-specific Shine-Dalgarno sequence<br><br />
<br />
== Research Questions ==<br />
#How do the three systems compare for finding ORFs and RNA genes?<br />
#Is there a pattern of missed genes for any of the 3 sites? <br />
#Do the three systems differ in their ability to find good start codons and Shine-Dalgarno sequences? [We need a standard set of genes for comparison. Only highly conserved or a range of genes?]<br />
# Were Shine-Dalgarno sequences calculated for our species or default values used? If default, what sequence?<br />
#Can we fill any holes in their automated annotation? Is there a mechanism for users to add in genes?<br />
#How do the 3 sites compare for ease of use?<br />
#What are the strengths and weaknesses of each system? What did they publish as their special features and how do we see these working?<br />
#How does each of the 3 sites compare for pathway detection and visualization? <br />
#Do they find the origin of replication? Can we find it? <br />
<br />
<hr><br />
<br />
== This is a list of glossary words (A - Z): ==<br />
[[#A| A ]] [[#B| B ]] [[#C| C ]] [[#D| D ]] [[#E| E ]] [[#F| F ]] [[#G| G ]] [[#H| H ]] [[#I| I ]] [[#J| J ]] [[#K| K ]] [[#L| L ]] [[#M| M ]] [[#N| N ]] [[#O| O ]] [[#P| P ]] [[#Q| Q ]] [[#R| R ]] [[#S| S ]] [[#T| T ]] [[#U| U ]] [[#V| V ]] [[#W| W ]] [[#X| X ]] [[#Y| Y ]] [[#Z| Z ]] <br />
<br />
== A ==<br />
'''Accession Number''' - a unique identifier given to DNA and protein sequences to allow for tracking of sequence information within a single database [http://en.wikipedia.org/wiki/Accession_number_(bioinformatics)] (Will).<br />
<br />
'''<i>Arabidopsis thaliana</i>''' - the scientific name for the thale cress plant; it was the first plant to have its genome sequenced, and is a model organism for understanding plant biology and genetics ([http://en.wikipedia.org/wiki/Thale_cress Wikipedia.org], Jay)<br />
<br />
== B ==<br />
'''BAC''' - <i>b</i>acterial <i>a</i>rtificial <i>c</i>hromosome, a DNA construct used for transforming or cloning segments of DNA and often used to sequence the genetic code of organisms ([http://en.wikipedia.org/wiki/Bacterial_artificial_chromosome Wikipedia.org], Jay)<br />
<br />
'''bioinformatics''' - the multi-disciplinary approach of using biology, computer science and mathematics to solve or better understand biological problems [http://en.wikipedia.org/wiki/Bioinformatics] (Matt)<br />
<br />
'''BLAST''' - (Basic Local Alignment Search Tool) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. [http://blast.ncbi.nlm.nih.gov/Blast.cgi] (Mary)<br />
<br />
'''bioperl'''- a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications such as accessing sequence data from local and remote databases, converting between database formats, manipulating individual sequences, searching for similar sequences, searching for genes and other structures on genomic DNA, or developing machine-readable sequence annotations. [http://en.wikipedia.org/wiki/BioPerl] (Wikipedia, Max Win)<br />
<br />
== C ==<br />
'''carbon fixation''' - using carbon dioxide to create organic materials [http://en.wikipedia.org/wiki/Carbon_fixation] (Samantha)<BR><br />
<br />
'''CDD''' (Conserved Domains Database)- a database used to identify the conserved domains present in a protein query sequence [http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml] (Mary)<br />
<br />
'''chaperonin''' - a protein complex that assists some newly formed polypeptide chains by folding them into their final, functional, three-dimensional form [http://en.wikipedia.org/wiki/Chaperonins] (Matt)<br />
<br />
'''chemotaxis''' - the process in which cells seek out or flee from a high concentration of certain chemicals; it is found in both uni- and multicellular organisms. Unicellular organisms use it to avoid toxins or find food, and multicellular organisms use it for tasks such as reproduction [http://en.wikipedia.org/wiki/Chemotaxis] (Nick)<br />
<br />
'''chemotaxonomy''' - the attempt to classify and identify organisms according to demonstrable differences and similarities in their biochemical compositions [http://en.wikipedia.org/wiki/Chemotaxonomy] (Mary)<br />
<br />
'''ClustalW''' - A web-based or command line tool that performs multiple sequence alignments to determine evolutionary relationships between three or more sequences [http://en.wikipedia.org/wiki/Clustal] (Will).<br />
<br />
'''COG''' (Clusters of Orthologous Groups)- corresponds to a highly conserved domain and generally consists of either individual proteins or groups of paralogs ([http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml COG] Pallavi) <br><br />
<br />
'''concatemer''' - long continuous DNA molecule that contains the same DNA sequence repeated in series [http://en.wikipedia.org/wiki/Concatemer](Samantha)<BR><br />
<br />
'''contigs''' (contiguous DNA)- overlapping DNA segments that as a collection form a longer and gapless segment of DNA. (Discovery Genomics, Proteomics and Bioinformatics [http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''coverage''' - refers to the number of times, on average, any piece of DNA in a sequenced genome has been individually sequenced (Lecture, Jay)<br />
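<br />
Average coverage is commonly estimated as the total number of sequenced bases divided by the genome size. The sketch below uses that estimate; the read count and read length in the example are made-up numbers, and only the roughly 3.1 Mbp genome size comes from this page.<br />

```python
def average_coverage(num_reads, read_length, genome_size):
    """Estimate average coverage as (reads x read length) / genome size."""
    return num_reads * read_length / genome_size


if __name__ == "__main__":
    # Hypothetical run: 50,000 reads of 500 bp against a 3.1 Mbp genome.
    print(round(average_coverage(50_000, 500, 3_100_000), 2))
```

Because reads land unevenly, some regions will be sequenced far more or far less often than this average suggests.<br />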
<br />
'''CPAN (Comprehensive Perl Archive Network)''' - an archive of over 12,200 modules of software written in Perl, as well as documentation for it. It contains a module called CPAN (or CPAN.pm) which is used as an installer for Perl modules such as BioPerl [http://en.wikipedia.org/wiki/CPAN](Will).<br />
<br />
== D ==<br />
'''''de novo'' synthesis''' - the synthesis of complex molecules from simple molecules (e.g. sugars and nucleotides), rather than from recycled molecules; from the Latin for "anew" (literally "from the new") [http://en.wikipedia.org/wiki/De_novo_synthesis] (Matt)<br />
<br />
'''dehydrogenase''' - a type of enzyme that oxidizes a substrate by transferring one or more protons and a pair of electrons to an acceptor. [http://en.wikipedia.org/wiki/Dehydrogenase] (Peter)<br />
<br />
'''diatom''' - a major group of eukaryotic algae, and one of the most common types of phytoplankton. A characteristic feature of diatom cells is that they are encased within a unique cell wall made of silica called a frustule. These frustules show a wide diversity in form, but usually consist of two asymmetrical sides with a split between them. [http://en.wikipedia.org/wiki/Diatom] (Mary)<br />
<br />
'''domain (protein)''' - the structural and functional groups of a protein, which can exist independently of the protein itself. Domains typically perform a specific function, such as binding to promoters or substrates, and many proteins can have one or several domains in common. Evolutionarily-linked proteins are more likely to have domains in common. Domains are used to organize proteins into families. ([http://en.wikipedia.org/wiki/Domain_(protein) Wikipedia article], Laura)<br />
<br />
'''dot plot'''-graphical display comparing sequence conservation between two genomes with dots indicating strings of identical bases. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
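<br />
A dot plot can be sketched in a few lines by comparing every short word in one sequence with every short word in the other. The function name <code>dot_plot</code>, the word length of 3, and the text rendering with "*" and "." are all assumptions made for illustration; genome-scale tools use much longer words and indexed lookups rather than this all-against-all loop.<br />

```python
def dot_plot(seq_a, seq_b, word=3):
    """Return one string per position in seq_a; '*' marks identical words.

    Row i, column j is '*' when seq_a[i:i+word] equals seq_b[j:j+word].
    """
    rows = []
    for i in range(len(seq_a) - word + 1):
        row = ""
        for j in range(len(seq_b) - word + 1):
            row += "*" if seq_a[i:i + word] == seq_b[j:j + word] else "."
        rows.append(row)
    return rows


if __name__ == "__main__":
    for line in dot_plot("ACGTAC", "ACGTAC"):
        print(line)
```

Comparing a sequence with itself gives an unbroken diagonal; conserved regions between two different genomes show up as shorter diagonal runs.<br />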
<br />
== E ==<br />
<br />
'''EC number''' (Enzyme Commission Number)- a numerical classification scheme for enzymes, based on the chemical reactions they catalyze [http://en.wikipedia.org/wiki/EC_number] (Mary)<br />
<br />
'''E-value''' (Expect value)- When performing a BLAST search, you will obtain an E-value for each sequence that is retrieved. An E-value is the number of hits with an equal or better score that you would expect to find by chance in a database of that size; the closer it is to zero, the less likely it is that the similarity between the two sequences is due to chance. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''Extremophile''' - an organism that thrives in and may even require physically or geochemically extreme conditions that are detrimental to the majority of life on Earth [http://en.wikipedia.org/wiki/Extremophile] (Will).<br />
<br />
== F ==<br />
<br />
'''FASTA format''' - a format used to convey either nucleic acid sequences or peptide sequences, in which nucleotides or amino acids are represented by single-letter codes. A header line that begins with ">" and gives the sequence name and other descriptors precedes the sequence. [http://en.wikipedia.org/wiki/FASTA_format] (Nick)<br><br />
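<br />
Because a FASTA file is just header lines starting with ">" followed by sequence lines, it can be read with a short loop. The sketch below is a minimal reader written for illustration (BioPerl and similar libraries provide more robust parsers); the function name <code>parse_fasta</code> is an assumption, and the record name is taken to be the first word of the header.<br />

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dictionary."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)    # sequence may span many lines
    return {n: "".join(parts) for n, parts in records.items()}


if __name__ == "__main__":
    example = ">contig1 example record\nACGT\nGGCC\n>contig2\nTTAA\n"
    print(parse_fasta(example))
```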
<br />
'''finished genome''' - a genome that has been sequenced at least partly by hand, resulting in at least 99.99% sequence accuracy (Lecture, Jay)<br><br />
<br />
== G ==<br />
<br />
'''GC Content''' - the percentage of bases within a certain sequence of DNA (e.g. a gene or a genome) that are either guanine or cytosine; a higher GC content is characteristic of a coding region of a gene; differences in GC content between a gene and a genome can be used as evidence for horizontal gene transfer [http://en.wikipedia.org/wiki/GC-content] (Matt)<br><br />
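<br />
GC content is a simple count, as the following sketch shows. The function name <code>gc_content</code> is an assumption, and ambiguous bases such as N are counted in the sequence length but not as G or C.<br />

```python
def gc_content(seq):
    """Return the percentage of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0                 # avoid dividing by zero on empty input
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


if __name__ == "__main__":
    print(gc_content("GGCCAATT"))
```

Running the same function on a single gene and on the whole genome gives the two numbers needed for the horizontal-transfer comparison described above.<br />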
<br />
'''GC-skew''' – uneven distribution of guanine and cytosine bases between the two strands of DNA where GC base pairs occur. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
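<br />
GC skew is usually computed along one strand in windows as (G - C) / (G + C). The sketch below follows that formula with non-overlapping windows; the function name <code>gc_skew</code> and the tiny default window are assumptions for illustration, and real analyses use windows of thousands of bases. In many bacterial genomes the skew changes sign near the origin of replication, but whether that holds for our genome would need to be checked.<br />

```python
def gc_skew(seq, window=4):
    """Return (G - C) / (G + C) for each non-overlapping window of seq."""
    seq = seq.upper()
    skews = []
    for i in range(0, len(seq) - window + 1, window):
        chunk = seq[i:i + window]
        g, c = chunk.count("G"), chunk.count("C")
        # A window with no G or C has no defined skew; report 0.0.
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


if __name__ == "__main__":
    print(gc_skew("GGGCCCCGATAT", window=4))
```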
<br />
'''gene amplification''' - production of multiple copies of a gene in order to amplify the amount of protein that the gene encodes for [http://www.medterms.com/script/main/art.asp?articlekey=13537] [http://www.answers.com/topic/gene-amplification] (Matt)<br />
<br />
'''gene knockout''' - a process in which a gene is deactivated within a test organism in order to better understand the function of the gene in that organism [http://en.wikipedia.org/wiki/Gene_knockout] (Matt)<br />
<br />
'''gene ontology'''- a collaborative effort of investigators to unify and standardize terms associated with the role a gene or protein plays in an organism. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''glaucophyte''' - freshwater algae that have not been studied well [http://en.wikipedia.org/wiki/Glaucophyte](Samantha)<br><br />
<br />
== H ==<br />
<br />
'''haemolysin or hemolysin''' - a chemical produced by bacteria that causes lysis of red blood cells [http://en.wikipedia.org/wiki/Hemolysis_(microbiology)] (Nick)<br />
<br />
'''halophile''' - an organism, most often of the Archaea domain, that lives in environments containing high concentrations of salt [http://en.wikipedia.org/wiki/Halophile] (Matt)<br />
<br />
'''haplotype'''-collection of alleles that travel together (Lecture, Pallavi)<br />
<br />
'''haptophyte''' - phylum of algae [http://en.wikipedia.org/wiki/Haptophyte](Samantha)<br />
<br />
'''heterokont''' - major line of eukaryotes consisting of about 10,500 known species, most of which are algae [http://en.wikipedia.org/wiki/Heterokont](Samantha)<br />
<br />
'''Hidden Markov Model''' - a statistical model used in protein recognition databases such as Pfam. A Hidden Markov Model keeps track of several variables and possible variations thereof, such as the possible amino acid sequences that make up a protein domain (since there can be some variance in an amino acid sequence) or the variations in the component sounds that make up a word, and uses those points to match a given sequence to the word, domain, or other complex sequence it most closely matches. An HMM in speech recognition software, for example, can identify that a certain set of sounds make up a certain word, even with the variations in pronunciation and accent that different people will give those sounds. ([http://en.wikipedia.org/wiki/Hidden_Markov_Model Wikipedia] and lecture, Laura) <br />
<br />
'''homeobox''' - DNA sequence within transcription factor genes that allow the cell to respond to patterns of development by having the transcription factors switch on gene cascades [http://en.wikipedia.org/wiki/Homeobox](Samantha)<br />
<br />
'''homodimer''' - a protein made of paired identical polypeptides ([http://www.answers.com/topic/homodimer Answers.com], Jay)<br />
<br />
'''horizontal gene transfer'''-DNA transmission between species and incorporation of the DNA into the recipient's genome ([http://www.csrees.usda.gov/nea/biotech/res/biotechnology_res_glossary.html horizontal gene transfer] Pallavi)<br />
<br />
'''hydrolase''' - an enzyme that catalyzes hydrolysis, the cleavage of a chemical bond in a substrate by the addition of a water molecule [http://en.wikipedia.org/wiki/Hydrolase] (Nick)<br />
<br />
== I ==<br />
<br />
'''ideogram''' - in genomics, usually describes a stylized representation of a chromosome with banding patterns (Campbell-Heyer Genomics textbook, Jay)<br />
<br />
'''identities''' - in a BLAST output, the number and fraction of total residues which are identical in a given alignment [http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)<br />
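<br />
The identities figure can be reproduced by hand from the two aligned strings that BLAST prints. The sketch below assumes both strings are the same length and use "-" for gaps; the function name <code>identities</code> and the example peptides are made up for illustration.<br />

```python
def identities(aligned_a, aligned_b):
    """Return (identical positions, alignment length) for two aligned strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be the same length")
    # A gap never counts as an identity, even opposite another gap.
    same = sum(a == b and a != "-" for a, b in zip(aligned_a, aligned_b))
    return same, len(aligned_a)


if __name__ == "__main__":
    same, length = identities("MKT-LV", "MRTALV")
    print(f"Identities = {same}/{length} ({100 * same // length}%)")
```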
<br />
'''indole'''-a chemical compound that is produced from the breakdown of tryptophan ([http://medical-dictionary.thefreedictionary.com/indole indole] Pallavi)<br />
<br />
'''inclusion body''' - Inclusion bodies are collections of stainable substances, usually proteins, that are found either in the nucleus or the cytoplasm. It is thought that these bodies are often the result of viral proteins that misfolded [http://en.wikipedia.org/wiki/Inclusion_body] (Nick)<br />
<br />
'''intron''' - a region of DNA in a gene that is not part of the final coding sequence for the protein. [http://en.wikipedia.org/wiki/Intron] (Peter)<br />
<br />
'''isoelectric point''' - the pH at which a molecule carries no net electrical charge [http://en.wikipedia.org/wiki/Isoelectric_point] (Nick)<br />
<br />
'''isozymes''' - members of a gene family with very similar cellular roles (Campbell-Heyer Genomics textbook, Jay)<br />
<br />
== J ==<br />
<br />
== K ==<br />
'''KEGG (Kyoto Encyclopedia of Genes and Genomes)''' - a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals. The Pathway database records networks of molecular interactions in the cells, and variants of them specific to particular organisms [http://en.wikipedia.org/wiki/KEGG](Will).<br />
<br />
'''kinase''' - a type of enzyme that transfers a phosphate group from a high-energy donor molecule to a target molecule in a process called phosphorylation. [http://en.wikipedia.org/wiki/Kinase] (Peter)<br />
<br />
== L ==<br />
<br />
== M ==<br />
'''Manatee''' - a web-based gene evaluation and genome annotation tool that can view, modify, and store annotation for prokaryotic and eukaryotic genomes. This on-going, open source initiative was developed with two missions. One, to allow biologists the ability to functionally annotate their genomes using a powerful, stand-alone web application with a robustly designed relational annotation database. And secondly, to invite outside developers the opportunity to contribute their own ideas and requirements to enhance Manatee's ability to accomplish biological goals [http://manatee.sourceforge.net/](Will).<br />
<br />
'''motif''' - a sequence of amino acids or nucleotides that performs a particular role and is often conserved in other species or molecules. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''mycoplasma''' - genus of bacteria that lack a cell wall [http://en.wikipedia.org/wiki/Mycoplasma] (Nick)<br />
<br />
== N ==<br />
<br />
'''NORFs''' (nonannotated open reading frames) - open reading frames that were considered not to be real genes when the genome was annotated. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''nucleomorph''' - reduced eukaryotic nuclei found in plastids [http://en.wikipedia.org/wiki/Nucleomorph](Samantha)<br />
<br />
== O ==<br />
'''object-oriented programming''' - a programming paradigm in which collections of data, associated with operations on that data, are modularly defined and then built upon (CSC 121 Lecture, Will). <br />
<br />
'''open reading frame (ORF)'''-a segment of DNA that can potentially encode a protein; it begins with a start codon (usually ATG) and continues in the same reading frame until a stop codon [http://www.fao.org/DOCREP/003/X3910E/X3910E18.htm ORF] (Pallavi)<br />
<br />
'''operon''' - a segment of DNA involving an operator, promoter, and one or more genes that operate as a single unit during transcription [http://en.wikipedia.org/wiki/Operon] (Nick)<br />
<br />
'''optical mapping'''-DNA sequences of the organism in question are compared against a karyotype that specifically looks at restriction sites found within the DNA to correctly order the DNA sequences on a chromosome. This methodology gives very detailed haplotype information and allows for the detection of sequence variations across an entire genome [http://www.geocities.com/bioinformaticsweb/genomicglossary.html optical mapping] (Pallavi)<br />
<br />
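'''ortholog'''-a gene in one species that is homologous to a gene in another species because both descend from a single gene in their last common ancestor; orthologs usually look very similar and often retain the same function (Lecture, Pallavi)<br />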
<br />
'''oxidoreductase''' - an enzyme that catalyzes redox reactions by transferring electrons from one molecule (the reductant) to another (the oxidant) [http://en.wikipedia.org/wiki/Oxidoreductase] (Nick)<br />
<br />
== P ==<br />
<br />
'''paralog'''-homologous DNA sequences within the same genome that arose by gene duplication; they are similar but not necessarily identical (Lecture, Pallavi)<br />
<br />
'''p-arm''' - the shorter arm of a chromosome's two arms separated by the centromere (compare to q-arm, the longer arm) ([http://www.medterms.com/script/main/art.asp?articlekey=4715 MedTerms Dictionary], Jay)<br />
<br />
'''Perl''' - Developed by Larry Wall in 1987, Perl is a [http://en.wikipedia.org/wiki/High-level_programming_language high-level programming language] used frequently by biologists and bioinformaticists [http://en.wikipedia.org/wiki/Perl] (Will). <br />
<br />
'''periplasmic space''' - the space between the inner cytoplasmic membrane and external outer membrane in bacteria or archaea. [http://en.wikipedia.org/wiki/Periplasmic_space] (Peter)<br />
<br />
'''plasmid''' - an extra-chromosomal DNA molecule that is capable of replicating independently of the chromosomal DNA. Commonly found in bacteria and archaea. [http://en.wikipedia.org/wiki/Plasmid](Peter)<br />
<br />
'''plastid''' - major organelles in plants or algae [http://en.wikipedia.org/wiki/Plastid](Samantha)<br />
<br />
'''pleomorphism''' - the occurrence of two or more structural forms during a life cycle [http://en.wikipedia.org/wiki/Pleomorphism] (Mary)<br />
<br />
'''phylogenetic tree''' - a diagram showing the evolutionary relationships between biological species that are thought to share a common ancestor [http://en.wikipedia.org/wiki/Phylogenetic_tree] (Nick)<br />
<br />
'''phylotypes''' – a term intended to resolve the challenge of “species” when classifying prokaryotes using DNA sequence comparisons. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''positives''' - in a BLAST output, the number and fraction of residues for which the alignment scores have positive rather than negative values [http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)<br />
<br />
'''proteome''' - entire set of proteins expressed by a genome, cell, tissue, or organism. It may refer to expressed proteins under certain conditions [http://en.wikipedia.org/wiki/Proteome](Samantha)<br />
<br />
'''pseudogenes''' - DNA sequences that look like genes but are no longer functional, often because they contain premature stop codons or other disabling mutations. A pseudogene may have evolved away from a real gene, or a paralog might have taken its place (Lecture, Pallavi)<br />
<br />
'''purine''' - a category of nitrogenous base consisting of a pyrimidine ring fused to an imidazole ring. Notable purine bases are adenine and guanine. [http://en.wikipedia.org/wiki/Purine] (Peter)<br />
<br />
'''pyrimidine''' - a category of nitrogenous base consisting of a heterocyclic aromatic ring containing two nitrogen atoms at positions 1 and 3 of the six-member ring. Notable pyrimidine bases are cytosine, thymine, and uracil. [http://en.wikipedia.org/wiki/Pyrimidine] (Peter)<br />
<br />
== Q ==<br />
<br />
'''q-arm''' - the longer arm of a chromosome's two arms separated by the centromere (compare to p-arm, the shorter arm) ([http://www.medterms.com/script/main/art.asp?articlekey=5152 MedTerms Dictionary], Jay)<br />
<br />
== R ==<br />
<br />
'''RAST''' - (Rapid Annotation using Subsystem Technology)- a fully-automated service for annotating bacterial and archaeal genomes. It provides high quality genome annotations for these genomes across the whole phylogenetic tree. ([http://rast.nmpdr.org/], Max Win)<br />
<br />
'''rDNA'''-These are DNA sequences that encode for ribosomal RNA. Note that rDNA can also stand for recombinant DNA. ([http://en.wikipedia.org/wiki/Ribosomal_DNA rDNA] Pallavi)<br />
<br />
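'''retrotransposons''' - transposable elements that move through an RNA intermediate: the element is transcribed into RNA, and the RNA is reverse-transcribed back into DNA that is inserted elsewhere in the genome [http://en.wikipedia.org/wiki/Retrotransposon](Samantha)<br />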
<br />
'''ribonuclease''' - a nuclease that catalyzes the degradation of RNA into smaller components [http://en.wikipedia.org/wiki/Ribonuclease] (Mary)<br />
<br />
== S ==<br />
'''Serovar'''-a subdivision of a species based on the characteristics of their cell surface antigens ([http://www.biology-online.org/dictionary/Serovar serovar] Pallavi)<br />
<br />
'''scaffold''' - a section of a sequenced genome composed of contigs that are in the right order but not necessarily connected ([http://www.medterms.com/script/main/art.asp?articlekey=25223 MedTerms Dictionary], Jay)<br />
<br />
'''Shine-Dalgarno sequence''' - A ribosomal binding site on an mRNA, usually a sequence of about six bases located about six or seven bases upstream of the start codon. An anti-Shine-Dalgarno sequence exists on the rRNA in the small subunit of the ribosome; when the two sequences pair, the mRNA is lined up and prepared for translation. (Lecture and [http://en.wikipedia.org/wiki/Shine-dalgarno Wikipedia article], Laura)<br><br />
Note: The Shine-Dalgarno consensus sequence for our genome is TAGGAGG.<br />
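<br />
As a rough illustration of how the consensus above could be used, the following Python sketch slides the consensus along a short stretch of sequence upstream of a start codon and reports the best ungapped match. The function name <code>best_sd_match</code> and the example upstream sequence are hypothetical and are not taken from our annotation pipelines; only the consensus TAGGAGG comes from the note above.<br />

```python
def best_sd_match(upstream, consensus="TAGGAGG"):
    """Return (position, matches) for the best ungapped match to the consensus.

    `upstream` is the DNA just 5' of a candidate start codon (hypothetical input).
    Returns (None, -1) if the upstream region is shorter than the consensus.
    """
    upstream = upstream.upper()
    best = (None, -1)
    for i in range(len(upstream) - len(consensus) + 1):
        window = upstream[i:i + len(consensus)]
        # Count positions where the window agrees with the consensus
        matches = sum(1 for x, y in zip(window, consensus) if x == y)
        if matches > best[1]:
            best = (i, matches)
    return best
```

A perfect match scores 7; lower scores indicate a weaker candidate site. A real search would also weigh the spacing between the site and the start codon.<br />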
<br />
'''signal peptide''' - a short peptide chain that directs the post-translational transport of a protein [http://en.wikipedia.org/wiki/Signal_peptide] (Matt)<br />
<br />
'''Smith-Waterman alignment''' - A well-known algorithm for determining similar regions between two nucleotide or protein sequences. Instead of looking at the total sequence, the Smith-Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure [http://en.wikipedia.org/wiki/Smith_waterman](Will).<br />
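<br />
The idea of scoring every possible local segment can be sketched with a small dynamic-programming table. This is a minimal Python sketch that returns only the best local score; the scoring values (match = 2, mismatch = -1, gap = -1) are assumptions chosen for illustration, not the parameters used by any particular alignment tool.<br />

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            # The floor of 0 lets an alignment restart anywhere (local, not global)
            score[i][j] = max(0, diag, up, left)
            best = max(best, score[i][j])
    return best
```

Flooring each cell at zero is what makes the alignment local: a poorly matching region cannot drag down the score of a good segment elsewhere.<br />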
<br />
'''SNP (Single Nucleotide Polymorphism)''' - a DNA sequence variation occurring when a single nucleotide in the genome (or other shared sequence) differs between members of a species (or between paired chromosomes in an individual) [http://en.wikipedia.org/wiki/Single_nucleotide_polymorphism](Will).<br />
<br />
'''symporter''' - an integral membrane protein that moves two or more different molecules or ions across a phospholipid membrane in the same direction. [http://en.wikipedia.org/wiki/Symporter] (Peter)<br />
<br />
'''synteny''' - a neologism from the Greek for "on the same ribbon". Genes that are syntenic in one species are on the same chromosome; genes that are syntenic across species retain the same order on respective chromosomes as a result of descent from a common ancestor ([http://www.answers.com/synteny Answers.com], Jay)<br />
<br />
'''synthetase''' - a type of enzyme that creates a new covalent bond and requires direct input of energy from a high-energy phosphate. [http://books.google.com/books?id=bB8XnCykRmIC&pg=PA522&lpg=PA522&dq=%22synthetase+is+an+enzyme%22&source=web&ots=wkws4ksMsg&sig=zWLkDIk7T78hcf9S84nWs3u5Apw&hl=en&sa=X&oi=book_result&resnum=9&ct=result] (Peter)<br />
<br />
== T ==<br />
'''transferase''' - an enzyme that catalyzes the transfer of a functional group from one molecule (the donor) to another (the acceptor) [http://en.wikipedia.org/wiki/Transferase] (Matt)<br />
<br />
'''transmembrane helix''' - a single transmembrane alpha helix of a transmembrane protein, usually about twenty amino acids in length. They are usually predicted by hydrophobicity. [http://en.wikipedia.org/wiki/Transmembrane_domain](Mary)<br />
<br />
'''transposons / transposable elements''' - DNA sequences that can move around to different positions in a single cell's genome. Transposons can cause mutations and change the length of the genome. [http://en.wikipedia.org/wiki/Transposon](Samantha)<br />
<br />
'''Transposon Mutagenesis'''-a procedure in which a transposon is inserted into a gene, which inactivates the gene and can lead to the discovery of the phenotype associated with this gene ([http://cancerweb.ncl.ac.uk/cgi-bin/omd?transposon+mutagenesis transposon mutagenesis] Pallavi)<br />
<br />
'''tRNA splicing endonuclease''' - an enzyme that cleaves intervening sequences of precursor tRNA. [http://cancerweb.ncl.ac.uk/cgi-bin/omd?splicing+endonuclease] (Peter)<br><br />
<br />
== U ==<br />
<br />
== V ==<br />
<br />
== W ==<br />
<br />
'''whole genome shotgun sequencing''' - a method of sequencing in which DNA is cut into small pieces and cloned into vectors; about 500 bp is then sequenced from each end of every cloned insert to form mate pairs. Mate pairs rarely overlap, but software uses them to reassemble the sequence. [http://en.wikipedia.org/wiki/Whole_genome_shotgun](Samantha)<br />
<br><br />
<br />
== X ==<br />
'''xenolog''' - homologs that are created by horizontal gene transfer between two different species [http://en.wikipedia.org/wiki/Xenolog#Xenology] (Matt)<br><br />
<br />
== Y ==<br />
<br />
== Z ==<br />
<br />
<BR><br />
<HR><br />
<HR><br />
<br />
== This is a list of the student-created tutorials: ==</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Halorhabdus_utahensis_Genome&diff=6598Halorhabdus utahensis Genome2008-09-25T02:55:31Z<p>Lavoss: /* H */</p>
<hr />
<div>This page will be used by Davidson College students in the [http://www.bio.davidson.edu/Courses/Bio343/LabMethods.html Genomics Laboratory course].<br />
<br />
== Links to Multiple Databases ==<br />
*[http://imgweb.jgi-psf.org/cgi-bin/img_edu_v260/main.cgi?section=TaxonDetail&page=taxonDetail&taxon_oid=2500575004 JGI IMG EDU] <br> public access <br> *[[Media:JGIAnnotation.xls|JGI Annotation Excel Spreadsheet]]<br />
*[http://www.tigr.org/tigr-scripts/prok_manatee/shared/login.cgi Manatee at JCVI] <br> use the davidson number sent by email as username and password (database is nthu01 - this is case sensitive) <br> *[[Media:ManateeAnnotation.xls|Manatee Annotation Excel Spreadsheet]]<br />
*[http://rast.nmpdr.org/ SEED view via RAST] <br> use the username and password combination sent to you by SEED <br> *[[Media:RastAnnotation.xls|RAST Annotation Excel Spreadsheet]]<br />
*[http://www.genome.jp/kegg/kaas/ KEGG]<br> We can submit our genes to KEGG to have it mapped out, but SEED and Manatee may already do this. Do we want to ask them to upload it into their database? <br />
<br><br />
<br />
== RNA Genes ==<br />
<br />
*[[tRNA Genes Check List]]<br><br />
*[[rRNA operon]]<br><br />
*[[2 misc. RNA genes]] (short summary list)<br><br />
*[[Missing tRNA-trp gene found]]<br><br />
<br />
== Other Resources ==<br />
*[[Consensus Shine Dalgarno]] Excel File for ''H. utahensis'' <br><br />
*[[References]]<br><br />
*[[Gene Annotation Template]]<br><br />
*[[General Questions]]<br><br />
*[[Page for Annotated Genes]]<br><br />
<br />
== Tutorials for Annotating Genomes ==<br />
<br />
# Will DeLoache- BioPerl Installation <br><br />
# Max Win- Introduction to Perl for non-programmers (with step-by-step explanations, simple exercises, and solutions)<br><br />
# Pallavi-Conserved Domains Database (CDD) <br><br />
# Mary- Protein Data Bank <br><br />
# Laura Voss - Pfam Database <br><br />
# Samantha Simpson - NCBI Blast (protein, nucleotide, and blast2) <br><br />
# Peter Bakke - Finding species-specific Shine-Dalgarno sequence<br><br />
<br />
== Research Questions ==<br />
#How do the three systems compare for finding ORFs and RNA genes?<br />
#Is there a pattern of missed genes for any of the 3 sites? <br />
#Do the three systems differ in their ability to find good start codons and Shine-Dalgarno sequences? [We need a standard set of genes for comparison. Only highly conserved or a range of genes?]<br />
# Were Shine-Dalgarno sequences calculated for our species or default values used? If default, what sequence?<br />
#Can we fill any holes in their automated annotation? Is there a mechanism for users to add in genes?<br />
#How do the 3 sites compare for ease of use?<br />
#What are the strengths and weaknesses of each system? What did they publish as their special features and how do we see these working?<br />
#How does each of the 3 sites compare for pathway detection and visualization? <br />
#Do they find the origin of replication? Can we find it? <br />
<br />
<hr><br />
<br />
== This is a list of glossary words (A - Z): ==<br />
[[#A| A ]] [[#B| B ]] [[#C| C ]] [[#D| D ]] [[#E| E ]] [[#F| F ]] [[#G| G ]] [[#H| H ]] [[#I| I ]] [[#J| J ]] [[#K| K ]] [[#L| L ]] [[#M| M ]] [[#N| N ]] [[#O| O ]] [[#P| P ]] [[#Q| Q ]] [[#R| R ]] [[#S| S ]] [[#T| T ]] [[#U| U ]] [[#V| V ]] [[#W| W ]] [[#X| X ]] [[#Y| Y ]] [[#Z| Z ]] <br />
<br />
== A ==<br />
'''Accession Number''' - a unique identifier given to DNA and protein sequences to allow for tracking of sequence information within a single database [http://en.wikipedia.org/wiki/Accession_number_(bioinformatics)] (Will).<br />
<br />
'''<i>Arabidopsis thaliana</i>''' - the scientific name for the thale cress plant; it was the first plant to have its genome sequenced, and is a model organism for understanding plant biology and genetics ([http://en.wikipedia.org/wiki/Thale_cress Wikipedia.org], Jay)<br />
<br />
== B ==<br />
'''BAC''' - <i>b</i>acterial <i>a</i>rtificial <i>c</i>hromosome, a DNA construct used for transforming or cloning segments of DNA and often used to sequence the genetic code of organisms ([http://en.wikipedia.org/wiki/Bacterial_artificial_chromosome Wikipedia.org], Jay)<br />
<br />
'''bioinformatics''' - the multi-disciplinary approach of using biology, computer science and mathematics to solve or better understand biological problems [http://en.wikipedia.org/wiki/Bioinformatics] (Matt)<br />
<br />
'''BLAST''' - (Basic Local Alignment Search Tool) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. [http://blast.ncbi.nlm.nih.gov/Blast.cgi] (Mary)<br />
<br />
'''bioperl'''- a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications such as accessing sequence data from local and remote databases, converting between database formats, manipulating individual sequences, searching for similar sequences, searching for genes and other structures on genomic DNA, or developing machine-readable sequence annotations. [http://en.wikipedia.org/wiki/BioPerl] (Wikipedia, Max Win)<br />
<br />
== C ==<br />
'''carbon fixation''' - using carbon dioxide to create organic materials [http://en.wikipedia.org/wiki/Carbon_fixation] (Samantha)<BR><br />
<br />
'''CDD''' (Conserved Domains Database)- a database used to identify the conserved domains present in a protein query sequence [http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml] (Mary)<br />
<br />
'''chaperonin''' - a protein complex that assists some newly formed polypeptide chains by folding them into their final, functional, three-dimensional form [http://en.wikipedia.org/wiki/Chaperonins] (Matt)<br />
<br />
'''chemotaxis''' - the process in which cells move toward or away from high concentrations of certain chemicals; it occurs in both uni- and multicellular organisms. Unicellular organisms use chemotaxis to find food or avoid toxins, and multicellular organisms use it for tasks such as reproduction [http://en.wikipedia.org/wiki/Chemotaxis] (Nick)<br />
<br />
'''chemotaxonomy''' - the attempt to classify and identify organisms according to demonstrable differences and similarities in their biochemical compositions [http://en.wikipedia.org/wiki/Chemotaxonomy] (Mary)<br />
<br />
'''ClustalW''' - A web-based or command line tool that performs multiple sequence alignments to determine evolutionary relationships between three or more sequences [http://en.wikipedia.org/wiki/Clustal] (Will).<br />
<br />
'''COG''' (Clusters of Orthologous Groups)- each COG corresponds to a highly conserved domain and generally consists of either individual proteins or groups of paralogs ([http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml COG] Pallavi) <br><br />
<br />
'''concatemer''' - long continuous DNA molecule that contains the same DNA sequence repeated in series [http://en.wikipedia.org/wiki/Concatemer](Samantha)<BR><br />
<br />
'''contigs''' (contiguous DNA)- overlapping DNA segments that as a collection form a longer and gapless segment of DNA. (Discovery Genomics, Proteomics and Bioinformatics [http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''coverage''' - refers to the number of times, on average, any piece of DNA in a sequenced genome has been individually sequenced (Lecture, Jay)<br />
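<br />
Average coverage is commonly estimated as the total number of sequenced bases divided by the genome size. The Python sketch below shows that arithmetic; the read count and read length in the usage note are made-up numbers for illustration, and the function name is hypothetical.<br />

```python
def average_coverage(num_reads, read_length, genome_size):
    """Estimate average coverage as (reads x read length) / genome size.

    All three arguments are in the same units (reads, bases, bases).
    """
    return num_reads * read_length / genome_size
```

For example, 62,000 hypothetical reads of 500 bp each over a 3.1 Mbp genome would give <code>average_coverage(62000, 500, 3_100_000)</code>, about 10x coverage.<br />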
<br />
'''CPAN (Comprehensive Perl Archive Network)''' - an archive of over 12,200 modules of software written in Perl, as well as documentation for it. It contains a module called CPAN (or CPAN.pm) which is used as an installer for Perl modules such as BioPerl [http://en.wikipedia.org/wiki/CPAN](Will).<br />
<br />
== D ==<br />
'''''de novo'' synthesis''' - the synthesis of complex molecules from simple molecules (e.g. sugars and nucleotides), rather than from recycled molecules; from the Latin for "anew" [http://en.wikipedia.org/wiki/De_novo_synthesis] (Matt)<br />
<br />
'''dehydrogenase''' - a type of enzyme that oxidizes a substrate by transferring one or more protons and a pair of electrons to an acceptor. [http://en.wikipedia.org/wiki/Dehydrogenase] (Peter)<br />
<br />
'''diatom''' - a major group of eukaryotic algae, and one of the most common types of phytoplankton. A characteristic feature of diatom cells is that they are encased within a unique cell wall made of silica called a frustule. These frustules show a wide diversity in form, but usually consist of two asymmetrical sides with a split between them. [http://en.wikipedia.org/wiki/Diatom] (Mary)<br />
<br />
'''dot plot'''-graphical display comparing sequence conservation between two genomes with dots indicating strings of identical bases. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
== E ==<br />
<br />
'''EC number''' (Enzyme Commission Number)- a numerical classification scheme for enzymes, based on the chemical reactions they catalyze [http://en.wikipedia.org/wiki/EC_number] (Mary)<br />
<br />
'''E-value''' (Expect value)- When performing a BLAST search, you will obtain an E-value for each sequence that is retrieved. An E-value is the number of hits with a score at least as good that would be expected to occur by chance in a database of that size, so the closer it is to zero, the less likely it is that the two sequences are similar by chance. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''Extremophile''' - an organism that thrives in and may even require physically or geochemically extreme conditions that are detrimental to the majority of life on Earth [http://en.wikipedia.org/wiki/Extremophile] (Will).<br />
<br />
== F ==<br />
<br />
'''FASTA format''' - a format used to convey either nucleic acid sequences or peptide sequences, in which nucleotides or amino acids are represented by single-letter codes. The sequence name and other descriptors precede the sequence on a header line that begins with ">". [http://en.wikipedia.org/wiki/FASTA_format] (Nick)<br><br />
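<br />
To make the layout concrete, here is a minimal Python sketch that reads FASTA text into a dictionary keyed by header line. The function name <code>parse_fasta</code> and the two example records are hypothetical; for real work, a library parser such as the one in BioPerl would be a better choice.<br />

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]          # drop the leading ">"
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


# Two made-up records; a sequence may be wrapped over multiple lines
example = """>gene1 hypothetical protein
ATGGCC
TTAA
>gene2
ATGTAG
"""
```

Calling <code>parse_fasta(example)</code> joins the wrapped lines so that each header maps to one continuous sequence string.<br />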
<br />
'''finished genome''' - a genome that has been sequenced at least partly by hand, resulting in at least 99.99% sequence accuracy (Lecture, Jay)<br><br />
<br />
== G ==<br />
<br />
'''GC Content''' - the percentage of bases within a certain sequence of DNA (e.g. a gene or a genome) that are either guanine or cytosine; a higher GC content is characteristic of a coding region of a gene; differences in GC content between a gene and a genome can be used as evidence for horizontal gene transfer [http://en.wikipedia.org/wiki/GC-content] (Matt)<br><br />
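<br />
The percentage described above is simple to compute. A minimal Python sketch, assuming the sequence is given as a plain string of A, C, G, and T (the function name is hypothetical):<br />

```python
def gc_content(seq):
    """Return the percentage of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0  # avoid dividing by zero on an empty sequence
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)
```

Comparing the value for a single gene with the value for the whole genome is one way to flag candidates for horizontal gene transfer.<br />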
<br />
'''GC-skew''' – uneven distribution of guanine and cytosine bases between the two strands of DNA where GC base pairs occur. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
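<br />
GC-skew is usually quantified on one strand as (G - C) / (G + C) over successive windows. The Python sketch below uses non-overlapping windows; the tiny window size is an assumption for illustration, since real analyses use windows of thousands of bases.<br />

```python
def gc_skew(seq, window=4):
    """Return (G - C) / (G + C) for each non-overlapping window of seq."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # A window with no G or C has undefined skew; report 0.0 here
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews
```

In many bacterial genomes the sign of the skew changes near the origin and terminus of replication, so a plot of these values along the genome can help when looking for the origin.<br />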
<br />
'''gene amplification''' - production of multiple copies of a gene in order to amplify the amount of protein that the gene encodes for [http://www.medterms.com/script/main/art.asp?articlekey=13537] [http://www.answers.com/topic/gene-amplification] (Matt)<br />
<br />
'''gene knockout''' - a process in which a gene is deactivated within a test organism in order to better understand the function of the gene in that organism [http://en.wikipedia.org/wiki/Gene_knockout] (Matt)<br />
<br />
'''gene ontology'''- a collaborative effort of investigators to unify and standardize terms associated with the role a gene or protein plays in an organism. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''glaucophyte''' - freshwater algae that have not been studied well [http://en.wikipedia.org/wiki/Glaucophyte](Samantha)<br><br />
<br />
== H ==<br />
<br />
'''haemolysin or hemolysin''' - a chemical produced by a bacterium that causes lysis of red blood cells [http://en.wikipedia.org/wiki/Hemolysis_(microbiology)] (Nick)<br />
<br />
'''halophile''' - an organism, most often of the Archaea domain, that lives in environments containing high concentrations of salt [http://en.wikipedia.org/wiki/Halophile] (Matt)<br />
<br />
'''haplotype'''-collection of alleles that travel together (Lecture, Pallavi)<br />
<br />
'''haptophyte''' - phylum of algae [http://en.wikipedia.org/wiki/Haptophyte](Samantha)<br />
<br />
'''heterokont''' - major line of eukaryotes consisting of about 10,500 known species, most of which are algae [http://en.wikipedia.org/wiki/Heterokont](Samantha)<br />
<br />
'''Hidden Markov Model''' - a statistical model used in protein recognition databases such as Pfam. A Hidden Markov Model keeps track of several variables and possible variations thereof, such as the possible amino acid sequences that make up a protein domain (since there can be some variance in an amino acid sequence) or the variations in the component sounds that make up a word, and uses those points to match a given sequence to the word, domain, or other complex sequence it most closely matches. An HMM in speech recognition software, for example, can identify that a certain set of sounds make up a certain word, even with the variations in pronunciation and accent that different people will give those sounds. ([http://en.wikipedia.org/wiki/Hidden_Markov_Model Wikipedia] and lecture, Laura) <br />
<br />
'''homeobox''' - DNA sequence within transcription factor genes that allow the cell to respond to patterns of development by having the transcription factors switch on gene cascades [http://en.wikipedia.org/wiki/Homeobox](Samantha)<br />
<br />
'''homodimer''' - a protein made of paired identical polypeptides ([http://www.answers.com/topic/homodimer Answers.com], Jay)<br />
<br />
'''horizontal gene transfer'''-DNA transmission between species and incorporation of the DNA into the recipient's genome ([http://www.csrees.usda.gov/nea/biotech/res/biotechnology_res_glossary.html horizontal gene transfer] Pallavi)<br />
<br />
'''hydrolase''' - an enzyme that catalyzes hydrolysis, the cleavage of a chemical bond by the addition of a water molecule [http://en.wikipedia.org/wiki/Hydrolase] (Nick)<br />
<br />
== I ==<br />
<br />
'''ideogram''' - in genomics, usually describes a stylized representation of a chromosome with banding patterns (Campbell-Heyer Genomics textbook, Jay)<br />
<br />
'''identities''' - in a BLAST output, the number and fraction of total residues which are identical in a given alignment [http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)<br />
<br />
'''indole'''-a chemical compound that is produced from the break down of tryptophan ([http://medical-dictionary.thefreedictionary.com/indole indole] Pallavi)<br />
<br />
'''inclusion body''' - Inclusion bodies are collections of stainable substances, usually proteins, that are found either in the nucleus or the cytoplasm. It is thought that these bodies are often the result of viral proteins that misfolded [http://en.wikipedia.org/wiki/Inclusion_body] (Nick)<br />
<br />
'''intron''' - a region of DNA in a gene that is not part of the final coding sequence for the protein. [http://en.wikipedia.org/wiki/Intron] (Peter)<br />
<br />
'''isoelectric point''' - the pH at which a molecule carries no net electrical charge [http://en.wikipedia.org/wiki/Isoelectric_point] (Nick)<br />
<br />
'''isozymes''' - members of a gene family with very similar cellular roles (Campbell-Heyer Genomics textbook, Jay)<br />
<br />
== J ==<br />
<br />
== K ==<br />
'''KEGG (Kyoto Encyclopedia of Genes and Genomes)''' - a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals. The Pathway database records networks of molecular interactions in the cells, and variants of them specific to particular organisms [http://en.wikipedia.org/wiki/KEGG](Will).<br />
<br />
'''kinase''' - a type of enzyme that transfers a phosphate group from a high-energy donor molecule to a target molecule in a process called phosphorylation. [http://en.wikipedia.org/wiki/Kinase] (Peter)<br />
<br />
== L ==<br />
<br />
== M ==<br />
'''Manatee''' - a web-based gene evaluation and genome annotation tool that can view, modify, and store annotation for prokaryotic and eukaryotic genomes. This on-going, open source initiative was developed with two missions. One, to allow biologists the ability to functionally annotate their genomes using a powerful, stand-alone web application with a robustly designed relational annotation database. And secondly, to invite outside developers the opportunity to contribute their own ideas and requirements to enhance Manatee's ability to accomplish biological goals [http://manatee.sourceforge.net/](Will).<br />
<br />
'''motif''' - a sequence of amino acids or nucleotides that performs a particular role and is often conserved in other species or molecules. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''mycoplasma''' - genus of bacteria that lack a cell wall [http://en.wikipedia.org/wiki/Mycoplasma] (Nick)<br />
<br />
== N ==<br />
<br />
'''NORFs''' (nonannotated open reading frames) - open reading frames that were considered not to be real genes when the genome was annotated. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''nucleomorph''' - reduced eukaryotic nuclei found in plastids [http://en.wikipedia.org/wiki/Nucleomorph](Samantha)<br />
<br />
== O ==<br />
'''object-oriented programming''' - a programming paradigm in which collections of data, associated with operations on that data, are modularly defined and then built upon (CSC 121 Lecture, Will). <br />
<br />
'''open reading frame (ORF)''' - a segment of DNA that can potentially encode a protein; it begins with a start codon (usually ATG) and continues in the same reading frame until a stop codon [http://www.fao.org/DOCREP/003/X3910E/X3910E18.htm ORF] (Pallavi)<br />
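<br />
A very simplified ORF search can be sketched in Python as follows. It scans only the three forward reading frames, treats ATG as the only start codon, and uses the standard stop codons TAA, TAG, and TGA; these are simplifying assumptions, and the function name and <code>min_codons</code> parameter are hypothetical. Real gene finders also search the reverse strand and consider alternative start codons.<br />

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end) index pairs of ATG...stop ORFs on the forward strand.

    `end` is exclusive and includes the stop codon. `min_codons` counts the
    codons from the start codon up to, but not including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a new ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # close the ORF
    return orfs
```

Raising <code>min_codons</code> filters out short ORFs that are likely to occur by chance.<br />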
<br />
'''operon''' - a segment of DNA involving an operator, promoter, and one or more genes that operate as a single unit during transcription [http://en.wikipedia.org/wiki/Operon] (Nick)<br />
<br />
'''optical mapping'''-DNA sequences of the organism in question are compared against a karyotype that specifically looks at restriction sites found within the DNA to correctly order the DNA sequences on a chromosome. This methodology gives very detailed haplotype information and allows for the detection of sequence variations across an entire genome [http://www.geocities.com/bioinformaticsweb/genomicglossary.html optical mapping] (Pallavi)<br />
<br />
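'''ortholog''' - one of two or more homologous genes in different species that descend from a single gene in the last common ancestor of those species (that is, they were separated by a speciation event); orthologs often retain the same function (Lecture, Pallavi)<br />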
<br />
'''oxidoreductase''' - an enzyme that catalyzes redox reactions by transferring electrons from one molecule (the reductant) to another (the oxidant) [http://en.wikipedia.org/wiki/Oxidoreductase] (Nick)<br />
<br />
== P ==<br />
<br />
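'''paralog''' - one of two or more homologous genes within the same genome that arose by gene duplication; paralogs are similar but not necessarily identical, and they may diverge in function (Lecture, Pallavi)<br />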
<br />
'''p-arm''' - the shorter arm of a chromosome's two arms separated by the centromere (compare to q-arm, the longer arm) ([http://www.medterms.com/script/main/art.asp?articlekey=4715 MedTerms Dictionary], Jay)<br />
<br />
'''Perl''' - Developed by Larry Wall in 1987, Perl is a [http://en.wikipedia.org/wiki/High-level_programming_language high-level programming language] used frequently by biologists and bioinformaticists [http://en.wikipedia.org/wiki/Perl] (Will). <br />
<br />
'''periplasmic space''' - the space between the inner cytoplasmic membrane and external outer membrane in bacteria or archaea. [http://en.wikipedia.org/wiki/Periplasmic_space] (Peter)<br />
<br />
'''plasmid''' - an extra-chromosomal DNA molecule that is capable of replicating independently of the chromosomal DNA. Commonly found in bacteria and archaea. [http://en.wikipedia.org/wiki/Plasmid](Peter)<br />
<br />
'''plastid''' - major organelles in plants or algae [http://en.wikipedia.org/wiki/Plastid](Samantha)<br />
<br />
'''pleomorphism''' - the occurrence of two or more structural forms during a life cycle [http://en.wikipedia.org/wiki/Pleomorphism] (Mary)<br />
<br />
'''phylogenetic tree''' - a diagram showing the evolutionary relationships between biological species that are thought to share a common ancestor [http://en.wikipedia.org/wiki/Phylogenetic_tree] (Nick)<br />
<br />
'''phylotypes''' – a term intended to resolve the challenge of “species” when classifying prokaryotes using DNA sequence comparisons. (Discovery Genomics, Proteomics and Bioinformatics[http://wps.aw.com/bc_campbell_genomics_2/43/11232/2875502.cw/index.html], Max Win)<br />
<br />
'''positives''' - in a BLAST output, the number and fraction of residues for which the alignment scores have positive rather than negative values [http://www.ncbi.nlm.nih.gov/blast/blast_help.shtml] (Mary)<br />
<br />
'''proteome''' - entire set of proteins expressed by a genome, cell, tissue, or organism. It may refer to expressed proteins under certain conditions [http://en.wikipedia.org/wiki/Proteome](Samantha)<br />
<br />
'''pseudogenes''' - DNA sequences that look like genes but are no longer functional, often because they contain premature stop codons or other disabling mutations. A pseudogene may have evolved away from a real gene, or a paralog might have taken its place (Lecture, Pallavi)<br />
<br />
'''purine''' - a category of nitrogenous base consisting of a pyrimidine ring fused to an imidazole ring. Notable purine bases are adenine and guanine. [http://en.wikipedia.org/wiki/Purine] (Peter)<br />
<br />
'''pyrimidine''' - a category of nitrogenous base consisting of a heterocyclic aromatic ring containing two nitrogen atoms at positions 1 and 3 of the six-member ring. Notable pyrimidine bases are cytosine, thymine, and uracil. [http://en.wikipedia.org/wiki/Pyrimidine] (Peter)<br />
<br />
== Q ==<br />
<br />
'''q-arm''' - the longer arm of a chromosome's two arms separated by the centromere (compare to p-arm, the shorter arm) ([http://www.medterms.com/script/main/art.asp?articlekey=5152 MedTerms Dictionary], Jay)<br />
<br />
== R ==<br />
<br />
'''RAST''' - (Rapid Annotation using Subsystem Technology)- a fully-automated service for annotating bacterial and archaeal genomes. It provides high quality genome annotations for these genomes across the whole phylogenetic tree. ([http://rast.nmpdr.org/], Max Win)<br />
<br />
'''rDNA'''-These are DNA sequences that encode for ribosomal RNA. Note that rDNA can also stand for recombinant DNA. ([http://en.wikipedia.org/wiki/Ribosomal_DNA rDNA] Pallavi)<br />
<br />
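'''retrotransposons''' - transposable elements that move through an RNA intermediate: the element is transcribed into RNA, and the RNA is reverse-transcribed back into DNA that is inserted elsewhere in the genome [http://en.wikipedia.org/wiki/Retrotransposon](Samantha)<br />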
<br />
'''ribonuclease''' - a nuclease that catalyzes the degradation of RNA into smaller components [http://en.wikipedia.org/wiki/Ribonuclease] (Mary)<br />
<br />
== S ==<br />
'''Serovar'''-a subdivision of a species based on the characteristics of their cell surface antigens ([http://www.biology-online.org/dictionary/Serovar serovar] Pallavi)<br />
<br />
'''scaffold''' - a section of a sequenced genome composed of contigs that are in the right order but not necessarily connected ([http://www.medterms.com/script/main/art.asp?articlekey=25223 MedTerms Dictionary], Jay)<br />
<br />
'''Shine-Dalgarno sequence''' - A ribosomal binding site on an mRNA, usually a sequence of about six bases located about six or seven bases upstream of the start codon. An anti-Shine-Dalgarno sequence exists on the rRNA in the small subunit of the ribosome; when the two sequences pair, the mRNA is lined up and prepared for translation. (Lecture and [http://en.wikipedia.org/wiki/Shine-dalgarno Wikipedia article], Laura)<br><br />
Note: The Shine-Dalgarno consensus sequence for our genome is TAGGAGG.<br />
<br />
'''signal peptide''' - a short peptide chain that directs the post-translational transport of a protein [http://en.wikipedia.org/wiki/Signal_peptide] (Matt)<br />
<br />
'''Smith-Waterman alignment''' - A well-known algorithm for determining similar regions between two nucleotide or protein sequences. Instead of looking at the total sequence, the Smith-Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure [http://en.wikipedia.org/wiki/Smith_waterman](Will).<br />
<br />
'''SNP (Single Nucleotide Polymorphism)''' - a DNA sequence variation occurring when a single nucleotide in the genome (or other shared sequence) differs between members of a species (or between paired chromosomes in an individual) [http://en.wikipedia.org/wiki/Single_nucleotide_polymorphism](Will).<br />
<br />
'''symporter''' - an integral membrane protein that is involved in movement of two or more different molecules or ions across a phospholipid membrane. [http://en.wikipedia.org/wiki/Symporter] (Peter)<br />
<br />
'''synteny''' - a neologism from the Greek for "on the same ribbon". Genes that are syntenic in one species are on the same chromosome; genes that are syntenic across species retain the same order on respective chromosomes as a result of descent from a common ancestor ([http://www.answers.com/synteny Answers.com], Jay)<br />
<br />
'''synthetase''' - a type of enzyme that creates a new covalent bond and requires direct input of energy from a high-energy phosphate. [http://books.google.com/books?id=bB8XnCykRmIC&pg=PA522&lpg=PA522&dq=%22synthetase+is+an+enzyme%22&source=web&ots=wkws4ksMsg&sig=zWLkDIk7T78hcf9S84nWs3u5Apw&hl=en&sa=X&oi=book_result&resnum=9&ct=result] (Peter)<br />
<br />
== T ==<br />
'''transferase''' - an enzyme that catalyzes the transfer of a functional group from one molecule (the donor) to another (the acceptor) [http://en.wikipedia.org/wiki/Transferase] (Matt)<br />
<br />
'''transmembrane helix''' - a single membrane-spanning alpha helix of a transmembrane protein, usually about twenty amino acids in length. Transmembrane helices are usually predicted from hydrophobicity. [http://en.wikipedia.org/wiki/Transmembrane_domain](Mary)<br />
<br />
'''transposons / transposable elements''' - DNA sequences that can move around to different positions in a single cell's genome. Transposons can cause mutations and change the length of the genome. [http://en.wikipedia.org/wiki/Transposon](Samantha)<br />
<br />
'''Transposon Mutagenesis'''-a procedure in which a transposon is inserted into a gene, which inactivates the gene and can lead to the discovery of the phenotype associated with this gene ([http://cancerweb.ncl.ac.uk/cgi-bin/omd?transposon+mutagenesis transposon mutagenesis] Pallavi)<br />
<br />
'''tRNA splicing endonuclease''' - an enzyme that cleaves intervening sequences of precursor tRNA. [http://cancerweb.ncl.ac.uk/cgi-bin/omd?splicing+endonuclease] (Peter)<br><br />
<br />
== U ==<br />
<br />
== V ==<br />
<br />
== W ==<br />
<br />
'''whole genome shotgun sequencing''' - a method of sequencing in which DNA is cut into small pieces and cloned into vectors; about 500 bp is then sequenced from both ends of every insert to form mate pairs. Mate pairs rarely overlap, but software uses them to reassemble the sequence. [http://en.wikipedia.org/wiki/Whole_genome_shotgun](Samantha)<br />
<br><br />
<br />
== X ==<br />
'''xenolog''' - a homolog that arises through horizontal gene transfer between two different species [http://en.wikipedia.org/wiki/Xenolog#Xenology] (Matt)<br><br />
<br />
== Y ==<br />
<br />
== Z ==<br />
<br />
<BR><br />
<HR><br />
<HR><br />
<br />
== This is a list of the student-created tutorials: ==</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Modeling_Promoter_Activity&diff=4354Modeling Promoter Activity2007-12-06T21:51:54Z<p>Lavoss: /* Jensen, Alper, Fischer, and Stephanopoulis (2006): Statistical Modeling and Critical Mutation Sites */</p>
<hr />
<div>== Modeling Promoter Activity ==<br />
<br />
In order to use synthetic promoters to their fullest potential, we have to understand how they work. Synthetic promoters cannot help us model gene circuit activity unless models are developed for the activity of the promoter itself. Determining exactly how a promoter's strength correlates with its mutations is not easy, since for the most part it requires working with promoters at the level of individual nucleotides.<br />
<br />
=== Jensen and Hammer (1997): Spacer Sequences ===<br />
<br />
In this 1997 paper, Jensen and Hammer constructed a library of synthetic promoters based on the ''Lactococcus lactis'' prokaryotic promoter in order to better determine how gene sequence of promoters was tied to the promoter strength. Specifically, Jensen and Hammer were looking for a way to construct a constitutively active promoter – one that was always turned on, without needing an inducer – that could be safely used to tune gene expression in industrial-scale metabolic engineering projects, where inducers might be impractical or hazardous.<br />
<br />
In order to tune the steady-state ''L. lactis'' promoter without using an inducer, Jensen and Hammer had to create a library of ''L. lactis'' mutant promoters with a range of activity levels. To generate the library, they used the method described in [[Promoters and Reporters in Synthetic Biology]]: constructing oligonucleotides that matched the sequences common to all previous ''L. lactis'' promoters and mutants, then allowing the oligonucleotides to be joined together by random spacer sequences.<br />
<br />
After the promoter library was synthesized, promoters were cloned into both ''L. lactis'' and ''E. coli''; each cell culture containing a different promoter was tested for its level of beta-galactosidase activity. The activity of each promoter (in Miller units, a measure of beta-galactosidase activity) is shown in Figure 3.<br />
<br />
[[Image:Am0180933003.gif]]<br />
Figure 3. <small>Library of synthetic promoters for L. lactis. Promoter activities (Miller units) were assayed from the expression of a reporter gene (lacLM) encoding β-galactosidase transcribed from the different synthetic promoter clones on the promoter cloning vector pAK80. The patterns of the data points indicate which promoter clones contain errors in either the –35 or the –10 consensus sequence or in the length of the spacer between these sequences. </small> From Jensen and Hammer (1997). Permission Pending.<br />
<br />
The mutant promoters expressed a wide range of activity, increasing in small increments. Note that not all of the clones were "perfect" - a few had mutations in the oligonucleotide sequences that were supposed to be preserved across the library. Those clones are indicated in the graph above. However, their data was not removed because it was within range of the data from the perfect clones - they caused no break in the general data trend. In addition, all clones were tested to ensure that they were truly constitutive.<br />
<br />
When the promoters were cloned into ''E. coli'', the same basic trend was observed. While the promoters did not demonstrate the same level of activity as they did in ''L. lactis'', there was still a wide range of activity observed, with the activity level increasing in steady increments. <br />
<br />
Jensen and Hammer constructed a library of synthetic promoters that could be constitutively expressed and covered a range of activity levels, but it was still not known for certain what determined a given promoter's activity level. Jensen and Hammer suggested in their Discussion that "it seems that the overall three-dimensional structure which arises from a particular nucleotide sequence could be important".<br />
<br />
=== Jensen, Alper, Fischer, and Stephanopoulos (2006): Statistical Modeling and Critical Mutation Sites ===<br />
<br />
In this paper, Jensen et al tried to determine exactly why some promoters in a promoter library were stronger than others, and which mutations might cause the change in strength. Jensen et al proposed examining promoter libraries statistically rather than via assays alone, determining which mutations are associated with which phenotypes based on how often each mutation appears in each phenotypic class.<br />
<br />
Say, for example, that you are creating a mutant library of a protein that can fluoresce one of three colors: red, blue, or green. If a given point mutation – let’s call it A – has no effect on the color of the fluorescence, then (assuming the mutagenesis is truly random) that mutation should appear in every phenotype proportional to the amount of protein with that phenotype. It will not appear in one phenotype significantly more than the others unless there is significantly more protein with that phenotype. It follows, then, that if point mutation B appears much more often in, say, blue protein ''without there being much more blue protein than red or green protein'', mutation B might have some effect on the protein’s phenotype. It is probably not the sole cause of the blue color, but it is associated with it.<br />
<br />
To test their statistical analysis, Jensen et al generated different variants of a single promoter via error-prone PCR, fused the promoter into a plasmid with a GFP reporter gene, and then measured the amount of GFP via flow cytometry. The promoters were then sequenced, and any with insertions or deletions were removed until 69 promoters remained.<br />
<br />
Now, assume that each mutant can be classified into one of an unknown number of phenotypic (descriptive) classes; let's call that number M. There would be n(m) mutants in class m, with the n(m) summing to N, the total number of mutants. Now, say you have a set of X mutated promoters, where X < N, all with one particular point mutation. If that mutation has no effect on the phenotype of the promoter, then the fraction of mutants in any given class that carry that point mutation would be expected to equal X/N - the total number of those mutants divided by the total number of promoters. In other words, the mutation would be distributed across the classes in proportion to class size. <br />
<br />
In multinomial statistics, the probability that the X mutants are distributed among the classes with a particular set of counts y is:<br />
<br />
[[Image:Fd2_1.gif]]<br />
<br />
Here the values of y sum to X. Given that constraint, the probability P(i) that q or more of the mutants carrying a specific mutation appear in a particular class is:<br />
<br />
[[Image:Fd5_4.gif]]<br />
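The two formula images above are not available as text. As a hedged reconstruction, assuming the standard multinomial model with p_m = n(m)/N as the probability that a mutant falls in class m when the mutation has no effect, the two expressions would presumably read as follows. The symbols p_m, y_m and k are notation introduced for this sketch and may differ from the paper's.<br />

```latex
% Probability of observing counts y_1, ..., y_M across the M classes
P(y_1,\dots,y_M) \;=\; \frac{X!}{\prod_{m=1}^{M} y_m!}\;\prod_{m=1}^{M} p_m^{\,y_m},
\qquad \sum_{m=1}^{M} y_m = X, \quad p_m = \frac{n(m)}{N}

% Probability that q or more of the X mutants fall in class i
P(y_i \ge q) \;=\; \sum_{k=q}^{X} \binom{X}{k}\, p_i^{\,k}\,(1-p_i)^{X-k}
```

The second line follows from the first because the count in any single class of a multinomial distribution is itself binomially distributed.<br />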
<br />
The 69 promoters being examined were divided into two phenotypic classes based on their fluorescence: the top 50th percentile (brightest) and the bottom 50th percentile (dimmest). Because there are only two classes, the statistical analysis is simplified somewhat. The complete statistical analysis can be seen here in Figure 5:<br />
<br />
[[Image:Zam0050667180002.gif]]<br />
<br />
Figure 5. <small>Statistical distribution of mutations and their effects on mutant fluorescence. In panel A, the vertical axis shows the mutant number, where the mutants are sorted in descending order by their relative fluorescence. In general, the single-cell fluorescence distribution for each mutant strain was log normal distributed. The horizontal axis shows the mean of the log relative fluorescence for each mutant strain, where the error is the standard deviation of this distribution. Reading to the right from panel A into panel B reveals the point mutations present in each mutant. For each location in a mutant (where location is indicated on the horizontal axis) that was changed via the error-prone PCR, a black dot is indicated. With only two exceptions, all of these changes are base transitions rather than transversions, so the sequence of each of the 69 clones can be inferred from the wild-type sequence shown in panel D. (All of the mutations indicated in panel B are transitions with the exception of one A-C transversion at –125 bp in clone 53 and one T-G transversion at –8 in clone 68. These were treated as though they were transitions in our analysis.) Reading down from panel B into panel C shows how mutations at a particular location partition between the two classes of mutants: the top and bottom 50th percentiles. Sites that have no effect on the fluorescence phenotype should partition equally between the two classes, i.e., they should follow a binomial distribution with P = 0.5. Sites that deviate from this distribution are labeled with a dot and are colored either green or red, corresponding to the apparent effect of a mutation at the site. For these sites, P values are indicated, where this value is the probability of seeing a distribution at least as skewed to one side. Sites that were subsequently tested experimentally (see text) are indicated with an asterisk, where the color of the asterisk denotes the expected effect of a mutation at the site. 
We chose a range of sites to test experimentally from sites with high-confidence (low P value) positive effects to those with low-confidence (P value 0.5) negative effects (Table 1). These sites are also shown in panel D, which contains the wild-type nucleotide sequence of the promoter region that was subjected to mutation.</small> From Jensen et al (2006). Permission pending.<br />
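The two-class test described in the caption can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the function names and the example counts (a site mutated in 10 clones, 9 of them in the brighter half) are hypothetical, and the sketch assumes that a mutation with no effect lands in either class with probability 0.5.<br />

```python
from math import comb


def binomial_tail(k, n, p=0.5):
    """P(K >= k) for K ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def skew_p_value(n_top, n_bottom):
    """One-sided probability of a split at least as skewed as the one observed.

    n_top / n_bottom: number of clones carrying the mutation in the
    brighter / dimmer half of the library (hypothetical inputs).
    """
    n = n_top + n_bottom
    return binomial_tail(max(n_top, n_bottom), n, 0.5)


if __name__ == "__main__":
    # Hypothetical site: mutated in 10 clones, 9 of them in the top half.
    print(skew_p_value(9, 1))
    # Hypothetical site split evenly between the two halves.
    print(skew_p_value(5, 5))
```

For the hypothetical 9-versus-1 split the one-sided probability is 11/1024, or about 0.011, so a split this skewed would be unlikely if the site had no effect on fluorescence.<br />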
<br />
Statistical analysis revealed seven nucleotide positions that were significantly correlated with one of the two classes. These seven positions were then tested individually, to see whether the phenotype of each mutation in isolation matched its phenotype in the random library (where it was accompanied by many other mutations).<br />
<br />
When tested, six of the seven mutations produced a phenotype in isolation similar to the one they were associated with in the random library, suggesting that the statistical model identified the significant mutations accurately.<br />
<br />
=== De Mey, Maertens, Lequeux, Soetart, and Vandamme (2007): Probability and Partial Least Squares Modeling ===</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Modeling_Promoter_Activity&diff=4342Modeling Promoter Activity2007-12-06T21:36:59Z<p>Lavoss: /* Jensen, Alper, Fischer, and Stephanopoulis (2006): Statistical Modeling and Critical Mutation Sites */</p>
<hr />
<div>== Modeling Promoter Activity ==<br />
<br />
In order to use synthetic promoters to their fullest potential, we have to understand how they work. Synthetic promoters cannot help us model gene circuit activity unless models are developed for the activity of the promoter itself. Determining exactly how a promoter's strength correlates with its mutations is not easy, since for the most part it requires working with promoters at the level of individual nucleotides.<br />
<br />
=== Jensen and Hammer (1997): Spacer Sequences ===<br />
<br />
In this 1997 paper, Jensen and Hammer constructed a library of synthetic promoters based on the ''Lactococcus lactis'' prokaryotic promoter in order to better determine how gene sequence of promoters was tied to the promoter strength. Specifically, Jensen and Hammer were looking for a way to construct a constitutively active promoter – one that was always turned on, without needing an inducer – that could be safely used to tune gene expression in industrial-scale metabolic engineering projects, where inducers might be impractical or hazardous.<br />
<br />
In order to tune the steady-state ''L. lactis'' promoter without using an inducer, Jensen and Hammer had to create a library of ''L. lactis'' mutant promoters with a range of activity levels. To generate the library, they used the method described in [[Promoters and Reporters in Synthetic Biology]]: constructing oligonucleotides that matched the sequences common to all previous ''L. lactis'' promoters and mutants, then allowing the oligonucleotides to be joined together by random spacer sequences.<br />
<br />
After the promoter library was synthesized, promoters were cloned into both ''L. lactis'' and ''E. coli''; each cell culture containing a different promoter was tested for its level of beta-galactosidase activity. The activity of each promoter (in Miller units, a measure of beta-galactosidase activity) is shown in Figure 3.<br />
<br />
[[Image:Am0180933003.gif]]<br />
Figure 3. <small>Library of synthetic promoters for L. lactis. Promoter activities (Miller units) were assayed from the expression of a reporter gene (lacLM) encoding β-galactosidase transcribed from the different synthetic promoter clones on the promoter cloning vector pAK80. The patterns of the data points indicate which promoter clones contain errors in either the –35 or the –10 consensus sequence or in the length of the spacer between these sequences. </small> From Jensen and Hammer (1997). Permission Pending.<br />
<br />
The mutant promoters expressed a wide range of activity, increasing in small increments. Note that not all of the clones were "perfect" - a few had mutations in the oligonucleotide sequences that were supposed to be preserved across the library. Those clones are indicated in the graph above. However, their data was not removed because it was within range of the data from the perfect clones - they caused no break in the general data trend. In addition, all clones were tested to ensure that they were truly constitutive.<br />
<br />
When the promoters were cloned into ''E. coli'', the same basic trend was observed. While the promoters did not demonstrate the same level of activity as they did in ''L. lactis'', there was still a wide range of activity observed, with the activity level increasing in steady increments. <br />
<br />
Jensen and Hammer constructed a library of synthetic promoters that could be constitutively expressed and covered a range of activity levels, but it was still not known for certain what determined a given promoter's activity level. Jensen and Hammer suggested in their Discussion that "it seems that the overall three-dimensional structure which arises from a particular nucleotide sequence could be important".<br />
<br />
=== Jensen, Alper, Fischer, and Stephanopoulos (2006): Statistical Modeling and Critical Mutation Sites ===<br />
<br />
In this paper, Jensen et al tried to determine exactly why some promoters in a promoter library were stronger than others, and which mutations might cause the change in strength. Jensen et al proposed examining promoter libraries statistically rather than via assays alone, determining which mutations are associated with which phenotypes based on how often each mutation appears in each phenotypic class.<br />
<br />
Say, for example, that you are creating a mutant library of a protein that can fluoresce one of three colors: red, blue, or green. If a given point mutation – let’s call it A – has no effect on the color of the fluorescence, then (assuming the mutagenesis is truly random) that mutation should appear in every phenotype proportional to the amount of protein with that phenotype. It will not appear in one phenotype significantly more than the others unless there is significantly more protein with that phenotype. It follows, then, that if point mutation B appears much more often in, say, blue protein ''without there being much more blue protein than red or green protein'', mutation B might have some effect on the protein’s phenotype. It is probably not the sole cause of the blue color, but it is associated with it.<br />
<br />
To test their statistical analysis, Jensen et al generated different variants of a single promoter via error-prone PCR, fused the promoter into a plasmid with a GFP reporter gene, and then measured the amount of GFP via flow cytometry. The promoters were then sequenced, and any with insertions or deletions were removed until 69 promoters remained.<br />
<br />
Now, assume that each mutant can be classified into one of an unknown number of phenotypic (descriptive) classes; let's call that number M. There would be n(m) mutants in class m, with the n(m) summing to N, the total number of mutants. Now, say you have a set of X mutated promoters, where X < N, all with one particular point mutation. If that mutation has no effect on the phenotype of the promoter, then the fraction of mutants in any given class that carry that point mutation would be expected to equal X/N - the total number of those mutants divided by the total number of promoters. In other words, the mutation would be distributed across the classes in proportion to class size. <br />
<br />
In multinomial statistics, the probability that the X mutants are distributed among the classes with a particular set of counts y is:<br />
<br />
[[Image:Fd2_1.gif]]<br />
<br />
Here the values of y sum to X. Given that constraint, the probability P(i) that q or more of the mutants carrying a specific mutation appear in a particular class is:<br />
<br />
[[Image:Fd5_4.gif]]<br />
<br />
The 69 promoters being examined were divided into two phenotypic classes based on their fluorescence: the top 50th percentile (brightest) and the bottom 50th percentile (dimmest). Because there are only two classes, the statistical analysis is simplified somewhat. The complete statistical analysis can be seen here in Figure 5:<br />
<br />
[[Image:Zam0050667180002.gif]]<br />
<br />
Figure 5. <small>Statistical distribution of mutations and their effects on mutant fluorescence. In panel A, the vertical axis shows the mutant number, where the mutants are sorted in descending order by their relative fluorescence. In general, the single-cell fluorescence distribution for each mutant strain was log normal distributed. The horizontal axis shows the mean of the log relative fluorescence for each mutant strain, where the error is the standard deviation of this distribution. Reading to the right from panel A into panel B reveals the point mutations present in each mutant. For each location in a mutant (where location is indicated on the horizontal axis) that was changed via the error-prone PCR, a black dot is indicated. With only two exceptions, all of these changes are base transitions rather than transversions, so the sequence of each of the 69 clones can be inferred from the wild-type sequence shown in panel D. (All of the mutations indicated in panel B are transitions with the exception of one A-C transversion at –125 bp in clone 53 and one T-G transversion at –8 in clone 68. These were treated as though they were transitions in our analysis.) Reading down from panel B into panel C shows how mutations at a particular location partition between the two classes of mutants: the top and bottom 50th percentiles. Sites that have no effect on the fluorescence phenotype should partition equally between the two classes, i.e., they should follow a binomial distribution with P = 0.5. Sites that deviate from this distribution are labeled with a dot and are colored either green or red, corresponding to the apparent effect of a mutation at the site. For these sites, P values are indicated, where this value is the probability of seeing a distribution at least as skewed to one side. Sites that were subsequently tested experimentally (see text) are indicated with an asterisk, where the color of the asterisk denotes the expected effect of a mutation at the site. 
We chose a range of sites to test experimentally from sites with high-confidence (low P value) positive effects to those with low-confidence (P value 0.5) negative effects (Table 1). These sites are also shown in panel D, which contains the wild-type nucleotide sequence of the promoter region that was subjected to mutation.</small> From Jensen et al (2006). Permission pending.<br />
<br />
=== De Mey, Maertens, Lequeux, Soetart, and Vandamme (2007): Probability and Partial Least Squares Modeling ===</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=File:Zam0050667180002.gif&diff=4339File:Zam0050667180002.gif2007-12-06T21:35:35Z<p>Lavoss: </p>
<hr />
<div></div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Modeling_Promoter_Activity&diff=4338Modeling Promoter Activity2007-12-06T21:34:32Z<p>Lavoss: /* Jensen, Alper, Fischer, and Stephanopoulis (2006): Statistical Modeling and Critical Mutation Sites */</p>
<hr />
<div>== Modeling Promoter Activity ==<br />
<br />
In order to use synthetic promoters to their fullest potential, we have to understand how they work. Synthetic promoters cannot help us model gene circuit activity unless models are developed for the activity of the promoter itself. Determining exactly how a promoter's strength correlates with its mutations is not easy, since for the most part it requires working with promoters at the level of individual nucleotides.<br />
<br />
=== Jensen and Hammer (1997): Spacer Sequences ===<br />
<br />
In this 1997 paper, Jensen and Hammer constructed a library of synthetic promoters based on the ''Lactococcus lactis'' prokaryotic promoter in order to better determine how gene sequence of promoters was tied to the promoter strength. Specifically, Jensen and Hammer were looking for a way to construct a constitutively active promoter – one that was always turned on, without needing an inducer – that could be safely used to tune gene expression in industrial-scale metabolic engineering projects, where inducers might be impractical or hazardous.<br />
<br />
In order to tune the steady-state ''L. lactis'' promoter without using an inducer, Jensen and Hammer had to create a library of ''L. lactis'' mutant promoters with a range of activity levels. To generate the library, they used the method described in [[Promoters and Reporters in Synthetic Biology]]: constructing oligonucleotides that matched the sequences common to all previous ''L. lactis'' promoters and mutants, then allowing the oligonucleotides to be joined together by random spacer sequences.<br />
<br />
After the promoter library was synthesized, promoters were cloned into both ''L. lactis'' and ''E. coli''; each cell culture containing a different promoter was tested for its level of beta-galactosidase activity. The activity of each promoter (in Miller units, a measure of beta-galactosidase activity) is shown in Figure 3.<br />
<br />
[[Image:Am0180933003.gif]]<br />
Figure 3. <small>Library of synthetic promoters for L. lactis. Promoter activities (Miller units) were assayed from the expression of a reporter gene (lacLM) encoding β-galactosidase transcribed from the different synthetic promoter clones on the promoter cloning vector pAK80. The patterns of the data points indicate which promoter clones contain errors in either the –35 or the –10 consensus sequence or in the length of the spacer between these sequences. </small> From Jensen and Hammer (1997). Permission Pending.<br />
<br />
The mutant promoters expressed a wide range of activity, increasing in small increments. Note that not all of the clones were "perfect" - a few had mutations in the oligonucleotide sequences that were supposed to be preserved across the library. Those clones are indicated in the graph above. However, their data was not removed because it was within range of the data from the perfect clones - they caused no break in the general data trend. In addition, all clones were tested to ensure that they were truly constitutive.<br />
<br />
When the promoters were cloned into ''E. coli'', the same basic trend was observed. While the promoters did not demonstrate the same level of activity as they did in ''L. lactis'', there was still a wide range of activity observed, with the activity level increasing in steady increments. <br />
<br />
Jensen and Hammer constructed a library of synthetic promoters that could be constitutively expressed and covered a range of activity levels, but it was still not known for certain what determined a given promoter's activity level. Jensen and Hammer suggested in their Discussion that "it seems that the overall three-dimensional structure which arises from a particular nucleotide sequence could be important".<br />
<br />
=== Jensen, Alper, Fischer, and Stephanopoulos (2006): Statistical Modeling and Critical Mutation Sites ===<br />
<br />
In this paper, Jensen et al tried to determine exactly why some promoters in a promoter library were stronger than others, and which mutations might cause the change in strength. Jensen et al proposed examining promoter libraries statistically rather than via assays alone, determining which mutations are associated with which phenotypes based on how often each mutation appears in each phenotypic class.<br />
<br />
Say, for example, that you are creating a mutant library of a protein that can fluoresce one of three colors: red, blue, or green. If a given point mutation – let’s call it A – has no effect on the color of the fluorescence, then (assuming the mutagenesis is truly random) that mutation should appear in every phenotype proportional to the amount of protein with that phenotype. It will not appear in one phenotype significantly more than the others unless there is significantly more protein with that phenotype. It follows, then, that if point mutation B appears much more often in, say, blue protein ''without there being much more blue protein than red or green protein'', mutation B might have some effect on the protein’s phenotype. It is probably not the sole cause of the blue color, but it is associated with it.<br />
<br />
To test their statistical analysis, Jensen et al generated different variants of a single promoter via error-prone PCR, fused the promoter into a plasmid with a GFP reporter gene, and then measured the amount of GFP via flow cytometry. The promoters were then sequenced, and any with insertions or deletions were removed until 69 promoters remained.<br />
<br />
Now, assume that each mutant can be classified into one of an unknown number of phenotypic (descriptive) classes; let's call that number M. There would be n(m) mutants in class m, with the n(m) summing to N, the total number of mutants. Now, say you have a set of X mutated promoters, where X < N, all with one particular point mutation. If that mutation has no effect on the phenotype of the promoter, then the fraction of mutants in any given class that carry that point mutation would be expected to equal X/N - the total number of those mutants divided by the total number of promoters. In other words, the mutation would be distributed across the classes in proportion to class size. <br />
<br />
In multinomial statistics, the probability that the X mutants are distributed among the classes with a particular set of counts y is:<br />
<br />
[[Image:Fd2_1.gif]]<br />
<br />
Here the values of y sum to X. Given that constraint, the probability P(i) that q or more of the mutants carrying a specific mutation appear in a particular class is:<br />
<br />
[[Image:Fd5_4.gif]]<br />
<br />
The 69 promoters being examined were divided into two phenotypic classes based on their fluorescence: the top 50th percentile (brightest) and the bottom 50th percentile (dimmest). Because there are only two classes, the statistical analysis is simplified somewhat. The complete statistical analysis can be seen here in Figure 5:<br />
<br />
=== De Mey, Maertens, Lequeux, Soetart, and Vandamme (2007): Probability and Partial Least Squares Modeling ===</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=File:Fd5_4.gif&diff=4334File:Fd5 4.gif2007-12-06T21:30:04Z<p>Lavoss: </p>
<hr />
<div></div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Modeling_Promoter_Activity&diff=4333Modeling Promoter Activity2007-12-06T21:29:45Z<p>Lavoss: /* Jensen, Alper, Fischer, and Stephanopoulis (2006): Statistical Modeling and Critical Mutation Sites */</p>
<hr />
<div>== Modeling Promoter Activity ==<br />
<br />
In order to use synthetic promoters to their fullest potential, we have to understand how they work. Synthetic promoters cannot help us model gene circuit activity unless models are developed for the activity of the promoter itself. Determining exactly how a promoter's strength correlates with its mutations is not easy, since for the most part it requires working with promoters at the level of individual nucleotides.<br />
<br />
=== Jensen and Hammer (1997): Spacer Sequences ===<br />
<br />
In this 1997 paper, Jensen and Hammer constructed a library of synthetic promoters based on the ''Lactococcus lactis'' prokaryotic promoter in order to better determine how gene sequence of promoters was tied to the promoter strength. Specifically, Jensen and Hammer were looking for a way to construct a constitutively active promoter – one that was always turned on, without needing an inducer – that could be safely used to tune gene expression in industrial-scale metabolic engineering projects, where inducers might be impractical or hazardous.<br />
<br />
In order to tune the steady-state ''L. lactis'' promoter without using an inducer, Jensen and Hammer had to create a library of ''L. lactis'' mutant promoters with a range of activity levels. To generate the library, they used the method described in [[Promoters and Reporters in Synthetic Biology]]: constructing oligonucleotides that matched the sequences common to all previous ''L. lactis'' promoters and mutants, then allowing the oligonucleotides to be joined together by random spacer sequences.<br />
<br />
After the promoter library was synthesized, promoters were cloned into both ''L. lactis'' and ''E. coli''; each cell culture containing a different promoter was tested for its level of beta-galactosidase activity. The activity of each promoter (in Miller units, a standard measure of beta-galactosidase activity) is shown in Figure 3.<br />
<br />
[[Image:Am0180933003.gif]]<br />
Figure 3. <small>Library of synthetic promoters for L. lactis. Promoter activities (Miller units) were assayed from the expression of a reporter gene (lacLM) encoding β-galactosidase transcribed from the different synthetic promoter clones on the promoter cloning vector pAK80. The patterns of the data points indicate which promoter clones contain errors in either the −35 or the −10 consensus sequence or in the length of the spacer between these sequences. </small> From Jensen and Hammer (1997). Permission Pending.<br />
<br />
The mutant promoters expressed a wide range of activity, increasing in small increments. Note that not all of the clones were "perfect" - a few had mutations in the oligonucleotide sequences that were supposed to be preserved across the library. Those clones are indicated in the graph above. However, their data was not removed because it was within range of the data from the perfect clones - they caused no break in the general data trend. In addition, all clones were tested to ensure that they were truly constitutive.<br />
<br />
When the promoters were cloned into ''E. coli'', the same basic trend was observed. While the promoters did not demonstrate the same level of activity as they did in ''L. lactis'', there was still a wide range of activity observed, with the activity level increasing in steady increments. <br />
<br />
Jensen and Hammer constructed a library of synthetic promoters that were constitutively active and covered a range of activity levels, but it was still not known for certain what made a given promoter active at a given level. Jensen and Hammer suggested in their Discussion that "it seems that the overall three-dimensional structure which arises from a particular nucleotide sequence could be important".<br />
<br />
=== Jensen, Alper, Fischer, and Stephanopoulos (2006): Statistical Modeling and Critical Mutation Sites ===<br />
<br />
In this paper, Jensen et al. tried to determine exactly why some promoters in a promoter library were stronger than others, and which mutations might cause the change in strength. Jensen et al. proposed examining promoter libraries statistically rather than via assays: they determined which mutations are associated with which phenotypes based on how often each mutation appears in each phenotype.<br />
<br />
Say, for example, that you are creating a mutant library of a protein that can fluoresce one of three colors: red, blue, or green. If a given point mutation – let’s call it A – has no effect on the color of the fluorescence, then (assuming the mutagenesis is truly random) that mutation should appear in every phenotype proportional to the amount of protein with that phenotype. It will not appear in one phenotype significantly more than the others unless there is significantly more protein with that phenotype. It follows, then, that if point mutation B appears much more often in, say, blue protein ''without there being much more blue protein than red or green protein'', mutation B might have some effect on the protein’s phenotype. It is probably not the sole cause of the blue color, but it is associated with it.<br />
<br />
To test their statistical analysis, Jensen et al generated different variants of a single promoter via error-prone PCR, fused the promoter into a plasmid with a GFP reporter gene, and then measured the amount of GFP via flow cytometry. The promoters were then sequenced, and any with insertions or deletions were removed until 69 promoters remained.<br />
<br />
Now, assume that each mutant can be classified into one of an unknown number of phenotypic (descriptive) classes; let's call that number M. There would then be n(m) mutants in class m, with the n(m) summing to N, the total number of mutants. Now, say you have a set of X mutated promoters, where X < N, all with one particular point mutation. If that mutation has no effect on the phenotype of the promoter, then the fraction of any given class carrying that point mutation would be expected to equal X/N, the number of mutants with the mutation divided by the total number of promoters; equivalently, class m would be expected to contain X·n(m)/N of them. In other words, they would be distributed across the classes in proportion to class size.<br />
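<br />
The null expectation described above can be sketched in a few lines of Python. This is only an illustration: the function name, the three class labels, the class sizes, and the carrier count of 12 are assumptions made up for the example (only the library size of 69 comes from the text), and exact fractions are used so the expected counts add up to X without rounding error.<br />
<br />
```python
from fractions import Fraction


def expected_counts(class_sizes, carriers):
    """Expected number of mutation carriers per class under the no-effect null.

    class_sizes: dict mapping class name -> n(m), the mutants in that class
    carriers:    X, the number of mutants carrying the point mutation
    """
    total = sum(class_sizes.values())  # N, the total number of mutants
    if not 0 <= carriers <= total:
        raise ValueError("carriers must lie between 0 and the library size")
    # Each class is expected to hold X * n(m) / N of the carriers.
    return {name: Fraction(carriers * size, total)
            for name, size in class_sizes.items()}


# Hypothetical split of a 69-promoter library into three activity classes.
classes = {"low": 23, "medium": 30, "high": 16}
expected = expected_counts(classes, carriers=12)
for name, value in expected.items():
    print(f"{name}: {float(value):.2f}")
```
<br />
An observed count far above the expected value for one class is what would flag a mutation as possibly associated with that phenotype.<br />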
<br />
In multinomial statistics, the probability that the X mutants carrying the mutation are split across the classes according to a particular set of counts y is:<br />
<br />
[[Image:Fd2_1.gif]]<br />
<br />
where the values of y sum to X. Given that constraint, the probability of q or more mutants carrying a specific mutation appearing in a particular class (P(i)) is:<br />
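<br />
As a hedged illustration of how such a tail probability can be evaluated: if the mutation has no effect, the count in a single class follows the binomial marginal of the multinomial distribution, with success probability n(m)/N. The sketch below assumes exactly that model; the function name and parameter names are hypothetical and are not taken from Jensen et al.<br />
<br />
```python
from math import comb


def tail_probability(carriers, class_size, total, q):
    """P(at least q of the carriers land in one class) under the no-effect null.

    carriers:   X, mutants carrying the point mutation
    class_size: n(m), mutants in the class of interest
    total:      N, mutants in the whole library
    q:          observed number of carriers in that class
    """
    if not 0 < class_size <= total:
        raise ValueError("class_size must be in (0, total]")
    p = class_size / total  # chance that any one carrier falls in this class
    # Sum the binomial probabilities for q, q + 1, ..., X carriers in the class.
    return sum(comb(carriers, k) * p**k * (1 - p)**(carriers - k)
               for k in range(q, carriers + 1))


# Two carriers, two equally sized classes: both land in the same named class
# with probability 0.25, and at least one does with probability 0.75.
print(tail_probability(carriers=2, class_size=1, total=2, q=2))
print(tail_probability(carriers=2, class_size=1, total=2, q=1))
```
<br />
A small value for the observed q suggests the mutation is over-represented in that class relative to what chance alone would produce.<br />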
<br />
=== De Mey, Maertens, Lequeux, Soetart, and Vandamme (2007): Probability and Partial Least Squares Modeling ===</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=File:Fd2_1.gif&diff=4331File:Fd2 1.gif2007-12-06T21:27:42Z<p>Lavoss: </p>
<hr />
<div></div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Modeling_Promoter_Activity&diff=4330Modeling Promoter Activity2007-12-06T21:27:26Z<p>Lavoss: /* Jensen, Alper, Fischer, and Stephanopoulis (2006): Statistical Modeling and Critical Mutation Sites */</p>
<hr />
<div>== Modeling Promoter Activity ==<br />
<br />
In order to use synthetic promoters to their fullest potential, we have to understand how they work. Synthetic promoters cannot help us model gene circuit activity unless models are developed for the activity of the promoter itself. Determining exactly how a promoter's strength correlates with its mutations is not easy, since for the most part it requires working with promoters at the level of individual nucleotides.<br />
<br />
=== Jensen and Hammer (1997): Spacer Sequences ===<br />
<br />
In this 1997 paper, Jensen and Hammer constructed a library of synthetic promoters based on the ''Lactococcus lactis'' prokaryotic promoter in order to better determine how gene sequence of promoters was tied to the promoter strength. Specifically, Jensen and Hammer were looking for a way to construct a constitutively active promoter – one that was always turned on, without needing an inducer – that could be safely used to tune gene expression in industrial-scale metabolic engineering projects, where inducers might be impractical or hazardous.<br />
<br />
In order to tune the steady-state ''L. lactis'' promoter without using an inducer, Jensen and Hammer had to create a library of ''L. lactis'' mutant promoters, all with various levels of activity. To generate the library, they used the method described in [[Promoters and Reporters in Synthetic Biology]]: constructing oligonucleotides that matched the consensus sequences common to all previous ''L. lactis'' promoters and mutants, then allowing the oligonucleotides to be joined together by random spacer sequences.<br />
<br />
After the promoter library was synthesized, promoters were cloned into both ''L. lactis'' and ''E. coli''; each cell culture containing a different promoter was tested for the level of beta-galactosidase activity. The activity of each promoter (in Miller units, or beta-galactosidase concentration) is described in Figure 3.<br />
<br />
[[Image:Am0180933003.gif]]<br />
Figure 3. <small>Library of synthetic promoters for L. lactis. Promoter activities (Miller units) were assayed from the expression of a reporter gene (lacLM) encoding β-galactosidase transcribed from the different synthetic promoter clones on the promoter cloning vector pAK80. The patterns of the data points indicate which promoter clones contain errors in either the −35 or the −10 consensus sequence or in the length of the spacer between these sequences. </small> From Jensen and Hammer (1997). Permission Pending.<br />
<br />
The mutant promoters expressed a wide range of activity, increasing in small increments. Note that not all of the clones were "perfect" - a few had mutations in the oligonucleotide sequences that were supposed to be preserved across the library. Those clones are indicated in the graph above. However, their data was not removed because it was within range of the data from the perfect clones - they caused no break in the general data trend. In addition, all clones were tested to ensure that they were truly constitutive.<br />
<br />
When the promoters were cloned into ''E. coli'', the same basic trend was observed. While the promoters did not demonstrate the same level of activity as they did in ''L. lactis'', there was still a wide range of activity observed, with the activity level increasing in steady increments. <br />
<br />
Jensen and Hammer constructed a library of synthetic promoters that were constitutively active and covered a range of activity levels, but it was still not known for certain what made a given promoter active at a given level. Jensen and Hammer suggested in their Discussion that "it seems that the overall three-dimensional structure which arises from a particular nucleotide sequence could be important".<br />
<br />
=== Jensen, Alper, Fischer, and Stephanopoulis (2006): Statistical Modeling and Critical Mutation Sites ===<br />
<br />
In this paper, Jensen et al. tried to determine exactly why some promoters in a promoter library were stronger than others, and which mutations might cause the change in strength. Jensen et al. proposed examining promoter libraries statistically rather than via assays: they determined which mutations are associated with which phenotypes based on how often each mutation appears in each phenotype.<br />
<br />
Say, for example, that you are creating a mutant library of a protein that can fluoresce one of three colors: red, blue, or green. If a given point mutation – let’s call it A – has no effect on the color of the fluorescence, then (assuming the mutagenesis is truly random) that mutation should appear in every phenotype proportional to the amount of protein with that phenotype. It will not appear in one phenotype significantly more than the others unless there is significantly more protein with that phenotype. It follows, then, that if point mutation B appears much more often in, say, blue protein ''without there being much more blue protein than red or green protein'', mutation B might have some effect on the protein’s phenotype. It is probably not the sole cause of the blue color, but it is associated with it.<br />
<br />
To test their statistical analysis, Jensen et al generated different variants of a single promoter via error-prone PCR, fused the promoter into a plasmid with a GFP reporter gene, and then measured the amount of GFP via flow cytometry. The promoters were then sequenced, and any with insertions or deletions were removed until 69 promoters remained.<br />
<br />
Now, assume that each mutant can be classified into one of an unknown number of phenotypic (descriptive) classes; let's call that number M. There would then be n(m) mutants in class m, with the n(m) summing to N, the total number of mutants. Now, say you have a set of X mutated promoters, where X < N, all with one particular point mutation. If that mutation has no effect on the phenotype of the promoter, then the fraction of any given class carrying that point mutation would be expected to equal X/N, the number of mutants with the mutation divided by the total number of promoters; equivalently, class m would be expected to contain X·n(m)/N of them. In other words, they would be distributed across the classes in proportion to class size.<br />
<br />
In multinomial statistics, the probability that the X mutants carrying the mutation are split across the classes according to a particular set of counts y is:<br />
<br />
=== De Mey, Maertens, Lequeux, Soetart, and Vandamme (2007): Probability and Partial Least Squares Modeling ===</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Modeling_Promoter_Activity&diff=4326Modeling Promoter Activity2007-12-06T21:14:45Z<p>Lavoss: /* Jensen, Alper, Fischer, and Stephanopoulis (2006): Statistical Modeling and Critical Mutation Sites */</p>
<hr />
<div>== Modeling Promoter Activity ==<br />
<br />
In order to use synthetic promoters to their fullest potential, we have to understand how they work. Synthetic promoters cannot help us model gene circuit activity unless models are developed for the activity of the promoter itself. Determining exactly how a promoter's strength correlates with its mutations is not easy, since for the most part it requires working with promoters at the level of individual nucleotides.<br />
<br />
=== Jensen and Hammer (1997): Spacer Sequences ===<br />
<br />
In this 1997 paper, Jensen and Hammer constructed a library of synthetic promoters based on the ''Lactococcus lactis'' prokaryotic promoter in order to better determine how gene sequence of promoters was tied to the promoter strength. Specifically, Jensen and Hammer were looking for a way to construct a constitutively active promoter – one that was always turned on, without needing an inducer – that could be safely used to tune gene expression in industrial-scale metabolic engineering projects, where inducers might be impractical or hazardous.<br />
<br />
In order to tune the steady-state ''L. lactis'' promoter without using an inducer, Jensen and Hammer had to create a library of ''L. lactis'' mutant promoters, all with various levels of activity. To generate the library, they used the method described in [[Promoters and Reporters in Synthetic Biology]]: constructing oligonucleotides that matched the consensus sequences common to all previous ''L. lactis'' promoters and mutants, then allowing the oligonucleotides to be joined together by random spacer sequences.<br />
<br />
After the promoter library was synthesized, promoters were cloned into both ''L. lactis'' and ''E. coli''; each cell culture containing a different promoter was tested for the level of beta-galactosidase activity. The activity of each promoter (in Miller units, or beta-galactosidase concentration) is described in Figure 3.<br />
<br />
[[Image:Am0180933003.gif]]<br />
Figure 3. <small>Library of synthetic promoters for L. lactis. Promoter activities (Miller units) were assayed from the expression of a reporter gene (lacLM) encoding β-galactosidase transcribed from the different synthetic promoter clones on the promoter cloning vector pAK80. The patterns of the data points indicate which promoter clones contain errors in either the −35 or the −10 consensus sequence or in the length of the spacer between these sequences. </small> From Jensen and Hammer (1997). Permission Pending.<br />
<br />
The mutant promoters expressed a wide range of activity, increasing in small increments. Note that not all of the clones were "perfect" - a few had mutations in the oligonucleotide sequences that were supposed to be preserved across the library. Those clones are indicated in the graph above. However, their data was not removed because it was within range of the data from the perfect clones - they caused no break in the general data trend. In addition, all clones were tested to ensure that they were truly constitutive.<br />
<br />
When the promoters were cloned into ''E. coli'', the same basic trend was observed. While the promoters did not demonstrate the same level of activity as they did in ''L. lactis'', there was still a wide range of activity observed, with the activity level increasing in steady increments. <br />
<br />
Jensen and Hammer constructed a library of synthetic promoters that were constitutively active and covered a range of activity levels, but it was still not known for certain what made a given promoter active at a given level. Jensen and Hammer suggested in their Discussion that "it seems that the overall three-dimensional structure which arises from a particular nucleotide sequence could be important".<br />
<br />
=== Jensen, Alper, Fischer, and Stephanopoulis (2006): Statistical Modeling and Critical Mutation Sites ===<br />
<br />
In this paper, Jensen et al tried to determine exactly why some promoters in a promoter library were stronger than others, and which mutations might cause the change in strength. Jensen et al propose to examine promoter libraries statistically rather than via assays; they will determine which mutations are associated with which phenotypes based on when they appear.<br />
<br />
Say, for example, that you are creating a mutant library of a protein that can fluoresce one of three colors: red, blue, or green. If a given point mutation – let’s call it A – has no effect on the color of the fluorescence, then (assuming the mutagenesis is truly random) that mutation should appear in every phenotype proportional to the amount of protein with that phenotype. It will not appear in one phenotype significantly more than the others unless there is significantly more protein with that phenotype. It follows, then, that if point mutation B appears much more often in, say, blue protein ''without there being much more blue protein than red or green protein'', mutation B might have some effect on the protein’s phenotype. It is probably not the sole cause of the blue color, but it is associated with it.<br />
<br />
To test their statistical analysis, Jensen et al generated 69 different variants of a single promoter via error-prone PCR, fused the promoter into a plasmid with a GFP reporter gene, and then measured the amount of GFP via flow cytometry.<br />
<br />
=== De Mey, Maertens, Lequeux, Soetart, and Vandamme (2007): Probability and Partial Least Squares Modeling ===</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Modeling_Promoter_Activity&diff=4325Modeling Promoter Activity2007-12-06T21:12:28Z<p>Lavoss: /* Jensen, Alper, Fischer, and Stephanopoulis (2006): Statistical Modeling and Critical Mutation Sites */</p>
<hr />
<div>== Modeling Promoter Activity ==<br />
<br />
In order to use synthetic promoters to their fullest potential, we have to understand how they work. Synthetic promoters cannot help us model gene circuit activity unless models are developed for the activity of the promoter itself. Determining exactly how a promoter's strength correlates with its mutations is not easy, since for the most part it requires working with promoters at the level of individual nucleotides.<br />
<br />
=== Jensen and Hammer (1997): Spacer Sequences ===<br />
<br />
In this 1997 paper, Jensen and Hammer constructed a library of synthetic promoters based on the ''Lactococcus lactis'' prokaryotic promoter in order to better determine how gene sequence of promoters was tied to the promoter strength. Specifically, Jensen and Hammer were looking for a way to construct a constitutively active promoter – one that was always turned on, without needing an inducer – that could be safely used to tune gene expression in industrial-scale metabolic engineering projects, where inducers might be impractical or hazardous.<br />
<br />
In order to tune the steady-state ''L. lactis'' promoter without using an inducer, Jensen and Hammer had to create a library of ''L. lactis'' mutant promoters, all with various levels of activity. To generate the library, they used the method described in [[Promoters and Reporters in Synthetic Biology]]: constructing oligonucleotides that matched the consensus sequences common to all previous ''L. lactis'' promoters and mutants, then allowing the oligonucleotides to be joined together by random spacer sequences.<br />
<br />
After the promoter library was synthesized, promoters were cloned into both ''L. lactis'' and ''E. coli''; each cell culture containing a different promoter was tested for the level of beta-galactosidase activity. The activity of each promoter (in Miller units, or beta-galactosidase concentration) is described in Figure 3.<br />
<br />
[[Image:Am0180933003.gif]]<br />
Figure 3. <small>Library of synthetic promoters for L. lactis. Promoter activities (Miller units) were assayed from the expression of a reporter gene (lacLM) encoding β-galactosidase transcribed from the different synthetic promoter clones on the promoter cloning vector pAK80. The patterns of the data points indicate which promoter clones contain errors in either the −35 or the −10 consensus sequence or in the length of the spacer between these sequences. </small> From Jensen and Hammer (1997). Permission Pending.<br />
<br />
The mutant promoters expressed a wide range of activity, increasing in small increments. Note that not all of the clones were "perfect" - a few had mutations in the oligonucleotide sequences that were supposed to be preserved across the library. Those clones are indicated in the graph above. However, their data was not removed because it was within range of the data from the perfect clones - they caused no break in the general data trend. In addition, all clones were tested to ensure that they were truly constitutive.<br />
<br />
When the promoters were cloned into ''E. coli'', the same basic trend was observed. While the promoters did not demonstrate the same level of activity as they did in ''L. lactis'', there was still a wide range of activity observed, with the activity level increasing in steady increments. <br />
<br />
Jensen and Hammer constructed a library of synthetic promoters that were constitutively active and covered a range of activity levels, but it was still not known for certain what made a given promoter active at a given level. Jensen and Hammer suggested in their Discussion that "it seems that the overall three-dimensional structure which arises from a particular nucleotide sequence could be important".<br />
<br />
=== Jensen, Alper, Fischer, and Stephanopoulis (2006): Statistical Modeling and Critical Mutation Sites ===<br />
<br />
In this paper, Jensen et al tried to determine exactly why some promoters in a promoter library were stronger than others, and which mutations might cause the change in strength. Jensen et al propose to examine promoter libraries statistically rather than via assays; they will determine which mutations are associated with which phenotypes based on when they appear.<br />
<br />
Say, for example, that you are creating a mutant library of a protein that can fluoresce one of three colors: red, blue, or green. If a given point mutation – let’s call it A – has no effect on the color of the fluorescence, then (assuming the mutagenesis is truly random) that mutation should appear in every phenotype proportional to the amount of protein with that phenotype. It will not appear in one phenotype significantly more than the others unless there is significantly more protein with that phenotype. It follows, then, that if point mutation B appears much more often in, say, blue protein ''without there being much more blue protein than red or green protein'', mutation B might have some effect on the protein’s phenotype. It is probably not the sole cause of the blue color, but it is associated with it.<br />
<br />
=== De Mey, Maertens, Lequeux, Soetart, and Vandamme (2007): Probability and Partial Least Squares Modeling ===</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Modeling_Promoter_Activity&diff=4324Modeling Promoter Activity2007-12-06T21:12:00Z<p>Lavoss: /* Jensen, Alper, Fischer, and Stephanopoulis (2006): Statistical Modeling and Critical Mutation Sites */</p>
<hr />
<div>== Modeling Promoter Activity ==<br />
<br />
In order to use synthetic promoters to their fullest potential, we have to understand how they work. Synthetic promoters cannot help us model gene circuit activity unless models are developed for the activity of the promoter itself. Determining exactly how a promoter's strength correlates with its mutations is not easy, since for the most part it requires working with promoters at the level of individual nucleotides.<br />
<br />
=== Jensen and Hammer (1997): Spacer Sequences ===<br />
<br />
In this 1997 paper, Jensen and Hammer constructed a library of synthetic promoters based on the ''Lactococcus lactis'' prokaryotic promoter in order to better determine how gene sequence of promoters was tied to the promoter strength. Specifically, Jensen and Hammer were looking for a way to construct a constitutively active promoter – one that was always turned on, without needing an inducer – that could be safely used to tune gene expression in industrial-scale metabolic engineering projects, where inducers might be impractical or hazardous.<br />
<br />
In order to tune the steady-state ''L. lactis'' promoter without using an inducer, Jensen and Hammer had to create a library of ''L. lactis'' mutant promoters, all with various levels of activity. To generate the library, they used the method described in [[Promoters and Reporters in Synthetic Biology]]: constructing oligonucleotides that matched the consensus sequences common to all previous ''L. lactis'' promoters and mutants, then allowing the oligonucleotides to be joined together by random spacer sequences.<br />
<br />
After the promoter library was synthesized, promoters were cloned into both ''L. lactis'' and ''E. coli''; each cell culture containing a different promoter was tested for the level of beta-galactosidase activity. The activity of each promoter (in Miller units, or beta-galactosidase concentration) is described in Figure 3.<br />
<br />
[[Image:Am0180933003.gif]]<br />
Figure 3. <small>Library of synthetic promoters for L. lactis. Promoter activities (Miller units) were assayed from the expression of a reporter gene (lacLM) encoding β-galactosidase transcribed from the different synthetic promoter clones on the promoter cloning vector pAK80. The patterns of the data points indicate which promoter clones contain errors in either the −35 or the −10 consensus sequence or in the length of the spacer between these sequences. </small> From Jensen and Hammer (1997). Permission Pending.<br />
<br />
The mutant promoters expressed a wide range of activity, increasing in small increments. Note that not all of the clones were "perfect" - a few had mutations in the oligonucleotide sequences that were supposed to be preserved across the library. Those clones are indicated in the graph above. However, their data was not removed because it was within range of the data from the perfect clones - they caused no break in the general data trend. In addition, all clones were tested to ensure that they were truly constitutive.<br />
<br />
When the promoters were cloned into ''E. coli'', the same basic trend was observed. While the promoters did not demonstrate the same level of activity as they did in ''L. lactis'', there was still a wide range of activity observed, with the activity level increasing in steady increments. <br />
<br />
Jensen and Hammer constructed a library of synthetic promoters that were constitutively active and covered a range of activity levels, but it was still not known for certain what made a given promoter active at a given level. Jensen and Hammer suggested in their Discussion that "it seems that the overall three-dimensional structure which arises from a particular nucleotide sequence could be important".<br />
<br />
=== Jensen, Alper, Fischer, and Stephanopoulis (2006): Statistical Modeling and Critical Mutation Sites ===<br />
<br />
In this paper, Jensen et al tried to determine exactly why some promoters in a promoter library were stronger than others, and which mutations might cause the change in strength. Jensen et al propose to examine promoter libraries statistically rather than via assays; they will determine which mutations are associated with which phenotypes based on when they appear.<br />
<br />
Say, for example, that you are creating a mutant library of a protein that can fluoresce one of three colors: red, blue, or green. If a given point mutation – let’s call it A – has no effect on the color of the fluorescence, then (assuming the mutagenesis is truly random) that mutation should appear in every phenotype proportional to the amount of protein with that phenotype. It will not appear in one phenotype significantly more than the others unless there is significantly more protein with that phenotype. It follows, then, that if point mutation B appears much more often in, say, blue protein without there being much more blue protein than red or green protein, mutation B might have some effect on the protein’s phenotype. It is probably not the sole cause of the blue color, but it is associated with it.<br />
<br />
=== De Mey, Maertens, Lequeux, Soetart, and Vandamme (2007): Probability and Partial Least Squares Modeling ===</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Modeling_Promoter_Activity&diff=4319Modeling Promoter Activity2007-12-06T21:06:23Z<p>Lavoss: /* Jensen, Alper, Fischer, and Stephanopoulis (2006): Statistical Modeling and Critical Mutation Sites */</p>
<hr />
<div>== Modeling Promoter Activity ==<br />
<br />
In order to use synthetic promoters to their fullest potential, we have to understand how they work. Synthetic promoters cannot help us model gene circuit activity unless models are developed for the activity of the promoter itself. Determining exactly how a promoter's strength correlates with its mutations is not easy, since for the most part it requires working with promoters at the level of individual nucleotides.<br />
<br />
=== Jensen and Hammer (1997): Spacer Sequences ===<br />
<br />
In this 1997 paper, Jensen and Hammer constructed a library of synthetic promoters based on the ''Lactococcus lactis'' prokaryotic promoter in order to better determine how gene sequence of promoters was tied to the promoter strength. Specifically, Jensen and Hammer were looking for a way to construct a constitutively active promoter – one that was always turned on, without needing an inducer – that could be safely used to tune gene expression in industrial-scale metabolic engineering projects, where inducers might be impractical or hazardous.<br />
<br />
In order to tune the steady-state ''L. lactis'' promoter without using an inducer, Jensen and Hammer had to create a library of ''L. lactis'' mutant promoters, all with various levels of activity. To generate the library, they used the method described in [[Promoters and Reporters in Synthetic Biology]]: constructing oligonucleotides that matched the consensus sequences common to all previous ''L. lactis'' promoters and mutants, then allowing the oligonucleotides to be joined together by random spacer sequences.<br />
<br />
After the promoter library was synthesized, promoters were cloned into both ''L. lactis'' and ''E. coli''; each cell culture containing a different promoter was tested for the level of beta-galactosidase activity. The activity of each promoter (in Miller units, or beta-galactosidase concentration) is described in Figure 3.<br />
<br />
[[Image:Am0180933003.gif]]<br />
Figure 3. <small>Library of synthetic promoters for L. lactis. Promoter activities (Miller units) were assayed from the expression of a reporter gene (lacLM) encoding β-galactosidase transcribed from the different synthetic promoter clones on the promoter cloning vector pAK80. The patterns of the data points indicate which promoter clones contain errors in either the −35 or the −10 consensus sequence or in the length of the spacer between these sequences. </small> From Jensen and Hammer (1997). Permission Pending.<br />
<br />
The mutant promoters expressed a wide range of activity, increasing in small increments. Note that not all of the clones were "perfect" - a few had mutations in the oligonucleotide sequences that were supposed to be preserved across the library. Those clones are indicated in the graph above. However, their data was not removed because it was within range of the data from the perfect clones - they caused no break in the general data trend. In addition, all clones were tested to ensure that they were truly constitutive.<br />
<br />
When the promoters were cloned into ''E. coli'', the same basic trend was observed. While the promoters did not demonstrate the same level of activity as they did in ''L. lactis'', there was still a wide range of activity observed, with the activity level increasing in steady increments. <br />
<br />
Jensen and Hammer constructed a library of synthetic promoters that were constitutively active and covered a range of activity levels, but it was still not known for certain why a given promoter was active at a given level. Jensen and Hammer suggested in their Discussion that "it seems that the overall three-dimensional structure which arises from a particular nucleotide sequence could be important".<br />
<br />
=== Jensen, Alper, Fischer, and Stephanopoulos (2006): Statistical Modeling and Critical Mutation Sites ===<br />
<br />
In this paper, Jensen et al. tried to determine exactly why some promoters in a promoter library were stronger than others, and which mutations might cause the change in strength. They proposed examining promoter libraries statistically rather than via assays, determining which mutations are associated with which phenotypes based on the promoters in which those mutations appear.<br />
<br />
=== De Mey, Maertens, Lequeux, Soetart, and Vandamme (2007): Probability and Partial Least Squares Modeling ===</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Modeling_Promoter_Activity&diff=4314Modeling Promoter Activity2007-12-06T20:55:34Z<p>Lavoss: /* Jensen and Hammer (1997): Spacer Sequences */</p>
<hr />
<div>== Modeling Promoter Activity ==<br />
<br />
In order to use synthetic promoters to their fullest potential, we have to understand how they work. Synthetic promoters cannot help us model gene circuit activity unless models are developed for the activity of the promoter itself. Determining exactly how a promoter's strength correlates with its mutations is not easy, since for the most part it requires working with promoters at the level of individual sets of nucleotides.<br />
<br />
=== Jensen and Hammer (1997): Spacer Sequences ===<br />
<br />
In this 1997 paper, Jensen and Hammer constructed a library of synthetic promoters based on the ''Lactococcus lactis'' prokaryotic promoter in order to better determine how a promoter's sequence is tied to its strength. Specifically, Jensen and Hammer were looking for a way to construct a constitutively active promoter – one that was always turned on, without needing an inducer – that could be safely used to tune gene expression in industrial-scale metabolic engineering projects, where inducers might be impractical or hazardous.<br />
<br />
In order to tune the steady-state ''L. lactis'' promoter without using an inducer, Jensen and Hammer had to create a library of ''L. lactis'' mutant promoters, all with various levels of activity. To generate the library, they used the method described in [[Promoters and Reporters in Synthetic Biology]]: constructing oligonucleotides that matched the sequences common to all previous ''L. lactis'' promoters and mutants, then allowing the oligonucleotides to be joined together by random spacer sequences.<br />
<br />
After the promoter library was synthesized, promoters were cloned into both ''L. lactis'' and ''E. coli''; each cell culture containing a different promoter was tested for the level of beta-galactosidase activity. The activity of each promoter (in Miller units, a standardized measure of beta-galactosidase activity) is shown in Figure 3.<br />
<br />
[[Image:Am0180933003.gif]]<br />
Figure 3. <small>Library of synthetic promoters for L. lactis. Promoter activities (Miller units) were assayed from the expression of a reporter gene (lacLM) encoding β-galactosidase transcribed from the different synthetic promoter clones on the promoter cloning vector pAK80. The patterns of the data points indicate which promoter clones contain errors in either the -35 or the -10 consensus sequence or in the length of the spacer between these sequences. </small> From Jensen and Hammer (1997). Permission Pending.<br />
<br />
The mutant promoters expressed a wide range of activity, increasing in small increments. Note that not all of the clones were "perfect" - a few had mutations in the oligonucleotide sequences that were supposed to be preserved across the library. Those clones are indicated in the graph above. However, their data was not removed because it was within range of the data from the perfect clones - they caused no break in the general data trend. In addition, all clones were tested to ensure that they were truly constitutive.<br />
<br />
When the promoters were cloned into ''E. coli'', the same basic trend was observed. While the promoters did not demonstrate the same level of activity as they did in ''L. lactis'', there was still a wide range of activity observed, with the activity level increasing in steady increments. <br />
<br />
Jensen and Hammer constructed a library of synthetic promoters that were constitutively active and covered a range of activity levels, but it was still not known for certain why a given promoter was active at a given level. Jensen and Hammer suggested in their Discussion that "it seems that the overall three-dimensional structure which arises from a particular nucleotide sequence could be important".<br />
<br />
=== Jensen, Alper, Fischer, and Stephanopoulos (2006): Statistical Modeling and Critical Mutation Sites ===<br />
<br />
<br />
=== De Mey, Maertens, Lequeux, Soetart, and Vandamme (2007): Probability and Partial Least Squares Modeling ===</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Modeling_Promoter_Activity&diff=4313Modeling Promoter Activity2007-12-06T20:51:13Z<p>Lavoss: /* Jensen and Hammer (1997): Spacer Sequences */</p>
<hr />
<div>== Modeling Promoter Activity ==<br />
<br />
In order to use synthetic promoters to their fullest potential, we have to understand how they work. Synthetic promoters cannot help us model gene circuit activity unless models are developed for the activity of the promoter itself. Determining exactly how a promoter's strength correlates with its mutations is not easy, since for the most part it requires working with promoters at the level of individual sets of nucleotides.<br />
<br />
=== Jensen and Hammer (1997): Spacer Sequences ===<br />
<br />
In this 1997 paper, Jensen and Hammer constructed a library of synthetic promoters based on the ''Lactococcus lactis'' prokaryotic promoter in order to better determine how gene sequence of promoters was tied to the promoter strength. Specifically, Jensen and Hammer were looking for a way to construct a constitutively active promoter – one that was always turned on, without needing an inducer – that could be safely used to tune gene expression in industrial-scale metabolic engineering projects, where inducers might be impractical or hazardous.<br />
<br />
In order to tune the steady-state ''L. lactis'' promoter without using an inducer, Jensen and Hammer had to create a library of ''L. lactis'' mutant promoters, all with various levels of activity. To generate the library, they used the method described in [[Promoters and Reporters in Synthetic Biology]]: constructing oligonucleotides that matched the sequences common to all previous ''L. lactis'' promoters and mutants, then allowing the oligonucleotides to be joined together by random spacer sequences.<br />
<br />
After the promoter library was synthesized, promoters were cloned into both ''L. lactis'' and ''E. coli''; each cell culture containing a different promoter was tested for the level of beta-galactosidase activity. The activity of each promoter (in Miller units, a standardized measure of beta-galactosidase activity) is shown in Figure 3.<br />
<br />
[[Image:Am0180933003.gif]]<br />
Figure 3. <small>Library of synthetic promoters for L. lactis. Promoter activities (Miller units) were assayed from the expression of a reporter gene (lacLM) encoding β-galactosidase transcribed from the different synthetic promoter clones on the promoter cloning vector pAK80. The patterns of the data points indicate which promoter clones contain errors in either the -35 or the -10 consensus sequence or in the length of the spacer between these sequences. </small> From Jensen and Hammer (1997). Permission Pending.<br />
<br />
The mutant promoters expressed a wide range of activity, increasing in small increments. Note that not all of the clones were "perfect" - a few had mutations in the oligonucleotide sequences that were supposed to be preserved across the library. Those clones are indicated in the graph above. However, their data was not removed because it was within range of the data from the perfect clones - they caused no break in the general data trend. In addition, all clones were tested to ensure that they were truly constitutive.<br />
<br />
When the promoters were cloned into ''E. coli'', the same basic trend was observed. While the promoters did not demonstrate the same level of activity as they did in ''L. lactis'', there was still a wide range of activity observed, with the activity level increasing in steady increments. The activity of the synthetic promoters is demonstrated in Figure 5.<br />
<br />
=== Jensen, Alper, Fischer, and Stephanopoulos (2006): Statistical Modeling and Critical Mutation Sites ===<br />
<br />
<br />
=== De Mey, Maertens, Lequeux, Soetart, and Vandamme (2007): Probability and Partial Least Squares Modeling ===</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Modeling_Promoter_Activity&diff=4309Modeling Promoter Activity2007-12-06T20:41:15Z<p>Lavoss: /* Jensen and Hammer (1997): Spacer Sequences */</p>
<hr />
<div>== Modeling Promoter Activity ==<br />
<br />
In order to use synthetic promoters to their fullest potential, we have to understand how they work. Synthetic promoters cannot help us model gene circuit activity unless models are developed for the activity of the promoter itself. Determining exactly how a promoter's strength correlates with its mutations is not easy, since for the most part it requires working with promoters at the level of individual sets of nucleotides.<br />
<br />
=== Jensen and Hammer (1997): Spacer Sequences ===<br />
<br />
In this 1997 paper, Jensen and Hammer constructed a library of synthetic promoters based on the ''Lactococcus lactis'' prokaryotic promoter in order to better determine how gene sequence of promoters was tied to the promoter strength. Specifically, Jensen and Hammer were looking for a way to construct a constitutively active promoter – one that was always turned on, without needing an inducer – that could be safely used to tune gene expression in industrial-scale metabolic engineering projects, where inducers might be impractical or hazardous.<br />
<br />
In order to tune the steady-state ''L. lactis'' promoter without using an inducer, Jensen and Hammer had to create a library of ''L. lactis'' mutant promoters, all with various levels of activity. To generate the library, they used the method described in [[Promoters and Reporters in Synthetic Biology]]: constructing oligonucleotides that matched the sequences common to all previous ''L. lactis'' promoters and mutants, then allowing the oligonucleotides to be joined together by random spacer sequences.<br />
<br />
After the promoter library was synthesized, promoters were cloned into both ''L. lactis'' and ''E. coli''; each cell culture containing a different promoter was tested for the level of beta-galactosidase activity. The activity of each promoter (in Miller units, a standardized measure of beta-galactosidase activity) is shown in Figure 3.<br />
<br />
[[Image:Am0180933003.gif]]<br />
Figure 3. <small>Library of synthetic promoters for L. lactis. Promoter activities (Miller units) were assayed from the expression of a reporter gene (lacLM) encoding β-galactosidase transcribed from the different synthetic promoter clones on the promoter cloning vector pAK80. The patterns of the data points indicate which promoter clones contain errors in either the -35 or the -10 consensus sequence or in the length of the spacer between these sequences. </small> From Jensen and Hammer (1997). Permission Pending.<br />
<br />
=== Jensen, Alper, Fischer, and Stephanopoulos (2006): Statistical Modeling and Critical Mutation Sites ===<br />
<br />
<br />
=== De Mey, Maertens, Lequeux, Soetart, and Vandamme (2007): Probability and Partial Least Squares Modeling ===</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=File:Am0180933003.gif&diff=4308File:Am0180933003.gif2007-12-06T20:40:03Z<p>Lavoss: </p>
<hr />
<div></div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Modeling_Promoter_Activity&diff=4306Modeling Promoter Activity2007-12-06T20:39:33Z<p>Lavoss: /* Jensen and Hammer (1997): Spacer Sequences */</p>
<hr />
<div>== Modeling Promoter Activity ==<br />
<br />
In order to use synthetic promoters to their fullest potential, we have to understand how they work. Synthetic promoters cannot help us model gene circuit activity unless models are developed for the activity of the promoter itself. Determining exactly how a promoter's strength correlates with its mutations is not easy, since for the most part it requires working with promoters at the level of individual sets of nucleotides.<br />
<br />
=== Jensen and Hammer (1997): Spacer Sequences ===<br />
<br />
In this 1997 paper, Jensen and Hammer constructed a library of synthetic promoters based on the ''Lactococcus lactis'' prokaryotic promoter in order to better determine how gene sequence of promoters was tied to the promoter strength. Specifically, Jensen and Hammer were looking for a way to construct a constitutively active promoter – one that was always turned on, without needing an inducer – that could be safely used to tune gene expression in industrial-scale metabolic engineering projects, where inducers might be impractical or hazardous.<br />
<br />
In order to tune the steady-state ''L. lactis'' promoter without using an inducer, Jensen and Hammer had to create a library of ''L. lactis'' mutant promoters, all with various levels of activity. To generate the library, they used the method described in [[Promoters and Reporters in Synthetic Biology]]: constructing oligonucleotides that matched the sequences common to all previous ''L. lactis'' promoters and mutants, then allowing the oligonucleotides to be joined together by random spacer sequences.<br />
<br />
After the promoter library was synthesized, promoters were cloned into both ''L. lactis'' and ''E. coli''; each cell culture containing a different promoter was tested for the level of beta-galactosidase activity. The activity of each promoter (in Miller units, a standardized measure of beta-galactosidase activity) is shown in Figure 1.<br />
<br />
=== Jensen, Alper, Fischer, and Stephanopoulos (2006): Statistical Modeling and Critical Mutation Sites ===<br />
<br />
<br />
=== De Mey, Maertens, Lequeux, Soetart, and Vandamme (2007): Probability and Partial Least Squares Modeling ===</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Modeling_Promoter_Activity&diff=4301Modeling Promoter Activity2007-12-06T20:34:15Z<p>Lavoss: /* Jensen and Hammer (1997): Spacer Sequences */</p>
<hr />
<div>== Modeling Promoter Activity ==<br />
<br />
In order to use synthetic promoters to their fullest potential, we have to understand how they work. Synthetic promoters cannot help us model gene circuit activity unless models are developed for the activity of the promoter itself. Determining exactly how a promoter's strength correlates with its mutations is not easy, since for the most part it requires working with promoters at the level of individual sets of nucleotides.<br />
<br />
=== Jensen and Hammer (1997): Spacer Sequences ===<br />
<br />
In this 1997 paper, Jensen and Hammer constructed a library of synthetic promoters based on the ''Lactococcus lactis'' prokaryotic promoter in order to better determine how gene sequence of promoters was tied to the promoter strength. Specifically, Jensen and Hammer were looking for a way to construct a constitutively active promoter – one that was always turned on, without needing an inducer – that could be safely used to tune gene expression in industrial-scale metabolic engineering projects, where inducers might be impractical or hazardous.<br />
<br />
In order to tune the steady-state ''L. lactis'' promoter without using an inducer, Jensen and Hammer had to create a library of ''L. lactis'' mutant promoters, all with various levels of activity. To generate the library, they used the method described in [[Promoters and Reporters in Synthetic Biology]]: constructing oligonucleotides that matched the sequences common to all previous ''L. lactis'' promoters and mutants, then allowing the oligonucleotides to be joined together by random spacer sequences.<br />
<br />
After the promoter library was synthesized, promoters were cloned into both ''L. lactis'' and ''E. coli''; each cell culture containing a different promoter was tested for the level of beta-galactosidase activity.<br />
<br />
=== Jensen, Alper, Fischer, and Stephanopoulos (2006): Statistical Modeling and Critical Mutation Sites ===<br />
<br />
<br />
=== De Mey, Maertens, Lequeux, Soetart, and Vandamme (2007): Probability and Partial Least Squares Modeling ===</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Modeling_Promoter_Activity&diff=4297Modeling Promoter Activity2007-12-06T20:27:44Z<p>Lavoss: /* Jensen and Hammer (1997): Spacer Sequences */</p>
<hr />
<div>== Modeling Promoter Activity ==<br />
<br />
In order to use synthetic promoters to their fullest potential, we have to understand how they work. Synthetic promoters cannot help us model gene circuit activity unless models are developed for the activity of the promoter itself. Determining exactly how a promoter's strength correlates with its mutations is not easy, since for the most part it requires working with promoters at the level of individual sets of nucleotides.<br />
<br />
=== Jensen and Hammer (1997): Spacer Sequences ===<br />
<br />
In this 1997 paper, Jensen and Hammer constructed a library of synthetic promoters based on the ''Lactococcus lactis'' prokaryotic promoter in order to better determine how gene sequence of promoters was tied to the promoter strength. Specifically, Jensen and Hammer were looking for a way to construct a constitutively active promoter – one that was always turned on, without needing an inducer – that could be safely used to tune gene expression in industrial-scale metabolic engineering projects, where inducers might be impractical or hazardous.<br />
<br />
In order to tune the steady-state ''L. lactis'' promoter without using an inducer, Jensen and Hammer had to create a library of ''L. lactis'' mutant promoters, all with various levels of activity. To generate the library, they used the method described in [[Promoters and Reporters in Synthetic Biology]]: constructing oligonucleotides that matched the sequences common to all previous ''L. lactis'' promoters and mutants, then allowing the oligonucleotides to be joined together by random spacer sequences.<br />
<br />
=== Jensen, Alper, Fischer, and Stephanopoulos (2006): Statistical Modeling and Critical Mutation Sites ===<br />
<br />
<br />
=== De Mey, Maertens, Lequeux, Soetart, and Vandamme (2007): Probability and Partial Least Squares Modeling ===</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Modeling_Promoter_Activity&diff=4294Modeling Promoter Activity2007-12-06T20:26:58Z<p>Lavoss: /* Jensen and Hammer (1997): Spacer Sequences */</p>
<hr />
<div>== Modeling Promoter Activity ==<br />
<br />
In order to use synthetic promoters to their fullest potential, we have to understand how they work. Synthetic promoters cannot help us model gene circuit activity unless models are developed for the activity of the promoter itself. Determining exactly how a promoter's strength correlates with its mutations is not easy, since for the most part it requires working with promoters at the level of individual sets of nucleotides.<br />
<br />
=== Jensen and Hammer (1997): Spacer Sequences ===<br />
<br />
In this 1997 paper, Jensen and Hammer constructed a library of synthetic promoters based on the ''Lactococcus lactis'' prokaryotic promoter in order to better determine how gene sequence of promoters was tied to the promoter strength. Specifically, Jensen and Hammer were looking for a way to construct a constitutively active promoter – one that was always turned on, without needing an inducer – that could be safely used to tune gene expression in industrial-scale metabolic engineering projects, where inducers might be impractical or hazardous.<br />
<br />
In order to tune the steady-state ''L. lactis'' promoter without using an inducer, Jensen and Hammer had to create a library of ''L. lactis'' mutant promoters, all with various levels of activity. To generate the library, they used the method described in [[Promoters and Reporters in Synthetic Biology]]: constructing oligonucleotides that matched the sequences common to all previous ''L. lactis'' promoters and mutants.<br />
<br />
=== Jensen, Alper, Fischer, and Stephanopoulos (2006): Statistical Modeling and Critical Mutation Sites ===<br />
<br />
<br />
=== De Mey, Maertens, Lequeux, Soetart, and Vandamme (2007): Probability and Partial Least Squares Modeling ===</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Modeling_Promoter_Activity&diff=4293Modeling Promoter Activity2007-12-06T20:26:18Z<p>Lavoss: /* Jensen and Hammer (1997): Spacer Sequences */</p>
<hr />
<div>== Modeling Promoter Activity ==<br />
<br />
In order to use synthetic promoters to their fullest potential, we have to understand how they work. Synthetic promoters cannot help us model gene circuit activity unless models are developed for the activity of the promoter itself. Determining exactly how a promoter's strength correlates with its mutations is not easy, since for the most part it requires working with promoters at the level of individual sets of nucleotides.<br />
<br />
=== Jensen and Hammer (1997): Spacer Sequences ===<br />
<br />
In this 1997 paper, Jensen and Hammer constructed a library of synthetic promoters based on the Lactococcus lactis prokaryotic promoter in order to better determine how gene sequence of promoters was tied to the promoter strength. Specifically, Jensen and Hammer were looking for a way to construct a constitutively active promoter – one that was always turned on, without needing an inducer – that could be safely used to tune gene expression in industrial-scale metabolic engineering projects, where inducers might be impractical or hazardous.<br />
<br />
In order to tune the steady-state L. lactis promoter without using an inducer, Jensen and Hammer had to create a library of L. lactis mutant promoters, all with various levels of activity. To generate the library, they used the method described in [[Promoters and Reporters in Synthetic Biology]]: constructing oligonucleotides that matched the sequences common to all previous L. lactis promoters and mutants.<br />
<br />
=== Jensen, Alper, Fischer, and Stephanopoulos (2006): Statistical Modeling and Critical Mutation Sites ===<br />
<br />
<br />
=== De Mey, Maertens, Lequeux, Soetart, and Vandamme (2007): Probability and Partial Least Squares Modeling ===</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Modeling_Promoter_Activity&diff=4292Modeling Promoter Activity2007-12-06T20:25:30Z<p>Lavoss: /* Jensen and Hammer (1997): Spacer Sequences */</p>
<hr />
<div>== Modeling Promoter Activity ==<br />
<br />
In order to use synthetic promoters to their fullest potential, we have to understand how they work. Synthetic promoters cannot help us model gene circuit activity unless models are developed for the activity of the promoter itself. Determining exactly how a promoter's strength correlates with its mutations is not easy, since for the most part it requires working with promoters at the level of individual sets of nucleotides.<br />
<br />
=== Jensen and Hammer (1997): Spacer Sequences ===<br />
<br />
In this 1997 paper, Jensen and Hammer constructed a library of synthetic promoters based on the Lactococcus lactis prokaryotic promoter in order to better determine how gene sequence of promoters was tied to the promoter strength. Specifically, Jensen and Hammer were looking for a way to construct a constitutively active promoter – one that was always turned on, without needing an inducer – that could be safely used to tune gene expression in industrial-scale metabolic engineering projects, where inducers might be impractical or hazardous.<br />
<br />
In order to tune the steady-state L. lactis promoter without using an inducer, Jensen and Hammer had to create a library of L. lactis mutant promoters, all with various levels of activity. To generate the library, they used the method described in "...": constructing oligonucleotides that matched the sequences common to all previous L. lactis promoters and mutants.<br />
<br />
=== Jensen, Alper, Fischer, and Stephanopoulos (2006): Statistical Modeling and Critical Mutation Sites ===<br />
<br />
<br />
=== De Mey, Maertens, Lequeux, Soetart, and Vandamme (2007): Probability and Partial Least Squares Modeling ===</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Modeling_Promoter_Activity&diff=4289Modeling Promoter Activity2007-12-06T20:17:08Z<p>Lavoss: /* Jensen and Hammer (2006): Spacer Sequences */</p>
<hr />
<div>== Modeling Promoter Activity ==<br />
<br />
In order to use synthetic promoters to their fullest potential, we have to understand how they work. Synthetic promoters cannot help us model gene circuit activity unless models are developed for the activity of the promoter itself. Determining exactly how a promoter's strength correlates with its mutations is not easy, since for the most part it requires working with promoters at the level of individual sets of nucleotides.<br />
<br />
=== Jensen and Hammer (1997): Spacer Sequences ===<br />
<br />
=== Jensen, Alper, Fischer, and Stephanopoulos (2006): Statistical Modeling and Critical Mutation Sites ===<br />
<br />
<br />
=== De Mey, Maertens, Lequeux, Soetart, and Vandamme (2007): Probability and Partial Least Squares Modeling ===</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Predicting_Gene_Circuit_Activity&diff=4240Predicting Gene Circuit Activity2007-12-06T19:21:44Z<p>Lavoss: </p>
<hr />
<div>== Modeling and Predicting Gene Circuit Activity ==</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Modeling_Reporter_Activity&diff=4239Modeling Reporter Activity2007-12-06T19:20:02Z<p>Lavoss: </p>
<hr />
<div>== Modeling Reporter Activity ==<br />
<br />
=== The Hill Function and Michaelis-Menten Kinetics ===<br />
<br />
<br />
=== Leveau and Lindow (2001): Modeling GFP as a function of promoter activity ===</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Promoters_and_Reporters_in_Synthetic_Biology&diff=4235Promoters and Reporters in Synthetic Biology2007-12-06T19:17:21Z<p>Lavoss: /* Works Cited */</p>
<hr />
<div><br />
<br />
== What Are Promoters and Reporters? ==<br />
<br />
[http://en.wikipedia.org/wiki/Promoter Promoters] and [http://en.wikipedia.org/wiki/Reporter_gene reporters] are genetic components used in engineering gene circuits. Promoters are DNA sequences located 'upstream', or ahead, of the DNA sequences encoding genes. Promoters provide binding sites for [http://en.wikipedia.org/wiki/Transcription_factors transcription factors], small proteins that control how and whether DNA is transcribed. Transcription factors bind to promoters in order to give [http://en.wikipedia.org/wiki/RNA_polymerase RNA polymerase] a place to bind, so that the genes can be transcribed. RNA polymerase binds to DNA and transcribes complementary RNA from the DNA sequence so that proteins can be formed from the DNA code. If a promoter is being repressed, then transcription cannot occur, as RNA polymerase will not have a place to bind.<br />
<br />
Reporters are not as specific as promoters; they are genes that convey some easily identifiable and measurable characteristic when they are expressed, such as fluorescence or beta-galactosidase activity. Reporters are generally attached to other gene sequences so the scientist has a way of knowing if the gene is being transcribed - if the reporter is being transcribed, one can assume that the gene of interest is being transcribed as well.<br />
<br />
== Synthetic, Artificial, and Mutated Promoters and Reporters ==<br />
<br />
[http://pubs.acs.org/cgi-bin/article.cgi/achre4/1998/31/i03/html/ar960017f.html Directed evolution] is often used to mutate promoters or reporters in order to obtain desirable attributes. Directed evolution of a gene or protein sequence generally mutates or scrambles the sequence in question, screens it for a certain mutation (any cell not displaying the desirable phenotype is removed), and then amplifies the surviving cells so that the process can begin again. Many mutation and screening cycles can be performed, producing DNA sequences far removed from the original DNA code and increasing the likelihood that a mutant sequence or cell will have desirable properties. <br />
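<br />
The mutate, screen, and amplify cycle described above can be sketched as a toy simulation. This is only an illustration: the scoring function (number of positions matching a known target sequence) stands in for a laboratory screen, and the function names, mutation rate, and population sizes are all hypothetical choices rather than values from any cited study.

```python
import random

BASES = "ACGT"


def mutate(seq, rate, rng):
    """Replace each base with a random base with probability `rate`."""
    return "".join(rng.choice(BASES) if rng.random() < rate else b for b in seq)


def score(seq, target):
    """Toy screen: count positions that match the desired sequence."""
    return sum(a == b for a, b in zip(seq, target))


def directed_evolution(start, target, rounds=20, pop=50, keep=10, rate=0.05, seed=0):
    rng = random.Random(seed)
    population = [start] * pop
    for _ in range(rounds):
        # Mutate: every member of the population yields one mutant.
        mutants = [mutate(s, rate, rng) for s in population]
        # Screen: rank parents and mutants together, keep only the best.
        candidates = population + mutants
        survivors = sorted(candidates, key=lambda s: score(s, target), reverse=True)[:keep]
        # Amplify: copy the survivors back up to the full population size.
        population = [survivors[i % keep] for i in range(pop)]
    return max(population, key=lambda s: score(s, target))


if __name__ == "__main__":
    start, target = "AAAAAAAAAAAA", "TTGACATATAAT"
    best = directed_evolution(start, target)
    print(best, score(best, target))
```

Keeping the unmutated parents in the screening pool means the best score can never fall from one round to the next, which is a simplification; a real screen is noisy and offers no such guarantee.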
<br />
Another method is the synthesis of combinatorial promoters, as demonstrated in [http://www.nature.com/msb/journal/v3/n1/full/msb4100187.html Cox, Surette and Elowitz (2007)]. In their experiment, Cox et al. designed modular sequence units corresponding to three regions of a promoter (distal, core, and proximal). These units, assembled at random, can create a diverse new promoter library made up of fragments of existing promoters, even promoters that are unrelated. See Figure 1 for a diagram of combinatorial promoter synthesis.<br />
<br />
In addition, promoters can be specifically synthesized based on the structure of an existing promoter, as in Jensen and Hammer (1997). In order to construct a series of synthetic promoters similar to the ''L. lactis'' promoter, Jensen and Hammer identified consensus sequences within existing ''L. lactis'' promoters and mutants: sequences that were similar in all or most of them, no matter how their activity varied. For example, the -10 sequence TATAAT (the Pribnow box) and the -35 sequence TTGACA are conserved in many prokaryotic promoters; other sequences, such as the TG sequence one base pair upstream from the -10 sequence, are more specific to ''L. lactis''. In order to generate a promoter library, Jensen and Hammer constructed oligonucleotides for the sequences that were common in ''L. lactis'' promoters. These oligonucleotides were then separated by spacers of random sequence; promoters with different spacer sequences made up the promoter library. See Figure 2 for an illustration of the process. <br />
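<br />
The idea of fixed consensus sequences separated by random spacers can be sketched in a few lines. This is a simplified illustration, not the authors' actual oligonucleotide design: only the -35 and -10 sequences named above are held fixed, and the 17-base spacer length, the function names, and the random seed are assumptions made for the example.

```python
import random

MINUS_35 = "TTGACA"  # -35 consensus sequence from the text
MINUS_10 = "TATAAT"  # -10 consensus sequence from the text


def random_spacer(length, rng):
    """Return a random DNA sequence of the requested length."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def make_library(size, spacer_length=17, seed=0):
    """Build `size` promoters that differ only in their spacer sequence."""
    rng = random.Random(seed)
    return [
        MINUS_35 + random_spacer(spacer_length, rng) + MINUS_10
        for _ in range(size)
    ]


if __name__ == "__main__":
    for promoter in make_library(5):
        print(promoter)
```

Every promoter generated this way shares the same consensus elements, so any difference in activity between library members would have to come from the spacer, which is the property that makes such a library useful for studying how sequence relates to strength.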
<br />
=== Why use synthetic/mutated promoters and reporters? ===<br />
Since much of synthetic biology is based on modeling genetic and molecular mechanisms before they are built, a scientist has to be able to predict how the components of a mechanism or gene circuit will work in order to predict how the whole mechanism will work. Because they have been specifically designed and selected for, synthetic promoters and reporters make gene circuit modeling much easier.<br />
<br />
[http://www.nature.com/msb/journal/v3/n1/full/msb4100185.html Rosenfeld, Young, Alon, Swain, and Elowitz (2007)] have demonstrated that the behavior of a gene circuit can be accurately modeled based on its promoter and repressor activity, but note that in order to accurately construct their model, they needed a specific promoter and repressor gene that followed a certain pattern of behavior (specifically, a negative regulatory circuit, in which a repressor regulates its own expression, as that circuit is the simplest to model).<br />
<br />
Of course, the noise and randomness inherent in cellular interactions mean that no promoter or reporter's activity can be perfectly predicted.<br />
<br />
Synthetic promoters and reporters are also useful when a wild-type promoter or reporter is not sufficient or lacks some property necessary for a cellular mechanism to work. For example, a reporter protein such as GFP does not degrade as soon as it is produced, so in any mechanism that has to detect a transient signal, GFP would not be a useful reporter. However, a mutated GFP, which degrades faster or in the presence of a certain compound, would overcome this limitation. The same principle applies for reporters which are more active at lower-than-normal or higher-than-normal temperatures. See [http://www.biophysj.org/cgi/content/abstract/73/5/2782 Patterson GH et al (1997)].<br />
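<br />
To see why reporter stability matters for transient signals, the sketch below assumes simple first-order (exponential) decay of the reporter once production stops. Both the decay model and the half-life values are assumptions chosen for illustration; they are not measurements from Patterson et al. or any other source cited here.

```python
def fraction_remaining(minutes, half_life_minutes):
    """Fraction of reporter protein left after `minutes` of first-order decay."""
    if half_life_minutes <= 0:
        raise ValueError("half_life_minutes must be positive")
    return 0.5 ** (minutes / half_life_minutes)


if __name__ == "__main__":
    # Hypothetical half-lives: a stable reporter vs. a destabilized variant.
    stable, destabilized = 1440.0, 40.0
    for t in (0, 40, 120, 240):
        print(t, fraction_remaining(t, stable), fraction_remaining(t, destabilized))
```

Under these assumed numbers, the stable reporter is still almost entirely present hours after the signal ends, while the destabilized variant has largely disappeared, so only the latter would track a short-lived signal.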
<br />
== Measuring, Testing, Tuning, and Modeling Promoters and Reporters ==<br />
<br />
*[[Modeling Promoter Activity | Modeling Promoter Activity: Developing a model for predicting promoter activity based on mutations and gene sequence.]]<br />
<br />
*[[Modeling Reporter Activity | Modeling Reporter Activity: The kinetics of reporter protein degradation and developing a model thereof.]]<br />
<br />
*[[Predicting Gene Circuit Activity | Predicting gene circuit activity based on promoter and reporter modeling.]]<br />
<br />
== Figures ==<br />
[[Image:Msb4100187-f1.jpg]]<br />
Figure 1. <small>Random assembly ligation generates a diverse promoter library. Promoters can be assembled out of modular sequence units. (A) The assembled sequence of an example promoter. The 5' overhangs of each unit are shown in red. The RNA polymerase boxes (-10 and -35) are highlighted in yellow, and the predicted start site of transcription (+1) is capitalized. Operator colors are consistent throughout the figure. (B) Steps in promoter assembly and ligation into the luciferase reporter vector: promoters are assembled by mixed ligations using 1-bp or 2-bp cohesive ends, and then ligated into a luciferase reporter plasmid. (C) Luminescence measurements in 16 inducer conditions (± each of four inducers, as indicated) for the promoter shown in (A). The output levels determine promoter logic. Note that this promoter does not respond to LuxR regulation at the distal region. (D) The 48 unique units used in the library contain operators responsive to the four TFs (indicated by color) in the regions distal, core, and proximal. </small> In Cox, Surette, and Elowitz 2007. Permission Pending.<br />
<br />
[[Image:Am0180933001.gif]]<br />
Figure 2. <small>Strategies used for cloning synthetic promoter fragments into the promoter cloning vector pAK80. (a) Double-stranded DNA fragments carrying putative promoter activities. (b) Restriction map and schematic representation of the relevant parts of the promoter cloning vector. The stippled and solid lines show the strategies used for cloning pCP1 through pCP29 and pCP30 through pCP46, respectively. (c) Restriction map of clones pCP1 through pCP29. (d) Restriction map of clones pCP30 through pCP46. Note that a number of clones have been subject to cloning artifacts and thus may have a slightly different restriction map. BI, BamHI; AII, AflII; Ss, SspI; N, NsiI (PstI compatible); Nr, NruI; Sc, ScaI; HII, HincII; P, PstI; PII, PvuII; E, EcoRI; Sa, SacI; Xh, XhoI; BII, BglII; Sm, SmaI; Xb, XbaI (not drawn to scale).</small> In Jensen and Hammer 1997. Permission Pending.<br />
<br />
== Works Cited ==<br />
*Arnold FH (1997). Design by Directed Evolution. ''Acc. Chem. Res.'' 31(3). Epub 1998 February 28. [http://pubs.acs.org/cgi-bin/article.cgi/achre4/1998/31/i03/html/ar960017f.html Full Text]<br />
<br />
*Cox RS III, Surette MG, and Elowitz MB (2007). Programming gene expression with combinatorial promoters. ''Molecular Systems Biology'' 3(145). Epub 2007 November 13. [http://www.nature.com/msb/journal/v3/n1/full/msb4100187.html Full Text]<br />
<br />
*De Mey M, Maertens J, Lequeux GJ, Soetaert WK, and Vandamme EJ (2007). Construction and model-based analysis of a promoter library for ''E. coli'': an indispensable tool for metabolic engineering. ''BMC Biotechnology'' 7(34). Epub 2007 June.<br />
<br />
*Jensen PR and Hammer K (1997). The sequence of spacers between the consensus sequences modulates the strength of prokaryotic promoters. ''Applied and Environmental Microbiology'' 64(1).<br />
<br />
*Jensen PR and Hammer K (1997). Artificial promoters for metabolic optimization. ''Biotechnology and Bioengineering'' 58(2-3).<br />
<br />
*Jensen K, Alper H, Fischer C, and Stephanopoulos G (2006). Identifying functionally important mutations from phenotypically diverse sequence data. ''Applied and Environmental Microbiology'' 72(5).<br />
<br />
*Leveau JHJ and Lindow SE (2001). Predictive and interpretive simulation of green fluorescent protein expression in reporter bacteria. ''Journal of Bacteriology'' 183(23). Epub 2001 September. [http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=95514 Full Text]<br />
<br />
*Miller WG, Brandl MT, Quinones B, and Lindow SE (2001). Biological sensor for sucrose availability: relative sensitivities of various reporter genes. ''Applied and Environmental Microbiology'' 67(3).<br />
<br />
*Patterson GH, Knobel SM, Sharif WD, Kain SR, and Piston DW (1997). Use of the green fluorescent protein and its mutants in quantitative fluorescence microscopy. ''Biophysical Journal'' 73. Epub 1998. [http://www.biophysj.org/cgi/content/abstract/73/5/2782 Abstract]<br />
<br />
*Rosenfeld N, Young JW, Alon U, Swain PS, and Elowitz MB (2007). Accurate prediction of gene feedback circuit behavior from component properties. ''Molecular Systems Biology'' 3(143). Epub 2007 November 13. [http://www.nature.com/msb/journal/v3/n1/full/msb4100185.html Full Text]<br />
<br />
* Weiss R, Basu S, Hooshangi S, Kalmbach A, Karig D, Mehreja R, and Netravali I (2003). Genetic circuit building blocks for cellular computation, communications, and signal processing. Natural Computing 2 (1). Epub 2004 November 02. [http://www.springerlink.com/content/h885l73711912672/ Abstract]</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Promoters_and_Reporters_in_Synthetic_Biology&diff=4230Promoters and Reporters in Synthetic Biology2007-12-06T19:02:27Z<p>Lavoss: /* Measuring, Testing, Tuning, and Modeling Promoters and Reporters */</p>
<hr />
<div><br />
<br />
== What Are Promoters and Reporters? ==<br />
<br />
[http://en.wikipedia.org/wiki/Promoter Promoters] and [http://en.wikipedia.org/wiki/Reporter_gene reporters] are genetic components used in engineering gene circuits. Promoters are DNA sequences located 'upstream', or ahead, of the DNA sequences encoding genes. Promoters provide binding sites for [http://en.wikipedia.org/wiki/Transcription_factors transcription factors], proteins that control how and whether DNA is transcribed. Transcription factors bind to promoters and help [http://en.wikipedia.org/wiki/RNA_polymerase RNA polymerase] bind there, so that the genes can be transcribed. RNA polymerase binds to DNA and transcribes complementary RNA from the DNA sequence so that proteins can be formed from the DNA code. If a promoter is being repressed, then transcription cannot occur, as RNA polymerase will not be able to bind.<br />
<br />
Reporters are not as specific as promoters; they are genes that confer some easily identifiable and measurable characteristic when they are expressed, such as fluorescence or beta-galactosidase activity. Reporters are generally attached to other gene sequences so that the scientist has a way of knowing whether the gene is being transcribed: if the reporter is being transcribed, one can assume that the gene of interest is being transcribed as well.<br />
<br />
== Synthetic, Artificial, and Mutated Promoters and Reporters ==<br />
<br />
[http://pubs.acs.org/cgi-bin/article.cgi/achre4/1998/31/i03/html/ar960017f.html Directed evolution] is often used to mutate promoters or reporters in order to obtain desirable attributes. Directed evolution of a gene or protein sequence generally mutates or scrambles the sequence in question, screens the resulting variants for a desired phenotype (any cell not displaying the desirable phenotype is removed), and then amplifies the surviving cells so that the process can begin again. Many mutation and screening cycles can be performed, producing DNA sequences far removed from the original DNA code and increasing the likelihood that a mutant sequence or cell will have desirable properties. <br />
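The mutate, screen, and amplify cycle described above can be sketched as a toy simulation. This is only an illustration under loud assumptions: the "screen" here scores a sequence by how many positions match a hypothetical target sequence, whereas a real screen measures a phenotype, and the population size, mutation rate, and function names are all made up for the example.

```python
import random

BASES = "ACGT"

def mutate(seq, rate, rng):
    """Replace each base with a random base with probability `rate`."""
    return "".join(rng.choice(BASES) if rng.random() < rate else b for b in seq)

def score(seq, target):
    """Toy screen (assumption): count positions matching a hypothetical target."""
    return sum(a == b for a, b in zip(seq, target))

def directed_evolution(start, target, rounds=30, pop_size=50, keep=10,
                       rate=0.05, seed=1):
    rng = random.Random(seed)
    population = [start] * pop_size
    for _ in range(rounds):
        # Mutation step: every member of the population yields one mutant.
        mutants = [mutate(s, rate, rng) for s in population]
        # Screening step: only the best-scoring sequences survive.
        # Parents stay in the pool, so the best score never decreases.
        pool = population + mutants
        survivors = sorted(pool, key=lambda s: score(s, target), reverse=True)[:keep]
        # Amplification step: survivors are copied back up to full size.
        population = [survivors[i % keep] for i in range(pop_size)]
    return max(population, key=lambda s: score(s, target))

start = "ACGTACGTACGTACGT"
target = "TTGACATATAATGGCC"  # hypothetical "desired" sequence
best = directed_evolution(start, target)
print(best, score(best, target))
```

Keeping the parents in the screening pool is a design choice of this sketch, not a claim about laboratory practice; it simply guarantees that each round is at least as good as the last.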
<br />
Another method is the synthesis of combinatorial promoters, as demonstrated in [http://www.nature.com/msb/journal/v3/n1/full/msb4100187.html Cox, Surette and Elowitz (2007)]. In their experiment, Cox et al. designed modular sequence units corresponding to three regions of a promoter (distal, core, and proximal). These units, assembled at random, can create a diverse new promoter library made up of fragments of existing promoters, even promoters that are unrelated. See Figure 1 for a diagram of combinatorial promoter synthesis.<br />
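To see why random assembly of modular units yields a diverse library, the combinatorics can be sketched in a few lines. The unit names below (D1, C1, P1, and so on) and the number of units per region are placeholders invented for this example; they are not the units used by Cox et al.

```python
import itertools
import random

# Hypothetical unit names for each promoter region (assumption).
units = {
    "distal": ["D1", "D2", "D3"],
    "core": ["C1", "C2"],
    "proximal": ["P1", "P2", "P3"],
}
REGION_ORDER = ("distal", "core", "proximal")

def all_promoters(units):
    """Enumerate every distal-core-proximal combination."""
    return list(itertools.product(*(units[r] for r in REGION_ORDER)))

def random_promoter(units, rng):
    """Mimic random assembly: pick one unit per region."""
    return tuple(rng.choice(units[r]) for r in REGION_ORDER)

library = all_promoters(units)
print(len(library))  # 3 * 2 * 3 = 18 possible promoters
print(random_promoter(units, random.Random(0)))
```

The library size is the product of the number of units available for each region, so even a modest set of units gives many distinct promoters.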
<br />
In addition, promoters can be specifically synthesized based on the structure of an existing promoter, as in Jensen and Hammer (1997). In order to construct a series of synthetic promoters similar to the ''L. lactis'' promoter, Jensen and Hammer identified consensus sequences within existing ''L. lactis'' mutants, that is, sequences that were found to be similar in all or most mutants, no matter how their activity varied. For example, the -10 sequence TATAAT (the Pribnow box) and the -35 sequence TTGACA are conserved in many prokaryotic promoters; other sequences, such as the TG sequence one base pair upstream from the -10 sequence, are more specific to ''L. lactis''. In order to generate a promoter library, Jensen and Hammer constructed oligonucleotides containing the sequences that were common in ''L. lactis'' promoters, with these consensus sequences separated by spacers of random sequence; promoters with different spacer sequences made up the promoter library. See Figure 2 for an illustration of the process. <br />
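The idea of fixed consensus sequences separated by randomized spacers can be illustrated with a short sketch. The 17-base spacer length and the exact placement of the TG motif (here, with one random base between TG and the -10 box) are assumptions made for the example, not values taken from Jensen and Hammer, and the function names are hypothetical.

```python
import random

# Consensus elements named in the text above.
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"
TG_MOTIF = "TG"

def random_dna(length, rng):
    """Return a random DNA string of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))

def make_promoter(rng, spacer_len=17):
    """Build one promoter: fixed consensus boxes, randomized spacer.

    spacer_len=17 is an assumed spacing between the -35 and -10 boxes.
    """
    # Spacer layout: random bases, then TG, then one random base.
    spacer = random_dna(spacer_len - 3, rng) + TG_MOTIF + random_dna(1, rng)
    return MINUS_35 + spacer + MINUS_10

def make_library(size, seed=0):
    """Generate `size` promoters that differ only in their spacers."""
    rng = random.Random(seed)
    return [make_promoter(rng) for _ in range(size)]

library = make_library(5)
for promoter in library:
    print(promoter)
```

Every promoter in the library shares the same consensus boxes, so any difference in measured strength can be attributed to the spacer sequence.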
<br />
=== Why use synthetic/mutated promoters and reporters? ===<br />
Since much of synthetic biology is based on modeling genetic and molecular mechanisms before they are built, a scientist has to be able to predict how the components of a mechanism or gene circuit will work in order to predict how the whole mechanism will work. Because they have been specifically designed and selected for, synthetic promoters and reporters make gene circuit modeling much easier.<br />
<br />
[http://www.nature.com/msb/journal/v3/n1/full/msb4100185.html Rosenfeld, Young, Alon, Swain, and Elowitz (2007)] have demonstrated that the behavior of a gene circuit can be accurately modeled based on its promoter and repressor activity, but note that in order to accurately construct their model, they needed a specific promoter and repressor gene that followed a certain pattern of behavior (specifically, a negative regulatory circuit, in which a repressor regulates its own expression, as that circuit is the simplest to model).<br />
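A negative regulatory circuit of this kind is often written as a single differential equation in which the repressor slows its own production. The sketch below uses a generic Hill-type repression term and simple Euler integration; it is not the model fitted by Rosenfeld et al., and every parameter value (beta, K, n, alpha) is a made-up placeholder.

```python
def simulate_autorepressor(beta=100.0, K=10.0, n=1.0, alpha=1.0,
                           t_end=20.0, dt=0.01):
    """Integrate dx/dt = beta / (1 + (x/K)**n) - alpha * x with Euler steps.

    x     : repressor concentration (arbitrary units)
    beta  : maximal production rate from the unrepressed promoter
    K     : repressor level giving half-maximal repression
    n     : Hill coefficient (steepness of repression)
    alpha : combined dilution and degradation rate
    """
    x = 0.0
    trajectory = [x]
    for _ in range(int(round(t_end / dt))):
        production = beta / (1.0 + (x / K) ** n)
        x += (production - alpha * x) * dt
        trajectory.append(x)
    return trajectory

traj = simulate_autorepressor()
print(round(traj[-1], 2))
```

With these placeholder values the circuit settles well below the level an unregulated promoter would reach (beta / alpha), which is the characteristic effect of a repressor limiting its own expression.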
<br />
Of course, the noise and randomness inherent in cellular interactions mean that no promoter or reporter's activity can be perfectly predicted.<br />
<br />
Also, synthetic promoters and reporters are useful for when a wild-type promoter or reporter is not sufficient or lacks some property necessary for a cellular mechanism to work. For example, a reporter protein such as GFP does not degrade as soon as it is produced, so in any mechanism that has to detect a transient signal, GFP would not be a useful reporter. However, a mutated GFP, which degrades faster or in the presence of a certain compound, would negate this effect. The same principle applies for reporters which are more active at lower-than-normal or higher-than-normal temperatures. See [http://www.biophysj.org/cgi/content/abstract/73/5/2782 Patterson GH et al (1997)].<br />
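The problem with a stable reporter and a transient signal can be made concrete with a small simulation. In this sketch the reporter is produced at a constant rate during a one-hour pulse and then only degrades; the two half-lives (24 hours for a stable reporter, 0.5 hours for a destabilized mutant) are illustrative assumptions, not measured values for any GFP variant.

```python
import math

def simulate_reporter(half_life, pulse_end=1.0, t_end=5.0, dt=0.001, rate=1.0):
    """Integrate dG/dt = production(t) - k * G with Euler steps.

    Production is `rate` while t < pulse_end and zero afterwards;
    k is the first-order degradation constant implied by `half_life`.
    Times are in hours (assumed units).
    """
    k = math.log(2) / half_life
    g = 0.0
    levels = [g]
    for i in range(int(round(t_end / dt))):
        t = i * dt
        production = rate if t < pulse_end else 0.0
        g += (production - k * g) * dt
        levels.append(g)
    return levels

stable = simulate_reporter(half_life=24.0)    # assumed stable reporter
unstable = simulate_reporter(half_life=0.5)   # assumed destabilized mutant

for name, levels in (("stable", stable), ("unstable", unstable)):
    print(name, round(levels[-1] / max(levels), 3))
```

The printed ratio is the fraction of the peak signal still present four hours after the pulse ends: the stable reporter keeps reporting a signal that is long gone, while the destabilized one has returned close to baseline.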
<br />
== Measuring, Testing, Tuning, and Modeling Promoters and Reporters ==<br />
<br />
*[[Modeling Promoter Activity | Modeling Promoter Activity: Developing a model for predicting promoter activity based on mutations and gene sequence.]]<br />
<br />
*[[Modeling Reporter Activity | Modeling Reporter Activity: The kinetics of reporter protein degradation and developing a model thereof.]]<br />
<br />
*[[Predicting Gene Circuit Activity | Predicting gene circuit activity based on promoter and reporter modeling.]]<br />
<br />
== Figures ==<br />
[[Image:Msb4100187-f1.jpg]]<br />
Figure 1. <small>Random assembly ligation generates a diverse promoter library. Promoters can be assembled out of modular sequence units. (A) The assembled sequence of an example promoter. The 5' overhangs of each unit are shown in red. The RNA polymerase boxes (-10 and -35) are highlighted in yellow, and the predicted start site of transcription (+1) is capitalized. Operator colors are consistent throughout the figure. (B) Steps in promoter assembly and ligation into the luciferase reporter vector: promoters are assembled by mixed ligations using 1-bp or 2-bp cohesive ends, and then ligated into a luciferase reporter plasmid. (C) Luminescence measurements in 16 inducer conditions (± each of four inducers, as indicated) for the promoter shown in (A). The output levels determine promoter logic. Note that this promoter does not respond to LuxR regulation at the distal region. (D) The 48 unique units used in the library contain operators responsive to the four TFs (indicated by color) in the regions distal, core, and proximal. </small> In Cox, Surette, and Elowitz 2007. Permission Pending.<br />
<br />
[[Image:Am0180933001.gif]]<br />
Figure 2. <small>Strategies used for cloning synthetic promoter fragments into the promoter cloning vector pAK80. (a) Double-stranded DNA fragments carrying putative promoter activities. (b) Restriction map and schematic representation of the relevant parts of the promoter cloning vector. The stippled and solid lines show the strategies used for cloning pCP1 through pCP29 and pCP30 through pCP46, respectively. (c) Restriction map of clones pCP1 through pCP29. (d) Restriction map of clones pCP30 through pCP46. Note that a number of clones have been subject to cloning artifacts and thus may have a slightly different restriction map. BI, BamHI; AII, AflII; Ss, SspI; N, NsiI (PstI compatible); Nr, NruI; Sc, ScaI; HII, HincII; P, PstI; PII, PvuII; E, EcoRI; Sa, SacI; Xh, XhoI; BII, BglII; Sm, SmaI; Xb, XbaI (not drawn to scale).</small> In Jensen and Hammer 1997. Permission Pending.<br />
<br />
== Works Cited ==<br />
*Weiss R, Basu S, Hooshangi S, Kalmbach A, Karig D, Mehreja R, and Netravali I (2003). Genetic circuit building blocks for cellular computation, communications, and signal processing. ''Natural Computing'' 2(1). Epub 2004 November 02. [http://www.springerlink.com/content/h885l73711912672/ Abstract]<br />
<br />
*Arnold FH (1997). Design by Directed Evolution. ''Acc. Chem. Res.'' 31(3). Epub 1998 February 28. [http://pubs.acs.org/cgi-bin/article.cgi/achre4/1998/31/i03/html/ar960017f.html Full Text]<br />
<br />
*Cox RS III, Surette MG, and Elowitz MB (2007). Programming gene expression with combinatorial promoters. ''Molecular Systems Biology'' 3(145). Epub 2007 November 13. [http://www.nature.com/msb/journal/v3/n1/full/msb4100187.html Full Text]<br />
<br />
*Rosenfeld N, Young JW, Alon U, Swain PS, and Elowitz MB (2007). Accurate prediction of gene feedback circuit behavior from component properties. ''Molecular Systems Biology'' 3(143). Epub 2007 November 13. [http://www.nature.com/msb/journal/v3/n1/full/msb4100185.html Full Text]<br />
<br />
*Patterson GH, Knobel SM, Sharif WD, Kain SR, and Piston DW (1997). Use of the green fluorescent protein and its mutants in quantitative fluorescence microscopy. ''Biophysical Journal'' 73. Epub 1998. [http://www.biophysj.org/cgi/content/abstract/73/5/2782 Abstract]<br />
<br />
*Leveau JHJ and Lindow SE (2001). Predictive and interpretive simulation of green fluorescent protein expression in reporter bacteria. ''Journal of Bacteriology'' 183(23). Epub 2001 September. [http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=95514 Full Text]<br />
<br />
*Miller WG, Brandl MT, Quinones B, and Lindow SE (2001). Biological sensor for sucrose availability: relative sensitivities of various reporter genes. ''Applied Environmental Microbiology''67(3).</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Modeling_Promoter_Activity&diff=4229Modeling Promoter Activity2007-12-06T18:59:29Z<p>Lavoss: /* Modeling Promoter Activity */</p>
<hr />
<div>== Modeling Promoter Activity ==<br />
<br />
In order to use synthetic promoters to their fullest potential, we have to understand how they work. Synthetic promoters cannot help us model gene circuit activity unless models are developed for the activity of the promoters themselves. Determining exactly how a promoter's strength correlates with its mutations is not easy, since for the most part it requires working with promoters at the level of individual nucleotides.<br />
<br />
=== Jensen and Hammer (2006): Spacer Sequences ===<br />
<br />
<br />
=== Jensen, Alper, Fischer, and Stephanopoulos (2006): Statistical Modeling and Critical Mutation Sites ===<br />
<br />
<br />
=== De Mey, Maertens, Lequeux, Soetaert, and Vandamme (2007): Probability and Partial Least Squares Modeling ===</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Modeling_Promoter_Activity&diff=4218Modeling Promoter Activity2007-12-06T18:44:36Z<p>Lavoss: </p>
<hr />
<div>== Modeling Promoter Activity ==</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Promoters_and_Reporters_in_Synthetic_Biology&diff=4216Promoters and Reporters in Synthetic Biology2007-12-06T18:44:12Z<p>Lavoss: /* Measuring, Testing, Tuning, and Modeling Promoters and Reporters */</p>
<hr />
<div><br />
<br />
== What Are Promoters and Reporters? ==<br />
<br />
[http://en.wikipedia.org/wiki/Promoter Promoters] and [http://en.wikipedia.org/wiki/Reporter_gene reporters] are genetic components used in engineering gene circuits. Promoters are DNA sequences located 'upstream', or ahead, of the DNA sequences encoding genes. Promoters provide binding sites for [http://en.wikipedia.org/wiki/Transcription_factors transcription factors], proteins that control how and whether DNA is transcribed. Transcription factors bind to promoters and help [http://en.wikipedia.org/wiki/RNA_polymerase RNA polymerase] bind there, so that the genes can be transcribed. RNA polymerase binds to DNA and transcribes complementary RNA from the DNA sequence so that proteins can be formed from the DNA code. If a promoter is being repressed, then transcription cannot occur, as RNA polymerase will not be able to bind.<br />
<br />
Reporters are not as specific as promoters; they are genes that confer some easily identifiable and measurable characteristic when they are expressed, such as fluorescence or beta-galactosidase activity. Reporters are generally attached to other gene sequences so that the scientist has a way of knowing whether the gene is being transcribed: if the reporter is being transcribed, one can assume that the gene of interest is being transcribed as well.<br />
<br />
== Synthetic, Artificial, and Mutated Promoters and Reporters ==<br />
<br />
[http://pubs.acs.org/cgi-bin/article.cgi/achre4/1998/31/i03/html/ar960017f.html Directed evolution] is often used to mutate promoters or reporters in order to obtain desirable attributes. Directed evolution of a gene or protein sequence generally mutates or scrambles the sequence in question, screens the resulting variants for a desired phenotype (any cell not displaying the desirable phenotype is removed), and then amplifies the surviving cells so that the process can begin again. Many mutation and screening cycles can be performed, producing DNA sequences far removed from the original DNA code and increasing the likelihood that a mutant sequence or cell will have desirable properties. <br />
<br />
Another method is the synthesis of combinatorial promoters, as demonstrated in [http://www.nature.com/msb/journal/v3/n1/full/msb4100187.html Cox, Surette and Elowitz (2007)]. In their experiment, Cox et al. designed modular sequence units corresponding to three regions of a promoter (distal, core, and proximal). These units, assembled at random, can create a diverse new promoter library made up of fragments of existing promoters, even promoters that are unrelated. See Figure 1 for a diagram of combinatorial promoter synthesis.<br />
<br />
In addition, promoters can be specifically synthesized based on the structure of an existing promoter, as in Jensen and Hammer (1997). In order to construct a series of synthetic promoters similar to the ''L. lactis'' promoter, Jensen and Hammer identified consensus sequences within existing ''L. lactis'' mutants, that is, sequences that were found to be similar in all or most mutants, no matter how their activity varied. For example, the -10 sequence TATAAT (the Pribnow box) and the -35 sequence TTGACA are conserved in many prokaryotic promoters; other sequences, such as the TG sequence one base pair upstream from the -10 sequence, are more specific to ''L. lactis''. In order to generate a promoter library, Jensen and Hammer constructed oligonucleotides containing the sequences that were common in ''L. lactis'' promoters, with these consensus sequences separated by spacers of random sequence; promoters with different spacer sequences made up the promoter library. See Figure 2 for an illustration of the process. <br />
<br />
=== Why use synthetic/mutated promoters and reporters? ===<br />
Since much of synthetic biology is based on modeling genetic and molecular mechanisms before they are built, a scientist has to be able to predict how the components of a mechanism or gene circuit will work in order to predict how the whole mechanism will work. Because they have been specifically designed and selected for, synthetic promoters and reporters make gene circuit modeling much easier.<br />
<br />
[http://www.nature.com/msb/journal/v3/n1/full/msb4100185.html Rosenfeld, Young, Alon, Swain, and Elowitz (2007)] have demonstrated that the behavior of a gene circuit can be accurately modeled based on its promoter and repressor activity, but note that in order to accurately construct their model, they needed a specific promoter and repressor gene that followed a certain pattern of behavior (specifically, a negative regulatory circuit, in which a repressor regulates its own expression, as that circuit is the simplest to model).<br />
<br />
Of course, the noise and randomness inherent in cellular interactions mean that no promoter or reporter's activity can be perfectly predicted.<br />
<br />
Also, synthetic promoters and reporters are useful for when a wild-type promoter or reporter is not sufficient or lacks some property necessary for a cellular mechanism to work. For example, a reporter protein such as GFP does not degrade as soon as it is produced, so in any mechanism that has to detect a transient signal, GFP would not be a useful reporter. However, a mutated GFP, which degrades faster or in the presence of a certain compound, would negate this effect. The same principle applies for reporters which are more active at lower-than-normal or higher-than-normal temperatures. See [http://www.biophysj.org/cgi/content/abstract/73/5/2782 Patterson GH et al (1997)].<br />
<br />
== Measuring, Testing, Tuning, and Modeling Promoters and Reporters ==<br />
<br />
*[[Modeling Promoter Activity | Modeling Promoter Activity: Developing a model for predicting promoter activity based on mutations and gene sequence.]]<br />
<br />
*[[Modeling Reporter Activity | Modeling Reporter Activity: The kinetics of reporter protein degradation and developing a model thereof.]]<br />
<br />
== Figures ==<br />
[[Image:Msb4100187-f1.jpg]]<br />
Figure 1. <small>Random assembly ligation generates a diverse promoter library. Promoters can be assembled out of modular sequence units. (A) The assembled sequence of an example promoter. The 5' overhangs of each unit are shown in red. The RNA polymerase boxes (-10 and -35) are highlighted in yellow, and the predicted start site of transcription (+1) is capitalized. Operator colors are consistent throughout the figure. (B) Steps in promoter assembly and ligation into the luciferase reporter vector: promoters are assembled by mixed ligations using 1-bp or 2-bp cohesive ends, and then ligated into a luciferase reporter plasmid. (C) Luminescence measurements in 16 inducer conditions (± each of four inducers, as indicated) for the promoter shown in (A). The output levels determine promoter logic. Note that this promoter does not respond to LuxR regulation at the distal region. (D) The 48 unique units used in the library contain operators responsive to the four TFs (indicated by color) in the regions distal, core, and proximal. </small> In Cox, Surette, and Elowitz 2007. Permission Pending.<br />
<br />
[[Image:Am0180933001.gif]]<br />
Figure 2. <small>Strategies used for cloning synthetic promoter fragments into the promoter cloning vector pAK80. (a) Double-stranded DNA fragments carrying putative promoter activities. (b) Restriction map and schematic representation of the relevant parts of the promoter cloning vector. The stippled and solid lines show the strategies used for cloning pCP1 through pCP29 and pCP30 through pCP46, respectively. (c) Restriction map of clones pCP1 through pCP29. (d) Restriction map of clones pCP30 through pCP46. Note that a number of clones have been subject to cloning artifacts and thus may have a slightly different restriction map. BI, BamHI; AII, AflII; Ss, SspI; N, NsiI (PstI compatible); Nr, NruI; Sc, ScaI; HII, HincII; P, PstI; PII, PvuII; E, EcoRI; Sa, SacI; Xh, XhoI; BII, BglII; Sm, SmaI; Xb, XbaI (not drawn to scale).</small> In Jensen and Hammer 1997. Permission Pending.<br />
<br />
== Works Cited ==<br />
*Weiss R, Basu S, Hooshangi S, Kalmbach A, Karig D, Mehreja R, and Netravali I (2003). Genetic circuit building blocks for cellular computation, communications, and signal processing. ''Natural Computing'' 2(1). Epub 2004 November 02. [http://www.springerlink.com/content/h885l73711912672/ Abstract]<br />
<br />
*Arnold FH (1997). Design by Directed Evolution. ''Acc. Chem. Res.'' 31(3). Epub 1998 February 28. [http://pubs.acs.org/cgi-bin/article.cgi/achre4/1998/31/i03/html/ar960017f.html Full Text]<br />
<br />
*Cox RS III, Surette MG, and Elowitz MB (2007). Programming gene expression with combinatorial promoters. ''Molecular Systems Biology'' 3(145). Epub 2007 November 13. [http://www.nature.com/msb/journal/v3/n1/full/msb4100187.html Full Text]<br />
<br />
*Rosenfeld N, Young JW, Alon U, Swain PS, and Elowitz MB (2007). Accurate prediction of gene feedback circuit behavior from component properties. ''Molecular Systems Biology'' 3(143). Epub 2007 November 13. [http://www.nature.com/msb/journal/v3/n1/full/msb4100185.html Full Text]<br />
<br />
*Patterson GH, Knobel SM, Sharif WD, Kain SR, and Piston DW (1997). Use of the green fluorescent protein and its mutants in quantitative fluorescence microscopy. ''Biophysical Journal'' 73. Epub 1998. [http://www.biophysj.org/cgi/content/abstract/73/5/2782 Abstract]<br />
<br />
*Leveau JHJ and Lindow SE (2001). Predictive and interpretive simulation of green fluorescent protein expression in reporter bacteria. ''Journal of Bacteriology'' 183(23). Epub 2001 September. [http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=95514 Full Text]<br />
<br />
*Miller WG, Brandl MT, Quinones B, and Lindow SE (2001). Biological sensor for sucrose availability: relative sensitivities of various reporter genes. ''Applied Environmental Microbiology''67(3).</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Promoters_and_Reporters_in_Synthetic_Biology&diff=4214Promoters and Reporters in Synthetic Biology2007-12-06T18:42:07Z<p>Lavoss: /* Measuring, Testing, Tuning, and Modeling Promoters and Reporters */</p>
<hr />
<div><br />
<br />
== What Are Promoters and Reporters? ==<br />
<br />
[http://en.wikipedia.org/wiki/Promoter Promoters] and [http://en.wikipedia.org/wiki/Reporter_gene reporters] are genetic components used in engineering gene circuits. Promoters are DNA sequences located 'upstream', or ahead, of the DNA sequences encoding genes. Promoters provide binding sites for [http://en.wikipedia.org/wiki/Transcription_factors transcription factors], proteins that control how and whether DNA is transcribed. Transcription factors bind to promoters and help [http://en.wikipedia.org/wiki/RNA_polymerase RNA polymerase] bind there, so that the genes can be transcribed. RNA polymerase binds to DNA and transcribes complementary RNA from the DNA sequence so that proteins can be formed from the DNA code. If a promoter is being repressed, then transcription cannot occur, as RNA polymerase will not be able to bind.<br />
<br />
Reporters are not as specific as promoters; they are genes that confer some easily identifiable and measurable characteristic when they are expressed, such as fluorescence or beta-galactosidase activity. Reporters are generally attached to other gene sequences so that the scientist has a way of knowing whether the gene is being transcribed: if the reporter is being transcribed, one can assume that the gene of interest is being transcribed as well.<br />
<br />
== Synthetic, Artificial, and Mutated Promoters and Reporters ==<br />
<br />
[http://pubs.acs.org/cgi-bin/article.cgi/achre4/1998/31/i03/html/ar960017f.html Directed evolution] is often used to mutate promoters or reporters in order to obtain desirable attributes. Directed evolution of a gene or protein sequence generally mutates or scrambles the sequence in question, screens the resulting variants for a desired phenotype (any cell not displaying the desirable phenotype is removed), and then amplifies the surviving cells so that the process can begin again. Many mutation and screening cycles can be performed, producing DNA sequences far removed from the original DNA code and increasing the likelihood that a mutant sequence or cell will have desirable properties. <br />
<br />
Another method is the synthesis of combinatorial promoters, as demonstrated in [http://www.nature.com/msb/journal/v3/n1/full/msb4100187.html Cox, Surette and Elowitz (2007)]. In their experiment, Cox et al. designed modular sequence units corresponding to three regions of a promoter (distal, core, and proximal). These units, assembled at random, can create a diverse new promoter library made up of fragments of existing promoters, even promoters that are unrelated. See Figure 1 for a diagram of combinatorial promoter synthesis.<br />
<br />
In addition, promoters can be specifically synthesized based on the structure of an existing promoter, as in Jensen and Hammer (1997). In order to construct a series of synthetic promoters similar to the ''L. lactis'' promoter, Jensen and Hammer identified consensus sequences within existing ''L. lactis'' mutants, that is, sequences that were found to be similar in all or most mutants, no matter how their activity varied. For example, the -10 sequence TATAAT (the Pribnow box) and the -35 sequence TTGACA are conserved in many prokaryotic promoters; other sequences, such as the TG sequence one base pair upstream from the -10 sequence, are more specific to ''L. lactis''. In order to generate a promoter library, Jensen and Hammer constructed oligonucleotides containing the sequences that were common in ''L. lactis'' promoters, with these consensus sequences separated by spacers of random sequence; promoters with different spacer sequences made up the promoter library. See Figure 2 for an illustration of the process. <br />
<br />
=== Why use synthetic/mutated promoters and reporters? ===<br />
Since much of synthetic biology is based on modeling genetic and molecular mechanisms before they are built, a scientist has to be able to predict how the components of a mechanism or gene circuit will work in order to predict how the whole mechanism will work. Because they have been specifically designed and selected for, synthetic promoters and reporters make gene circuit modeling much easier.<br />
<br />
[http://www.nature.com/msb/journal/v3/n1/full/msb4100185.html Rosenfeld, Young, Alon, Swain, and Elowitz (2007)] have demonstrated that the behavior of a gene circuit can be accurately modeled based on its promoter and repressor activity, but note that in order to accurately construct their model, they needed a specific promoter and repressor gene that followed a certain pattern of behavior (specifically, a negative regulatory circuit, in which a repressor regulates its own expression, as that circuit is the simplest to model).<br />
<br />
Of course, the noise and randomness inherent in cellular interactions mean that no promoter or reporter's activity can be perfectly predicted.<br />
<br />
Also, synthetic promoters and reporters are useful for when a wild-type promoter or reporter is not sufficient or lacks some property necessary for a cellular mechanism to work. For example, a reporter protein such as GFP does not degrade as soon as it is produced, so in any mechanism that has to detect a transient signal, GFP would not be a useful reporter. However, a mutated GFP, which degrades faster or in the presence of a certain compound, would negate this effect. The same principle applies for reporters which are more active at lower-than-normal or higher-than-normal temperatures. See [http://www.biophysj.org/cgi/content/abstract/73/5/2782 Patterson GH et al (1997)].<br />
<br />
== Measuring, Testing, Tuning, and Modeling Promoters and Reporters ==<br />
<br />
*[Modeling Promoter Activity | Modeling Promoter Activity: Developing a model for predicting promoter activity based on mutations and gene sequence.]<br />
<br />
*[Modeling Reporter Activity | Modeling Reporter Activity: The kinetics of reporter protein degradation and developing a model thereof.]<br />
<br />
== Figures ==<br />
[[Image:Msb4100187-f1.jpg]]<br />
Figure 1. <small>Random assembly ligation generates a diverse promoter library. Promoters can be assembled out of modular sequence units. (A) The assembled sequence of an example promoter. The 5' overhangs of each unit are shown in red. The RNA polymerase boxes (-10 and -35) are highlighted in yellow, and the predicted start site of transcription (+1) is capitalized. Operator colors are consistent throughout the figure. (B) Steps in promoter assembly and ligation into the luciferase reporter vector: promoters are assembled by mixed ligations using 1-bp or 2-bp cohesive ends, and then ligated into a luciferase reporter plasmid. (C) Luminescence measurements in 16 inducer conditions ( each of four inducers, as indicated) for the promoter shown in (A). The output levels determine promoter logic. Note that this promoter does not respond to LuxR regulation at the distal region. (D) The 48 unique units used in the library contain operators responsive to the four TFs (indicated by color) in the regions distal, core, and proximal. </small> In Cox, Surette, and Elowitz 2007. Permission Pending.<br />
<br />
[[Image:Am0180933001.gif]]<br />
Figure 2. <small>Strategies used for cloning synthetic promoter fragments into the promoter cloning vector pAK80. (a) Double-stranded DNA fragments carrying putative promoter activities. (b) Restriction map and schematic representation of the relevant parts of the promoter cloning vector. The stippled and solid lines show the strategies used for cloning pCP1 through pCP29 and pCP30 through pCP46, respectively. (c) Restriction map of clones pCP1 through pCP29. (d) Restriction map of clones pCP30 through pCP46. Note that a number of clones have been subject to cloning artifacts and thus may have a slightly different restriction map. BI, BamHI; AII, AflII; Ss, SspI; N, NsiI (PstI compatible); Nr, NruI; Sc, ScaI; HII, HincII; P, PstI; PII, PvuII; E, EcoRI; Sa, SacI; Xh, XhoI; BII, BglII; Sm, SmaI; Xb, XbaI (not drawn to scale).</small> In Jensen and Hammer 1997. Permission Pending.<br />
<br />
== Works Cited ==<br />
* Weiss R, Basu S, Hooshangi S, Kalmbach A, Karig D, Mehreja R, and Netravali I (2003). Genetic circuit building blocks for cellular computation, communications, and signal processing. Natural Computing 2 (1). Epub 2004 November 02. [http://www.springerlink.com/content/h885l73711912672/ Abstract]<br />
<br />
*Arnold FH (1997). Design by Directed Evolution. ''Acc. Chem. Res.,''31 (3). Epub 1998 February 28. [http://pubs.acs.org/cgi-bin/article.cgi/achre4/1998/31/i03/html/ar960017f.html Full Text]<br />
<br />
*Cox III, RS, Surette MG & Elowitz MB (2007). Programming gene expression with combinatorial promoters. ''Molecular Systems Biology''3(145). Epub 2007 November 13. [http://www.nature.com/msb/journal/v3/n1/full/msb4100187.html Full Text]<br />
<br />
*Rosenfeld N, Young JW, Alon U, Swain PS, and Elowitz MB (2007). Accurate prediction of gene feedback circuit behavior from component properties. ''Molecular Systems Biology''3(143). Epub 2007 November 13. [http://www.nature.com/msb/journal/v3/n1/full/msb4100185.html Full Text]<br />
<br />
*Patterson GH, Knobel SM, Sharif WD, Kain SR, and Piston DW (1997). Use of the green fluorescent protein and its mutants in quantitative fluorescence microscopy. ''Biophysical Journal'' 73. Epub 1998. [http://www.biophysj.org/cgi/content/abstract/73/5/2782 Abstract]<br />
<br />
*Leveau, JHJ and Lindow, SE (2001). Predictive and interpretive simulation of green fluorescent protein expression in reporter bacteria. ''Journal of Bacteriology''183(23). Epub 2001 September. [http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=95514 Full text]<br />
<br />
*Miller WG, Brandl MT, Quinones B, and Lindow SE (2001). Biological sensor for sucrose availability: relative sensitivities of various reporter genes. ''Applied Environmental Microbiology''67(3).</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Promoters_and_Reporters_in_Synthetic_Biology&diff=4210Promoters and Reporters in Synthetic Biology2007-12-06T18:29:46Z<p>Lavoss: /* Figures */</p>
<hr />
<div><br />
<br />
== What Are Promoters and Reporters? ==<br />
<br />
[http://en.wikipedia.org/wiki/Promoter Promoters] and [http://en.wikipedia.org/wiki/Reporter_gene reporters] are genetic components used in engineering gene circuits. Promoters are DNA sequences located 'upstream', or ahead, of the DNA sequences encoding genes. Promoters provide binding sites for [http://en.wikipedia.org/wiki/Transcription_factors transcription factors], proteins that control how and whether DNA is transcribed. Transcription factors bind to promoters in order to give [http://en.wikipedia.org/wiki/RNA_polymerase RNA polymerase] a place to bind, so that the genes can be transcribed. RNA polymerase binds to DNA and transcribes complementary RNA from the DNA sequence so that proteins can be formed from the DNA code. If a promoter is being repressed, then transcription cannot occur, as RNA polymerase will not have a place to bind.<br />
<br />
Reporters are not as specific as promoters; they are genes that convey some easily identifiable and measurable characteristic when they are expressed, such as fluorescence or beta-galactosidase activity. Reporters are generally attached to other gene sequences so the scientist has a way of knowing if the gene is being transcribed: if the reporter is being transcribed, one can assume that the gene of interest is being transcribed as well.<br />
<br />
== Synthetic, Artificial, and Mutated Promoters and Reporters ==<br />
<br />
[http://pubs.acs.org/cgi-bin/article.cgi/achre4/1998/31/i03/html/ar960017f.html Directed evolution] is often used to mutate promoters or reporters in order to obtain desirable attributes. Directed evolution of a gene or protein sequence generally mutates or scrambles the sequence in question, screens the resulting variants for a desired phenotype (any cell not displaying the desirable phenotype is removed), and then amplifies the surviving cells so that the process can begin again. Many mutation and screening cycles can be performed, producing DNA sequences far removed from the original DNA code and increasing the likelihood that a mutant sequence or cell will have desirable properties. <br />
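The mutate, screen, and amplify cycle described above can be sketched in a few lines of Python. This is only an illustrative toy, not the procedure from Arnold (1997): the scoring function (number of positions matching a hypothetical target sequence), the mutation rate, and the pool sizes are all assumptions made for the example.

```python
import random

BASES = "ACGT"

def mutate(seq, rate, rng):
    """Replace each base with a random base with probability `rate`."""
    return "".join(rng.choice(BASES) if rng.random() < rate else b for b in seq)

def score(seq, target):
    """Toy stand-in for a screen: count positions that match a hypothetical target."""
    return sum(a == b for a, b in zip(seq, target))

def directed_evolution(parent, target, rounds=20, pool_size=50, keep=5,
                       rate=0.05, seed=0):
    """Run repeated mutate -> screen -> amplify cycles on a DNA string."""
    rng = random.Random(seed)
    survivors = [parent]
    history = []
    for _ in range(rounds):
        # Amplify: survivors are carried over and used as templates for mutants.
        pool = list(survivors)
        while len(pool) < pool_size:
            pool.append(mutate(rng.choice(survivors), rate, rng))
        # Screen: keep only the best-scoring variants.
        pool.sort(key=lambda s: score(s, target), reverse=True)
        survivors = pool[:keep]
        history.append(score(survivors[0], target))
    return survivors[0], history

best, history = directed_evolution("ACGTACGTACGTACGT", "TTGACATATAATGCGC")
```

Because the survivors of each round are carried into the next pool unchanged, the best score in `history` never decreases from one round to the next.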
<br />
Another method is the synthesis of combinatorial promoters, as demonstrated in [http://www.nature.com/msb/journal/v3/n1/full/msb4100187.html Cox, Surette and Elowitz (2007)]. In their experiment, Cox et al. designed modular sequence units corresponding to three regions of a promoter (distal, core, and proximal). These units, assembled at random, create a diverse new promoter library made up of fragments of existing promoters, even promoters that are unrelated. See Figure 1 for a diagram of combinatorial promoter synthesis.<br />
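As a rough illustration of why random assembly of modular units yields a large library, the sketch below enumerates every combination of one distal, one core, and one proximal unit. The region names follow the Figure 1 caption; the unit sequences themselves are made-up placeholders, not the units used by Cox et al.

```python
import itertools
import random

# Hypothetical placeholder units for each promoter region (not real data).
UNITS = {
    "distal": ["AAAC", "CCGT"],
    "core": ["TTGACA", "TTTACA", "TTGATA"],
    "proximal": ["GGAT", "TCAG"],
}
REGION_ORDER = ("distal", "core", "proximal")

def enumerate_library(units):
    """Return every promoter obtainable by picking one unit per region."""
    pools = [units[region] for region in REGION_ORDER]
    return ["".join(combo) for combo in itertools.product(*pools)]

def assemble_random(units, rng):
    """Mimic one random ligation: pick one unit per region and join them."""
    return "".join(rng.choice(units[region]) for region in REGION_ORDER)

library = enumerate_library(UNITS)
one_clone = assemble_random(UNITS, random.Random(1))
```

The library size is the product of the number of units per region (here 2 x 3 x 2 = 12), which is why even a modest set of units gives many distinct promoters.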
<br />
In addition, promoters can be specifically synthesized based on the structure of an existing promoter, as in Jensen and Hammer (1997). In order to construct a series of synthetic promoters similar to ''L. lactis'' promoters, Jensen and Hammer identified consensus sequences within existing ''L. lactis'' mutants, that is, sequences that were found to be similar in all or most mutants, no matter how their activity varied. For example, the -10 sequence TATAAT (the Pribnow box) and the -35 sequence TTGACA are conserved in many prokaryotic promoters; other sequences, such as the TG sequence one base pair upstream from the -10 sequence, are more specific to ''L. lactis''. In order to generate a promoter library, Jensen and Hammer constructed oligonucleotides containing the sequences that were common in ''L. lactis'' promoters. These conserved sequences were then separated by spacers of random sequence; promoters with different spacer sequences made up the promoter library. See Figure 2 for an illustration of the process. <br />
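The idea of fixed consensus boxes separated by randomized spacers can be sketched as follows. The -35 and -10 sequences and the TG motif one base pair upstream of the -10 box come from the text above; the 17-nucleotide spacer length and the function names are assumptions for this example, not details taken from Jensen and Hammer.

```python
import random

BASES = "ACGT"
MINUS_35 = "TTGACA"   # consensus -35 box
MINUS_10 = "TATAAT"   # consensus -10 box
SPACER_LENGTH = 17    # assumed spacing between the two boxes

def random_bases(n, rng):
    """Return n random nucleotides."""
    return "".join(rng.choice(BASES) for _ in range(n))

def synthetic_promoter(rng):
    """Build one promoter: fixed boxes, random spacer ending in TG + 1 random base."""
    spacer = random_bases(SPACER_LENGTH - 3, rng) + "TG" + random_bases(1, rng)
    return MINUS_35 + spacer + MINUS_10

def promoter_library(size, seed=0):
    """Generate `size` promoters that differ only in their random spacer bases."""
    rng = random.Random(seed)
    return [synthetic_promoter(rng) for _ in range(size)]

library = promoter_library(10)
```

Every member of the library shares the same conserved boxes, so any differences in activity between members would be attributable to the spacer sequence.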
<br />
=== Why use synthetic/mutated promoters and reporters? ===<br />
Since much of synthetic biology is based on modeling genetic and molecular mechanisms before they are built, a scientist has to be able to predict how the components of a mechanism or gene circuit will work in order to predict how the whole mechanism will work. Because they have been specifically designed and selected for, synthetic promoters and reporters make gene circuit modeling much easier.<br />
<br />
[http://www.nature.com/msb/journal/v3/n1/full/msb4100185.html Rosenfeld, Young, Alon, Swain, and Elowitz (2007)] have demonstrated that the behavior of a gene circuit can be accurately modeled based on its promoter and repressor activity, but note that in order to accurately construct their model, they needed a specific promoter and repressor gene that followed a certain pattern of behavior (specifically, a negative regulatory circuit, in which a repressor regulates its own expression, as that circuit is the simplest to model).<br />
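A negative autoregulatory circuit of the kind mentioned above is often written as a single differential equation in which the repressor lowers its own production rate. The sketch below uses a generic textbook form, dR/dt = beta / (1 + (R/K)^n) - alpha * R, integrated with a simple Euler step. The equation form and all parameter values are assumptions for illustration and are not the model or measurements from Rosenfeld et al.

```python
def simulate_autorepression(beta=10.0, K=1.0, n=1.0, alpha=1.0,
                            r0=0.0, dt=0.01, steps=2000):
    """Euler integration of dR/dt = beta / (1 + (R/K)**n) - alpha * R.

    beta:  maximal production rate (hypothetical units)
    K:     repressor level giving half-maximal repression
    n:     Hill coefficient (cooperativity of repression)
    alpha: first-order loss rate (degradation plus dilution)
    """
    r = r0
    trace = [r]
    for _ in range(steps):
        production = beta / (1.0 + (r / K) ** n)
        r += dt * (production - alpha * r)
        trace.append(r)
    return trace

trace = simulate_autorepression()
steady_state = trace[-1]
unregulated_level = 10.0 / 1.0  # beta / alpha, the level without repression
```

With these illustrative parameters the repressor settles well below the unregulated level beta / alpha, which is the qualitative signature of negative feedback.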
<br />
Of course, the noise and randomness inherent in cellular interactions mean that no promoter or reporter's activity can be perfectly predicted.<br />
<br />
Also, synthetic promoters and reporters are useful for when a wild-type promoter or reporter is not sufficient or lacks some property necessary for a cellular mechanism to work. For example, a reporter protein such as GFP does not degrade as soon as it is produced, so in any mechanism that has to detect a transient signal, GFP would not be a useful reporter. However, a mutated GFP, which degrades faster or in the presence of a certain compound, would negate this effect. The same principle applies for reporters which are more active at lower-than-normal or higher-than-normal temperatures. See [http://www.biophysj.org/cgi/content/abstract/73/5/2782 Patterson GH et al (1997)].<br />
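To see why a slowly degrading reporter is a poor readout for a transient signal, one can compare two reporters that differ only in half-life. The sketch below assumes first-order decay, dG/dt = production(t) - k * G with k = ln(2) / half-life, and a one-hour production pulse. The half-lives (24 hours and 0.5 hours) are hypothetical values chosen for the example, not measurements from Patterson et al.

```python
import math

def simulate_reporter(half_life, pulse_end=1.0, t_end=6.0, dt=0.01,
                      production=1.0):
    """Euler integration of reporter level during and after a production pulse.

    half_life, pulse_end, and t_end are in hours (hypothetical values).
    Production is `production` while t < pulse_end and zero afterwards.
    """
    k = math.log(2) / half_life  # first-order degradation rate constant
    steps = int(round(t_end / dt))
    g = 0.0
    trace = [g]
    for i in range(steps):
        t = i * dt
        rate_in = production if t < pulse_end else 0.0
        g += dt * (rate_in - k * g)
        trace.append(g)
    return trace

def fraction_remaining(trace):
    """Final reporter level relative to its peak."""
    return trace[-1] / max(trace)

stable = fraction_remaining(simulate_reporter(half_life=24.0))
unstable = fraction_remaining(simulate_reporter(half_life=0.5))
```

Five hours after the pulse ends, the stable reporter in this toy model still retains most of its peak signal, while the destabilized variant has almost fully decayed and therefore tracks the end of the signal much more closely.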
<br />
== Measuring, Testing, Tuning, and Modeling Promoters and Reporters ==<br />
<br />
*Protein Degradation Modeling - as with GFP in the Lindow Paper.<br />
<br />
*Tuning - The use of random mutations or combined promoters to increase a promoter's sensitivity to a stimulus. <br />
<br />
*Cooperativity in promoters<br />
<br />
<br />
<br />
<br />
== Figures ==<br />
[[Image:Msb4100187-f1.jpg]]<br />
Figure 1. <small>Random assembly ligation generates a diverse promoter library. Promoters can be assembled out of modular sequence units. (A) The assembled sequence of an example promoter. The 5' overhangs of each unit are shown in red. The RNA polymerase boxes (-10 and -35) are highlighted in yellow, and the predicted start site of transcription (+1) is capitalized. Operator colors are consistent throughout the figure. (B) Steps in promoter assembly and ligation into the luciferase reporter vector: promoters are assembled by mixed ligations using 1-bp or 2-bp cohesive ends, and then ligated into a luciferase reporter plasmid. (C) Luminescence measurements in 16 inducer conditions ( each of four inducers, as indicated) for the promoter shown in (A). The output levels determine promoter logic. Note that this promoter does not respond to LuxR regulation at the distal region. (D) The 48 unique units used in the library contain operators responsive to the four TFs (indicated by color) in the regions distal, core, and proximal. </small> In Cox, Surette, and Elowitz 2007. Permission Pending.<br />
<br />
[[Image:Am0180933001.gif]]<br />
Figure 2. <small>Strategies used for cloning synthetic promoter fragments into the promoter cloning vector pAK80. (a) Double-stranded DNA fragments carrying putative promoter activities. (b) Restriction map and schematic representation of the relevant parts of the promoter cloning vector. The stippled and solid lines show the strategies used for cloning pCP1 through pCP29 and pCP30 through pCP46, respectively. (c) Restriction map of clones pCP1 through pCP29. (d) Restriction map of clones pCP30 through pCP46. Note that a number of clones have been subject to cloning artifacts and thus may have a slightly different restriction map. BI, BamHI; AII, AflII; Ss, SspI; N, NsiI (PstI compatible); Nr, NruI; Sc, ScaI; HII, HincII; P, PstI; PII, PvuII; E, EcoRI; Sa, SacI; Xh, XhoI; BII, BglII; Sm, SmaI; Xb, XbaI (not drawn to scale).</small> In Jensen and Hammer 1997. Permission Pending.<br />
<br />
== Works Cited ==<br />
* Weiss R, Basu S, Hooshangi S, Kalmbach A, Karig D, Mehreja R, and Netravali I (2003). Genetic circuit building blocks for cellular computation, communications, and signal processing. Natural Computing 2 (1). Epub 2004 November 02. [http://www.springerlink.com/content/h885l73711912672/ Abstract]<br />
<br />
*Arnold FH (1997). Design by Directed Evolution. ''Acc. Chem. Res.,''31 (3). Epub 1998 February 28. [http://pubs.acs.org/cgi-bin/article.cgi/achre4/1998/31/i03/html/ar960017f.html Full Text]<br />
<br />
*Cox III, RS, Surette MG & Elowitz MB (2007). Programming gene expression with combinatorial promoters. ''Molecular Systems Biology''3(145). Epub 2007 November 13. [http://www.nature.com/msb/journal/v3/n1/full/msb4100187.html Full Text]<br />
<br />
*Rosenfeld N, Young JW, Alon U, Swain PS, and Elowitz MB (2007). Accurate prediction of gene feedback circuit behavior from component properties. ''Molecular Systems Biology''3(143). Epub 2007 November 13. [http://www.nature.com/msb/journal/v3/n1/full/msb4100185.html Full Text]<br />
<br />
*Patterson GH, Knobel SM, Sharif WD, Kain SR, and Piston DW (1997). Use of the green fluorescent protein and its mutants in quantitative fluorescence microscopy. ''Biophysical Journal'' 73. Epub 1998. [http://www.biophysj.org/cgi/content/abstract/73/5/2782 Abstract]<br />
<br />
*Leveau, JHJ and Lindow, SE (2001). Predictive and interpretive simulation of green fluorescent protein expression in reporter bacteria. ''Journal of Bacteriology''183(23). Epub 2001 September. [http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=95514 Full text]<br />
<br />
*Miller WG, Brandl MT, Quinones B, and Lindow SE (2001). Biological sensor for sucrose availability: relative sensitivities of various reporter genes. ''Applied Environmental Microbiology''67(3).</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Promoters_and_Reporters_in_Synthetic_Biology&diff=4204Promoters and Reporters in Synthetic Biology2007-12-06T18:27:50Z<p>Lavoss: /* Figures */</p>
<hr />
<div><br />
<br />
== What Are Promoters and Reporters? ==<br />
<br />
[http://en.wikipedia.org/wiki/Promoter Promoters] and [http://en.wikipedia.org/wiki/Reporter_gene reporters] are genetic components used in engineering gene circuits. Promoters are DNA sequences located 'upstream', or ahead, of the DNA sequences encoding genes. Promoters provide binding sites for [http://en.wikipedia.org/wiki/Transcription_factors transcription factors], proteins that control how and whether DNA is transcribed. Transcription factors bind to promoters in order to give [http://en.wikipedia.org/wiki/RNA_polymerase RNA polymerase] a place to bind, so that the genes can be transcribed. RNA polymerase binds to DNA and transcribes complementary RNA from the DNA sequence so that proteins can be formed from the DNA code. If a promoter is being repressed, then transcription cannot occur, as RNA polymerase will not have a place to bind.<br />
<br />
Reporters are not as specific as promoters; they are genes that convey some easily identifiable and measurable characteristic when they are expressed, such as fluorescence or beta-galactosidase activity. Reporters are generally attached to other gene sequences so the scientist has a way of knowing if the gene is being transcribed: if the reporter is being transcribed, one can assume that the gene of interest is being transcribed as well.<br />
<br />
== Synthetic, Artificial, and Mutated Promoters and Reporters ==<br />
<br />
[http://pubs.acs.org/cgi-bin/article.cgi/achre4/1998/31/i03/html/ar960017f.html Directed evolution] is often used to mutate promoters or reporters in order to obtain desirable attributes. Directed evolution of a gene or protein sequence generally mutates or scrambles the sequence in question, screens the resulting variants for a desired phenotype (any cell not displaying the desirable phenotype is removed), and then amplifies the surviving cells so that the process can begin again. Many mutation and screening cycles can be performed, producing DNA sequences far removed from the original DNA code and increasing the likelihood that a mutant sequence or cell will have desirable properties. <br />
<br />
Another method is the synthesis of combinatorial promoters, as demonstrated in [http://www.nature.com/msb/journal/v3/n1/full/msb4100187.html Cox, Surette and Elowitz (2007)]. In their experiment, Cox et al. designed modular sequence units corresponding to three regions of a promoter (distal, core, and proximal). These units, assembled at random, create a diverse new promoter library made up of fragments of existing promoters, even promoters that are unrelated. See Figure 1 for a diagram of combinatorial promoter synthesis.<br />
<br />
In addition, promoters can be specifically synthesized based on the structure of an existing promoter, as in Jensen and Hammer (1997). In order to construct a series of synthetic promoters similar to ''L. lactis'' promoters, Jensen and Hammer identified consensus sequences within existing ''L. lactis'' mutants, that is, sequences that were found to be similar in all or most mutants, no matter how their activity varied. For example, the -10 sequence TATAAT (the Pribnow box) and the -35 sequence TTGACA are conserved in many prokaryotic promoters; other sequences, such as the TG sequence one base pair upstream from the -10 sequence, are more specific to ''L. lactis''. In order to generate a promoter library, Jensen and Hammer constructed oligonucleotides containing the sequences that were common in ''L. lactis'' promoters. These conserved sequences were then separated by spacers of random sequence; promoters with different spacer sequences made up the promoter library. See Figure 2 for an illustration of the process. <br />
<br />
=== Why use synthetic/mutated promoters and reporters? ===<br />
Since much of synthetic biology is based on modeling genetic and molecular mechanisms before they are built, a scientist has to be able to predict how the components of a mechanism or gene circuit will work in order to predict how the whole mechanism will work. Because they have been specifically designed and selected for, synthetic promoters and reporters make gene circuit modeling much easier.<br />
<br />
[http://www.nature.com/msb/journal/v3/n1/full/msb4100185.html Rosenfeld, Young, Alon, Swain, and Elowitz (2007)] have demonstrated that the behavior of a gene circuit can be accurately modeled based on its promoter and repressor activity, but note that in order to accurately construct their model, they needed a specific promoter and repressor gene that followed a certain pattern of behavior (specifically, a negative regulatory circuit, in which a repressor regulates its own expression, as that circuit is the simplest to model).<br />
<br />
Of course, the noise and randomness inherent in cellular interactions mean that no promoter or reporter's activity can be perfectly predicted.<br />
<br />
Also, synthetic promoters and reporters are useful for when a wild-type promoter or reporter is not sufficient or lacks some property necessary for a cellular mechanism to work. For example, a reporter protein such as GFP does not degrade as soon as it is produced, so in any mechanism that has to detect a transient signal, GFP would not be a useful reporter. However, a mutated GFP, which degrades faster or in the presence of a certain compound, would negate this effect. The same principle applies for reporters which are more active at lower-than-normal or higher-than-normal temperatures. See [http://www.biophysj.org/cgi/content/abstract/73/5/2782 Patterson GH et al (1997)].<br />
<br />
== Measuring, Testing, Tuning, and Modeling Promoters and Reporters ==<br />
<br />
*Protein Degradation Modeling - as with GFP in the Lindow Paper.<br />
<br />
*Tuning - The use of random mutations or combined promoters to increase a promoter's sensitivity to a stimulus. <br />
<br />
*Cooperativity in promoters<br />
<br />
<br />
<br />
<br />
== Figures ==<br />
[[Image:Msb4100187-f1.jpg]]<br />
Figure 1. <small>Random assembly ligation generates a diverse promoter library. Promoters can be assembled out of modular sequence units. (A) The assembled sequence of an example promoter. The 5' overhangs of each unit are shown in red. The RNA polymerase boxes (-10 and -35) are highlighted in yellow, and the predicted start site of transcription (+1) is capitalized. Operator colors are consistent throughout the figure. (B) Steps in promoter assembly and ligation into the luciferase reporter vector: promoters are assembled by mixed ligations using 1-bp or 2-bp cohesive ends, and then ligated into a luciferase reporter plasmid. (C) Luminescence measurements in 16 inducer conditions ( each of four inducers, as indicated) for the promoter shown in (A). The output levels determine promoter logic. Note that this promoter does not respond to LuxR regulation at the distal region. (D) The 48 unique units used in the library contain operators responsive to the four TFs (indicated by color) in the regions distal, core, and proximal. </small> [http://www.nature.com/msb/journal/v3/n1/full/msb4100187.html]<br />
<br />
[[Image:Am0180933001.gif]]<br />
Figure 2. <small>Strategies used for cloning synthetic promoter fragments into the promoter cloning vector pAK80. (a) Double-stranded DNA fragments carrying putative promoter activities. (b) Restriction map and schematic representation of the relevant parts of the promoter cloning vector. The stippled and solid lines show the strategies used for cloning pCP1 through pCP29 and pCP30 through pCP46, respectively. (c) Restriction map of clones pCP1 through pCP29. (d) Restriction map of clones pCP30 through pCP46. Note that a number of clones have been subject to cloning artifacts and thus may have a slightly different restriction map. BI, BamHI; AII, AflII; Ss, SspI; N, NsiI (PstI compatible); Nr, NruI; Sc, ScaI; HII, HincII; P, PstI; PII, PvuII; E, EcoRI; Sa, SacI; Xh, XhoI; BII, BglII; Sm, SmaI; Xb, XbaI (not drawn to scale).</small> In Jensen and Hammer 1997. Permission Pending.<br />
<br />
== Works Cited ==<br />
* Weiss R, Basu S, Hooshangi S, Kalmbach A, Karig D, Mehreja R, and Netravali I (2003). Genetic circuit building blocks for cellular computation, communications, and signal processing. Natural Computing 2 (1). Epub 2004 November 02. [http://www.springerlink.com/content/h885l73711912672/ Abstract]<br />
<br />
*Arnold FH (1997). Design by Directed Evolution. ''Acc. Chem. Res.,''31 (3). Epub 1998 February 28. [http://pubs.acs.org/cgi-bin/article.cgi/achre4/1998/31/i03/html/ar960017f.html Full Text]<br />
<br />
*Cox III, RS, Surette MG & Elowitz MB (2007). Programming gene expression with combinatorial promoters. ''Molecular Systems Biology''3(145). Epub 2007 November 13. [http://www.nature.com/msb/journal/v3/n1/full/msb4100187.html Full Text]<br />
<br />
*Rosenfeld N, Young JW, Alon U, Swain PS, and Elowitz MB (2007). Accurate prediction of gene feedback circuit behavior from component properties. ''Molecular Systems Biology''3(143). Epub 2007 November 13. [http://www.nature.com/msb/journal/v3/n1/full/msb4100185.html Full Text]<br />
<br />
*Patterson GH, Knobel SM, Sharif WD, Kain SR, and Piston DW (1997). Use of the green fluorescent protein and its mutants in quantitative fluorescence microscopy. ''Biophysical Journal'' 73. Epub 1998. [http://www.biophysj.org/cgi/content/abstract/73/5/2782 Abstract]<br />
<br />
*Leveau, JHJ and Lindow, SE (2001). Predictive and interpretive simulation of green fluorescent protein expression in reporter bacteria. ''Journal of Bacteriology''183(23). Epub 2001 September. [http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=95514 Full text]<br />
<br />
*Miller WG, Brandl MT, Quinones B, and Lindow SE (2001). Biological sensor for sucrose availability: relative sensitivities of various reporter genes. ''Applied Environmental Microbiology''67(3).</div>Lavosshttps://gcat.davidson.edu/GcatWiki/index.php?title=Promoters_and_Reporters_in_Synthetic_Biology&diff=4203Promoters and Reporters in Synthetic Biology2007-12-06T18:27:24Z<p>Lavoss: /* Figures */</p>
<hr />
<div><br />
<br />
== What Are Promoters and Reporters? ==<br />
<br />
[http://en.wikipedia.org/wiki/Promoter Promoters] and [http://en.wikipedia.org/wiki/Reporter_gene reporters] are genetic components used in engineering gene circuits. Promoters are DNA sequences located 'upstream', or ahead, of the DNA sequences encoding genes. Promoters provide binding sites for [http://en.wikipedia.org/wiki/Transcription_factors transcription factors], proteins that control how and whether DNA is transcribed. Transcription factors bind to promoters in order to give [http://en.wikipedia.org/wiki/RNA_polymerase RNA polymerase] a place to bind, so that the genes can be transcribed. RNA polymerase binds to DNA and transcribes complementary RNA from the DNA sequence so that proteins can be formed from the DNA code. If a promoter is being repressed, then transcription cannot occur, as RNA polymerase will not have a place to bind.<br />
<br />
Reporters are not as specific as promoters; they are genes that convey some easily identifiable and measurable characteristic when they are expressed, such as fluorescence or beta-galactosidase activity. Reporters are generally attached to other gene sequences so the scientist has a way of knowing if the gene is being transcribed: if the reporter is being transcribed, one can assume that the gene of interest is being transcribed as well.<br />
<br />
== Synthetic, Artificial, and Mutated Promoters and Reporters ==<br />
<br />
[http://pubs.acs.org/cgi-bin/article.cgi/achre4/1998/31/i03/html/ar960017f.html Directed evolution] is often used to mutate promoters or reporters in order to obtain desirable attributes. Directed evolution of a gene or protein sequence generally mutates or scrambles the sequence in question, screens the resulting variants for a desired phenotype (any cell not displaying the desirable phenotype is removed), and then amplifies the surviving cells so that the process can begin again. Many mutation and screening cycles can be performed, producing DNA sequences far removed from the original DNA code and increasing the likelihood that a mutant sequence or cell will have desirable properties. <br />
<br />
Another method is the synthesis of combinatorial promoters, as demonstrated in [http://www.nature.com/msb/journal/v3/n1/full/msb4100187.html Cox, Surette and Elowitz (2007)]. In their experiment, Cox et al. designed modular sequence units corresponding to three regions of a promoter (distal, core, and proximal). These units, assembled at random, create a diverse new promoter library made up of fragments of existing promoters, even promoters that are unrelated. See Figure 1 for a diagram of combinatorial promoter synthesis.<br />
<br />
In addition, promoters can be specifically synthesized based on the structure of an existing promoter, as in Jensen and Hammer (1997). In order to construct a series of synthetic promoters similar to ''L. lactis'' promoters, Jensen and Hammer identified consensus sequences within existing ''L. lactis'' mutants, that is, sequences that were found to be similar in all or most mutants, no matter how their activity varied. For example, the -10 sequence TATAAT (the Pribnow box) and the -35 sequence TTGACA are conserved in many prokaryotic promoters; other sequences, such as the TG sequence one base pair upstream from the -10 sequence, are more specific to ''L. lactis''. In order to generate a promoter library, Jensen and Hammer constructed oligonucleotides containing the sequences that were common in ''L. lactis'' promoters. These conserved sequences were then separated by spacers of random sequence; promoters with different spacer sequences made up the promoter library. See Figure 2 for an illustration of the process. <br />
<br />
=== Why use synthetic/mutated promoters and reporters? ===<br />
Since much of synthetic biology is based on modeling genetic and molecular mechanisms before they are built, a scientist has to be able to predict how the components of a mechanism or gene circuit will work in order to predict how the whole mechanism will work. Because they have been specifically designed and selected for, synthetic promoters and reporters make gene circuit modeling much easier.<br />
<br />
[http://www.nature.com/msb/journal/v3/n1/full/msb4100185.html Rosenfeld, Young, Alon, Swain, and Elowitz (2007)] have demonstrated that the behavior of a gene circuit can be accurately modeled based on its promoter and repressor activity, but note that in order to accurately construct their model, they needed a specific promoter and repressor gene that followed a certain pattern of behavior (specifically, a negative regulatory circuit, in which a repressor regulates its own expression, as that circuit is the simplest to model).<br />
<br />
Of course, the noise and randomness inherent in cellular interactions mean that no promoter or reporter's activity can be perfectly predicted.<br />
<br />
Also, synthetic promoters and reporters are useful for when a wild-type promoter or reporter is not sufficient or lacks some property necessary for a cellular mechanism to work. For example, a reporter protein such as GFP does not degrade as soon as it is produced, so in any mechanism that has to detect a transient signal, GFP would not be a useful reporter. However, a mutated GFP, which degrades faster or in the presence of a certain compound, would negate this effect. The same principle applies for reporters which are more active at lower-than-normal or higher-than-normal temperatures. See [http://www.biophysj.org/cgi/content/abstract/73/5/2782 Patterson GH et al (1997)].<br />
<br />
== Measuring, Testing, Tuning, and Modeling Promoters and Reporters ==<br />
<br />
*Protein Degradation Modeling - as with GFP in the Lindow Paper.<br />
<br />
*Tuning - The use of random mutations or combined promoters to increase a promoter's sensitivity to a stimulus. <br />
<br />
*Cooperativity in promoters<br />
<br />
<br />
<br />
<br />
== Figures ==<br />
[[Image:Msb4100187-f1.jpg]]<br />
Figure 1. <small>Random assembly ligation generates a diverse promoter library. Promoters can be assembled out of modular sequence units. (A) The assembled sequence of an example promoter. The 5' overhangs of each unit are shown in red. The RNA polymerase boxes (-10 and -35) are highlighted in yellow, and the predicted start site of transcription (+1) is capitalized. Operator colors are consistent throughout the figure. (B) Steps in promoter assembly and ligation into the luciferase reporter vector: promoters are assembled by mixed ligations using 1-bp or 2-bp cohesive ends, and then ligated into a luciferase reporter plasmid. (C) Luminescence measurements in 16 inducer conditions ( each of four inducers, as indicated) for the promoter shown in (A). The output levels determine promoter logic. Note that this promoter does not respond to LuxR regulation at the distal region. (D) The 48 unique units used in the library contain operators responsive to the four TFs (indicated by color) in the regions distal, core, and proximal. </small> [http://www.nature.com/msb/journal/v3/n1/full/msb4100187.html]<br />
<br />
[[Image:Am0180933001.gif]]<br />
Figure 2. <small>Strategies used for cloning synthetic promoter fragments into the promoter cloning vector pAK80. (a) Double-stranded DNA fragments carrying putative promoter activities. (b) Restriction map and schematic representation of the relevant parts of the promoter cloning vector. The stippled and solid lines show the strategies used for cloning pCP1 through pCP29 and pCP30 through pCP46, respectively. (c) Restriction map of clones pCP1 through pCP29. (d) Restriction map of clones pCP30 through pCP46. Note that a number of clones have been subject to cloning artifacts and thus may have a slightly different restriction map. BI, BamHI; AII, AflII; Ss, SspI; N, NsiI (PstI compatible); Nr, NruI; Sc, ScaI; HII, HincII; P, PstI; PII, PvuII; E, EcoRI; Sa, SacI; Xh, XhoI; BII, BglII; Sm, SmaI; Xb, XbaI (not drawn to scale).</small> In Jensen and Hammer 1997. Permission Pending.<br />
<br />
== Works Cited ==<br />
*Weiss R, Basu S, Hooshangi S, Kalmbach A, Karig D, Mehreja R, and Netravali I (2003). Genetic circuit building blocks for cellular computation, communications, and signal processing. ''Natural Computing'' 2(1). Epub 2004 November 02. [http://www.springerlink.com/content/h885l73711912672/ Abstract]<br />
<br />
*Arnold FH (1997). Design by Directed Evolution. ''Acc. Chem. Res.'' 31(3). Epub 1998 February 28. [http://pubs.acs.org/cgi-bin/article.cgi/achre4/1998/31/i03/html/ar960017f.html Full Text]<br />
<br />
*Cox RS III, Surette MG, and Elowitz MB (2007). Programming gene expression with combinatorial promoters. ''Molecular Systems Biology'' 3(145). Epub 2007 November 13. [http://www.nature.com/msb/journal/v3/n1/full/msb4100187.html Full Text]<br />
<br />
*Rosenfeld N, Young JW, Alon U, Swain PS, and Elowitz MB (2007). Accurate prediction of gene feedback circuit behavior from component properties. ''Molecular Systems Biology'' 3(143). Epub 2007 November 13. [http://www.nature.com/msb/journal/v3/n1/full/msb4100185.html Full Text]<br />
<br />
*Patterson GH, Knobel SM, Sharif WD, Kain SR, and Piston DW (1997). Use of the green fluorescent protein and its mutants in quantitative fluorescence microscopy. ''Biophysical Journal'' 73. Epub 1998. [http://www.biophysj.org/cgi/content/abstract/73/5/2782 Abstract]<br />
<br />
*Leveau JHJ and Lindow SE (2001). Predictive and interpretive simulation of green fluorescent protein expression in reporter bacteria. ''Journal of Bacteriology'' 183(23). Epub 2001 September. [http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=95514 Full Text]<br />
<br />
*Miller WG, Brandl MT, Quinones B, and Lindow SE (2001). Biological sensor for sucrose availability: relative sensitivities of various reporter genes. ''Applied and Environmental Microbiology'' 67(3).</div>