Genome Assembly Project: Leland Taylor '12

From GcatWiki
Revision as of 14:23, 23 May 2011

November 21 2024

  1. Kingsford, C., Schatz, M.C. & Pop, M. Assembly complexity of prokaryotic genomes using short reads. BMC Bioinformatics 11, 21 (2010).
    • Uses de Bruijn graphs to estimate how "complete" a genome assembly can be
      • Find Eulerian path(s) in these graphs
      • Note the assumptions made in the paper
    • Lists graph compression techniques and the order in which to apply them
    • This method can be used to compute N50
      • N50 = the largest contig length m such that contigs of length >= m together cover at least 50% of the genome.
      • A higher N50 usually indicates a more contiguous, and often more "correct", assembly
    • Regardless of how correct the assembled genome is, for nearly all read lengths from 25nt to 1000nt, 85% or more of genes are accurately identified (the 85% figure is for 25nt reads).
  2. Pop, M. Genome assembly reborn: recent computational challenges. Briefings in Bioinformatics 10, 354-366 (2009).
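To illustrate the de Bruijn graph and Eulerian path idea from the Kingsford et al. notes, here is a minimal Python sketch. It assumes error-free k-mers taken directly from a known sequence. The function names (`de_bruijn_edges`, `eulerian_path`, `spell`) are hypothetical and not taken from the paper. The path is found with Hierholzer's algorithm, which assumes an Eulerian path exists and does not check for one.

```python
from collections import defaultdict


def de_bruijn_edges(sequence: str, k: int) -> dict[str, list[str]]:
    """Hypothetical helper: nodes are (k-1)-mers, and every k-mer occurrence
    adds one edge from its (k-1)-prefix to its (k-1)-suffix."""
    graph: dict[str, list[str]] = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph: dict[str, list[str]]) -> list[str]:
    """Hierholzer's algorithm on a directed multigraph.

    Assumes an Eulerian path exists (connected graph, degree conditions met).
    """
    if not graph:
        return []
    out_deg: dict[str, int] = defaultdict(int)
    in_deg: dict[str, int] = defaultdict(int)
    for u, targets in graph.items():
        out_deg[u] += len(targets)
        for v in targets:
            in_deg[v] += 1
    nodes = set(out_deg) | set(in_deg)
    # Start where out-degree exceeds in-degree by one; otherwise the path is a cycle.
    start = next((n for n in nodes if out_deg[n] - in_deg[n] == 1), None)
    if start is None:
        start = next(iter(graph))
    remaining = {u: list(targets) for u, targets in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is final in this subpath
    return path[::-1]


def spell(path: list[str]) -> str:
    """Rebuild a sequence from a path of overlapping (k-1)-mers."""
    return path[0] + "".join(node[-1] for node in path[1:])
```

For example, `spell(eulerian_path(de_bruijn_edges("ACGTTGCA", 3)))` recovers `"ACGTTGCA"`. When a sequence contains repeats, more than one Eulerian path can exist, which is why repeats make assembly ambiguous.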
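The notes on Kingsford et al. mention graph compression techniques without naming them, so this Python sketch shows only one common de Bruijn graph compression, not necessarily the paper's list or order: collapsing each maximal non-branching path into a single unitig (a stretch the graph supports unambiguously). It assumes a graph built from the distinct k-mers of a known sequence. The names `de_bruijn` and `unitigs` are hypothetical, and isolated cycles are not handled.

```python
from collections import defaultdict


def de_bruijn(sequence: str, k: int) -> dict[str, set[str]]:
    """Hypothetical helper: de Bruijn graph over distinct k-mers.

    Nodes are (k-1)-mers; each distinct k-mer adds one edge from its
    (k-1)-prefix to its (k-1)-suffix.
    """
    graph: dict[str, set[str]] = defaultdict(set)
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph: dict[str, set[str]]) -> list[str]:
    """Collapse each maximal non-branching path into one string (a unitig).

    Isolated cycles whose nodes all have in-degree 1 and out-degree 1 are
    skipped in this sketch.
    """
    indeg: dict[str, int] = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            indeg[v] += 1
    outdeg = {u: len(targets) for u, targets in graph.items()}

    def non_branching(node: str) -> bool:
        # Exactly one edge in and one edge out: the node lies inside a simple path.
        return indeg.get(node, 0) == 1 and outdeg.get(node, 0) == 1

    contigs = []
    for start in set(graph) | set(indeg):
        if non_branching(start):
            continue  # paths begin only at sources, sinks, or branch/merge points
        for nxt in graph.get(start, ()):
            contig = start + nxt[-1]
            # Extend while the path can neither branch nor merge.
            while non_branching(nxt):
                nxt = next(iter(graph[nxt]))
                contig += nxt[-1]
            contigs.append(contig)
    return contigs
```

For `ATGCTAGCTT` with k = 3, the repeated `GCT` creates a branch, so the graph compresses into four unitigs (`ATGC`, `CTAGC`, `CTT`, `GCT`, in no fixed order) instead of one. The lengths of such unitigs are the kind of contig lengths an N50 calculation takes as input.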
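The N50 definition in the Kingsford et al. notes can be checked with a short Python sketch (the function `n50` is hypothetical). Following that definition, it measures coverage against the genome length when one is supplied. When it is omitted, the sketch falls back to the total contig length, which is the more common convention (the genome-based variant is sometimes called NG50).

```python
from typing import Optional


def n50(contig_lengths: list[int], genome_length: Optional[int] = None) -> int:
    """Return the largest length m such that contigs of length >= m cover
    at least half of the reference total.

    The reference total is `genome_length` if given, otherwise the summed
    contig length.
    """
    total = genome_length if genome_length is not None else sum(contig_lengths)
    covered = 0
    # Walk contigs from longest to shortest, accumulating covered bases.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if 2 * covered >= total:  # integer test for covered >= total / 2
            return length
    return 0  # the contigs never reach half of the total
```

For contig lengths 2, 3, 4, 5, 6, 8 and 10 (total 38), the running sum is 10, 18, then 24, which first reaches half of 38 at the contig of length 6, so `n50` returns 6. Measured against a 50nt genome instead, half is 25, the sum first reaches it at the contig of length 5, and the result is 5.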