Difference between revisions of "Genome Assembly Project: Leland Taylor '12"


Revision as of 14:28, 23 May 2011

November 21 2024

Kingsford, C., Schatz, M.C. & Pop, M. Assembly complexity of prokaryotic genomes using short reads. BMC Bioinformatics 11, 21 (2010).

Notes

  • Use de Bruijn graphs to estimate the "completeness" of genomes assembled de novo
    • Find Eulerian path(s) in these graphs
    • Note the assumptions made in the paper
  • Lists compression techniques and the order to employ them
  • Can use this method to compute N50
    • N50 = the largest contig length m such that contigs of length >= m together cover at least 50% of the genome.
    • A higher N50 score usually correlates with a more "correct" genome
  • Regardless of the correctness of the genome, for nearly all read sizes (between 25 nt and 1000 nt), at least 85% of genes are accurately identified (the 85% figure is for 25 nt reads).
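
To make the N50 definition above concrete, here is a minimal sketch in Python. The function name n50 and the optional genome_length parameter are my own, not from the paper. When the true genome length is unknown, the sketch assumes the total length of all contigs as a stand-in, which is a common convention but an assumption here.

```python
def n50(contig_lengths, genome_length=None):
    """Return the largest length m such that contigs of length >= m
    together cover at least 50% of the genome.

    genome_length is a hypothetical parameter: if omitted, the sum of
    all contig lengths is used as the genome size (an assumption).
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = genome_length if genome_length is not None else sum(lengths)
    covered = 0
    for length in lengths:
        covered += length
        # Compare 2 * covered with total to avoid fractional halves.
        if 2 * covered >= total:
            return length
    return 0  # the contigs cover less than half of the genome


if __name__ == "__main__":
    # Made-up contig lengths, for illustration only.
    print(n50([80, 70, 50, 40, 30, 20]))
```

With the made-up lengths above, the total is 290, so the running sum first reaches half of it (145) at the second-longest contig.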

Thoughts

  • Look for an assembler that uses de Bruijn graphs?
  • This paper shows how to obtain an upper limit on the correctness of an assembled genome. Compare several existing de novo assemblers, using the paper's methods as a benchmark.
  • Is it possible to get the code used in this project?
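
As a reminder of what a de Bruijn graph assembler does at its core, here is a toy sketch in Python: build the graph from reads, then walk an Eulerian path to spell a sequence. This is not the code from the paper. It assumes error-free reads and that every k-mer occurs only once in the genome (repeated k-mers are collapsed), and all function names are my own. The Eulerian path is found with Hierholzer's algorithm.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer in the reads is one edge."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm. Assumes an Eulerian path exists."""
    if not graph:
        return []
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next(iter(graph))
    for u, targets in graph.items():
        if len(targets) - in_deg[u] == 1:
            start = u
            break
    remaining = {u: list(targets) for u, targets in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is finished
    return path[::-1]


def spell(path):
    """Turn a path of overlapping (k-1)-mers back into one sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    # Made-up reads, for illustration only.
    reads = ["ATGGCG", "GCGTGC", "GTGCA"]
    graph = de_bruijn_graph(reads, k=3)
    print(spell(eulerian_path(graph)))
```

Even this small graph has more than one Eulerian path (the nodes TG and GC each have two outgoing edges), so different traversals spell different sequences from the same reads. That ambiguity is the kind of thing the paper uses to estimate how complete an assembly can be.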

Pop, M. Genome assembly reborn: recent computational challenges. Briefings in Bioinformatics 10, 354-366 (2009).