Difference between revisions of "Genome Assembly Project: Leland Taylor '12"
From GcatWiki
Revision as of 14:28, 23 May 2011
November 21 2024
Kingsford, C., Schatz, M.C. & Pop, M. Assembly complexity of prokaryotic genomes using short reads. BMC Bioinformatics 11, 21 (2010).
Notes
- Use de Bruijn graphs to estimate the "completeness" of genomes assembled via de novo assembly
  - Find Eulerian path(s) in these graphs
  - Note the assumptions made in the paper
- Lists the compression techniques and the order in which to apply them
- Can use this method to compute N50
  - N50 = the length of the largest contig (m) such that at least 50% of the genome is covered by contigs of size >= m.
  - A higher N50 score usually correlates with a more "correct" genome
- Regardless of the correctness of the genome, for nearly all read sizes (25 nt < size < 1000 nt), at least 85% of genes are accurately identified (the 85% figure is for 25 nt reads).
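A minimal sketch of the N50 definition in the notes above, to make the arithmetic concrete. The function name `n50` and the optional `genome_length` argument are my own choices for illustration, not from the paper; when no genome length is given, the sketch assumes the total contig length stands in for it.

```python
def n50(contig_lengths, genome_length=None):
    """Largest contig length m such that contigs of length >= m
    cover at least 50% of the genome.

    Assumption: if genome_length is not given, the sum of the contig
    lengths is used in its place.
    """
    total = genome_length if genome_length is not None else sum(contig_lengths)
    covered = 0
    # Walk contigs from longest to shortest, accumulating coverage.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
    # Contigs never reach half of the genome (or there are none).
    return 0


if __name__ == "__main__":
    # Total length is 290, so half is 145; 80 + 70 = 150 reaches it.
    print(n50([80, 70, 50, 40, 30, 20]))  # 70
```

Passing a known reference length as `genome_length` gives the "percent of genome" reading of the definition; leaving it out gives the more common assembly-only reading.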
Thoughts
- Look for an assembler that uses a de Bruijn graph?
- This paper shows how to obtain an upper limit on the correctness of an assembled genome. Compare several existing de novo assemblers, using the methods here as a benchmark.
- Is it possible to get the code used in this project?
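Until the paper's code is available, here is a rough sketch of the de Bruijn graph and Eulerian path idea from the notes above. It is not the authors' implementation. Assumptions: error-free reads, each distinct k-mer used as exactly one edge (repeat copy numbers are ignored), and a graph that actually has an Eulerian path. The function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer in the reads is one edge."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes an Eulerian path exists."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    nodes = set(graph) | set(in_deg)
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next(
        (n for n in nodes if len(graph.get(n, [])) - in_deg.get(n, 0) == 1),
        None,
    )
    if start is None:
        start = min(nodes)  # Eulerian cycle: any node works
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is finished
    return path[::-1]


def spell(path):
    """Reconstruct the sequence spelled by a path of (k-1)-mers."""
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    reads = ["ATGGCGT", "GCGTGCA"]
    print(spell(eulerian_path(de_bruijn_graph(reads, 3))))
```

For these two reads with k = 3 the graph has two valid Eulerian paths, spelling ATGGCGTGCA and ATGCGTGGCA, so the reads alone cannot tell the two sequences apart. That ambiguity is the sort of thing the paper measures when it asks how "complete" an assembly can be.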
Pop, M. Genome assembly reborn: recent computational challenges. Briefings in Bioinformatics 10, 354-366 (2009).