Genome Assembly Project: Leland Taylor '12
From GcatWiki
Revision as of 14:59, 23 May 2011
Useful Links
http://phagesdb.org/ - phage database. Assembled versions of the raw files we have are located here
http://www.cbcb.umd.edu/ - UMD bioinformatics center. Good open source programs. Also includes AMOS
http://seqanswers.com/forums/showthread.php?t=43 - a good list of assembly programs
November 4 2025
Kingsford, C., Schatz, M.C. & Pop, M. Assembly complexity of prokaryotic genomes using short reads. BMC Bioinformatics 11, 21 (2010).
Notes
- Use de Bruijn graphs to estimate how "complete" a genome assembled via de novo assembly can be
- Find Eulerian path(s) in these graphs
 - Note the assumptions made in the paper
 - TOOL: Jellyfish - counts k-mers http://www.cbcb.umd.edu/software/jellyfish/
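
A minimal Python sketch of the pipeline in the notes above: count k-mers, build a de Bruijn graph, and walk an Eulerian path. It assumes error-free reads from a single strand. A plain `Counter` stands in for Jellyfish (no Jellyfish API is used), and Hierholzer's algorithm is one standard way to find an Eulerian path, not necessarily the one used in the paper. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every k-mer in the reads (toy stand-in for a k-mer counter like Jellyfish)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def de_bruijn_graph(kmers):
    """Each distinct k-mer becomes one edge from its (k-1)-prefix to its (k-1)-suffix."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes an Eulerian path exists in the graph."""
    if not graph:
        return []
    out_deg, in_deg = Counter(), Counter()
    for u, vs in graph.items():
        out_deg[u] += len(vs)
        for v in vs:
            in_deg[v] += 1
    nodes = set(out_deg) | set(in_deg)
    # Start where out-degree exceeds in-degree by one; otherwise the path is a cycle.
    start = next((n for n in nodes if out_deg[n] - in_deg[n] == 1), next(iter(graph)))
    remaining = {u: list(vs) for u, vs in graph.items()}  # unused edges per node
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is finished
    return path[::-1]


def spell(path):
    """Glue consecutive (k-1)-mers, which overlap by k-2 characters, into one sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])


reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
print(spell(eulerian_path(de_bruijn_graph(count_kmers(reads, 4)))))
```

On this repeat-free toy genome the path is unique and spells ATGGCGTGCA. Repeats longer than k-1 create branching nodes, and then more than one Eulerian path (and so more than one candidate genome) can exist.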
 
 - Lists graph compression techniques and the order in which to apply them
 - Can use this method to compute N50
- N50 = the largest contig length m such that contigs of length >= m together cover at least 50% of the genome.
 - A higher N50 usually correlates with a more contiguous (more "complete") assembly; N50 alone does not measure correctness
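
A minimal sketch of one common compression step, collapsing maximal non-branching paths of the de Bruijn graph into unitigs, followed by N50 as defined above. This is a generic technique, not necessarily the paper's exact sequence of compression steps. It assumes error-free, single-stranded k-mers and a known genome size; `unitigs` and `n50` are hypothetical names.

```python
from collections import defaultdict


def unitigs(kmers):
    """Collapse maximal non-branching paths of the de Bruijn graph into contigs."""
    succ, pred = defaultdict(list), defaultdict(list)
    for kmer in set(kmers):  # each distinct k-mer is one edge
        u, v = kmer[:-1], kmer[1:]
        succ[u].append(v)
        pred[v].append(u)
    nodes = set(succ) | set(pred)

    def one_in_one_out(node):
        return len(pred[node]) == 1 and len(succ[node]) == 1

    contigs = []
    for start in nodes:
        if one_in_one_out(start):
            continue  # interior node: reached from the branching node before it
        for nxt in succ[start]:
            seq = start + nxt[-1]
            while one_in_one_out(nxt):  # extend until the next branch or dead end
                nxt = succ[nxt][0]
                seq += nxt[-1]
            contigs.append(seq)
    # Sketch limitation: isolated cycles of 1-in-1-out nodes produce no contig.
    return contigs


def n50(lengths, genome_size):
    """Largest m such that contigs of length >= m cover at least 50% of genome_size."""
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= genome_size:
            return length
    return 0  # convention used here: contigs never reach half the genome


genome = "AACGTACGTT"
contigs = unitigs([genome[i:i + 4] for i in range(len(genome) - 3)])
print(sorted(contigs), n50([len(c) for c in contigs], len(genome)))
```

With k = 4, the repeat ACGT in this toy genome splits the graph into four unitigs (AACG, ACGT, CGTACG, CGTT), so N50 against the 10 nt genome is 6 rather than 10.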
 
 - Regardless of how complete the assembly is, for nearly all read lengths tested (25 nt to 1000 nt), 85%+ of genes are accurately identified (the 85% figure is for 25 nt reads).
 
Thoughts
- Look for an assembler that uses de Bruijn graphs?
 - This paper showed how to get an upper limit on how complete an assembly can be. Compare several existing de novo assemblers against this limit, using the methods here as the baseline.
 - Is it possible to get the code used in this project?
 
Pop, M. Genome assembly reborn: recent computational challenges. Briefings in Bioinformatics 10, 354-366 (2009).
Notes
Thoughts