Genome Assembly Project: Leland Taylor '12

From GcatWiki
Revision as of 15:22, 23 May 2011 by Letaylor (talk | contribs)

Useful Links

  • phage database. Assembled versions of the raw files we have are located here.
  • UMD bioinformatics center. Good open source programs. Also includes AMOS.
  • a good list of assembly programs


hybrid de novo assembly

Big Questions

De novo or reference-based assembly?

July 2 2024

Kingsford, C., Schatz, M.C. & Pop, M. Assembly complexity of prokaryotic genomes using short reads. BMC Bioinformatics 11, 21 (2010).


  • Uses de Bruijn graphs to estimate the "completeness" of genomes assembled via de novo assembly
  • Lists compression techniques and the order to employ them
  • Can use this method to compute N50
    • N50 = the largest contig length m such that contigs of length >= m cover at least 50% of the genome.
    • A higher N50 score usually correlates with a more "correct" genome
  • Regardless of the correctness of the assembled genome, for nearly all read sizes (1000nt > size > 25nt), 85%+ of genes are accurately identified (the 85% figure is for 25nt reads).
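
The N50 definition above can be sketched in a few lines of Python. This is only an illustration, not the paper's code: the function name n50 and the optional genome_size argument are my own assumptions. When no genome size estimate is given, the total assembled length stands in for the genome size.

```python
def n50(contig_lengths, genome_size=None):
    """Largest length m such that contigs of length >= m cover at least
    half of the genome. If genome_size is None, the sum of all contig
    lengths is used as a stand-in for the genome size (an assumption)."""
    lengths = sorted(contig_lengths, reverse=True)
    total = genome_size if genome_size is not None else sum(lengths)
    target = total / 2
    covered = 0
    for length in lengths:  # walk contigs from longest to shortest
        covered += length
        if covered >= target:
            return length
    return None  # contigs never cover half of the genome


if __name__ == "__main__":
    print(n50([80, 70, 50, 40, 30, 20]))
```

Returning None when the contigs cover less than half of the genome estimate is one possible choice; raising an error would work just as well.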


  • Look for an assembler that uses a de Bruijn graph?
  • This paper showed how to get an upper limit on the correctness of an assembled genome. Compare several existing de novo assemblers, using the methods here as the baseline.
  • Is it possible to get the code used in this project?
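
As a reminder of what a de Bruijn graph is while looking at assemblers, here is a minimal sketch of building one from reads. It is not the method from the paper (no compression steps, no error handling); the function name de_bruijn_graph and the choice of k are assumptions for illustration. Nodes are (k-1)-mers and each k-mer in a read adds one edge from its prefix to its suffix.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it.
    Repeated k-mers produce repeated edges (a multigraph)."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])  # prefix -> suffix
    return dict(graph)


if __name__ == "__main__":
    for node, successors in de_bruijn_graph(["ACGTAC"], 3).items():
        print(node, "->", successors)
```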

Pop, M. Genome assembly reborn: recent computational challenges. Briefings in Bioinformatics 10, 354-366 (2009).



Basic Timeline

  • 1st – 2nd week
    • Learn how to manipulate and handle raw read files.
    • Familiarize myself with key sources listed above.
    • Write module to calculate fold coverage using genome size estimate and total size of all reads.
    • Write a prioritized list of features and goals for my program.
  • 3rd – 6th week
    • Develop my program in modules according to the prioritized features.
    • Compare my program’s genome to previously assembled genomes from this raw data.
    • Quantify the accuracy of my genome by comparing the size of a predicted gap or feature in my assembly to the size of that actual segment of DNA in the blueberry genome.
    • Edit the program based on any issues encountered with the full data set.
  • 7th – 10th week (Ending: July 29, 2011)
    • Finish wet-lab accuracy tests.
    • Fine-tune the program based on any issues encountered with the full data set.
    • Attempt to assemble the “Meatball” phage genome.
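
The fold coverage module planned for the first two weeks could start from something like the sketch below: fold coverage = total size of all reads / estimated genome size. The function names and the example numbers are hypothetical, and the reads are passed in as plain strings because the raw read file format is not decided here.

```python
def total_read_bases(reads):
    """Sum the lengths of all reads (each read is a sequence string)."""
    return sum(len(read) for read in reads)


def fold_coverage(total_bases, genome_size):
    """Fold coverage = total bases in all reads / estimated genome size."""
    if genome_size <= 0:
        raise ValueError("genome size estimate must be positive")
    return total_bases / genome_size


if __name__ == "__main__":
    # Hypothetical example: three 100 nt reads over a 50 nt "genome".
    reads = ["ACGT" * 25] * 3
    print(fold_coverage(total_read_bases(reads), 50))
```

Keeping the read-counting step separate from the division means the same fold_coverage function can be reused once a real parser for the raw read files is written.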