Explaining My Project

Latest revision as of 20:30, 8 June 2011

Shotgun Sequencing

Shotgun.png

Counting K-mers to Learn About the Genome

(taken from https://banana-slug.soe.ucsc.edu/bioinformatic_tools:jellyfish)

Kmer.png

  • Bad k-mer rate = number of bad-multiplicity k-mers / total number of k-mers
  • Sequencing error rate = bad k-mer rate / k-mer size
  • Genome coverage = fit a gamma distribution to the good-multiplicity values for the best k-mer size (usually the largest). The peak of the fitted curve gives the genome coverage (red line in the figure; here about 47.11x).
  • Genome size = number of unique good-multiplicity k-mers / coverage
  • 1st peak (at low multiplicity) = k-mers that come from sequencing errors
  • 2nd peak = true genomic k-mers, seen many times because multiple reads copy the same location in the genome
    • If a k-mer occurs n times in the genome, we expect to see it n times as often in the sequencing data, so there should be additional peaks at higher multiplicity for k-mers that occur in repeats.
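
The formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used by the source page: the histogram values, the `cutoff` separating bad from good multiplicities, and the function names are all made up for the example. The coverage is taken as the tallest good-multiplicity bin rather than from a gamma fit, and both "number of k-mers" and the genome size numerator are read as total occurrences (multiplicity times the number of distinct k-mers), which is one possible reading of the bullets.

```python
def kmer_stats(hist, k, cutoff):
    """Estimate error rate, coverage and genome size from a k-mer histogram.

    hist   -- dict mapping multiplicity -> number of distinct k-mers seen that often
    k      -- k-mer size
    cutoff -- hypothetical threshold: multiplicities below it count as "bad"
    """
    # Assumption: k-mers are counted as occurrences (multiplicity * distinct count).
    total = sum(m * n for m, n in hist.items())
    bad = sum(m * n for m, n in hist.items() if m < cutoff)
    good = {m: n for m, n in hist.items() if m >= cutoff}

    bad_rate = bad / total
    error_rate = bad_rate / k

    # Simplification: use the tallest good bin as the peak instead of a gamma fit.
    coverage = max(good, key=good.get)

    # Assumption: genome size = good k-mer occurrences / coverage.
    good_occurrences = sum(m * n for m, n in good.items())
    genome_size = good_occurrences / coverage

    return {
        "bad_rate": bad_rate,
        "error_rate": error_rate,
        "coverage": coverage,
        "genome_size": genome_size,
    }


def expected_repeat_peak(coverage, copies):
    """Multiplicity at which k-mers present `copies` times in the genome should peak."""
    return copies * coverage


# Toy histogram (invented numbers): an error peak at low multiplicity,
# a main peak near 20x, and a small repeat peak near 40x.
toy_hist = {1: 1000, 2: 200, 3: 50, 18: 100, 19: 300, 20: 500, 21: 300, 22: 100, 40: 20}
stats = kmer_stats(toy_hist, k=20, cutoff=5)
repeat_peak = expected_repeat_peak(stats["coverage"], copies=2)
```

With this toy data the main peak sits at multiplicity 20, so two-copy repeats are expected near multiplicity 40, matching the small bump placed there. On real data, a fitted curve would give a non-integer coverage such as the 47.11x quoted above.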