Difference between revisions of "General Statistical Calculation Tool"
From GcatWiki
=Features=
===Calculate Distribution of Read Lengths===
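A minimal sketch of this feature, assuming reads arrive as FASTA text (the page does not name an input format). The function names <code>read_lengths</code> and <code>length_distribution</code> are hypothetical, chosen only for illustration.

```python
from collections import Counter


def read_lengths(fasta_lines):
    """Yield the sequence length of each record in an iterable of FASTA lines."""
    length = None  # stays None until the first ">" header is seen
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length  # finish the previous record
            length = 0
        elif line and length is not None:
            length += len(line)  # sequences may span several lines
    if length is not None:
        yield length  # finish the last record


def length_distribution(fasta_lines):
    """Map each read length to the number of reads with that length."""
    return Counter(read_lengths(fasta_lines))


# Tiny made-up input: r1 is 6 bases (split over two lines), r2 is 6, r3 is 2.
example = [">r1", "ACGT", "AC", ">r2", "ACGTAC", ">r3", "GG"]
dist = length_distribution(example)
for n in sorted(dist):
    print(n, dist[n])
```

For a real file, the same function could be fed an open file handle, since iterating a file yields its lines.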
===Calculate How Many Substrings===
*http://genometools.org/index.html - GenomeTools; provides both a k-mer calculation tool and a substring tool
*[http://adenine.biz.fh-weihenstephan.de/cgi-bin/shustring/shustring.cgi.pl Shustring] (SHortest Unique subSTRING) - [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1166540/ paper]
*[http://adenine.biz.fh-weihenstephan.de/cgi-bin/shulen/shulen.cgi.pl Shulen] - program for computing the null distribution of shortest unique substring lengths in DNA sequences
===Calculate N50===
*http://code.google.com/p/biopieces/wiki/calc_N50 - calc_N50, part of Biopieces
*[http://genomics-array.blogspot.com/2011/02/calculating-n50-of-contig-assembly-file.html Perl script] - with a step-by-step explanation of the idea
*[http://seqanswers.com/forums/showthread.php?t=2857 Python code]
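As a rough illustration of the statistic itself, the sketch below uses the common definition of N50: sort contig lengths from longest to shortest and return the length at which the running total first reaches half of the assembly size. It is not taken from any of the linked scripts, and the function name <code>n50</code> is hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one contig length")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            # This contig is the one that pushes the total past 50%.
            return length


# Made-up contig lengths: total is 54, half is 27, and 10 + 9 + 8 = 27.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```

Some tools use slightly different conventions at the exact 50% boundary, so results can differ by one contig from other implementations.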
===Calculate GC Content===
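A small sketch of a GC-content calculation, under the assumption that only unambiguous bases (A, C, G, T) should count toward the denominator; ambiguity codes such as N are ignored. The function name <code>gc_content</code> is hypothetical.

```python
def gc_content(sequence):
    """Return the fraction of A/C/G/T bases in the sequence that are G or C."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at  # assumption: N and other ambiguity codes are skipped
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return gc / total


# Made-up sequence with 4 G/C bases out of 6.
print(round(100 * gc_content("ATGCGC"), 1), "% GC")
```

Whether ambiguous bases belong in the denominator is a design choice; counting them would lower the reported percentage for sequences with many N characters.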
===Calculate k-mer Distributions===
*http://genometools.org/index.html - GenomeTools; provides both a k-mer calculation tool and a substring tool
*[http://www.cbcb.umd.edu/software/jellyfish/ Jellyfish] - a more recent k-mer counting tool
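For small inputs, the idea behind a k-mer distribution can be sketched in plain Python. This is an in-memory illustration only and does not reproduce how GenomeTools or Jellyfish work internally; the names <code>count_kmers</code> and <code>kmer_spectrum</code> are hypothetical, and the sketch assumes a single strand with no canonicalization of reverse complements.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping substring of length k in the sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_spectrum(kmer_counts):
    """Map each multiplicity to the number of distinct k-mers seen that often."""
    return Counter(kmer_counts.values())


# Made-up sequence: ACGT occurs twice; CGTA, GTAC, and TACG occur once each.
counts = count_kmers("ACGTACGT", 4)
spectrum = kmer_spectrum(counts)
for multiplicity in sorted(spectrum):
    print(multiplicity, spectrum[multiplicity])
```

A dictionary of all k-mers grows quickly with read volume, which is one reason dedicated counters are preferred for real sequencing data.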
===Estimate Genome Size===
*http://seqanswers.com/forums/showthread.php?t=11434
*http://www.dolphing.com/?p=508
*[http://gsizepred.sourceforge.net/ GSP] - incorporates a Bayesian framework and the EM algorithm for genome size prediction
**Some people are skeptical of this method: http://seqanswers.com/forums/showthread.php?t=10988
*Another method - http://www.cmb.usc.edu/papers/msw_papers/msw-149.pdf
*http://www.nature.com/nature/journal/v463/n7279/extref/nature08696-s1.pdf
*https://banana-slug.soe.ucsc.edu/bioinformatic_tools:quake
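One widely used heuristic estimates genome size from a k-mer spectrum as the total number of k-mers divided by the multiplicity at the main coverage peak. The sketch below illustrates that heuristic only; whether each linked resource uses exactly this formula is not confirmed here. The function name <code>estimate_genome_size</code> and the <code>min_multiplicity</code> cutoff are assumptions made for illustration.

```python
def estimate_genome_size(spectrum, min_multiplicity=2):
    """Estimate genome size as (total k-mers) / (peak k-mer multiplicity).

    spectrum maps a multiplicity to the number of distinct k-mers seen that
    many times. K-mers below min_multiplicity are assumed to be sequencing
    errors and are dropped; the cutoff value is an assumption.
    """
    usable = {m: n for m, n in spectrum.items() if m >= min_multiplicity}
    if not usable:
        raise ValueError("no k-mers at or above the multiplicity cutoff")
    # Peak depth: the multiplicity shared by the most distinct k-mers.
    peak = max(usable, key=lambda m: usable[m])
    total_kmers = sum(m * n for m, n in usable.items())
    return total_kmers / peak


# Made-up spectrum: 5000 singleton k-mers (treated as errors), peak at 10.
toy_spectrum = {1: 5000, 9: 100, 10: 800, 11: 100}
print(estimate_genome_size(toy_spectrum))  # prints 1000.0
```

Real spectra are noisier: repeats, heterozygosity, and uneven coverage can shift or split the peak, which may be part of why some of the linked discussions are skeptical of automated estimates.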
=Motivation=
http://phagesdb.org/sort/ - genome size and GC% appear to correlate roughly with phage cluster
Latest revision as of 18:34, 9 June 2011