Jared

From GcatWiki

Revision as of 18:00, 17 January 2011

454 Sequencing

454 instruments are pyrosequencers that run many sequencing reactions in parallel in the wells of a PicoTiter Plate. The DNA fragments are first amplified on beads in an oil emulsion with DNA polymerase and primers [1], so that each bead ends up coated with thousands of identical copies of a single fragment. The beads are then loaded into individual wells of the plate. During sequencing, the dNTPs are added to the wells one type at a time, and each addition is followed by a wash. When the added dNTP is complementary to the next base of the single-stranded template, DNA polymerase incorporates it and releases pyrophosphate (PPi). ATP-sulfurylase converts this PPi to ATP, and the ATP fuels luciferase in that well, which produces light [2]. Because a run of identical bases is incorporated in a single step, the amount of light is proportional to the length of the run. The light is detected by a CCD camera [3]. The constant washing and the reagents consumed in every cycle (dNTPs, DNA polymerase, luciferase and ATP-sulfurylase) explain the high reagent cost of sequencing. The current (2009) 454 FLX system can sequence 100 Mb of DNA in 8 hours, with an average read length of 250 bp and a raw accuracy of 99.5% [4].
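
The link between light output and run length can be illustrated with a minimal Python sketch of an idealized "flowgram" (one signal value per nucleotide flow). This is a simplification, not 454's actual base-calling software: it assumes the flow order TACG, noiseless integer signals, and no signal decay or incomplete extension, and the function names simulate_flowgram and call_bases are hypothetical. The input is the strand being synthesized, i.e. the complement of the template. Real instruments must estimate run lengths from noisy intensities, which is why homopolymer lengths are a common source of pyrosequencing errors.

```python
def simulate_flowgram(sequence: str, flow_order: str = "TACG") -> list[int]:
    """Idealized pyrosequencing signal for each nucleotide flow.

    `sequence` is the strand being synthesized. Each flow supplies one
    nucleotide type; the signal equals how many of that base are incorporated
    in a row (0 if the next base does not match).
    """
    unknown = set(sequence) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not in flow order: {sorted(unknown)}")
    signals = []
    pos = 0
    flow = 0
    while pos < len(sequence):
        base = flow_order[flow % len(flow_order)]
        run = 0
        # A homopolymer run is extended completely within a single flow.
        while pos < len(sequence) and sequence[pos] == base:
            run += 1
            pos += 1
        signals.append(run)
        flow += 1
    return signals


def call_bases(signals: list[float], flow_order: str = "TACG") -> str:
    """Turn (possibly noisy) flow intensities back into bases by rounding."""
    return "".join(
        flow_order[i % len(flow_order)] * round(signal)
        for i, signal in enumerate(signals)
    )


if __name__ == "__main__":
    flows = simulate_flowgram("TTGACCCA")
    print(flows)  # [2, 0, 0, 1, 0, 1, 3, 0, 0, 1]
    print(call_bases(flows))  # TTGACCCA
```

The "CCC" run produces a single signal of 3 rather than three separate signals, so distinguishing a run of 3 from a run of 4 depends entirely on how precisely the light intensity is measured.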

Pyro.jpg (image from [5])

Illumina sequencing

Illumina instruments amplify DNA fragments in situ on a flow cell. The fragments are first attached to the flow cell at a low concentration, so that the colonies grown from them do not overlap. Each fragment is then copied by isothermal bridge amplification into a cluster of identical copies [6], which increases the density of each colony enough for its signal to be imaged. Fluorescently labeled nucleotides are washed over the flow cell in cycles. Each nucleotide carries a reversible terminator, so all four bases can be supplied at once while only one base is added to each strand per cycle. Laser-induced excitation of the flow cell then allows the incorporated fluorophores to be imaged [7]. The flow cell and reversible terminators allow the Illumina Genome Analyzer to produce 600 Mb of sequence per day, but only as 36 bp reads. Compared with pyrosequencing, the flow-cell method therefore trades read length for throughput. The raw accuracy of the Illumina Genome Analyzer is over 98.5%. Sequencers with higher raw error rates require greater coverage (more reads overlapping each base) to produce an accurate sequence [8].
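
Why a higher raw error rate calls for more coverage can be illustrated with a simple majority-vote model, sketched in Python below. This is a hypothetical model, not the consensus method of any real tool: it assumes each read covering a position is wrong independently with the same probability, and it pessimistically counts a position as miscalled whenever at least half of its reads are wrong. The 1.5% error rate in the usage example corresponds to the 98.5% raw accuracy quoted above; the function name majority_consensus_error is illustrative.

```python
from math import comb


def majority_consensus_error(per_base_error: float, coverage: int) -> float:
    """Probability that a majority vote over `coverage` reads miscalls a position.

    Assumptions (illustrative model only): read errors are independent and
    equally likely, and a position counts as miscalled whenever at least half
    of the covering reads are wrong. This is pessimistic, since wrong reads
    rarely all agree on the same wrong base.
    """
    if coverage < 1:
        raise ValueError("coverage must be at least 1")
    if not 0.0 <= per_base_error <= 1.0:
        raise ValueError("per_base_error must be a probability")
    threshold = (coverage + 1) // 2  # ceil(coverage / 2) erroneous reads
    # Binomial tail: probability that `threshold` or more reads are wrong.
    return sum(
        comb(coverage, k)
        * per_base_error**k
        * (1 - per_base_error) ** (coverage - k)
        for k in range(threshold, coverage + 1)
    )


if __name__ == "__main__":
    # 98.5% raw accuracy corresponds to a 1.5% per-base error rate.
    for depth in (1, 3, 5, 9):
        print(depth, majority_consensus_error(0.015, depth))
```

Under these assumptions the consensus error falls quickly as odd coverage increases. With even coverage a tie is counted as an error, so the model is easiest to read at odd depths.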

Chart.jpg (image from [9])

Read Length

The problem with short reads (20-40 nt sequenced per fragment) lies in assembly. We must use algorithms to find overlapping sections of fragments and then piece these fragments together. The genome contains many repetitive regions, and a 20-40 nt read that falls entirely inside a repeat matches several places equally well. With such short fragments it is therefore hard both to find the true overlaps and to determine the correct linear chromosomal location of repetitive fragments. Sequencing with short reads is generally cheaper and offers higher throughput, but it makes de novo fragment assembly difficult. For this reason, short-read data are usually assembled with the help of a reference genome [10]. The two leading de novo assembly programs are the Beijing Genomics Institute's SOAPdenovo and the Broad Institute's ALLPATHS-LG [11].
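
The overlap-and-merge idea, and the way a short read can match more than one place, can be illustrated with a toy greedy assembler in Python. This is a teaching sketch, not the algorithm used by SOAPdenovo or ALLPATHS-LG, which rely on more elaborate graph-based methods; the function names, the min_overlap cutoff of 3 and the example reads are all hypothetical. greedy_assemble repeatedly merges the pair of contigs with the longest exact suffix/prefix overlap, and placements lists every exact match of a read in a reference sequence.

```python
def overlap_length(a: str, b: str, min_overlap: int) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 when the best overlap is shorter than `min_overlap`.
    """
    for length in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the longest overlap."""
    # Drop duplicate reads and reads contained entirely within another read.
    unique = sorted(set(reads))
    contigs = [r for r in unique if not any(r != other and r in other for other in unique)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap_length(a, b, min_overlap)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            # No overlaps left: the remaining contigs cannot be joined.
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


def placements(read: str, reference: str) -> list[int]:
    """All 0-based start positions where `read` occurs exactly in `reference`."""
    return [i for i in range(len(reference) - len(read) + 1) if reference.startswith(read, i)]


if __name__ == "__main__":
    reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
    print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
    print(placements("ACGT", "ACGTTTACGTGG"))  # [0, 6]
```

Here four 10 nt reads overlap by 7 nt and merge into one contig. The reference passed to placements contains ACGT twice, so a 4 nt read matches at two positions and its true origin cannot be decided from the read alone. When reads are shorter than a repeat, greedy merging faces the same ambiguity and can collapse or misorder the repeat copies.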

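The good people from NCSU and the David H. Murdock Institute probably used the 454 FLX system to sequence the Vaccinium corymbosum (highbush blueberry) genome. The longer reads produced by this system facilitate de novo assembly, because more of them span repetitive regions completely and can therefore be placed unambiguously.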