Difference between revisions of "Jared"
Latest revision as of 01:09, 23 January 2011
454 Sequencing
454 instruments are pyrosequencers that carry out many reactions in parallel in the wells of a PicoTiter Plate. Beads coated with thousands of identical DNA fragments are added to individual wells on the plate. The DNA fragments are amplified in an oil emulsion mixture with DNA polymerase and primers [1]. dNTPs are added to the wells one at a time, with a wash after each addition. The continuous washing and the sequential addition of dNTPs, DNA polymerase, luciferase, and ATP-sulfurylase explain the high reagent costs of sequencing. ATP-sulfurylase converts the PPi released by each dNTP incorporation into the strand complementary to the original ssDNA into ATP. The ATP fuels luciferase in each well [2]. The light produced is detected with a fluorescence microscope [3]. The current (2009) 454 FLX system can sequence 100 Mb of DNA in 8 hours with an average read length of 250 bp and a raw accuracy of 99.5% [4].
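The one-nucleotide-at-a-time flow described above can be sketched in a few lines of Python. This is an idealized toy model, not the 454 software: it assumes the light signal is exactly proportional to the number of bases incorporated in a flow (so a homopolymer run gives a proportionally brighter flash), it ignores noise, and the flow order `TACG` and the function names are assumptions made for illustration.

```python
from itertools import cycle


def flowgram(read, flow_order="TACG", max_flows=100):
    """Simulate an idealized pyrosequencing flowgram.

    `read` is the strand being synthesized (5'->3').
    Returns one (nucleotide, signal) pair per flow, where signal is the
    number of bases incorporated during that flow (0 means no light).
    """
    signals = []
    pos = 0
    for n_flows, base in enumerate(cycle(flow_order)):
        if pos >= len(read) or n_flows >= max_flows:
            break
        run = 0
        # The polymerase keeps extending while the next base matches the flowed dNTP.
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
    return signals


def call_bases(signals):
    """Turn a flowgram back into a sequence."""
    return "".join(base * run for base, run in signals)


signals = flowgram("TTAGC")
print(signals)
print(call_bases(signals))
```

In a real instrument the signal for long homopolymers is not cleanly proportional, which is one reason this ideal model overstates how easy base calling is.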
{{#ev:youtube|kYAGFrbGl6E}}
Illumina Sequencing
Illumina instruments amplify DNA fragments in situ on a flow cell. Fragments are first dispersed across the lanes of the flow cell at low concentration, so that the resulting colonies do not overlap. Clusters are then generated by isothermal bridge amplification [7]. The video below illustrates sample preparation and isothermal bridge amplification.
{{#ev:youtube|77r5p8IBwJk}}
The amplification of DNA using universal primers covalently bonded to the surface of the flow cell produces 500-1000 clonal copies of each DNA fragment [8]. Fluorescently labeled nucleotides are cyclically washed over the flow cell. These nucleotides carry reversible terminators, so all four bases can be supplied simultaneously while only one base is incorporated per cycle across the flow cell. Laser-induced excitation of the flow cell allows imaging of the excited fluorophores [9]. Before the next cycle, tris(2-carboxyethyl)phosphine (TCEP) is added to cleave the fluorescent dye and side chain (the reversible terminator) and restore the 3' hydroxyl group, allowing the next nucleotide addition [10]. The use of a flow cell and reversible terminators allows the Illumina Genome Analyzer to produce 600 Mb of sequence per day, but with reads of only 36 bp. Compared with pyrosequencing, the flow cell method trades read length for throughput. The raw accuracy of the Illumina Genome Analyzer is over 98.5%. Increased coverage is necessary when using sequencers with high raw error rates [11].
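To see why higher raw error rates call for more coverage, here is a rough Python sketch of the error rate of a majority-vote consensus. It rests on simplifying assumptions that are not from the sources cited above: errors are independent between reads, every read covering a position has the same per-base error rate, and the consensus is counted as wrong whenever a strict majority of reads is wrong (pessimistically treating all wrong reads as agreeing with each other). The function name is hypothetical.

```python
from math import comb


def consensus_error(per_base_error, coverage):
    """Probability that a strict majority of `coverage` independent reads
    is wrong at a single position (binomial upper tail)."""
    majority = coverage // 2 + 1
    return sum(
        comb(coverage, k)
        * per_base_error**k
        * (1 - per_base_error) ** (coverage - k)
        for k in range(majority, coverage + 1)
    )


# Raw error rates implied by the accuracies quoted above:
# 1 - 0.995 for the 454 FLX and 1 - 0.985 for the Genome Analyzer.
for name, error in [("454 FLX", 0.005), ("Illumina GA", 0.015)]:
    for coverage in (1, 5, 11, 21):
        print(f"{name:12s} coverage {coverage:2d}x  "
              f"consensus error {consensus_error(error, coverage):.3e}")
```

Under these assumptions the consensus error falls quickly with depth, and a sequencer with a higher raw error rate needs more reads per position to reach the same consensus accuracy.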
Read Length
The issue with short reads (20-40 nt fragments sequenced at a time) is assembly. We must use algorithms to find overlapping sections of fragments, then piece these fragments together. The genome contains many repetitive regions. Using only 20-40 nt fragments, we may have a hard time finding overlapping regions and determining the correct linear chromosomal location of repetitive segments. While the error rate of Illumina sequencing is only 2 percent for the first 30 nucleotides at the head of a read, it quickly increases to 20 percent at the tail of a read, around nucleotide 50 [14]. The high error rate results from the incorporation of wrong bases by DNA polymerase (all four nucleotides are present at once), without the error-correcting machinery found in a normal cell. Long reads are difficult because the replicating, identical DNA molecules in a cluster can get out of sync [15]. Paired-end reads from circularized, adapter-ligated kilobase-scale fragments can be used to link repetitive segments to their general location. The 454 Titanium FLX instrument can perform extra-long reads of 400-600 bp to eliminate the need for paired ends [16]. Sequencing with short reads is generally cheaper and offers higher throughput, but problems arise in de novo fragment assembly. Assembly from short-read data is usually accomplished with the help of a reference genome [17], [18]. The top two de novo assemblers are Beijing Genomics Institute's SOAPdenovo and the Broad Institute's ALLPATHS-LG [19].
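The overlap-finding step, and the ambiguity that repeats introduce, can be illustrated with a small Python sketch. This is a naive toy, not how SOAPdenovo or ALLPATHS-LG work: it compares every pair of reads, requires exact matches, and the reads, function names, and `min_length` threshold are all made up for the example.

```python
def overlap(a, b, min_length=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`,
    or 0 if no such overlap is at least `min_length` long."""
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def overlap_candidates(reads, min_length=3):
    """Map each read to the reads that could follow it in an assembly."""
    return {
        a: [b for b in reads if a != b and overlap(a, b, min_length) > 0]
        for a in reads
    }


# "TTT" stands in for a repeat that is as long as the usable overlap.
reads = ["ACGTTT", "TTTGCA", "TTTCGA"]
for read, followers in overlap_candidates(reads).items():
    print(read, "->", followers)
```

Here `ACGTTT` can be extended by either of the other two reads, because the only overlap lies inside the repeat. With reads this short, nothing in the data says which extension reflects the true chromosomal order; longer reads or paired ends that span the repeat would resolve it.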
The good people from NCSU and the David H. Murdock Institute probably used the 454 FLX system to sequence the Vaccinium corymbosum genome de novo. The longer reads produced by this system facilitate de novo assembly. They probably used an Illumina system to resequence and examine the quality of the assembly. This resequencing also increases coverage. The authors of "The genome of woodland strawberry (Fragaria vesca)" in Nature Genetics used a combination of Roche/454, Illumina/Solexa, and Life Technologies/SOLiD to sequence and resequence [20]. Curiously, Illumina advocates the use of the Genome Analyzer for de novo sequencing. Illumina points to reads of over 100 bp achieved by some researchers. I question the accuracy of these reads. The company promotes the use of Velvet, a de Bruijn graph-based assembly program [21].
(image from [22])
(image from [23])
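For readers unfamiliar with the de Bruijn graph approach mentioned above in connection with Velvet, here is a minimal Python sketch of the core idea: reads are broken into k-mers, each k-mer becomes an edge between its (k-1)-mer prefix and suffix, and an assembly corresponds to a path through the graph. This is only an illustration under simplifying assumptions (error-free reads, a single strand, no coverage tracking), not Velvet's actual implementation, and the function name is hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph from reads.

    Nodes are (k-1)-mers; there is an edge prefix -> suffix for every
    k-mer observed in any read. Returns {node: sorted list of successors}.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return {node: sorted(successors) for node, successors in graph.items()}


graph = de_bruijn(["ACGT", "CGTA"], k=3)
for node, successors in graph.items():
    print(node, "->", successors)
```

Because overlapping reads share k-mers, they collapse onto the same edges without any pairwise read comparison, which is part of why this representation suits large volumes of short reads.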
Single Molecule Real Time Sequencing
The future? No PCR? Direct detection of methylated segments? [24] No clustering or washing is necessary, so long, cheap reads are possible. DNA libraries are incorporated into SMRTbell constructs of ligated, circularized DNA bound to a single DNA polymerase molecule [25]. A laser is shone through the glass covering the zero-mode waveguide (ZMW) and excites only the bottom 30 nm of the ZMW, where the polymerase is actively incorporating fluorescently labeled nucleotides. Interference from non-incorporated nucleotides in the micropore is minimized. Accuracy of sequencing is still a major issue.
A team at Harvard used PacBio's sequencing methods to sequence five strains of Vibrio cholerae at varying coverages (between 20X and 60X per strain) over two days. That translates to approximately 368 Mb/day throughput (of two strains). That throughput is on par with or worse than 454's throughput. The authors of the paper do not indicate the cost of sequencing, the assembly time, or the raw error rate. By matching PacBio raw reads to the Vibrio cholerae reference genome, Dr. Elemento determined a raw error rate of 20 percent [26]. The average read length was more impressive: 954 bp. No paired ends and no reference genome were necessary for assembly [27]. With a throughput of only 5.3 Mb per 30-minute run, the single molecule method is still toddling around. We will see how this technology progresses.
(image from [28])
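To put the throughput figures quoted on this page on a common scale, the short Python sketch below converts them to megabases per day. It only does arithmetic on numbers already given above; the assumption that runs can be scheduled back to back with no idle time is mine and flatters every platform. The function and dictionary names are made up for the example.

```python
def mb_per_day(megabases, hours):
    """Scale a per-run yield to a 24-hour day, assuming back-to-back runs."""
    return megabases / hours * 24


throughput = {
    # 100 Mb in 8 hours (454 FLX figure quoted above)
    "454 FLX": mb_per_day(100, 8),
    # Quoted directly as 600 Mb per day
    "Illumina Genome Analyzer": 600.0,
    # 5.3 Mb per 30-minute run (assumes runs are chained without gaps)
    "PacBio SMRT, per-run figure": mb_per_day(5.3, 0.5),
}

for platform, mb in sorted(throughput.items(), key=lambda item: -item[1]):
    print(f"{platform:30s} {mb:6.1f} Mb/day")
```

On these numbers the per-run PacBio figure scales to somewhat less than the 454 figure, which is consistent with the "on par with or worse" comparison in the text.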