Difference between revisions of "Jared"

From GcatWiki
 

Revision as of 19:04, 19 January 2011

454 Sequencing

454 instruments are pyrosequencers that carry out many reactions at a time (parallel sequencing) in the wells of a PicoTiter Plate. DNA fragments are attached to beads and amplified in an oil emulsion mixture with DNA polymerase and primers, so that each bead ends up coated with thousands of copies of a single fragment [1]. The beads are then added to individual wells on the plate. dNTPs are added to the wells one type at a time, and the wells are washed between additions. The continuous washing and the sequential addition of dNTPs, DNA polymerase, luciferase, and ATP sulfurylase explain the high reagent costs of sequencing. When a dNTP is incorporated into the strand complementary to the original ssDNA, pyrophosphate (PPi) is released, and ATP sulfurylase converts it to ATP. The ATP fuels luciferase in each well [2]. The light produced is detected with a CCD camera [3]. The current (2009) 454 FLX system can sequence 100 Mb of DNA in 8 hours with an average read length of 250 bp and a raw accuracy of 99.5% [4].
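To make the one-dNTP-at-a-time idea concrete, here is a minimal sketch of how a series of light readings could be turned into a base sequence. It assumes the signals are already normalized so that a value near 1.0 means one incorporated base and a homopolymer run of n bases gives a value near n. The flow order "TACG", the function name, and the example numbers are all made up for illustration and are not taken from the 454 software.

```python
def decode_flowgram(flow_order, signals):
    """Turn per-flow light intensities into a base sequence.

    flow_order: repeating order in which the dNTPs are flowed (assumed "TACG" here)
    signals:    one normalized intensity per flow; ~1.0 is assumed to mean
                a single incorporated base
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        # Round to the nearest whole number of incorporations.
        # A signal near 0 means this dNTP was not incorporated.
        count = max(0, round(signal))
        bases.append(nucleotide * count)
    return "".join(bases)


# Hypothetical readings for two rounds of the T, A, C, G flow cycle.
signals = [1.02, 0.05, 2.10, 0.97, 0.03, 1.10, 0.00, 2.95]
print(decode_flowgram("TACG", signals))  # TCCGAGGG
```

The decoded sequence is the newly synthesized strand, so it is complementary to the template on the bead. Rounding is the weak point of this sketch: the longer a homopolymer run, the harder it is to tell a signal of n from n+1.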

Pyro.jpg Pico.jpg


(image from [5]) (image from [6])

Illumina Sequencing

Illumina instruments amplify DNA fragments in situ on a flow cell. Fragments are first dispersed across the lanes of the flow cell at a low concentration, so that the colonies that grow from them do not overlap. Clusters are generated by isothermal bridge amplification [7]. Amplification with universal primers covalently bonded to the surface of the flow cell produces 500-1000 clonal copies of each DNA fragment [8]. Fluorescently labeled nucleotides are cyclically washed over the flow cell. These nucleotides carry reversible terminators, so all four bases can be supplied simultaneously while each strand is extended by only one base per cycle. Laser-induced excitation of the flow cell allows imaging of the excited fluorophores [9]. Before the next cycle, tris(2-carboxyethyl)phosphine (TCEP) is added to remove the fluorescent dye and side chain (the reversible terminator) and restore the 3' hydroxyl group, allowing further extension. The use of a flow cell and reversible terminators allows the Illumina Genome Analyzer to produce 600 Mb of sequence per day with only 36 bp reads. The trade-off relative to pyrosequencing is higher throughput in exchange for shorter reads. The raw accuracy of the Illumina Genome Analyzer is over 98.5%. Increased coverage is necessary when using sequencers with high raw error rates [10].
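The throughput and accuracy figures quoted for the two platforms can be compared with some quick arithmetic. The sketch below uses only the numbers given in these notes (600 Mb per day at 36 bp and 98.5% for the Genome Analyzer; 100 Mb per 8-hour run at 250 bp and 99.5% for the 454 FLX). The 500 Mb genome size is a hypothetical placeholder, and treating the raw accuracy as uniform along the read is a simplifying assumption.

```python
def reads_per_run(total_bases, read_length):
    """Approximate number of reads needed to produce total_bases."""
    return total_bases // read_length


def coverage(total_bases, genome_size):
    """Average number of times each genome position is sequenced."""
    return total_bases / genome_size


def expected_errors_per_read(read_length, raw_accuracy):
    """Expected miscalled bases per read, assuming a uniform error rate."""
    return read_length * (1.0 - raw_accuracy)


ILLUMINA_BASES = 600_000_000   # 600 Mb per day (from the text)
ROCHE_454_BASES = 100_000_000  # 100 Mb per 8-hour run (from the text)
GENOME_SIZE = 500_000_000      # hypothetical genome size, for illustration only

print(reads_per_run(ILLUMINA_BASES, 36))    # 16666666 reads
print(reads_per_run(ROCHE_454_BASES, 250))  # 400000 reads
print(f"{coverage(ILLUMINA_BASES, GENOME_SIZE):.2f}x coverage per day")
print(f"{expected_errors_per_read(36, 0.985):.2f} errors per 36 bp read")
print(f"{expected_errors_per_read(250, 0.995):.2f} errors per 250 bp read")
```

Coverage here is simply total sequenced bases divided by genome size, so a higher raw error rate has to be offset by sequencing each position more times.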

Illumina4.jpg Genome analyzer iix.jpg


(image from [11]) (image from [12])

Read Length

The issue with short reads (20-40 nt fragments sequenced at a time) is in assembly. We must use algorithms to find overlapping sections of fragments, then piece these fragments together. The genome contains many repetitive regions. With only 20-40 nt fragments, we may have a hard time finding overlapping regions and determining the correct linear chromosomal location of repetitive fragments. While the error rate of sequencing is only 2 percent for the first 30 nucleotides at the head of reads using Illumina technology, the error rate quickly increases to 20 percent at the tails of reads at 50 nucleotides [13]. The high error rate results from DNA polymerase incorporating wrong bases (all four are present at a time) without the error-fixing machinery found in a normal cell. Long reads are difficult because DNA molecules in the clusters can get out of sync [14]. Paired-end reads from circularized, adapter-tagged kilobase-sized fragments can be used to link repetitive segments to their general location. The 454 Titanium FLX instrument can perform extra-long reads of 400-600 bp to eliminate the need for paired ends [15]. Sequencing with short reads is generally cheaper and offers higher throughput, but problems arise in de novo fragment assembly. Assembly from short read data is usually accomplished with the help of a reference genome [16], [17]. The top two sets of de novo assembly algorithms are Beijing Genomics Institute's SOAPdenovo and the Broad Institute's ALLPATHS-LG [18].
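The overlap problem can be illustrated with a small sketch. The function below finds the longest suffix of one read that matches a prefix of another, which is the basic operation behind overlap-based assembly. The reads, the repeat "GCGTAC", and the minimum overlap of 4 are invented for the example, and this is not how any particular assembler named above is implemented.

```python
def overlap(a, b, min_length=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_length exists.
    """
    start = 0
    while True:
        # Look for the first min_length bases of b somewhere in a.
        start = a.find(b[:min_length], start)
        if start == -1:
            return 0
        # If the rest of a from this point is a prefix of b, we have an overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


# Hypothetical reads: the repeat GCGTAC occurs at two places in the genome,
# followed by different sequence each time.
read_1 = "ATGCGTAC"   # ends inside the repeat
read_2 = "GCGTACGTT"  # repeat copy 1 and what follows it
read_3 = "GCGTACGGA"  # repeat copy 2 and what follows it

print(overlap(read_1, read_2, min_length=4))  # 6
print(overlap(read_1, read_3, min_length=4))  # 6
```

Both candidate extensions of read_1 overlap it equally well, so the overlap alone cannot tell us which one follows it on the chromosome. Longer reads or paired ends that span the repeat are what break this kind of tie.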

The good people from NCSU and the David H. Murdock Institute probably used the 454 FLX system to sequence the Vaccinium corymbosum genome. Longer reads produced by this system facilitate de novo assembly. They then probably used an Illumina system to resequence and examine the quality of the assembly. This resequencing also increases coverage. The authors of "The genome of woodland strawberry (Fragaria vesca)" in Nature Genetics used a combination of Roche/454, Illumina/Solexa, and Life Technologies/SOLiD to sequence and resequence [19]. Curiously, Illumina advocates the use of the Genome Analyzer for de novo sequencing. Illumina points to reads of over 100 bp achieved by some researchers. I question the accuracy of these reads. The company promotes the use of Velvet, a de Bruijn graph-based assembly program [20].
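For readers unfamiliar with de Bruijn graphs, here is a minimal sketch of the underlying idea: each read is chopped into k-mers, and every k-mer becomes an edge from its first k-1 bases to its last k-1 bases. This is only the textbook construction under toy assumptions (no sequencing errors, no reverse complements, tiny k). It is not Velvet's implementation, and the function name and example reads are invented.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix node to its suffix node.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Two hypothetical overlapping reads; the shared k-mer GTC joins them.
graph = de_bruijn_graph(["ACGTC", "GTCA"], k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
# AC -> ['CG']
# CG -> ['GT']
# GT -> ['TC']
# TC -> ['CA']
```

Walking the edges AC, CG, GT, TC, CA spells out ACGTCA, the sequence both reads came from. A node with more than one outgoing edge marks a point where the reads disagree about what comes next, which is how repeats and sequencing errors show up in the graph.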

Chart.jpg


(image from [21])

Paireden.gif


(image from [22])

On another note, I love the blog Genomes Unzipped. Check out Mr. Vorhaus. He works on Tryon in Charlotte and is involved in the 1000 Genomes Project. He would be a good connection to have if you are interested in genomics as a career, and perhaps a good guest speaker, although he is not a scientist.