JP Jan 21 16

From GcatWiki
Revision as of 19:04, 21 January 2016 by Jupreziosi

Looking at reports downloaded 1-19-16

File names: split_1no_i.fastq, etc. no = not fed; i = intestine.

Left: green check = good; orange ! = suspect; red X = something wrong.


Per base sequence quality: 40 = top score for each base (about a 1 in 10,000 chance the call is wrong). Unsure bases get lower scores. >= 20 (at most a 1 in 100 chance of error) is good.
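To make the score scale concrete, here is a minimal sketch of the standard Phred relationship, Q = -10 * log10(p), where p is the probability that a base call is wrong. It assumes the FASTQ files use the common Phred+33 encoding (quality character = score + 33); the function names are made up for illustration.

```python
import math

def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)

def decode_quality(qual, offset=33):
    """Turn a FASTQ quality string into a list of Phred scores.

    Assumption: Phred+33 encoding; older files may use an offset of 64.
    """
    return [ord(ch) - offset for ch in qual]

if __name__ == "__main__":
    for q in (15, 20, 30, 40):
        print(q, phred_to_error_prob(q))
```

Under that encoding, the quality character "I" decodes to 40 and "5" decodes to 20.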


Per tile sequence quality: all the cDNA was sequenced at once. Tiles are the positions on the chip where sequences were read.


Per sequence quality scores: a few reads score below 30, but most sequences had quality around 38.


Per base sequence content: the first few bases are the bar codes. Each set of sequences has its own bar code (ex. 1_i = AGG, 2_i = CGG). The bar codes aren't part of the RNA sequence; they need to be removed before any analysis. Trim off the first 4 bases; if any bases are scored below 15, they will be thrown away; sequences with fewer than 30 bp left are thrown out of the analysis.
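The trimming rule above could be sketched roughly as follows. This is not the actual trimming tool or its settings, just an illustration of the three thresholds in the notes (4, 15, 30). Two assumptions that are not in the notes: quality strings are Phred+33 encoded, and low-scoring bases are removed from the 3' end only. The function name trim_read is made up.

```python
def trim_read(seq, qual, barcode_len=4, min_q=15, min_len=30, offset=33):
    """Trim one read; return (seq, qual), or None if too little is left.

    Assumptions (not from the notes): Phred+33 quality encoding, and
    bases scoring below min_q are removed from the 3' end only.
    """
    # Drop the bar code bases at the start of the read.
    seq, qual = seq[barcode_len:], qual[barcode_len:]

    # Walk back from the 3' end while the last base scores below min_q.
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    seq, qual = seq[:end], qual[:end]

    # Reads with fewer than min_len bases left are thrown out.
    if len(seq) < min_len:
        return None
    return seq, qual

if __name__ == "__main__":
    read = "AGGT" + "ACGT" * 10      # 4 bar code bases + 40 bases
    quals = "I" * 34 + "#" * 10      # last 10 bases score 2
    print(trim_read(read, quals))
```

A real trimmer may also use sliding windows or trim the 5' end, so results from this sketch will not match one exactly.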


Per sequence GC content: the intestine samples all closely match the theoretical distribution. The liver samples have multiple peaks in the distribution. Maybe this is biological and not a data error.
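For checking the liver peaks by hand, a small sketch of per-read GC content: the percentage of called bases (A, C, G, T) that are G or C, binned into a histogram. The function names are made up, and ignoring N calls in the denominator is a choice made here, not something taken from the report.

```python
from collections import Counter

def gc_percent(seq):
    """Percent of called bases (A, C, G, T) in seq that are G or C."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0  # read with no called bases; treated as 0% here
    return 100.0 * (seq.count("G") + seq.count("C")) / called

def gc_histogram(reads):
    """Count reads per rounded GC percentage (0 to 100)."""
    return Counter(round(gc_percent(seq)) for seq in reads)

if __name__ == "__main__":
    reads = ["ATGC", "GGCC", "ATAT", "GGCA"]
    print(sorted(gc_histogram(reads).items()))
```

Plotting this histogram for a liver sample and an intestine sample side by side would show whether the extra peaks come from a distinct subset of reads.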


Per base N content: almost 0 N calls; the program was able to determine nearly all bases.


Sequence Length Distribution: about 76 bp.


Sequence Duplication Levels: unclear how to interpret this. Deduplicated sequence? Almost all the reads are single copy (1 - 85). Might change when we delete the bar codes.
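One way to read a duplication plot is to count how many distinct sequences occur once, twice, and so on. The sketch below does that with exact string matching. It is a simplified illustration and probably not how the report tool computes its numbers; the function name is made up.

```python
from collections import Counter

def duplication_levels(reads):
    """Map copy number -> number of distinct sequences with that many copies."""
    copies = Counter(reads)            # sequence -> number of copies
    levels = Counter(copies.values())  # copy number -> distinct sequences
    return dict(levels)

def percent_single_copy(reads):
    """Percent of distinct sequences that appear exactly once."""
    levels = duplication_levels(reads)
    distinct = sum(levels.values())
    return 100.0 * levels.get(1, 0) / distinct if distinct else 0.0

if __name__ == "__main__":
    reads = ["AAA", "AAA", "CCC", "GGG"]
    print(duplication_levels(reads))
    print(percent_single_copy(reads))
```

Running this before and after removing the bar codes would show whether the duplication levels actually change.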


Overrepresented sequences: some samples have more than others.

  • We want to BLAST them ourselves.


Kmer Content: repeated k-mers at very early positions; these could go to zero once the bar codes are removed.
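To check whether the early k-mer signal is just the bar codes, one can count where each k-mer starts within the reads. This is a minimal sketch with made-up function names; it only tallies positions and does not run the statistical test a report tool would apply.

```python
from collections import Counter, defaultdict

def kmer_positions(reads, k=3):
    """Map each k-mer to a Counter of the start positions where it occurs."""
    counts = defaultdict(Counter)
    for seq in reads:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]][i] += 1
    return counts

def top_kmers_at(reads, position=0, k=3, n=5):
    """Most common k-mers starting at a given position."""
    at_pos = Counter()
    for kmer, positions in kmer_positions(reads, k).items():
        if positions[position]:
            at_pos[kmer] = positions[position]
    return at_pos.most_common(n)

if __name__ == "__main__":
    reads = ["AGGTTTACG", "AGGCCCTGA", "AGGACGTTT"]
    print(top_kmers_at(reads, position=0, k=3))
```

If a sample's bar code (for example AGG) dominates position 0 and disappears after trimming, that supports the bar code explanation.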