JP Jan 21 16
Looking at reports downloaded 1-19-16
split_1no_i.fastq, etc. no = not fed; i = intestine
Left: green check = good; orange ! = suspect; red X = something wrong.
Per base sequence quality
40 = essentially perfect score for a base (about a 1 in 10,000 chance the call is wrong). Unsure bases get lower scores.
>= 20 is good.
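To make the score scale concrete: a Phred score Q corresponds to an error probability of 10^(-Q/10). The sketch below is only an illustration; the function names are made up here, and the ASCII offset of 33 is an assumption about how these FASTQ files encode quality.

```python
def phred_to_error_prob(q):
    """Probability that a base call is wrong, given its Phred score q."""
    return 10 ** (-q / 10)


def decode_quality(qual_line, offset=33):
    """Turn a FASTQ quality string into a list of Phred scores.

    offset=33 is an assumption (Sanger / Illumina 1.8+ encoding).
    """
    return [ord(ch) - offset for ch in qual_line]


if __name__ == "__main__":
    # Scores mentioned in these notes: 40 (top), 20 (good), 15 (trim cutoff)
    for q in (40, 20, 15):
        print(q, phred_to_error_prob(q))
```

So a score of 20 means roughly 1 wrong call in 100, and 40 means roughly 1 in 10,000.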
Per tile sequence quality
All cDNA was sequenced at once. Tiles are the positions on the chip where sequences were read.
Per sequence quality scores
A few reads are below 30, but most sequences had quality around 38.
Per base sequence content
First few bases are the bar codes.
Each set of sequences has its own bar code (ex 1_i = AGG, 2_i = CGG)
The bar codes aren't a part of the RNA sequence; they need to be removed from any analysis.
Trim off the first 4 bases; any bases scored below 15 will be thrown away; reads with fewer than 30 bp left are thrown out of the analysis.
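A rough sketch of that trimming rule for a single read. This is not the actual trimming program; trim_read and its parameters are hypothetical, and it assumes the low-scoring bases are removed from the 3' end of the read, which these notes do not spell out.

```python
def trim_read(seq, quals, barcode_len=4, min_qual=15, min_len=30):
    """Apply the trimming rule to one read.

    seq   : base string
    quals : list of Phred scores, one per base
    Returns (seq, quals) after trimming, or None if the read is dropped.
    """
    # Drop the leading bar code bases
    seq, quals = seq[barcode_len:], quals[barcode_len:]

    # Assumption: bases scored below min_qual are trimmed from the 3' end
    end = len(seq)
    while end > 0 and quals[end - 1] < min_qual:
        end -= 1
    seq, quals = seq[:end], quals[:end]

    # Reads with fewer than min_len bases left are thrown out
    if len(seq) < min_len:
        return None
    return seq, quals
```

For a 76 bp read, removing 4 bases leaves 72, so a read survives only if fewer than 43 bases are lost to the quality cutoff.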
Per sequence GC content
Intestine samples all closely match the theoretical distribution. Liver has multiple peaks in its distribution. Maybe this is biological and not a data error.
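To check the liver peaks by hand, the per-read GC percentage can be tallied directly. A minimal sketch, assuming reads are plain base strings; gc_percent and gc_histogram are names invented for this example, and N calls are left out of the denominator as a choice made here.

```python
from collections import Counter


def gc_percent(seq):
    """GC percentage of one read, ignoring any base that is not A, C, G or T."""
    called = [b for b in seq.upper() if b in "ACGT"]
    if not called:
        return 0.0
    return 100.0 * sum(b in "GC" for b in called) / len(called)


def gc_histogram(reads):
    """Count reads per rounded GC percentage (0 to 100)."""
    return Counter(round(gc_percent(r)) for r in reads)
```

More than one clear mode in the histogram for a liver sample would match the multiple peaks seen in the report.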
Per base N content
Almost 0 N; the program was able to determine nearly all bases.
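The same number can be recomputed per position as a sanity check. A small sketch with a hypothetical helper name, assuming reads are plain base strings that may differ in length:

```python
def n_percent_by_position(reads):
    """Percentage of N calls at each read position."""
    if not reads:
        return []
    length = max(len(r) for r in reads)
    totals = [0] * length    # reads covering each position
    n_counts = [0] * length  # N calls at each position
    for read in reads:
        for i, base in enumerate(read.upper()):
            totals[i] += 1
            if base == "N":
                n_counts[i] += 1
    return [100.0 * n / t for n, t in zip(n_counts, totals)]
```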
Sequence Length Distribution
About 76 bp
Sequence Duplication Levels
Unclear how to interpret this. Deduplicated sequence? Almost all the reads are single copy (1 - 85). Might change when we delete bar codes.
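One way to read this plot, if "deduplicated" means collapsing identical reads into one copy: count how many times each exact sequence occurs, then tally how many distinct sequences fall at each copy number. The sketch below is a simplified illustration, not the report's own calculation; the function names and the optional bar code skip are assumptions made here.

```python
from collections import Counter


def duplication_levels(reads, skip=0):
    """Map copy number -> number of distinct sequences seen that many times.

    skip: leading bases to ignore (e.g. 4 to leave out the bar code).
    """
    copies = Counter(r[skip:] for r in reads)
    return Counter(copies.values())


def percent_remaining_if_deduplicated(reads, skip=0):
    """Share of reads left if every duplicated sequence is kept only once."""
    if not reads:
        return 0.0
    distinct = len(set(r[skip:] for r in reads))
    return 100.0 * distinct / len(reads)
```

Running it with skip=0 and skip=4 on the same sample would show whether removing the bar codes changes the picture.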
Overrepresented sequences
Some samples have more than others.
- want to BLAST them ourselves
Kmer Content
Repeat units at very early positions; could go to zero when bar codes are eliminated.