DM Notes 2.02.16

From GcatWiki

Last class:

Intestine group: found an ortholog from another animal. Found an uptake. 80-90% of the sequences were mitochondrial. The effort to clean up with poly(A) purification didn't remove everything.

Liver group: found anti-hemorrhagic factor. It was also overexpressed in one of the unfed snakes. Lots of mitochondrial DNA.

Online resources group: looked at the data that Todd Castoe had deposited. They figured out how to download one run, but not everything. They also figured out how to search for a sequence (query) within the sequences they had. There is no magic list of the ~2000 genes that are differentially expressed. They have been requested to make a link to...

In class:

Looking at the geneResult folder in the Bio343 folder, opened in Excel. 25000 rows (one row per gene). The question is whether there are any transcripts for each gene. Need to normalize to how many reads we have in each sample and to how long each gene is. Why: if a gene is long and it's randomly fragmented, it yields more fragments, so that gene will be overrepresented. So we're looking at the FPKM (fragments per kilobase of transcript per million mapped reads). Ran DESeq to get those numbers. No difference between the gene_id and transcript_id(s) columns. DESeq is the software package we used to normalize expression across all 12 samples, and it gave us values normalized for gene length, per million reads. By downloading the DESeq data, we don't have to go back to the mainframe.
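A small worked example of the length normalization described above. This is only a sketch of the standard FPKM arithmetic, not the code used in class; the function name `fpkm` and all of the numbers are made up for illustration.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    kilobases = gene_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)


# Hypothetical sample with 20 million mapped fragments.
total = 20_000_000

# The long gene is 5x longer and collects 5x more fragments,
# so its raw count looks 5x higher even at the same expression level.
short_gene = fpkm(fragments=500, gene_length_bp=1_000, total_mapped_fragments=total)
long_gene = fpkm(fragments=2_500, gene_length_bp=5_000, total_mapped_fragments=total)

print(short_gene, long_gene)  # both come out to 25.0
```

After normalization the two genes get the same value, which is the point of dividing by gene length as well as by sequencing depth.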

Python_analysis_LH_MC.R is the R code we'll use today. Follow the instructions in the Tues_Feb_2 file.

Lines 2-5: installing packages, necessary setup. Only have to do this portion once. Output: The downloaded source packages are in ‘/private/var/folders/2l/0r9ptyrd53dc2vd9_m854jvh_c316v/T/RtmpPUEgdO/downloaded_packages’

Line 13: change macampbell to dymaghini.

Line 17: headcount: for each gene name, gives the normalized number.

Look at the heat map: fed and unfed samples are not necessarily clustered together. Clustering was done based on every single gene, which may not be the best way to cluster the samples.

Challenge time: for the intestine, copy and paste any commands we need to line 55, then generate the next graph (compare fed to not fed). Line 14: have to change "geneResult" to "geneResultIntestine" (new folder) so that it's counting six files rather than all 12. Saving all of the output to deOut_intestine_fedvsintestnine_no.Rdata.

Some other notes: change myQvals to pull from pval, not padj. Change the file you're saving to, and change that name in the middle of the myQvals variable assignment. In the heat map command, change each=6 to each=3 (each category has 3 samples).
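Why it matters which column myQvals pulls from: a rough sketch, assuming padj is the Benjamini-Hochberg adjusted p-value (the DESeq default). The function `benjamini_hochberg` and the five p-values are hypothetical, written only to show that filtering on raw p-values passes more genes than filtering on adjusted ones.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes.
pvals = [0.001, 0.008, 0.02, 0.045, 0.5]
padj = benjamini_hochberg(pvals)

sig_by_pval = sum(p < 0.05 for p in pvals)  # genes passing on raw p-value
sig_by_padj = sum(q < 0.05 for q in padj)   # genes passing after adjustment

print(sig_by_pval, sig_by_padj)
```

With these made-up numbers, four genes pass the 0.05 cutoff on the raw p-value but only three pass after adjustment, so the choice of column changes which genes end up in the plot.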