February 23, 2016
I think we are doing RStudio. Check Katherine's 2/16 notes. Get everything up to date by Spring Break.
Use "expected_count" in the code, NOT "FPKM"; otherwise we would be re-normalizing data that has already been normalized.
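A minimal sketch of building the count table from the expected_count column only, assuming each sample's results were read into a data frame with gene_id, expected_count and FPKM columns (RSEM-style output). buildCountMatrix and the toy values are hypothetical, not our real files.

```r
# Hypothetical helper: genes x samples matrix built from "expected_count" only.
# The already-normalized "FPKM" column is ignored on purpose.
buildCountMatrix <- function(sampleTables) {
  # Use the first sample's gene IDs as the common row order
  geneIds <- as.character(sampleTables[[1]]$gene_id)
  counts <- sapply(sampleTables, function(tbl) {
    # match() lines every sample up to the same gene order
    tbl$expected_count[match(geneIds, as.character(tbl$gene_id))]
  })
  rownames(counts) <- geneIds
  counts
}

# Toy input: two samples whose genes are listed in different orders
fed1 <- data.frame(gene_id = c("g1", "g2"), expected_count = c(10, 250), FPKM = c(1.2, 30.5))
fed2 <- data.frame(gene_id = c("g2", "g1"), expected_count = c(300, 12), FPKM = c(33.0, 1.4))
myCountData <- buildCountMatrix(list(fed1 = fed1, fed2 = fed2))
print(myCountData)
```

Matching on gene_id rather than trusting row order guards against samples whose files list genes in a different order.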
Strategic clustering:
```r
# Filter to keep only those genes whose mean expression is > 10.
# Play with this value to see how the size of toSearch changes; nothing is special about the number 10.
myMeans <- apply(as.matrix(myCountData), 1, mean)
toSearch <- myCountData[myMeans > 10 & !is.na(myMeans), ]
```
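To see how the size of toSearch changes with the cutoff, a quick sweep like this could help. It is a sketch on made-up toy counts (not our snake data), and the cutoffs are arbitrary values to try.

```r
# Toy stand-in for myCountData (genes x samples); gene6 has a missing value
myCountData <- data.frame(
  fed1   = c(0, 5, 12, 40, 900, NA),
  fed2   = c(1, 7, 15, 35, 850, 3),
  unfed1 = c(0, 4,  9, 60, 700, 2),
  row.names = paste0("gene", 1:6)
)

# Per-gene mean expression (NA if any sample is missing)
myMeans <- apply(as.matrix(myCountData), 1, mean)

# Number of genes that would land in toSearch at each cutoff
cutoffs <- c(1, 10, 50, 1000)
genesKept <- sapply(cutoffs, function(cutoff) {
  sum(myMeans > cutoff & !is.na(myMeans))
})
names(genesKept) <- cutoffs
print(genesKept)
# expected: 4 genes at 1, 3 at 10, 1 at 50, 0 at 1000
```

The !is.na(myMeans) term matters: without it, a gene with a missing value would produce an all-NA row in toSearch instead of being dropped.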
Using everything we know so far, we can differentiate between fed and non-fed snakes. We need help with the Gene Ontology search: search by function instead of just for transcription factors.
Keep in mind the scientific goal: find genes that are differentially expressed between fed and non-fed snakes, and try to find candidate genes that act at the beginning of the cascade.
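One simple way to start ranking genes by fed vs. non-fed difference is a log2 fold change of the group means. This is only a sketch: the function name, the column names and the pseudocount of 1 are assumptions, and a fold change is a ranking aid, not a statistical test of differential expression.

```r
# Hypothetical helper: log2 ratio of mean fed to mean non-fed counts per gene.
# The pseudocount avoids dividing by zero for genes with no reads.
log2FoldChange <- function(counts, fedCols, unfedCols, pseudo = 1) {
  counts <- as.matrix(counts)
  fedMean <- rowMeans(counts[, fedCols, drop = FALSE])
  unfedMean <- rowMeans(counts[, unfedCols, drop = FALSE])
  log2((fedMean + pseudo) / (unfedMean + pseudo))
}

# Toy counts: geneA is higher in fed, geneB higher in non-fed, geneC unchanged
toy <- rbind(
  geneA = c(398, 400, 98, 100),
  geneB = c(49, 49, 199, 199),
  geneC = c(20, 20, 20, 20)
)
colnames(toy) <- c("fed1", "fed2", "unfed1", "unfed2")

lfc <- log2FoldChange(toy, c("fed1", "fed2"), c("unfed1", "unfed2"))
print(sort(lfc, decreasing = TRUE))
# geneA 2, geneC 0, geneB -2
```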
We can coordinate with the other organ groups and see whether we all converge on the same gene from different approaches.
See where our work takes us, and then hunt down/research the genes we find.
Big thanks to Elise and Kathryn for helping me conquer RStudio and for their patience.
Look at "toSearch" to find interesting genes, and make new clusters by plugging those genes into the command line.
GENES I TRIED TODAY USING SUPERVISED CLUSTERING:
- 394 Contig110_GNA13_Guanine_nucleotide-binding_protein_subunit_alpha-13_Homo_sapiens_2
- Contig77_FOXO1_Forkhead_box_protein_O1_Homo_sapiens (Forkhead box proteins are a family of transcription factors.)
What about a correlation cluster (Feb 18 syllabus)? Correlation: just give me what is similar to this gene in fed vs. non-fed.
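A correlation search of the "give me what is similar to this gene" kind could look roughly like this: rank every gene by how closely its expression profile across samples tracks one gene of interest. This is a sketch under assumptions: findCorrelatedGenes is a hypothetical helper, Pearson correlation on log2(count + 1) is my choice (not necessarily what the Feb 18 syllabus uses), and the toy genes stand in for row names such as our contig IDs.

```r
# Hypothetical helper: genes whose profiles correlate best with one target gene
findCorrelatedGenes <- function(counts, geneOfInterest, n = 10) {
  logCounts <- log2(as.matrix(counts) + 1)
  target <- logCounts[geneOfInterest, ]
  # cor() of a samples x genes matrix with a vector gives one r per gene
  r <- cor(t(logCounts), target)[, 1]
  r <- r[names(r) != geneOfInterest]
  # sort() drops NA correlations (e.g. genes with constant expression)
  head(sort(r, decreasing = TRUE), n)
}

# Toy counts: geneB tracks geneA, geneC is its mirror image, geneD is flat
toy <- rbind(
  geneA = c(10, 20, 200, 400),
  geneB = c(5, 11, 90, 210),
  geneC = c(400, 200, 20, 10),
  geneD = c(50, 52, 49, 51)
)
colnames(toy) <- c("fed1", "fed2", "unfed1", "unfed2")

result <- findCorrelatedGenes(toy, "geneA", n = 3)
print(round(result, 3))
```

Strongly negative correlations (like geneC) can be just as interesting here, since they flag genes that move opposite to the gene of interest between fed and non-fed.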
It seems like our clustering is sensitive to the number of reads. Snake 4 has substantially fewer reads than the other snakes, and it has appeared as an outlier in every clustering we ran. The heat maps didn't show anything notable, so I did not include them.
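To check whether read depth alone could be pushing snake 4 out, one option is to compare library sizes and rescale to counts per million (CPM) before clustering. This is a sketch under assumptions: the toy counts are made up, and CPM is just the simplest depth correction, not necessarily what the class pipeline uses. Because it only rescales expected counts, it does not stack on top of FPKM normalization.

```r
# Toy counts: snake4 has the same relative expression as snake1
# but was sequenced at a tenth of the depth
toy <- cbind(
  snake1 = c(100, 300, 600),
  snake2 = c(120, 280, 600),
  snake3 = c(90, 310, 600),
  snake4 = c(10, 30, 60)
)
rownames(toy) <- c("geneA", "geneB", "geneC")

# Total reads per snake (library size)
libSizes <- colSums(toy)
print(libSizes)

# Counts per million: divide each column by its library size
cpm <- sweep(toy, 2, libSizes, "/") * 1e6

# Snake-to-snake distances on log scale, before and after scaling
rawDist <- as.matrix(dist(t(log2(toy + 1))))
cpmDist <- as.matrix(dist(t(log2(cpm + 1))))
print(round(rawDist["snake4", ], 2))
print(round(cpmDist["snake4", ], 2))
```

In the toy data snake 4 sits far from the others on raw counts but lands on top of snake 1 after scaling, which is the pattern we would expect if its outlier status were a depth artifact.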
I think we need to set a stricter threshold?