February 23, 2016
Classwork
Coding Notes:
- Use "expected_count" in coding NOT "FPKM" or else we are re-normalizing previously normalized data.
- We have been filtering to keep only those genes whose mean expression is >10; however, for strategic clustering, play with this value to see how the size of toSearch changes, because nothing is special about the number 10.
myMeans <- apply(as.matrix(myCountData), 1, mean)
toSearch <- myCountData[myMeans > 10 & !is.na(myMeans), ]
- Look at "toSearch" to find interesting genes and make new clusters by plugging those genes into the command line.
- Big thanks to Elise and Kathryn for helping me conquer Rstudio and for their patience!
CHECK KATHRYN'S RSTUDIO NOTES.
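To see how much the cutoff matters, one option is to count how many genes survive the filter at several thresholds. This is only a sketch: the toy myCountData below is made-up data standing in for the real expected_count table, and the list of thresholds is an arbitrary assumption.

```r
# Toy stand-in for myCountData (hypothetical values): genes in rows, snakes in columns.
# Replace this with the real expected_count table.
set.seed(1)
geneLambda <- rep(c(2, 20, 200), length.out = 200)
myCountData <- as.data.frame(matrix(rpois(200 * 6, lambda = geneLambda), nrow = 200))

# Same per-gene mean used in the filter above.
myMeans <- apply(as.matrix(myCountData), 1, mean)

# Candidate cutoffs to compare (arbitrary choices, 10 is the one used so far).
thresholds <- c(1, 5, 10, 25, 50, 100)

# Number of genes that would end up in toSearch at each cutoff.
nKept <- sapply(thresholds, function(cutoff) {
  sum(myMeans > cutoff & !is.na(myMeans))
})

print(data.frame(threshold = thresholds, genesKept = nKept))
```

If the number of genes kept drops sharply between two neighboring cutoffs, that range is probably worth a closer look before settling on a value.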
Gene Search:
We can coordinate with other organ groups and see if we all converge on the same gene from different approaches. Although we can differentiate between fed and non-fed with all of our existing knowledge, we would like help with the gene ontology search so that we can search for gene function instead of just a transcription factor.
Remember to keep in mind the scientific goal: find genes that are differentially expressed between fed and non-fed snakes and try to find candidates for genes that are at the beginning of the cascade.
From there, we will see where our work takes us and hunt down the genes that we find.
Supervised Clustering Attempts:
- 394 Contig110_GNA13_Guanine_nucleotide-binding_protein_subunit_alpha-13_Homo_sapiens_2
- Contig77_FOXO1_Forkhead_box_protein_O1_Homo_sapiens (Forkhead box proteins are a family of transcription factors)
After clustering with a few different genes, it seems as though our clustering is sensitive to the number of reads. Snake 4 has significantly fewer reads than the other snakes and has appeared as an outlier in each cluster.
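A quick way to check the read-depth suspicion is to total the counts in each sample and compare them. The sketch below uses made-up data, and the column names (snake1 to snake4) are hypothetical placeholders for whatever the real columns in myCountData are called.

```r
# Toy stand-in for myCountData (hypothetical values and column names).
set.seed(2)
myCountData <- as.data.frame(matrix(rpois(100 * 4, lambda = 50), nrow = 100))
colnames(myCountData) <- paste0("snake", 1:4)
myCountData$snake4 <- rpois(100, lambda = 5)  # mimic one low-depth sample

# Total reads per sample (column sums over all genes).
librarySizes <- colSums(myCountData, na.rm = TRUE)

# Depth of each sample relative to the median sample.
relativeDepth <- librarySizes / median(librarySizes)
print(round(relativeDepth, 2))

# Sample with the fewest reads.
print(names(which.min(librarySizes)))
```

A sample whose relative depth is far below 1 is a likely candidate for showing up as an outlier for reasons unrelated to feeding state.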
The heat maps generated from the above clustering were poorly constructed and insignificant; therefore, they are not included. (I SHOULD MAYBE STILL ADD ONE THOUGH)
Questions to Consider:
- Should we set a stricter threshold?
