JP Feb 23 16
Heyer's most recent code: http://www.bio.davidson.edu/courses/Bio343/2016/goi_supervised.txt
Set up a way to filter out genes whose data contain a lot of 0s. Can supervise clustering when the data set is small enough (11,000 is too big for R to graph). Can search for a gene of interest (goi) listed in the "toSearch" list. Can set a threshold for correlation to the gene of interest.
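A rough sketch of the filter-then-correlate idea above, in Python rather than R. This is not Heyer's code. The function names, the toy expression values, the 0.5 zero-fraction cutoff, and the 0.9 correlation threshold are all assumptions for illustration. It also assumes the threshold applies to the Pearson correlation itself (correlation distance would be 1 - r).

```python
from math import sqrt

def zero_fraction(values):
    """Fraction of samples in which the gene's value is exactly 0."""
    return sum(1 for v in values if v == 0) / len(values)

def pearson(x, y):
    """Pearson correlation of two equal-length sequences; None if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return None
    return sum(a * b for a, b in zip(dx, dy)) / denom

def correlated_with_goi(expr, goi, max_zero_frac=0.5, min_corr=0.9):
    """Drop genes with too many 0s, then keep genes whose correlation
    to the gene of interest is at least min_corr (both cutoffs are assumed)."""
    kept = {g: v for g, v in expr.items() if zero_fraction(v) <= max_zero_frac}
    if goi not in kept:
        raise KeyError(f"{goi} is missing or was removed by the zero filter")
    hits = {}
    for gene, values in kept.items():
        if gene == goi:
            continue
        r = pearson(kept[goi], values)
        if r is not None and r >= min_corr:
            hits[gene] = r
    return hits

# Toy data (hypothetical): gene name -> expression across four samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],   # gene of interest
    "geneB": [2.0, 4.1, 6.0, 8.2],   # tracks geneA closely
    "geneC": [4.0, 3.0, 2.0, 1.0],   # anti-correlated with geneA
    "geneD": [0.0, 0.0, 0.0, 5.0],   # mostly 0s, removed by the filter
}
hits = correlated_with_goi(expr, "geneA")
print(sorted(hits))
```

With the toy data only geneB passes both filters. Anti-correlated genes like geneC are excluded here; thresholding on the absolute value of r instead would keep them.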
Between correlation distance and supervised clustering, we can locate genes of interest. Still need to be able to sort by Gene Ontology term. (That's what we're doing).
- Find genes that are differentially regulated between fed and nonfed and that are differentially expressed at the beginning of the cascade.
- May have to chase individual genes to find a good candidate gene.
Graphics as figures, methods section. Each group writes their own paper. Need a new game plan moving forward.
GO TERMS:
Used cloud blast on the 3596 proteins of unknown function.
"Save Results" should point to a folder, because it produces many zip files when it packages the sequences for blasting.
Cloud blast packages the sequences in groups of ~50. Used the 10 results setting. Used the protein database as the search target.
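To get a feel for how many packages (and zip files) to expect, here is a minimal Python sketch of batching in groups of 50. How cloud blast actually splits the sequences is internal to the tool, and "~50" is approximate, so the count is only an estimate. The `batches` helper and the `protein_NNNN` IDs are made up for illustration.

```python
def batches(items, size=50):
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Placeholder IDs standing in for the 3596 proteins of unknown function.
sequence_ids = [f"protein_{i:04d}" for i in range(1, 3597)]
packages = list(batches(sequence_ids))
print(len(packages), len(packages[-1]))  # 72 46
```

At exactly 50 per package, 3596 sequences give 72 packages, with 46 in the last one.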
Went very fast, about 2 minutes. Then mapped and annotated.
254 / 3596 (about 7%) now have annotations. There's an error in extracting just these genes. 187 are fully annotated.