JP Feb 23 16
Heyer's code, most recent version: http://www.bio.davidson.edu/courses/Bio343/2016/goi_supervised.txt
Set up a way to filter out genes whose data have a lot of 0s. Can supervise clustering when the data set is small enough (11,000 is too big for R to graph). Can find a gene of interest (goi) listed in the "toSearch" list. Can set a threshold for correlation to the gene of interest.
Between correlation distance and supervised clustering, we can locate genes of interest. Still need to be able to sort by Gene Ontology term. (That's what we're doing).
- Find genes differentially regulated between fed and nonfed, and that are differentially expressed at the beginning of the cascade.
- May have to chase individual genes to find a good candidate gene.
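The zero filter and the correlation threshold to a gene of interest noted above could be sketched roughly as follows. This is not Heyer's R code; it is a minimal Python illustration, and the function names, the 0.5 zero-fraction cutoff, the 0.9 correlation cutoff, and the toy expression values are all assumptions made up for the example.

```python
import math

def zero_fraction(values):
    """Fraction of samples in which the gene has a value of exactly 0."""
    return sum(1 for v in values if v == 0) / len(values)

def pearson(x, y):
    """Pearson correlation of two equal-length lists; None if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)

def genes_near_goi(expression, goi, max_zero_fraction=0.5, min_correlation=0.9):
    """Return {gene: r} for genes that pass the zero filter and correlate with goi."""
    # Step 1: drop genes whose data are mostly 0s (cutoff is an assumed value).
    kept = {gene: values for gene, values in expression.items()
            if zero_fraction(values) <= max_zero_fraction}
    if goi not in kept:
        raise KeyError(f"{goi} is missing or was removed by the zero filter")
    # Step 2: keep genes whose correlation to the goi meets the threshold.
    hits = {}
    for gene, values in kept.items():
        if gene == goi:
            continue
        r = pearson(kept[goi], values)
        if r is not None and r >= min_correlation:
            hits[gene] = r
    return hits

# Toy data, invented for illustration only (not from the class data set).
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],   # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],   # falls as geneA rises
    "geneD": [0.0, 0.0, 0.0, 5.0],   # mostly 0s, removed by the filter
}
hits = genes_near_goi(toy, "geneA")
print(sorted(hits))
```

Correlation distance is commonly taken as 1 - r, so a minimum correlation of 0.9 corresponds to a maximum distance of 0.1. Filtering before correlating also shrinks the set that has to be clustered and graphed.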
Graphics as figures, methods section. Each group writes their own paper. Need a new game plan moving forward.
GO TERMS: CLOUD BLAST
Used cloud blast on the 3596 proteins of unknown function.
"Save Results" should point to a folder, because it produces many zip files when it packages the sequences for blasting.
Cloud blast packages the sequences in groups of ~50. Used the 10 results setting.
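As a rough check on how many packages to expect, the grouping can be mimicked in a few lines of Python. This is only a sketch: it assumes groups of exactly 50 (the notes say ~50), the `batches` helper and the placeholder IDs are hypothetical, and it says nothing about how cloud blast itself builds its zip files.

```python
def batches(items, size=50):
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Placeholder IDs standing in for the 3596 proteins of unknown function.
protein_ids = [f"protein_{i}" for i in range(1, 3597)]
groups = list(batches(protein_ids, size=50))

# Number of groups, and the size of the final (partial) group.
print(len(groups), len(groups[-1]))
```

Under that assumption, 3596 sequences split into 72 groups, the last one holding 46 sequences.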
Searched against the protein database (PDB).
Went very fast, about 2 minutes. Then mapped and annotated.
254 / 3596 now have annotations. There's an error in exporting just these genes to a new file. 187 are fully annotated (Blue).
Cloud blast against the reference database (REF) took much longer, ~25 minutes.
We found 1078 / 3596 with this method. 356 are fully annotated. 488 green and blue.
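To compare the two runs on the same scale, the counts above can be turned into percentages of the 3596 proteins. The sketch below uses only the numbers recorded in these notes; it assumes the 1078 "found" with REF is comparable to the 254 "with annotations" from PDB, and the dictionary keys are labels chosen for the example.

```python
TOTAL = 3596  # proteins of unknown function submitted to cloud blast

# Counts copied from the notes; "found" is an assumed common label for 254 and 1078.
runs = {
    "PDB": {"found": 254, "fully_annotated": 187},
    "REF": {"found": 1078, "fully_annotated": 356},
}

def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)

summary = {
    name: {label: percent(count, TOTAL) for label, count in counts.items()}
    for name, counts in runs.items()
}

for name, values in summary.items():
    print(name, values)
```

On these counts the REF search found roughly four times as many proteins as the PDB search, at the cost of the longer run time.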
We took both fully annotated lists (GAF Files) and gave them to Dylan's group.