Notes 2/25/16
Last class we realized that the fed and not fed samples were not clustering correctly.
Could possibly get them to cluster by filtering out extremely high values or by changing the correlation used for FPKM. However, Dr. Heyer pointed out that we already found genes that were differentially expressed between the two groups, so clustering doesn't matter too much. It will still be important to try to fix, however, for the pictures that we will use.
Today we will continue to look for genes. Look mainly at genes that are transcription factors or kinases, things that could be in charge of amplifying the cycle to cause cell growth (list of these genes is on the Google Doc).
Because Dr. Heyer said that the genes were already separated between the two groups, I decided to write deOut to a file to look at the data, hoping I would be able to clearly see that fed and not fed were different from each other. I was not able to understand exactly what the output was saying. Saw some differential expression, but the numbers were pretty close to each other. Assume the program is working like it is supposed to.
Dr. Campbell told us to remove mean expression values that were greater than 8000. This allows us to correlate genes better. However, still not able to cluster fed and not fed with each other (example below using the Contig8459_SVS1_Protein_SVS1_Saccharomyces_cerevisiae_strain_ATCC_204508_/_S288c gene).
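Side note on the filtering step itself: a minimal Python sketch of dropping genes by mean expression, just to make the idea concrete. This is not the code we ran in class. The table `fpkm`, the gene names, and the function `filter_by_mean` are all made up for illustration; only the 8000 cutoff comes from the notes.

```python
# Hypothetical FPKM table: gene name -> expression values across samples.
# Names and numbers are invented for illustration only.
fpkm = {
    "geneA": [12.0, 15.0, 9.0, 11.0],
    "geneB": [9500.0, 8700.0, 9100.0, 8800.0],
    "geneC": [300.0, 280.0, 7900.0, 8100.0],
}


def filter_by_mean(table, cutoff=8000.0):
    """Keep only genes whose mean expression is at or below the cutoff."""
    return {
        gene: values
        for gene, values in table.items()
        if sum(values) / len(values) <= cutoff
    }


filtered = filter_by_mean(fpkm)
print(sorted(filtered))
```

Note that geneC stays in even though two of its samples are near 8000, because the filter is on the mean across samples and not on individual values.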
New Direction
Decided to use correlation clustering to find a list of genes and to try to get the clusters to come out correctly. Decided to filter out values at a cutoff of 5000, then used correlation clustering. This finds genes that are differentially expressed between the two groups, then finds genes that are correlated with each other. Saved the list of genes that came from the correlation step into a file. Will look through this list for any genes that really stand out as great candidate genes. Will use these later to do supervised clustering. (Heat map using correlation clustering below)
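To remind myself what "correlated to each other" means here, a rough Python sketch of the idea: take one gene's expression profile and list the other genes whose Pearson correlation with it is high. This is an assumption about how the step works in general, not the actual program we used. The profiles, the gene names, the function names `pearson` and `correlated_with`, and the 0.9 threshold are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile has no defined correlation; treat as 0
    return cov / (sd_x * sd_y)


# Invented profiles: first three values "fed", last three "not fed".
profiles = {
    "gene1": [10, 12, 11, 40, 42, 41],
    "gene2": [20, 23, 21, 80, 85, 82],
    "gene3": [50, 48, 51, 12, 10, 11],
    "gene4": [30, 31, 29, 30, 32, 28],
}


def correlated_with(seed, table, min_r=0.9):
    """Return genes whose profile correlates with the seed gene at or above min_r."""
    return sorted(
        gene
        for gene, values in table.items()
        if gene != seed and pearson(table[seed], values) >= min_r
    )


print(correlated_with("gene1", profiles))
```

In this toy data gene2 follows the same fed versus not fed pattern as gene1, gene3 goes the opposite way, and gene4 does not change between groups, so only gene2 is picked up.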