Difference between revisions of "Notes 2/25/16"

From GcatWiki

Revision as of 19:56, 25 February 2016

Last class we realized that the fed and not fed groups were not clustering correctly.

We could possibly get them to cluster by filtering out extremely high values or by changing the correlation used for FPKM. However, Dr. Heyer pointed out that we already found genes that are differentially expressed between the two groups, so the clustering doesn't matter too much. It will still be important to try to fix it, however, for the pictures that we will use.

Today we will continue to look for genes, mainly genes that are transcription factors or kinases: things that could be in charge of amplifying the cycle to cause cell growth. (The list of these genes is on the Google Doc.)

Because Dr. Heyer said that the genes were already separated between the two groups, I decided to write deOut to a file to look at the data, hoping to see clearly that fed and not fed were different from each other. I was not able to understand exactly what the output was saying. I saw some differential expression, but the numbers were pretty close to each other, so I assume the program is working as it is supposed to.

Dr. Campbell told us to remove mean expression values that were greater than 8000. This allows us to correlate genes better. However, we are still not able to get fed and not fed to cluster with each other (example below using the Contig8459_SVS1_Protein_SVS1_Saccharomyces_cerevisiae_strain_ATCC_204508_/_S288c gene).

[Figure: Not correct clustering.png]
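
The mean-expression filter described above can be sketched roughly as follows. This is only an illustration in Python, not the code we actually ran: the function name `filter_by_mean_expression`, the gene names, and the numbers are all made up, and it assumes the cutoff applies to each gene's mean across samples.

```python
from statistics import mean


def filter_by_mean_expression(expression, max_mean=8000.0):
    """Keep only genes whose mean expression across samples is <= max_mean.

    `expression` maps a gene name to a list of per-sample values.
    The 8000 default mirrors the cutoff in the notes; the rest is hypothetical.
    """
    return {
        gene: values
        for gene, values in expression.items()
        if mean(values) <= max_mean
    }


# Made-up FPKM-like values; columns could be fed, fed, not fed, not fed.
expression = {
    "gene_a": [120.0, 135.0, 40.0, 52.0],
    "gene_b": [9500.0, 12000.0, 8800.0, 10100.0],  # mean 10100, dropped
    "gene_c": [7000.0, 7900.0, 8100.0, 7600.0],    # mean 7650, kept
}

filtered = filter_by_mean_expression(expression)
print(sorted(filtered))  # ['gene_a', 'gene_c']
```

Filtering on the mean rather than on single values keeps a gene that has one high sample as long as its average stays under the cutoff; whether that matches what the real pipeline does is an assumption.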


New Direction

Decided to use correlation clustering to find a list of genes and try to get the clusters to be correct. Decided to get rid of values of 5000, then used correlation clustering. Saved the list of genes that came from the correlation into a file. Will use these later to do supervised clustering. (Heat map using correlation clustering below.)
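
A rough sketch of the steps above, in Python. The notes do not say which tool or settings were used, so several things here are assumptions: "correlation clustering" is taken to mean average-linkage hierarchical clustering with distance 1 - Pearson r, the 5000 cutoff is applied to each gene's mean, and the gene names, values, and output file name `correlated_genes.txt` are made up.

```python
import math
import os
import tempfile


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def correlation_clusters(expression, n_clusters=2):
    """Average-linkage agglomerative clustering of genes on distance 1 - r."""
    genes = sorted(expression)
    dist = {
        (a, b): 1.0 - pearson(expression[a], expression[b])
        for a in genes
        for b in genes
    }
    clusters = [[g] for g in genes]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(dist[p] for p in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters


# Made-up values; columns could be fed, fed, not fed, not fed.
expression = {
    "up_fed_1": [900.0, 950.0, 100.0, 120.0],
    "up_fed_2": [400.0, 420.0, 60.0, 50.0],
    "up_notfed_1": [30.0, 25.0, 300.0, 310.0],
    "up_notfed_2": [80.0, 70.0, 700.0, 650.0],
    "too_high": [6000.0, 7000.0, 5500.0, 6500.0],  # mean 6250, above cutoff
}

# Assumed reading of the 5000 cutoff: drop genes whose mean exceeds it.
kept = {g: v for g, v in expression.items() if sum(v) / len(v) <= 5000.0}
clusters = correlation_clusters(kept, n_clusters=2)

# Save the gene list (one gene and its cluster number per line) for later use.
out_path = os.path.join(tempfile.gettempdir(), "correlated_genes.txt")
with open(out_path, "w") as handle:
    for cluster_id, cluster in enumerate(clusters, start=1):
        for gene in cluster:
            handle.write(f"{gene}\t{cluster_id}\n")

print(clusters)
```

Using 1 - r as the distance groups genes by the shape of their expression pattern across samples rather than by absolute level, which is why very high values matter less once they are filtered out.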