Notes 2/9/16
Clustering: grouping like items together
Why cluster?
Allows us to analyze representative data points, not the whole data set
Just graphing expression levels makes it easy to see induction but harder to see repression. If you use the log, you are able to see both induction and repression of genes, which lets you see some sort of coregulation
For clustering, do we want to cluster opposites too or only similar genes? (Probably want opposites too, because that also shows coregulation)
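A small sketch of why the log helps, assuming the values being graphed are expression ratios (treated / control). The gene names and numbers are made up for illustration.

```python
import math

# Hypothetical expression ratios (treated / control) for three genes.
ratios = {"geneA": 4.0, "geneB": 1.0, "geneC": 0.25}

# On the raw scale, repression is squeezed between 0 and 1 while induction
# runs from 1 upward. log2 puts both on the same scale:
# 4-fold induction -> +2, no change -> 0, 4-fold repression -> -2.
log_ratios = {gene: math.log2(r) for gene, r in ratios.items()}

for gene, value in log_ratios.items():
    print(gene, value)
```

After the log, a 4-fold induction and a 4-fold repression are the same distance from zero, just with opposite signs.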
Proximity Measures
Many ways of measuring the relationship between things (e.g., correlation, Euclidean distance)
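Two common proximity measures, sketched in plain Python: Pearson correlation and Euclidean distance. The profiles are hypothetical. The last lines show one possible way to handle the "opposites" question: a distance of 1 - r keeps opposite patterns far apart, while 1 - |r| treats them as close. Which one to use is a choice, not something fixed by these notes.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def euclidean(x, y):
    """Straight-line distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical log expression profiles over four time points.
up = [0.0, 1.0, 2.0, 3.0]
down = [0.0, -1.0, -2.0, -3.0]

r = pearson(up, down)      # close to -1: exactly opposite patterns
dist_signed = 1 - r        # near 2: opposites end up far apart
dist_abs = 1 - abs(r)      # near 0: opposites end up together
d = euclidean(up, down)    # grows with the size of the difference, ignores shape
```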
Linkage Methods
Centroid: find the average or central part of the cluster and compare that "fake point" with the other point
Complete (max): only going to let a point in if it is close to everyone in the cluster
Single (min): only going to let a point in if it is close to someone in the cluster
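The three linkage ideas above can be sketched as three ways of measuring how far a candidate point is from an existing cluster. Euclidean distance and the function names are assumptions made for the example; any proximity measure could be swapped in.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def single_link(cluster, point):
    # "close to someone": distance to the nearest member (min)
    return min(euclidean(m, point) for m in cluster)

def complete_link(cluster, point):
    # "close to everyone": distance to the farthest member (max)
    return max(euclidean(m, point) for m in cluster)

def centroid_link(cluster, point):
    # distance to the cluster's "fake point": the mean of its members
    centroid = [sum(col) / len(cluster) for col in zip(*cluster)]
    return euclidean(centroid, point)

# Hypothetical cluster of two points and one candidate point.
cluster = [[0.0, 0.0], [4.0, 0.0]]
candidate = [7.0, 0.0]

print(single_link(cluster, candidate))    # 3.0
print(complete_link(cluster, candidate))  # 7.0
print(centroid_link(cluster, candidate))  # 5.0
```

The same candidate looks close under the min rule and far under the max rule, which is why the choice of linkage changes the clusters you get.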
Hierarchical Clustering
What we are seeing with our results from DeSeq
How we do it: find the two most similar genes and join them, then join the next two most similar objects; repeat until all genes have been joined, starting at a correlation of 1 and stopping at -1. Once joined, objects can't ever be pulled apart, no matter what you find in the rest of the data
Cutting the Tree: group together all the things that are still joined when you draw a line through the tree. Big question: WHERE DO YOU CUT THE TREE?
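A minimal sketch of the join-and-cut procedure described above, under some assumptions: similarity is Pearson correlation, clusters are compared by the average correlation over all cross-cluster pairs, and "cutting the tree" is done by stopping the joins once the best remaining similarity falls below a threshold called `cut` here. The gene names and profiles are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def hierarchical(profiles, cut):
    """Join the most similar clusters until the best similarity drops below `cut`."""
    clusters = [[name] for name in profiles]
    merges = []  # (similarity, members) for every join, in order
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # mean correlation over all pairs with one gene in each cluster
            sims = [pearson(profiles[g], profiles[h]) for g in a for h in b]
            sim = sum(sims) / len(sims)
            if best is None or sim > best[0]:
                best = (sim, a, b)
        sim, a, b = best
        if sim < cut:
            break  # the line through the tree: nothing below this gets joined
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)  # joined objects are never pulled apart again
        merges.append((sim, a + b))
    return clusters, merges

profiles = {
    "g1": [0.0, 1.0, 2.0, 3.0],
    "g2": [0.0, 2.0, 4.0, 6.5],
    "g3": [3.0, 2.0, 1.0, 0.0],
    "g4": [6.0, 4.0, 2.5, 0.0],
}

clusters, merges = hierarchical(profiles, cut=0.5)
print(clusters)
```

With a high `cut` you get many small groups, and with a `cut` below -1 everything ends up in one group, so the cut still has to be chosen by looking at the data.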
K-means Clustering
Specify how many clusters to form
Randomly assign each gene to one of k different clusters (how do you know what number of clusters to make?), then regroup so that the genes in each cluster are similar to each other
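A rough sketch of k-means as described: start from a random assignment, then alternate between computing each cluster's center and moving every gene to its nearest center until nothing changes. Euclidean distance, the `seed` and `max_iter` parameters, and the re-seeding of an empty cluster are assumptions added for the example; the data points are made up.

```python
import math
import random

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # step 1: randomly assign each point to one of k clusters
    labels = [rng.randrange(k) for _ in points]
    for _ in range(max_iter):
        # step 2: the center of each cluster is the mean of its members
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:
                members = [rng.choice(points)]  # re-seed an empty cluster
            centroids.append([sum(col) / len(members) for col in zip(*members)])
        # step 3: move every point to its nearest center
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels, centroids

# Hypothetical profiles: three low genes and three high genes.
points = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
          [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]

labels, centroids = kmeans(points, k=2)
print(labels)
```

The answer can depend on the random start and on k, so it is worth rerunning with different seeds and different k before trusting the groups.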
Supervised Clustering
Find genes in expression file whose patterns are highly similar to a desired gene or pattern
Could define a pattern and then use the data to pull in all the genes that are like that pattern
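One way this could look in code, assuming "highly similar" means a Pearson correlation at or above some cutoff. The cutoff name `min_r`, its value of 0.9, the gene names, and the profiles are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation (assumes neither profile is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def similar_genes(profiles, pattern, min_r=0.9):
    """Return genes whose profile correlates with `pattern`, best match first."""
    hits = [(pearson(profile, pattern), gene) for gene, profile in profiles.items()]
    return [gene for r, gene in sorted(hits, reverse=True) if r >= min_r]

# The desired pattern: steady induction over four time points.
pattern = [0.0, 1.0, 2.0, 3.0]

profiles = {
    "g1": [0.1, 1.0, 2.1, 2.9],  # follows the pattern with a little noise
    "g2": [3.0, 2.0, 1.0, 0.0],  # the opposite pattern
    "g3": [1.0, 1.2, 0.9, 1.1],  # roughly flat
    "g4": [0.0, 2.0, 4.0, 6.0],  # same shape, bigger changes
}

matches = similar_genes(profiles, pattern)
print(matches)
```

The pattern could just as well be the profile of a known gene of interest instead of a hand-drawn shape.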
Quality Clustering
Each gene builds a supervised cluster (how big is the group going to be, and how similar does a gene have to be to join?). Need some rule about what makes the best cluster; here, decided to take the one with the biggest number of genes
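A simplified sketch of that idea, not a definitive implementation: every remaining gene acts as a seed and collects all genes that correlate with it at or above a cutoff, the largest candidate cluster wins, its genes are removed, and the process repeats on what is left. Building each candidate around the seed only, the cutoff `min_r`, and the data are assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation (assumes neither profile is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def quality_clusters(profiles, min_r=0.9):
    remaining = dict(profiles)
    clusters = []
    while remaining:
        best = []
        for seed in remaining:
            # each gene builds its own candidate cluster (the seed always joins)
            candidate = [g for g in remaining
                         if pearson(remaining[seed], remaining[g]) >= min_r]
            # rule for the "best" cluster: the one with the most genes
            if len(candidate) > len(best):
                best = candidate
        clusters.append(best)
        for g in best:
            del remaining[g]  # clustered genes are out of the next round
    return clusters

profiles = {
    "g1": [0.0, 1.0, 2.0, 3.0],
    "g2": [0.0, 2.0, 4.0, 6.5],
    "g3": [0.1, 1.0, 2.1, 2.9],
    "g4": [3.0, 2.0, 1.0, 0.0],
    "g5": [6.0, 4.0, 2.5, 0.0],
}

clusters = quality_clusters(profiles)
print(clusters)
```

Unlike k-means, the number of clusters is not chosen up front; it falls out of the similarity cutoff.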