Notes 2/9/16


Clustering: grouping like items together

Why cluster?

Allows us to analyze representative data points, not the whole data set

Just graphing expression levels makes it easy to see induction but harder to see repression. If you use the log, you are able to see both induction and repression of genes, which allows you to see some sort of coregulation.
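
A small sketch of why the log helps. The numbers are made up for illustration (not class data), and it assumes the values being logged are expression ratios (treated / control) with log base 2. A 4-fold induction and a 4-fold repression look lopsided as raw ratios (4 vs 0.25) but come out symmetric around 0 after the log.

```python
import math

# Hypothetical expression ratios (treated / control), made up for illustration.
ratios = {"induced_gene": 4.0, "repressed_gene": 0.25, "unchanged_gene": 1.0}

# log2 puts induction (positive) and repression (negative) on the same scale.
log_ratios = {gene: math.log2(ratio) for gene, ratio in ratios.items()}

for gene, value in log_ratios.items():
    print(gene, value)
# induced_gene 2.0
# repressed_gene -2.0
# unchanged_gene 0.0
```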

For clustering, do we want to cluster opposites too, or only similar genes? Probably want opposites too, because they also show coregulation.

Proximity Measures

There are many ways of measuring the relationship between things.
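
Two common choices, sketched on made-up profiles: Pearson correlation and Euclidean distance. The notes do not say which measures were used in class, so treat these as examples only. Note that an "opposite" profile gets a correlation of -1, which is how opposites can still be recognized as related.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def euclidean(x, y):
    """Straight-line distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical log expression profiles over four time points.
induced = [0.0, 1.0, 2.0, 3.0]
also_induced = [0.5, 1.5, 2.5, 3.5]
repressed = [0.0, -1.0, -2.0, -3.0]

print(pearson(induced, also_induced))    # 1.0  (same shape)
print(pearson(induced, repressed))       # -1.0 (opposite shape)
print(euclidean(induced, also_induced))  # 1.0  (close together)
```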

Linkage Methods

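Average/centroid linkage: find the average or central part of the cluster and compare that "fake point" with the other point

Complete linkage (max): only going to let a point in if it is close to everyone in the cluster

Single linkage (min): only going to let a point in if it is close to someone in the cluster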
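
The three rules above can be sketched as three ways of turning point-to-point distances into a cluster-to-cluster distance. Euclidean distance and the 2-D points are assumptions made only for this illustration.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def centroid(cluster):
    """The 'fake point' at the center of a cluster."""
    return [sum(values) / len(cluster) for values in zip(*cluster)]

def centroid_linkage(cluster_a, cluster_b):
    # Compare the central points of the two clusters.
    return euclidean(centroid(cluster_a), centroid(cluster_b))

def complete_linkage(cluster_a, cluster_b):
    # Max: the farthest pair decides, so a point must be close to everyone.
    return max(euclidean(a, b) for a in cluster_a for b in cluster_b)

def single_linkage(cluster_a, cluster_b):
    # Min: the nearest pair decides, so a point must be close to someone.
    return min(euclidean(a, b) for a in cluster_a for b in cluster_b)

# Hypothetical data: an existing cluster and one candidate point.
cluster = [[0.0, 0.0], [2.0, 0.0]]
candidate = [[3.0, 0.0]]

print(centroid_linkage(cluster, candidate))  # 2.0
print(complete_linkage(cluster, candidate))  # 3.0
print(single_linkage(cluster, candidate))    # 1.0
```

The same candidate looks closest under single linkage and farthest under complete linkage, which is why the choice of linkage changes what gets joined.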

Hierarchical Clustering

This is what we are seeing with our results from DESeq

How we do it: find the two most similar genes and join them, then join the next two most similar objects, and repeat until all genes have been joined, starting at a correlation of 1 and stopping at -1. Once genes are joined they can't ever be pulled apart, no matter what you find in the rest of the data.

Cutting the Tree: group together all the things that are still joined when you draw a line through the tree. Big question: WHERE DO YOU CUT THE TREE?
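
A minimal sketch of the joining procedure and the cut, under stated assumptions: similarity is Pearson correlation, clusters are compared by the average of all pairwise correlations (average linkage), and "cutting the tree" is imitated by refusing to join anything below a `cutoff` correlation. The gene names and profiles are made up, and `hierarchical_clusters` is a hypothetical helper, not a function from any package.

```python
import math
from itertools import combinations

def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def average_similarity(cluster_a, cluster_b, profiles):
    """Mean pairwise correlation between the genes of two clusters."""
    pairs = [(a, b) for a in cluster_a for b in cluster_b]
    return sum(pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

def hierarchical_clusters(profiles, cutoff):
    """Join the most similar clusters until nothing is above the cutoff."""
    clusters = [[gene] for gene in profiles]  # every gene starts alone
    while len(clusters) > 1:
        # Find the two most similar objects still available.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_similarity(
                clusters[ij[0]], clusters[ij[1]], profiles),
        )
        if average_similarity(clusters[i], clusters[j], profiles) < cutoff:
            break  # this is where the line is drawn through the tree
        merged = clusters[i] + clusters[j]  # joined genes are never split again
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters

# Hypothetical profiles: two rising genes and two falling genes.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [1.0, 2.1, 2.9, 4.2],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [4.1, 2.9, 2.2, 0.8],
}

print(hierarchical_clusters(profiles, cutoff=0.5))
print(hierarchical_clusters(profiles, cutoff=-1.0))
```

With these profiles a cutoff of 0.5 leaves the rising and falling genes in two separate groups, while a cutoff of -1 lets everything join into one cluster. The answer depends entirely on where the cut is made.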

K-means Clustering

Specify how many clusters to form

Randomly assign each gene to one of k different clusters (how do you know what number of clusters to make?), then regroup the genes so that the members of each cluster are similar to each other
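
A minimal sketch of k-means as described above, assuming Euclidean distance and the usual "recompute centers, reassign, repeat" loop. The fixed `seed`, the handling of empty clusters, and the gene data are all choices made for this illustration. `k_means` here is a hypothetical helper, not a library function.

```python
import math
import random

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def mean_profile(rows):
    return [sum(column) / len(rows) for column in zip(*rows)]

def k_means(profiles, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    genes = list(profiles)
    # Step 1: randomly assign each gene to one of k clusters.
    labels = {gene: rng.randrange(k) for gene in genes}
    for _ in range(max_iter):
        # Step 2: compute the center of each cluster.
        centers = []
        for cluster in range(k):
            members = [profiles[g] for g in genes if labels[g] == cluster]
            if members:
                centers.append(mean_profile(members))
            else:
                # Assumption: an empty cluster is reseeded from a random gene.
                centers.append(list(profiles[rng.choice(genes)]))
        # Step 3: move every gene to the cluster with the nearest center.
        new_labels = {
            g: min(range(k), key=lambda c: euclidean(profiles[g], centers[c]))
            for g in genes
        }
        if new_labels == labels:
            break  # nothing moved, so the grouping is stable
        labels = new_labels
    return labels

# Hypothetical two-condition profiles forming two obvious groups.
profiles = {
    "a1": [0.0, 0.0], "a2": [0.0, 1.0], "a3": [1.0, 0.0],
    "b1": [10.0, 10.0], "b2": [10.0, 11.0], "b3": [11.0, 10.0],
}

labels = k_means(profiles, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary. What matters is which genes end up sharing a number. The sketch does nothing to answer the question of how to choose k.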

Supervised Clustering

Find genes in expression file whose patterns are highly similar to a desired gene or pattern

Could define a pattern and then use the data to pull in all those genes that are like that pattern
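
A sketch of pulling in genes that match a pattern, assuming "highly similar" means a Pearson correlation at or above some threshold. The threshold of 0.9, the pattern, and the gene profiles are made up, and `genes_like` is a hypothetical helper.

```python
import math

def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def genes_like(pattern, profiles, min_correlation):
    """Return the genes whose profile correlates with the desired pattern."""
    return [
        gene for gene, profile in profiles.items()
        if pearson(pattern, profile) >= min_correlation
    ]

# Hypothetical desired pattern: steady increase over four time points.
pattern = [0.0, 1.0, 2.0, 3.0]

# Hypothetical expression profiles.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],  # rises steadily
    "g2": [4.0, 3.0, 2.0, 1.0],  # falls steadily
    "g3": [0.0, 1.0, 1.9, 3.2],  # rises, slightly noisy
    "g4": [1.0, 3.0, 1.0, 3.0],  # oscillates
}

print(genes_like(pattern, profiles, min_correlation=0.9))  # ['g1', 'g3']
```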

Quality Clustering

Each gene builds a supervised cluster (how big is the group going to be, and how similar does a gene have to be to join?). Have some rule about what makes the best cluster; we decided to take the one with the biggest number of genes.
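
A simplified sketch of this idea. Assumptions: a gene joins a seed's cluster if its Pearson correlation with the seed is at least `min_correlation`, the best cluster is the one with the most genes, and the procedure then removes those genes and repeats on what is left. The repeat step and the data are not from the notes, and `quality_clusters` is a hypothetical helper.

```python
import math

def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def candidate_cluster(seed, profiles, min_correlation):
    """The supervised cluster built around one seed gene."""
    return [
        gene for gene, profile in profiles.items()
        if gene == seed or pearson(profiles[seed], profile) >= min_correlation
    ]

def quality_clusters(profiles, min_correlation):
    remaining = dict(profiles)
    clusters = []
    while remaining:
        # Every remaining gene builds its own candidate cluster.
        candidates = [
            candidate_cluster(seed, remaining, min_correlation)
            for seed in remaining
        ]
        # Rule for "best": take the candidate with the most genes.
        best = max(candidates, key=len)
        clusters.append(best)
        for gene in best:
            del remaining[gene]
    return clusters

# Hypothetical profiles: three rising genes, two falling, one oscillating.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [1.0, 2.1, 2.9, 4.2],
    "g3": [2.0, 4.0, 6.0, 8.5],
    "g4": [4.0, 3.0, 2.0, 1.0],
    "g5": [4.1, 2.9, 2.2, 0.8],
    "g6": [1.0, 3.0, 1.0, 3.0],
}

print(quality_clusters(profiles, min_correlation=0.9))
# [['g1', 'g2', 'g3'], ['g4', 'g5'], ['g6']]
```

Unlike k-means, the number of clusters is not chosen up front. It falls out of the similarity threshold, and a gene that matches nothing ends up alone.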