2/9
Nick Balanda
Latest revision as of 18:48, 11 February 2016
What is Gene Clustering?
-grouping genes together based on similar expression patterns (genes grouped together often code for proteins with similar functions)
-allows for presentation of similar genes together, analysis of representative data points within big data (drawing out patterns), and making predictions
---not having to sort through the whole data set
-many algorithms
-gene expression data:
a) often comparative expression levels
b) consider that repression is as significant as induction (increase in expression)
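b1) might use a log scale to represent this! (a fold decrease and the same fold increase then have equal size and opposite sign)
b2) similar expression patterns could mean co-regulation or just correlation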
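A small sketch of why the log scale helps, assuming expression is reported as a ratio of experimental signal to reference signal. The function name and the signal values are made up for illustration.

```python
import math

def log2_ratio(experimental, reference):
    """Log2 of the expression ratio: positive = induced, negative = repressed."""
    return math.log2(experimental / reference)

# As raw ratios, a 4-fold induction (4.0) and a 4-fold repression (0.25)
# look very different in size. On the log2 scale they are symmetric.
induced = log2_ratio(400.0, 100.0)
repressed = log2_ratio(100.0, 400.0)
unchanged = log2_ratio(100.0, 100.0)
print(induced, repressed, unchanged)  # 2.0 -2.0 0.0
```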
linkage methods:
distance between the average point in the cluster and the point in question (centroid linkage), minimum distance between any point in the cluster and the point in question (single linkage), etc.
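The two linkage rules above can be sketched as follows, assuming Euclidean distance between expression profiles (a correlation-based distance would work the same way). The function names and the toy cluster are illustrative only.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid_distance(cluster, point):
    """Distance from the point to the cluster's average profile."""
    n = len(cluster)
    centroid = [sum(values) / n for values in zip(*cluster)]
    return euclidean(centroid, point)

def single_distance(cluster, point):
    """Distance from the point to its nearest cluster member."""
    return min(euclidean(member, point) for member in cluster)

cluster = [[0.0, 0.0], [4.0, 0.0]]
point = [5.0, 0.0]
print(centroid_distance(cluster, point))  # 3.0
print(single_distance(cluster, point))    # 1.0
```

The same point is 3.0 away from the cluster under one rule and 1.0 away under the other, which is why the choice of linkage method changes which clusters get joined.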
hierarchical clustering:
-join the two most similar genes, then repeat until all genes have been clustered (no gene left behind; joining starts near +1 correlation and ends near -1)
---cutting the tree: dividing genes into groups by drawing a line through the hierarchical tree and taking the branches below the line as the groups
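A minimal sketch of the join-and-repeat procedure and of cutting the tree, assuming Pearson correlation as the similarity measure and average linkage between clusters. Both are assumptions, not something the notes specify. All names and the toy profiles are hypothetical.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation between two expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def average_similarity(profiles, cluster_a, cluster_b):
    """Average pairwise correlation between members of two clusters."""
    pairs = [(g, h) for g in cluster_a for h in cluster_b]
    return sum(pearson(profiles[g], profiles[h]) for g, h in pairs) / len(pairs)

def hierarchical_merges(profiles):
    """Join the two most similar clusters until a single cluster remains.

    Returns the merge history as (similarity, cluster_a, cluster_b) tuples.
    """
    clusters = [(gene,) for gene in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: average_similarity(profiles, *pair))
        merges.append((average_similarity(profiles, a, b), a, b))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

def cut_tree(profiles, merges, threshold):
    """Keep only merges at or above the threshold and return the groups."""
    groups = {gene: {gene} for gene in profiles}
    for similarity, a, b in merges:
        if similarity >= threshold:
            joined = groups[a[0]] | groups[b[0]]
            for gene in joined:
                groups[gene] = joined
    unique = {frozenset(g) for g in groups.values()}
    return sorted(sorted(g) for g in unique)

# Toy data: geneA/geneB rise together, geneC/geneD fall together.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.5, 4.0, 2.0],
}
merges = hierarchical_merges(profiles)
print(cut_tree(profiles, merges, threshold=0.5))
```

Cutting at 0.5 leaves two groups here, while cutting at -1.0 keeps every join and returns one group with all four genes, matching the idea that the last joins happen near -1 correlation.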
k-means clustering:
specify how many clusters (k) to form; each gene is assigned to one of the k clusters so as to maximize similarity within each cluster
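A bare-bones sketch of the k-means idea, assuming Euclidean distance (so "maximizing similarity" becomes minimizing distance to a cluster center) and assuming the first k genes serve as starting centers. Real tools usually pick random starting points, so results can differ between runs. The names and toy data are illustrative.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_means(profiles, k, iterations=20):
    """Assign every gene to one of k clusters; returns {gene: cluster index}."""
    names = list(profiles)
    # Assumption for this sketch: start from the first k profiles.
    centers = [list(profiles[name]) for name in names[:k]]
    assignment = {}
    for _ in range(iterations):
        # Assign each gene to its nearest center.
        new_assignment = {
            name: min(range(k),
                      key=lambda i: euclidean(profiles[name], centers[i]))
            for name in names
        }
        if new_assignment == assignment:
            break  # nothing moved, so the clustering has settled
        assignment = new_assignment
        # Move each center to the mean profile of its genes.
        for i in range(k):
            members = [profiles[n] for n in names if assignment[n] == i]
            if members:
                centers[i] = [sum(v) / len(members) for v in zip(*members)]
    return assignment

profiles = {
    "geneA": [1.0, 1.2],
    "geneB": [5.0, 5.1],
    "geneC": [0.9, 1.0],
    "geneD": [5.2, 4.9],
}
clusters = k_means(profiles, k=2)
print(clusters)
```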
supervised clustering:
find all genes w/ expression patterns matching "fill in the blank" (e.g., all genes like this particular gene that we found upregulated)
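One way to sketch the "find all genes like this one" search, assuming Pearson correlation as the measure of a matching pattern. The 0.9 cutoff, the function names, and the toy profiles are hypothetical choices for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation between two expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def genes_like(profiles, query, min_correlation=0.9):
    """Return (gene, correlation) pairs matching the query gene, best first."""
    target = profiles[query]
    hits = [(name, pearson(target, profile))
            for name, profile in profiles.items() if name != query]
    matching = [hit for hit in hits if hit[1] >= min_correlation]
    return sorted(matching, key=lambda hit: hit[1], reverse=True)

profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],   # the gene we found upregulated
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.5, 4.0, 2.0],
}
matches = genes_like(profiles, "geneA")
print(matches)
```

Only geneB passes the cutoff here, because geneC and geneD move in the opposite direction from geneA.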
quality clustering (QT clust):
each gene builds its own cluster based on the genes that are most similar to it (repeat for every gene)
-come up with a rule for "best cluster" (because each cluster likely overlaps many others)
--default in this case is to remove the biggest cluster, then the next biggest, and so on
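A simplified sketch of the QT clust steps above. It assumes distance is 1 minus Pearson correlation and that a gene may join a cluster only while every pairwise distance stays within a chosen limit (the cluster "diameter"). The limit of 0.2, the names, and the toy data are assumptions for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation between two expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def distance(a, b):
    """Assumed distance for this sketch: 1 - correlation (0 = identical)."""
    return 1.0 - pearson(a, b)

def candidate_cluster(profiles, seed, remaining, max_diameter):
    """Grow a cluster around one seed gene from the genes still available."""
    cluster = [seed]
    others = [g for g in remaining if g != seed]
    while others:
        def farthest_member(g):
            # Largest distance from gene g to any current cluster member.
            return max(distance(profiles[g], profiles[m]) for m in cluster)
        best = min(others, key=farthest_member)
        if farthest_member(best) > max_diameter:
            break  # adding any further gene would make the cluster too loose
        cluster.append(best)
        others.remove(best)
    return cluster

def qt_clust(profiles, max_diameter):
    """Keep the biggest candidate cluster, remove its genes, and repeat."""
    remaining = list(profiles)
    clusters = []
    while remaining:
        candidates = [candidate_cluster(profiles, seed, remaining, max_diameter)
                      for seed in remaining]
        biggest = max(candidates, key=len)
        clusters.append(sorted(biggest))
        remaining = [g for g in remaining if g not in biggest]
    return clusters

profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.5, 4.0, 2.0],
    "geneE": [1.5, 2.0, 3.5, 4.0],
}
result = qt_clust(profiles, max_diameter=0.2)
print(result)
```

Because the biggest cluster is removed before the next round, the overlap between candidate clusters is resolved and every gene ends up in exactly one group.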
there is no perfect answer for clustering; you have to experiment, guided by biological meaning, in order to draw the most accurate conclusions
clustering info/practice: http://gcat.davidson.edu/DGPB/clust/home.htm