2/9

From GcatWiki
Latest revision as of 18:48, 11 February 2016

Nick Balanda

What is Gene Clustering?

- grouping genes together based on similar proteins they code for

- allows similar genes to be presented together, representative data points within big data to be analyzed (to draw out patterns), and predictions to be made

--- not having to sort through the whole data set

-many algorithms

-gene expression data:

a) often comparative expression levels

b) consider that repression is as significant as induction (increase in expression)

b1) might use log scale to represent this!

b2) similar expression patterns could mean co-regulation or only correlation
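Why a log scale helps here can be shown with a small sketch. The expression values and the function name `log2_ratio` are made up for illustration; the point is only that a 4-fold induction and a 4-fold repression sit the same distance from zero once the ratio is log-transformed, while the raw ratios (4 and 0.25) do not.

```python
import math

def log2_ratio(experimental, control):
    """Return log2(experimental / control) for two positive expression levels."""
    return math.log2(experimental / control)

# Hypothetical expression levels, control = 100 in every case.
induced = log2_ratio(400.0, 100.0)    # 4-fold induction
repressed = log2_ratio(25.0, 100.0)   # 4-fold repression
unchanged = log2_ratio(100.0, 100.0)  # no change

print(induced, repressed, unchanged)
```

On the raw scale repression is squeezed between 0 and 1 while induction can run from 1 upward, so a log ratio treats the two as equally significant.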

linkage methods:

distance between the average point in a cluster and the point in question, minimum distance between any point in the cluster and the point in question, etc.
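A minimal sketch of these linkage rules, assuming Euclidean distance between expression profiles (the notes do not say which distance is used). The function names are hypothetical, and the maximum-distance rule is included only as one common example of the "etc".

```python
import math

def centroid(cluster):
    """Average point of a cluster (a list of equal-length profiles)."""
    n = len(cluster)
    return [sum(column) / n for column in zip(*cluster)]

def average_point_distance(cluster, point):
    """Distance from the cluster's average point to the point in question."""
    return math.dist(centroid(cluster), point)

def minimum_distance(cluster, point):
    """Distance from the closest cluster member to the point in question."""
    return min(math.dist(member, point) for member in cluster)

def maximum_distance(cluster, point):
    """Distance from the farthest cluster member (another common rule)."""
    return max(math.dist(member, point) for member in cluster)

# Toy data: a two-gene cluster and one candidate gene.
cluster = [[0.0, 0.0], [2.0, 0.0]]
point = [5.0, 0.0]

print(average_point_distance(cluster, point))
print(minimum_distance(cluster, point))
print(maximum_distance(cluster, point))
```

The same candidate gene can look close or far depending on the rule, which is one reason different linkage methods give different trees.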

hierarchical clustering:


-join the two most similar genes, repeat until all genes have been clustered (no gene left behind: joining starts near +1 correlation and ends near -1)

---cutting the tree: dividing the genes into groups by drawing a line through the hierarchical tree and taking the branches left below the line as the groups
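The join-and-repeat procedure and the cut can be sketched as below. This is a simplified illustration, not a particular tool's implementation: it assumes Pearson correlation as the similarity and the mean pairwise correlation between two clusters as the linkage rule. The names `hierarchical` and `cut`, and the toy profiles, are made up.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cluster_similarity(c1, c2, profiles):
    """Assumed linkage: mean correlation over all cross-cluster gene pairs."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

def hierarchical(profiles, cut=None):
    """Join the most similar clusters until one is left, or until the best
    remaining similarity falls below `cut` (a simple way to cut the tree)."""
    clusters = [[gene] for gene in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cluster_similarity(clusters[i], clusters[j], profiles)
                if best is None or s > best[0]:
                    best = (s, i, j)
        s, i, j = best
        if cut is not None and s < cut:
            break  # every remaining join would cross the cut line
        merges.append((clusters[i], clusters[j], s))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, merges

# Toy profiles: A and B rise together, C and D fall together.
profiles = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],
    "geneC": [4, 3, 2, 1],
    "geneD": [8, 6, 4, 2],
}

groups, _ = hierarchical(profiles, cut=0.0)
full_tree, merges = hierarchical(profiles)
print(groups)
print(len(merges))
```

With no cut, every gene ends up in one tree and the last join happens at a correlation near -1; with a cut at 0 the two anti-correlated groups stay separate.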

k-means clustering:

specify how many clusters (k) to form; each gene is assigned to one of the k clusters so as to maximize similarity within each cluster
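A minimal k-means sketch, under stated assumptions: Euclidean distance stands in for "similarity" (so the algorithm minimizes distance to the cluster centre rather than maximizing a correlation), and the first k genes are used as starting centres to keep the example deterministic. Real tools usually pick starting centres at random. The function name and data are hypothetical.

```python
import math

def k_means(points, k, max_iter=100):
    """Assign each point to the nearest of k centres, move each centre to the
    mean of its members, and repeat until assignments stop changing."""
    centres = [list(p) for p in points[:k]]  # assumption: first k points seed the centres
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centres[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # leave an empty cluster's centre where it is
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centres

# Toy two-condition expression profiles forming two obvious groups.
points = [[1.0, 1.1], [5.0, 5.2], [0.9, 1.0], [5.1, 4.9], [1.2, 0.8]]
assignment, centres = k_means(points, k=2)
print(assignment)
```

Because k is chosen up front, running the sketch with different k values (and different starting centres) is part of the experimenting the notes describe.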

supervised clustering:

find all genes with expression patterns matching "fill in the blank" (e.g. all genes like this particular gene that we found upregulated)
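One way to picture the "fill in the blank" search is as a correlation query against a chosen gene. The sketch below assumes Pearson correlation and an arbitrary cutoff of 0.9; the names `genes_like`, `min_r`, and the gene labels are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def genes_like(query, profiles, min_r=0.9):
    """Return (gene, r) pairs whose profile correlates with the query gene
    at or above min_r, best match first."""
    target = profiles[query]
    hits = [
        (gene, pearson(target, profile))
        for gene, profile in profiles.items()
        if gene != query
    ]
    return sorted(
        (hit for hit in hits if hit[1] >= min_r),
        key=lambda hit: hit[1],
        reverse=True,
    )

# Toy data: "bait" is the upregulated gene we already found.
profiles = {
    "bait":  [1, 2, 4, 8],
    "geneX": [2, 4, 8, 16],  # same shape, doubled
    "geneY": [8, 4, 2, 1],   # opposite trend
    "geneZ": [1, 2, 4, 7],   # nearly the same shape
}
matches = genes_like("bait", profiles)
print([gene for gene, _ in matches])
```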

quality clustering (QT clust):

each gene builds its own cluster based on genes that are most similar to it (repeat for every gene)

-come up with rule for "best cluster" (because each cluster likely overlaps many others)

--default in this case is to remove the biggest cluster, then the next biggest, and so on...
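The build-then-remove loop above can be sketched roughly as follows. This is a deliberate simplification: each gene's candidate cluster is taken to be every remaining gene that correlates with it at or above a cutoff, whereas the actual QT clust procedure may bound cluster quality differently. The name `qt_clust`, the cutoff `min_r`, and the data are assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def qt_clust(profiles, min_r=0.8):
    """Every remaining gene seeds a candidate cluster; keep the biggest
    candidate, remove its genes, and repeat on what is left."""
    remaining = dict(profiles)
    clusters = []
    while remaining:
        best = []
        for seed, seed_profile in remaining.items():
            candidate = [
                gene for gene, profile in remaining.items()
                if gene == seed or pearson(seed_profile, profile) >= min_r
            ]
            if len(candidate) > len(best):  # "best cluster" rule: biggest wins
                best = candidate
        clusters.append(sorted(best))
        for gene in best:
            del remaining[gene]
    return clusters

# Toy profiles: three rising genes, two falling genes, one oscillating gene.
profiles = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 9],
    "g3": [1, 3, 5, 7],
    "g4": [4, 3, 2, 1],
    "g5": [9, 6, 4, 2],
    "g6": [1, 5, 1, 5],
}
clusters = qt_clust(profiles)
print(clusters)
```

Unlike k-means, nothing here fixes the number of clusters in advance, and a gene that matches nothing ends up alone.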

there is no perfect answer for clustering; you have to experiment, guided by some biological meaning, in order to draw the most accurate conclusions

clustering info/practice:

http://gcat.davidson.edu/DGPB/clust/home.htm