Feb 9
Clustering: grouping objects (e.g., genes) according to an algorithm and its chosen parameters
Why cluster? To explore huge datasets, extract patterns, and make predictions from those patterns (hypothesis generation and testing)
Gene expression data:
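On a linear ratio scale, induction looks much more dramatic than repression (be sure to remember this): a 4-fold induction is a ratio of 4, while a 4-fold repression is a ratio of 0.25, so changes of equal fold look very dissimilar
A log transformation "normalizes" how fold changes look: log2(4) = 2 and log2(0.25) = -2, so equal induction and repression become symmetric around 0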
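A minimal sketch of the log-ratio point, assuming hypothetical treated/control ratios for one gene induced 8-fold and one repressed 8-fold (the gene names and values are made up for illustration). On the raw scale the two sit at very different distances from 1; after log2 they become equal magnitudes with opposite signs.

```python
import math

def log2_ratios(ratios):
    """Map each gene's expression ratio (treated / control) to log2(ratio)."""
    return {gene: math.log2(r) for gene, r in ratios.items()}

# Hypothetical ratios: one gene induced 8-fold, one repressed 8-fold.
ratios = {"induced_gene": 8.0, "repressed_gene": 0.125}

for gene, r in ratios.items():
    # Raw scale: distance from "no change" (ratio 1) is 7.0 vs 0.875.
    print(gene, "raw distance from 1:", abs(r - 1.0))

# Log scale: the same fold change gives +3.0 and -3.0.
print(log2_ratios(ratios))
```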
Negative correlations are as informative as the positive correlations
Scatter/line plots are another way to represent the same data shown in a heat map
Comparing Gene Expression Profiles, or "Guilt by Expression":
Genes with similar profiles may be co-regulated, or may directly regulate each other
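A small sketch of "guilt by expression": rank every other gene by its correlation with a query gene. Ranking by the absolute value of Pearson r is an illustrative choice made here so that strongly anti-correlated genes rank as high as strongly correlated ones, in line with negative correlations being as informative as positive ones. The gene names and log2 profiles are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient r of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def guilt_by_expression(query, profiles):
    """Return (gene, r) pairs sorted by |r| with the query gene, highest first."""
    scores = {name: pearson(profiles[query], p)
              for name, p in profiles.items() if name != query}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical log2 expression values over four time points.
profiles = {
    "geneA": [0.0, 1.0, 2.0, 3.0],
    "geneB": [0.1, 1.1, 2.0, 3.2],    # rises along with geneA
    "geneC": [3.0, 2.0, 1.0, 0.0],    # mirror image of geneA
    "geneD": [1.0, -1.0, 1.0, -1.0],  # unrelated oscillation
}

for name, r in guilt_by_expression("geneA", profiles):
    print(name, round(r, 3))
```

Here geneC (r close to -1) ranks first and geneB (r close to +1) second; both would be candidates for a shared or direct regulatory relationship with geneA.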
Proximity Measures:
Goal: understand relationships between genes and their expression levels across time points or samples
Correlation, Euclidean distance (distance formula), inner product x · y, Hamming distance, L1 (Manhattan) distance; dissimilarities may or may not be metrics
Correlation is very sensitive to outliers (percent change), so the other measures may be better choices in that case
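A plain-Python sketch of the proximity measures listed above, written without libraries so each formula is visible. The example at the end uses hypothetical profiles to show the outlier sensitivity of correlation: two perfectly anti-correlated profiles (r close to -1) become strongly positively correlated (r roughly +0.997) once a single shared extreme value is added. Using Hamming distance on discretized calls such as "U"/"D"/"F" (up/down/flat) is an assumption for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient r of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def euclidean(x, y):
    """Straight-line distance (the distance formula)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def l1(x, y):
    """L1 (Manhattan / city-block) distance: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def inner_product(x, y):
    """Inner product x . y: a similarity, so larger means more alike."""
    return sum(a * b for a, b in zip(x, y))

def hamming(x, y):
    """Number of positions at which two discrete profiles differ."""
    return sum(a != b for a, b in zip(x, y))

# Hypothetical profiles: perfectly anti-correlated over five samples...
x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
print(pearson(x, y))
# ...then one extreme value shared by both in a sixth sample flips the sign.
print(pearson(x + [100], y + [100]))
```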
Linkage Methods:
Find a center point of the cluster (centroid), treat it as a "gene", and measure its distance from the gene of interest
Or average all the distances between the gene of interest and every gene in the cluster
Or use the minimum or the maximum distance between the gene of interest and any gene in the cluster
Single linkage (minimum), average linkage, complete linkage (maximum), etc.; each can produce different clusters
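A sketch of the four ways listed above to measure how far a gene is from a cluster, assuming Euclidean distance as the underlying measure (any proximity measure could be swapped in). The helper names and toy 2-D profiles are hypothetical; the point is that the same gene and cluster give a different distance under each linkage.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def centroid(cluster):
    """Element-wise mean of the member profiles: the cluster's pseudo-"gene"."""
    n = len(cluster)
    return [sum(col) / n for col in zip(*cluster)]

def linkage_distance(gene, cluster, method):
    """Distance from one gene's profile to a cluster of profiles."""
    dists = [euclidean(gene, member) for member in cluster]
    if method == "single":      # closest member
        return min(dists)
    if method == "complete":    # farthest member
        return max(dists)
    if method == "average":     # mean over all members
        return sum(dists) / len(dists)
    if method == "centroid":    # distance to the cluster's center point
        return euclidean(gene, centroid(cluster))
    raise ValueError(f"unknown linkage method: {method}")

gene = [0.0, 0.0]
cluster = [[1.0, 0.0], [0.0, 3.0]]
for method in ("single", "complete", "average", "centroid"):
    print(method, round(linkage_distance(gene, cluster, method), 3))
```

With these toy values the four methods give 1.0, 3.0, 2.0 and about 1.581, which is why the choice of linkage changes which gene joins which cluster.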
Hierarchical Clustering
Join the two most similar genes
Join the next two most similar "objects" (genes or already-joined clusters), and repeat until all genes have been joined (once joined, objects can never be pulled apart in the clustering)
Iterative and stringent
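A deliberately naive sketch of the agglomerative procedure above, assuming Euclidean distance and average linkage (both choices are for illustration; another measure or linkage could be substituted). It repeatedly merges the closest pair of current clusters and never splits a merged cluster, recording each merge in order. It is far too slow for genome-scale data; in practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would be used. Gene names and profiles are hypothetical.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def average_linkage(c1, c2, profiles):
    """Mean distance over all pairs with one gene from each cluster."""
    total = sum(euclidean(profiles[a], profiles[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))

def hierarchical_cluster(profiles):
    """Return merges as (cluster1, cluster2, distance) in the order they occur."""
    clusters = [(name,) for name in profiles]  # start: every gene is its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the closest pair among the current clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], profiles)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        # Join the pair; the joined cluster is never pulled apart later.
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

profiles = {"g1": [0.0, 0.0], "g2": [0.0, 1.0], "g3": [5.0, 5.0], "g4": [5.0, 6.0]}
for c1, c2, d in hierarchical_cluster(profiles):
    print(c1, "+", c2, "at distance", round(d, 3))
```

Here g1 and g2 merge first, then g3 and g4, and finally the two pairs join; the merge order is what a dendrogram draws.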