DM Notes 2.09.16

From GcatWiki
Revision as of 19:10, 9 February 2016 by Dymaghini (talk | contribs)

Take-home points from the correlation exercise: subtle changes in a gene's expression values can lead to drastically different correlation coefficients. For example, in scenario three, a gene that is close to evenly expressed across samples can have a drastically different correlation if a single point changes from 7 to 6. Noise can therefore be deceptive and lead us to believe there are patterns where there aren't any. Mathematical correlations don't always have biological significance.
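To make the sensitivity concrete, here is a minimal sketch in Python. The expression values are hypothetical and are not the data from the class exercise; the `pearson` helper is written out by hand for illustration. With these made-up numbers, changing one point of a nearly flat profile from 7 to 6 moves the correlation with the reference gene from roughly 0.71 to roughly 0.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical values (assumption): a steadily rising reference gene
# and a gene that is almost evenly expressed across five samples.
reference = [1, 2, 3, 4, 5]
nearly_flat = [6, 7, 7, 7, 7]
one_point_changed = [6, 7, 7, 7, 6]  # last sample changed from 7 to 6

r_before = pearson(reference, nearly_flat)
r_after = pearson(reference, one_point_changed)
print(round(r_before, 2), round(r_after, 2))
```

Because a nearly flat profile has very little variance, a one-unit change in a single sample is a large fraction of its total variation, which is why the coefficient swings so much.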

Clustering:

Grouping genes (and treatments) based on some characteristic (such as gene expression), and presenting them in an order that helps reveal biological significance. There are different approaches/algorithms for clustering.

Why cluster? Data reduction (analyze representative data points), hypothesis generation (gain understanding of patterns), hypothesis testing, prediction based on groups (cluster cancer patients, predict outcomes).

Gene Expression Data Example (figure: time vs. expression; each line is a gene): One highlighted gene is induced 16-fold. Another highlighted gene is repressed 16-fold. On a linear scale, induction looks much more dramatic than repression: a change from 1 to 16 is much more noticeable than a change from 1 to 1/16. Solution: use a log scale. With log base 2, induction and repression have equal magnitude but opposite sign (+4 and -4). Possible interpretations: the genes are co-regulated, or repression of one induces the other and vice versa.
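The effect of the log transform can be checked with a few lines of Python. This is a small sketch; the fold-change values are the 16-fold induction and repression from the example, and the variable names are made up for illustration.

```python
import math

# Fold changes relative to a baseline of 1 (from the example above).
induced_fold = 16.0        # 16-fold induction
repressed_fold = 1 / 16    # 16-fold repression

# On a linear scale the two changes look very different in size.
linear_induction = induced_fold - 1      # distance from baseline
linear_repression = 1 - repressed_fold   # distance from baseline

# On a log2 scale they are symmetric around 0.
log_induction = math.log2(induced_fold)
log_repression = math.log2(repressed_fold)

print(linear_induction, linear_repression)
print(log_induction, log_repression)
```

On the linear scale the induced gene moves 15 units from baseline while the repressed gene moves less than 1 unit; after the log2 transform both are 4 units from 0, in opposite directions.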

Comparing Gene Expression Profiles, or, Guilt by Association:

Proximity Measures: correlation, Euclidean distance, inner product (x^T y), Hamming distance, L1 distance
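The measures listed above can be sketched in plain Python as follows. The function names and the example profiles are assumptions made for illustration, not something from the lecture. Hamming distance is shown on discretized profiles (for example, 1 for "induced" and 0 for "not induced"), which is one common way to apply it to expression data; that discretization is also an assumption.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation: similarity of shape, ignoring scale and offset."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) *
                      sum((b - my) ** 2 for b in y))

def euclidean_distance(x, y):
    """Straight-line distance between two profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def inner_product(x, y):
    """Inner product x^T y: sum of element-wise products."""
    return sum(a * b for a, b in zip(x, y))

def hamming_distance(x, y):
    """Number of positions at which two sequences differ."""
    return sum(1 for a, b in zip(x, y) if a != b)

def l1_distance(x, y):
    """L1 (Manhattan) distance: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

# Hypothetical log2 expression profiles for two genes over three samples.
gene_a = [1, 2, 3]
gene_b = [2, 4, 6]

print(pearson(gene_a, gene_b))
print(euclidean_distance(gene_a, gene_b))
print(inner_product(gene_a, gene_b))
print(l1_distance(gene_a, gene_b))

# Hypothetical discretized calls (1 = induced, 0 = not induced).
print(hamming_distance([1, 0, 1, 1], [1, 1, 0, 1]))
```

Note that the two hypothetical genes are perfectly correlated even though their Euclidean and L1 distances are not zero: correlation compares the shape of the profiles, while the distance measures also respond to differences in magnitude.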