Difference between revisions of "Ashlyn"

Latest revision as of 17:39, 31 March 2016

Welcome to Ashlyn's page. Enjoy your stay.

January 2, 2016 This should say January 12... I'm still trying to figure out how to change the page name without losing the page.
January 14, 2016
January 19, 2016
January 21, 2016
January 26, 2016
January 28, 2016
February 2, 2016
February 4, 2016
February 9, 2016
February 11, 2016
February 16, 2016
February 18, 2016
February 23, 2016
February 25, 2016
March 8, 2016
March 10, 2016
March 15, 2016
March 17, 2016
March 22, 2016
March 24, 2016
March 31, 2016

Burmese Python RNAseq Project's Main Page

2/4

What do we want out of our research and how do we get there? What do we need to do with each of our twelve data sets? -more notes in lab notebook

TYPE UP NOTES AND UPLOAD PICTURES AFTER YOU TURN IN YOUR APPLICATIONS AND BEFORE PRESENTATION 1!!

2/9:

CLUSTERING- What does clustering mean to you? -grouping genes together and samples together and presenting them in an order. How did it do that?

Why cluster?
Data reduction: analyze representative data points, not the whole dataset.
Hypothesis generation: gain an understanding of patterns in the data so they can be tested statistically.
Remember the utility of log transformations. Consider direct and indirect relationships between genes, coregulation, and other types of relationships. Both negative and positive correlations can be interesting and lead to important discoveries.
Intensity plots
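On the utility of log transformations: a minimal sketch, assuming expression values are non-negative counts and that a pseudocount of 1 is acceptable (both are assumptions, not something the notes specify). The function name `log2_fold_change` is hypothetical. It shows that on a log scale a 4-fold increase and a 4-fold decrease are equally far from zero, which is what makes up- and down-regulation comparable.

```python
import math

def log2_fold_change(fed, fasted, pseudocount=1.0):
    """log2 ratio of two expression values; the pseudocount avoids log(0)."""
    return math.log2((fed + pseudocount) / (fasted + pseudocount))

# Hypothetical values: a 4-fold increase and a 4-fold decrease
# after the pseudocount is added.
up = log2_fold_change(7, 1)    # (7 + 1) / (1 + 1) = 4
down = log2_fold_change(1, 7)  # (1 + 1) / (7 + 1) = 0.25
print(up, down)
```

Without the log, the same two changes would appear as ratios of 4 and 0.25, which are not symmetric around 1.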

 Comparing gene expression profiles, or guilt by association. Proximity measures: correlation, Euclidean distance, inner product XY, Hamming distance, L1 distance. Dissimilarities may or may not be metrics (a metric must satisfy the triangle inequality), but all are loosely referred to as "distance". WE WANT TO COMPARE GENES AND EXPRESSION PATTERNS BETWEEN FED AND NON-FED.
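The proximity measures listed above can be sketched in a few lines of plain Python. This is only an illustration: the function names and the toy profiles are made up, and each profile is assumed to be an equal-length list of expression values across the same samples.

```python
import math

def inner_product(x, y):
    return sum(a * b for a, b in zip(x, y))

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def l1_distance(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def hamming_distance(x, y):
    """Number of positions where two (e.g. on/off) profiles differ."""
    return sum(a != b for a, b in zip(x, y))

def pearson(x, y):
    """Pearson correlation; undefined if either profile is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    return inner_product(dx, dy) / math.sqrt(
        inner_product(dx, dx) * inner_product(dy, dy))

# Hypothetical profiles: same shape at twice the scale, and a reversed one.
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
z = [4, 3, 2, 1]
print(pearson(x, y), pearson(x, z), euclidean(x, y), l1_distance(x, y))
```

Note that `x` and `y` are perfectly correlated even though their Euclidean distance is large: correlation compares the shape of a pattern, while Euclidean and L1 distance also react to magnitude.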
 How do you compare one thing to a group of things? How do you measure similarity/dissimilarity to a cluster? -Define the cluster by averaging the things that belong to it, then treat that average like an individual and compare it to other things. OR average all the distances to the members. OR use the min or max distance (it's part of the cluster if it is close to at least one member, or close to all of them). All of these are linkage methods (complete linkage, single linkage, average linkage, medoid, etc.)
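The options above for comparing one item to a group can be written as one small function. This is a sketch under assumptions: Euclidean distance is used as the base measure, and the name `linkage_distance` and the method labels are illustrative rather than taken from any particular package.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def linkage_distance(item, cluster, method="average"):
    """Distance from one profile to a list of profiles (the cluster)."""
    distances = [euclidean(item, member) for member in cluster]
    if method == "single":      # close to at least one member
        return min(distances)
    if method == "complete":    # close to all members
        return max(distances)
    if method == "average":     # average of all the distances
        return sum(distances) / len(distances)
    if method == "centroid":    # compare to the cluster's average profile
        centroid = [sum(col) / len(cluster) for col in zip(*cluster)]
        return euclidean(item, centroid)
    raise ValueError("unknown linkage method: " + method)

# Hypothetical two-sample profiles.
item = [0, 0]
cluster = [[3, 4], [6, 8]]
for method in ("single", "complete", "average", "centroid"):
    print(method, linkage_distance(item, cluster, method))
```

In this toy case the average and centroid answers coincide because the points lie on a line through the item; in general they differ.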

 Hierarchical clustering: join the two most similar genes, then join the next two most similar "objects" (genes or clusters of genes), and repeat until all genes have been joined. Once the two closest genes are joined, no matter what you discover in the rest of your data, THEY CANNOT BE PULLED APART. That is the biggest problem with hierarchical clustering; each join is made without taking all of the data into account together. Also, hierarchical clustering means no gene gets left behind; everybody is in. The joins start at a correlation near 1 and can end as low as -1.
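The joining procedure described above can be sketched as a short loop. Assumptions to flag: Euclidean distance and average linkage are used here for simplicity (the notes discuss correlation), the gene names and values are invented, and this brute-force version is meant for reading, not for thousands of genes.

```python
import math
from itertools import combinations

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def average_linkage(c1, c2, profiles):
    """Mean distance over all gene pairs drawn from two clusters."""
    total = sum(euclidean(profiles[a], profiles[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))

def agglomerate(profiles):
    """Return the list of joins as (cluster_a, cluster_b, distance)."""
    clusters = [(name,) for name in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        # Find the two closest objects (genes or clusters of genes).
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], profiles))
        merges.append((a, b, average_linkage(a, b, profiles)))
        # Once joined, a and b are never separated again.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical expression profiles (three samples per gene).
profiles = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [1.0, 2.0, 3.1],
    "geneC": [5.0, 1.0, 0.0],
    "geneD": [5.0, 1.0, 0.5],
}
merges = agglomerate(profiles)
for a, b, d in merges:
    print(a, "+", b, "at distance", round(d, 3))
```

With n genes there are always n - 1 joins, and the last one contains every gene, which is the "no gene gets left behind" property.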

 Cutting the Tree- the process of actually grouping genes: draw a line across the hierarchy and see what is still together. Genes that are still together are part of a cluster. BUT where do you cut the tree? You get different answers depending on where you draw the line.
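A minimal sketch of cutting the tree, assuming the hierarchy is available as a list of joins of the form (cluster_a, cluster_b, distance) and that join distances never decrease as the tree is built. The join list below is invented for illustration, and `cut_tree` is a hypothetical helper, not a library function.

```python
def cut_tree(genes, merges, height):
    """Group genes using only the joins made at or below the cut height."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links up to the representative of g's group.
        while parent[g] != g:
            g = parent[g]
        return g

    for a, b, distance in merges:
        if distance <= height:
            parent[find(a[0])] = find(b[0])

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())

# Hypothetical joins from a small tree.
genes = ["geneA", "geneB", "geneC", "geneD"]
merges = [
    (("geneA",), ("geneB",), 0.1),
    (("geneC",), ("geneD",), 0.5),
    (("geneA", "geneB"), ("geneC", "geneD"), 5.0),
]
for height in (0.3, 1.0, 10.0):
    print(height, cut_tree(genes, merges, height))
```

The three heights give three, two, and one cluster respectively, which is the "different answers depending on where you cut" problem in miniature.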

 K-means Clustering: Specify how many clusters to form (k), randomly assign each gene to one of the k clusters, average the expression of all genes in each cluster to create k pseudo genes, rearrange genes by assigning each one to the cluster represented by the pseudo gene to which it is most similar, repeat until convergence.
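The k-means steps above, written out as a sketch. Assumptions: squared Euclidean distance decides which pseudo gene is "most similar", a cluster that ends up empty is simply dropped, and the optional `initial` argument (not part of the notes) exists only so the example can be reproduced without depending on the random draw.

```python
import random

def mean_profile(rows):
    """Average several profiles column by column to make a pseudo gene."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def kmeans(profiles, k, initial=None, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: assign each gene to one of k clusters (randomly by default).
    assign = dict(initial) if initial else {g: rng.randrange(k) for g in profiles}
    for _ in range(max_iter):
        # Step 2: average the genes in each cluster into a pseudo gene.
        members = {}
        for gene, cluster in assign.items():
            members.setdefault(cluster, []).append(profiles[gene])
        pseudo = {c: mean_profile(rows) for c, rows in members.items()}
        # Step 3: move each gene to the cluster of its nearest pseudo gene.
        new_assign = {
            gene: min(pseudo, key=lambda c: squared_distance(p, pseudo[c]))
            for gene, p in profiles.items()
        }
        # Step 4: stop once nothing moves (convergence).
        if new_assign == assign:
            break
        assign = new_assign
    return assign

# Hypothetical two-sample profiles forming two obvious groups.
profiles = {"g1": [0, 0], "g2": [0, 1], "g3": [10, 10], "g4": [10, 11]}
start = {"g1": 0, "g2": 1, "g3": 0, "g4": 1}  # deliberately mixed up
result = kmeans(profiles, k=2, initial=start)
print(result)
```

Different starting assignments can converge to different answers, so in practice the procedure is usually run several times.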

Are there things you can cluster where you know the number??

 Supervised Clustering: find genes in the expression file whose patterns are highly similar (close) to a desired gene or pattern; add the closest gene first; then add the gene that is closest to all genes already in the cluster; repeat, as long as the added gene is within a specified distance of the genes already in the cluster; the distance from one gene to a set of genes is defined to be the max, min, or average of all distances to individual members of the set (complete, single, and average linkage, respectively).
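A sketch of the supervised clustering loop described above. Assumptions: dissimilarity is taken as 1 minus the Pearson correlation, so that a small-magnitude seed such as a transcription factor can still attract genes with large changes of the same shape; the gene names, values, and the 0.1 threshold are all invented. Passing `max`, `min`, or an averaging function as `linkage` gives complete, single, or average linkage.

```python
import math

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def correlation_distance(x, y):
    return 1.0 - pearson(x, y)

def supervised_cluster(seed, profiles, threshold, linkage=max):
    """Grow a cluster around `seed`, one closest gene at a time."""
    cluster = [seed]
    candidates = sorted(g for g in profiles if g != seed)
    while candidates:
        def distance_to_cluster(gene):
            return linkage([correlation_distance(profiles[gene], profiles[m])
                            for m in cluster])
        best = min(candidates, key=distance_to_cluster)
        if distance_to_cluster(best) > threshold:
            break  # the closest remaining gene is already too far away
        cluster.append(best)
        candidates.remove(best)
    return cluster

# Hypothetical profiles across four samples.
profiles = {
    "tf":    [1, 2, 3, 4],      # small changes
    "geneA": [10, 20, 30, 40],  # same pattern, ten times larger
    "geneB": [1, 2, 3, 5],      # nearly the same pattern
    "geneC": [4, 3, 2, 1],      # opposite pattern
}
cluster = supervised_cluster("tf", profiles, threshold=0.1)
print(cluster)
```

The negatively correlated gene is left out here; to pull in anti-correlated genes as well, one could use 1 minus the absolute correlation instead.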

TRACK GENES THAT MATCH WITH A TRANSCRIPTION FACTOR- The transcription factor's own change might be small, but we want to see which genes have big changes that correlate with it.

Use QT Clust instead of heat map: MAIN IDEA: 1) each gene builds a supervised cluster, 2) the gene with the "best" list, together with the genes in its list, becomes the next cluster, 3) remove these genes from consideration and repeat, 4) stop when all genes are clustered or the largest cluster is smaller than a user-specified threshold.

The gene with the biggest list (most genes) defines the group that we are looking at. We call it a cluster, and now those genes are not part of anyone else's group. Now look for the next biggest group and get a different cluster. THERE IS NO ONE PERFECT, CORRECT ANSWER. LOOK FOR THINGS THAT MEAN SOMETHING TO YOU. Chase things you are interested in, look for things that are similar, and then keep pulling things into your group. PRACTICE RESTRAINT.
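The four QT Clust steps can be sketched as follows. This is an illustration under assumptions, not the actual QT Clust implementation: Euclidean distance with complete linkage stands in for the cluster "diameter", "best" is taken to mean the longest list, and the names `grow_cluster`, `qt_clust`, `max_diameter`, and `min_size` are hypothetical.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def grow_cluster(seed, names, profiles, max_diameter):
    """Step 1: build a supervised cluster around one gene."""
    cluster = [seed]
    remaining = [g for g in names if g != seed]
    while remaining:
        def farthest_member(gene):
            return max(euclidean(profiles[gene], profiles[m]) for m in cluster)
        best = min(remaining, key=farthest_member)
        if farthest_member(best) > max_diameter:
            break
        cluster.append(best)
        remaining.remove(best)
    return cluster

def qt_clust(profiles, max_diameter, min_size=2):
    unclustered = sorted(profiles)
    clusters = []
    while unclustered:
        candidates = [grow_cluster(g, unclustered, profiles, max_diameter)
                      for g in unclustered]
        best = max(candidates, key=len)  # step 2: gene with the biggest list
        if len(best) < min_size:         # step 4: largest cluster too small
            break
        clusters.append(sorted(best))
        # Step 3: remove these genes from consideration and repeat.
        unclustered = [g for g in unclustered if g not in best]
    return clusters, unclustered

# Hypothetical two-sample profiles.
profiles = {
    "a": [0, 0], "b": [0, 1], "c": [1, 0],
    "d": [10, 10], "e": [10, 11],
    "f": [50, 50],
}
clusters, leftover = qt_clust(profiles, max_diameter=2.0)
print(clusters, leftover)
```

Unlike hierarchical clustering, a gene that fits nowhere (here `f`) is allowed to stay unclustered.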

 It would help to have gene ontology terms to help with clustering. Cluster transcription factors and look at those.

@@ Line 1: / Line 1: @@
-Welcome to Ashlyn's page. Enjoy your stay.
+''Welcome to Ashlyn's page. Enjoy your stay.''
-Notes 1/12:
-Protocol:
--Flash freeze organs after harvest because RNA is unstable and we do not want it to degrade.
--100 micrograms of RNA from 0.1 g of tissue. RNA isolated with just a kit-- think about the sampling problem. Did it come from the right part of the organ? How do we know it's not connective tissue? Dr. C's best may not have been good enough...
--You don't sequence RNA, you sequence cDNA! mRNA template --> reverse transcribed into cDNA, a more stable version.
--Beads with an oligonucleotide complementary to part of the mRNA. Remove the beads, now you just have mRNA. Use reverse transcriptase to synthesize cDNA.
--RNA FRAGMENTATION: You get very short reads (75 bp). What do you gain by fragmenting the mRNA into short fragments randomly? --Now we get a lot more reads (more ends to read from, roughly the same size).
--BUT how do you prime every mRNA individually? -generate every possible hexamer (6 bp), attach it to a code (A, B, C... for each snake "xxx") and attach both to the primer.
-Good DNA samples. Amplify. Cut out at 500bp and reamplify. Reamplification is also at 500bp = really good cDNA library.
+*[[January 2, 2016]]  This should say January 12... I'm still trying to figure out how to change the page name without losing the page.
+*[[January 14, 2016]]
+*[[January 19, 2016]]
+*[[January 21, 2016]]
+*[[January 26, 2016]]
+*[[January 28, 2016]]
+*[[February 2, 2016]]
+*[[February 4, 2016]]
+*[[February 9, 2016]]
+*[[February 11, 2016]]
+*[[February 16, 2016]]
+*[[February 18, 2016]]
+*[[February 23, 2016]]
+*[[February 25, 2016]]
+*[[March 8, 2016]]
+*[[March 10, 2016]]
+*[[March 15, 2016]]
+*[[March 17, 2016]]
+*[[March 22, 2016]]
+*[[March 24, 2016]]
+*[[March 31, 2016]]
-Resources for snake intestine gene search:
-) ''First Hungarian report of inclusion body hepatitis associated with adenoviruses and secondary parvovirus infection in an Indonesian pit-viper [Parias (Trimeresurus) hageni]''
-[[http://apps.webofknowledge.com/full_record.do?product=UA&search_mode=GeneralSearch&qid=7&SID=3EIjtJqrlRYkr9ePPNW&page=1&doc=1]]
-) ''Genetic analysis of rainbow trout (Oncorhynchus mykiss): Strain identification via microsatellites and analysis of expressed sequence tags in intestine, liver, kidney, and ovary'' [[http://search.proquest.com/docview/304966900/1E7A54579B0F40CDPQ/3?accountid=10427]]
+[http://gcat.davidson.edu/GcatWiki/index.php/Burmese_Python_RNAseq_Project Burmese Python RNAseq Project's Main Page]
+2/4
+What do we want out of our research and how do we get there?
+What do we need to do with each of our twelve data sets?
+-more notes in lab notebook
+TYPE UP NOTES AND UPLOAD PICTURES AFTER YOU TURN IN YOUR APPLICATIONS AND BEFORE PRESENTATION 1!!
+2/9:
+CLUSTERING-
+What does clustering mean to you?
+-grouping genes together and samples together and presenting them in an order. How did it do that?
+Why cluster?
+Data reduction: analyze representative data points, not the whole dataset.
+Hypothesis generation: gain an understanding of patterns in the data so they can be tested statistically.
+Remember the utility of log transformations. Consider direct and indirect relationships between genes, coregulation, and other types of relationships. Both negative and positive correlations can be interesting and lead to important discoveries.
+Intensity plots
+  Comparing gene expression profiles, or guilt by association. Proximity measures: correlation, Euclidean distance, inner product XY, Hamming distance, L1 distance. Dissimilarities may or may not be metrics (a metric must satisfy the triangle inequality), but all are loosely referred to as "distance". WE WANT TO COMPARE GENES AND EXPRESSION PATTERNS BETWEEN FED AND NON-FED.
+  How do you compare one thing to a group of things? How do you measure similarity/dissimilarity to a cluster? -Define the cluster by averaging the things that belong to it, then treat that average like an individual and compare it to other things. OR average all the distances to the members. OR use the min or max distance (it's part of the cluster if it is close to at least one member, or close to all of them). All of these are linkage methods (complete linkage, single linkage, average linkage, medoid, etc.)
+  Hierarchical clustering: join the two most similar genes, then join the next two most similar "objects" (genes or clusters of genes), and repeat until all genes have been joined. Once the two closest genes are joined, no matter what you discover in the rest of your data, THEY CANNOT BE PULLED APART. That is the biggest problem with hierarchical clustering; each join is made without taking all of the data into account together. Also, hierarchical clustering means no gene gets left behind; everybody is in. The joins start at a correlation near 1 and can end as low as -1.
+  Cutting the Tree- the process of actually grouping genes: draw a line across the hierarchy and see what is still together. Genes that are still together are part of a cluster. BUT where do you cut the tree? You get different answers depending on where you draw the line.
+  K-means Clustering: Specify how many clusters to form (k), randomly assign each gene to one of the k clusters, average the expression of all genes in each cluster to create k pseudo genes, rearrange genes by assigning each one to the cluster represented by the pseudo gene to which it is most similar, repeat until convergence.
+Are there things you can cluster where you know the number??
+  Supervised Clustering: find genes in the expression file whose patterns are highly similar (close) to a desired gene or pattern; add the closest gene first; then add the gene that is closest to all genes already in the cluster; repeat, as long as the added gene is within a specified distance of the genes already in the cluster; the distance from one gene to a set of genes is defined to be the max, min, or average of all distances to individual members of the set (complete, single, and average linkage, respectively).
+TRACK GENES THAT MATCH WITH A TRANSCRIPTION FACTOR- The transcription factor's own change might be small, but we want to see which genes have big changes that correlate with it.
+Use QT Clust instead of heat map:
+MAIN IDEA: 1) each gene builds a supervised cluster, 2) the gene with the "best" list, together with the genes in its list, becomes the next cluster, 3) remove these genes from consideration and repeat, 4) stop when all genes are clustered or the largest cluster is smaller than a user-specified threshold.
+The gene with the biggest list (most genes) defines the group that we are looking at. We call it a cluster, and now those genes are not part of anyone else's group. Now look for the next biggest group and get a different cluster. THERE IS NO ONE PERFECT, CORRECT ANSWER. LOOK FOR THINGS THAT MEAN SOMETHING TO YOU.
+Chase things you are interested in, look for things that are similar, and then keep pulling things into your group. PRACTICE RESTRAINT.
+  It would help to have gene ontology terms to help with clustering. Cluster transcription factors and look at those.
