GcatWiki - User contributions [en]

March 31, 2016

2016-04-05T18:55:31Z

Asgruber:

DNA replication is down-regulated. Story: energy needs to go elsewhere.
We decided to focus our attention on the signaling pathway, since replication genes were both up- and down-regulated and we did not have sufficient time to study both thoroughly.
Fluctuations in mRNA levels may not change the protein level very much.

stem cell differentiation in small intestine

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

March 31, 2016

2016-03-31T18:55:53Z

Asgruber: Created page with "DNA replication is down regulated- story: energy needs to go elsewhere decide to focus our attention on signaling pathway since replication genes were both up and down regul..."


[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

Ashlyn

2016-03-31T17:39:25Z

Asgruber:

''Welcome to Ashlyn's page. Enjoy your stay.''

*[[January 2, 2016]] This should say January 12... I'm still trying to figure out how to change the page name without losing the page.
*[[January 14, 2016]]
*[[January 19, 2016]]
*[[January 21, 2016]]
*[[January 26, 2016]]
*[[January 28, 2016]]
*[[February 2, 2016]]
*[[February 4, 2016]]
*[[February 9, 2016]]
*[[February 11, 2016]]
*[[February 16, 2016]]
*[[February 18, 2016]]
*[[February 23, 2016]]
*[[February 25, 2016]]
*[[March 8, 2016]]
*[[March 10, 2016]]
*[[March 15, 2016]]
*[[March 17, 2016]]
*[[March 22, 2016]]
*[[March 24, 2016]]
*[[March 31, 2016]]

[http://gcat.davidson.edu/GcatWiki/index.php/Burmese_Python_RNAseq_Project Burmese Python RNAseq Project's Main Page]

2/4

What do we want out of our research and how do we get there?
What do we need to do with each of our twelve data sets?
-more notes in lab notebook

TYPE UP NOTES AND UPLOAD PICTURES AFTER YOU TURN IN YOUR APPLICATIONS AND BEFORE PRESENTATION 1!!

2/9:

CLUSTERING-
What does clustering mean to you?
-grouping genes together and samples together and presenting them in an order. How did it do that?

Why cluster? Data reduction: analyze representative data points, not the whole dataset. Hypothesis generation: gain understanding of patterns in the data so they may be tested statistically. Remember the utility of log transformations. Consider direct and indirect relationships between genes, coregulation, and other types of relationships. Both negative and positive correlations can be interesting and lead to important discoveries.
Intensity plots

Comparing gene expression profiles, or guilt by association. Proximity measures: correlation, Euclidean distance, inner product XY, Hamming distance, L1 distance. Dissimilarities may or may not be metrics (a metric satisfies the triangle inequality; dissimilarities are loosely referred to as distances). WE WANT TO COMPARE GENES AND EXPRESSION PATTERNS BETWEEN FED AND NON-FED.
How do you compare one thing to a group of things? How do you measure similarity/dissimilarity to the cluster? Define the cluster (take averages of the things belonging to it) and then treat it like an individual to compare to other things. OR average all the distances. OR use the max or min (it's part of the cluster if it is close to all of its members or closest to one of them). All of these are linkage methods (complete linkage, single linkage, medoid, etc.).
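
To make a few of the proximity measures above concrete, here is a minimal R sketch. The gene names and expression values are made up for illustration; they are not from the python data.

```r
# Toy expression profiles (hypothetical values): rows are genes, columns are samples.
expr <- rbind(
  geneA = c(1, 2, 3, 4),
  geneB = c(2, 4, 6, 8),   # same shape as geneA, twice the magnitude
  geneC = c(4, 3, 2, 1)    # mirror image of geneA
)

# Pearson correlation between gene profiles (cor() works on columns, so transpose).
gene_cor <- cor(t(expr))

# Turn correlation into a dissimilarity: 0 for identical shape, 2 for opposite shape.
cor_dist <- 1 - gene_cor

# Euclidean and L1 (Manhattan) distances between gene profiles.
euc_dist <- as.matrix(dist(expr, method = "euclidean"))
l1_dist  <- as.matrix(dist(expr, method = "manhattan"))

print(round(cor_dist, 2))
print(round(euc_dist, 2))
print(l1_dist)
```

geneA and geneB are perfectly correlated even though their Euclidean distance is not zero, which is why the choice of proximity measure matters when we care about the pattern rather than the magnitude.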

Hierarchical clustering: join the two most similar genes, then join the next two most similar "objects" (genes or clusters of genes), and repeat until all genes have been joined. Once the two closest genes are joined, THEY CANNOT BE PULLED APART, no matter what you discover in the rest of your data. That is the biggest problem with hierarchical clustering: each join is made without considering all the components together. Also, hierarchical clustering means no gene gets left behind; everybody is in. Joining starts at a correlation of 1 and ends at -1.

Cutting the tree: the process of actually grouping genes. Draw a line across the hierarchy and see what is still together; genes that are still together are part of a cluster. BUT where do you cut the tree? You get different answers depending on where you draw the line.
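
A small R sketch of hierarchical clustering and tree cutting, using base R's hclust() and cutree(). The five toy genes and the choice of average linkage on correlation distance are assumptions for illustration, not necessarily what we used in class.

```r
# Toy expression matrix (hypothetical values): rows are genes, columns are samples.
expr <- rbind(
  g1 = c(1, 2, 3, 4, 5),
  g2 = c(1.1, 2.1, 2.9, 4.2, 5.0),   # rises like g1
  g3 = c(5, 4, 3, 2, 1),
  g4 = c(5.2, 3.9, 3.1, 2.0, 0.9),   # falls like g3
  g5 = c(3, 1, 4, 1, 5)              # noisy loner
)

# Correlation distance between genes, then agglomerative clustering.
d <- as.dist(1 - cor(t(expr)))
tree <- hclust(d, method = "average")

# Cut the same tree at three different heights.
cut_low  <- cutree(tree, h = 0.5)
cut_mid  <- cutree(tree, h = 1.0)
cut_high <- cutree(tree, h = 1.9)

print(cut_low)
print(cut_mid)
print(cut_high)
```

The same tree gives three, two, or one cluster depending on the cut height, so the answer to "how many clusters?" depends on where the line is drawn.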

K-means clustering: specify how many clusters to form; randomly assign each gene to one of k different clusters; average the expression of all genes in each cluster to create k pseudo genes; rearrange genes by assigning each one to the cluster represented by the pseudo gene to which it is most similar; repeat until convergence.
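
A rough R illustration of k-means using the built-in kmeans() function. Note the assumption: kmeans() starts from randomly chosen genes as the initial pseudo genes rather than from a random assignment of every gene, but the "average, reassign, repeat" loop is the same idea. The expression values are made up.

```r
set.seed(42)  # k-means starts randomly, so fix the seed for repeatability

# Toy expression matrix (hypothetical values): rows are genes, columns are samples.
expr <- rbind(
  up1   = c(1, 1, 1, 8, 8, 8),
  up2   = c(2, 1, 2, 9, 8, 9),
  up3   = c(1, 2, 1, 8, 9, 8),
  down1 = c(8, 8, 8, 1, 1, 1),
  down2 = c(9, 8, 9, 2, 1, 2),
  down3 = c(8, 9, 8, 1, 2, 1),
  flat1 = c(5, 5, 5, 5, 5, 5),
  flat2 = c(5, 4, 5, 5, 6, 5)
)

# Ask for k = 3 clusters; nstart = 25 reruns from 25 random starts and keeps the best.
fit <- kmeans(expr, centers = 3, nstart = 25)

print(fit$cluster)   # cluster label for each gene
print(fit$centers)   # the k "pseudo genes" (average profile of each cluster)
```

Here we had to tell the function that k = 3 in advance, which is exactly the question the notes raise next.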

Are there things you can cluster where you know the number??

Supervised clustering: find genes in the expression file whose patterns are highly similar (close) to a desired gene or pattern; add the closest gene first; then add the gene that is closest to all genes already in the cluster; repeat, as long as the added gene is within a specified distance of the genes already in the cluster. The distance from one gene to a set of genes is defined as the max, min, or average of all distances to individual members of the set (complete, single, and average linkage, respectively).
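
The supervised clustering steps above can be sketched in R roughly as follows. The function name grow_cluster, its arguments, and the toy data are hypothetical; this is not the class script, just one way the procedure could look with correlation distance.

```r
# Grow a cluster around a seed gene (hypothetical helper, not the class script).
# linkage = max is complete linkage, min is single linkage, mean is average linkage.
grow_cluster <- function(expr, seed_gene, max_dist, linkage = max) {
  d <- 1 - cor(t(expr))            # correlation distance between all genes
  members <- seed_gene
  repeat {
    candidates <- setdiff(rownames(expr), members)
    if (length(candidates) == 0) break
    # Distance from each candidate gene to the set of current members.
    to_set <- sapply(candidates, function(g) linkage(d[g, members]))
    best <- names(which.min(to_set))
    if (to_set[[best]] > max_dist) break   # closest gene is too far away: stop
    members <- c(members, best)
  }
  members
}

# Toy expression matrix (hypothetical values).
expr <- rbind(
  g1 = c(1, 2, 3, 4, 5),
  g2 = c(1.1, 2.1, 2.9, 4.2, 5.0),
  g3 = c(5, 4, 3, 2, 1),
  g4 = c(5.2, 3.9, 3.1, 2.0, 0.9),
  g5 = c(3, 1, 4, 1, 5)
)

complete_cluster <- grow_cluster(expr, "g1", max_dist = 0.68, linkage = max)
single_cluster   <- grow_cluster(expr, "g1", max_dist = 0.68, linkage = min)
print(complete_cluster)
print(single_cluster)
```

With the same seed and threshold, single linkage pulls in g5 while complete linkage does not, so the linkage choice changes which genes end up in the cluster.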

TRACK GENES THAT MATCH WITH A TRANSCRIPTION FACTOR- The change in the transcription factor might be small, but we want to see what has big changes that correlate with it.

Use QT Clust instead of heat map:
MAIN IDEA: 1) each gene builds a supervised cluster, 2) the gene with the "best" list, together with the genes in its list, becomes the next cluster, 3) remove these genes from consideration and repeat, 4) stop when all genes are clustered or the largest cluster is smaller than a user-specified threshold.

The gene with the biggest list (most genes) defines the group that we are looking at. We call it a cluster, and those genes are no longer available to anyone else's group. Now look for the next biggest group and get a different cluster. THERE IS NO ONE PERFECT, CORRECT ANSWER. LOOK FOR THINGS THAT MEAN SOMETHING TO YOU.
Chase the things you are interested in, look for things that are similar, and then keep pulling things into your group. PRACTICE RESTRAINT.
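
A simplified R sketch of the QT Clust idea described above. The function names grow and qt_clust, the use of complete linkage on correlation distance, and the toy data are all assumptions for illustration; the real QT Clust tool may differ in its details.

```r
# Build one candidate cluster around a seed, using only genes still in the pool.
grow <- function(d, seed, pool, max_dist) {
  members <- seed
  repeat {
    candidates <- setdiff(pool, members)
    if (length(candidates) == 0) break
    # Complete linkage: a candidate must be close to every current member.
    to_set <- sapply(candidates, function(g) max(d[g, members]))
    best <- names(which.min(to_set))
    if (to_set[[best]] > max_dist) break
    members <- c(members, best)
  }
  members
}

# Repeatedly keep the largest candidate cluster and remove its genes.
qt_clust <- function(expr, max_dist, min_size = 2) {
  d <- 1 - cor(t(expr))
  remaining <- rownames(expr)
  clusters <- list()
  while (length(remaining) > 0) {
    # 1) every remaining gene builds its own supervised cluster
    candidates <- lapply(remaining, function(s) grow(d, s, remaining, max_dist))
    # 2) the biggest list becomes the next cluster
    best <- candidates[[which.max(lengths(candidates))]]
    # 4) stop when the largest cluster is smaller than the threshold
    if (length(best) < min_size) break
    clusters[[length(clusters) + 1]] <- best
    # 3) remove these genes from consideration and repeat
    remaining <- setdiff(remaining, best)
  }
  list(clusters = clusters, unclustered = remaining)
}

# Toy expression matrix (hypothetical values).
expr <- rbind(
  g1 = c(1, 2, 3, 4, 5),
  g2 = c(1.1, 2.1, 2.9, 4.2, 5.0),
  g3 = c(5, 4, 3, 2, 1),
  g4 = c(5.2, 3.9, 3.1, 2.0, 0.9),
  g5 = c(3, 1, 4, 1, 5),
  g6 = c(0.9, 2.2, 3.0, 3.9, 5.1)
)

res <- qt_clust(expr, max_dist = 0.2)
print(res$clusters)
print(res$unclustered)
```

Unlike hierarchical clustering, a gene that fits nowhere (g5 here) is left out rather than forced into a group.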

It would help to have gene ontology terms to help with clustering. Cluster transcription factors and look at those.

March 24, 2016

2016-03-24T18:24:57Z

Asgruber:

Discuss status report grades
Class vs Homework
Discover vs Explore
*Tell a compelling story supported by evidence
*What you find will give you the raw material to tell the story
*A good paper isn't written the day before it is due

Get rid of genes that don't make sense
We are making subjective groupings by function after looking at the GO terms
Genes can be in more than one category
Genes are in category and separated as up and down regulated

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

March 22, 2016

2016-03-22T18:53:55Z

Asgruber:

sample dendrograms
**We want correct clusters through dendrogram.
normalizations
**DESeq doesn't care how long the gene is. The only normalization we are doing is how similar the genes are. Normalization through DESeq only takes into account "how many genes per sample". DESeq only compares the same gene against itself.
**Normalized counts: ask the software how many reads.
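
To see what "normalizing for how much was sequenced per sample, but not for gene length" can look like, here is a hand-rolled R sketch of the median-of-ratios size-factor idea that DESeq uses. It does not call the DESeq package; the sample names and counts are made up, and genes with zero counts would have to be excluded first because of the logs.

```r
# Toy raw count matrix (hypothetical values): rows are genes, columns are samples.
counts <- cbind(
  snake1 = c(100, 200, 50, 400),
  snake2 = c(200, 400, 100, 800),   # sequenced twice as deeply as snake1
  snake3 = c(50, 100, 25, 200)      # sequenced half as deeply as snake1
)
rownames(counts) <- paste0("gene", 1:4)

# Per-gene reference level: log of the geometric mean across samples.
log_geo_means <- rowMeans(log(counts))

# Size factor per sample: median ratio of that sample's counts to the reference.
size_factors <- apply(counts, 2, function(col) exp(median(log(col) - log_geo_means)))

# Divide each sample (column) by its size factor. Gene length never enters.
normalized <- sweep(counts, 2, size_factors, "/")

print(size_factors)
print(normalized)
```

After normalization every sample shows the same value for a given gene, so each gene is only ever compared against itself across samples, which is why gene length does not matter here.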

one single sample can throw off the dendrogram

Do you think the signal for liver will be different for liver?
-Not necessarily, so crosscheck your list with some groups from other organs!

83 genes on list of clustering done correctly from 3/22

SGTA- co-chaperone that specializes in hormone receptors- helps fold transcription factors that help receptors
Look at the story that you are telling! Modifying proteins like receptors, signaling pathways are starting.

Today we continued looking for GO terms and KEGG pathway identifiers for our 31 genes unique to our list. We also clustered correctly and got 83 genes. Then we researched the function of each gene and checked to see if the genes were up or down regulated. Down regulated genes are highlighted in blue.

'''GENES DR. CAMPBELL GOT EXCITED ABOUT: LGR5, ZFP161'''

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

March 24, 2016

2016-03-22T17:36:45Z

Asgruber: Created page with " [http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]"

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

March 22, 2016

2016-03-22T17:36:39Z

Asgruber: Created page with " [http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]"

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

Ashlyn

2016-03-22T17:36:21Z

Asgruber:

''Welcome to Ashlyn's page. Enjoy your stay.''

*[[January 2, 2016]] This should say January 12... I'm still trying to figure out how to change the page name without losing the page.
*[[January 14, 2016]]
*[[January 19, 2016]]
*[[January 21, 2016]]
*[[January 26, 2016]]
*[[January 28, 2016]]
*[[February 2, 2016]]
*[[February 4, 2016]]
*[[February 9, 2016]]
*[[February 11, 2016]]
*[[February 16, 2016]]
*[[February 18, 2016]]
*[[February 23, 2016]]
*[[February 25, 2016]]
*[[March 8, 2016]]
*[[March 10, 2016]]
*[[March 15, 2016]]
*[[March 17, 2016]]
*[[March 22, 2016]]
*[[March 24, 2016]]

[http://gcat.davidson.edu/GcatWiki/index.php/Burmese_Python_RNAseq_Project Burmese Python RNAseq Project's Main Page]


March 17, 2016

2016-03-17T18:55:33Z

Asgruber:

*Collaborate with organ groups for final project! Maybe initiate google doc?
*Find candidate genes
*Try to use KEGG because it seems like a really cool and useful tool
*Try to figure out KEGG and look at GOterms
*IMPORTANT: HEAT MAPS ACTUALLY WERE DONE USING EXPECTED COUNTS not FPKM... that was wrong in Methods Outline Report
*Dr. Campbell fixed our clusters. One was throwing off the whole list, so we thought that gene might be worth searching and see what it does. Why does that one throw off the whole cluster?

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

March 17, 2016

2016-03-17T17:45:38Z

Asgruber: Created page with "*Collaborate with organ groups for final project! Maybe initiate google doc? *Find candidate genes *Try to use KEGG because it seems like a really cool and useful tool *..."

Ashlyn

2016-03-17T17:40:00Z

Asgruber:

''Welcome to Ashlyn's page. Enjoy your stay.''

*[[January 2, 2016]] This should say January 12... I'm still trying to figure out how to change the page name without losing the page.
*[[January 14, 2016]]
*[[January 19, 2016]]
*[[January 21, 2016]]
*[[January 26, 2016]]
*[[January 28, 2016]]
*[[February 2, 2016]]
*[[February 4, 2016]]
*[[February 9, 2016]]
*[[February 11, 2016]]
*[[February 16, 2016]]
*[[February 18, 2016]]
*[[February 23, 2016]]
*[[February 25, 2016]]
*[[March 8, 2016]]
*[[March 10, 2016]]
*[[March 15, 2016]]
*[[March 17, 2016]]

[http://gcat.davidson.edu/GcatWiki/index.php/Burmese_Python_RNAseq_Project Burmese Python RNAseq Project's Main Page]


March 15, 2016

2016-03-15T18:59:13Z

Asgruber: Created page with "Presentation #1 Day! Contig: assembled piece of DNA that can have multiple genes on it. Gene and contig are not synonymous. [http://gcat.davidson.edu/mediawiki-1.19.1/index..."

Presentation #1 Day!

Contig: assembled piece of DNA that can have multiple genes on it. Gene and contig are not synonymous.

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

Ashlyn

2016-03-15T17:33:46Z

Asgruber:

''Welcome to Ashlyn's page. Enjoy your stay.''

*[[January 2, 2016]] This should say January 12... I'm still trying to figure out how to change the page name without losing the page.
*[[January 14, 2016]]
*[[January 19, 2016]]
*[[January 21, 2016]]
*[[January 26, 2016]]
*[[January 28, 2016]]
*[[February 2, 2016]]
*[[February 4, 2016]]
*[[February 9, 2016]]
*[[February 11, 2016]]
*[[February 16, 2016]]
*[[February 18, 2016]]
*[[February 23, 2016]]
*[[February 25, 2016]]
*[[March 8, 2016]]
*[[March 10, 2016]]
*[[March 15, 2016]]

[http://gcat.davidson.edu/GcatWiki/index.php/Burmese_Python_RNAseq_Project Burmese Python RNAseq Project's Main Page]


March 10, 2016

2016-03-10T19:45:22Z

Asgruber:

Plan for presentation:
*Look for interesting genes in overrepresented gene blast.
*Didn't find anything interesting in the paper.
*Correlation clustering.
*Cross-reference genes with Castoe
*Back to heat map for where to start. The following genes on the heat map looked the most distinct:
Contig 8459- appears on both lists
Contig 1862*
Contig 445*
**inhibits ATPase when mitochondrial membrane potential falls below threshold. Required to avoid cellular consumption of ATP
An asterisk marks genes unique to our list of 40. The contig numbers correspond to our list of genes from correlation! Google the exciting genes.
LOOK AT SLIDESHOW TO COMPLETE NOTES AND WRITE RESPONSE

Follow the genes only on our list path, cross reference again with the heat map to find the genes most differentially expressed on this list, research function

We are following two different ideas, look at genes that are the most differentially expressed

Codes ", margins = c(10,10))

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

March 10, 2016

2016-03-10T18:44:35Z

Asgruber:

Plan for presentation:

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

February 25, 2016

2016-03-10T14:56:38Z

Asgruber:

== Classwork ==
Different groups in class will work on fixing bad characters and identifying new protein functions? (I'm confused by this)

Lately, when working with the intestine samples, we have been getting a mixture of fed and non-fed samples when clustering, even though DESeq has already told us that the samples separate into fed vs. non-fed. Moving forward, we will play with the filter values and use the corrected FPKM. <-- this doesn't sound right... talk to Elise and Kathryn. FPKM is the expression level adjusted for the length of the gene.
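
For reference, a small R sketch of how FPKM adjusts for gene length (and also for the total number of mapped reads in the sample). The function name fpkm, the gene names, and the numbers are hypothetical.

```r
# FPKM: fragments per kilobase of transcript per million mapped reads.
# (hypothetical helper for illustration)
fpkm <- function(counts, gene_length_bp, total_mapped_reads) {
  counts * 1e9 / (gene_length_bp * total_mapped_reads)
}

# Toy example: the long gene has twice the reads only because it is twice as long.
counts     <- c(geneShort = 100, geneLong = 200)
lengths_bp <- c(geneShort = 1000, geneLong = 2000)

fpkm_values <- fpkm(counts, lengths_bp, total_mapped_reads = 1e6)
print(fpkm_values)
```

Both genes come out with the same FPKM, which shows that the value is already normalized. Feeding it into a tool that normalizes again would be normalizing twice.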

Remember, the ultimate goal of this class is a term paper with an introduction, methods, and description of found candidate genes explaining what each candidate gene does and why we selected it. Good potential candidate genes include: transcription factors, transporters, and anything in the signaling cascade.

In class today, we looked for genes belonging to smaller clusters. Other than a lot of trial and error and familiarizing myself with the programming and clustering process, I did not have significant success finding genes that belong to small clusters.

==== Questions to Consider: ====
*Is a smaller cluster necessarily advantageous considering that we want to find a gene that turns on a lot of other genes?

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

February 23, 2016

2016-03-09T15:35:55Z

Asgruber: /* Classwork */

== Classwork ==

'''Coding Notes:'''
*Use "expected_count" in coding, NOT "FPKM", or else we are re-normalizing previously normalized data.
*We have been filtering to keep only those genes whose mean expression is >10; however, for strategic clustering, play with this value to see how the size of toSearch changes, because nothing is special about the number 10.
::::myMeans <- apply(as.matrix(myCountData), 1, mean)
::::toSearch <- myCountData[myMeans > 10 & !is.na(myMeans),]
*Look at "toSearch" to find interesting genes and make new clusters by plugging those genes into the command line.
*''Big thanks to Elise and Kathryn for helping me conquer Rstudio and for their patience!''
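
A self-contained R sketch of "playing with the filter value". The toy myCountData table, its column names, and the cutoffs tried are made up for illustration; only the two filtering lines follow the class code.

```r
# Toy count table (hypothetical values): rows are contigs, columns are samples.
myCountData <- data.frame(
  fed1    = c(0, 5, 12, 40, 300),
  fed2    = c(1, 8, 15, 55, 280),
  nonfed1 = c(0, 7, 9, 35, 20),
  nonfed2 = c(0, 6, 11, 30, 25),
  row.names = paste0("Contig", 1:5)
)

# Mean expression of each gene across all samples.
myMeans <- apply(as.matrix(myCountData), 1, mean)

# Try several cutoffs and record how many genes survive each one.
cutoffs <- c(1, 10, 50)
kept <- sapply(cutoffs, function(cutoff) {
  toSearch <- myCountData[myMeans > cutoff & !is.na(myMeans), ]
  nrow(toSearch)
})
names(kept) <- cutoffs
print(kept)
```

Raising the cutoff shrinks toSearch, so the threshold directly controls which genes are even eligible to show up in a cluster.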

CHECK KATHRYN'S RSTUDIO NOTES.

== Gene Search: ==
We can coordinate with other organ groups and see if we all converge on the same gene from different approaches. Although we can differentiate between fed and non-fed with all of our existing knowledge, we would like help with the gene ontology search so that we can search for gene function instead of just a transcription factor.

Remember to keep in mind the '''''scientific goal:''''' find genes that are differentially expressed between fed and non-fed snakes and try to find candidates for genes that are at the beginning of the cascade.

From there, we will see where our work takes us and hunt down the genes that we find.

==== Supervised Clustering Attempts: ====
*394 Contig110_GNA13_Guanine_nucleotide-binding_protein_subunit_alpha-13_Homo_sapiens_2
*Contig77_FOXO1_Forkhead_box_protein_O1_Homo_sapiens (Forkhead box proteins are a family of transcription factors)

After clustering with a few different genes, it seems as though our clustering is sensitive to the number of reads. Snake 4 has significantly fewer reads than the other snakes and has appeared as an outlier in each cluster.

The heat maps generated from the above clustering were poorly constructed and insignificant; therefore, they are not included. (I SHOULD MAYBE STILL ADD ONE THOUGH)

=== Questions to Consider: ===
*Should we set a stricter threshold?

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

February 23, 2016

2016-03-09T15:34:33Z

Asgruber: /* Classwork */

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

February 23, 2016

2016-03-09T15:34:09Z

Asgruber: /* Classwork */

== Classwork ==

'''Coding Notes:'''
*Use "expected_count" in coding NOT "FPKM" or else we are re-normalizing previously normalized data.
*We have been filtering to keep only those genes whose mean expression is >10; however, for strategic clustering, play with this value to see how the size of toSearch changes, because nothing is special about the number 10.
>>>myMeans <- apply(as.matrix(myCountData), 1, mean)
>>>toSearch <- myCountData[myMeans > 10 & !is.na(myMeans),]
*Look at "toSearch" to find interesting genes and make new clusters by plugging those genes into the command line.
*''Big thanks to Elise and Kathryn for helping me conquer Rstudio and for their patience!''

CHECK KATHRYN'S RSTUDIO NOTES.

== Gene Search: ==
We can coordinate with other organ groups and see if we all converge on the same gene from different approaches. Although we can differentiate between fed and non-fed with all of our existing knowledge, we would like help with the gene ontology search so that we can search for gene function instead of just a transcription factor.

Remember to keep in mind the '''''scientific goal:''''' find genes that are differentially expressed between fed and non-fed snakes and try to find candidates for genes that are at the beginning of the cascade.

From there, we will see where our work takes us and hunt down the genes that we find.

==== Supervised Clustering Attempts: ====
*394 Contig110_GNA13_Guanine_nucleotide-binding_protein_subunit_alpha-13_Homo_sapiens_2
*Contig77_FOXO1_Forkhead_box_protein_O1_Homo_sapiens (Forkhead box proteins are a family of transcription factors)

After clustering with a few different genes, it seems as though our clustering is sensitive to the number of reads. Snake 4 has significantly fewer reads than the other snakes and has appeared as an outlier in each cluster.

The heat maps generated from the above clustering were poorly constructed and insignificant; therefore, they are not included. (I SHOULD MAYBE STILL ADD ONE THOUGH)

=== Questions to Consider: ===
*Should we set a more strict threshold?

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

February 23, 2016

2016-03-09T15:32:23Z

Asgruber: /* Classwork */

== Classwork ==

'''Coding Notes:'''
*Use "expected_count" in coding NOT "FPKM" or else we are re-normalizing previously normalized data.
*We have been filtering to keep only those genes whose mean expression is >10; however, for strategic clustering, play with this value to see how the size of toSearch changes, because nothing is special about the number 10.
myMeans <- apply(as.matrix(myCountData), 1, mean)
toSearch <- myCountData[myMeans > 10 & !is.na(myMeans),]
*Look at "toSearch" to find interesting genes and make new clusters by plugging those genes into the command line.
*''Big thanks to Elise and Kathryn for helping me conquer Rstudio and for their patience!''

CHECK KATHRYN'S RSTUDIO NOTES.

== Gene Search: ==
We can coordinate with other organ groups and see if we all converge on the same gene from different approaches. Although we can differentiate between fed and non-fed with all of our existing knowledge, we would like help with the gene ontology search so that we can search for gene function instead of just a transcription factor.

Remember to keep in mind the '''''scientific goal:''''' find genes that are differentially expressed between fed and non-fed snakes and try to find candidates for genes that are at the beginning of the cascade.

From there, we will see where our work takes us and hunt down the genes that we find.

==== Supervised Clustering Attempts: ====
*394 Contig110_GNA13_Guanine_nucleotide-binding_protein_subunit_alpha-13_Homo_sapiens_2
*Contig77_FOXO1_Forkhead_box_protein_O1_Homo_sapiens (Forkhead box proteins are a family of transcription factors)

After clustering with a few different genes, it seems as though our clustering is sensitive to the number of reads. Snake 4 has significantly fewer reads than the other snakes and has appeared as an outlier in each cluster.

The heat maps generated from the above clustering were poorly constructed and insignificant; therefore, they are not included. (I SHOULD MAYBE STILL ADD ONE THOUGH)

=== Questions to Consider: ===
*Should we set a more strict threshold?

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

February 23, 2016

2016-03-09T15:31:42Z

Asgruber:

== Classwork ==

'''Coding Notes:'''
*Use "expected_count" in coding NOT "FPKM" or else we are re-normalizing previously normalized data.
*We have been filtering to keep only those genes whose mean expression is >10; however, for strategic clustering, play with this value to see how the size of toSearch changes, because nothing is special about the number 10.
'''myMeans <- apply(as.matrix(myCountData), 1, mean)
toSearch <- myCountData[myMeans > 10 & !is.na(myMeans),]'''
*Look at "toSearch" to find interesting genes and make new clusters by plugging those genes into the command line.
*''Big thanks to Elise and Kathryn for helping me conquer Rstudio and for their patience!''

CHECK KATHRYN'S RSTUDIO NOTES.

== Gene Search: ==
We can coordinate with other organ groups and see if we all converge on the same gene from different approaches. Although we can differentiate between fed and non-fed with all of our existing knowledge, we would like help with the gene ontology search so that we can search for gene function instead of just a transcription factor.

Remember to keep in mind the '''''scientific goal:''''' find genes that are differentially expressed between fed and non-fed snakes and try to find candidates for genes that are at the beginning of the cascade.

From there, we will see where our work takes us and hunt down the genes that we find.

==== Supervised Clustering Attempts: ====
*394 Contig110_GNA13_Guanine_nucleotide-binding_protein_subunit_alpha-13_Homo_sapiens_2
*Contig77_FOXO1_Forkhead_box_protein_O1_Homo_sapiens (Forkhead box proteins are a family of transcription factors)

After clustering with a few different genes, it seems as though our clustering is sensitive to the number of reads. Snake 4 has significantly fewer reads than the other snakes and has appeared as an outlier in each cluster.

The heat maps generated from the above clustering were poorly constructed and insignificant; therefore, they are not included. (I SHOULD MAYBE STILL ADD ONE THOUGH)

=== Questions to Consider: ===
*Should we set a more strict threshold?

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

February 23, 2016

2016-03-09T15:31:28Z

Asgruber: /* Supervised Clustering Attempts: */

== Classwork ==

'''Coding Notes:'''
*Use "expected_count" in coding NOT "FPKM" or else we are re-normalizing previously normalized data.
*We have been filtering to keep only those genes whose mean expression is >10; however, for strategic clustering, play with this value to see how the size of toSearch changes, because nothing is special about the number 10.
'''myMeans <- apply(as.matrix(myCountData), 1, mean)
toSearch <- myCountData[myMeans > 10 & !is.na(myMeans),]'''
*Look at "toSearch" to find interesting genes and make new clusters by plugging those genes into the command line.
*''Big thanks to Elise and Kathryn for helping me conquer Rstudio and for their patience!''

CHECK KATHRYN'S RSTUDIO NOTES.

== Gene Search: ==
We can coordinate with other organ groups and see if we all converge on the same gene from different approaches. Although we can differentiate between fed and non-fed with all of our existing knowledge, we would like help with the gene ontology search so that we can search for gene function instead of just a transcription factor.

Remember to keep in mind the '''''scientific goal:''''' find genes that are differentially expressed between fed and non-fed snakes and try to find candidates for genes that are at the beginning of the cascade.

From there, we will see where our work takes us and hunt down the genes that we find.

==== Supervised Clustering Attempts: ====
*394 Contig110_GNA13_Guanine_nucleotide-binding_protein_subunit_alpha-13_Homo_sapiens_2
*Contig77_FOXO1_Forkhead_box_protein_O1_Homo_sapiens (Forkhead box proteins are a family of transcription factors)

After clustering with a few different genes, it seems as though our clustering is sensitive to the number of reads. Snake 4 has significantly fewer reads than the other snakes and has appeared as an outlier in each cluster.

The heat maps generated from the above clustering were poorly constructed and insignificant; therefore, they are not included. (I SHOULD MAYBE STILL ADD ONE THOUGH)

=== Questions to Consider: ===
*Should we set a more strict threshold?

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

February 23, 2016

2016-03-09T15:31:12Z

Asgruber:

== Classwork ==

'''Coding Notes:'''
*Use "expected_count" in coding NOT "FPKM" or else we are re-normalizing previously normalized data.
*We have been filtering to keep only those genes whose mean expression is >10; however, for strategic clustering, play with this value to see how the size of toSearch changes, because nothing is special about the number 10.
'''myMeans <- apply(as.matrix(myCountData), 1, mean)
toSearch <- myCountData[myMeans > 10 & !is.na(myMeans),]'''
*Look at "toSearch" to find interesting genes and make new clusters by plugging those genes into the command line.
*''Big thanks to Elise and Kathryn for helping me conquer Rstudio and for their patience!''

CHECK KATHRYN'S RSTUDIO NOTES.

== Gene Search: ==
We can coordinate with other organ groups and see if we all converge on the same gene from different approaches. Although we can differentiate between fed and non-fed with all of our existing knowledge, we would like help with the gene ontology search so that we can search for gene function instead of just a transcription factor.

Remember to keep in mind the '''''scientific goal:''''' find genes that are differentially expressed between fed and non-fed snakes and try to find candidates for genes that are at the beginning of the cascade.

From there, we will see where our work takes us and hunt down the genes that we find.

==== Supervised Clustering Attempts: ====
*394 Contig110_GNA13_Guanine_nucleotide-binding_protein_subunit_alpha-13_Homo_sapiens_2
*Contig77_FOXO1_Forkhead_box_protein_O1_Homo_sapiens (Forkhead box proteins are a family of transcription factors)

After clustering with a few different genes, it seems as though our clustering is sensitive to the number of reads. Snake 4 has significantly fewer reads than the other snakes and has appeared as an outlier in each cluster.

The heat maps generated from the above clustering were poorly constructed and insignificant; therefore, they are not included. (I SHOULD MAYBE STILL ADD ONE THOUGH)

=== Questions to Consider: ===
*Should we set a more strict threshold?

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

February 18, 2016

2016-03-09T15:11:47Z

Asgruber:

== Classwork ==

Dr. Campbell and Dr. Heyer made significant progress on coding work for correlation. They lowered the p-value cutoff from 0.05 to 0.01 so that RStudio generates a shorter list of genes. (FIND DOCUMENT WITH THE CODE ON IT). Dr. Campbell and Dr. Heyer also mentioned a command that allows us to find gene names, "write.csv(colnames(carp)..." We want to use this command to find a seed gene, and then find the genes most correlated to that seed gene.
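
Until I find the class code, here is a rough sketch of the seed-gene idea: correlate every gene with one seed gene and compare how many genes pass at p < 0.05 versus p < 0.01. The matrix exprMat, the gene names, and the values are hypothetical, and I am assuming genes are in columns (as the colnames() call suggests); this is not Dr. Campbell and Dr. Heyer's actual code.

```r
# Hypothetical expression matrix: rows are snakes, columns are genes.
exprMat <- cbind(
  seedGene = c(1, 2, 3, 4, 5, 6),
  geneUp   = c(2, 4, 6, 8, 10, 13),
  geneMid  = c(1, 3, 2, 5, 4, 6),
  geneDown = c(6, 5, 4, 3, 2, 0),
  geneFlat = c(3, 1, 3, 1, 3, 1)
)
rownames(exprMat) <- paste0("snake", 1:6)

seed <- exprMat[, "seedGene"]
others <- setdiff(colnames(exprMat), "seedGene")

# Pearson correlation (r) and its p-value for each gene against the seed gene.
stats <- t(sapply(others, function(g) {
  ct <- cor.test(exprMat[, g], seed)
  c(r = unname(ct$estimate), p = ct$p.value)
}))

# Rank genes from most positively to most negatively correlated.
ranked <- stats[order(stats[, "r"], decreasing = TRUE), , drop = FALSE]
print(ranked)

# A stricter cutoff keeps fewer genes.
sig05 <- ranked[ranked[, "p"] < 0.05, , drop = FALSE]
sig01 <- ranked[ranked[, "p"] < 0.01, , drop = FALSE]
cat("genes at p < 0.05:", nrow(sig05), " genes at p < 0.01:", nrow(sig01), "\n")
```

Note that strongly negative correlations also pass the cutoff, which may matter if we care about genes that switch off when the seed gene switches on.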

Other groups in the class are working on formatting sequence lists. One group is using Blast2GO (a sequence-based method), while another group is pairing gene names from a file with gene ontology terms (a name-based method).

''Coding Notes:''
*"t" in code = transposing the data, i.e., swapping the x and y axes (rows and columns).
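
A tiny example of what t() does, using a made-up matrix (the names are hypothetical). Genes start as rows and snakes as columns; after t(), snakes are rows and genes are columns.

```r
# Hypothetical 2-gene by 3-snake matrix.
m <- matrix(
  1:6, nrow = 2,
  dimnames = list(c("geneA", "geneB"), c("snake1", "snake2", "snake3"))
)

# t() swaps rows and columns without changing any values.
tm <- t(m)

print(dim(m))   # 2 rows, 3 columns
print(dim(tm))  # 3 rows, 2 columns
```

This matters because functions like cor() work on columns, so whether genes or snakes get compared depends on which way the table is oriented.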

== Gene Search ==
All six liver samples are viable. However, we still need to verify intestine samples 3 and 6.

After obtaining a list of genes from a previous BLAST of overrepresented genes, we tried to match these genes with genes in intestine samples 3 and 6. We googled the names of housekeeping genes from [http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Notes_2/16/16 Kathryn's list of housekeeping genes] to see if we could decode the differences in gene identifications and match them to genes that would verify our tissue samples. We also attempted to find housekeeping genes using the list of genes in Castoe et al. (2013). As we continued our gene search, we remembered that transcription factors are not always transcribed in high quantities, so we must look for a change in expression (on vs. off) instead of only looking at the quantity of expression.
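
To illustrate the "on vs. off" point, here is a sketch that ranks genes by fold change between fed and non-fed snakes instead of by raw abundance. The counts, gene names, and the fed/non-fed column assignment are all invented, and adding 1 before taking the log is just my assumption to avoid dividing by zero.

```r
# Hypothetical counts: rows are genes, columns are snakes.
counts <- rbind(
  geneTF   = c(0, 0, 0, 20, 25, 15),              # low abundance, switches on
  geneHigh = c(1050, 950, 1000, 1000, 1100, 900)  # high abundance, unchanged
)
colnames(counts) <- c("unfed1", "unfed2", "unfed3", "fed1", "fed2", "fed3")

# Which columns belong to which condition (assumed for this example).
unfedCols <- c("unfed1", "unfed2", "unfed3")
fedCols <- c("fed1", "fed2", "fed3")

meanUnfed <- rowMeans(counts[, unfedCols])
meanFed <- rowMeans(counts[, fedCols])

# log2 fold change, fed relative to non-fed, with a pseudocount of 1.
lfc <- log2((meanFed + 1) / (meanUnfed + 1))
print(round(lfc, 2))
```

Sorting by quantity alone would put geneHigh first, while sorting by fold change brings the low-abundance gene that turns on after feeding to the top.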

=== Identified Genes: ===
*Housekeeping genes in Python:
**NHE3: a sodium transporter in the intestinal membrane.
**Reference [http://www.annualreviews.org/doi/full/10.1146/annurev.physiol.67.031103.153004 NHE3 article] for more information.

''Note: Ashlyn left class early this day to travel for Track & Field competition.''

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

February 18, 2016

2016-03-09T15:07:22Z

Asgruber:

== Classwork ==

Dr. Campbell and Dr. Heyer made significant progress on coding work for correlation. They lowered the p-value cutoff from 0.05 to 0.01 so that RStudio generates a shorter list of genes. (FIND DOCUMENT WITH THE CODE ON IT). Dr. Campbell and Dr. Heyer also mentioned a command that allows us to find gene names, "write.csv(colnames(carp)..." We want to use this command to find a seed gene, and then find the genes most correlated to that seed gene.

Other groups in the class are working on formatting sequence lists. One group is using Blast2GO (a sequence-based method), while another group is pairing gene names from a file with gene ontology terms (a name-based method).

''Coding Notes:''
*"t" in code = transposing the data, i.e., swapping the x and y axes (rows and columns).

== Gene Search ==
All six liver samples are viable. However, we still need to verify intestine samples 3 and 6.

After obtaining a list of genes from a previous BLAST of overrepresented genes, we tried to match these genes with genes in intestine samples 3 and 6. We googled the names of housekeeping genes from Kathryn's list of housekeeping genes to see if we could decode the differences in gene identifications and match them to genes that would verify our tissue samples. As we continued our gene search, we remembered that transcription factors are not always transcribed in high quantities, so we must look for a change in expression (on vs. off) instead of only looking at the quantity of expression.

Housekeeping genes in Pythons: NHE3 is a sodium transporter in intestinal membrane. Let's see if it matches up with one of our samples. [http://www.annualreviews.org/doi/full/10.1146/annurev.physiol.67.031103.153004 NHE3 article]

Also look for housekeeping genes given in Castoe et al. (2013). We think they provide housekeeping genes.

''Note: Ashlyn left class early this day to travel for Track & Field competition.''

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

February 18, 2016

2016-03-09T15:00:02Z

Asgruber:

== Classwork ==

Dr. Campbell and Dr. Heyer made significant progress on coding work for correlation. They lowered the p-value cutoff from 0.05 to 0.01 so that RStudio generates a shorter list of genes. (FIND DOCUMENT WITH THE CODE ON IT). Dr. Campbell and Dr. Heyer also mentioned a command that allows us to find gene names, "write.csv(colnames(carp)..." We want to use this command to find a seed gene, and then find the genes most correlated to that seed gene.

As we began our gene search, we remembered that transcription factors are not always transcribed in high quantities, so we must look for a change in expression (on vs. off) instead of only looking at the quantity of expression.

Other groups in the class are working on formatting sequence lists. One group is using Blast2GO (a sequence-based method), while another group is pairing gene names from a file with gene ontology terms (a name-based method).

''Coding Notes:''
*"t" in code = transposing the data, i.e., swapping the x and y axes (rows and columns).

All 6 are good liver samples. We still need to identify/verify intestine samples 3 and 6. Look in excel sheet Kathryn shared. Blasting over represented genes gave us genes. Look at and cite Kathryn's page. Google names on list of housekeeping genes and see if we can verify/decode/match differences in names to verify tissue samples.

Housekeeping genes in Pythons: NHE3 is a sodium transporter in intestinal membrane. Let's see if it matches up with one of our samples. [http://www.annualreviews.org/doi/full/10.1146/annurev.physiol.67.031103.153004 NHE3 article]

Also look for housekeeping genes given in Castoe et al. (2013). We think they provide housekeeping genes.

'''Note: Ashlyn left class early this day to travel for Track & Field competition.'''

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

February 11, 2016

2016-03-09T14:44:32Z

Asgruber:

== Classwork ==
==== Cluster Self-Quiz ====
Take away: you can cluster anything if you change the threshold value.

== Future Direction ==
Four groups will be assigned one of the following tasks:
# Blast2GO
# Assign Gene Ontology terms to our genes
# Ask: What can DESeq do for us?
# Investigate DAVID

The structure of this research course is advantageous because our research will have redundancy without competition as three groups work on the same organ.

'''DAVID Findings:'''

DAVID is a bioinformatics resource that allows a researcher to extract meaning from large gene or protein lists. The DAVID approach will not work for our research, however, because DAVID only works for humans and would require gene IDs for Python genes, which do not exist.

''Note: Ashlyn left class early this day to travel for Track & Field competition.''

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

February 11, 2016

2016-03-09T14:44:13Z

Asgruber:

== Classwork ==
==== Cluster Self-Quiz ====
Take away: you can cluster anything if you change the threshold value.

== Future Direction ==
Four groups will be assigned one of the following tasks:
# Blast2GO
# Assign Gene Ontology terms to our genes
# Ask: What can DESeq do for us?
# Investigate DAVID

The structure of this research course is advantageous because our research will have redundancy without competition as three groups work on the same organ.

'''DAVID Findings:'''

DAVID is a bioinformatics resource that allows a researcher to extract meaning from large gene or protein lists. The DAVID approach will not work for our research, however, because DAVID only works for humans and would require gene IDs for Python genes, which do not exist.

*Note: Ashlyn left class early this day to travel for Track & Field competition.

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

February 16, 2016

2016-03-09T14:43:58Z

Asgruber:

== Classwork ==

Currently, the class is still trying to do gene ontology searches. The class is also still trying to verify organ samples and determine whether any samples should be discarded as our research progresses. We also discussed DESeq in groups and learned that DESeq allows researchers to contrast two conditions and determine the differences in expression between them. Ideally, in a biological experiment, each condition would be tested in replicate. Personally, I did not have the programming or genomics background to make sense of most of the content in the DESeq link provided by the syllabus.

== Gene Search ==

We began our housekeeping gene search by looking through genes in the IntestineResult data in the BIO343 folder. In the Intestine Blast, interesting genes were highlighted in red and were a good starting point for genes that could help us verify the tissue samples. After identifying a gene of interest, we searched for it on Google to learn more about its function and other characteristics. The goal of our gene search is to verify all six intestine samples so that we can be confident moving forward with them.

=== Identified Genes: ===
'''Intestine 4 fed'''
*Transcription factor_2 Gallus gallus
**Biological process: negative regulation of endopeptidase activity
**Belongs to the COUP transcription factor 2 group.
**Necessary for expression of chicken ovalbumin gene.
**MAKE NOTE OF WHETHER IT WAS UP OR DOWN REGULATED!

Reference [http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Notes_2/16/16 Kathryn's 2/16/16 notes] for a more complete list of genes to verify tissue samples.

'''''Moving forward, after tissues are verified we want to identify transcription factors and research the genes that correlate with the identified transcription factors using supervised clustering.'''''

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

February 16, 2016

2016-03-09T14:43:09Z

Asgruber:

== Classwork ==

Currently, the class is still trying to do gene ontology searches. The class is also still trying to verify organ samples and determine whether any samples should be discarded as our research progresses. We also discussed DESeq in groups and learned that DESeq allows researchers to contrast two conditions and determine the differences in expression between them. Ideally, in a biological experiment, each condition would be tested in replicate. Personally, I did not have the programming or genomics background to make sense of most of the content in the DESeq link provided by the syllabus.

== Gene Search ==

We began our housekeeping gene search by looking through genes in the IntestineResult data in the BIO343 folder. In the Intestine Blast, interesting genes were highlighted in red and were a good starting point for genes that could help us verify the tissue samples. After identifying a gene of interest, we searched for it on Google to learn more about its function and other characteristics. The goal of our gene search is to verify all six intestine samples so that we can be confident moving forward with them.

=== Identified Genes: ===
'''Intestine 4 fed'''
*Transcription factor_2 Gallus gallus
**Biological process: negative regulation of endopeptidase activity
**Belongs to the COUP transcription factor 2 group.
**Necessary for expression of chicken ovalbumin gene.
**MAKE NOTE OF WHETHER IT WAS UP OR DOWN REGULATED!

Reference [http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Notes_2/16/16 Kathryn's 2/16/16 notes] for a more complete list of genes to verify tissue samples.

'''''Moving forward, after tissues are verified we want to identify transcription factors and research the genes that correlate with the identified transcription factors using supervised clustering.'''''

*Note: Ashlyn left class early this day to travel for Track & Field competition.

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

February 16, 2016

2016-03-09T14:42:37Z

Asgruber:

== Classwork ==

Currently, the class is still trying to do gene ontology searches. The class is also still trying to verify organ samples and determine whether any samples should be discarded as our research progresses. We also discussed DESeq in groups and learned that DESeq allows researchers to contrast two conditions and determine the differences in expression between them. Ideally, in a biological experiment, each condition would be tested in replicate. Personally, I did not have the programming or genomics background to make sense of most of the content in the DESeq link provided by the syllabus.

== Gene Search ==

We began our housekeeping gene search by looking through genes in the IntestineResult data in the BIO343 folder. In the Intestine Blast, interesting genes were highlighted in red and were a good starting point for genes that could help us verify the tissue samples. After identifying a gene of interest, we searched for it on Google to learn more about its function and other characteristics. The goal of our gene search is to verify all six intestine samples so that we can be confident moving forward with them.

=== Identified Genes: ===
'''Intestine 4 fed'''
*Transcription factor_2 Gallus gallus
**Biological process: negative regulation of endopeptidase activity
**Belongs to the COUP transcription factor 2 group.
**Necessary for expression of chicken ovalbumin gene.
**MAKE NOTE OF WHETHER IT WAS UP OR DOWN REGULATED!

Reference [http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Notes_2/16/16 Kathryn's February 16, 2016 notes] for a more complete list of genes to verify tissue samples.

'''''Moving forward, after tissues are verified we want to identify transcription factors and research the genes that correlate with the identified transcription factors using supervised clustering.'''''

*Note: Ashlyn left class early this day to travel for Track & Field competition.

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

February 16, 2016

2016-03-09T14:37:43Z

Asgruber: /* Identified Genes */

== Classwork ==

Currently, the class is still trying to do gene ontology searches. The class is also still trying to verify organ samples and determine whether any samples should be discarded as our research progresses. We also discussed DESeq in groups and learned that DESeq allows researchers to contrast two conditions and determine the differences in expression between them. Ideally, in a biological experiment, each condition would be tested in replicate. Personally, I did not have the programming or genomics background to make sense of most of the content in the DESeq link provided by the syllabus.

== Gene Search ==

We began our housekeeping gene search by looking through genes in the IntestineResult data in the BIO343 folder. In the Intestine Blast, interesting genes were highlighted in red and were a good starting point for genes that could help us verify the tissue samples. After identifying a gene of interest, we searched for it on Google to learn more about its function and other characteristics. The goal of our gene search is to verify all six intestine samples so that we can be confident moving forward with them.

=== Identified Genes: ===
'''Intestine 4 fed'''
*Transcription factor_2 Gallus gallus
**Biological process: negative regulation of endopeptidase activity
**Belongs to the COUP transcription factor 2 group.
**Necessary for expression of chicken ovalbumin gene.
**MAKE NOTE OF WHETHER IT WAS UP OR DOWN REGULATED!

'''''Moving forward, after tissues are verified we want to identify transcription factors and research the genes that correlate with the identified transcription factors using supervised clustering.'''''

Make notes about referencing Kathryn's page for gene descriptions and housekeeping genes.

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

February 16, 2016

2016-03-09T14:37:28Z

Asgruber:

== Classwork ==

Currently, the class is still trying to do gene ontology searches. The class is also still trying to verify organ samples and determine whether any samples should be discarded as our research progresses. We also discussed DESeq in groups and learned that DESeq allows researchers to contrast two conditions and determine the differences in expression between them. Ideally, in a biological experiment, each condition would be tested in replicate. Personally, I did not have the programming or genomics background to make sense of most of the content in the DESeq link provided by the syllabus.

== Gene Search ==

We began our housekeeping gene search by looking through genes in the IntestineResult data in the BIO343 folder. In the Intestine Blast, interesting genes were highlighted in red and were a good starting point for genes that could help us verify the tissue samples. After identifying a gene of interest, we searched for it on Google to learn more about its function and other characteristics. The goal of our gene search is to verify all six intestine samples so that we can be confident moving forward with them.

=== Identified Genes ===
'''Intestine 4 fed'''
*Transcription factor_2 Gallus gallus
**Biological process: negative regulation of endopeptidase activity
**Belongs to the COUP transcription factor 2 group.
**Necessary for expression of chicken ovalbumin gene.
**MAKE NOTE OF WHETHER IT WAS UP OR DOWN REGULATED!

'''''Moving forward, after tissues are verified we want to identify transcription factors and research the genes that correlate with the identified transcription factors using supervised clustering.'''''

Make notes about referencing Kathryn's page for gene descriptions and housekeeping genes.

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

February 16, 2016

2016-03-09T14:36:29Z

Asgruber:

== Classwork ==

Currently, the class is still trying to do gene ontology searches. The class is also still trying to verify organ samples and determine whether any samples should be discarded as our research progresses. We also discussed DESeq in groups and learned that DESeq allows researchers to contrast two conditions and determine the differences in expression between them. Ideally, in a biological experiment, each condition would be tested in replicate. Personally, I did not have the programming or genomics background to make sense of most of the content in the DESeq link provided by the syllabus.

== Gene Search ==

We began our housekeeping gene search by looking through genes in the IntestineResult data in the BIO343 folder. In the Intestine Blast, interesting genes were highlighted in red and were a good starting point for genes that could help us verify the tissue samples. After identifying a gene of interest, we searched for it on Google to learn more about its function and other characteristics. The goal of our gene search is to verify all six intestine samples so that we can be confident moving forward with them.

=== Identified Genes ===
==== Intestine 4 fed ====
*Transcription factor_2 Gallus gallus
**Biological process: negative regulation of endopeptidase activity
**Belongs to the COUP transcription factor 2 group.
**Necessary for expression of chicken ovalbumin gene.
**MAKE NOTE OF WHETHER IT WAS UP OR DOWN REGULATED!

'''''Moving forward, after tissues are verified we want to identify transcription factors and research the genes that correlate with the identified transcription factors using supervised clustering.'''''

Make notes about referencing Kathryn's page for gene descriptions and housekeeping genes.

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

February 11, 2016

2016-03-09T14:09:54Z

Asgruber:

== Classwork ==
==== Cluster Self-Quiz ====
Take away: you can cluster anything if you change the threshold value.

== Future Direction ==
Four groups will be assigned one of the following tasks:
# Blast2GO
# Assign Gene Ontology terms to our genes
# Ask: What can DESeq do for us?
# Investigate DAVID

The structure of this research course is advantageous because our research will have redundancy without competition as three groups work on the same organ.

'''DAVID Findings:'''

DAVID is a bioinformatics resource that allows a researcher to extract meaning from large gene or protein lists. The DAVID approach will not work for our research, however, because DAVID only works for humans and would require gene IDs for Python genes, which do not exist.

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

February 16, 2016

2016-03-09T14:08:20Z

Asgruber:

== Classwork ==

*Still trying to do searches by gene ontology terms
*Still trying to figure out if our organ samples are good samples (using DEseq?)
**DEseq = differential expression of RNA-seq
**Which ones should we continue on with

DEseq notes: reports the number of reads assigned to each gene

"To contrast two conditions, e.g., to see whether there is differential expression between conditions "untreated" and "treated", we simply call the function nbinomTest. It performs the tests as described in [1] and returns a data frame with the p values and other useful information."

Need to test things in replicate when dealing with a biological experiment.
Personally, I do not have the programming or genomics background to make sense of the DEseq link provided in the syllabus.

LOOK FOR HOUSE KEEPING GENE:
*Started by looking in Intestine Result folder in BIO343 folder.
Intestine 4 fed:
*searched transcription factor, then googled each one to find out what they do
**transcription factor_2 Gallus gallus
***COUP transcription factor 2
****necessary for expression of chicken ovalbumin gene
****biological process is negative regulation of endopeptidase activity

Intestine Blast interesting genes highlighted in red- looking at these to figure out if we can use them/ if they are only in the intestine/ if they can help us verify our tissue sample.
For Thursday, share what we've found (I don't think I've found anything), BUT we are trying to verify all 12 samples to make sure which ones we should carry on with.
Dr. Campbell and Dr. Heyer compared liver fed to no_fed and found 3 transcription factors, then they want to change the way it clusters and look at all the genes that correlate with those transcription factors (supervised cluster).
***"species = human"

Make notes about referencing Kathryn's page for gene descriptions and housekeeping genes.

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

February 11, 2016

2016-03-09T04:51:32Z

Asgruber:

== Classwork ==
==== Cluster Self-Quiz ====
Take away: you can cluster anything if you change the threshold value.

== Future Direction ==
Four groups will be assigned one of the following tasks:
# Blast 2Go
# Assign GeneOntology terms to our genes
# Ask: What can DEseq do for us?
# Investigate DAVID

The structure of this research course is advantageous because our research will have redundancy without competition as three groups work on the same organ.

'''DAVID Findings:'''

DAVID is a bioinformatics resource that allows a researcher to extract meaning from large gene or protein lists. Apart from this basic information, the DAVID group, including myself, did not find DAVID to be of particular use to our research. WHY?

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

February 11, 2016

2016-03-09T04:50:41Z

Asgruber:

== Classwork ==
==== Cluster Self-Quiz ====
Take away: you can cluster anything if you change the threshold value.

==== Future Direction: ====
Four groups will be assigned one of the following tasks:
# Blast 2Go
# Assign GeneOntology terms to our genes
# Ask: What can DEseq do for us?
# Investigate DAVID

The structure of this research course is advantageous because our research will have redundancy without competition as three groups work on the same organ.

'''DAVID Findings:'''

DAVID is a bioinformatics resource that allows a researcher to extract meaning from large gene or protein lists. Apart from this basic information, the DAVID group, including myself, did not find DAVID to be of particular use to our research. WHY?

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

February 11, 2016

2016-03-09T04:50:23Z

Asgruber:

February 11, 2016

2016-03-09T04:49:51Z

Asgruber:

== Classwork ==
==== Cluster Self-Quiz ====
Take away: you can cluster anything if you change the threshold value.

==== Future Direction: ====
Four groups will be assigned one of the following tasks:
# Blast 2Go
# Assign GeneOntology terms to our genes
# Ask: What can DEseq do for us?
# Investigate DAVID

The structure of this research course is advantageous because our research will have redundancy without competition as three groups work on the same organ.

''DAVID Findings:''
DAVID is a bioinformatics resource that allows a researcher to extract meaning from large gene or protein lists. Apart from this basic information, the DAVID group, including myself, did not find DAVID to be of particular use to our research. WHY?

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

February 11, 2016

2016-03-09T04:43:32Z

Asgruber:

=== Classwork ===
==== Cluster Self-Quiz ====
Take away: you can cluster anything if you change the threshold value.

We have redundancy without competition because we have three groups working on the same organ.

Where we are going...
4 groups with different tasks: 1) Blast 2Go. 2) Assign GeneOntology terms to our genes, 3) Ask what can DEseq do for us?, 4) Investigate DAVID

DAVID group:
-bioinformatics resource:
-extract meaning from large gene/protein lists

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

February 9, 2016

2016-03-09T04:42:03Z

Asgruber:

== Classwork ==

=== Clustering Activity ===
==== Clustering ====
'''''Clustering:''''' grouping genes and samples together and presenting them in a specific order.

Clustering is used as a data reduction analysis. It is representative of data points rather than an entire data set. When clustering, we seek to gain an understanding of patterns in a data set, so that they may be tested statistically. While analyzing patterns, it is important to consider the utility of log transformations, co-regulations, and direct/indirect relationships of genes. Both negative and positive correlations can be interesting and lead to important discoveries.

'''''Hierarchical Clustering:''''' joins the two most similar genes, then the next two most similar genes or cluster of genes until all genes have been joined.

In hierarchical clustering, after two genes or cluster of genes are joined, they cannot be pulled apart regardless of what future discoveries in data reveal. The biggest problem with hierarchical clustering is that it does not consider all data components together. Furthermore, no gene is left behind in hierarchical clustering; correlations begin with a value of 1 and end with a value of -1.
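
The joining procedure described above can be sketched in a few lines of Python. This is only an illustration, not the software used in class: the gene names and expression values are made up, similarity is taken to be the Pearson correlation, and clusters are compared by the average of their pairwise correlations (one assumed choice among several).

```python
import math

def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def hierarchical(profiles):
    """Join the two most similar clusters until one cluster remains.

    Returns the merges in order as (cluster_a, cluster_b, similarity).
    Once two clusters are joined they are never pulled apart.
    """
    clusters = [(name,) for name in profiles]
    merges = []

    def similarity(c1, c2):
        # Assumed rule: mean correlation over all cross-cluster gene pairs.
        vals = [pearson(profiles[a], profiles[b]) for a in c1 for b in c2]
        return sum(vals) / len(vals)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = similarity(clusters[i], clusters[j])
                if best is None or s > best[0]:
                    best = (s, i, j)
        s, i, j = best
        merges.append((clusters[i], clusters[j], s))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical profiles: two genes rising together, two falling together.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.0, 2.0, 3.0, 5.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [6.0, 3.0, 2.0, 1.0],
}
merges = hierarchical(profiles)
for left, right, sim in merges:
    print(left, right, round(sim, 3))
```

In this toy data the first joins happen at correlations close to 1, and the final join, which forces the rising and falling genes together, happens at a negative correlation. That matches the note that no gene is left behind.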

'''''K-means Clustering:''''' specifies how many clusters to form by randomly assigning each gene to one of k different clusters.

In K-means clustering, the average expression of all genes in each cluster is used to create k pseudo genes. Genes can be rearranged by assigning each one to the cluster represented by the pseudo gene to which it is most similar. K-means clustering can be repeated until there is convergence.
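
A minimal sketch of this loop, assuming squared Euclidean distance as the similarity measure (the notes do not say which measure was used) and made-up gene names and values. The optional starting assignment is a hypothetical parameter added so that the example is reproducible; by default genes are assigned at random as described above.

```python
import random

def mean_profile(profiles):
    """Average expression across genes: the cluster's pseudo gene."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]

def sq_dist(x, y):
    """Squared Euclidean distance between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def k_means(genes, k, assignment=None, max_iter=100, seed=0):
    """Return a dict mapping each gene name to a cluster index 0..k-1."""
    rng = random.Random(seed)
    names = list(genes)
    if assignment is None:
        # Random start, as in the description above.
        assignment = {name: rng.randrange(k) for name in names}
    else:
        assignment = dict(assignment)
    for _ in range(max_iter):
        pseudo = []
        for c in range(k):
            members = [genes[n] for n in names if assignment[n] == c]
            if not members:
                # Assumed fix for an empty cluster: reseed it from a random gene.
                members = [genes[rng.choice(names)]]
            pseudo.append(mean_profile(members))
        # Reassign each gene to the most similar pseudo gene.
        new_assignment = {
            n: min(range(k), key=lambda c: sq_dist(genes[n], pseudo[c]))
            for n in names
        }
        if new_assignment == assignment:
            break  # convergence: no gene changed cluster
        assignment = new_assignment
    return assignment

# Hypothetical data: g1 and g2 are low, g3 and g4 are high.
genes = {
    "g1": [1.0, 1.0, 1.0],
    "g2": [1.5, 1.0, 1.2],
    "g3": [8.0, 9.0, 8.0],
    "g4": [9.0, 8.0, 9.0],
}
# Deliberately mixed starting assignment.
start = {"g1": 0, "g2": 1, "g3": 0, "g4": 1}
result = k_means(genes, 2, assignment=start)
print(result)
```

Starting from the mixed assignment, one round of reassignment separates the low genes from the high genes, and the second round changes nothing, so the loop stops.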

'''''Supervised Clustering:''''' finds genes in expression file whose patterns are highly similar to the desired gene or pattern.

Supervised clustering adds the closest gene first. Then, the gene closest to all of the genes already in a cluster is added. This process continues as long as the added gene is within the specified distance of genes already in cluster. The specified distance from one gene to a set of genes can be defined as the maximum, minimum, or average of all distances to individual members of the set (complete, single, and average linkage, respectively).
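
The following sketch shows how the three linkage choices change which genes join a supervised cluster. It assumes Euclidean distance, and the gene names, values, and the function name <code>supervised_cluster</code> are hypothetical.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Distance from one gene to a set of genes, per linkage method.
LINKAGE = {
    "complete": max,                         # farthest member
    "single": min,                           # nearest member
    "average": lambda d: sum(d) / len(d),    # mean over members
}

def supervised_cluster(genes, seed_name, max_distance, linkage="average"):
    """Grow a cluster around seed_name until the next gene is too far."""
    combine = LINKAGE[linkage]
    cluster = [seed_name]
    candidates = set(genes) - {seed_name}
    while candidates:
        def dist_to_cluster(name):
            return combine([euclidean(genes[name], genes[m]) for m in cluster])
        # sorted() only makes tie-breaking deterministic.
        closest = min(sorted(candidates), key=dist_to_cluster)
        if dist_to_cluster(closest) > max_distance:
            break
        cluster.append(closest)
        candidates.remove(closest)
    return cluster

# Hypothetical profiles laid out along a line for easy checking.
genes = {
    "tf": [0.0, 0.0],
    "g1": [1.0, 0.0],
    "g2": [2.0, 0.0],
    "g3": [3.0, 0.0],
    "g4": [10.0, 0.0],
}
for method in ("single", "average", "complete"):
    print(method, supervised_cluster(genes, "tf", 1.5, linkage=method))
```

With the same distance threshold, single linkage chains along neighbouring genes and collects the most, complete linkage is the strictest, and average linkage falls in between.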

'''''Cutting the Tree:''''' the process of grouping genes by determining a threshold value in the dendrogram.

In cutting the tree, cut the dendrogram at different points and see what genes or clusters of genes are still clustered together. Genes that are still together are part of a cluster. Different clusters arise depending on where the tree was cut.
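
To see how the cut point changes the clusters, here is a small sketch that assumes a single-linkage tree. Under that assumption, cutting the dendrogram at a given height gives the same groups as linking every pair of genes whose distance is at or below that height, so the tree itself never has to be built. The gene names and distances are made up.

```python
def cut_single_linkage(distances, names, threshold):
    """Groups obtained by cutting a single-linkage tree at `threshold`."""
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the group's representative.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), d in distances.items():
        if d <= threshold:
            parent[find(a)] = find(b)  # join the two groups

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical pairwise distances (for example, 1 - correlation).
names = ["A", "B", "C", "D"]
distances = {
    ("A", "B"): 0.1, ("C", "D"): 0.2,
    ("A", "C"): 0.8, ("A", "D"): 0.9,
    ("B", "C"): 0.7, ("B", "D"): 0.85,
}
for h in (0.05, 0.3, 1.0):
    print(h, cut_single_linkage(distances, names, h))
```

A low cut leaves every gene alone, a middle cut gives two pairs, and a high cut puts everything in one cluster, which is the point of the self-quiz: anything can be clustered if the threshold is changed.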

==== Intensity Plots ====
'''''Intensity plots''''' compare gene expression profiles. Proximity measures include: correlation, Euclidean distance, inner product XY, Hamming distance, L1 distance, and dissimilarities that may or may not be metrics.
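
For reference, the named proximity measures can each be written in a line or two of Python. The profiles below are made-up values, chosen so that two genes rise together and one falls.

```python
import math

def pearson(x, y):
    """Correlation: +1 for the same shape, -1 for the opposite shape."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def euclidean(x, y):
    """Straight-line distance between two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def l1(x, y):
    """L1 distance: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def inner_product(x, y):
    """Inner product XY: sum of element-wise products."""
    return sum(a * b for a, b in zip(x, y))

def hamming(x, y):
    """Hamming distance: number of positions that differ (for on/off calls)."""
    return sum(a != b for a, b in zip(x, y))

# Hypothetical expression profiles.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]   # same shape as gene_a, twice the level
gene_c = [4.0, 3.0, 2.0, 1.0]   # opposite shape to gene_a

print(pearson(gene_a, gene_b), pearson(gene_a, gene_c))
print(euclidean(gene_a, gene_b), l1(gene_a, gene_b))
```

Correlation treats gene_a and gene_b as identical because it looks only at the shape of the profile, while Euclidean and L1 distance see them as far apart because their levels differ. Which measure is right depends on the question being asked.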

We want our intensity plots to compare the genes and expression patterns between fed and non-fed snakes.

In order to measure the similarity or dissimilarity to the cluster, one must determine which linkage method to use.

'''''Linkage Methods:'''''
*''Centroid linkage:'' define the cluster by taking the average of the cluster's components and then treat the average like an individual gene to compare it to other genes.
*''Average linkage:'' average the distances from the gene of interest to every gene included in a cluster.
*''Complete and single linkage:'' use a max/min approach, measuring the gene's distance to the farthest gene in the cluster (complete) or to the closest gene in the cluster (single).

==== QT clust ====

One can also use QT clust instead of a heat map with the following steps:
#Each gene builds a supervised cluster
#Gene with "best" list, and genes in its list, becomes next cluster
#Remove these genes from consideration, and repeat
#Stop when all genes are clustered, or largest cluster is smaller than user-specified threshold
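
The four steps above can be sketched as follows. Several things here are assumptions rather than class material: the "best" list is taken to be the longest one, clusters are grown with complete linkage under a maximum diameter, distance is L1, and the gene names and values are made up.

```python
def l1(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def grow_cluster(seed, pool, genes, max_diameter):
    """Step 1: build a supervised cluster around one seed gene."""
    cluster = [seed]
    candidates = [g for g in pool if g != seed]
    while candidates:
        def reach(g):
            # Complete linkage: distance to the farthest current member.
            return max(l1(genes[g], genes[m]) for m in cluster)
        best = min(candidates, key=reach)
        if reach(best) > max_diameter:
            break
        cluster.append(best)
        candidates.remove(best)
    return cluster

def qt_clust(genes, max_diameter, min_size=2):
    """Return (clusters, leftover genes that were never clustered)."""
    remaining = sorted(genes)
    clusters = []
    while remaining:
        lists = [grow_cluster(s, remaining, genes, max_diameter)
                 for s in remaining]
        best = max(lists, key=len)          # step 2: gene with the "best" list
        if len(best) < min_size:            # step 4: largest cluster too small
            break
        clusters.append(sorted(best))
        remaining = [g for g in remaining if g not in best]  # step 3
    return clusters, remaining

# Hypothetical one-point profiles so the distances are easy to read.
genes = {"a": [0.0], "b": [1.0], "c": [2.0],
         "d": [10.0], "e": [11.0], "f": [50.0]}
clusters, leftover = qt_clust(genes, max_diameter=2.0, min_size=2)
print(clusters, leftover)
```

Unlike hierarchical clustering, this approach can leave a gene behind: here gene "f" is too far from everything to form a cluster of the minimum size.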

=== Questions to Consider: ===
*How do you compare one thing to a group of things?
*How can we track genes that match with a transcription factor?

'''''Moving Forward:'''''
*Remember, there is no one perfect, correct answer. Therefore, chase things that are of interest to you and cluster; however, practice restraint.
*It will be important to track genes that match with a transcription factor. Although a transcription factor might be small, big changes may still correlate with it.
*Gene ontology terms will help the clustering process.

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

February 9, 2016

2016-03-09T04:41:45Z

Asgruber: /* Intensity Plots */

February 9, 2016

2016-03-09T04:40:51Z

Asgruber:

== Classwork ==

=== Clustering Activity ===
==== Clustering ====
'''''Clustering:''''' grouping genes and samples together and presenting them in a specific order.

Clustering is used as a data reduction analysis. It is representative of data points rather than an entire data set. When clustering, we seek to gain an understanding of patterns in a data set, so that they may be tested statistically. While analyzing patterns, it is important to consider the utility of log transformations, co-regulations, and direct/indirect relationships of genes. Both negative and positive correlations can be interesting and lead to important discoveries.

'''''Hierarchical Clustering:''''' joins the two most similar genes, then the next two most similar genes or cluster of genes until all genes have been joined.

In hierarchical clustering, after two genes or cluster of genes are joined, they cannot be pulled apart regardless of what future discoveries in data reveal. The biggest problem with hierarchical clustering is that it does not consider all data components together. Furthermore, no gene is left behind in hierarchical clustering; correlations begin with a value of 1 and end with a value of -1.

'''''K-means Clustering:''''' specifies how many clusters to form by randomly assigning each gene to one of k different clusters.

In K-means clustering, the average expression of all genes in each cluster is used to create k pseudo genes. Genes can be rearranged by assigning each one to the cluster represented by the pseudo gene to which it is most similar. K-means clustering can be repeated until there is convergence.

'''''Supervised Clustering:''''' finds genes in expression file whose patterns are highly similar to the desired gene or pattern.

Supervised clustering adds the closest gene first. Then, the gene closest to all of the genes already in a cluster is added. This process continues as long as the added gene is within the specified distance of genes already in cluster. The specified distance from one gene to a set of genes can be defined as the maximum, minimum, or average of all distances to individual members of the set (complete, single, and average linkage, respectively).

'''''Cutting the Tree:''''' the process of grouping genes by determining a threshold value in the dendrogram.

In cutting the tree, cut the dendrogram at different points and see what genes or clusters of genes are still clustered together. Genes that are still together are part of a cluster. Different clusters arise depending on where the tree was cut.

==== Intensity Plots ====
'''''Intensity plots''''' compare gene expression profiles. Proximity measures include: correlation, Euclidean distance, inner product XY, Hamming distance, L1 distance, and dissimilarities that may or may not be metrics.

We want our intensity plots to compare the genes and expression patterns between fed and non-fed snakes.

In order to measure the similarity or dissimilarity to the cluster, one must determine which linkage method to use.

'''''Linkage Methods:'''''
*''Centroid linkage:'' define the cluster by taking the average of the cluster's components and then treat the average like an individual gene to compare it to other genes.
*''Average linkage:'' average the distances from the gene of interest to every gene included in a cluster.
*''Complete and single linkage:'' use a max/min approach, measuring the gene's distance to the farthest gene in the cluster (complete) or to the closest gene in the cluster (single).

One can also use QTclust instead of a heat map with the following steps:
#Each gene builds a supervised cluster
#Gene with "best" list, and genes in its list, becomes next cluster
#Remove these genes from consideration, and repeat
#Stop when all genes are clustered, or largest cluster is smaller than user-specified threshold

=== Questions to Consider: ===
*How do you compare one thing to a group of things?
*How can we track genes that match with a transcription factor?

'''''Moving Forward:'''''
*Remember, there is no one perfect, correct answer. Therefore, chase things that are of interest to you and cluster; however, practice restraint.
*It will be important to track genes that match with a transcription factor. Although a transcription factor might be small, big changes may still correlate with it.
*Gene ontology terms will help the clustering process.

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

March 8, 2016

2016-03-08T19:54:21Z

Asgruber:

INCLUDE INFO FROM DR. CAMPBELL AND DR. HEYER EMAIL FROM SB

When you write a paper, think about the images you need to tell a story.
List of 40 genes from Castoe
Look at POUF converter

Save as Contig77Foxhead

Cross-reference Kathryn's list from correlation clustering with Castoe's list from Dr. Campbell:
*Genes that show up on both lists had sustained differential expression from feeding to 6 hours (a gene in both lists is differentially expressed at both time points).
*Genes only in Castoe's list do not interest us.
*Genes only in our list interest us greatly because they are turned on at feeding and turned off at 6 hours post-feeding.

Contig: a piece of DNA stitched together. The numbers give the size of the contig in bases.
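
The cross-referencing itself is a set comparison. A small sketch, with made-up gene names standing in for the two real lists and a hypothetical function name:

```python
def cross_reference(our_genes, castoe_genes):
    """Split genes by which list(s) they appear in."""
    ours, castoe = set(our_genes), set(castoe_genes)
    return {
        # In both lists: differentially expressed at both time points.
        "sustained": sorted(ours & castoe),
        # Only in Castoe's list: not of interest here.
        "castoe_only": sorted(castoe - ours),
        # Only in our list: on at feeding, off by 6 hours post-feeding.
        "ours_only": sorted(ours - castoe),
    }

# Hypothetical gene names, not the actual lists.
our_list = ["geneA", "geneB", "geneC"]
castoe_list = ["geneB", "geneC", "geneD"]
result = cross_reference(our_list, castoe_list)
print(result)
```

The "ours_only" group is the one to follow up on first.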

http://gcat.davidson.edu/mediawiki-1.19.1/index.php/File:March_8_Normalized_correlation_clustering.png

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

March 8, 2016

2016-03-08T19:12:22Z

Asgruber:

INCLUDE INFO FROM DR. CAMPBELL AND DR. HEYER EMAIL FROM SB

When you write a paper, think about the images you need to tell a story.
List of 40 genes from Castoe
Look at POUF converter

http://gcat.davidson.edu/mediawiki-1.19.1/index.php/File:March_8_Normalized_correlation_clustering.png

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

March 8, 2016

2016-03-08T19:11:17Z

Asgruber:

INCLUDE INFO FROM DR. CAMPBELL AND DR. HEYER EMAIL FROM SB

When you write a paper, think about the images you need to tell a story.
List of 40 genes from Castoe
Look at POUF converter

[[File:March 8 normalized clustering]]

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

File:March 8 Normalized correlation clustering.png

2016-03-08T19:09:56Z

Asgruber: Correlation clustering after accounting for normalization

Correlation clustering after accounting for normalization

March 8, 2016

2016-03-08T19:04:35Z

Asgruber:

INCLUDE INFO FROM DR. CAMPBELL AND DR. HEYER EMAIL FROM SB

When you write a paper, think about the images you need to tell a story.
List of 40 genes from Castoe
Look at POUF converter

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]

March 8, 2016

2016-03-08T15:54:44Z

Asgruber:

INCLUDE INFO FROM DR. CAMPBELL AND DR. HEYER EMAIL FROM SB

[http://gcat.davidson.edu/mediawiki-1.19.1/index.php/Ashlyn Ashlyn's Main Page]