JP Feb 18 16
Revision as of 19:32, 18 February 2016
Drs. C & H findings:
http://www.bio.davidson.edu/courses/Bio343/2016/Thursday_18Feb_2016.txt
Used Euclidean distance for clustering. Changed from z-scaled values to absolute values.
Then tried correlation distance (1 - correlation) instead: the clustering was different, and so was the dendrogram.
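A minimal sketch of why the two distance measures can group the same genes differently. It uses made-up six-value profiles (one value per snake, an assumption) and plain Python, not whatever clustering software Drs. C & H used. Euclidean distance compares absolute levels, so a gene with the same shape but ten times the expression looks far away; 1 - correlation compares only shape. On z-scored profiles (mean 0, population SD 1) the squared Euclidean distance equals 2n(1 - r), so the two rank pairs the same way, which may be why moving away from z-scores changed the picture.

```python
import math

def euclidean(a, b):
    """Straight-line distance; sensitive to absolute expression level."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length, non-constant profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def correlation_distance(a, b):
    """1 - correlation: 0 for identical shape, up to 2 for opposite shape."""
    return 1 - pearson(a, b)

# Hypothetical expression profiles across snakes 1-6 (made-up numbers).
gene_a = [1, 2, 4, 8, 4, 2]
gene_b = [10, 20, 40, 80, 40, 20]  # same shape as gene_a, 10x higher
gene_c = [1, 2, 4, 1, 4, 2]        # similar level to gene_a, different shape

# Euclidean distance puts gene_a closer to gene_c ...
print(euclidean(gene_a, gene_b), euclidean(gene_a, gene_c))
# ... while correlation distance puts gene_a closer to gene_b.
print(correlation_distance(gene_a, gene_b), correlation_distance(gene_a, gene_c))
```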
They are now working on supervised clustering: making a CSV to export gene names based on the clustering. Given a seed gene, the output is a list of the genes correlated with it.
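The seed-gene step could look something like this sketch: compute the Pearson correlation of every gene's profile with the seed's, keep those above a cutoff, and write them to CSV. The 0.8 cutoff, the gene names, and the profiles are hypothetical, and this is not necessarily how Drs. C & H are implementing it. Genes with a constant profile would need to be filtered out first, since their correlation is undefined.

```python
import csv
import io
import math

def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def genes_correlated_with(seed, profiles, min_r=0.8):
    """Return (gene, r) pairs correlated with the seed at r >= min_r, best first."""
    seed_profile = profiles[seed]
    hits = []
    for gene, profile in profiles.items():
        if gene == seed:
            continue
        r = pearson(seed_profile, profile)
        if r >= min_r:
            hits.append((gene, r))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

def write_csv(hits, handle):
    """Write the hits as a two-column CSV (gene, correlation)."""
    writer = csv.writer(handle)
    writer.writerow(["gene", "correlation"])
    for gene, r in hits:
        writer.writerow([gene, f"{r:.3f}"])

# Hypothetical profiles; geneC is anti-correlated with the seed and is dropped.
profiles = {
    "geneA": [1, 2, 4, 8, 4, 2],
    "geneB": [2, 4, 8, 16, 8, 4],
    "geneC": [8, 4, 2, 1, 2, 4],
    "geneD": [1, 2, 3, 7, 4, 2],
}
hits = genes_correlated_with("geneA", profiles)
buffer = io.StringIO()  # swap for open("seed_geneA.csv", "w", newline="") to save a file
write_csv(hits, buffer)
print(buffer.getvalue())
```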
- Transcription factors are not usually highly transcribed, so their values are small; we are looking for genes that are transcribed after feeding.
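Because transcription factors have small values, a filter on absolute expression would drop them, while ranking on the change after feeding keeps them. A hypothetical sketch, assuming each gene has one fasted and one fed value (the notes do not say which snakes are which) and using a pseudocount of 1 so zeros do not produce infinite ratios:

```python
import math

def log2_fold_change(fasted, fed, pseudocount=1.0):
    """log2((fed + p) / (fasted + p)); the pseudocount keeps zeros finite."""
    return math.log2((fed + pseudocount) / (fasted + pseudocount))

def induced_after_feeding(levels, min_lfc=1.0):
    """Genes whose level at least doubles after feeding (min_lfc=1.0 means 2x)."""
    return [gene for gene, (fasted, fed) in levels.items()
            if log2_fold_change(fasted, fed) >= min_lfc]

# Hypothetical (fasted, fed) values; not real data.
levels = {
    "high_housekeeping": (500.0, 520.0),  # high but essentially unchanged
    "low_tf_like": (1.0, 7.0),            # low but clearly induced
    "not_expressed": (0.0, 0.0),
}
print(induced_after_feeding(levels))
```

The pseudocount also damps fold changes for very small values, so its size is a judgment call.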
GAMEPLAN:
Our snake 1-6 data (Excel files) were mapped to Todd's python genome (text file) to associate sequences with gene names. If we run Todd's "protein of unknown function" sequences through Blast2GO, we can get gene names and GO terms for these unknown proteins. Because these sequences are labeled, we can match each label (e.g. "...unknown_function_20") to the row with the same label in our output, along with its frequencies, and find the GO terms for it.
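A sketch of the label-matching step, assuming Todd's file is FASTA (as the Contig263 record below suggests) and that our Excel output has been exported so the label is the first column. The FASTA text here is a placeholder, not Todd's real sequence; the row is the one from our Excel file.

```python
def read_fasta(text):
    """Parse FASTA text into {header without '>': sequence}."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records

def match_unknowns(our_rows, todd_records, tag="Protein_of_unknown_function"):
    """Map each 'unknown function' label in our output to (our row, Todd's sequence)."""
    matched = {}
    for row in our_rows:
        label = row[0]
        if tag in label and label in todd_records:
            matched[label] = (row, todd_records[label])
    return matched

# Placeholder FASTA (hypothetical sequences) and one real row from our Excel output.
todd_fasta = """>Contig1001_Protein_of_unknown_function_2387
ATGAAACCCTGA
>Contig9_SomeAnnotatedGene
ATGTGA
"""
our_rows = [
    "Contig1001_Protein_of_unknown_function_2387 "
    "Contig1001_Protein_of_unknown_function_2387 1077 1007.87 0 0 0".split(),
]
matched = match_unknowns(our_rows, read_fasta(todd_fasta))
for label, (row, seq) in matched.items():
    print(label, row[2:], seq)
```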
We believe this method will yield new results because Todd annotated his genome a while ago, and new information may have become available since then.
For example, Todd's data contains this record:
>Contig263_Protein_of_unknown_function
ATGATGATAATAACGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATG
ATGATAACAACAATAATACACATTAAAAGTATCTCAAACCGACAGAGACATAAGGGGATG
GGAAATCTCCTTAGACTGCTCTTTGAGATGAACCTTGGTATAGACTTGGTGGACTTTGGA
CTTTTCTCCAGCTCTCTGGATTATCTCAAGTGGCTTACCTCAAGATTGCAGATCTTGTCA
TGA
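Before sending a sequence to Blast2GO, it may be worth confirming that it is an intact open reading frame. A small check on the Contig263 record above, reading frame 1 only and assuming the standard stop codons TAA, TAG and TGA:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# The Contig263 record from Todd's file, one string per FASTA line.
record_lines = [
    "ATGATGATAATAACGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATG",
    "ATGATAACAACAATAATACACATTAAAAGTATCTCAAACCGACAGAGACATAAGGGGATG",
    "GGAAATCTCCTTAGACTGCTCTTTGAGATGAACCTTGGTATAGACTTGGTGGACTTTGGA",
    "CTTTTCTCCAGCTCTCTGGATTATCTCAAGTGGCTTACCTCAAGATTGCAGATCTTGTCA",
    "TGA",
]

def codons(seq):
    """Split seq into consecutive triplets, dropping any incomplete tail."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def looks_like_complete_orf(seq):
    """True if seq starts with ATG, has length divisible by 3, ends in a stop
    codon, and has no earlier in-frame stop codon (frame 1 only)."""
    if len(seq) % 3 != 0 or not seq.startswith("ATG"):
        return False
    cs = codons(seq)
    return cs[-1] in STOP_CODONS and not any(c in STOP_CODONS for c in cs[:-1])

seq = "".join(record_lines)
print(looks_like_complete_orf(seq))
```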
We gave this record a number and ran our data through Todd's data set. Here is a row from our Excel file (not the same protein as above):
Contig1001_Protein_of_unknown_function_2387 Contig1001_Protein_of_unknown_function_2387 1077 1007.87 0 0 0
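The notes do not say how the numbers were assigned. One hypothetical scheme is a running counter appended to every "Protein_of_unknown_function" header, which makes each label unique so it can be matched later; it will not reproduce the actual number 2387.

```python
import itertools

def number_unknown_headers(headers, tag="Protein_of_unknown_function"):
    """Append _1, _2, ... to every header containing `tag`; leave others unchanged."""
    counter = itertools.count(1)
    renamed = []
    for header in headers:
        if tag in header:
            renamed.append(f"{header}_{next(counter)}")
        else:
            renamed.append(header)
    return renamed

headers = [
    "Contig263_Protein_of_unknown_function",
    "Contig264_SomeAnnotatedGene",  # hypothetical annotated header
    "Contig1001_Protein_of_unknown_function",
]
print(number_unknown_headers(headers))
```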
Now, if we run Todd's sequence for "Protein...2387" through Blast2GO, we can get GO terms for this previously unknown protein and attribute those terms to the matching protein in our data.
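To send only the sequences we care about to Blast2GO, we could pull them out by their numeric suffix and write them as FASTA, which Blast2GO accepts as input. A sketch; the records and placeholder sequences are hypothetical, and the 60-character line width simply matches Todd's file:

```python
def select_by_suffix(records, suffixes):
    """Keep records whose label ends with '_<suffix>', e.g. '_2387'."""
    wanted = tuple(f"_{s}" for s in suffixes)
    return {label: seq for label, seq in records.items() if label.endswith(wanted)}

def to_fasta(records, width=60):
    """Format {label: sequence} as FASTA text, wrapping sequences at `width`."""
    lines = []
    for label, seq in records.items():
        lines.append(f">{label}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"

# Hypothetical records with placeholder sequences.
records = {
    "Contig1001_Protein_of_unknown_function_2387": "ATG" + "AAA" * 40 + "TGA",
    "Contig5_Protein_of_unknown_function_20": "ATGTGA",
}
fasta = to_fasta(select_by_suffix(records, ["2387"]))
print(fasta)  # save with open("for_blast2go.fasta", "w").write(fasta) if needed
```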