DM Notes 2.18.16
Dr. Campbell and Dr. Heyer's progress: Looking at the heat map. They have been adjusting the analysis so that Euclidean distance no longer determines which genes are grouped together, since two genes can differ in magnitude and still be correlated. Similarity is now measured by correlation.
- Removed the scaling but used absolute values instead. This yields about two horizontal rows (blue figure).
- When they eliminated Euclidean distance, the first attempt yielded a large red box with two rows showing other colors. They changed the coloring to show values on a normalized scale. Significance cutoff: p <= .01.
- Code results are posted online under 2.18 (Dr. C and Dr. H results). The code writes CSVs of gene names. They are working on the question "if I select one gene, which genes are most closely related to it?" The current CSV has 28 genes.
- Contigs on that list are good candidates to pursue further.
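The switch from Euclidean distance to correlation, and the "select one gene, find its closest genes" question, can be illustrated with a small sketch. It assumes Pearson correlation and a hypothetical cutoff of r >= 0.9; the notes do not say which correlation measure or threshold the posted code uses, and the names `pearson`, `related_genes`, and `write_hits` are illustrative, not taken from Dr. C and Dr. H's code. The p <= .01 filter is not reproduced here (for example, `scipy.stats.pearsonr` returns a p-value alongside r).

```python
import csv
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    ss_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if ss_x == 0 or ss_y == 0:
        return float("nan")  # a flat profile has no defined correlation
    return cov / (ss_x * ss_y)


def euclidean(x, y):
    """Straight-line distance: sensitive to differences in magnitude."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def correlation_distance(x, y):
    """1 - r: zero for perfectly correlated profiles, whatever their magnitude."""
    return 1.0 - pearson(x, y)


def related_genes(expr, query, min_r=0.9):
    """Genes whose profile correlates with `query` at r >= min_r, strongest first.

    min_r=0.9 is a hypothetical cutoff chosen for illustration.
    """
    target = expr[query]
    hits = []
    for gene, profile in expr.items():
        if gene == query:
            continue
        r = pearson(target, profile)
        if r >= min_r:  # NaN comparisons are False, so flat profiles drop out
            hits.append((gene, r))
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits


def write_hits(path, hits):
    """Write the hits as a two-column CSV (gene, pearson_r)."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["gene", "pearson_r"])
        writer.writerows(hits)


if __name__ == "__main__":
    expr = {
        "geneA": [1, 2, 3, 4],
        "geneB": [10, 20, 30, 40],  # same shape as geneA, 10x the magnitude
        "geneC": [4, 3, 2, 1],      # opposite trend
    }
    print(euclidean(expr["geneA"], expr["geneB"]))  # ~49.3: far apart by Euclidean
    print(euclidean(expr["geneA"], expr["geneC"]))  # ~4.47: opposite trend looks "closer"
    print(correlation_distance(expr["geneA"], expr["geneB"]))  # ~0: perfectly correlated
    print(related_genes(expr, "geneA"))
```

The toy profiles show why correlation was preferred: by Euclidean distance, geneC (opposite trend) is nearer to geneA than geneB is, while correlation groups geneA with geneB despite the tenfold difference in magnitude.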
Blast2GO: FASTA sequences->NCBI->GO annotations
Intestine folks: checking their samples to see if they're good.
Liver folks: checking the liver samples.
Our team: Working with the map_to_GO script today. Generated organisms_to_GO_ID.txt with map_to_GO_IDs.py. The script takes a list of organism names, opens the GAFs for those organisms, and associates GO IDs from the GAFs with the genes in our main file (geneNames.txt). The output file has the following columns: organism name, gene name, and then one column for each GO ID associated with the gene symbol.
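A minimal sketch of the mapping step described above, assuming standard GAF 2.x files (tab-separated, "!" comment lines, DB Object Symbol in column 3 and GO ID in column 5) and that names in geneNames.txt match the GAF symbols exactly. This is not map_to_GO_IDs.py itself: `read_gaf`, `map_genes_to_go`, and the organism-to-path dictionary are hypothetical, and skipping NOT-qualified annotations is a choice made here, not something the notes specify.

```python
import csv
from collections import defaultdict


def read_gaf(path):
    """Map gene symbol -> set of GO IDs from one GAF file."""
    symbol_to_go = defaultdict(set)
    with open(path) as fh:
        for line in fh:
            if line.startswith("!") or not line.strip():
                continue  # header/comment or blank line
            cols = line.rstrip("\n").split("\t")
            if len(cols) < 5:
                continue  # malformed row
            symbol, qualifier, go_id = cols[2], cols[3], cols[4]
            if "NOT" in qualifier.split("|"):
                continue  # assumption: drop negated annotations
            symbol_to_go[symbol].add(go_id)
    return symbol_to_go


def map_genes_to_go(gene_names, gaf_paths, out_path):
    """Write tab-separated rows: organism, gene, GO ID, GO ID, ...

    gaf_paths is a hypothetical dict of organism name -> GAF file path.
    Genes with no annotation in a given organism's GAF are not written.
    """
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        for organism, path in gaf_paths.items():
            annotations = read_gaf(path)
            for gene in gene_names:
                go_ids = annotations.get(gene)
                if go_ids:
                    writer.writerow([organism, gene, *sorted(go_ids)])
```

For example, `gene_names` could be read from geneNames.txt (assumed to hold one name per line). Collecting GO IDs in a set means an ID cited by several evidence lines in the GAF is written only once per gene.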
We should follow up by checking whether gene names are duplicated across organisms, and by counting how many genes in our file are accounted for. We also need to look at DESeq, see how to work with our GO IDs there, and decide whether we need to convert the IDs to GO terms. How far back do we need to map the GO terms to be able to group genes in a significant way?
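The first two follow-up checks (duplicate gene names across organisms, and how many of our genes are accounted for) could start from a sketch like this. It assumes the output file is tab-separated with the organism in the first column and the gene name in the second; the notes give the column order but not the delimiter, and `summarize_mapping` is a hypothetical helper.

```python
import csv
from collections import defaultdict


def summarize_mapping(mapping_path, gene_names):
    """Find genes mapped in more than one organism and the fraction of our genes covered."""
    organisms_per_gene = defaultdict(set)
    with open(mapping_path, newline="") as fh:
        for row in csv.reader(fh, delimiter="\t"):  # assumption: tab-delimited
            if len(row) >= 2:
                organisms_per_gene[row[1]].add(row[0])  # gene -> organisms
    duplicates = {
        gene: sorted(orgs)
        for gene, orgs in organisms_per_gene.items()
        if len(orgs) > 1
    }
    wanted = set(gene_names)
    covered = wanted.intersection(organisms_per_gene)  # genes with any GO mapping
    coverage = len(covered) / len(wanted) if wanted else 0.0
    return duplicates, covered, coverage
```

Genes in `duplicates` are those whose name appears under more than one organism, and `1 - coverage` is the share of geneNames.txt that received no GO ID from any of the GAFs searched.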
Dylan Maghini