DM Notes 2.16.16
From GcatWiki
Latest revision as of 18:33, 18 February 2016
Continuing from last class:
- Pull the organism GAF files from the GO database. (Saipriya and I pulled the organism names; there are 443 different ones, though some are viruses, fungi, bacteria, etc.) The organisms covered by GO account for ~75% of the genes in our file.
- Pull the gene names for each organism from the contig file and associate them with the GO IDs from the GAF files
- Associate GO IDs with terms
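The GAF and term-lookup steps above might be sketched as follows. This is a minimal illustration, not the script we actually wrote. It assumes GAF 2.x input (tab-separated; column 3 is the gene symbol, column 4 the qualifier, column 5 the GO ID; lines starting with `!` are comments) and a GO `.obo` file for the ID-to-name lookup. The function names `parse_gaf`, `parse_obo_names`, and `go_terms_for` are hypothetical.

```python
from collections import defaultdict


def parse_gaf(lines):
    """Map gene symbol -> set of GO IDs from GAF 2.x lines (hypothetical helper)."""
    symbol_to_go = defaultdict(set)
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # header comment or blank line
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # skip malformed rows
        if "NOT" in cols[3].split("|"):
            continue  # a NOT qualifier negates the annotation
        symbol_to_go[cols[2]].add(cols[4])
    return dict(symbol_to_go)


def parse_obo_names(lines):
    """Map GO ID -> term name, reading only [Term] stanzas of a .obo file."""
    names = {}
    current_id = None
    in_term = False
    for line in lines:
        line = line.strip()
        if line.startswith("["):
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id: "):
            current_id = line[4:]
        elif in_term and line.startswith("name: ") and current_id:
            names[current_id] = line[6:]
    return names


def go_terms_for(symbol, symbol_to_go, go_names):
    """Return sorted (GO ID, term name) pairs for one gene symbol."""
    return sorted(
        (go_id, go_names.get(go_id, "unknown"))
        for go_id in symbol_to_go.get(symbol, ())
    )
```

Skipping `NOT` rows keeps a gene from being credited with a function the annotation explicitly denies.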
Last Thursday, we pulled the organisms referenced in the python genome file. We also counted how many contigs listed each organism and found that the ~17 most frequently annotated organisms accounted for >75% of the genes in the file.
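The count-and-coverage check described above can be sketched like this. It assumes the organism names have already been extracted into a list with one entry per contig; how they come out of the contig file is not specified here, and `organisms_covering` is a hypothetical name.

```python
from collections import Counter


def organisms_covering(organism_per_contig, fraction=0.75):
    """Return the most common organisms whose combined contig count first
    exceeds `fraction` of all contigs, plus the fraction actually covered."""
    counts = Counter(organism_per_contig)
    total = sum(counts.values())
    covered = 0
    selected = []
    for organism, n in counts.most_common():  # largest counts first
        selected.append((organism, n))
        covered += n
        if covered > fraction * total:
            break
    return selected, covered / total
```

The length of `selected` is the number of organisms needed to pass the threshold.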
Today, we are writing a script that pulls the organism and the gene symbol from each contig and puts them into a dictionary of dictionaries, formatted as follows:
organism_dictionary[organism name] = {gene symbols}
gene_symbols[gene symbol] = [GO terms associated with gene symbol]
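One reading of the two lines above is a single nested dictionary: organism → {gene symbol → list of GO IDs}. The sketch below assumes two hypothetical inputs: `contig_hits`, an iterable of (organism, gene symbol) pairs taken from the contigs, and `gaf_by_organism`, a per-organism mapping of gene symbol → set of GO IDs (for example, parsed from that organism's GAF file).

```python
from collections import defaultdict


def build_organism_dictionary(contig_hits, gaf_by_organism):
    """Build organism -> {gene symbol -> sorted list of GO IDs} (hypothetical helper)."""
    organism_dictionary = defaultdict(dict)
    for organism, symbol in contig_hits:
        # Look up this symbol in the organism's own annotations; missing -> empty
        go_ids = gaf_by_organism.get(organism, {}).get(symbol, set())
        organism_dictionary[organism][symbol] = sorted(go_ids)
    return dict(organism_dictionary)
```

A symbol mapped to an empty list is a gene for which that organism's annotations had no GO IDs.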
This didn't entirely work, so we're formulating a new approach.
Back to home Dylan Maghini