DM Notes 2.16.16

From GcatWiki

Continuing from last class:

  • Pull each organism's GAF from the GO database. (Saipriya and I pulled the organism names. There are 443 different ones, though some are viruses, fungi, bacteria, etc.) The organisms that GO has annotations for account for ~75% of the genes in our file.
  • Pull gene names for each organism from contig file and associate those w/ the GAF GO ID#s
  • Associate GO IDs with terms
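A minimal Python sketch of steps 2 and 3, assuming the annotations arrive as standard tab-separated GAF lines (comment lines start with "!", column 3 is the gene symbol, column 5 is the GO ID). The function name, the sample rows, and the term names in go_names are made up for illustration and are not taken from our files.

```python
from collections import defaultdict


def parse_gaf_lines(lines):
    """Map each gene symbol to the set of GO IDs annotated to it."""
    symbol_to_go = defaultdict(set)
    for line in lines:
        # Skip blank lines and GAF header/comment lines.
        if not line.strip() or line.startswith("!"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # malformed row, ignore it
        symbol, go_id = fields[2], fields[4]
        symbol_to_go[symbol].add(go_id)
    return dict(symbol_to_go)


# Hypothetical sample rows (only the first few GAF columns are filled in).
sample_gaf = [
    "!gaf-version: 2.1",
    "DB\tID1\tGENE1\t\tGO:0000001\tREF\tIEA",
    "DB\tID1\tGENE1\t\tGO:0000002\tREF\tIEA",
    "DB\tID2\tGENE2\t\tGO:0000001\tREF\tIEA",
]
symbol_to_go = parse_gaf_lines(sample_gaf)

# Hypothetical GO ID -> term lookup; a real one would be built from the GO ontology file.
go_names = {"GO:0000001": "placeholder term A", "GO:0000002": "placeholder term B"}
gene1_terms = sorted(go_names[go_id] for go_id in symbol_to_go["GENE1"])
```

Using a set per symbol means duplicate annotation rows for the same gene collapse to a single GO ID.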

Last Thursday, we pulled the organisms referenced in the python genome file. We also counted how many contigs listed each organism, and found that the ~17 most frequently listed organisms accounted for >75% of the genes in the file.
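The counting step can be sketched roughly as follows, assuming we already have one organism name per contig in a list. The function name and the sample data are hypothetical; this is not the script we actually ran.

```python
from collections import Counter


def organisms_covering(organism_per_contig, target=0.75):
    """Return the most frequently listed organisms that together reach
    the target share of contigs, plus the share they actually cover."""
    counts = Counter(organism_per_contig)
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    covered = 0
    top = []
    # most_common() yields organisms from most to least frequently listed.
    for organism, n in counts.most_common():
        top.append(organism)
        covered += n
        if covered / total >= target:
            break
    return top, covered / total


# Made-up example: 10 contigs spread over three organisms.
sample = ["A"] * 5 + ["B"] * 3 + ["C"] * 2
top_organisms, share = organisms_covering(sample)
```

With the made-up sample, organisms A and B together cover 8 of the 10 contigs, so the loop stops after two organisms.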

Today, we are writing a script that will pull the organism and the gene symbol from each contig, and put them into a dictionary of dictionaries. It will be formatted as follows:

organism_dictionary[organism name] = {gene symbols}

gene_symbols[gene symbol] = [GO terms associated w/ gene symbol]

This didn't entirely work, so we're formulating a new approach.
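For reference, the dictionary-of-dictionaries layout described above might look like this in Python. This is only a sketch of the intended data structure, not the script from class: the (organism, gene symbol, GO terms) records, the function name, and the sample values are all assumptions, since the contig file format is not shown here.

```python
from collections import defaultdict


def build_organism_dictionary(records):
    """Build organism_dictionary[organism][gene symbol] = [GO terms]."""
    organism_dictionary = defaultdict(dict)
    for organism, symbol, go_terms in records:
        gene_symbols = organism_dictionary[organism]
        terms = gene_symbols.setdefault(symbol, [])
        for term in go_terms:
            # Keep each GO term once per gene symbol, in first-seen order.
            if term not in terms:
                terms.append(term)
    return dict(organism_dictionary)


# Hypothetical records, as if already parsed out of the contig file.
sample_records = [
    ("Organism one", "GENE1", ["GO:0000001"]),
    ("Organism one", "GENE1", ["GO:0000001", "GO:0000002"]),
    ("Organism one", "GENE2", []),
    ("Organism two", "GENE1", ["GO:0000003"]),
]
organism_dictionary = build_organism_dictionary(sample_records)
```

Keeping the gene symbols nested under each organism means the same symbol in two organisms stays separate, which matters when the GO annotations differ between them.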

Back to home Dylan Maghini