Remove special characters from contig_to_GO_ID.txt file.

Dr. Heyer's suggestions:

  • Had to change “ (quote) to nothing, leaving no space
  • Had to change ‘ (apostrophe) to _prime
  • Had to change ‘‘ (two apostrophes) to ^
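
A minimal sketch of how these three substitutions could be scripted, in case the file has to be regenerated. It is not the procedure we actually used. The names clean_name and clean_file are hypothetical, and treating curly and straight quotes the same way is an assumption. The two-apostrophe rule runs first so that a pair is not turned into two _prime suffixes.

```python
# Assumption: curly quotes/apostrophes should be handled like their straight forms.
CURLY_TO_STRAIGHT = str.maketrans({
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2018": "'", "\u2019": "'",   # curly single quotes / apostrophes
})

# Order matters: the two-apostrophe rule must come before the single-apostrophe rule.
RULES = [
    ("''", "^"),        # two apostrophes -> caret
    ('"', ""),          # quote -> removed, no space left behind
    ("'", "_prime"),    # apostrophe -> _prime
]


def clean_name(text):
    """Apply the substitutions from the list above to one string."""
    text = text.translate(CURLY_TO_STRAIGHT)
    for old, new in RULES:
        text = text.replace(old, new)
    return text


def clean_file(in_path, out_path):
    """Write a cleaned copy of in_path to out_path, line by line."""
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(clean_name(line))
```

Writing to a separate output path keeps the original contig_to_GO_ID.txt untouched, so the cleaned copy can be compared against it before it is sent on.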

We did this. It worked. Sent contig_to_GO_ID.txt file to Campbell/Heyer.

We were able to put the reference genome GAF for proteins of unknown function into the organism_to_GO_ID.txt file.

We should check what happens for genes that do not have any GO IDs, if there are any. We should follow up by finding out whether there are duplicate gene names across organisms and how many of the genes in our file are accounted for. We also need to look at DESeq to see how to work with our GO IDs, and whether we need to convert the IDs to GO terms. How far back do we need to map the GO terms to be able to group in a significant way?
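
A rough sketch of how the first three checks could be run. It assumes the mapping file is tab-separated with the gene or contig name in the first column and one GO ID in the second; we have not confirmed that layout. All function names are hypothetical.

```python
from collections import defaultdict


def parse_gene_to_go(lines):
    """Map gene name -> set of GO IDs from 'gene<TAB>GO_ID' lines (assumed layout)."""
    gene_to_go = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        gene = fields[0].strip()
        if not gene:
            continue  # skip blank lines
        gene_to_go[gene]  # register the gene even if it has no GO ID
        if len(fields) > 1 and fields[1].strip():
            gene_to_go[gene].add(fields[1].strip())
    return dict(gene_to_go)


def genes_without_go(gene_to_go):
    """Genes that appear in the file but carry no GO ID."""
    return sorted(g for g, ids in gene_to_go.items() if not ids)


def duplicated_across_organisms(genes_by_organism):
    """Gene names that occur in more than one organism."""
    owners = defaultdict(set)
    for organism, genes in genes_by_organism.items():
        for gene in genes:
            owners[gene].add(organism)
    return {g: sorted(orgs) for g, orgs in owners.items() if len(orgs) > 1}


def coverage(our_genes, gene_to_go):
    """Return (genes with at least one GO ID, total distinct genes)."""
    distinct = set(our_genes)
    annotated = {g for g in distinct if gene_to_go.get(g)}
    return len(annotated), len(distinct)
```

With a real file, the lines could come from open("contig_to_GO_ID.txt", encoding="utf-8"). The questions about DESeq and about how far back to map GO terms are not addressed by this sketch.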

