Beta-galactosidase (Olivia Ho-Shing)

From GcatWiki
Revision as of 13:52, 9 September 2009 by Olhoshing

For H. mukohataei, I chose a well-known predicted gene involved in sugar metabolism: 644033004, beta-galactosidase/beta-glucuronidase (EC:3.2.1.23).

To verify this predicted protein, I used:

  1. BLASTn
  2. BLASTp
  3. A search for a Shine-Dalgarno sequence within 50 bp upstream of the gene
  4. GC Calculator

BLASTn

JGI usually highlights the start and stop codons in red and any upstream or downstream sequence in green. For this nucleotide sequence, however, no start codon was highlighted: the first codon of the sequence was TTG rather than the canonical ATG.

Here are the distribution and the alignment of the BLASTn hits:
Figure: distribution of BLASTn hits (Bgalactosidase-Blastn.png)
Figure: BLASTn alignment (Bgalactosidase-Blastn-align.png)

The first relevant BLASTn hit was "Synthetic construct beta-galactosidase (lacZnls12co) gene, complete cds":

  Query Coverage = 44%
  Score = 206 bits (raw score 228)
  E-value = 4e-49
  Identities = 658/1008 (65%)
  Gaps = 73/1008 (7%)
  Strand = Plus/Plus
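
As a quick arithmetic check, the percentages above follow from the counts, both taken over the 1008-column alignment. The second line is the standard BLAST approximation linking the bit score S' to the E-value, where m and n are the effective query and database lengths; those lengths are not reported on this page, so that line is illustrative only.

```latex
\text{Identities} = \frac{658}{1008} \approx 0.653 \;(65\%), \qquad
\text{Gaps} = \frac{73}{1008} \approx 0.072 \;(7\%)

E \approx m \, n \, 2^{-S'}, \qquad S' = 206 \text{ bits}
```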

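These BLASTn hits were not as well aligned as I expected for this protein, and I was surprised that there did not seem to be a definitive start codon. The beginning of the query sequence also did not align with the beginning of the hit described above, but this could simply mean that the gene is not well conserved at its 5' and 3' ends. Nucleotide-level similarity is also expected to be weaker than protein-level similarity, because synonymous codon changes alter the DNA sequence without changing the encoded protein.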

BLASTp

Although the nucleotide sequence given by JGI did not begin with ATG, the amino acid sequence still began with M. This is expected if TTG is the start codon: in prokaryotes, alternative start codons such as TTG and GTG are read by the initiator tRNA as methionine, so a TTG start codon is translated as M. The second codon, AAC, is translated as N, as expected. The BLASTp hits aligned with the query very well. The first amino acid in the hits did not match the M, though: at least the top 10 hits began with L, the amino acid TTG encodes when it is read as an ordinary (non-start) codon.
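
The start-codon behavior described above can be illustrated with a small translator. This is a sketch under stated assumptions, not JGI's pipeline: it uses the standard genetic code (NCBI table 1, whose amino-acid assignments match the bacterial/archaeal table 11), treats only ATG, GTG and TTG as start codons (a simplification), and `translate_orf` is a hypothetical helper.

```python
from itertools import product
from typing import Set

BASES = "TCAG"
# Standard genetic code in TCAG order (first base varies slowest); '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Assumed set of start codons; real prokaryotic tables allow a few more.
START_CODONS: Set[str] = {"ATG", "GTG", "TTG"}


def translate_orf(seq: str) -> str:
    """Translate seq codon by codon, reading a recognized first codon as M
    and stopping (without output) at the first stop codon."""
    seq = seq.upper()
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if i == 0 and codon in START_CODONS:
            protein.append("M")  # initiator tRNA reads the start codon as Met
            continue
        aa = CODON_TABLE[codon]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate_orf("TTGAAC"))     # TTG start read as M, AAC as N
    print(translate_orf("ATGTTGTAA"))  # internal TTG read as L
```

With these inputs the first call prints `MN` and the second prints `ML`: the same TTG codon gives M at the start position and L anywhere else, which matches the M/L mismatch seen in the BLASTp hits.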

Figure: BLASTp hits (Bgalactosidase-blastp.png)

I think the BLASTp results strongly indicate that this gene encodes a beta-galactosidase.

Shine-Dalgarno sequence and GC content

The 50 bp upstream of the gene are: AGCAATCTGCACGCCGGGACAGCGTGACTGCCTCGCCGTGGGTTCGGCGA
I could not identify anything that looked like a Shine-Dalgarno sequence. Shine-Dalgarno sequences are purine-rich (the E. coli consensus is AGGAGG), but this stretch is not especially purine-rich. This may mean that the gene is part of an operon and does not have a Shine-Dalgarno sequence directly upstream.
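
A minimal sketch of this check, assuming the E. coli consensus AGGAGG as the motif (haloarchaeal Shine-Dalgarno sequences may differ) and counting only ungapped matches. `purine_fraction` and `best_sd_match` are hypothetical helpers, not part of any tool used on this page.

```python
from typing import Tuple

UPSTREAM_50 = "AGCAATCTGCACGCCGGGACAGCGTGACTGCCTCGCCGTGGGTTCGGCGA"
SD_CONSENSUS = "AGGAGG"  # assumed motif: E. coli consensus, not a haloarchaeal one


def purine_fraction(seq: str) -> float:
    """Fraction of bases in seq that are purines (A or G)."""
    seq = seq.upper()
    return sum(base in "AG" for base in seq) / len(seq)


def best_sd_match(seq: str, motif: str = SD_CONSENSUS) -> Tuple[int, int]:
    """Slide motif along seq without gaps and return (0-based start,
    number of matching bases) for the first window with the most matches."""
    seq, motif = seq.upper(), motif.upper()
    best_pos, best_matches = 0, -1
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        matches = sum(a == b for a, b in zip(window, motif))
        if matches > best_matches:
            best_pos, best_matches = i, matches
    return best_pos, best_matches


if __name__ == "__main__":
    pos, matches = best_sd_match(UPSTREAM_50)
    window = UPSTREAM_50[pos:pos + len(SD_CONSENSUS)]
    print(f"Best window at {pos}: {window} ({matches}/{len(SD_CONSENSUS)} matches)")
    print(f"Purine fraction: {purine_fraction(UPSTREAM_50):.2f}")
```

For the upstream sequence above, no window matches more than 3 of the 6 consensus positions, and purines make up 0.52 of the bases, which is consistent with not finding an obvious Shine-Dalgarno sequence.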

I used the GC calculator to check the GC content of the gene sequence: it is 67.7%. The average GC content for our genome is 65.64%, but I don't know what the normal GC content of a coding region in our genome is. So I can only conclude that the gene does not appear to be an alien gene in our genome.
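
The GC calculation can be reproduced in a few lines. This is a sketch with a hypothetical `gc_content` helper; because the gene's full sequence is not reproduced on this page, the example runs on the 50 bp upstream sequence instead, and the 67.7% and 65.64% values are simply copied from the text above.

```python
UPSTREAM_50 = "AGCAATCTGCACGCCGGGACAGCGTGACTGCCTCGCCGTGGGTTCGGCGA"

GENE_GC_PERCENT = 67.7     # value reported above for the gene
GENOME_GC_PERCENT = 65.64  # genome average reported above


def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in seq (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(base in "GC" for base in seq)
    return 100.0 * gc / len(seq)


if __name__ == "__main__":
    print(f"Upstream 50 bp GC: {gc_content(UPSTREAM_50):.1f}%")
    diff = GENE_GC_PERCENT - GENOME_GC_PERCENT
    print(f"Gene minus genome average: {diff:+.2f} percentage points")
```

This prints 68.0% for the upstream stretch and a gene-versus-genome difference of +2.06 percentage points.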