Dylan
Myb Transcription Factors
The name Myb is an abbreviation of “myeloblastosis,” an old name for a type of leukemia. Studies have established that the absence of Myb proteins leads to increased mitotic arrest, abnormal chromosome numbers, and faulty spindle formation. The protein localizes to recently replicated DNA in mitotically cycling and endocycling cells and regulates gene expression by binding directly to the DNA. The graphic representation below shows the helical structure of the protein (light blue) as it works in the cell, fitting into the major groove of the DNA double helix (red and yellow).[1]
Myb Protein Structure
Myb proteins are defined by the aptly named Myb domain, a sequence of approximately 50 amino acids. They are further divided into three subfamilies based on the number of times that this domain repeats. A protein may have one (designated ‘MYB1R’), two (‘R2R3-MYB’), or three (‘MYB3R’) copies of the repeat.[3] The repeats are imperfect but highly conserved; analysis of diverse eukaryotic organisms has determined that each Myb repeat is more closely related to the equivalent repeat in other members of the same family than to the other repeats within the same protein.[4] These repeats determine the protein's ability to bind to DNA and subsequently regulate the transcription of genes.[5]
Myb Transcription Factors in Plants
A study conducted by Ban et al. explored the function of Myb transcription factors in apples. The study focused on the regulation of anthocyanins, a class of flavonoid pigments that, among other things, give the fruit its red color.[7] Anthocyanins change color with pH, which also allows them to give blueberries their blue color.[8] Myb proteins were already believed to exert some control over anthocyanin expression,[9] so Ban and colleagues made this regulation the focus of their study. The potential health benefits of anthocyanins are considerable, with laboratory results suggesting positive effects against cancer, aging and neurological diseases, inflammation, diabetes, and bacterial infections.[10] Unfortunately, anthocyanins are poorly retained by the body during digestion; less than 5% of the starting mass is absorbed.[11] Any treatment based on these compounds would therefore need special formulation and perhaps exceptionally high concentrations. Knowledge of the Myb transcription factors could help to produce anthocyanins quickly and cheaply from modified plants.
Current Work
Known Myb sequences, mostly from grape, are being BLASTed against the blueberry genome. Sequences used from grape:
- myb4a (Gene ID: 100233133)
- myb4b (Gene ID: 100245558)
- myb12 (Gene ID: 100260656)
- vvmyba1 (Gene ID: 100233098)
- myba1 (Gene ID: 100255007)
- myba6 (Gene ID: 100243253)
- myba7 (Gene ID: 100265568)
- mybcs1 (Gene ID: 100233122)
- vvmyba2 (Gene ID: 100232838)
Also from Arabidopsis:
- myb30 (TAIR:AT3G28910)
- PAP1 (TAIR:AT1G56650)
- PAP1 alternate (TAIR:AT3G16500)
These genes were BLASTed against three databases containing the known blueberry genome (bb_latest_assembly.fasta, BB_EST_ALL_updated.fasta.cap.contigs, and Illumina_Data) using command-line searches. No hits were found for any gene in any database.
Genes were obtained from the NCBI database.
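For reference, here is a minimal Python sketch of how the nucleotide searches above could be scripted. It only builds the command lines; it does not run them. It assumes the BLAST+ `blastn` program and databases already formatted for it, which may not match the setup actually used. The query file names (`myb4a.fasta`, `PAP1.fasta`), the output naming scheme, and the E-value cutoff are hypothetical.

```python
import shlex

# Database names as listed above.
DATABASES = [
    "bb_latest_assembly.fasta",
    "BB_EST_ALL_updated.fasta.cap.contigs",
    "Illumina_Data",
]


def blastn_command(query_fasta, database, evalue=1e-5):
    """Build a BLAST+ blastn command that writes tabular (outfmt 6) hits.

    The E-value cutoff and output file name are illustrative assumptions.
    """
    out_file = f"{query_fasta}.vs.{database}.tsv"
    return [
        "blastn",
        "-query", query_fasta,
        "-db", database,
        "-evalue", str(evalue),
        "-outfmt", "6",
        "-out", out_file,
    ]


def all_commands(query_fastas, databases=DATABASES):
    """One command per (query, database) pair."""
    return [blastn_command(q, db) for q in query_fastas for db in databases]


if __name__ == "__main__":
    # Hypothetical query files, one FASTA file per gene.
    for cmd in all_commands(["myb4a.fasta", "PAP1.fasta"]):
        print(shlex.join(cmd))
```

Each printed line could then be pasted into a terminal or passed to `subprocess.run`.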
Potential Directions
The next step is to find the amino acid sequence of the highly conserved Myb domain in the Myb transcription factor. Searching with it will require a different BLAST program, tblastn, as described in this section of the NCBI BLAST Program Selection Guide:
- 4.10 "Protein query vs translated database (tblastn)" is useful for finding protein homologs in unannotated nucleotide data.
- A tblastn search allows you to compare a protein sequence to the six-frame translations of a nucleotide database. It can be a very productive way of finding homologous protein coding regions in unannotated nucleotide sequences such as expressed sequence tags (ESTs) and draft genome records (HTG), located in the BLAST databases est and htgs, respectively.
- ESTs are short, single-read cDNA sequences. They comprise the largest pool of sequence data for many organisms and contain portions of transcripts from many uncharacterized genes. Since ESTs have no annotated coding sequences, there are no corresponding protein translations in the BLAST protein databases. Hence a tblastn search is the only way to search for these potential coding regions at the protein level. The HTG sequences, draft sequences from various genome projects or large genomic clones, are another large source of unannotated coding regions.
- Like all translating searches, the tblastn search is especially suited to working with error prone data like ESTs and draft genomic sequences from HTG because it combines BLAST statistics for hits to multiple reading frames and thus is robust to frame shifts introduced by sequencing error.
This tool should make it possible to search a nucleotide sequence database with an amino acid sequence as the query. Once I find the protein sequence of the Myb binding domain, I should be able to search for it within the genome. Myb, here we come!
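To make the "six-frame translation" idea concrete, here is a small Python sketch of the kind of translation tblastn applies to each database sequence before comparing it with the protein query. It is an illustration only, not how BLAST itself is implemented: it uses the standard genetic code, marks stop codons with `*`, and marks any codon containing an ambiguous base with `X`. The toy DNA string is made up.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate from the first base; a trailing partial codon is dropped."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")  # X for codons with ambiguous bases
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Translate both strands in all three reading frames."""
    frames = {}
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


if __name__ == "__main__":
    toy_dna = "ATGGCCTAA"  # made-up example sequence
    for name, protein in six_frame_translations(toy_dna).items():
        print(name, protein)
```

Because every frame on both strands is translated, a protein query can match a coding region without knowing its strand or reading frame in advance.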
Update
The tblastn tool works! I ran a control by copying a nucleotide sequence from the genome and translating it into an amino acid sequence. I then BLASTed that back against the database and got a hit within the expected scaffold. Now I just have to find the amino acid sequence for the Myb domain (hopefully as highly conserved as the literature suggests) and BLAST it using the tblastn command.
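The control check could be automated with something like the Python sketch below. It assumes the tblastn results were saved in BLAST+ tabular format (`-outfmt 6`), which may not be the format actually used. The query name, scaffold names, and numbers in the sample are made up for illustration and are not real results.

```python
# Column names of the default BLAST+ tabular output (-outfmt 6).
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")


def parse_hits(text):
    """Parse tab-separated BLAST hits into a list of dicts."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(FIELDS, line.split("\t")))
        for key in INT_FIELDS:
            row[key] = int(row[key])
        for key in FLOAT_FIELDS:
            row[key] = float(row[key])
        hits.append(row)
    return hits


def best_hit(hits):
    """Return the hit with the highest bit score, or None if there are no hits."""
    return max(hits, key=lambda h: h["bitscore"], default=None)


def control_passed(hits, expected_scaffold):
    """True if the top-scoring hit lies on the scaffold the query was copied from."""
    top = best_hit(hits)
    return top is not None and top["sseqid"] == expected_scaffold


if __name__ == "__main__":
    # Made-up sample output; not real data.
    sample = (
        "ctrl_pep\tscaffold00012\t100.00\t60\t0\t0\t1\t60\t1501\t1680\t1e-35\t125\n"
        "ctrl_pep\tscaffold00340\t41.20\t51\t28\t1\t5\t55\t88\t234\t0.002\t38.1\n"
    )
    print(control_passed(parse_hits(sample), "scaffold00012"))
```

The same parser could later be reused to rank hits from the real Myb domain search by bit score.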
- Multiple alignment of 20 representative R2R3 MYB domains from Arabidopsis and six MYB domains from characterised grape MYB genes. Identical amino acid residues are shaded in yellow and the blue and white boxes indicate the extent of the R2 and R3 repeats. The consensus sequence shown under the alignment was used to search for MYB homologues in the Grape Genome. [12]