ATA box gene transcriptions
Editor-In-Chief: Henry A. Hoff
The ATA box is a variant of the TATA box that appears in the globin and other genes. Where the TATA box has the sequence TATA, the ATA box lacks the first thymine (T). It may be tissue specific.
Expression "of globin genes is tightly regulated. Hemoglobin gene expression is restricted to erythroid cells. The genes are expressed at extremely high levels late in erythroid differentiation, with balanced production of α-globin and β-globin. Paralogous globin genes are expressed at progressive developmental stages. This exquisite regulation is exerted, at least in part, by the binding of specific transcription factors to DNA sequences that serve as cis-regulatory modules (CRMs), such as promoters and enhancers (Maniatis et al. 1987)."[1]
"The major DNA region regulating expression of the globin genes (MRE) is located in an intron of NPRL3 (Higgs et al. 1990)."[1]
"Detailed studies over the past three decades have led to the discovery of numerous CRMs in both the α-globin gene (HBA) and β-globin gene (HBB) clusters [...]. Some are located proximal to and within the genes, such as promoters and internal enhancers (Mellon et al. 1981; Wright et al. 1984; Myers et al. 1986; Antoniou et al. 1988; Wall et al. 1988), and others are located distal to the genes (Grosveld et al. 1987; Talbot et al. 1989; Higgs et al. 1990). For instance, the major regulatory element (MRE) of the HBA gene complex is located distal to the adult HBA genes (∼60 kb upstream in human), residing in an intron of the large NPRL3 gene [...]. Several additional CRMs are present around the MRE (Anguita et al. 2004) both in human and mouse (Fig. 5A,B). A cluster of CRMs called the locus control region (LCR) is found 50–70 kb upstream of the HBB gene (Grosveld et al. 1987; Talbot et al. 1989; Moon and Ley 1990) in human and mouse [...]. These distal regulatory regions are enhancers (Tuan et al. 1989; Ney et al. 1990; Pondel et al. 1992) required for high-level expression of the globin genes (Grosveld et al. 1987; Talbot et al. 1989; Higgs et al. 1990; Bender et al. 2000a; Anguita et al. 2002). They are in regions of open chromatin marked by DNase hypersensitive sites (Forrester et al. 1986, 1990; Vyas et al. 1992; Gourdon et al. 1995), and they can protect against some repressive position effects (Grosveld et al. 1987; Caterina et al. 1991; Milot et al. 1996). They are bound by key transcription factors active in erythroid cells, such as GATA1 and TAL1 (Johnson et al. 2002; Anguita et al. 2004; Grass et al. 2006). The protein CTCF is bound at specific sites in the gene clusters, some of which serve as insulators that localize the effects of distal enhancers on target genes (Bulger et al. 2003)."[1]
Known "cis-regulatory modules (CRMs), and signals for occupancy of TAL1, GATA1, and CTCF [are] based on genome-wide ChIP-seq and the signal for DNase hypersensitive sites (DHSs) based on genome-wide DNase-seq."[1]
Consensus sequences
"The 3' flanking area contained the highly conserved hexanucleotide sequence A-A-T-A-A-A found in eukaryotic messages between the terminator codon and the polyadenylylation site (44)."[2]
"ATA boxes [AATAAA] can be clearly identified in the chicken αA- and αD-globin genes about 70 bp upstream from the initiator ATG codon [...] The sequences of the proposed cap sites agree with those determined for other globin genes (Fig. 6A; Refs. 15, 24, and 32) as do their positions relative to the ATA boxes"[3]
An ATA box may have the sequence AAATAT.[4] The CArG box has the sequence CCTATTATGG.[4]
"The [Sminthopsis crassicaudata putative embryonic β-globin gene] ATA box, located 30 bp 5' to the putative cap site, is of the form AAATAAAA typically found in eutherian embryonic β-like globin genes. In sequence comparisons with ATA boxes from human, mouse, and [Didelphis virginiana] adult and embryonic β-like globin genes, the S.c-ε ATA box was found to most closely resemble that found in the D. virginiana ε-globin gene (Fig 4)."[5]
This suggests a consensus sequence of AAATA(A/T)A on the template strand, or perhaps (A/C/G/T)AATA(A/T)A.
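The two suggested consensus forms can be written as regular expressions. The following is a minimal Python sketch: the pattern names `STRICT` and `RELAXED` and the helper functions are illustrative assumptions, not part of any cited work, and positions are counted from 1 at the left end of whatever sequence is supplied.

```python
import re

# Consensus forms taken from the text above; expressing them as regular
# expressions is an illustrative assumption.
STRICT = re.compile(r"AAATA[AT]A")         # AAATA(A/T)A
RELAXED = re.compile(r"[ACGT]AATA[AT]A")   # (A/C/G/T)AATA(A/T)A


def matches(pattern, sequence):
    """Return True if the whole sequence fits the consensus pattern."""
    return pattern.fullmatch(sequence.upper()) is not None


def find_sites(pattern, sequence):
    """Return 1-based start positions of all matches, overlaps included."""
    seq = sequence.upper()
    # A zero-width lookahead reports sites that overlap one another.
    lookahead = re.compile(f"(?={pattern.pattern})")
    return [m.start() + 1 for m in lookahead.finditer(seq)]


if __name__ == "__main__":
    # The S. crassicaudata form quoted above is AAATAAAA.
    print(find_sites(STRICT, "AAATAAAA"))
    print(matches(RELAXED, "GAATAAA"))
```

Under the first form the leading nucleotide must be A, so a heptamer such as GAATAAA fits only the second, more permissive form.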
"Aligning the nucleotide sequences of the various amplicons obtained from the buffalo’s DNA samples [...], and the sequence of both the goat and sheep alpha and beta subunits, gave the following results: (i) the size of each of the four different sequenced clones of the subunit alpha was 1311 bp; (ii) the length of the five different sequenced clones of the subunit beta ranged from 1841 to 1960 bp, due to an insertion of 119 nucleotides (data not shown); (iii) the nucleotide sequences of the amplicons of both alpha and beta subunits show the presence of canonical hallmarks characteristic of the haemoglobin genetic structures, namely three exons and two introns with consensus intron/exon splice junctions; (iv) the 5' UTR included all sequences defined as important for transcription and translation events such as the CCAAT box, the ATA box and the presumed mRNA Cap site, whereas, in the 3' UTR, the hexanucleotide AATAAA polyadenylation signal and the poly (A) addition site were present; (v) the ATG start codon and the TAA stop codon were present."[6]
Human genes
GeneID: 3043 HBB hemoglobin subunit beta, "The alpha (HBA) and beta (HBB) loci determine the structure of the 2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of detectable beta globin causes beta-plus-thalassemia. The order of the genes in the beta-globin cluster is 5'-epsilon -- gamma-G -- gamma-A -- delta -- beta--3'."[7]
Cystatin genes
The "four cystatin genes [GeneID: 1469 CST1, GeneID: 1470 CST2, GeneID: 1471 CST3, and GeneID: 1472 CST4] contain the ATA-box sequence (ATAAA) in their 5'-flanking regions; however, the CAT-box sequence (CAT), a binding site of the transcription factor, CTF, is found only in the 5'-flanking region of the S-type cystatin genes."[8]
β-thalassemia
"DNA sequence analysis of a cloned β-globin gene from a Chinese patient with β-thalassemia revealed a single nucleotide substitution (A→ G) within the ATA box homology and 28 base pairs upstream from the cap site."[9]
"Comparison of the level of β-globin transcripts in a variety of deletion mutants shows that for efficient transcription, both the ATA or Goldberg–Hogness box, and a region between 100 and 58 base pairs in front of the site at which transcription is initiated, are required. Deletion of either of these regions results in a decrease in the level of β-globin transcripts by an order of magnitude; deletion of the ATA box causes an additional loss in the specificity of the site of initiation of RNA synthesis. The DNA sequences downstream from the ATA box, including the natural β-globin mRNA cap site, are dispensable for transcription in vivo."[10]
"The first is a sequence rich in the nucleic acids adenine and thymine (the Goldberg-Hogness, "TATA," or "ATA" box) which is located 20-30 base pairs upstream from the RNA initiation site (the cap site which is the transcriptional start site for the mRNA) and is characterized by a concensus sequence (5'-TATAA-ATA-3')."[11]
Hypotheses
- A1BG has no ATA boxes in either promoter.
- A1BG is not transcribed by an ATA box.
- AGCE1 does not participate in the transcription of A1BG.
ATA box samplings
The Basic programs (starting with SuccessablesATA.bas) were written to compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+), including extending the number of nts from 958 to 4445. The programs looked for AATAAA and its inverse complement TTTATT, and found:
- negative strand, negative direction: 1, AATAAA at 1726.
- negative strand, positive direction: 1, AATAAA at 3427.
- positive strand, negative direction: 3, AATAAA at 4072, AATAAA at 3335, AATAAA at 3014.
- positive strand, positive direction: 0.
- inverse complement, negative strand, negative direction: 5, TTTATT at 4221, TTTATT at 4075, TTTATT at 4071, TTTATT at 3334, TTTATT at 3013.
- inverse complement, negative strand, positive direction: 2, TTTATT at 4142, TTTATT at 2347.
- inverse complement, positive strand, negative direction: 1, TTTATT at 4537.
- inverse complement, positive strand, positive direction: 0.
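The kind of search summarized in the list above can be sketched in Python. This is not the Basic program named earlier, whose source is not shown here. It only illustrates scanning a sequence for AATAAA and for its inverse complement TTTATT. The function names are hypothetical, and positions are assumed to be 1-based from the start of the supplied string, which may differ from the numbering convention used for the results above.

```python
# Translation table that swaps each DNA base for its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def inverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_motif(sequence, motif):
    """Return 1-based start positions of every occurrence, overlaps included."""
    seq, motif = sequence.upper(), motif.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start + 1)
        # Step forward by one base so overlapping occurrences are found.
        start = seq.find(motif, start + 1)
    return hits


def scan_both(sequence, motif="AATAAA"):
    """Search for the motif and for its inverse complement."""
    inverse = inverse_complement(motif)
    return {motif: find_motif(sequence, motif),
            inverse: find_motif(sequence, inverse)}


if __name__ == "__main__":
    toy = "GGAATAAACCTTTATTGG"  # made-up sequence, not A1BG data
    print(scan_both(toy))
```

Running the same scan on each strand and in each direction would give one result list per combination, as in the eight entries above.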
ATA (4560-2846) UTRs
- Negative strand, negative direction: TTTATT at 4221, TTTATT at 4075, TTTATT at 4071, TTTATT at 3334, TTTATT at 3013.
- Positive strand, negative direction: TTTATT at 4537, AATAAA at 4072, AATAAA at 3335, AATAAA at 3014.
ATA positive direction (4265-4050) proximal promoters
- Negative strand, positive direction: TTTATT at 4142.
ATA negative direction (2596-1) distal promoters
- Negative strand, negative direction: AATAAA at 1726.
ATA positive direction (4050-1) distal promoters
- Negative strand, positive direction: AATAAA at 3427, TTTATT at 2347.
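Assigning each reported position to a region amounts to a range lookup. The sketch below uses the coordinate ranges given in the headings of this page. The dictionary layout, the function name, and the decision to treat both ends of a range as inclusive are assumptions. No range is stated on this page for a negative direction core promoter, so none is listed.

```python
# Coordinate ranges copied from the section headings (low, high).
# Treating both ends as inclusive is an assumption; adjacent ranges share
# a boundary value (for example 4265), so a position exactly on a
# boundary is reported in both regions.
REGIONS = {
    "negative": {
        "UTR": (2846, 4560),
        "proximal": (2596, 2811),
        "distal": (1, 2596),
    },
    "positive": {
        "core": (4265, 4445),
        "proximal": (4050, 4265),
        "distal": (1, 4050),
    },
}


def regions_for(position, direction):
    """Return the names of all regions containing the position."""
    return [name
            for name, (low, high) in REGIONS[direction].items()
            if low <= position <= high]


if __name__ == "__main__":
    print(regions_for(4142, "positive"))  # TTTATT at 4142
    print(regions_for(1726, "negative"))  # AATAAA at 1726
```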
ATA box random dataset samplings
- ATAr0: 7, AATAAA at 2736, AATAAA at 2017, AATAAA at 1498, AATAAA at 1494, AATAAA at 1104, AATAAA at 412, AATAAA at 198.
- ATAr1: 1, AATAAA at 4430.
- ATAr2: 1, AATAAA at 3948.
- ATAr3: 3, AATAAA at 3051, AATAAA at 1797, AATAAA at 127.
- ATAr4: 2, AATAAA at 2443, AATAAA at 1322.
- ATAr5: 1, AATAAA at 4308.
- ATAr6: 1, AATAAA at 2660.
- ATAr7: 1, AATAAA at 3633.
- ATAr8: 2, AATAAA at 925, AATAAA at 721.
- ATAr9: 2, AATAAA at 3936, AATAAA at 1380.
- ATAr0ci: 2, TTTATT at 1641, TTTATT at 227.
- ATAr1ci: 2, TTTATT at 3609, TTTATT at 3047.
- ATAr2ci: 0.
- ATAr3ci: 0.
- ATAr4ci: 3, TTTATT at 3016, TTTATT at 1408, TTTATT at 1349.
- ATAr5ci: 4, TTTATT at 1112, TTTATT at 858, TTTATT at 492, TTTATT at 271.
- ATAr6ci: 4, TTTATT at 3763, TTTATT at 2418, TTTATT at 1142, TTTATT at 918.
- ATAr7ci: 0.
- ATAr8ci: 2, TTTATT at 3123, TTTATT at 226.
- ATAr9ci: 1, TTTATT at 966.
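As a rough point of reference for the counts above, the expected number of occurrences of a fixed hexamer in a random sequence can be estimated. The sketch assumes independent, equally likely bases and a sequence length of 4560 nts (the upper coordinate used on this page). Both are assumptions, since how the random datasets were generated is not described here.

```python
def expected_count(length, motif_length, alphabet_size=4):
    """Expected occurrences of one fixed motif in a uniformly random sequence.

    Each of the (length - motif_length + 1) windows matches with
    probability 1 / alphabet_size ** motif_length.
    """
    windows = max(length - motif_length + 1, 0)
    return windows / alphabet_size ** motif_length


if __name__ == "__main__":
    # 4560 nts is an assumed length; 6 is the length of AATAAA or TTTATT.
    print(round(expected_count(4560, 6), 2))
```

Under these assumptions the estimate is about 1.1 occurrences per motif per sequence. Random datasets with a different base composition would shift this figure.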
ATAr arbitrary (evens) (4560-2846) UTRs
- ATAr2: AATAAA at 3948.
- ATAr4ci: TTTATT at 3016.
- ATAr6ci: TTTATT at 3763.
- ATAr8ci: TTTATT at 3123.
ATAr alternate (odds) (4560-2846) UTRs
- ATAr1: AATAAA at 4430.
- ATAr3: AATAAA at 3051.
- ATAr5: AATAAA at 4308.
- ATAr7: AATAAA at 3633.
- ATAr9: AATAAA at 3936.
- ATAr1ci: TTTATT at 3609, TTTATT at 3047.
ATAr arbitrary positive direction (odds) (4445-4265) core promoters
- ATAr1: AATAAA at 4430.
- ATAr5: AATAAA at 4308.
ATAr arbitrary negative direction (evens) (2811-2596) proximal promoters
- ATAr0: AATAAA at 2736.
- ATAr6: AATAAA at 2660.
ATAr arbitrary negative direction (evens) (2596-1) distal promoters
- ATAr0: AATAAA at 2017, AATAAA at 1498, AATAAA at 1494, AATAAA at 1104, AATAAA at 412, AATAAA at 198.
- ATAr4: AATAAA at 2443, AATAAA at 1322.
- ATAr8: AATAAA at 925, AATAAA at 721.
- ATAr0ci: TTTATT at 1641, TTTATT at 227.
- ATAr4ci: TTTATT at 1408, TTTATT at 1349.
- ATAr6ci: TTTATT at 2418, TTTATT at 1142, TTTATT at 918.
- ATAr8ci: TTTATT at 226.
ATAr alternate negative direction (odds) (2596-1) distal promoters
- ATAr3: AATAAA at 1797, AATAAA at 127.
- ATAr5ci: TTTATT at 1112, TTTATT at 858, TTTATT at 492, TTTATT at 271.
- ATAr9ci: TTTATT at 966.
ATAr arbitrary positive direction (odds) (4050-1) distal promoters
- ATAr3: AATAAA at 3051, AATAAA at 1797, AATAAA at 127.
- ATAr7: AATAAA at 3633.
- ATAr9: AATAAA at 3936.
- ATAr1ci: TTTATT at 3609, TTTATT at 3047.
- ATAr5ci: TTTATT at 1112, TTTATT at 858, TTTATT at 492, TTTATT at 271.
- ATAr9ci: TTTATT at 966.
ATAr alternate positive direction (evens) (4050-1) distal promoters
- ATAr0: AATAAA at 2736, AATAAA at 2017, AATAAA at 1498, AATAAA at 1494, AATAAA at 1104, AATAAA at 412, AATAAA at 198.
- ATAr2: AATAAA at 3948.
- ATAr4: AATAAA at 2443, AATAAA at 1322.
- ATAr6: AATAAA at 2660.
- ATAr8: AATAAA at 925, AATAAA at 721.
- ATAr0ci: TTTATT at 1641, TTTATT at 227.
- ATAr4ci: TTTATT at 3016, TTTATT at 1408, TTTATT at 1349.
- ATAr6ci: TTTATT at 3763, TTTATT at 2418, TTTATT at 1142, TTTATT at 918.
- ATAr8ci: TTTATT at 3123, TTTATT at 226.
ATA box analysis and results
"The 3' flanking area contained the highly conserved hexanucleotide sequence A-A-T-A-A-A found in eukaryotic messages between the terminator codon and the polyadenylylation site (44)."[2]
Reals or randoms | Promoters | Direction | Numbers | Strands | Occurrences | Averages (± 0.1) |
---|---|---|---|---|---|---|
Reals | UTR | negative | 9 | 2 | 4.5 | 4.5 |
Randoms | UTR | arbitrary negative | 4 | 10 | 0.4 | 0.55 |
Randoms | UTR | alternate negative | 7 | 10 | 0.7 | 0.55 |
Reals | Core | negative | 0 | 2 | 0 | 0 |
Randoms | Core | arbitrary negative | 0 | 10 | 0 | 0 |
Randoms | Core | alternate negative | 0 | 10 | 0 | 0 |
Reals | Core | positive | 0 | 2 | 0 | 0 |
Randoms | Core | arbitrary positive | 2 | 10 | 0.2 | 0.1 |
Randoms | Core | alternate positive | 0 | 10 | 0 | 0.1 |
Reals | Proximal | negative | 0 | 2 | 0 | 0 |
Randoms | Proximal | arbitrary negative | 2 | 10 | 0.2 | 0.1 |
Randoms | Proximal | alternate negative | 0 | 10 | 0 | 0.1 |
Reals | Proximal | positive | 1 | 2 | 0.5 | 0.5 |
Randoms | Proximal | arbitrary positive | 0 | 10 | 0 | 0 |
Randoms | Proximal | alternate positive | 0 | 10 | 0 | 0 |
Reals | Distal | negative | 1 | 2 | 0.5 | 0.5 |
Randoms | Distal | arbitrary negative | 18 | 10 | 1.8 | 1.25 |
Randoms | Distal | alternate negative | 7 | 10 | 0.7 | 1.25 |
Reals | Distal | positive | 2 | 2 | 1 | 1 |
Randoms | Distal | arbitrary positive | 12 | 10 | 1.2 | 1.8 |
Randoms | Distal | alternate positive | 24 | 10 | 2.4 | 1.8 |
Comparison:
The occurrences of real ATA boxes in the UTRs and in the positive direction proximal promoters are greater than those of the randoms, whereas the occurrences in the distal promoters are less than those of the randoms. This suggests that the real ATAs are likely active or activable.
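The Occurrences and Averages columns of the table can be reproduced with simple arithmetic: occurrences are the number of hits divided by the number of strands or datasets, and the random average is the mean of the arbitrary and alternate occurrences. The sketch below works through the UTR rows. The function names are illustrative assumptions.

```python
def occurrences(number, strands):
    """Hits per strand (reals) or per random dataset (randoms)."""
    return number / strands


def average(*values):
    """Arithmetic mean of the supplied occurrence values."""
    return sum(values) / len(values)


if __name__ == "__main__":
    # UTR rows of the table: 9 real hits on 2 strands,
    # 4 arbitrary and 7 alternate random hits over 10 datasets each.
    real_utr = occurrences(9, 2)
    random_utr = average(occurrences(4, 10), occurrences(7, 10))
    print(round(real_utr, 2), round(random_utr, 2))
    # Real occurrences exceed the random average for the UTRs.
    print(real_utr > random_utr)
```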
Acknowledgements
The content on this page was first contributed by: Henry A. Hoff.
Initial content for this page in some instances came from Wikiversity.
References
- [1] Ross C. Hardison (December 2012). "Evolution of Hemoglobin and Its Genes". Cold Spring Harbor Perspectives in Medicine. 2 (12): a011627. doi:10.1101/cshperspect.a011627. PMID 23209182. Retrieved 2021-11-29.
- [2] Stephen A. Liebhaber, Michel J. Goossens, and Yuet Wai Kan (December 1980). "Cloning and complete nucleotide sequence of human 5'-α-globin gene" (PDF). Proceedings of the National Academy of Sciences USA. 77 (12): 7054–8. Retrieved 2013-06-28.
- [3] Jerry B. Dodgson and James Douglas Engel (10 April 1983). "The nucleotide sequence of the adult chicken alpha-globin genes" (PDF). The Journal of Biological Chemistry. 258 (7): 4623–9. Retrieved 2017-02-04.
- [4] Shigemi Kimura, Kuniya Abe, Misao Suzuki, Masakatsu Ogawa, Kowashi Yoshioka, Tadasi Kaname, Teruhisa Miike, and Ken-ichi Yamamura (June 1997). "A 900 bp genomic region from the mouse dystrophin promoter directs lacZ reporter expression only to the right heart of transgenic mice". Development, Growth & Differentiation. 39 (3): 257–65. doi:10.1046/j.1440-169X.1997.t01-2-00001.x. Retrieved 2013-06-28.
- [5] Steven J. B. Cooper and Rory M. Hope (December 1993). "Evolution and expression of a beta-like globin gene of the Australian marsupial Sminthopsis crassicaudata" (PDF). Proceedings of the National Academy of Sciences USA. 90: 11777–81. Retrieved 2017-02-04.
- [6] "The haemoglobin subunits alpha and beta: Old and new genetic variants in the Italian Mediterranean buffalo".
- [7] RefSeq (July 2008). HBB hemoglobin subunit beta [Homo sapiens (human)]. 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 2017-02-04.
- [8] Eiichi Saitoh and Satoko Isemura (January 1, 1993). "Molecular Biology of Human Salivary Cysteine Proteinase Inhibitors" (PDF). Critical Reviews in Oral Biology and Medicine. 4 (3/4): 487–93. doi:10.1177/10454411930040033301. Retrieved 2013-06-28.
- [9] Stuart H. Orkin, Julianne P. Sexton, Tu-chen Cheng, Sabra C. Goff, Patricia J. V. Giardina, Joseph I. Lee, and Haig H. Kazazian Jr. (1983). "ATA box transcription mutation in β-thalassemia". Nucleic Acids Research. 11 (14): 4727–34. doi:10.1093/nar/11.14.4727. Retrieved 2014-05-29.
- [10] G. C. Grosveld, E. De Boer, C. K. Shewmaker, and R. A. Flavell (January 14, 1982). "DNA sequences necessary for transcription of the rabbit β-globin gene in vivo". Nature. 295 (5845): 120–6. doi:10.1038/295120a0. Retrieved 2014-05-29.
- [11] G. E. Smith and M. D. Summers (1988). "Method for producing a recombinant baculovirus expression vector". US Patent 4,745,051. Retrieved 2014-05-29.