Core promoter gene transcriptions


Editor-In-Chief: Henry A. Hoff

The diagram shows an overview of the four core promoter elements B recognition element (BRE), TATA box, initiator element (Inr), and downstream promoter element (DPE), with their respective consensus sequences and their distance from the transcription start site.[1] Credit: Jennifer E.F. Butler & James T. Kadonaga.

A core promoter is that portion of the proximal promoter that contains the transcription start sites.

Biochemical definition: the minimal stretch of DNA sequence that is sufficient to direct accurate initiation of transcription. A core promoter is typically 60 to 120 base pairs long.

Genomics definition: short sequences surrounding the transcription start sites (TSSs).

It contains a binding site for RNA polymerase (RNA polymerase I, RNA polymerase II, or RNA polymerase III) holoenzymes.

A vast network of regulatory factors that contribute to the initiation of transcription by RNA polymerase ultimately target any specific gene’s core promoter.

The core promoter includes the transcription start site(s) (TSS).

That portion of the core promoter that is upstream of the TSS is also part of the proximal promoter.

The core promoter lies approximately 34 bp upstream of the TSS (position -34). "Several factors have been identified that bind to core promoters (reviewed in Smale, 1997)".[2][3]

Genetics

This is an image of Bob, the guinea pig. Credit: selbst.

Genetics involves the expression, transmission, and variation of inherited characteristics.

Gene transcriptions

DNA is a double helix of interlinked nucleotides surrounded by an epigenome. On the basis of biochemical signals, an enzyme, specifically a ribonucleic acid (RNA) polymerase, is chemically bonded to one of the strands (the template strand) of this double helix. The polymerase, once phosphorylated, begins to catalyze the formation of RNA using the template strand. Although the catalysis may have more than one beginning nucleotide (a start site) and more than one ending nucleotide (a stop site) along the DNA, each nucleotide sequence catalyzed that ultimately produces approximately the same RNA is part of a gene. The catalysis of each RNA representation from the template DNA is a transcription, specifically a gene transcription. The overall process is also referred to as gene transcription.

Promoters

Def. a "section of DNA that controls the initiation of RNA transcription as a product of a gene"[4] is called a promoter.

Proximal promoters

Def. a section of promoter DNA that includes, and neighbors, the transcription start sites is called a proximal promoter.

Cores

Def. a central or most important part of something is called a core.

Theoretical core promoters

Def. "the factors, including RNA polymerase II itself, that are minimally essential for transcription in vitro from an isolated core promoter" is called the basal machinery, or basal transcription machinery.[5]

Def. one or more sequence motifs that contain the transcription start sites (TSSs), are juxtaposed to the motif containing the TSSs, or lie in the proximal promoter, and that are found only in this core of motifs, is called a core promoter.

Metal responsive elements

A metal responsive element (MRE), or TGC box, may occur in the core promoter of some human DNA genes.

"The metallothionein (MT) genes provide a good example of eucaryotic promoter architecture. MT genes specify the synthesis of low-molecular-weight metal-binding proteins. They are transcriptionally regulated by the metal ions cadmium and zinc (11), glucocorticoid hormones (18), interferon (14), interleukin-1 (22), and tumor promoters (2). The metal ion regulation of MTs is conferred by a short sequence element called the metal-responsive element (MRE [21]) or TGC box (31, 34), which functions as a metal ion-dependent enhancer."[6]

GC boxes

Def. a "sequence of contiguous guanine, guanine, guanine, cytosine, and guanine, in that order, along a DNA strand"[7] is called a GC box.

"[A] GC box is a distinct pattern of nucleotides found in the promoter region of some eukaryotic genes upstream of the TATA box and approximately 110 bases upstream from the transcription initiation site. It has a consensus sequence GGGCGG which is position dependent and orientation independent. The GC elements are bound by transcription factors and have similar functions to enhancers.[8]"[9]

"A large subclass of polymerase II promoters lacks both TATAA and CCAAT sequence motifs but contains multiple GC boxes. This promoter class includes several housekeeping genes (e.g., the genes encoding dihydrofolate reductase [DHFR] ..., hydroxymethylglutaryl coenzyme A reductase [39], hypoxanthine guanine phosphoribosyltransferase [33], and adenosine deaminase [46]) [and] nonhousekeeping genes (e.g., the transforming growth factor alpha [9, 23], rat malic enzyme [36], human c-Ha-ras [21], epidermal growth factor receptor [22], and nerve growth factor receptor [42] genes)."[10]

"[A] GC box-binding factor is required for transcription and ... a truncated promoter containing one GC box is transcriptionally inactive (44). ... the DNA-protein interactions occurring at the GC boxes in the DHFR promoter are functionally distinct and that factors binding to the GC boxes must interact in a position-dependent manner."[10]

"In promoters containing multiple GC boxes but lacking the TATAA box, transcription start sites may be single and specific, as observed in the nerve growth factor receptor gene (42) and the cellular retinol-binding protein gene (37), or there may be multiple heterogeneous start sites, such as those found in the c-myb (4), insulin receptor (45), and Ha-ras (21) genes. ... GC boxes are responsible for directing transcription from the major and the minor start sites. ... All TATAA-less promoters have at least two GC boxes"[10].

"A GC box sequence, one of the most common regulatory DNA elements of eukaryotic genes, is recognized by the Spl transcription factor; its consensus sequence is represented as 5'-G/T G/A GGCG G/T G/A G/A C/T-3' [or 5′-KRGGCGKRRY-3′] (Briggs et al., 1986)."[11]

HY boxes

A core responsive element is the hypertrophy region HY box between -89 and -60 nucleotides (nts) upstream from the transcription start site.[12]

CAAT boxes

"[A] CCAAT box (also sometimes abbreviated a CAAT box or CAT box) is a distinct pattern of nucleotides with GGCCAATCT consensus sequence that occur upstream by 75-80 bases to the initial transcription site. The CAAT box signals the binding site for the RNA transcription factor, and is typically accompanied by a conserved consensus sequence. It is an invariant DNA sequence at about minus 70 base pairs from the origin of transcription in many eukaryotic promoters. Genes that have this element seem to require it for the gene to be transcribed in sufficient quantities. It is frequently absent from genes that encode proteins used in virtually all cells. This box along with the GC box is known for binding general transcription factors. CAAT and GC are primarily located in the region from 100-150bp upstream from the TATA box. Both of these consensus sequences belong to the regulatory promoter. Full gene expression occurs when transcription activator proteins bind to each module within the regulatory promoter. Protein specific binding is required for the CCAAT box activation. These proteins are known as CCAAT box binding proteins/CCAAT box binding factors. A CCAAT box is a feature frequently found before eukaryote coding regions".[13]

B recognition elements

"The B recognition element (BRE) is a DNA sequence found in the promoter region of most genes in eukaryotes and Archaea.[14][15] The BRE is a cis-regulatory element that is found immediately upstream of the TATA box, and consists of 7 nucleotides."[16]

"The Transcription Factor IIB (TFIIB) recognizes this sequence in the DNA, and binds to it. The fourth and fifth alpha helices of TFIIB intercalate with the major groove of the DNA at the BRE. TFIIB is one part of the preinitiation complex that helps RNA Polymerase II bind to the DNA."[16]

The consensus sequence is 5'-G/C G/C G/A C G C C-3'.[17]

The general consensus sequence using degenerate nucleotides is 5'-SSRCGCC-3', where S = G or C and R = A or G.[18]
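
As an illustration of how the degenerate BRE consensus can be tested against a candidate 7-mer, here is a minimal sketch that turns SSRCGCC into a regular expression. The names `consensus_to_regex` and `is_bre` are hypothetical helpers introduced for this example only.

```python
import re

# Character classes for the IUPAC codes that appear in SSRCGCC.
IUPAC_CLASSES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "S": "[CG]",  # S = G or C
    "R": "[AG]",  # R = A or G
}

def consensus_to_regex(consensus):
    """Compile a degenerate consensus into a regular expression."""
    return re.compile("".join(IUPAC_CLASSES[code] for code in consensus))

BRE = consensus_to_regex("SSRCGCC")

def is_bre(seq):
    """True if the whole sequence fits the BRE consensus."""
    return BRE.fullmatch(seq.upper()) is not None

print(is_bre("CCGCGCC"))  # True
print(is_bre("CCACGCC"))  # True
print(is_bre("CCTCGCC"))  # False: T is not allowed at the R position
```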

"The position in nucleotides (nt) relative to the transcription start site (TSS, +1)" is -35 for the BRE. Of human promoters, some "22-25% [are] BRE containing promoters ... the functional consensus sequences for BRE ... motif [is] still poorly defined."[18]

EIF4E basal elements

The EIF4E (also written eIF4E) basal element (4EBE) is a basal promoter element for the eukaryotic translation initiation factor 4E. "Interactions between 4EBE and upstream activator sites are position, distance, and sequence dependent."[19]

TATA boxes

Def. a "DNA sequence (cis-regulatory element) found in the promoter region of genes in archaea and eukaryotes"[20] is called a TATA box.

The TATA box can be an AT-rich sequence "located at a fixed distance upstream of the transcription start site"[5].

TBP-like factors

Notation: let the symbol TLF designate a TATA binding protein-like factor.

The human gene TBPL1 (TBP-like 1, also TLF and TRF2[5]), GeneID: 9519, encodes a protein that "does not bind to the TATA box and initiates transcription from TATA-less promoters."[21]

Downstream TFIIB recognition

The downstream TFIIB recognition element (dBRE) has a consensus sequence in the transcription direction on the template strand of 3'-RTDKKKK-5', using degenerate nucleotides, or 3'-A/G-T-A/G/T-G/T-G/T-G/T-G/T-5'.[22]

dBRE is cis-TATA box, between the TATA box and the Inr or transcription start site (TSS) and trans-TSS.[22]

Initiator elements

For RNA polymerase II holoenzyme to transcribe a gene, the gene's promoter must be located. After the promoter is located, the transcription start site (TSS) is pinpointed by using nucleotide sequences that include the TSS or perhaps allow distance measurement to the TSS. Within the promoter, most human genes lack a TATA box and have an initiator element (Inr) or downstream promoter element instead.

"RNA pol II itself recognizes features of the Inr which might assist the correct positioning of the polymerase on the promoter (Carcamo et al., 1991; Weis and Reinberg, 1997)."[2][23][24]

Transcription start sites

The transcription start site (TSS) is the location on the DNA template strand where transcription begins at the 3'-end of a gene.[25] This location corresponds to the 5'-end of the mRNA which by convention is used to designate DNA locations.[25] For example, the 5'-TATA-box-3' designation refers to the directionality of the mRNA and corresponds to the 3'-TATA-box-5' designation for nucleotides on the template strand.[25] The template strand is the DNA strand being transcribed by RNA polymerase.[25]

Downstream core elements

"[N]onredundant human promoter sequences 600 bp long (−499 to +100 bp around the TSS) [are available] from [the] Eukaryotic Promoter Database (EPD) release 75 (4, 68) (http://www.epd.isb-sib.ch/), and ... promoters sequences 1,200 bp long (−1,000 to +200 bp) [are available] from the Database of Transcriptional Start Sites (DBTSS) (59, 74, 75) (http://dbtss.hgc.jp/index.html)"[26].

The downstream core element (DCE) is a transcription core promoter sequence that is within the transcribed portion of a gene.

The consensus sequence for the DCE is CTTC...CTGT...AGC.[26] These three consensus elements are referred to as subelements: "SI is CTTC, SII is CTGT, and SIII is AGC."[26]

The number of nucleotides between each subelement can apparently vary down to none.

A core promoter that contains all three subelements may be much less common than one containing only one or two.[26] "SI resides approximately from +6 to +11, SII from +16 to +21, and SIII from +30 to +34."[26]

SI as 3'-CTTC-5' can occur as 3 of 4 (CTT, TTC) or 4 of 4 (CTTC). SII as 3'-CTGT-5' can also occur as 3 of 4 (CTG, TGT) or 4 of 4 (CTGT). SIII as AGC is not known to vary.
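
The 3-of-4 and 4-of-4 cases described above can be sketched as a small classifier. This is only an illustration under stated assumptions: the function name `classify_subelement`, its return labels, and the `allow_partial` flag are hypothetical, and the partial forms are taken to be the two contiguous 3-mers listed in the text.

```python
def classify_subelement(region, subelement, allow_partial=True):
    """Report whether a DCE subelement occurs in full, as 3 of 4, or not at all."""
    region = region.upper()
    if subelement in region:
        return "full"
    if allow_partial:
        # Contiguous 3-of-4 forms: drop the last or the first nucleotide.
        partials = (subelement[:-1], subelement[1:])
        if any(p in region for p in partials):
            return "partial"
    return None

print(classify_subelement("AACTTCAA", "CTTC"))  # full
print(classify_subelement("GGCTTAGG", "CTTC"))  # partial (CTT)
print(classify_subelement("GGTGTAA", "CTGT"))   # partial (TGT)
# SIII (AGC) is not known to vary, so partial matches are switched off.
print(classify_subelement("GGAGGG", "AGC", allow_partial=False))  # None
```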

DCE SIII can function independently of SI and SII.[26]

Transcription factor II D (TFIID), a transcription factor that is part of the RNA polymerase II holoenzyme, interacts with promoters containing only SIII of the DCE, suggesting a critical spacing parameter between SIII and the TATA box, initiator element, or some combination of the two.[26] TFIID probably serves as a core promoter recognition complex.[26]

TAF1 interacts with the DCE in a sequence-dependent manner.[26]

The differences between core promoters with downstream elements may be explained by

  1. "TATA- and DPE-dependent promoters are specific for particular enhancers"[26],
  2. "preferences of activators for specific core promoter architectures"[26], and
  3. "the presence of a DCE or [downstream core promoter element (DPE)] might be indicative of an architecture designed for specific regulatory networks, such as the regulation of housekeeping promoters versus tissue-specific promoters (or other highly regulated promoters) or the regulation of subsets of viral promoters."[26]

Motif ten elements

The motif ten element (MTE) is a downstream core promoter element that "promotes transcription by RNA polymerase II when it is located precisely at positions +18 to +27 relative to A+1 in the initiator (Inr) element."[27]

The motif 10 consensus sequence is CSARCSSAACGS [5'-C-C/G-A-A/G-C-C/G-C/G-A-A-C-G-C/G-3'].[27] By convention, the consensus sequence 5'-C-C/G-A-A/G-C-C/G-C/G-A-A-C-G-C/G-3' is stated as it would be transcribed into mRNA. In the direction of transcription on the template strand this consensus sequence becomes 3'-C-C/G-A-A/G-C-C/G-C/G-A-A-C-G-C/G-5'.
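
The spelled-out and compact forms of a consensus carry the same information. As a hedged sketch, the conversion from the slash notation to IUPAC letters could look like this; the function name `slash_to_iupac` and the lookup table are assumptions made for the example.

```python
# Map each set of allowed bases to its standard IUPAC code.
SET_TO_IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}

def slash_to_iupac(spelled):
    """Convert 'C-C/G-A-A/G' style notation into compact IUPAC letters."""
    positions = spelled.split("-")
    return "".join(SET_TO_IUPAC[frozenset(p.split("/"))] for p in positions)

print(slash_to_iupac("C-C/G-A-A/G-C-C/G-C/G-A-A-C-G-C/G"))  # CSARCSSAACGS
```

The same helper reproduces other compact forms used in this article, for example RTDKKKK from A/G-T-A/G/T-G/T-G/T-G/T-G/T.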

Downstream promoter elements

"The downstream promoter element (DPE) is a core promoter element ... present in other species including humans and excluding Saccharomyces cerevisiae.[28] Like all core promoters, the DPE plays an important role in the initiation of gene transcription by RNA polymerase II."[29]

The core sequence of the DPE is located precisely +28 to +32 nts relative to the A+1 nt in the Inr.[17]
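
Because the DPE position is counted from A+1 and there is no position 0, converting it to string indices is easy to get wrong. The following minimal sketch extracts the +28 to +32 window and checks it against the DPE consensus RGWYV listed in the comparison tables of this article. The function names and the toy sequence are hypothetical, and the index of A+1 is assumed to be known already.

```python
# Allowed bases for each position of the DPE consensus RGWYV.
DPE_CODES = {"R": "AG", "G": "G", "W": "AT", "Y": "CT", "V": "ACG"}

def dpe_window(seq, a_plus1_index):
    """Return the nucleotides at +28..+32, given the 0-based index of A+1."""
    start = a_plus1_index + 27  # +28 lies 27 nucleotides after +1
    end = a_plus1_index + 32    # exclusive end, so +32 is included
    if end > len(seq):
        return None             # sequence too short to contain the window
    return seq[start:end]

def matches_dpe(window):
    """True if a 5-nucleotide window fits RGWYV."""
    if window is None or len(window) != 5:
        return False
    return all(base in DPE_CODES[code] for base, code in zip(window, "RGWYV"))

# Toy sequence: two flanking bases, A+1, 26 filler bases (+2..+27), then GGACC.
toy = "TC" + "A" + "C" * 26 + "GGACC" + "TT"
window = dpe_window(toy, 2)
print(window, matches_dpe(window))  # GGACC True
```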

Super core promoters

A super core promoter (SCP) contains a TATA box, Inr, motif ten element (MTE), and DPE in a single promoter.[30] The SCP is the strongest core promoter observed in vitro and yields high levels of transcription in conjunction with transcriptional enhancers.[28]

Promoters can be classified by the motifs found in the core promoter. These include the TFIIB recognition element (BREu), typically from -37 bp to -32 bp relative to the TSS; the TATA box, from -31 bp to -26 bp; the Inr, from -2 bp to +4 bp; the motif ten element (MTE), from +18 bp to +27 bp; and the downstream promoter element (DPE), from +28 bp to +32 bp.[31] Of these, the DPE and BREu are the most common, present in 25% of human core promoters, followed by the TATA box, present in 13%.[31] The downstream core element (DCE) (N5-7[CTTC]N7-8[CTGT]N7-11[AGC]N1-2), from +10 bp to +40 bp, can be present in promoters containing a TATA box and/or Inr, but presumably does not occur with a DPE or MTE.[31]
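
The DCE spacing pattern quoted above maps naturally onto a regular expression. This is a minimal sketch under stated assumptions: only the three subelements and the two internal spacers (N7-8 and N7-11) are encoded, the flanking N5-7 and N1-2 are left out, and the names `DCE_CORE` and `find_dce` are hypothetical.

```python
import re

# SI, spacer of 7-8 nt, SII, spacer of 7-11 nt, SIII.
DCE_CORE = re.compile(r"CTTC[ACGT]{7,8}CTGT[ACGT]{7,11}AGC")

def find_dce(seq):
    """Return (0-based start, matched text) of the first DCE-like match, or None."""
    match = DCE_CORE.search(seq.upper())
    return (match.start(), match.group()) if match else None

# Toy sequences: the second has a spacer that is too short between SI and SII.
good = "GG" + "CTTC" + "A" * 7 + "CTGT" + "A" * 9 + "AGC" + "TT"
bad = "GG" + "CTTC" + "A" * 3 + "CTGT" + "A" * 9 + "AGC" + "TT"

print(find_dce(good))
print(find_dce(bad))  # None
```

A stricter check would also confirm that the match falls within +10 bp to +40 bp of the TSS, which requires knowing the TSS index.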

The BRE is specifically recognized by TFIIB, but all other core promoter elements are TFIID-interaction sites: TAF6 and TAF9 contact the DPE, TAF1 and TAF2 contact the Inr, and TAF1 contacts the DCE.[32][33]

Hypotheses

  1. Each portion of a DNA that becomes active has a core promoter.
  2. The core promoter is the "minimal portion of the promoter required to properly initiate transcription".[34]

Comparisons of negative direction promoter elements

Columns, left to right: BREu, consensus SSRCGCC, ~-37 to -32 (Butler 2002); TATA box, consensus TATAWAW, ~-31 to -26 (Watson 2014); Inr, consensus YYRNWYY, -2 to +4 (Juven-Gershon 2008); DPE, consensus RGWYV, +28 to +32 (Butler 2002).
UTR nn(4560-2846)
- - TTACTCC at 4557 -
- - - GGACC at 4546
- - ciAGTGTAA at 4533 -
- - - ciTGTCT at 4518
- - - GGACC at 4494
- - - AGTCG at 4489
- - - GGTCG at 4480
- - - ciGGTCT at 4448
- - - AGTCC at 4436
- - - AGATG at 4430
- - - GGTCA at 4415
- - TCACACT at 4361 -
- - GTCACA at 4359 Ngoc (2017) -
- - CCCACT at 4353 Ngoc (2017) GGACA at 4369
- - TCGGACC at 4349 -
- - GTCACT at 4319 Ngoc (2017) GGACC at 4349
- - - GGTCG at 4345
- - CCAGTTT at 4309 -
- - TCCAGT at 4307 Ngoc (2017) -
- - TCGGACC at 4300 GGACC at 4300
- - - ciGAACT at 4294
- - ciGGTCCGA at 4255 ciTGTCC at 4282
- - - ciCGACT at 4276
- - CTGCACC at 4238 ciGAACC at 4268
- - TCGGTCT at 4233 GGTCG at 4261
- - - GGTCC at 4253
- - TCACTCT at 4202 ciGGTCT at 4233
- - GTCACT at 4200 Ngoc (2017) -
- - TCGAACC at 4188 AGATG at 4212
- - - GGACA at 4208
- - - ciGAACC at 4188
- - - AGTTC at 4178
- - CCGGTCC at 4170 GGTCC at 4170
- - ciAGTACGG at 4118 ciCGACT at 4145
- - CCGTACC at 4107 AGTCC at 4138
- - CCGGTCC at 4102 GGTCG at 4130
- - TTACACT at 4092 GGACA at 4121
- - - GGTCC at 4102
- - - ciCAACC at 4097
- - TCACTCT at 4051 ciTATCT at 4079
- - TTGTATC at 4046 -
- - TCGGACC at 4037 AGATG at 4062
- - - GAGACT at 4053 Matsumoto (2020)
- - - GGACC at 4037
- - - GGTCG at 4033
- - - AGTTC at 4027
- - - GGTTC at 4019
- - - ciGAACT at 4012
- - ciAGTGTGG at 3967 ciCGACT at 3994
- - CCGGTCC at 3951 GGTTG at 3979
- - TTCACA at 3939 Ngoc (2017) GGACA at 3970
- - CTACTTT at 3922 GGTCC at 3951
- - CTACTTT at 3922 ciCAACC at 3946
- - - ciCAACC at 3942
- - - ciGGACT at 3932
- - TCATTCT at 3893 ciTGTCT at 3917
- - TCATTC at 3892 (Butler 2002) ciTGTCT at 3917
- - CTCATT at 3891 Ngoc (2017) ciTGTCT at 3917
- - - GGACC at 3906
- - ciGGTCCGG at 3873 GGTCC at 3885
- - CTGGTCC at 3871 GGTCC at 3871
- - ciGGTATGG at 3858 ciCGACC at 3864
- - CTCATA at 3829 Ngoc (2017) GGACG at 3861
- - TCCACT at 3825 Ngoc (2017) AGTTC at 3844
- - CTACACC at 3810 AGACC at 3835
- - - ciCATCT at 3820
- - - ciCAACT at 3805
- - - ciGAACC at 3793
- - CTGTTCT at 3759 ciGGACT at 3781
- - - ciTGACC at 3749
- - - GGACC at 3744
- - - GGTCGTG at 3733 Kadonaga (2002)
- - - GGTCG at 3731
- - - ciCGACC at 3719
- - - AGACG at 3706
- - - GGTCG at 3701
- - - GGTCG at 3682
- - - ciTGTCT at 3672
- - - ciCGACT at 3649
- - - AGTCTC at 3645 Matsumoto (2020)
- - - ciCAACC at 3606
- - - ciCGTCT at 3589
- - - GGTCC at 3585
- - - GGACG at 3579
- - - ciGAACT at 3571
- - - GGTCC at 3564
- - - AGACA at 3556
- - - ciTGACT at 3542
- - - ciCAACT at 3533
- - - ciTAACC at 3529
- - ciGGTCTAG at 3488 AGTTG at 3523
- - TTGGTCT at 3486 ciCAACT at 3505
- - TCATTT at 3481 (Butler 2002) ciCAACT at 3505
- - GTCATT at 3480 Ngoc (2017) ciCAACT at 3505
- - TTGATCT at 3463 ciGGTCT at 3486
- - CCGTATC at 3446 ciGATCT at 3463
- - TTCACT at 3410 Ngoc (2017) ciTATCC at 3447
- - CCGAACT at 3401 AGTTG at 3431
- - ciAGTCCGA at 3398 ciTATCT at 3422
- - TCGTTCT at 3374 ciGAACT at 3401
- - TTGTTCT at 3340 AGTCC at 3396
- - TCGTTTT at 3313 ciTAACT at 3358
- - TTGTTCT at 3307 AGACA at 3319
- - TCGGACC at 3298 GGACC at 3298
- - TCGGTTC at 3273 GGACC at 3298
- - ciAGTGCGG at 3281 GGTCG at 3294
- - TCGGTTC at 3273 GGTTC at 3273
- - - ciCATCT at 3256
- - - GGTCC at 3249
- - - ciCiGAACT at 3242
- - - ciCGACT at 3224
- - - AGTCC at 3217
- - CCACACC at 3186 GGTCG at 3209
- - CCCACA at 3184 Ngoc (2017) AGTCG at 3204
- - - GGACA at 3200
- - TTGTATT at 3169 ciCGACC at 3180
- - CCACTTT at 3146 -
- - TCCACT at 3144 Ngoc (2017) -
- - TTGTTCC at 3141 -
- - ciGGACCGG at 3130 -
- - TCGGACC at 3128 AGATG at 3158
- - - GGTTG at 3137
- - TCGGACC at 3128 GGACC at 3128
- - - GGTCG at 3124
- - - ciCAACC at 3116
- - - AGTCC at 3110
- - - ciGAACT at 3103
- - - ciCGACT at 3085
- - - GGTCGTG at 3072 Kadonaga (2002)
CCGCACC at 3047 - CCGCACC at 3047 GGTCG at 3070
- - ciGATTCGA at 3033 GGACA at 3061
- - TTGATTC at 3031 GGACA at 3061
- - - ciCGACC at 3041
- - CCGATTT at 3009 ciCGACC at 3035
- - - GGATA at 2996
- - - AGATG at 2988
- - TTGATTC at 2914 ciCiGAACC at 2921
- - ciAAAGTAG at 2887 ciCiGAACC at 2921
- TATATAT at 2872 TTCACA at 2860 Ngoc (2017) ciTATCT at 2903
- ciTTATATA at 2871 - ciTGTCT at 2878
- ciTTTTATA at 2869 - ciTGTCT at 2878
- TATAAAA at 2853 - ciTGTCT at 2878
- - - GGTTA at 2848
UTR pn(4560-2846)
- - ciGGAATGA at 4555 -
- - TTAATTC at 4542 -
- - TCACATT at 4533 -
- - ciAGTCCAA at 4502 -
- - - AGACA at 4507
- - CCACTTT at 4461 AGTCC at 4500
- - - ciGATCC at 4476
- - - AGATC at 4475
- - - GGACA at 4468
- - CCACTCC at 4425 ciCATCC at 4456
- - - ciGAACC at 4451
- - CCAGTTC at 4417 AGTTC at 4417
- - ciAGTGTGA at 4361 AGTTC at 4417
- - CTGCACT at 4340 ciTGTCT at 4371
- - - AGACC at 4365
- - CCGGACT at 4327 ciGGACT at 4327
- - - GGTCA at 4307
- - ciAAAATAA at 4221 GGATC at 4288
- - - AGACG at 4235
- - - ciTGTCT at 4210
- - ciAGTTCAA at 4177 AGACC at 4204
- - - AGACA at 4181
- - - AGTTC at 4175
- - - GGATC at 4157
- - ciAATGTGA at 4092 AGTCC at 4126
- - ciAAAATAA at 4071 AGTTG at 4096
- - ciAGACCAG at 4032 ciCATCT at 4058
- - ciAGTTCAA at 4026 ciGAGACT at 4053 Matsumoto (2020)
- - - AGACC at 4030
- - - AGTTC at 4024
- - TCACACC at 3967 GGATC at 4006
- - - GGTTG at 3945
- - ciGGAGTAA at 3891 AGATG at 3919
- - ciGGACCAG at 3870 ciCATCC at 3903
- - CCATACC at 3858 GGACC at 3868
- - ciGATGTGG at 3810 ciCAACT at 3849
- - - ciTGTCT at 3833
- - - GGTCG at 3813
- - CTGAACC at 3784 GGTTG at 3804
- - ciAATGCAG at 3772 ciGAACC at 3784
- - ciGGACTGG at 3749 AGACC at 3761
- - CTGGACT at 3747 GGACA at 3756
- - ciGGAACAG at 3725 ciGGACT at 3747
- - CCATTTC at 3688 ciCGTCC at 3698
- - ciAATCCAG at 3681 ciCGTCC at 3698
- - - GGATA at 3655
- - - ciGGACT at 3640
- - - AGATG at 3627
- - - AGATG at 3620
- - CTGCTCC at 3582 GGTTG at 3605
- - - ciCATCT at 3551
- - - GGTTG at 3532
- - CCAGATC at 3488 ciCAACT at 3524
- - ciAAACCAG at 3485 -
- - ciGAACTAG at 3462 AGATC at 3488
- - - AGATA at 3465
- - - ciGAACT at 3460
- - ciGAAGTGA at 3410 AGACA at 3433
- - ciAAATTGA at 3358 GGACA at 3389
- - ciAAAACAA at 3330 -
- - ciAGAGCAA at 3311 ciTGTCT at 3321
- - TTGCACT at 3289 -
- - TTGAACC at 3245 AGATC at 3276
- - - GGTTG at 3261
- - ciGGTGTGG at 3186 ciGAACC at 3245
- - ciAAATTAG at 3176 -
- - ciAGACCAG at 3123 ciCATCT at 3154
- - - AGACC at 3121
- - - AGTTG at 3115
- - ciAAACTAA at 3030 GGATC at 3097
- - ciAAAATAA at 3013 -
- - ciAGAATGG at 3004 -
- - - ciTGTCT at 2986
- - - AGATA at 2981
- - - AGACA at 2948
- - - ciCAACT at 2911
- TATAAA at 2874 Butler (2002) - AGATG at 2905
- - - AGATG at 2894
- - - AGACA at 2880
Cores nn (2846-2811)
Cores pn (2846-2811)
- - ciAAAACAA at 2842 CAACC at 2844
Proximals nn (2811-2596)
- - ciACTGAG at 2787 Ngoc (2017) -
- - TCGTACT at 2784 -
- - TCGGACC at 2770 GGACC at 2770
- - ciAGTACGG at 2753 ciTGTCT at 2778
- - GTCACT at 2739 Ngoc (2017) GGTCG at 2766
- - TTGGACC at 2720 GGACC at 2720
- - - ciCGACT at 2744
- - - ciGAACT at 2714
- - - ciCAACT at 2705
- - ciTTTATA at 2638 Butler (2002) ciCGACT at 2696
- - - ciTGTCC at 2689
- - GTCACA at 2656 Ngoc (2017) GGTCG at 2681
- - - GGACAT at 2673
- ciATTTATA at 2638 TCACACC at 2658 GGACA at 2672
- - CCACTTT at 2619 GGTCA at 2654
- - TTGTACC at 2614 AGTCG at 2650
- - - GGTTGT at 2611 Juven-Gershon (2010)
- - TCACACC at 2605 GGTTG at 2610
- - GTCACA at 2603 Ngoc (2017) -
- - - GGTCA at 2601
Proximals pn (2811-2596)
- - - ATGACT at 2786 Juven-Gershon (2010)
- - - AGTTG at 2733
- - - ciGAACC at 2717
- - - AGTTG at 2704
- - - -
- - - AGACC at 2598
Distal nn (2596-1)
- - - ciCAACT at 2593
- - - AGTCCT at 2588 Juven-Gershon (2010)
- - CCAGTCC at 2587 AGTCC at 2587
- - - ciGAACT at 2580
- - ciAGTACGG at 2535 ciCGTCC at 2568
- - - ciCGACT at 2562
- - CCGGTCC at 2519 GGTTG at 2547
- - TCATTCT at 2503 GGACA at 2538
- - TTGTTTT at 2490 GGTCC at 2519
- - - ciTGTCC at 2514
- - TCGTTTT at 2476 AGTTA at 2496
- - TCACTCT at 2449 -
- - TCGGACC at 2435 ciTGTCT at 2443
- - ciAGTGTGG at 2418 GGACC at 2435
- - TTGGACC at 2385 GGTCG at 2431
- - - ciCGTCC at 2389
- - - GGACC at 2385
- - - ciGAACT at 2379
- - - ciCGTCC at 2367
- - - ciCGACT at 2361
- - - GGTCG at 2346
- - - GGACA at 2337
- - CCACTTT at 2282 ciCGACC at 2326
- - TCGTACC at 2277 AGATG at 2294
- - TCGGACC at 2268 GGACC at 2268
- - TCAAACT at 2257 GGTCG at 2264
- - CCAGTCC at 2250 AGTCC at 2250
CCACGCC at 2197 - ciAGTGCGG at 2208 ciCGACT at 2226
- - CCGCTTT at 2157 GGTCA at 2211
- - TTGTACC at 2152 AGATG at 2169
- - TCAAACT at 2141 GGTTG at 2148
- - - AGTCC at 2134
- - - ciGAACT at 2127
- - - ciTGTCT at 2119
- - TCACATT at 2087 ciCGACT at 2109
- - CCGGTCC at 2077 GGATC at 2093
- - TTACACC at 2065 GGTCC at 2077
- - TCGTTCT at 2023 ciCGACC at 2069
- - TCGGACC at 2009 AGACA at 2029
- - ciAGTGCGG at 1992 ciTGTCT at 2017
- - - GGACC at 2009
- - TTGGACC at 1959 GGTCG at 2005
- - CCGTACT at 1953 ciCGTCT at 1967
- - - GGACC at 1959
- - - ciCGTCC at 1941
- - - ciTGACT at 1935
- - - ciGAACC at 1927
- - CCGCACC at 1897 GGTCG at 1920
- - - GGACA at 1911
- - - ciCGACC at 1891
- - ciGGACCGA at 1843 AGATG at 1867
- - - ciCAACT at 1853
- - - GGACC at 1841
- - - ciCGTCC at 1823
- ciTTTATA at 1740 Butler (2002) - GGTTC at 1817
- - ciAGTGCAG at 1773 ciCGACT at 1800
- - - GGTCG at 1785
- - TTATACC at 1742 AGACA at 1776
- - ciAAAATAG at 1730 ciCGACC at 1756
CCGCGCC at 1762 - - ciCGACC at 1746
- - TTAATTT at 1697 ciTATCT at 1710
- TATAAA at 1602 Butler (2002) - ciGGTCT at 1670
- - - ciCATCT at 1653
- - ciAGAACGG at 1608 ciGAACC at 1649
- - - ciGGACT at 1623
- - TTGGATT at 1591 ciCGTCT at 1614
- - TTACTTT at 1582 GGTCG at 1611
- - CCGTTTT at 1561 ciTGTCT at 1567
- - TTGCTTC at 1555 -
- - ciGATATAG at 1528 GGTCA at 1532
- - - AGATA at 1525
- - - ciGGTCT at 1518
- - CCACACT at 1479 AGTTG at 1513
- - ciGGTCCGA at 1462 AGTCG at 1486
- - ciAGAGCGA at 1448 ciCGACC at 1464
- - - GGTCC at 1460
- - - AGACA at 1452
- - TTGTTTT at 1394 ciGGTCT at 1411
- - TCGTTTT at 1371 AGTTG at 1406
- - TTATTCT at 1365 -
- - TCAGACC at 1356 AGACC at 1356
- - TTGGATC at 1306 GGTCA at 1352
- - - ciCGTCT at 1314
- - - ciGATCC at 1307
- - - GGATC at 1306
- - - ciGAACT at 1300
- - - ciCGTCC at 1288
- - CCGCACC at 1244 ciCGACT at 1282
- - - AGTCC at 1275
- - - GGTCG at 1267
- - CCACTTT at 1212 GGACA at 1258
- - TTGTACC at 1207 AGATG at 1224
- - ciGGACCGG at 1200 GGTTG at 1203
- - TCGGACC at 1198 GGACC at 1198
- - - GGTCG at 1194
- - ciAGTGTGG at 1128 ciGGACT at 1173
- - - GGTCG at 1140
- - - GGACA at 1131
- - TCACTCT at 1079 ciCGACC at 1111
- - ciGAAGTGA at 1056 AGACA at 1085
- - - ciTGTCT at 1073
- - - GGTCG at 1061
- - TTGGACC at 1015 ciTAACC at 1045
- - - ciCGTCT at 1023
- - - GGACC at 1015
- - TTAGTCC at 984 ciGAACT at 1009
- - - ciCGTCC at 997
- - - ciCGACT at 991
- - - AGTCC at 984
- - CCGTACC at 953 GGTCG at 976
- - TCGGTCC at 948 ciCATCT at 970
- - - GGACA at 967
- - TCGCTCT at 913 GGTCC at 948
- - - AGACA at 919
- - - ciTGTCT at 907
- - TCGGACC at 899 GGACC at 899
- - ciAGTGTGG at 882 GGTCG at 895
- - TCGGTTC at 874 GGTTC at 874
- - - GGTCC at 850
- - - ciGAACT at 843
- - - ciCGTCC at 831
- - - ciCGACT at 825
- - CTACACC at 787 GGTCG at 810
- - - GGACA at 801
- - - ciCGACC at 781
- - TCGCACC at 741 AGATG at 758
- - ciGGACTGG at 734 GGTCG at 737
- - TCGGACT at 732 ciGGACT at 732
- - CCAGTCC at 714 GGTCG at 728
- - CCGGTTC at 692 AGTCC at 714
- - - ciCGTCC at 697
- - ciAGTGCGG at 664 GGTTC at 692
- - CCGGTCC at 648 GGTCG at 676
- - - GGACA at 667
- - - GGTCC at 648
- - TTATACC at 605 ciTAACC at 643
- - ciGGACCGA at 598 AGATG at 624
- - CCAGTCC at 578 GGACC at 596
- - CCGGTTC at 556 ciTAACT at 585
- - - AGTCC at 578
- - - ciCGTCC at 565
- - - ciTGTCC at 561
- - - GGTTC at 556
- - TCGGACC at 508 GGTCG at 540
- - - GGACC at 508
- - TCACTTT at 473 GGTCG at 504
- - TTGTATC at 468 AGATG at 481
- - TCGGACC at 459 GGACC at 459
- - CCAGTCC at 441 AGTCC at 441
- - CCGGTTC at 419 ciTGTCC at 424
CCACGCC at 380 - - GGTTC at 419
- - - GGTCG at 403
- - - GGACA at 394
- - CTGCTTT at 312 ciTATCT at 355
- - TCACTCT at 301 ciGAACC at 328
- TATAAA at 221 Butler (2002) TTATACT at 274 ciTGACT at 307
- - TTGGTCC at 262 ciTGTCT at 289
- TTATAAAA at 222 Carninci (2006) CTACATT at 247 ciCATCT at 284
- TATAAAA at 183 Carninci (2006) ciGATACAA at 213 GGTCC at 262
- - CCATATT at 181 AGATA at 234
- - CCGTACT at 124 ciTGTCT at 168
- - - ciCGACT at 140
- - CCGTTTC at 93 ciTGACT at 130
- - - ciCATCC at 119
- - CTATACC at 77 ciTATCT at 100
- - TTGTTCC at 71 ciCAACT at 85
- - - GGTCG at 35
- - - ciTGACT at 17
- - - ciTGTCT at 13
Distal pn (2596-1)
- - - AGTTG at 2592
- - - GGTCA at 2585
- - - GGATC at 2574
- - AAAACAA at 2509 AGTCC at 2543
- - AAAGCAA at 2480 -
- - AAAGCAA at 2474 -
- - GATTCGG at 2454 -
- - AGAGTGA at 2447 -
- - ciCTGCACT at 2426 -
- - ciTTGAACC at 2382 AGATC at 2413
- - - GGTTG at 2398
- - ciCTACTCC at 2352 ciGAACC at 2382
- - AAACTAG at 2313 -
- - AATACAA at 2305 -
- - AGACCAG at 2263 ciCATCT at 2290
- - - GGACA at 2271
- - - AGACC at 2261
- - - GGTCA at 2248
- - - GGATC at 2239
- - GGTGCGG at 2197 GGTTG at 2234
- - AAAATGA at 2187 ciTGACC at 2189
- - GATACAA at 2180 -
- - AGACCAA at 2147 AGATA at 2177
- - AGTTTGA at 2141 ciTGTCT at 2165
- - AGTGTAA at 2087 AGACC at 2145
- - GGTGCAG at 2082 -
- - AATGTGG at 2065 -
- - AGAGCAA at 2021 ciTGTCT at 2031
- - ciCTGCACT at 2000 AGACC at 2121
- - - GGACA at 2117
- - AGAATGG at 1948 AGATC at 1987
ciGGCGTGG at 1897 - AGACTGA at 1935 ciGAACC at 1956
- - AAATTAG at 1887 ciGAGACT at 1933 Matsumoto (2020)
- - AATACAA at 1878 -
- - - ciCATCT at 1863
- - - ciCATCC at 1838
- - - AGACC at 1834
- - - AGATG at 1828
- - - ciGATCC at 1813
- - - GGATC at 1812
- - AATATGG at 1742 ciCGTCT at 1774
- - ciTTATTTT at 1727 ciTATCT at 1731
- - GAATTAA at 1696 -
- - AAAGCGG at 1680 ciGAACT at 1685
- - GAAATGA at 1663 ciGAACT at 1685
- - GAAACAA at 1585 AGATA at 1595
- - AATACAG at 1566 ciCATCC at 1572
- - AGAACGA at 1553 AGACA at 1569
- - AGTGCAA at 1536 AGACA at 1569
- - - ciTATCC at 1529
- - GGTGTGA at 1479 ciCAACC at 1514
- - AGTGCAG at 1471 ciGATCT at 1482
- - ciTCGCTCT at 1450 ciGATCT at 1482
- - - AGATG at 1438
- - AAAACAA at 1388 ciCAACC at 1407
- - ciCCATTTC at 1380 ciCAACC at 1407
- - AGAGCAA at 1369 -
- - AGTCTGG at 1356 -
- - ciCCAGTCT at 1354 -
- - ciTTGCACT at 1347 -
- - ciTTGCACC at 1339 -
ciGGCGTGG at 1244 - ciTTGAACC at 1303 GGTTG at 1319
- - AAATTAG at 1234 ciGAACC at 1303
- - - ciTGTCT at 1222
GGACGCC at 1153 - - ciCGACC at 1191
- - - AGTTC at 1177
- - - GGATC at 1167
- - ciAGAGTGA at 1077 GGACG at 1151
- - ciTCACTCC at 1058 ciTGTCT at 1087
- - ciAGATTGG at 1045 ciGAGACT at 1081 Matsumoto (2020)
- - ciTTGAACC at 1012 ciTGACT at 1051
- - - GGTTG at 1028
- - ciGATCCAG at 975 ciGAACC at 1012
- - - ciGATCC at 973
- - ciAGAGCGA at 911 AGATC at 972
- - - ciTGTCT at 921
- - - ciGAGACT at 915 Matsumoto (2020)
- - ciTTGAACC at 846 AGATC at 877
- - ciGATGTGG at 787 GGTTG at 862
- - ciAAATTAG at 777 ciGAACC at 846
- - ciAATACAA at 769 GGATG at 784
- - ciAGACCAG at 727 ciCGTCT at 754
- - ciAGTTCGA at 721 ciTGACC at 734
- - - AGACC at 725
- - - AGTTC at 719
- - ciAAATTGG at 643 GGTCA at 712
- - ciAATACAA at 635 GGATC at 703
- - ciAATATGG at 605 ciTAACC at 614
- - ciAGATTGA at 585 ciCATCC at 593
- - - AGATC at 589
- - - GGTCA at 576
- - - GGTCA at 568
- - - AGACA at 559
- - ciAAATTAG at 499 GGATC at 525
- - ciAATACGA at 492 -
- - ciAGTGCGA at 448 ciTGTCT at 479
- - - GGTCA at 439
- - - GGATC at 430
- - ciGGTGCGG at 380 AGACA at 422
- - ciAAACTGA at 307 -
- - ciAGAACAG at 288 -
- - ciAATATGA at 274 -
- - ciAAACCAG at 261 -
- - ciAGTTCAA at 255 -
- - ciGATGTAA at 247 AGTTC at 253
- - ciGAAACAA at 229 AGATG at 244
- - ciGGTATAA at 181 GGTCA at 206
- - ciAAAACAG at 167 AGACA at 170
- - CTGCATT at 152 AGTCG at 157
- - ciAAACTGA at 130 AGTCG at 157
- - ciGATATGG at 77 GGATA at 108
- - ciAAAACAA at 69 GGATA at 98
- - - AGTTG at 84
- - ciGGACCAG at 34 GGATA at 74
- - CTGAATT at 20 AGATA at 57
- - ciAGACTGA at 17 GGACC at 32

Comparisons of positive direction promoter elements

Butler (2002) Watson (2014) Juven-Gershon (2008) Butler (2002)
~-37 to -32 BREu SSRCGCC ~-31 to -26 TATAWAW -2 to +4 Inr YYRNWYY +28 to +32 DPE RGWYV
Cores np(4445-4265) Cores np(4445-4265) Cores np(4445-4265) Cores np(4445-4265)
- - ciGGAACAG at 4445 -
- - ciGGTCTGG at 4416 GGTCC at 4420
- - - ciGGTCT at 4414
- - ciGGAGTGA at 4350 ciGGTCT at 4380
- - CTGCACC at 4343 ciTGTCC at 4367
- - - ciCGACC at 4358
- - - AGACA at 4332
- - - AGACG at 4319
- - - GGTCA at 4269
- - - -
Proximals np(4265-4050) Proximals np(4265-4050) Proximals np(4265-4050) Proximals np(4265-4050)
- - - GGACA at 4252
- - - ciTGACC at 4216
- - - AGTTC at 4200
- - TTAGTTT at 4139 ciCATCC at 4183
- - ciGATTTAG at 4136 -
- - TTGATTT at 4134 -
- - TCACTCT at 4128 -
- - TCATTTT at 4120 -
- - ciGAAATGA at 4094 -
- - ciAGAACAG at 4069 GGATG at 4099
- - - ciGATCC at 4081
- - - GGATC at 4080
- - - GGTTC at 4073
- - - ciCGTCT at 4056
Distals np(4050-1) Distals np(4050-1) Distals np(4050-1) Distals np(4050-1)
- - - ciGAACT at 4048
- - ciAGAGTGG at 4040 -
- - ciGGTGTGA at 3971 ciTGACC at 4018
- - ciAGTGTGG at 3966 -
- - ciAGTCTGA at 3924 ciGAACC at 3937
- - ciAGAGTGA at 3876 ciCAACC at 3911
- - ciGAACCAG at 3840 AGACA at 3893
- - ciAGAATGA at 3835 AGTCC at 3863
- - TCACACC at 3824 ciGAACC at 3856
- - ciAATCCGA at 3799 ciGAACC at 3838
- - - GGTCA at 3820
- - - ciCGACT at 3801
- - - ciTGACC at 3784
- - - ciGGTCT at 3771
- - - ciTAACT at 3733
- - ciGAAGCGG at 3670 ciTGACC at 3714
- - - ciCATCC at 3629
- - CTGTTCC at 3625 -
- - ciAGTGTGA at 3594 ciTGTCC at 3619
- - ciGGAATGA at 3567 ciGGTCT at 3608
- - CCAGACC at 3550 ciTGTCC at 3577
- - ciGGACCAG at 3547 GGATG at 3574
- - - AGACC at 3550
- - TCACACT at 3507 GGACC at 3545
- - TCACACT at 3507 GGACA at 3530
- - ciAGTGCAG at 3465 GGTTG at 3490
- - ciGATGCAG at 3460 AGATG at 3475
- - ciGGAATGA at 3441 GGATG at 3457
- - TTGCATC at 3402 AGATG at 3418
- - CTGTTCC at 3352 ciCATCT at 3403
- - TTGCACT at 3343 ciTGTCT at 3392
- - TTGCACT at 3343 ciTATCC at 3384
- - TTGCACT at 3343 AGTTA at 3381
- - CCGCATC at 3328 -
- - CTGCACC at 3322 -
- - CTGCTCC at 3309 -
- - CTGGTCT at 3299 ciCATCT at 3329
- - TCGCTCT at 3276 ciGGTCT at 3299
- - TCGCTCT at 3276 ciCAACT at 3291
- - CTGGTCT at 3245 AGTCG at 3283
- - - ciCGTCT at 3256
- - ciGGACCAA at 3174 ciGGTCT at 3245
- - ciGGACCAA at 3174 ciCGTCC at 3203
- - ciGAAATGG at 3168 -
- - ciAATATGG at 3162 GGACC at 3172
- - CCAGTCC at 3084 GGACA at 3131
- - CCAGTCC at 3084 ciCATCC at 3108
- - ciGGTCTGG at 3021 AGTCC at 3084
- - - ciTGTCT at 3053
- - - GGTTG at 3050
- - CCAGTCC at 2998 GGTTA at 3024
- - CCAGTCC at 2998 ciGGTCT at 3019
- - CCAGTCC at 2998 ciGGTCT at 3019
- - CTGCTCC at 2978 ciTGTCT at 3004
- - ciGGTCTGA at 2943 AGTCC at 2998
- - ciGATTTGA at 2871 AGTTC at 2954
- - ciGATTTGA at 2871 ciTGACT at 2945
- - ciGATTTGA at 2871 ciGGTCT at 2941
- - ciGATTTGA at 2871 GGTTC at 2922
- - - ciTGACC at 2873
- - TCAGATT at 2868 -
- - ciAGAATGA at 2841 AGACC at 2861
- - ciGGTGCAA at 2801 ciCATCT at 2852
- - ciGGTGCAA at 2801 ciGGACT at 2820
- - - ciGAACC at 2776
- - ciAAAGTGG at 2711 GGATA at 2737
- - ciAGAGCAA at 2705 GGATG at 2714
- - ciGGACTGA at 2674 -
- - ciGATATAA at 2662 -
- - CCACACT at 2636 ciGGACT at 2672
- - ciGAAATAG at 2626 AGTTA at 2666
- ciTTTATA at 2588 Butler (2002) CCACACC at 2602 GGATA at 2659
- - TTATACC at 2590 AGTCA at 2618
- - TTATACC at 2590 AGTCA at 2613
- - TTATACC at 2590 AGTCA at 2607
- - CCGCACC at 2566 ciGAACC at 2579
- - CTAATTT at 2440 ciGATCC at 2514
- - CTAATTT at 2440 ciGGTCT at 2489
- - CTACACC at 2430 ciTGTCT at 2466
- - CTACACC at 2430 GGACA at 2460
- - - GGATG at 2409
- - ciGGTGCAA at 2335 ciGATCC at 2378
- - ciAGTGCAG at 2327 ciCGACT at 2359
- - TCACTCT at 2306 -
- - CTGTTTC at 2263 -
- - TCAATCT at 2235 ciGGACT at 2271
- - ciAGATCAA at 2232 ciGGTCT at 2258
- - CCAGATC at 2230 -
- - ciGAACCAG at 2227 -
- - CTGCATT at 2206 AGATC at 2230
- - TCATATT at 2178 ciGAACC at 2225
- - - GGTCA at 2220
- - - ciTGACC at 2213
- - - ciTGTCT at 2172
- - - AGTTA at 2134
- - TCGCTTC at 2095 ciCAACC at 2120
- - - ciCATCT at 2111
- - ciAGTGCAG at 2064 AGTCA at 2100
- - CCAGTCC at 2026 ciTGTCT at 2078
- - ciAAAGCAG at 2007 GGTCA at 2035
- - CTATTTC at 1978 AGTCC at 2026
- - ciGGTGTGG at 1971 ciCAACC at 2013
- - ciGAACTGG at 1953 AGTTC at 1987
- - - ciGGTCT at 1958
- - CCACTTC at 1914 ciGAACT at 1951
- - - ciCGTCC at 1930
- - - GGTTC at 1926
- - - GGATG at 1878
- - - GGACA at 1869
- - - ciTGTCT at 1862
ciGGCGCCC at 1770 - ciGGTGTGG at 1805 AGTCC at 1841
- - ciAGTGCAG at 1787 AGTCC at 1826
GGGCGCC at 1769 - - ciGAACC at 1811
- - ciGGTGCGG at 1764 -
- - CCAGACT at 1744 -
GGACGCC at 1672 - - ciTGTCT at 1731
- - - ciGGTCT at 1711
- - - ciTGTCT at 1862
- - - GGACA at 1693
- - - GGTCG at 1687
- - ciGAAGCGG at 1636 GGACG at 1670
- - - ciTGACC at 1662
- - ciAGTGCGG at 1590 ciCAACC at 1616
- - CTGCACT at 1472 AGTCG at 1528
- - - ciCGTCT at 1493
- - ciAATGCGG at 1422 -
- - CTGCACT at 1372 ciCGTCT at 1393
- - ciAATGCGG at 1322 -
GCACGCC at 1302 - - -
- - ciAGTGCGG at 1254 ciCAACC at 1280
- - ciAGTGCGG at 1170 -
- - ciAGTGCGG at 1086 -
- - - ciCGACT at 998
- - - ciTGTCC at 993
- - - ciGATCC at 965
- - - AGATC at 964
- - - ciCAACC at 944
- - - ciGGACT at 914
- - - ciCGACT at 898
- - - ciTGTCC at 893
- - - ciGATCC at 865
- - - AGATC at 864
- - - ciCAACC at 844
- - ciGGTGCAG at 784 ciGGACT at 814
- - CCGGACT at 746 AGTCC at 757
- - - ciGGACT at 746
- - - AGACA at 712
- - ciAGTGCGG at 666 ciCATCC at 698
- - ciAGTGCGG at 582 ciCATCC at 629
- - - ciCAACC at 608
- - ciAGTGCGG at 498 -
- - ciGGTGCGG at 489 -
- - ciAGACCGG at 442 -
- - ciGGAGCGA at 429 AGACC at 440
- - - GGACG at 410
- - - AGACG at 398
- - CCACACT at 345 GGACG at 359
- - - GGACG at 323
- - - GGTTC at 305
- - ciAATGTGA at 230 ciTGTCT at 268
- - - GGTCC at 218
- - - GGACC at 187
- - CTGTTTT at 147 AGTCC at 172
- - - AGATG at 166
- - TTGTATT at 115 GGTCA at 153
- - - ciTGTCT at 100
- - ciAGAGTGG at 53 ciTGTCC at 82
- - - GGATG at 59
- - - GGACC at 37
- - - ciCATCC at 30
Cores pp(4445-4265) Cores pp(4445-4265) Cores pp(4445-4265) Cores pp(4445-4265)
- - - GGACC at 4424
- - - AGACC at 4416
- - - GGACC at 4409
- - - ciGGTCT at 4330
- - - ciCGTCT at 4317
- - - ciGAACC at 4300
- - - AGTCA at 4271
Proximals pp(4265-4050) Proximals pp(4265-4050) Proximals pp(4265-4050) Proximals pp(4265-4050)
- - - GGACG at 4231
- - - ciGGACT at 4214
- - - ciGGACT at 4186
- - - ciCGACC at 4177
- - - ciTAACT at 4161
- - - ciGAACT at 4131
- - - ciTGACT at 4089
- - - ciGATCC at 4077
- - - AGATC at 4076
- - - ciTGTCC at 4070
- - - ciGATCT at 4065
- - - AGATC at 4064
- - - AGTCG at 4052
Distals pp(4050-1) Distals pp(4050-1) Distals pp(4050-1) Distals pp(4050-1)
- - - ciCATCT at 4036
- - - GGTCC at 4032
- - - AGTCG at 4023
- - - ciGAACT at 4016
- - - AGTCG at 3997
- - - ciCGACC at 3989
- - - ciTGTCC at 3975
- - - ciCGTCT at 3916
- - - ciGGTCT at 3891
- - - AGTCC at 3868
- - - GGTCA at 3841
- - - ciCGTCT at 3831
- - - ciGGTCT at 3806
- - - GGACC at 3787
- - - ciCGACT at 3778
- - - ciCGTCC at 3768
- - - AGTCG at 3775
- - - GGACC at 3758
- - - ciCATCC at 3753
- - - ciTGACT at 3735
- - - AGTCC at 3728
- - - GGTCG at 3720
- - - ciCGTCC at 3694
- - - GGTCC at 3687
- - - GGACC at 3679
- - - ciCGTCC at 3662
- - - ciTGTCC at 3636
- - - GGTTG at 3633
- - - GGACA at 3622
- - - GGACA at 3617
- - - ciCGACT at 3588
- - - ciTGTCC at 3571
- - - ciGGTCT at 3548
- - - GGTCC at 3536
- - - ciCGACC at 3526
- - - ciGATCC at 3522
- - - GGACC at 3496
- - - ciGATCC at 3484
- - - ciCGTCT at 3473
- - - ciCGTCC at 3466
- - - GGACA at 3434
- - - AGTTA at 3424
- - - ciCATCT at 3416
- - - AGACC at 3405
- - - GGTCA at 3379
- - - GGACC at 3362
- - - AGACG at 3358
- - - ciTGACC at 3345
- - - AGACG at 3306
- - - GGACC at 3296
- - - AGTTG at 3290
- - - AGACG at 3278
- - - AGACG at 3267
- - - AGATA at 3258
- - - ciCGACC at 3242
- - - GGTCG at 3239
- - - ciGGTCT at 3221
- - - ciCGTCT at 3214
- - - ciTGTCT at 3179
- - - AGTCG at 3155
- - - ciCGTCC at 3147
- - - ciTGTCT at 3133
- - - ciCGTCC at 3128
- - - ciTGACC at 3117
- - - GGTCC at 3111
- - - ciGGTCT at 3091
- - - GGTCA at 3082
- - - AGACG at 3060
- - - GGACC at 3047
- - - AGTCG at 3041
- - - AGTCC at 3034
- - - AGACC at 3021
- - - GGTCC at 3016
- - - GGTCA at 2996
- - - GGACC at 2988
- - - AGACC at 2983
- - - AGACG at 2975
- - - ciGGACT at 2968
- - - AGACA at 2957
- - - AGTCA at 2936
- - - AGACA at 2925
- - - ciCGACT at 2915
- - - GGTTA at 2908
- - - GGACC at 2891
- - - AGACC at 2883
- - - GGTCC at 2876
- - - ciCGTCT at 2859
- - - AGACG at 2856
- - - ciTGTCT at 2837
- - - ciCAACC at 2816
- - - ciCGACC at 2810
- - - GGTCC at 2780
- - - ciCGACC at 2770
- - - ciCGTCC at 2745
- - - ciCGACC at 2734
- - - ciCGTCT at 2721
- - - ciCGTCC at 2683
- - - ciTGACT at 2674
- - - ciTGTCT at 2652
- - - ciGATCC at 2639
- - - ciTATCT at 2627
- ciTTTATA at 2588 Butler (2002) CCACACC at 2602 AGTCC at 2620
ciGGCGTGG at 2566 - TTATACC at 2590 AGTTC at 2615
- - - GGTCA at 2605
- - - GGTTC at 2593
- - - GGTCC at 2574
- - - GGACC at 2569
- - - ciTATCC at 2550
- - - ciCAACC at 2541
- - - AGTCG at 2526
- - - GGACG at 2520
- - - AGTTC at 2508
- - - GGACC at 2501
- - - ciGATCC at 2482
- - - GGATC at 2481
- - - GGACC at 2433
- - - ciTGTCT at 2414
- - - ciCGACC at 2405
- - - GGTTC at 2398
- - - AGTCG at 2390
- - - AGTCC at 2372
- - - ciCGACC at 2320
- - - GGTCC at 2316
- - - AGACA at 2308
- - - ciCGTCC at 2296
- - - AGACA at 2260
- - - ciCATCC at 2255
- - - GGACA at 2250
- - - AGTTA at 2233
- - - ciGGTCT at 2228
- - - ciGGACT at 2211
- - - AGTCG at 2198
- - - ciCAACC at 2185
- - - AGACA at 2182
- - - AGATC at 2167
- - - ciTGTCC at 2125
- - - AGTCC at 2115
- - - AGTCG at 2102
- - - AGTCA at 2098
- - - AGTCA at 2060
- - - GGTCG at 2052
- - - GGTCA at 2024
- - - GGTTG at 2012
- - - AGACC at 1992
- - - ciTGTCC at 1966
- - - ciTGACC at 1953
- - - ciCGTCT at 1937
- - - ciCGTCC at 1905
- - - GGTCC at 1893
- - - ciCATCC at 1875
- - - AGACC at 1864
- - - GGACA at 1860
- - - GGTCC at 1855
CCACGCC at 1764 - ciAGTGCAG at 1787 GGACC at 1815
- - - ciGAACC at 1799
- - - ciCGTCC at 1788
- - - ciCGACC at 1779
- - - GGACG at 1776
- - - ciGGTCT at 1742
- - - ciCGACC at 1736
- - - AGACG at 1733
- - - ciGGACT at 1676
- - - ciGGACT at 1660
- - - ciGGTCT at 1631
- - - AGTTG at 1621
- - - AGTCG at 1603
- - - GGATG at 1573
ciGGCGCCG at 1438 - CTGCACT at 1472 AGACG at 1495
- - - AGACC at 1476
- - - GGACG at 1469
- - - GGTCG at 1463
- - - GGTCG at 1457
- - - ciCGTCT at 1416
ciGGCGCCG at 1338 - CTGCACT at 1372 GGACG at 1411
- - - AGACG at 1395
- - - AGACC at 1376
- - - GGACG at 1369
- - - GGTCG at 1363
- - - GGTCG at 1357
- - - ciCGTCT at 1316
- - - GGACG at 1311
- - - ciTGACT at 1286
- - - GGATG at 1283
- - - GGTTG at 1279
- - - GGTCG at 1271
- - - AGTCG at 1267
- - - GGTCA at 1250
- - - GGACC at 1199
- - - GGATG at 1195
- - - GGTCC at 1175
- - - ciTGACC at 1140
- - - GGTCG at 1127
CGACGCC at 1033 - ciAGTGCGG at 1086 GGACG at 1118
- - - GGACG at 1075
- - - GGACA at 991
- - - ciGGACT at 959
- - - GGACC at 947
- - - GGTTG at 943
- - - ciGGTCT at 935
- - - AGTCG at 931
- - - GGACG at 907
- - - GGACA at 891
- - - ciGGACT at 859
- - - GGACC at 847
- - - GGTTG at 843
- - - ciGGTCT at 835
- - - AGTCG at 831
- - - ciCGACC at 779
ciGGCGCGC at 682 - CCGGACT at 746 -
- - - ciGGACT at 725
- - - GGTCC at 707
- - - ciCGTCC at 658
- - - GGATG at 649
- - - GGTCG at 623
- - - GGTCG at 617
- - - AGTCG at 613
- - - GGTTG at 607
- - - GGACC at 598
- - - ciTGTCC at 552
CCACGCC at 489 - ciAGTGCGG at 498 GGTCC at 515
- - - AGTCG at 511
- - - ciGGTCT at 468
- - - ciCGTCT at 438
- - - GGACG at 435
- - - GGTCC at 424
- - - ciCGACC at 417
- - - ciCGTCT at 396
- - - ciCGACC at 386
- - - ciCGTCC at 379
- - - ciTGTCC at 365
- - - ciTGACC at 347
- - - GGTCG at 329
- - - ciCGTCC at 318
- - - GGACC at 286
- - - ciCGACC at 277
- - - AGACC at 270
- - - AGACG at 223
- - - GGTCC at 215
- - - ciGGTCT at 204
- - - ciCGTCC at 194
- - - GGACG at 191
- - - GGTTC at 177
- - - ciTGTCC at 157
- - - GGACA at 144
- - - AGACC at 102
- - - AGACA at 98
- - - AGTCC at 90
- - - GGACC at 40
- - - GGTCC at 33
- - - ciTAACC at 24
- - - ciGGTCT at 15
- - - GGTCC at 8
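
The column headers of the table above give each element's consensus in IUPAC degenerate notation (for example R = A or G, Y = C or T, W = A or T, S = C or G, V = A, C or G, N = any base). As a minimal sketch of how a listed sequence can be checked against such a consensus, the Python below converts a consensus into a regular expression and tests a sequence both directly and as its complement inverse. It assumes that the "ci" prefix marks a sequence whose complement inverse matches the consensus, that reported positions are 1-based start positions, and that the region boundaries fall as coded. The function names (`classify`, `scan`, `region`) are hypothetical and are not taken from any tool used to build the table.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus such as 'RGWYV' into a regex."""
    return re.compile("".join(IUPAC[base] for base in consensus))


def complement_inverse(seq):
    """Complement every base, then reverse the order."""
    return seq.translate(COMPLEMENT)[::-1]


def classify(seq, consensus):
    """Return 'direct', 'ci', or None for one candidate sequence.

    The direct orientation is tested first, so a sequence that fits
    in both orientations is reported as 'direct'.
    """
    pattern = consensus_to_regex(consensus)
    if pattern.fullmatch(seq):
        return "direct"
    if pattern.fullmatch(complement_inverse(seq)):
        return "ci"
    return None


def scan(sequence, consensus):
    """List (position, window, label) for every matching window.

    Positions are 1-based start positions (an assumption; the table
    does not state which end of the motif is reported).
    """
    k = len(consensus)
    hits = []
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        label = classify(window, consensus)
        if label:
            hits.append((i + 1, window, label))
    return hits


def region(position, core_start=4265, proximal_start=4050):
    """Bin a position into core, proximal, or distal.

    Boundaries follow the table's region rows (4445-4265, 4265-4050,
    4050-1); assigning the shared boundary values 4265 and 4050 to
    the lower region is an assumption.
    """
    if position > core_start:
        return "core"
    if position > proximal_start:
        return "proximal"
    return "distal"
```

For example, `classify("CATCC", "RGWYV")` returns `"ci"` because the complement inverse GGATG fits RGWYV, while `classify("GGTCC", "RGWYV")` returns `"direct"`.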

Acknowledgements

The content on this page was first contributed by: Henry A. Hoff.

Initial content for this page in some instances came from Wikiversity.

References

  1. Jennifer E.F. Butler, James T. Kadonaga (October 15, 2002). "The RNA polymerase II core promoter: a key component in the regulation of gene expression". Genes & Development. 16 (20): 2583–92. doi:10.1101/gad.1026202. PMID 12381658.
  2. Gillian E. Chalkley and C. Peter Verrijzer (September 1, 1999). "DNA binding site selection by RNA polymerase II TAFs: a TAFII250-TAFII150 complex recognizes the Initiator" (PDF). The EMBO Journal. 18 (17): 4835–45. PMID 10469661. Retrieved 2012-04-26.
  3. S. T. Smale (1997). "Transcription initiation from TATA-less promoters within eukaryotic protein-coding genes". Biochim. Biophys. Acta. 1351: 73–88. Retrieved 2012-04-26.
  4. Ceyockey (28 January 2005). promoter. San Francisco, California: Wikimedia Foundation, Inc. Retrieved 2012-09-29.
  5. Stephen T. Smale and James T. Kadonaga (July 2003). "The RNA Polymerase II Core Promoter" (PDF). Annual Review of Biochemistry. 72 (1): 449–79. doi:10.1146/annurev.biochem.72.121801.161520. PMID 12651739. Retrieved 2012-05-07.
  6. Robert D. Andersen, Susan J. Taplitz, Sandy Wong, Greg Bristol, Bill Larkin, and Harvey R. Herschman (October 1987). "Metal-Dependent Binding of a Factor In Vivo to the Metal-Responsive Elements of the Metallothionein 1 Gene Promoter" (PDF). Molecular and Cellular Biology. 7 (10): 3574–81. doi:10.1128/MCB.7.10.3574. Retrieved 2013-04-15.
  7. Msh210 (23 February 2010). "GC box". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 2013-01-27.
  8. Klug WS, Cummings MR, Spencer CA, Palladino MA (2009). Concepts of Genetics: Ninth Edition. San Francisco: Pearson Benjamin Cummings. pp. 463–464. ISBN 978-0-321-54098-0.
  9. "GC box". San Francisco, California: Wikimedia Foundation, Inc. June 23, 2012. Retrieved 2013-01-27.
  10. Michael C. Blake, Robert C. Jambou, Andrew G. Swick, Jeanne W. Kahn, and Jane Clifford Azizkhan (December 1990). "Transcriptional Initiation Is Controlled by Upstream GC-Box Interactions in a TATAA-Less Promoter" (PDF). Molecular and Cellular Biology. 10 (12): 6632–41. doi:10.1128/MCB.10.12.6632. PMID 2247077. Retrieved 2013-01-27.
  11. H Imataka, K Sogawa, KI Yasumoto, Y Kikuchi, K Sasano, A Kobayashi, M Hayami, and Y Fujii-Kuriyama (October 1992). "Two regulatory proteins that bind to the basic transcription element (BTE), a GC box sequence in the promoter region of the rat P-4501A1 gene" (PDF). The EMBO Journal. 11 (10): 3663–71. PMID 1356762. Retrieved 2013-01-27.
  12. Akiro Higashikawa, Taku Saito, Toshiyuki Ikeda, Satoru Kamekura, Naohiro Kawamura, Akinori Kan, Yasushi Oshima, Shinsuke Ohba, Naoshi Ogata, Katsushi Takeshita, Kozo Nakamura, Ung-Il Chung, Hiroshi Kawaguchi (January 2009). "Identification of the core element responsive to runt-related transcription factor 2 in the promoter of human type x collagen gene". Arthritis & Rheumatism. 60 (1): 166–78. doi:10.1002/art.24243. PMID 19116917. Retrieved 2013-06-18.
  13. "CAAT box". San Francisco, California: Wikimedia Foundation, Inc. April 8, 2013. Retrieved 2013-04-14.
  14. Lagrange T, Kapanidis AN, Tang H, Reinberg D, Ebright RH (1998). "New core promoter element in RNA polymerase II-dependent transcription: sequence-specific DNA binding by transcription factor IIB". Genes & Development. 12 (1): 34–44. doi:10.1101/gad.12.1.34. PMC 316406. PMID 9420329.
  15. Littlefield O, Korkhin Y, Sigler PB (1999). "The structural basis for the oriented assembly of a TBP/TFB/promoter complex". Proceedings of the National Academy of Sciences of the USA. 96 (24): 13668–73. doi:10.1073/pnas.96.24.13668. PMC 24122. PMID 10570130.
  16. "B recognition element". San Francisco, California: Wikimedia Foundation, Inc. January 30, 2013. Retrieved 2013-01-30.
  17. Alan K. Kutach, James T. Kadonaga (July 2000). "The Downstream Promoter Element DPE Appears To Be as Widely Used as the TATA Box in Drosophila Core Promoters" (PDF). Molecular and Cellular Biology. 20 (13): 4754–64. PMID 10848601. Retrieved 2012-07-15.
  18. Chuhu Yang, Eugene Bolotin, Tao Jiang, Frances M. Sladek, Ernest Martinez (March 7, 2007). "Prevalence of the initiator over the TATA box in human and yeast genes and identification of DNA motifs enriched in human TATA-less core promoters". Gene. 389 (1): 52–65. doi:10.1016/j.gene.2006.09.029. PMID 17123746.
  19. Mary Lynch, Li Chen, Michael J. Ravitz, Sapna Mehtani, Kevin Korenblat, Michael J. Pazin and Emmett V. Schmidt (August 2005). "hnRNP K Binds a Core Polypyrimidine Element in the Eukaryotic Translation Initiation Factor 4E (eIF4E) Promoter, and Its Regulation of eIF4E Contributes to Neoplastic Transformation". Molecular and Cellular Biology. 25 (15): 6436–53. doi:10.1128/MCB.25.15.6436-6453.2005. Retrieved 2013-03-17.
  20. "TATA box". San Francisco, California: Wikimedia Foundation, Inc. June 17, 2013. Retrieved 2014-05-07.
  21. National Center for Biotechnology Information (April 28, 2012). "TBPL1 TBP-like 1 [ Homo sapiens ]". 8600 Rockville Pike, Bethesda MD, 20894 USA: U.S. National Library of Medicine. Retrieved 2012-04-30.
  22. Wensheng Deng, Stefan G.E. Roberts (October 15, 2005). "A core promoter element downstream of the TATA box that is recognized by TFIIB". Genes & Development. 19 (20): 2418–23. doi:10.1101/gad.342405. PMID 16230532.
  23. J. Carcamo, L. Buckbinder and D. Reinberg (1991). "The initiator directs the assembly of a transcription factor IID-dependent transcription complex". Proc. Natl. Acad. Sci. USA. 88: 8052–6. Retrieved 2012-04-26.
  24. L. Weis and D. Reinberg (1997). "Accurate positioning of RNA polymerase II on a natural TATA-less promoter is independent of TATA-binding protein associated factors and initiator-binding proteins" (PDF). Mol. Cell. Biol. 17: 2973–84. Retrieved 2012-04-26.
  25. Marketa J. Zvelebil, Jeremy O. Baum (2008). Dom Holdsworth, ed. Understanding bioinformatics. New York: Garland Science. p. 772. ISBN 978-0815340249.
  26. Dong-Hoon Lee, Naum Gershenzon, Malavika Gupta, Ilya P. Ioshikhes, Danny Reinberg and Brian A. Lewis (November 2005). "Functional Characterization of Core Promoter Elements: the Downstream Core Element Is Recognized by TAF1". Molecular and Cellular Biology. 25 (21): 9674–86. doi:10.1128/MCB.25.21.9674-9686.2005. PMID 16227614. Retrieved 2010-10-23.
  27. Chin Yan Lim, Buyung Santoso, Thomas Boulay, Emily Dong, Uwe Ohler, and James T. Kadonaga (July 1, 2004). "The MTE, a new core promoter element for transcription by RNA polymerase II". Genes & Development. 18 (13): 1606–17. doi:10.1101/gad.1193404. PMID 15231738. Retrieved 2013-02-10.
  28. Tamar Juven-Gershon, James T. Kadonaga (March 15, 2010). "Regulation of Gene Expression via the Core Promoter and the Basal Transcriptional Machinery". Developmental Biology. 339 (2): 225–9. doi:10.1016/j.ydbio.2009.08.009. PMC 2830304. PMID 19682982.
  29. "Downstream promoter element". San Francisco, California: Wikimedia Foundation, Inc. May 6, 2012. Retrieved 2012-05-20.
  30. Tamar Juven-Gershon, Susan Cheng & James T Kadonaga (23 October 2006). "Rational design of a super core promoter that enhances gene expression". Nature Methods. 3: 917–22. doi:10.1038/nmeth937.
  31. Glenn A. Maston, Sara K. Evans, and Michael R. Green (2006). "Transcriptional Regulatory Elements in the Human Genome". Annual Review of Genomics and Human Genetics. 7: 29–59. doi:10.1146/annurev.genom.7.080505.115623. https://www.annualreviews.org/doi/pdf/10.1146/annurev.genom.7.080505.115623.
  32. Lee DH, Gershenzon N, Gupta M, Ioshikhes IP, Reinberg D, Lewis BA (2005). "Functional characterization of core promoter elements: the downstream core element is recognized by TAF1". Mol. Cell. Biol. 25:9674–86.
  33. Stephen T. Smale and James T. Kadonaga (July 2003). "The RNA Polymerase II Core Promoter" (PDF). Annual Review of Biochemistry. 72 (1): 449–79. doi:10.1146/annurev.biochem.72.121801.161520. PMID 12651739.
  34. Cquan (2 October 2006). "Promoter (genetics)". San Francisco, California: Wikimedia Foundation, Inc. Retrieved 2016-01-09.
