HMG box gene transcriptions

Image: a large lymphocyte. Credit: Guy Waterval.

"Upstream Binding Factor (UBF) is important for activation of ribosomal RNA transcription and belongs to a family of proteins containing nucleic acid binding domains, termed HMG-boxes, with similarity to High Mobility Group (HMG) chromosomal proteins."[1]

Chromosomal proteins

"Most HMG box proteins contain two or more HMG boxes and appear to bind DNA in a relatively sequence-aspecific manner (5, 13, 15, 16 and references therein). [...] they all appear to bind to the minor groove of the A/T A/T C A A A G-motif (10, 14, 18-20)."[2]

"Previous studies in lymphocytes have described two DNA-binding HMG box proteins, TCF-1 and LEF-1, with affinity for the A/TA/TCAAAG motif found in several T cell-specific enhancers."[3]

"The high mobility group-1 (HMG) box was originaly identified by Tjian and co-workers in the transcription factor UBF as a region of homology to HMG-1 proteins (Jantzen et al., 1990). UBF reportedly contained four such regions of -80 amino acids; one of these boxes was shown to mediate DNA binding."[3]

"Interestingly, the sequence-specific HMG boxes characterized to date display high afinity to the A/TA/TCAAAG motif despite a low level of amino acid homology (typically <25% identity)."[3]

"Human LEF-1 was originally identified as a T cell-specific protein binding to the TTCAAAG motif in the TCR-α enhancer (Waterman et al., 1991)."[3]

"As analysed by gel retardation, the Sox-4 HMG box indeed bound to the AACAAAG motif (probe MWε-1; Figure 2B, lane 1). As described for other HMG boxes, Sox-4 interacted with DNA bases within the minor groove: substitution of A/T pairs for I/C pairs, which leaves the surface of the minor groove intact (Star and Hawley, 1991), had no apparent effect on binding affinity (lanes 2 and 4)."[3]

Consensus sequences

"In mammals, the Tcf/Lef family consists of four genes: Tcf‐1, Lef‐1, Tcf‐3 and Tcf‐4. All TCF/LEF proteins display several common structural features (48,49). They contain a nearly identical DNA‐binding domain, the HMG box, recognizing the consensus sequence A/T A/T CAAA."[4]

"Both directed and random screen studies have identified a consensus recognition sequence for the HMG DBD; 5′-SCTTTGATS-3′ [...] (van de Wetering et al. 1997; van Beest et al. 2000; Hallikas and Taipale 2006; Atcha et al. 2007)."[5]

"The domain is called the “C clamp” to highlight the absolute requirement for four cysteine residues in DNA binding (Atcha et al. 2007) [...]."[5]

"The C clamp carries specificity for a secondary, GC-rich sequence called a “Helper site” [(C/G)C(C/G)G(C/G)] that can occur with variable spacing and orientation relative to the Wnt response element (Atcha et al. 2007; Chang et al. 2008)."[5]

High mobility group proteins

Gene ID: 6932 is TCF7 transcription factor 7 on 5q31.1: "This gene encodes a member of the T-cell factor/lymphoid enhancer-binding factor family of high mobility group (HMG) box transcriptional activators. This gene is expressed predominantly in T-cells and plays a critical role in natural killer cell and innate lymphoid cell development. The encoded protein forms a complex with beta-catenin and activates transcription through a Wnt/beta-catenin signaling pathway. Mice with a knockout of this gene are viable and fertile, but display a block in T-lymphocyte differentiation. Alternative splicing results in multiple transcript variants. Naturally-occurring isoforms lacking the N-terminal beta-catenin interaction domain may act as dominant negative regulators of Wnt signaling."[6]

  1. NP_001128323.2 transcription factor 7 isoform 3: "Transcript Variant: This variant (3, also known as A), differs in the 5' UTR, has multiple coding region differences, uses a downstream start codon, and differs in the 3' UTR, compared to variant 1. The resulting isoform (3) is shorter at the N-terminus and has a distinct C-terminus, compared to isoform 1."[6]
  2. NP_001333354.1 transcription factor 7 isoform 5: "Transcript Variant: This variant (8) contains an alternate exon in the coding region, compared to variant 1. The resulting isoform (5) is longer, compared to isoform 1."[6]
  3. NP_001333379.1 transcription factor 7 isoform 7: "Transcript Variant: This variant (9) differs in the 5' UTR, has multiple coding region differences, and uses a downstream start codon, compared to variant 1. The resulting isoform (7) is shorter at the N-terminus and has a distinct C-terminus, compared to isoform 1."[6]
  4. NP_001353431.1 transcription factor 7 isoform 8 [variant 10].[6]
  5. NP_003193.2 transcription factor 7 isoform 1: "Transcript Variant: This variant (1) encodes isoform (1)."[6]
  6. NP_963963.1 transcription factor 7 isoform 2: "Transcript Variant: This variant (2, also known as B), differs in the 5' UTR, has multiple coding region differences, and uses a downstream start codon, compared to variant 1. The resulting isoform (2) is shorter at the N-terminus, compared to isoform 1. Both variants 2 and 5 encode the same isoform."[6]
  7. NP_963965.1 transcription factor 7 isoform 4: "Transcript Variant: This variant (4, also known as C), differs in the 5' UTR, has multiple coding region differences, uses a downstream start codon, and differs in the 3' UTR, compared to variant 1. The resulting isoform (4) is shorter at the N-terminus and has a distinct C-terminus, compared to isoform 1."[6]
  8. NP_998813.1 transcription factor 7 isoform 2: "Transcript Variant: This variant (5) differs in the 5' UTR, has multiple coding region differences, and uses a downstream start codon, compared to variant 1. The resulting isoform (2) is shorter at the N-terminus, compared to isoform 1. Both variants 2 and 5 encode the same isoform."[6]

Gene ID: 6934 is TCF7L2 transcription factor 7 like 2 on 10q25.2-q25.3: "This gene encodes a high mobility group (HMG) box-containing transcription factor that plays a key role in the Wnt signaling pathway. The protein has been implicated in blood glucose homeostasis. Genetic variants of this gene are associated with increased risk of type 2 diabetes. Several transcript variants encoding multiple different isoforms have been found for this gene."[7]

  1. NP_001139746.1 transcription factor 7-like 2 isoform 1: "Transcript Variant: This variant (1) encodes the longest isoform."[7]
  2. NP_001139755.1 transcription factor 7-like 2 isoform 3: "Transcript Variant: This variant (3) has multiple differences in the coding region, compared to variant 1, one of which results in a translational frameshift. The resulting protein (isoform 3) has a distinct C-terminus and is shorter than isoform 1."[7]
  3. NP_001139756.1 transcription factor 7-like 2 isoform 4: "Transcript Variant: This variant (4) has multiple differences in the coding region, compared to variant 1, one of which results in a translational frameshift. The resulting protein (isoform 4) has a distinct C-terminus and is shorter than isoform 1."[7]
  4. NP_001139757.1 transcription factor 7-like 2 isoform 5: "Transcript Variant: This variant (5) has multiple differences in the coding region but maintains the reading frame, compared to variant 1. This variant encodes isoform 5, which is shorter than isoform 1."[7]
  5. NP_001139758.1 transcription factor 7-like 2 isoform 6: "Transcript Variant: This variant (6) has multiple differences in the coding region, compared to variant 1, one of which results in a translational frameshift. The resulting protein (isoform 6) has a distinct C-terminus and is shorter than isoform 1."[7]
  6. NP_001185454.1 transcription factor 7-like 2 isoform 7: "Transcript Variant: This variant (7) has multiple differences in the coding region, compared to variant 1, one of which results in a translational frameshift. The resulting protein (isoform 7) has a distinct C-terminus and is shorter than isoform 1."[7]
  7. NP_001185455.1 transcription factor 7-like 2 isoform 8: "Transcript Variant: This variant (8) has multiple differences in the coding region but maintains the reading frame, compared to variant 1. This variant encodes isoform 8, which is shorter than isoform 1."[7]
  8. NP_001185456.1 transcription factor 7-like 2 isoform 9: "Transcript Variant: This variant (9) has multiple differences in the coding region, compared to variant 1, one of which results in a translational frameshift. The resulting protein (isoform 9) has a distinct C-terminus and is shorter than isoform 1."[7]
  9. NP_001185457.1 transcription factor 7-like 2 isoform 10: "Transcript Variant: This variant (10) has multiple differences in the coding region, compared to variant 1, one of which results in a translational frameshift. The resulting protein (isoform 10) has a distinct C-terminus and is shorter than isoform 1."[7]
  10. NP_001185458.1 transcription factor 7-like 2 isoform 11: "Transcript Variant: This variant (11) has multiple differences in the coding region, compared to variant 1, one of which results in a translational frameshift. The resulting protein (isoform 11) has a distinct C-terminus and is shorter than isoform 1."[7]
  11. NP_001185459.1 transcription factor 7-like 2 isoform 12: "Transcript Variant: This variant (12) has multiple differences in the coding region, compared to variant 1, one of which results in a translational frameshift. The resulting protein (isoform 12) has a distinct C-terminus and is shorter than isoform 1."[7]
  12. NP_001185460.1 transcription factor 7-like 2 isoform 13: "Transcript Variant: This variant (13) has multiple differences in the coding region, compared to variant 1, one of which results in a translational frameshift. The resulting protein (isoform 13) has a distinct C-terminus and is shorter than isoform 1."[7]
  13. NP_001336799.1 transcription factor 7-like 2 isoform 14: "Transcript Variant: This variant (14) lacks alternate exons in the 5' UTR, lacks a portion of the 5' coding region, and initiates translation at an alternate start codon, compared to variant 1. The encoded isoform (14) has a distinct N-terminus and is shorter than isoform 1."[7]
  14. NP_001336800.1 transcription factor 7-like 2 isoform 15: "Transcript Variant: This variant (15) lacks alternate exons in the 5' UTR, lacks a portion of the 5' coding region, and initiates translation at an alternate start codon, compared to variant 1. The encoded isoform (15) has a distinct N-terminus and is shorter than isoform 1."[7]
  15. NP_001350430.1 transcription factor 7-like 2 isoform 16 [variant 16].[7]
  16. NP_001354872.1 transcription factor 7-like 2 isoform 17 [variant 17].[7]
  17. NP_110383.2 transcription factor 7-like 2 isoform 2: "Transcript Variant: This variant (2) has multiple differences in the coding region but maintains the reading frame, compared to variant 1. This variant encodes isoform 2, which is shorter than isoform 1."[7]

Gene ID: 51176 is LEF1 lymphoid enhancer binding factor 1: "This gene encodes a transcription factor belonging to a family of proteins that share homology with the high mobility group protein-1. The protein encoded by this gene can bind to a functionally important site in the T-cell receptor-alpha enhancer, thereby conferring maximal enhancer activity. This transcription factor is involved in the Wnt signaling pathway, and it may function in hair cell differentiation and follicle morphogenesis. Mutations in this gene have been found in somatic sebaceous tumors. This gene has also been linked to other cancers, including androgen-independent prostate cancer. Alternative splicing results in multiple transcript variants."[8]

  1. NP_001124185.1 lymphoid enhancer-binding factor 1 isoform 2: "Transcript Variant: This variant (2) lacks an alternate in-frame exon in the central coding region, compared to variant 1, resulting in an isoform (2) that is shorter than isoform 1. [...] SOX-TCF_HMG-box, class I member of the HMG-box superfamily of DNA-binding proteins. These proteins contain a single HMG box, and bind the minor groove of DNA in a highly sequence-specific manner. Members include SRY and its homologs in insects and vertebrates, and transcription factor-like proteins, TCF-1, -3, -4, and LEF-1. They appear to bind the minor groove of the A/T C A A A G/C-motif."[8]
  2. NP_001124186.1 lymphoid enhancer-binding factor 1 isoform 3: "Transcript Variant: This variant (3) lacks both an in-frame exon in the central coding region and an exon in the 3' coding region that causes a frameshift, compared to variant 1. The encoded isoform (3) has a distinct C-terminus and is shorter than isoform 1."[8]
  3. NP_001159591.1 lymphoid enhancer-binding factor 1 isoform 4: "Transcript Variant: This variant (4) differs in the 5' UTR and 5' coding region, and lacks an alternate in-frame exon in the central coding region, compared to variant 1. The encoded isoform (4) has a distinct N-terminus and is shorter than isoform 1."[8]
  4. NP_057353.1 lymphoid enhancer-binding factor 1 isoform 1: "Transcript Variant: This variant (1) represents the longest transcript and encodes the longest isoform (1)."[8]

HMG box samplings

Pasting the response element consensus sequence (A/T)(A/T)CAAAG into a "⌘F" (find) search returns no matches between ZNF497 and A1BG or between ZSCAN22 and A1BG; matches to the degenerate consensus have to be found with the computer programs.

The Basic programs (starting with SuccessablesHMG.bas) test the consensus sequence (A/T)(A/T)CAAAG by comparing it with the nucleotide sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). The strand and direction searched, the sequence looked for, and the number and positions of the occurrences found are:

  1. negative strand, negative direction, looking for (A/T)(A/T)CAAAG, 0.
  2. negative strand, positive direction, looking for (A/T)(A/T)CAAAG, 0.
  3. positive strand, negative direction, looking for (A/T)(A/T)CAAAG, 1, ATCAAAG at 2891.
  4. positive strand, positive direction, looking for (A/T)(A/T)CAAAG, 0.
  5. complement, negative strand, negative direction, looking for (A/T)(A/T)GTTTC, 1, TAGTTTC at 2891.
  6. complement, negative strand, positive direction, looking for (A/T)(A/T)GTTTC, 0.
  7. complement, positive strand, negative direction, looking for (A/T)(A/T)GTTTC, 0.
  8. complement, positive strand, positive direction, looking for (A/T)(A/T)GTTTC, 0.
  9. inverse complement, negative strand, negative direction, looking for CTTTG(A/T)(A/T), 2, CTTTGTT at 1585, CTTTGTT at 229.
  10. inverse complement, negative strand, positive direction, looking for CTTTG(A/T)(A/T), 0.
  11. inverse complement, positive strand, negative direction, looking for CTTTG(A/T)(A/T), 0.
  12. inverse complement, positive strand, positive direction, looking for CTTTG(A/T)(A/T), 0.
  13. inverse negative strand, negative direction, looking for GAAAC(A/T)(A/T), 0.
  14. inverse negative strand, positive direction, looking for GAAAC(A/T)(A/T), 0.
  15. inverse positive strand, negative direction, looking for GAAAC(A/T)(A/T), 2, GAAACAA at 1585, GAAACAA at 229.
  16. inverse positive strand, positive direction, looking for GAAAC(A/T)(A/T), 0.
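The sixteen searches above combine four forms of the consensus (direct, complement, inverse complement, inverse) with two strands and two directions. The following is a minimal sketch of how the four forms can be derived and searched, not a reconstruction of SuccessablesHMG.bas. The function names, the toy sequence, and the convention of reporting the 1-based start of each hit are all assumptions; the text does not state which nucleotide of a hit the reported positions refer to.

```python
import re

# W = A or T and S = C or G; both codes are their own complements.
IUPAC_REGEX = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "S": "[CG]"}
IUPAC_COMPLEMENT = str.maketrans("ACGTWS", "TGCAWS")

def variants(motif):
    """Return the four forms of an IUPAC motif that the list searches for."""
    complement = motif.translate(IUPAC_COMPLEMENT)
    return {
        "direct": motif,
        "complement": complement,
        "inverse complement": complement[::-1],
        "inverse": motif[::-1],
    }

def find_all(motif, sequence):
    """List (matched text, 1-based start) for every hit, overlaps included."""
    body = "".join(IUPAC_REGEX[base] for base in motif)
    pattern = re.compile("(?=(" + body + "))")  # lookahead keeps overlapping hits
    return [(m.group(1), m.start() + 1) for m in pattern.finditer(sequence)]

# Made-up sequence for illustration only; it is not taken from the A1BG region.
toy_sequence = "GGATCAAAGCCCTTTGTTAA"

for name, motif in variants("WWCAAAG").items():
    print(name, motif, find_all(motif, toy_sequence))
```

Running the same four searches on each strand and in each direction would give the sixteen combinations listed above.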

HMG UTRs

  1. Positive strand, negative direction: ATCAAAG at 2891.

HMG distal promoters

  1. Negative strand, negative direction: CTTTGTT at 1585, CTTTGTT at 229.

HMG random dataset samplings

  1. HMGr0: 3, TTCAAAG at 4166, AACAAAG at 3503, TTCAAAG at 3338.
  2. HMGr1: 2, TACAAAG at 3722, TACAAAG at 2071.
  3. HMGr2: 0.
  4. HMGr3: 1, AACAAAG at 278.
  5. HMGr4: 3, TACAAAG at 3777, AACAAAG at 3593, ATCAAAG at 672.
  6. HMGr5: 1, TACAAAG at 3734.
  7. HMGr6: 1, TACAAAG at 1499.
  8. HMGr7: 2, ATCAAAG at 2949, TTCAAAG at 252.
  9. HMGr8: 4, AACAAAG at 2658, ATCAAAG at 1142, TTCAAAG at 935, AACAAAG at 620.
  10. HMGr9: 0.
  11. HMGr0ci: 5, CTTTGTT at 4178, CTTTGAT at 3842, CTTTGAA at 3415, CTTTGTA at 986, CTTTGTT at 617.
  12. HMGr1ci: 1, CTTTGTT at 1640.
  13. HMGr2ci: 0.
  14. HMGr3ci: 0.
  15. HMGr4ci: 0.
  16. HMGr5ci: 1, CTTTGTT at 1983.
  17. HMGr6ci: 2, CTTTGAA at 1712, CTTTGAT at 257.
  18. HMGr7ci: 0.
  19. HMGr8ci: 3, CTTTGTA at 2944, CTTTGTT at 1167, CTTTGAT at 1149.
  20. HMGr9ci: 0.
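As a rough check on the random samplings above, the number of chance matches can be estimated. The sketch below assumes that each random dataset is a sequence of independent, equally likely nucleotides and that its length is 4560 (the largest position used in the region ranges later in the text); neither assumption is stated in the source, and the function names are hypothetical.

```python
import random
import re
from fractions import Fraction

def match_probability(choices_per_position):
    """Chance that a uniformly random site matches a degenerate motif."""
    probability = Fraction(1)
    for allowed_bases in choices_per_position:
        probability *= Fraction(allowed_bases, 4)
    return probability

# (A/T)(A/T)CAAAG: two positions accept 2 bases, five accept exactly 1.
p_hmg = match_probability([2, 2, 1, 1, 1, 1, 1])

ASSUMED_LENGTH = 4560                 # assumption, see lead-in
MOTIF_LENGTH = 7
expected_hits = p_hmg * (ASSUMED_LENGTH - MOTIF_LENGTH + 1)

def random_sequence(length, rng):
    """Uniform random nucleotide sequence (an assumed model of the datasets)."""
    return "".join(rng.choice("ACGT") for _ in range(length))

def count_hits(sequence):
    """Count matches to (A/T)(A/T)CAAAG, allowing overlaps."""
    return sum(1 for _ in re.finditer("(?=[AT][AT]CAAAG)", sequence))

print(p_hmg)                  # 1/4096
print(float(expected_hits))   # a little over 1 hit per sequence

rng = random.Random(0)
simulated = [count_hits(random_sequence(ASSUMED_LENGTH, rng)) for _ in range(10)]
print(simulated)
```

Under these assumptions about one chance match per sequence is expected, which is the same order as the samplings above: the ten direct samplings total 17 hits and the ten complement inverse (ci) samplings total 12.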

HMGr arbitrary UTRs

  1. HMGr0: TTCAAAG at 4166, AACAAAG at 3503, TTCAAAG at 3338.
  2. HMGr4: TACAAAG at 3777, AACAAAG at 3593.
  3. HMGr0ci: CTTTGTT at 4178, CTTTGAT at 3842, CTTTGAA at 3415.
  4. HMGr8ci: CTTTGTA at 2944.

HMGr alternate UTRs

  1. HMGr1: TACAAAG at 3722.
  2. HMGr5: TACAAAG at 3734.
  3. HMGr7: ATCAAAG at 2949.

HMGr arbitrary negative direction proximal promoters

  1. HMGr8: AACAAAG at 2658.

HMGr alternate positive direction proximal promoters

  1. HMGr0: TTCAAAG at 4166.
  2. HMGr0ci: CTTTGTT at 4178.

HMGr arbitrary negative direction distal promoters

  1. HMGr4: ATCAAAG at 672.
  2. HMGr6: TACAAAG at 1499.
  3. HMGr8: ATCAAAG at 1142, TTCAAAG at 935, AACAAAG at 620.
  4. HMGr0ci: CTTTGTA at 986, CTTTGTT at 617.
  5. HMGr6ci: CTTTGAA at 1712, CTTTGAT at 257.
  6. HMGr8ci: CTTTGTT at 1167, CTTTGAT at 1149.

HMGr alternate negative direction distal promoters

  1. HMGr1: TACAAAG at 2071.
  2. HMGr3: AACAAAG at 278.
  3. HMGr7: TTCAAAG at 252.
  4. HMGr1ci: CTTTGTT at 1640.
  5. HMGr5ci: CTTTGTT at 1983.

HMGr arbitrary positive direction distal promoters

  1. HMGr1: TACAAAG at 3722, TACAAAG at 2071.
  2. HMGr3: AACAAAG at 278.
  3. HMGr5: TACAAAG at 3734.
  4. HMGr7: ATCAAAG at 2949, TTCAAAG at 252.
  5. HMGr1ci: CTTTGTT at 1640.
  6. HMGr5ci: CTTTGTT at 1983.

HMGr alternate positive direction distal promoters

  1. HMGr0: AACAAAG at 3503, TTCAAAG at 3338.
  2. HMGr4: TACAAAG at 3777, AACAAAG at 3593, ATCAAAG at 672.
  3. HMGr6: TACAAAG at 1499.
  4. HMGr8: AACAAAG at 2658, ATCAAAG at 1142, TTCAAAG at 935, AACAAAG at 620.
  5. HMGr0ci: CTTTGAT at 3842, CTTTGAA at 3415, CTTTGTA at 986, CTTTGTT at 617.
  6. HMGr6ci: CTTTGAA at 1712, CTTTGAT at 257.
  7. HMGr8ci: CTTTGTA at 2944, CTTTGTT at 1167, CTTTGAT at 1149.

HMG box analysis and results

"Most HMG box proteins contain two or more HMG boxes and appear to bind DNA in a relatively sequence-aspecific manner (5, 13, 15, 16 and references therein). [...] they all appear to bind to the minor groove of the A/T A/T C A A A G-motif (10, 14, 18-20)."[2]

| Reals or randoms | Promoters | Direction | Numbers | Strands | Occurrences | Averages (± 0.1) |
|---|---|---|---|---|---|---|
| Reals | UTR | negative | 1 | 2 | 0.5 | 0.5 ± 0.5 (--0,+-1) |
| Randoms | UTR | arbitrary negative | 9 | 10 | 0.9 | 0.6 |
| Randoms | UTR | alternate negative | 3 | 10 | 0.3 | 0.6 |
| Reals | Core | negative | 0 | 2 | 0 | 0 |
| Randoms | Core | arbitrary negative | 0 | 10 | 0 | 0 |
| Randoms | Core | alternate negative | 0 | 10 | 0 | 0 |
| Reals | Core | positive | 0 | 2 | 0 | 0 |
| Randoms | Core | arbitrary positive | 0 | 10 | 0 | 0 |
| Randoms | Core | alternate positive | 0 | 10 | 0 | 0 |
| Reals | Proximal | negative | 0 | 2 | 0 | 0 |
| Randoms | Proximal | arbitrary negative | 1 | 10 | 0.1 | 0.05 |
| Randoms | Proximal | alternate negative | 0 | 10 | 0 | 0.05 |
| Reals | Proximal | positive | 0 | 2 | 0 | 0 |
| Randoms | Proximal | arbitrary positive | 0 | 10 | 0 | 0.1 |
| Randoms | Proximal | alternate positive | 2 | 10 | 0.2 | 0.1 |
| Reals | Distal | negative | 2 | 2 | 1 | 1 ± 1 (--2,+-0) |
| Randoms | Distal | arbitrary negative | 11 | 10 | 1.1 | 0.8 |
| Randoms | Distal | alternate negative | 5 | 10 | 0.5 | 0.8 |
| Reals | Distal | positive | 0 | 2 | 0 | 0 |
| Randoms | Distal | arbitrary positive | 8 | 10 | 0.8 | 1.35 |
| Randoms | Distal | alternate positive | 19 | 10 | 1.9 | 1.35 |
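The figures in the table above appear to follow simple arithmetic: occurrences are hits divided by the number of strands or datasets, the random average is the mean of the arbitrary and alternate occurrences, and the real entry is the mean of the two strand counts with half their difference as the spread. The sketch below reproduces a few rows under that reading; the function names are hypothetical and the interpretation is inferred from the numbers rather than stated in the text.

```python
def occurrences(hits, strands):
    """Hits per strand (reals) or per dataset (randoms)."""
    return hits / strands

def random_average(arbitrary_hits, alternate_hits, datasets=10):
    """Mean of the arbitrary and alternate occurrences."""
    arbitrary = occurrences(arbitrary_hits, datasets)
    alternate = occurrences(alternate_hits, datasets)
    return (arbitrary + alternate) / 2

def real_summary(negative_strand_hits, positive_strand_hits):
    """Mean of the two strand counts and half their difference."""
    mean = (negative_strand_hits + positive_strand_hits) / 2
    spread = abs(negative_strand_hits - positive_strand_hits) / 2
    return mean, spread

# UTR rows: 9 arbitrary and 3 alternate random hits; 0 and 1 real hits.
print(round(random_average(9, 3), 2))    # 0.6
print(real_summary(0, 1))                # (0.5, 0.5)

# Distal positive rows: 8 arbitrary and 19 alternate random hits.
print(round(random_average(8, 19), 2))   # 1.35
```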

Comparison:

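The occurrences of real HMG boxes are greater than the randoms only in the negative direction distal promoters (1 versus 0.8). In the UTR the reals are slightly lower than the randoms (0.5 versus 0.6), and in the positive direction distal promoters no real HMG boxes occur (0 versus 1.35). This suggests that the real HMG boxes in the negative direction distal promoters are likely active or activable.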

Helper site samplings

The Basic programs (starting with SuccessablesHelp.bas) test the consensus sequence (C/G)C(C/G)G(C/G) by comparing it with the nucleotide sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). The strand and direction searched and the number and positions of the occurrences found are:

  1. Negative strand, negative direction: 16, CCGGG at 3929, CCGGC at 3874, CCGGG at 3576, CCCGG at 3567, CCCGC at 3044, GCCGC at 2726, CCCGC at 2723, CCCGC at 2012, CCCGC at 1894, CCCGC at 1808, CCCGC at 1759, CCCGC at 1241, GCGGC at 1154, CCGGC at 512, CCCGG at 511, CCGGC at 373.
  2. Positive strand, negative direction: 17, GCCGG at 4324, GCGGG at 4000, GCGGG at 3091, GCGGC at 2725, CCGGG at 2318, GCCGG at 2317, CCGGG at 2192, GCGGC at 1753, GCGGG at 1681, GCGGG at 1251, CCGGG at 1239, GCCGG at 1238, GCGGC at 957, CCGGG at 514, GCCGG at 513, CCGGG at 375, GCCGG at 374.
  3. Negative strand, positive direction: 64, GCGGG at 4440, GCGGG at 4430, CCGGG at 4245, CCCGC at 4237, CCGGC at 4003, CCCGG at 4002, GCGGG at 3671, CCGGG at 3558, CCCGG at 3557, CCCGC at 3325, GCCGC at 3226, GCGGG at 2486, GCCGC at 2355, GCGGC at 1902, CCGGC at 1847, GCGGC at 1794, GCGGG at 1765, GCCGG at 1759, GCCGC at 1756, GCGGG at 1707, GCGGG at 1681, GCCGC at 1648, GCGGC at 1637, GCGGG at 1591, GCGGC at 1582, CCGGC at 1547, GCGGC at 1438, GCGGC at 1423, GCGGC at 1338, GCGGC at 1323, CCGGC at 1295, GCGGC at 1255, CCGGC at 1211, GCGGC at 1171, GCGGC at 1148, CCGGC at 1043, GCGGC at 1034, GCGGG at 1026, GCGGC at 1003, GCGGG at 972, CCGGG at 911, GCCGG at 910, GCCGC at 903, GCGGG at 872, CCGGG at 811, GCCGG at 810, GCGGC at 751, CCCGG at 743, GCGGC at 721, GCGGC at 667, GCGGC at 637, GCGGC at 583, GCGGC at 499, GCGGG at 490, GCCGG at 484, CCGGG at 477, GCCGG at 476, CCGGG at 443, GCGGG at 407, CCCGC at 393, CCGGC at 376, CCCGG at 375, GCGGC at 354, GCGGC at 332.
  4. Positive strand, positive direction: 68, CCCGC at 4438, CCCGC at 4428, CCCGG at 4304, GCGGG at 4292, CCGGG at 4228, CCCGG at 4227, CCGGG at 3500, CCCGG at 3499, GCCGC at 1918, CCCGC at 1900, GCCGG at 1848, GCCGG at 1795, CCCGC at 1792, GCGGG at 1770, CCCGC at 1767, GCGGC at 1758, CCGGC at 1755, CCCGG at 1754, CCGGG at 1739, CCCGC at 1717, GCGGG at 1673, GCGGG at 1657, CCGGC at 1647, CCCGG at 1646, GCCGC at 1583, CCGGG at 1570, CCCGC at 1562, GCCGC at 1548, GCGGG at 1499, CCGGC at 1486, GCGGG at 1399, CCGGC at 1386, GCCGC at 1296, GCGGG at 1247, CCCGC at 1226, GCCGC at 1212, GCCGG at 1172, GCGGC at 1163, CCGGG at 1150, GCCGG at 1149, GCGGC at 1079, GCCGC at 1044, GCCGG at 981, CCCGC at 974, CCGGC at 950, GCCGG at 881, CCCGC at 874, CCGGC at 850, CCGGC at 765, GCCGG at 764, GCCGG at 722, GCCGC at 638, GCCGC at 540, GCGGG at 453, CCCGC at 445, CCGGG at 421, CCCGG at 420, CCCGC at 405, CCGGG at 390, CCCGG at 389, CCGGG at 372, GCCGC at 355, CCCGC at 352, GCCGG at 326, CCCGG at 283, CCCGG at 248, CCGGG at 200, CCGGG at 93.
  5. Helper sites: the complement inverse (ci) of (C/G)C(C/G)G(C/G) is the same as the direct consensus (C/G)C(C/G)G(C/G).
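Item 5 equates the complement inverse (ci) of the Helper consensus with the direct consensus. A short sketch can confirm this by expanding the degenerate consensus into every sequence it allows and taking the reverse complement of each; the function names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def expand(positions):
    """All concrete sequences allowed by a per-position list of bases."""
    return {"".join(choice) for choice in product(*positions)}

def reverse_complement(sequence):
    """Complement every base, then reverse the order."""
    return sequence.translate(COMPLEMENT)[::-1]

# (C/G)C(C/G)G(C/G): eight concrete sequences.
helper = expand(["CG", "C", "CG", "G", "CG"])
helper_ci = {reverse_complement(s) for s in helper}

# (A/T)(A/T)CAAAG for contrast: its complement inverse is a different set.
hmg = expand(["AT", "AT", "C", "A", "A", "A", "G"])
hmg_ci = {reverse_complement(s) for s in hmg}

print(helper == helper_ci)      # True
print(hmg.isdisjoint(hmg_ci))   # True
```

Because the Helper set maps onto itself, a direct search already covers the complement inverse, whereas the HMG box consensus needs a separate CTTTG(A/T)(A/T) search.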

Helper (4560-2846) UTRs

  1. Negative strand, negative direction: CCGGG at 3929, CCGGC at 3874, CCGGG at 3576, CCCGG at 3567, CCCGC at 3044.
  2. Positive strand, negative direction: GCCGG at 4324, GCGGG at 4000, GCGGG at 3091.

Helper positive direction (4445-4265) core promoters

  1. Negative strand, positive direction: GCGGG at 4440, GCGGG at 4430.
  2. Positive strand, positive direction: CCCGC at 4438, CCCGC at 4428, CCCGG at 4304, GCGGG at 4292.

Helper negative direction (2811-2596) proximal promoters

  1. Negative strand, negative direction: GCCGC at 2726, CCCGC at 2723.
  2. Positive strand, negative direction: GCGGC at 2725.

Helper positive direction (4265-4050) proximal promoters

  1. Negative strand, positive direction: CCGGG at 4245, CCCGC at 4237.
  2. Positive strand, positive direction: CCGGG at 4228, CCCGG at 4227.

Helper negative direction (2596-1) distal promoters

  1. Negative strand, negative direction: CCCGC at 2012, CCCGC at 1894, CCCGC at 1808, CCCGC at 1759, CCCGC at 1241, GCGGC at 1154, CCGGC at 512, CCCGG at 511, CCGGC at 373.
  2. Positive strand, negative direction: CCGGG at 2318, GCCGG at 2317, CCGGG at 2192, GCGGC at 1753, GCGGG at 1681, GCGGG at 1251, CCGGG at 1239, GCCGG at 1238, GCGGC at 957, CCGGG at 514, GCCGG at 513, CCGGG at 375, GCCGG at 374.

Helper positive direction (4050-1) distal promoters

  1. Negative strand, positive direction: CCGGC at 4003, CCCGG at 4002, GCGGG at 3671, CCGGG at 3558, CCCGG at 3557, CCCGC at 3325, GCCGC at 3226, GCGGG at 2486, GCCGC at 2355, GCGGC at 1902, CCGGC at 1847, GCGGC at 1794, GCGGG at 1765, GCCGG at 1759, GCCGC at 1756, GCGGG at 1707, GCGGG at 1681, GCCGC at 1648, GCGGC at 1637, GCGGG at 1591, GCGGC at 1582, CCGGC at 1547, GCGGC at 1438, GCGGC at 1423, GCGGC at 1338, GCGGC at 1323, CCGGC at 1295, GCGGC at 1255, CCGGC at 1211, GCGGC at 1171, GCGGC at 1148, CCGGC at 1043, GCGGC at 1034, GCGGG at 1026, GCGGC at 1003, GCGGG at 972, CCGGG at 911, GCCGG at 910, GCCGC at 903, GCGGG at 872, CCGGG at 811, GCCGG at 810, GCGGC at 751, CCCGG at 743, GCGGC at 721, GCGGC at 667, GCGGC at 637, GCGGC at 583, GCGGC at 499, GCGGG at 490, GCCGG at 484, CCGGG at 477, GCCGG at 476, CCGGG at 443, GCGGG at 407, CCCGC at 393, CCGGC at 376, CCCGG at 375, GCGGC at 354, GCGGC at 332.
  2. Positive strand, positive direction: CCGGG at 3500, CCCGG at 3499, GCCGC at 1918, CCCGC at 1900, GCCGG at 1848, GCCGG at 1795, CCCGC at 1792, GCGGG at 1770, CCCGC at 1767, GCGGC at 1758, CCGGC at 1755, CCCGG at 1754, CCGGG at 1739, CCCGC at 1717, GCGGG at 1673, GCGGG at 1657, CCGGC at 1647, CCCGG at 1646, GCCGC at 1583, CCGGG at 1570, CCCGC at 1562, GCCGC at 1548, GCGGG at 1499, CCGGC at 1486, GCGGG at 1399, CCGGC at 1386, GCCGC at 1296, GCGGG at 1247, CCCGC at 1226, GCCGC at 1212, GCCGG at 1172, GCGGC at 1163, CCGGG at 1150, GCCGG at 1149, GCGGC at 1079, GCCGC at 1044, GCCGG at 981, CCCGC at 974, CCGGC at 950, GCCGG at 881, CCCGC at 874, CCGGC at 850, CCGGC at 765, GCCGG at 764, GCCGG at 722, GCCGC at 638, GCCGC at 540, GCGGG at 453, CCCGC at 445, CCGGG at 421, CCCGG at 420, CCCGC at 405, CCGGG at 390, CCCGG at 389, CCGGG at 372, GCCGC at 355, CCCGC at 352, GCCGG at 326, CCCGG at 283, CCCGG at 248, CCGGG at 200, CCGGG at 93.
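The headings above sort each occurrence into a region by its position. A minimal sketch of that assignment follows, using the ranges given in the headings (the 2846-2811 negative direction core range comes from the random dataset headings later in the text). How the shared boundary positions are assigned is an assumption: here each region includes its lower bound and excludes its upper bound, except the outermost region in each direction. The names are hypothetical.

```python
from collections import Counter

# (region, lowest position, first position past the region)
NEGATIVE_DIRECTION = [
    ("distal", 1, 2596),
    ("proximal", 2596, 2811),
    ("core", 2811, 2846),
    ("UTR", 2846, 4561),
]
POSITIVE_DIRECTION = [
    ("distal", 1, 4050),
    ("proximal", 4050, 4265),
    ("core", 4265, 4446),
]

def region(position, direction):
    """Name the region containing a position, or None if outside all ranges."""
    table = NEGATIVE_DIRECTION if direction == "negative" else POSITIVE_DIRECTION
    for name, low, high in table:
        if low <= position < high:
            return name
    return None

# Positive strand, negative direction Helper positions as listed earlier.
positions = [4324, 4000, 3091, 2725, 2318, 2317, 2192, 1753, 1681,
             1251, 1239, 1238, 957, 514, 513, 375, 374]
tally = Counter(region(p, "negative") for p in positions)
print(dict(tally))   # {'UTR': 3, 'proximal': 1, 'distal': 13}
```

The tally matches the per-region lists above for the positive strand in the negative direction: three UTR, one proximal, and thirteen distal occurrences.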

Helper site random dataset samplings

  1. RDr0: 0.
  2. RDr1: 0.
  3. RDr2: 0.
  4. RDr3: 0.
  5. RDr4: 0.
  6. RDr5: 0.
  7. RDr6: 0.
  8. RDr7: 0.
  9. RDr8: 0.
  10. RDr9: 0.
  11. RDr0ci: 0.
  12. RDr1ci: 0.
  13. RDr2ci: 0.
  14. RDr3ci: 0.
  15. RDr4ci: 0.
  16. RDr5ci: 0.
  17. RDr6ci: 0.
  18. RDr7ci: 0.
  19. RDr8ci: 0.
  20. RDr9ci: 0.

RDr arbitrary (evens) (4560-2846) UTRs

RDr alternate (odds) (4560-2846) UTRs

RDr arbitrary negative direction (evens) (2846-2811) core promoters

RDr alternate negative direction (odds) (2846-2811) core promoters

RDr arbitrary positive direction (odds) (4445-4265) core promoters

RDr alternate positive direction (evens) (4445-4265) core promoters

RDr arbitrary negative direction (evens) (2811-2596) proximal promoters

RDr alternate negative direction (odds) (2811-2596) proximal promoters

RDr arbitrary positive direction (odds) (4265-4050) proximal promoters

RDr alternate positive direction (evens) (4265-4050) proximal promoters

RDr arbitrary negative direction (evens) (2596-1) distal promoters

RDr alternate negative direction (odds) (2596-1) distal promoters

RDr arbitrary positive direction (odds) (4050-1) distal promoters

RDr alternate positive direction (evens) (4050-1) distal promoters

Helper site analysis and results

"The C clamp carries specificity for a secondary, GC-rich sequence called a “Helper site” [(C/G)C(C/G)G(C/G)] that can occur with variable spacing and orientation relative to the Wnt response element (Atcha et al. 2007; Chang et al. 2008)."[5]

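| Reals or randoms | Promoters | Direction | Numbers | Strands | Occurrences | Averages (± 0.1) |
|---|---|---|---|---|---|---|
| Reals | UTR | negative | 8 | 2 | 4 | 4 ± 1 (--5,+-3) |
| Randoms | UTR | arbitrary negative | 0 | 10 | 0 | 0 |
| Randoms | UTR | alternate negative | 0 | 10 | 0 | 0 |
| Reals | Core | negative | 0 | 2 | 0 | 0 |
| Randoms | Core | arbitrary negative | 0 | 10 | 0 | 0 |
| Randoms | Core | alternate negative | 0 | 10 | 0 | 0 |
| Reals | Core | positive | 6 | 2 | 3 | 3 ± 1 (-+2,++4) |
| Randoms | Core | arbitrary positive | 0 | 10 | 0 | 0 |
| Randoms | Core | alternate positive | 0 | 10 | 0 | 0 |
| Reals | Proximal | negative | 3 | 2 | 1.5 | 1.5 ± 0.5 (--2,+-1) |
| Randoms | Proximal | arbitrary negative | 0 | 10 | 0 | 0 |
| Randoms | Proximal | alternate negative | 0 | 10 | 0 | 0 |
| Reals | Proximal | positive | 4 | 2 | 2 | 2 ± 0 (-+2,++2) |
| Randoms | Proximal | arbitrary positive | 0 | 10 | 0 | 0 |
| Randoms | Proximal | alternate positive | 0 | 10 | 0 | 0 |
| Reals | Distal | negative | 22 | 2 | 11 | 11 ± 2 (--9,+-13) |
| Randoms | Distal | arbitrary negative | 0 | 10 | 0 | 0 |
| Randoms | Distal | alternate negative | 0 | 10 | 0 | 0 |
| Reals | Distal | positive | 122 | 2 | 61 | 61 ± 1 (-+60,++62) |
| Randoms | Distal | arbitrary positive | 0 | 10 | 0 | 0 |
| Randoms | Distal | alternate positive | 0 | 10 | 0 | 0 |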

Comparison:

The occurrences of real Helpers are greater than the randoms. This suggests that the real Helpers are likely active or activable.

References

  1. Gregory P. Copenhaver, Christopher D. Putnam, Michael L. Denton and Craig S. Pikaard (1994). "The RNA polymerase I transcription factor UBF is a sequence-tolerant HMG-box protein that can recognize structured nucleic acids" (PDF). Nucleic Acids Research. 22 (13): 2651–7. Retrieved 2017-04-05.
  2. Vincent Laudet, Dominique Stehelin and Hans Clevers (1993). "Ancestry and diversity of the HMG box superfamily" (PDF). Nucleic Acids Research. 21 (10): 2493–501. Retrieved 2017-04-05.
  3. Marc van de Wetering, Mariette Oosterwegel, Klaske van Norren and Hans Clevers (1993). "Sox-4, an Sry-like HMG box protein, is a transcriptional activator in lymphocytes" (PDF). The EMBO Journal. 12 (10): 3847–3854. Retrieved 2017-02-13.
  4. Tomas Valenta, Jan Lukas, Vladimir Korinek (2003). "HMG box transcription factor TCF‐4's interaction with CtBP1 controls the expression of the Wnt target Axin2/Conductin in human embryonic kidney cells". Nucleic Acids Research. 31 (9): 2369–80. doi:10.1093/nar/gkg346. Retrieved 2017-04-05.
  5. Ken M. Cadigan and Marian L. Waterman (November 2012). "TCF/LEFs and Wnt Signaling in the Nucleus". Cold Spring Harbor Perspectives in Biology. 4 (11): a007906. doi:10.1101/cshperspect.a007906. PMID 23024173. Retrieved 2023-05-05.
  6. RefSeq (October 2016). "TCF7 transcription factor 7 [ Homo sapiens (human) ]". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 30 April 2020.
  7. RefSeq (8 February 2019). "TCF7L2 transcription factor 7 like 2 [ Homo sapiens (human) ]". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 30 April 2020.
  8. RefSeq (October 2009). "LEF1 lymphoid enhancer binding factor 1 [ Homo sapiens (human) ]". 8600 Rockville Pike, Bethesda MD, 20894 USA: National Center for Biotechnology Information, U.S. National Library of Medicine. Retrieved 5 April 2020.
