Kozak sequence gene transcriptions
Associate Editor(s)-in-Chief: Henry A. Hoff
The Kozak sequence is a nucleic acid motif that functions as the protein translation initiation site in most eukaryotic mRNA transcripts.[1] Regarded as the optimum sequence for initiating translation in eukaryotes, it is integral to protein regulation and overall cellular health, and has implications in human disease.[1][2]
A wrong start site can result in non-functional proteins.[3]
As the sequence has been studied further, extensions of the nucleotide sequence, bases of particular importance, and notable exceptions have been identified.[1][4][5]
The sequence was discovered through a detailed analysis of genomic DNA sequences.[6]
The Kozak sequence was determined by sequencing 699 vertebrate mRNAs and verified by site-directed mutagenesis.[7] While initially limited to a subset of vertebrates (i.e. human, cow, cat, dog, chicken, guinea pig, hamster, mouse, pig, rabbit, sheep, and Xenopus), subsequent studies confirmed its conservation in higher eukaryotes generally.[1] The sequence was defined as 5'-(gcc)gccRccATGG-3' in IUPAC nucleobase notation, where R denotes a purine (A or G).[7]
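The IUPAC-notation consensus can be turned into a pattern for testing candidate start-site contexts. The following Python sketch is illustrative only: the function name `consensus_to_regex` is hypothetical, the parenthesised (gcc) is assumed to be optional, and every remaining position is treated as required regardless of letter case, which is stricter than a consensus implies.

```python
import re

# Only the IUPAC codes needed for this consensus are listed (R = A or G);
# this is not a complete IUPAC table.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]"}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Compile a consensus such as '(gcc)gccRccATGG' into a regex.

    Assumptions of this sketch: a parenthesised group is optional,
    and upper/lower case is ignored.
    """
    parts = []
    # Split into parenthesised groups and single letters.
    for token in re.findall(r"\([A-Za-z]+\)|[A-Za-z]", consensus):
        if token.startswith("("):
            inner = "".join(IUPAC[base] for base in token[1:-1].upper())
            parts.append(f"(?:{inner})?")
        else:
            parts.append(IUPAC[token.upper()])
    return re.compile("".join(parts))


KOZAK = consensus_to_regex("(gcc)gccRccATGG")

print(KOZAK.pattern)                            # (?:GCC)?GCC[AG]CCATGG
print(bool(KOZAK.fullmatch("GCCGCCACCATGG")))   # True: R = A, with prefix
print(bool(KOZAK.fullmatch("GCCACCATGG")))      # True: prefix omitted
print(bool(KOZAK.fullmatch("GCCTCCATGG")))      # False: T is not a purine
```

Real start sites often deviate from the consensus at one or more positions, so an exact-match pattern like this one will miss many functional sites.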
Human genes
Consensus sequences
One reported Kozak consensus sequence is GAAAATGG.[8]
The consensus for the Kozak sequence is 5'-(GCC)GCC(A/G)CCATGG-3'.[7]
GCC box
See GCC box samplings, which show that GCCGCC is present in A1BG promoters but not within the TSS ± 50 region.
CCA box samplings
The Basic programs testing the consensus sequence CCATGG (starting with SuccessablesCCA.bas) were written to compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). Each entry below gives the program (strand and direction), the sequence it looked for, and the number of matches found:
- negative strand, negative direction, looking for CCATGG, 0.
- positive strand, negative direction, looking for CCATGG, 0.
- positive strand, positive direction, looking for CCATGG, 0.
- negative strand, positive direction, looking for CCATGG, 2, CCATGG at 4222, CCATGG at 3581.
- complement, negative strand, negative direction, looking for GGTACC, 0.
- complement, positive strand, negative direction, looking for GGTACC, 0.
- complement, positive strand, positive direction, looking for GGTACC, 2, GGTACC at 4222, GGTACC at 3581.
- complement, negative strand, positive direction, looking for GGTACC, 0.
- inverse complement, negative strand, negative direction, looking for CCATGG, 0.
- inverse complement, positive strand, negative direction, looking for CCATGG, 0.
- inverse complement, positive strand, positive direction, looking for CCATGG, 0.
- inverse complement, negative strand, positive direction, looking for CCATGG, 2, CCATGG at 4222, CCATGG at 3581.
- inverse positive strand, negative direction, looking for GGTACC, 0.
- inverse negative strand, negative direction, looking for GGTACC, 0.
- inverse positive strand, positive direction, looking for GGTACC, 2, GGTACC at 4222, GGTACC at 3581.
- inverse negative strand, positive direction, looking for GGTACC, 0.
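The search strings in the list above follow from three string operations on the motif: complement, inverse (reversal), and inverse complement. The Python sketch below shows those operations; it is not a port of the Basic programs, and the helper names are hypothetical.

```python
# Watson-Crick pairing for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Swap each base for its pairing partner, keeping the order."""
    return seq.translate(COMPLEMENT)


def inverse(seq: str) -> str:
    """Reverse the order of the bases."""
    return seq[::-1]


def inverse_complement(seq: str) -> str:
    """Complement the sequence, then reverse it."""
    return complement(seq)[::-1]


def search_strings(motif: str) -> dict:
    """Return the four strings searched for a given motif."""
    return {
        "direct": motif,
        "complement": complement(motif),
        "inverse complement": inverse_complement(motif),
        "inverse": inverse(motif),
    }


for name, target in search_strings("CCATGG").items():
    print(f"{name}: {target}")
# direct: CCATGG
# complement: GGTACC
# inverse complement: CCATGG
# inverse: GGTACC
```

Because CCATGG is its own inverse complement, only two distinct strings (CCATGG and GGTACC) appear among the sixteen searches.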
(Kozak) samplings
Copying an apparent consensus sequence for the Kozak sequence, (GCC)GCC(A/G)CCATGG or GCCACCAT, into the browser's find function ("⌘F") finds no occurrences between ZSCAN22 and A1BG and none between ZNF497 and A1BG, in agreement with the computer programs.
The Basic programs testing the consensus sequence GCCGCC(A/G)CCATGG (starting with SuccessablesKoz.bas) were written to compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). Each entry below gives the program (strand and direction), the sequence it looked for, and the number of matches found:
- negative strand, negative direction, looking for GCCGCC(A/G)CCATGG, 0.
- positive strand, negative direction, looking for GCCGCC(A/G)CCATGG, 0.
- positive strand, positive direction, looking for GCCGCC(A/G)CCATGG, 0.
- negative strand, positive direction, looking for GCCGCC(A/G)CCATGG, 0.
- complement, negative strand, negative direction, looking for CGGCGG(C/T)GGTACC, 0.
- complement, positive strand, negative direction, looking for CGGCGG(C/T)GGTACC, 0.
- complement, positive strand, positive direction, looking for CGGCGG(C/T)GGTACC, 0.
- complement, negative strand, positive direction, looking for CGGCGG(C/T)GGTACC, 0.
- inverse complement, negative strand, negative direction, looking for CCATGG(C/T)GGCGGC, 0.
- inverse complement, positive strand, negative direction, looking for CCATGG(C/T)GGCGGC, 0.
- inverse complement, positive strand, positive direction, looking for CCATGG(C/T)GGCGGC, 0.
- inverse complement, negative strand, positive direction, looking for CCATGG(C/T)GGCGGC, 0.
- inverse positive strand, negative direction, looking for GGTACC(A/G)CCGCCG, 0.
- inverse negative strand, negative direction, looking for GGTACC(A/G)CCGCCG, 0.
- inverse positive strand, positive direction, looking for GGTACC(A/G)CCGCCG, 0.
- inverse negative strand, positive direction, looking for GGTACC(A/G)CCGCCG, 0.
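When a motif contains an ambiguous position such as (A/G), the alternatives must be complemented together, so (A/G) becomes (C/T), and the whole group must move as a unit when the motif is reversed. The Python sketch below handles the "(X/Y)" notation used in the list above. It is an illustration under those assumptions, with hypothetical helper names, and is not taken from SuccessablesKoz.bas.

```python
import re

PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}

# A token is either a single base or a group of alternatives like "(A/G)".
TOKEN = re.compile(r"\([ACGT](?:/[ACGT])+\)|[ACGT]")


def tokens(motif: str) -> list:
    """Split a motif into single bases and '(X/Y)' groups."""
    return TOKEN.findall(motif)


def complement_token(token: str) -> str:
    """Complement one base, or every alternative inside a group."""
    if token.startswith("("):
        # Alternatives are sorted so that (A/G) is written as (C/T).
        alternatives = sorted(PAIR[base] for base in token[1:-1].split("/"))
        return "(" + "/".join(alternatives) + ")"
    return PAIR[token]


def complement(motif: str) -> str:
    return "".join(complement_token(t) for t in tokens(motif))


def inverse(motif: str) -> str:
    # Reverse token order so a group such as "(A/G)" stays intact.
    return "".join(reversed(tokens(motif)))


def inverse_complement(motif: str) -> str:
    return inverse(complement(motif))


motif = "GCCGCC(A/G)CCATGG"
print(complement(motif))          # CGGCGG(C/T)GGTACC
print(inverse_complement(motif))  # CCATGG(C/T)GGCGGC
print(inverse(motif))             # GGTACC(A/G)CCGCCG
```

Reversing the raw character string instead of the tokens would turn "(A/G)" into ")G/A(", which is why the sketch splits the motif into tokens first.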
(Matsumoto) samplings
Copying the apparent Kozak consensus sequence GAAAATGG into the browser's find function ("⌘F") finds no occurrences between ZSCAN22 and A1BG and none between ZNF497 and A1BG, in agreement with the computer programs.
The Basic programs testing the consensus sequence GAAAATGG (starting with SuccessablesKozM.bas) were written to compare nucleotide sequences with the sequences on either the template strand (-) or the coding strand (+) of the DNA, in the negative direction (-) or the positive direction (+). Each entry below gives the program (strand and direction), the sequence it looked for, and the number of matches found:
- negative strand, negative direction, looking for GAAAATGG, 0.
- positive strand, negative direction, looking for GAAAATGG, 0.
- positive strand, positive direction, looking for GAAAATGG, 0.
- negative strand, positive direction, looking for GAAAATGG, 0.
- complement, negative strand, negative direction, looking for CTTTTACC, 0.
- complement, positive strand, negative direction, looking for CTTTTACC, 0.
- complement, positive strand, positive direction, looking for CTTTTACC, 0.
- complement, negative strand, positive direction, looking for CTTTTACC, 0.
- inverse complement, negative strand, negative direction, looking for CCATTTTC, 0.
- inverse complement, positive strand, negative direction, looking for CCATTTTC, 0.
- inverse complement, positive strand, positive direction, looking for CCATTTTC, 0.
- inverse complement, negative strand, positive direction, looking for CCATTTTC, 0.
- inverse negative strand, negative direction, looking for GGTAAAAG, 0.
- inverse positive strand, negative direction, looking for GGTAAAAG, 0.
- inverse positive strand, positive direction, looking for GGTAAAAG, 0.
- inverse negative strand, positive direction, looking for GGTAAAAG, 0.
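Each entry in the lists above amounts to an exact substring search followed by a count and a list of positions. The Python sketch below produces lines in the same shape. The toy sequence and the label are made up for illustration, and positions are reported as 1-based start coordinates, which is an assumption of this sketch and not necessarily the numbering used by the Basic programs.

```python
def find_all(sequence: str, motif: str) -> list:
    """Return 1-based start positions of every (possibly overlapping) match."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + 1)
        start = sequence.find(motif, start + 1)
    return positions


def report_line(label: str, motif: str, sequence: str) -> str:
    """Format one result line: label, target, hit count, then each hit."""
    hits = find_all(sequence, motif)
    parts = [f"- {label}, looking for {motif}, {len(hits)}"]
    parts += [f"{motif} at {position}" for position in hits]
    return ", ".join(parts) + "."


# Hypothetical toy sequence, not taken from the A1BG region.
toy = "TTGAAAATGGCCATTTTCAAGAAAATGGT"

for target in ("GAAAATGG", "CTTTTACC", "CCATTTTC", "GGTAAAAG"):
    print(report_line("toy sequence", target, toy))
# - toy sequence, looking for GAAAATGG, 2, GAAAATGG at 3, GAAAATGG at 21.
# - toy sequence, looking for CTTTTACC, 0.
# - toy sequence, looking for CCATTTTC, 1, CCATTTTC at 11.
# - toy sequence, looking for GGTAAAAG, 0.
```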
Acknowledgements
The content on this page was first contributed by: Henry A. Hoff.
References
- Kozak, Marilyn (February 1989). "The scanning model for translation: an update". The Journal of Cell Biology. 108 (2): 229–241. doi:10.1083/jcb.108.2.229. ISSN 0021-9525. PMID 2645293.
- Kozak, Marilyn (2002-10-16). "Pushing the limits of the scanning mechanism for initiation of translation". Gene. 299 (1): 1–34. doi:10.1016/S0378-1119(02)01056-9. ISSN 0378-1119. PMID 12459250.
- Kozak, Marilyn (1999-07-08). "Initiation of translation in prokaryotes and eukaryotes". Gene. 234 (2): 187–208. doi:10.1016/S0378-1119(99)00210-3. ISSN 0378-1119. PMID 10395892.
- De Angioletti M, Lacerra G, Sabato V, Carestia C (2004). "Beta+45 G --> C: a novel silent beta-thalassaemia mutation, the first in the Kozak sequence". British Journal of Haematology. 124 (2): 224–31. doi:10.1046/j.1365-2141.2003.04754.x. PMID 14687034.
- Hernández, Greco; Osnaya, Vincent G.; Pérez-Martínez, Xochitl (2019-07-25). "Conservation and Variability of the AUG Initiation Codon Context in Eukaryotes". Trends in Biochemical Sciences. 44 (12): 1009–1021. doi:10.1016/j.tibs.2019.07.001. ISSN 0968-0004. PMID 31353284.
- Kozak, Marilyn (1984-01-25). "Compilation and analysis of sequences upstream from the translational start site in eukaryotic mRNAs". Nucleic Acids Research. 12 (2): 857–872. doi:10.1093/nar/12.2.857. ISSN 0305-1048. PMID 6694911.
- Kozak, Marilyn (October 1987). "An analysis of 5'-noncoding sequences from 699 vertebrate messenger RNAs". Nucleic Acids Research. 15 (20): 8125–8148. doi:10.1093/nar/15.20.8125. PMID 3313277.
- Matsumoto, Takuya; Kitajima, Saemi; Yamamoto, Chisato; Aoyagi, Mitsuru; Mitoma, Yoshiharu; Harada, Hiroyuki; Nagashima, Yuji (9 August 2020). "Cloning and tissue distribution of the ATP-binding cassette subfamily G member 2 gene in the marine pufferfish Takifugu rubripes" (PDF). Fisheries Science. 86: 873–887. doi:10.1007/s12562-020-01451-z. Retrieved 27 September 2020.
External links
- GenomeNet KEGG database
- Home - Gene - NCBI
- NCBI All Databases Search
- NCBI Site Search
- PubChem Public Chemical Database