C7orf38 is a gene located on chromosome 7 in the human genome.[1] The gene is expressed in nearly all tissue types at very low levels.[2] Evolutionarily, it can be found throughout the kingdom Animalia. Although the function of the protein is not fully understood, bioinformatic tools have shown that it bears much similarity to zinc finger or transposase proteins. Many of its orthologs, paralogs, and neighboring genes have been shown to possess zinc finger domains.[3] The protein contains a hAT dimerization domain near its C-terminus.[4] This domain is highly conserved in transposase enzymes.[5]
C7orf38 is located on chromosome 7 at q22.1. Its genomic sequence contains 5,612 bp. The predominant transcript contains two exons and is 2,507 bp in length.[6] The translated protein contains 573 amino acids.[7]
Protein composition
The 573-amino-acid protein has a molecular weight of 66,280.05 Da.[8] Its isoelectric point occurs at a pH of 5.775, about 1.6 pH units below the average human pH.[9] Two deviations from prototypical human proteins are evident: the protein contains fewer glycine residues than expected and is rich in leucine residues.[10] There are no sections of strong hydrophobicity or hydrophilicity, so it is not predicted to be a transmembrane protein.
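The kind of composition and hydropathy check described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis used by the cited tools: the function names are hypothetical, the sequence is a made-up toy string rather than the C7orf38 sequence, and the window of 19 residues with a cutoff of about 1.6 is a commonly quoted rule of thumb for the Kyte-Doolittle scale that is assumed here.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Return the fraction of each residue type present in seq."""
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}


def hydropathy_windows(seq, window=19):
    """Return the mean Kyte-Doolittle score of every sliding window."""
    return [
        sum(KD[aa] for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# Toy sequence for illustration only; this is NOT the C7orf38 protein.
toy = "MLLEKLSDLLGRLEELLKSADELLQKLTDELLKR"

comp = composition(toy)
print(f"Leu fraction: {comp.get('L', 0):.2f}, Gly fraction: {comp.get('G', 0):.2f}")

# Assumed rule of thumb: a 19-residue window averaging above ~1.6
# is a candidate membrane-spanning segment.
scores = hydropathy_windows(toy, window=19)
print("candidate transmembrane segment:", any(s > 1.6 for s in scores))
```

A protein with no window above the cutoff, as reported for C7orf38, would not be flagged as a transmembrane candidate by this heuristic.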
Gene neighborhood
The four genes in closest proximity to C7orf38 on chromosome 7 have similar functions; many of them encode transcription factors.[11]
| Name | Start (bp from pter) | End (bp from pter) | Size (bases) | Orientation | Function |
| --- | --- | --- | --- | --- | --- |
| ZNF789 | 98,908,451 | 98,923,153 | 14,703 | plus strand | The gene encodes zinc finger protein 789. It has been proposed to participate in the regulation of transcription and is expected to bind zinc ions. |
| ZNF394 | 98,928,790 | 98,935,813 | 7,024 | minus strand | The gene encodes zinc finger protein 394. Overexpression of ZNF394 inhibits the transcription of c-jun and AP-1, suggesting that it is a transcriptional repressor. |
| ZKSCAN5 | 98,940,209 | 98,969,381 | 29,173 | plus strand | The gene encodes zinc finger with KRAB and SCAN domains 5, a zinc finger protein of the Kruppel family. The protein contains a SCAN box and a KRAB A domain. |
| ZNF655 | 98,993,981 | 99,012,012 | 18,032 | plus strand | The gene encodes zinc finger protein 655. Numerous alternatively spliced transcripts encoding distinct isoforms have been discovered. |
| Mihuya | 99,149,738 | 99,149,626 | 112 | plus strand | The Mihuya gene does not encode a large or known functional protein. Its antisense relationship to C7orf38 raises the possibility of a role in regulating expression. |
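The sizes listed for the neighboring genes follow from their start and end coordinates. As a small worked check, assuming the coordinates are inclusive positions counted from pter, the following sketch recomputes them; the helper name span_size is hypothetical.

```python
# Coordinates (bp from pter) copied from the gene neighborhood listing.
genes = {
    "ZNF789": (98_908_451, 98_923_153),
    "ZNF394": (98_928_790, 98_935_813),
    "ZKSCAN5": (98_940_209, 98_969_381),
    "ZNF655": (98_993_981, 99_012_012),
}


def span_size(start, end):
    """Inclusive length in bases of the interval between start and end."""
    return abs(end - start) + 1


sizes = {name: span_size(start, end) for name, (start, end) in genes.items()}
for name, size in sizes.items():
    print(f"{name}: {size:,} bases")
```

The four zinc finger genes reproduce the listed sizes under the inclusive convention. The Mihuya entry lists 112 bases, which equals the plain difference between its two coordinates rather than the inclusive count.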
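Paralogs
| Name | NCBI accession number | Length (AA) | % Identity to C7orf38 | % Similarity to C7orf38 |
| --- | --- | --- | --- | --- |
| general transcription factor II-I repeat domain-containing protein 2B | NP_001003795.1 | 949 | 25 | 46 |
| GTF2I repeat domain containing 2 | NP_775808.2 | 949 | 24 | 45 |
| EPM2A interacting protein 1 | NP_055620.1 | 607 | 22 | 42 |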
Orthologs
Orthologs to C7orf38 can be traced back evolutionarily as far as plants.[3] The following is not an exhaustive list of orthologs. It is intended to provide an evolutionary overview of the conservation of C7orf38.
| Common name | Genus & species | NCBI accession number | Length (AA) | % Identity to C7orf38 | % Similarity to C7orf38 |
| --- | --- | --- | --- | --- | --- |
| Chimp | Pan troglodytes | XP_001139775.1 | 573 | 99 | 99 |
| Macaque monkey | Macaca fascicularis | BAE01234.1 | 573 | 96 | 98 |
| Horse | Equus caballus | XP_001915370.1 | 573 | 81 | 84 |
| Pig | Sus scrofa | XP_001929194 | 1323 | 39 | 61 |
| Cow | Bos taurus | XP_875656.2 | 1320 | 38 | 61 |
| Mouse | Mus musculus | CAM15594.1 | 1157 | 37 | 60 |
| Domestic dog | Canis lupus familiaris | ABF22701.1 | 609 | 37 | 60 |
| Rat | Rattus rattus | NP_001102151.1 | 1249 | 37 | 59 |
| Opossum | Monodelphis domestica | XP_001372983.1 | 608 | 37 | 59 |
| Chicken | Gallus gallus | XP_424913.2 | 641 | 37 | 58 |
| Frog | Xenopus (Silurana) tropicalis | ABF20551.1 | 656 | 37 | 56 |
| Zebrafish | Danio rerio | XP_001340213.1 | 609 | 37 | 56 |
| Pea aphid | Acyrthosiphon pisum | XP_001943527.1 | 659 | 36 | 54 |
| Beetle | Tribolium castaneum | ABF20545.1 | 599 | 35 | 55 |
| Sea squirt | Ciona intestinalis | XP_002119512.1 | 524 | 34 | 52 |
| Hydra | Hydra magnipapillata | XP_002165429.1 | 572 | 29 | 52 |
| Puffer fish | Tetraodon nigroviridis | CAF95678.1 | 539 | 28 | 47 |
| Mosquito | Anopheles gambiae | XP_558399.5 | 591 | 28 | 47 |
| Sea urchin | Strongylocentrotus purpuratus | ABF20546.1 | 625 | 27 | 47 |
| Grass plant | Sorghum bicolor | XP_002439156.1 | 524 | 25 | 40 |
| Broad leaf tree | Populus trichocarpa | XP_002319808.1 | 788 | 21 | 39 |
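Percent identity and percent similarity, as reported for the orthologs, are typically derived from a pairwise alignment. The sketch below shows one simple way such figures can be computed. It is an illustration under stated assumptions: the aligned strings are invented toy data, the residue groupings in SIMILAR_GROUPS are a hypothetical stand-in for a substitution matrix, and the alignment length is assumed as the denominator. Alignment tools generally count a pair as similar when its substitution-matrix score is positive, so their numbers may differ.

```python
# Hypothetical groups of chemically similar residues, for illustration only.
SIMILAR_GROUPS = [
    set("ILVM"), set("FYW"), set("KRH"),
    set("DE"), set("ST"), set("NQ"),
]


def is_similar(a, b):
    """True if two residues are identical or share a similarity group."""
    return a == b or any(a in g and b in g for g in SIMILAR_GROUPS)


def identity_and_similarity(aln_a, aln_b):
    """Return (% identity, % similarity) over the full alignment length.

    Both inputs must be rows of the same alignment, with '-' marking gaps.
    """
    if len(aln_a) != len(aln_b) or not aln_a:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    pairs = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    identical = sum(a == b for a, b in pairs)
    similar = sum(is_similar(a, b) for a, b in pairs)
    n = len(aln_a)
    return 100 * identical / n, 100 * similar / n


# Toy alignment, not taken from any real C7orf38 ortholog.
ident, simil = identity_and_similarity("MKLV-E", "MRIVAD")
print(f"identity {ident:.1f}%, similarity {simil:.1f}%")
```

Because every identical pair also counts as similar, percent similarity is never lower than percent identity, which matches the pattern in the ortholog figures.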
Structure
Protein
CBLast was used to identify a structurally related protein with an experimentally determined structure. The Hermes DNA transposase, of the Hermes DBD superfamily, was shown to be structurally similar (E-value: 1E-6).[12]
The hAT dimerization domain is found at the C-terminus of transposase elements belonging to the Activator superfamily (hAT element superfamily). The isolated dimerization domain forms extremely stable dimers in vitro.[5]
mRNA
The MFOLD program available at Rensselaer BioInformatics Server was used to predict secondary structure of the mature mRNA sequence.[13]
The primary sequence underlying the predicted mRNA secondary structures is highly conserved among orthologs, suggesting structural importance.
Tissue distribution
The gene appears to be expressed in most tissue types.[14] Very low levels of expression were observed in EST profiles, and no deviation was observed between health or developmental states.