Biochimica et Biophysica Acta, 1090 (1991) 261-264
© 1991 Elsevier Science Publishers B.V. All rights reserved 0167-4781/91/$03.50 ADONIS 016747819100235L
BBAEXP 90273
Short Sequence-Paper
γ-Crystallin genes in carp: cloning and characterization
Tschining Chang 1, Ching-Lung Lin 2, Peng-Hui Chen 2 and Wen-Chang Chang 1,2
1 Institute of Biological Chemistry, Academia Sinica, Taipei (Taiwan, R.O.C.) and 2 Institute of Biochemical Sciences, National Taiwan University, Taipei (Taiwan, R.O.C.)
(Received 6 August 1991)
Key words: γ-Crystallin; Gene structure; Amino acid sequence; (C. carpio)
The carp γ-crystallin gene family was found to be composed of at least three members: γm1, γm2 and γm3. The encoded products are very similar to other known γ-crystallins but have their own peculiarities: (1) they all have a high methionine content: 12.4%, 14% and 8.4% in γm1, γm2 and γm3, respectively; and (2) the amino acid sequences are aberrant in the region before the connecting peptide and its corresponding region in motif 4. Their protein structures might remain the same as those of other γ-crystallins, since they retain all the conserved amino acid residues essential for maintaining the loops in the protein structures.
The crystallins account for over 90% of the soluble proteins of the eye lens [1,2]. Immunologically, they can be divided into four major groups: α, β, γ and δ, with δ restricted to birds and reptiles [3-5]. The short-range spatial order of crystallins has been suggested to be important for lens transparency [6], and the γ-crystallins are of additional interest because of their implication in cataractogenesis [7]. Investigators studying the lenses of human [8], rat [9] and mouse [10] have concluded that the γ-crystallins are encoded by a multigene family. These genes are highly conserved in structure, with three exons and two introns in each gene. The 5' exon is small and encodes three amino acids. The other two larger exons correspond to two structurally similar 'Greek key'-like domains of the protein. Recent studies on the promoter functions of these genes confirmed that they are expressed in a tissue-specific manner [11,12].
In fish, γ-crystallins exhibit the conserved structural features of the known γ-crystallins but with their own peculiarities: high methionine content [13] and sequence aberrancy in motif 2 and its corresponding region in motif 4 of the protein (see text). Since fish live in a very different environment from terrestrial animals and exhibit remarkable species diversity [14], it is of interest to study the lens crystallins of fish with respect to gene structure, regulation and molecular evolution. In this report we present two nucleotide sequences and the structural features of carp γ-crystallin genes.
The sequence data in this paper have been submitted to the EMBL Data Library under the accession numbers X55945 (for γm1) and X55946 (for γm3).
Correspondence: W.C. Chang, Institute of Biological Chemistry, Academia Sinica, P.O. Box 23-106, Taipei, Taiwan, R.O.C.
Five positive clones were obtained from a genomic library of 2·10^5 plaques with γm1 cDNA as probe. After restriction enzyme mapping, four of them were found to be identical and were designated pEM64. The remaining positive clone was named pEM101. The complete nucleotide sequences are shown in Fig. 1. Both pEM64 and pEM101 contain a γ-crystallin gene with similar features: three exons and two introns, as judged from the known carp γ-crystallin cDNA sequences (Fig. 1). The first exon of both genes contains a 9 bp coding region that includes the translation start codon. Exon 2 encodes motifs 1 and 2 of both gene products, while exon 3 corresponds to motifs 3 and 4 plus the 3' untranslated region. All the introns are small. The first intron is 162 bp and 275 bp long for γm1 and γm3, respectively, while the second contains 138-139 bp, much smaller than those of mammalian γ-crystallins, most of which are larger than 1 kb [8,16]. All the introns start with GT and end with AG, in agreement with the GT/AG splice site rule. The 5' and 3' untranslated regions of both genes are short (Fig. 1). The polyadenylation signal, AATAAA, was found in the 3' untranslated region of both genes. The termination codons, TAA (for both γm1 and γm3), were not far from the polyadenylation signals.
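The two sequence checks described above (the GT/AG splice site rule and the search for the AATAAA polyadenylation signal) can be sketched in a few lines of Python. This is only an illustration: the function names and the toy sequences are assumptions made for the example, not the full pEM64 or pEM101 sequences and not the procedure used in this work.

```python
def follows_gt_ag_rule(intron: str) -> bool:
    """Return True if an intron starts with GT and ends with AG."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def find_signal(sequence: str, signal: str = "AATAAA") -> list[int]:
    """Return the 0-based start positions of every occurrence of `signal`."""
    seq = sequence.upper()
    positions = []
    start = seq.find(signal)
    while start != -1:
        positions.append(start)
        start = seq.find(signal, start + 1)
    return positions


# Toy sequences for illustration only (hypothetical, not taken from Fig. 1).
toy_intron = "gtaatttgtttgaaataagtttctttgttttacag"
toy_3utr = "TAAGCTGAGAATATAGAAGGAAATAAAAGTTATT"

print(follows_gt_ag_rule(toy_intron))
print(find_signal(toy_3utr))
```

Lower-case input is accepted because introns are printed in lower-case in Fig. 1; both helpers upper-case the sequence before comparing.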
[Fig. 1, sequence panel: aligned nucleotide sequences of pEM64 (γm1), pEM101 (γm3) and γm2 cDNA, numbered 60 to 1080 at the right margin. The sequence text is not recoverable from this copy.]
[Fig. 2, alignment panel, motif 1. The residue-number ruler, the filled circles marking conserved residues and the methionine arrows are not recoverable from this copy.]
MOTIF 1
Carp γm1   GKIIFYEDRNFQGRSYDCMSDCSDISSYLSRVGSIRVES
Carp γm3   GKIIFYEDRNFQGRSYECSSDCSDMSTYLSRCHSCRVES
Carp γm2   KVIFYEDRNFQGRSYDCMSDCADFSSYMSRCHSCRVSH
Calf γII   GKITFYEDRGFQGHCYECSSDCPNLQPYFSRCNSIRVDS
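[Fig. 2, alignment panel, motifs 2 to 4. Rows are labelled in the order printed in the figure (carp γm1, carp γm3, carp γm2, calf γII). Characters that could not be read, mostly in the boxed sequence-aberrant regions, are marked [?]; the residue-number ruler, the filled circles and the methionine arrows are not recoverable from this copy.]
MOTIF 2
Carp γm1   GCFMVYERNSYMGNQFFLRRGEYHDMQR [?] TIRSCRMIP PYRGS
Carp γm3   GCFVVYDVPNYMGMQFFMRRGEYADYMR [?] M GM S [?] TRSCRMVP QYRGP
Carp γm2   GCWMMYDQPNYMGNQYFFRRGEYADYMS [?] MF GM S [?] CIRSCRMIP MHRGS
Calf γII   GCWMLYERPNYQGHQYFLRRGDYDDYQQ WM GF NDSTRSCRLIP QHTGT
MOTIF 3
Carp γm1   YPMRIYERDNFGGQMHEVMD DCDNIMERYRMS DWQSCH VMD
Carp γm3   YPMRTYERENFGGQMYDLTD DCDSFVDRYRMS DCQSCH VMD
Carp γm2   YPMRIYERENFMGQMYEMAD DCDSIMDRYRMP HCQSCH VMD
Calf γII   FRMRIYERDDFRGQMSEITD DCPSLQDRFHLT EVHSLN VLE
MOTIF 4
Carp γm1   GHWLFYEQPHYRGR[?]WYFRPGEYRSFRD [?] FMS [?] RRITDIC
Carp γm3   GHWDMYEQPHYRGRTVYFRPGEYRSFRD [?] M GYSTKFSSVRRTMDLC
Carp γm2   GHWLMYEQPHYRGRMWYFRPGEYRSFSN [?] M GG [?] KFMSMRRIMDSWY
Calf γII   GSWVLYEMPSYRGRQYLLRPGEYRRYLD W GAMNAKVGSLRRVMDFY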
Fig. 2. Comparison of deduced amino acid sequences of γm1, γm2 and γm3 with calf γ-II protein sequence. Conserved amino acids are labelled with filled circles. The sequences are so aligned that the topologically equivalent amino acid residues can be easily compared. The sequence-aberrant regions are boxed. The conserved methionine residues in all carp γ-crystallins are indicated by arrows. The numbering for calf γ-II sequence is used as reference.
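The methionine content quoted for the carp γ-crystallins is a residue fraction, which can be sketched as below. The function name is an assumption made for this example, and the input is only the motif 1 fragment of γm1 as read from Fig. 2, so the value it gives is for that fragment and is not the whole-protein figure reported in the text.

```python
def methionine_percent(protein: str) -> float:
    """Percentage of methionine (M) among the residues of a protein sequence."""
    # Keep letters only so that spaces or gap characters are ignored.
    residues = [aa for aa in protein.upper() if aa.isalpha()]
    if not residues:
        raise ValueError("empty sequence")
    return 100.0 * residues.count("M") / len(residues)


# Motif 1 of carp gamma-m1 as read from Fig. 2 (a fragment, for illustration).
MOTIF1_GM1 = "GKIIFYEDRNFQGRSYDCMSDCSDISSYLSRVGSIRVES"

print(f"{methionine_percent(MOTIF1_GM1):.1f}% Met in the motif 1 fragment")
```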
The amino acid sequences derived from the exons of both genes show that they are closely related (Fig. 2). Comparison of the pEM64 coding region with that of γm1 cDNA [13] reveals only four base changes between them and five nucleotide differences in the 5' and 3' noncoding regions (data not shown). Therefore, pEM64 must contain the genomic equivalent of γm1 crystallin. Hereafter the symbol γm1 will be used to denote this particular γ-crystallin and its genomic equivalents. The coding region of pEM101 is quite different from those of γm1 and γm2 [13], and it must encode a new species of carp γ-crystallin, which will be symbolized as γm3 to indicate that it is the third γ-crystallin with high methionine content found in carp. Compared with calf γ-II, the amino acid sequences of carp γ-crystallins are aberrant in the region from residues 68 to 72 before the connecting peptide and in the counterpart region in motif 4. In these regions, γm1 has an insertion of four amino acids in motif 2, γm2 has a deletion of two amino acids in motif 4 and γm3 has a deletion of one amino acid in motif 2 (Fig. 2). Therefore, these three proteins might have different fine structures. However, since the amino acids essential for maintaining the loops, i.e., Tyr-6, Glu-7, Gly-13, Ser-24 and their equivalents in the other motifs [17], are conserved, the protein skeleton should still be similar to those of mammalian γ-crystallins. The motif-by-motif sequence comparison between γm1, γm2 [13] and γm3 reveals the lowest homology in motif 2 (Table I) instead of motif 3, which is the most diverse motif found in mammalian γ-crystallins. The methionine content is high in γm3, 8.4%, although lower than that of γm1 and γm2. Most of the methionine residues of these three proteins are relatively conserved (Fig. 2). It is not clear whether these methionine residues are functionally important for carp γ-crystallins. A detailed X-ray diffraction analysis might offer more insight into their possible roles in protein structure and in the interactions between proteins and their environment.
Fig. 1. Comparison of carp γ-crystallin genes. The carp (Cyprinus carpio) egg genomic library was constructed as described by Chiou et al. [15]. Positive clones were mapped, subcloned and reconfirmed by repeated hybridization with the γm1 cDNA as probe. The nucleotide sequences were determined. Exons are shown in capital letters and introns in lower-case. The numbering of pEM64 (γm1) is shown at the right margin and begins with the transcription start site (+1). Sequences of pEM101 (γm3) and γm2 cDNA are aligned with respect to the coding regions, which are boxed. Only the nucleotides different from those of pEM64 in the coding region are shown below it. Dashed lines are used to indicate
Fig. 3. Determination of the transcription start sites of γm1 and γm3 by primer extension. Two antisense primers complementary to a stretch of 17 bp just upstream from the first start codon of the γm1 and γm3 genomic sequences were used: 5'-GATGGTAGAGTGTTGTT-3' for γm1 and 5'-GTTTGCTAGTCTTCGTG-3' for γm3. Lanes A, G, C and T are sequence ladders. The nucleotide sequences denoted for lanes γm1 and γm3 are the sequences around the transcription start site, which is indicated with an asterisk.
TABLE I
Percent homology of amino acid sequence between γm1, γm2 and γm3
            γm1/γm2   γm1/γm3   γm2/γm3
Motif 1       72        82        74
Motif 2       55        62        64
Motif 3       78        76        78
Motif 4       77        81        68
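A motif-by-motif comparison of the kind summarized in Table I can be sketched as a percent-identity calculation over two aligned sequences. How the homology values in Table I were computed, and how gaps were treated, is not stated in the text, so the gap handling below (gapped columns are skipped) and the function name are assumptions. The two motif 1 sequences are those read from Fig. 2 for γm1 and γm3.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identical residues between two aligned, equal-length sequences.

    Columns in which either sequence has a gap are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Motif 1 of carp gamma-m1 and gamma-m3 as read from Fig. 2.
MOTIF1_GM1 = "GKIIFYEDRNFQGRSYDCMSDCSDISSYLSRVGSIRVES"
MOTIF1_GM3 = "GKIIFYEDRNFQGRSYECSSDCSDMSTYLSRCHSCRVES"

print(round(percent_identity(MOTIF1_GM1, MOTIF1_GM3)))  # 82
```

For this pair, 32 of the 39 aligned positions are identical, which rounds to the 82% listed for motif 1 of γm1/γm3 in Table I.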
In the 5' flanking regions, a TATA box was found for γm1, but two tandem repeats of the TATA box were found for γm3 (data not shown). The presence of such a 'double' TATA box in a promoter region is rare and its possible functional role awaits further analysis. The cap site of the mRNA was determined by the primer extension method as shown in Fig. 3 and was found to be a cytosine residue in both the γm1 and γm3 genes. The primer extension experiments also indicate that both γm1 and γm3 genes are expressed in the fish eye lens.
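An antisense primer for primer extension is the reverse complement of the sense-strand stretch it anneals to. The sketch below checks this relation for the two 17-mers given in the Fig. 3 legend. The sense-strand stretches are an assumption: they are read from the 5' regions of the Fig. 1 sequences as they survive in this copy, so the example illustrates the relation and is not a substitute for the deposited sequences.

```python
# Translation table mapping each base to its complement (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


# 17-nt sense-strand stretches just upstream of the start codon
# (assumed readings from Fig. 1).
SENSE_GM1 = "AACAACACTCTACCATC"
SENSE_GM3 = "CACGAAGACTAGCAAAC"

# Antisense primers as given in the Fig. 3 legend.
PRIMER_GM1 = "GATGGTAGAGTGTTGTT"
PRIMER_GM3 = "GTTTGCTAGTCTTCGTG"

print(reverse_complement(SENSE_GM1) == PRIMER_GM1)
print(reverse_complement(SENSE_GM3) == PRIMER_GM3)
```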
Although the gene structures of both γm1 and γm3 are very similar to those of known mammalian γ-crystallins, the existence of aberrant amino acid sequences and the high methionine content of the proteins may hint that the fish is one of the radiation points in the phylogenetic tree of γ-crystallin gene evolution. The exact size of this fish gene family and detailed structural analyses will be needed to clarify this point.
References
1 Piatigorsky, J. (1984) Cell 38, 620-621.
2 Bloemendal, H. (1977) Science 197, 127-138.
3 McAvoy, J.W. (1978) J. Embryol. Exp. Morph. 45, 271-281.
4 Papaconstantinou, J. (1976) Science 156, 338-347.
5 McAvoy, J.W. (1978) J. Embryol. Exp. Morph. 44, 149-165.
6 Delaye, M. and Tardieu, A. (1983) Nature 302, 415-417.
7 Harding, J.J. (1981) Molecular and Cellular Biology of the Eye Lens, John Wiley and Sons, New York.
8 Meakin, S.O., Breitman, M.L. and Tsui, L-C. (1985) Mol. Cell. Biol. 5, 1408-1414.
9 Den Dunnen, J.T., Moorman, R.J.M., Lubsen, N.H. and Schoenmakers, J.G.G. (1986) J. Mol. Biol. 189, 37-46.
10 Lok, S., Tsui, L-C., Shinohara, T., Piatigorsky, J., Gold, R. and Breitman, M.L. (1984) Nucleic Acids Res. 12, 4517-4529.
11 Lok, S., Breitman, M.L., Chepelinsky, A.B., Piatigorsky, J., Gold, R.J.M. and Tsui, L-C. (1985) Mol. Cell. Biol. 5, 2221-2230.
12 Wistow, G.J. and Piatigorsky, J. (1988) Annu. Rev. Biochem. 57, 479-504.
13 Chang, T., Jiang, Y-J., Chiou, S-H. and Chang, W-C. (1988) Biochim. Biophys. Acta 951, 226-229.
14 Powers, D.A. (1989) Science 246, 352-358.
15 Chiou, C.S., Chen, H.-T. and Chang, W-C. (1990) Biochim. Biophys. Acta 1087, 91-94.
16 Den Dunnen, J.T., Moorman, R.J.M., Lubsen, N.H. and Schoenmakers, J.G.G. (1986) J. Mol. Biol. 189, 37-46.
17 Wistow, G., Turnell, B., Summers, L., Slingsby, C., Moss, D., Miller, L., Lindley, P. and Blundell, T. (1983) J. Mol. Biol. 170, 175-202.