甲基化賴胺酸對α螺旋與β折板結構穩定度及對核糖核酸辨識與細胞穿透之影響

Department of Chemistry, College of Science

National Taiwan University Master Thesis

Effect of Lysine Methylation on α-Helix Propensity, β-Sheet Propensity, RNA Recognition, and Cell Penetration

Mu-Chun Liu

Advisor: Richard P. Cheng, Ph.D.

July, 2014

Acknowledgements


2014.07.01




Abstract

Post-translational modifications govern many protein behaviors. Methylation of lysine affects both protein structure and function. Three forms of methylated lysine have been identified in proteins: monomethyllysine, dimethyllysine, and trimethyllysine. It is reasonable to expect that different numbers of methyl groups on the side chain amino group affect proteins to different degrees. In this study, the three methylated lysines were placed into two basic secondary structures, the α-helix and the β-hairpin (the simplest β-sheet model), to investigate the effect of lysine methylation on structural stability. The fraction helix of the helical peptides was determined by circular dichroism spectroscopy. The structural information of the hairpin peptides was analyzed by 2D NMR.

Lysine methylation also plays an important part in many biological processes. The regulatory Tat protein contains a basic region (RKKRRQRRR, residues 49 to 57) which specifically binds to the trans-activating responsive (TAR) element to modulate HIV-1 RNA transcription. The binding between the HIV-1 Tat protein and TAR RNA is essential for HIV-1 to efficiently produce full-length viral RNA. To study the effect of lysine methylation on RNA recognition and cellular uptake, two lysine residues, Lys₅₀ and Lys₅₁, were individually replaced with monomethylated, dimethylated, and trimethylated lysine. The dissociation constants of the Tat-derived peptide-TAR RNA complexes were determined by gel shift assay. The cellular uptake efficiency of the Tat-derived peptides into Jurkat cells was assessed by flow cytometry.


Acknowledgements

Chinese Abstract

Abstract

Table of Contents

List of Charts

List of Figures

List of Tables

List of Schemes

Abbreviations

Chapter 1. Introduction

1-1 Central Dogma of Molecular Biology

1-2 Proteins

1-3 Protein Folding and Function

1-4 Hierarchy of Protein Structure

1-5 Driving Force of Protein Folding

1-6 RNA Recognition

1-7 Post-Translational Modifications (PTMs)

1-8 Thesis Overview

1-9 References

Chapter 2.

2-1 Introduction

α-Helix

Lifson-Roig Theory

β-Sheet

Lysine Methylation

2-2 Results and Discussion

Peptide Design and Synthesis

Circular Dichroism Spectroscopy

Helix Formation Parameters

Hairpin Structure Characterization

2-3 Conclusions

2-4 Future Aspects

2-5 Acknowledgement

2-6 Experimental Section

General Materials and Methods

Peptide Synthesis

Ultraviolet-Visible (UV-vis) Spectroscopy

Helix Propensity and Capping Parameter Derivation

Hairpin Peptide Structure Analysis by 2D-NMR

Chapter 3.

3-1 Introduction

Ribonucleic Acid (RNA)

Human Immunodeficiency Virus (HIV)

Trans-Activation Response Element (TAR) RNA

Trans-Activator of Transcription (Tat) Protein

Tat-Mediated Transcription

Lysine Methylation in Tat Protein

Cell Penetration

3-2 Results and Discussion

Peptide Design and Synthesis

Reductive Methylation on Lysine

Electrophoretic Mobility Shift Assay in the Presence of Bulk E. coli tRNA

Cellular Uptake Assay

3-3 Conclusion

3-4 Acknowledgement

3-5 Experimental Section

General Materials and Methods

Peptide Synthesis

Ultraviolet-Visible (UV-vis) Spectroscopy

Electrophoretic Mobility Shift Assay

Circular Dichroism Spectroscopy

Cells and Cell Cultures

Cellular Uptake Assay

Appendix.


List of Charts

Chart 2-1. Chemical Structure, Full Name and 3-Letter Code of Methylated Lysines

Chart 3-1. Chemical Structure, Full Name and 3-Letter Code of Methylated Lysines

Chart 3-2. The Chemical Structures of Commercially Available Methylated Lysines


List of Figures

Figure 1-1. The central dogma of molecular biology is the genetic information flowing from DNA through RNA to proteins.

Figure 1-2. The peptide bond and the dihedral angles φ and ψ in the backbone.

Figure 1-3. Ramachandran plot.

Figure 1-4. The structure of an α-helix.

Figure 1-5. The structure of a β-hairpin.

Figure 1-6. Four hierarchical levels of protein structure.

Figure 2-1. Chemical structure of the experimental (A), the fully unfolded (B), and the fully folded (C) hairpin peptides. Xaa was replaced by Mmk, Dmk, or Tmk.

Figure 2-2. Circular dichroism spectra of the peptides at pH 7 (273 K) in 1 mM phosphate, borate, and citrate buffer with 1 M NaCl. (A) KXaa9 peptides, (B) KXaa14 peptides, (C) NCapXaa peptides, (D) CCapXaa peptides.

Figure 2-3. The Hα chemical shift deviation for peptides HPTMmkAla (A), HPTDmkAla (B), and HPTTmkAla (C). Reference fully unfolded peptides are HPTUMmkAla, HPTUDmkAla, and HPTUTmkAla, respectively.

Figure 2-4. The Hα chemical shift deviation for peptides HPTFMmkAla (A), HPTFDmkAla (B), and HPTFTmkAla (C). Reference fully unfolded peptides are HPTUMmkAla, HPTUDmkAla, and HPTUTmkAla, respectively.

Figure 2-5. The NOEs (A) and Wüthrich diagram (B) of peptide HPTMmkAla. The thickness of the bands reflects the NOE intensity.

Figure 2-6. The NOEs (A) and Wüthrich diagram (B) of peptide HPTUMmkAla. The thickness of the bands reflects the NOE intensity.

Figure 2-7. The NOEs (A) and Wüthrich diagram (B) of peptide HPTFMmkAla. The thickness of the bands reflects the NOE intensity.

Figure 2-8. The NOEs (A) and Wüthrich diagram (B) of peptide HPTDmkAla. The thickness of the bands reflects the NOE intensity.

Figure 2-9. The NOEs (A) and Wüthrich diagram (B) of peptide HPTUDmkAla. The thickness of the bands reflects the NOE intensity.

Figure 2-10. The NOEs (A) and Wüthrich diagram (B) of peptide HPTFDmkAla. The thickness of the bands reflects the NOE intensity.

Figure 2-11. The NOEs (A) and Wüthrich diagram (B) of peptide HPTTmkAla. The thickness of the bands reflects the NOE intensity.

Figure 2-12. The NOEs (A) and Wüthrich diagram (B) of peptide HPTUTmkAla. The thickness of the bands reflects the NOE intensity.

Figure 2-13. The NOEs (A) and Wüthrich diagram (B) of peptide HPTFTmkAla. The thickness of the bands reflects the NOE intensity.

Figure 2-14. The folding percentage of each residue for peptide HPTMmkAla (A), HPTDmkAla (B), and HPTTmkAla (C).

Figure 2-15. Fraction folded for the HPTXaaAla peptides (Xaa = Mmk, Dmk, Tmk).

Figure 3-1. The chemical structure of four nucleobases for RNA.

Figure 3-2. The basic constitution of a single-stranded RNA.

Figure 3-3. Various RNA secondary structures: helix (A), stem-loop (B), and bulge loop (C).

Figure 3-4. The landmarks of the HIV-1 genome, which consists of nine essential genes. The trans-activation response element (TAR, the box at the bottom left) is located at the viral 5’ LTR promoter, and the trans-activator of transcription (Tat) protein is displayed on the right-hand side.

Figure 3-5. The sequence and secondary structure of HIV-1 RNA from +17 to +45. This region, TAR RNA, contains a bulge (+23 to +25) and a loop (+30 to +35).

Figure 3-6. A schematic illustration of the Tat protein.

Figure 3-7. Trans-activated transcription of HIV-1 via Tat-TAR binding.

Figure 3-8. The classification of cell-penetrating peptides.

Figure 3-9. The chemical structure of 6-carboxy-fluorescein.

Figure 3-10. The analytical RP-HPLC chromatogram of peptide Fl-Dmk51-Tat synthesized using commercially available Dmk (A), and synthesized by reductive methylation (B).

Figure 3-11. Images of typical gels of electrophoretic mobility shift assay (EMSA) for Tat-derived peptides. All lanes contain 100 nM fluorescein-labeled HIV-1 TAR RNA in the presence of 10 μg/mL bulk E. coli tRNA.

Figure 3-12. The global fitting results of Tat-derived peptides binding to TAR RNA in the presence of 10 μg/mL bulk E. coli tRNA.

Figure 3-13. Apparent dissociation constants for Tat-derived peptide-TAR RNA complexes as determined by EMSA in the presence of 10 μg/mL bulk E. coli tRNA. TAR RNA concentration was 100 nM.

Figure 3-14. CD spectra between 200 and 300 nm of the Tat-derived peptides. The spectra were acquired in 10 mM Tris buffer at pH 7 and 25 °C. Peptide concentration was 50 μM.

Figure 3-15. The mean fluorescence intensity of Jurkat cells treated with 7 μM (A) and 120 μM (B) Tat-derived peptides in cellular uptake assays.

Figure A-1. Flow cytometry results showing the side scattered light plotted against the forward scattered light for live control cells, dead control cells, and cells incubated with 7 μM Tat-derived peptides. The gate used to restrict the population of cells analyzed is shown and labeled as P1.

Figure A-2. Flow cytometry results showing the propidium iodide fluorescence against the fluorescein fluorescence for live control cells, dead control cells, and cells incubated with 7 μM Tat-derived peptides. The gate used to restrict the fluorescence of cells analyzed is shown and labeled as P2.

Figure A-3. Flow cytometry results showing the fluorescein fluorescence for live control cells, and cells incubated with 7 μM Tat-derived peptides for 15 minutes at 37 °C.

Figure A-4. Flow cytometry results showing the side scattered light plotted against the forward scattered light for live control cells, dead control cells, and cells incubated with 120 μM Tat-derived peptides. The gate used to restrict the population of cells analyzed is shown and labeled as P1.

Figure A-5. Flow cytometry results showing the propidium iodide fluorescence against the fluorescein fluorescence for live control cells, dead control cells, and cells incubated with 120 μM Tat-derived peptides. The gate used to restrict the fluorescence of cells analyzed is shown and labeled as P2.

Figure A-6. Flow cytometry results showing the fluorescein fluorescence for live control cells, and cells incubated with 120 μM Tat-derived peptides for 15 minutes at 37 °C.

Figure A-7. The overlaid bright-field and fluorescence microscopy images of Jurkat cells incubated with 7 μM Fl-Mmk50-Tat (A), Fl-Dmk50-Tat (B), Fl-Tmk50-Tat (C), Fl-Mmk51-Tat (D), Fl-Dmk51-Tat (E), and Fl-Tmk51-Tat (F) for 15 minutes at 37 °C in the presence of fetal bovine serum, washed and treated with trypsin at 37 °C for 5 minutes.

Figure A-8. The license agreement for Figure 3-4.

Figure A-9. The license agreement for Figure 3-7.


List of Tables

Table 2-1. Sequence of Ala-based Peptides for Determining the N-Cap Parameter, C-Cap Parameter, and Helix Propensity of Modified Lys Analogs

Table 2-2. Sequences for the Hairpin Peptides HPTXaaAla, the Unfolded Reference Peptides HPTUXaaAla, and the Folded Reference Peptides HPTFXaaAla Containing Modified Lys Analogs

Table 2-3. The Purity and Weight of the Helical Peptides

Table 2-4. The Purity and Weight of the Hairpin Peptides

Table 2-5. Mean Residue Ellipticity at 222 nm and Fraction Helix (f_helix) of Helical Peptides Containing Methylated Lys

Table 2-6. Statistical Mechanical Helix Formation Parameters for Modified Lys Analogs Derived from Experimentally Measured Fraction Helix Based on Modified Lifson-Roig Theory

Table 2-7. The ³J_NHα Coupling Constant Values (Hz) of Peptides HPTFXaaAla

Table 2-8. The ³J_NHα Coupling Constant Values (Hz) of Peptides HPTXaaAla

Table 2-9. The ³J_NHα Coupling Constant Values (Hz) of Peptides HPTUXaaAla

Table 2-10. Fraction Folded (%) and ΔG_fold (kcal/mol) of the Peptides HPTXaaAla

Table 2-11. Sequences for the Future Model Peptides for Investigating Helix and Sheet Stability

Table 2-12. The ¹H Chemical Shift Assignments for Peptide HPTMmkAla

Table 2-13. The ¹H Chemical Shift Assignments for Peptide HPTDmkAla

Table 2-14. The ¹H Chemical Shift Assignments for Peptide HPTTmkAla

Table 2-15. The ¹H Chemical Shift Assignments for Peptide HPTUMmkAla

Table 2-16. The ¹H Chemical Shift Assignments for Peptide HPTUDmkAla

Table 2-17. The ¹H Chemical Shift Assignments for Peptide HPTUTmkAla

Table 2-18. The ¹H Chemical Shift Assignments for Peptide HPTFMmkAla

Table 2-19. The ¹H Chemical Shift Assignments for Peptide HPTFDmkAla

Table 2-20. The ¹H Chemical Shift Assignments for Peptide HPTFTmkAla

Table 3-1. The Sequences of Tat-Derived Peptides Capped with an Acetyl Group

Table 3-2. The Sequences of Tat-Derived Peptides Capped with 6-Carboxy-Fluorescein

Table 3-3. The Purity and Weight of the Tat-Derived Peptides

Table 3-4. The Apparent Dissociation Constants for Tat-Derived Peptide-TAR RNA Complexes in the Presence of 10 μg/mL Bulk E. coli tRNA. TAR RNA concentration was 100 nM.

Table 3-5. The Z and P Values for Comparing the Apparent Dissociation Constants of Wild Type Peptide and Tat-Derived Peptides in the Presence of 10 μg/mL Bulk E. coli tRNA

Table 3-6. Cellular Uptake of Tat-Derived Peptides into Jurkat Cells in PBS. Mean fluorescence intensity for each peptide at 7 μM and 120 μM.

Table 3-7. The Z and P Values of the Mean Fluorescence Intensity at 7 μM for All Peptides in Cellular Uptake Assays

Table 3-8. The Z and P Values of the Mean Fluorescence Intensity at 120 μM for All Peptides in Cellular Uptake Assays

Table 3-9. Amount of Reagents for Preparation of the Separating Gel

Table 3-10. Amount of Reagents for Preparation of Samples with Different Concentrations

Table A-1. The Secondary Structural Occurrence of Methylated Lysine in Natural Proteins

Table A-2. The Z and P Values for w₉ Value of KXaa9


List of Schemes

Scheme 3-1. Synthesis of Fl-Dmk51-Tat via Reductive Methylation

Scheme 3-2. Mechanism of Reductive Methylation


Abbreviations

α-SYN α-Synuclein

Aβ Amyloid β peptide

Ac Acetyl

AD Alzheimer’s disease

AIDS Acquired immune deficiency syndrome

Ala Alanine

APS Ammonium persulfate

Arg Arginine

Bis-acrylamide N,N’-methylene-bis-acrylamide

CCR5 C-C chemokine receptor type 5

CD Circular dichroism

CD4 Cluster of differentiation 4

CDK9 Cyclin-dependent kinase 9

CPPs Cell-penetrating peptides

Cys Cysteine

CXCR4 C-X-C chemokine receptor type 4

DIEA Diisopropylethylamine

DMF Dimethylformamide

Dmk Dimethyllysine

DNA Deoxyribonucleic acid

DQF-COSY Double-quantum filtered-correlated spectroscopy

E. coli tRNA Escherichia coli transfer ribonucleic acid

EMSA Electrophoretic mobility shift assay

FBS Fetal bovine serum

Fl 6-Carboxy-fluorescein

Fmoc N-9-Fluorenylmethoxycarbonyl

Gln Glutamine

Gly Glycine

gp120 Envelope glycoprotein GP120

HBTU O-1H-benzotriazol-1-yl-1,1,3,3-tetramethyluronium hexafluorophosphate

HIV Human immunodeficiency virus

HOBT 1-Hydroxybenzotriazole

Ile Isoleucine

K_D Dissociation constant

Leu Leucine


LTR Long terminal repeat

Lys Lysine

MALDI-TOF Matrix-assisted laser desorption ionization time-of-flight

MeOH Methanol

Mmk Monomethyllysine

NELF Negative elongation factor

NMR Nuclear magnetic resonance

NOE Nuclear Overhauser effect

NOESY Nuclear Overhauser effect spectroscopy

Orn Ornithine

PD Parkinson’s disease

PMT Photomultiplier tube

PRC2 Polycomb repressive complex 2

Pro Proline

PrP Prion protein

P-TEFb Positive transcriptional elongation factor-b

PTMs Post-translational modifications

RNA Ribonucleic acid

RNAPII RNA polymerase II

ROESY Rotating-frame nuclear Overhauser effect correlation spectroscopy

RT Reverse transcriptase

RP-HPLC Reversed phase high-performance liquid chromatography

RPMI Roswell Park Memorial Institute medium

SPPS Solid phase peptide synthesis

TAR Trans-activation response element

Tat Trans-activator of transcription

TEMED N,N,N’,N’-Tetramethylethylenediamine

TFA Trifluoroacetic acid

Thr Threonine

Tmk Trimethyllysine

TOCSY Total correlation spectroscopy

Tris Tris(hydroxymethyl)aminomethane

Tyr Tyrosine

UV-vis Ultraviolet-visible

Val Valine


Chapter 1. Introduction

1-1 Central Dogma of Molecular Biology

DNA, RNA, and proteins are the three crucial macromolecules in living organisms. The central dogma was introduced by Crick in 1958 to describe the process of producing proteins from DNA through RNA (Figure 1-1).¹ DNA is a biopolymer that carries genetic information. DNA is duplicated before a cell divides. DNA is also used to produce pre-RNA through transcription. In eukaryotes, both duplication and transcription of DNA occur in the cell nucleus. The pre-RNA is processed through RNA splicing to remove the non-coding regions, and the mature RNA is transported to the cytoplasm. Proteins are then built through translation according to the genetic code on the mature RNA.

Figure 1-1. The central dogma of molecular biology: genetic information flows from DNA through RNA to proteins.¹ The solid arrows indicate the information flow that occurs in all eukaryotic cells. The dashed arrow indicates the information flow that occasionally occurs in viruses through reverse transcriptases.
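The information flow described above can be illustrated with a toy sketch. The codon table is deliberately partial, and the example sequence, exon coordinates, and function names are hypothetical choices made only for illustration; real transcripts and splicing are far more complex.

```python
# Toy illustration of DNA -> pre-RNA -> mature RNA -> protein.
# The codon table is partial and the exon coordinates are hypothetical.

CODON_TABLE = {
    "AUG": "Met", "AAA": "Lys", "AAG": "Lys", "GCU": "Ala",
    "CGU": "Arg", "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def transcribe(coding_dna: str) -> str:
    """Return the pre-RNA matching the DNA coding strand (T replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def splice(pre_rna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon slices (start inclusive, end exclusive), dropping the rest."""
    return "".join(pre_rna[start:end] for start, end in exons)


def translate(mature_rna: str) -> list[str]:
    """Read codons from the first AUG until a stop or unknown codon."""
    start = mature_rna.find("AUG")
    if start < 0:
        return []
    residues = []
    for i in range(start, len(mature_rna) - 2, 3):
        residue = CODON_TABLE.get(mature_rna[i:i + 3])
        if residue is None or residue == "Stop":
            break
        residues.append(residue)
    return residues


# Hypothetical gene: exon (6 nt), non-coding region (6 nt), exon (9 nt).
dna = "ATGAAAGTTTTTGCTCGTTAA"
pre_rna = transcribe(dna)
mature = splice(pre_rna, [(0, 6), (12, 21)])
protein = translate(mature)
print(pre_rna, mature, protein)
```

The three functions mirror the three steps in the paragraph: transcription, splicing, and translation.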


1-2 Proteins

Proteins are the end products of the central dogma. Based on the genetic code carried by the RNA, each protein is composed of different types and numbers of amino acids. Most amino acids in proteins are L-α-amino acids. Proteins are linear biopolymers in which peptide bonds link the α-carboxyl group of one amino acid to the α-amino group of the next. The peptide bond is planar, with six atoms lying in the same plane. The length of the peptide C-N bond is 1.32 Å, which lies between that of a C-N single bond (1.49 Å) and a C=N double bond (1.27 Å), suggesting partial double bond character.² Each amino acid carries a different side chain functional group, allowing proteins to perform various bioactivities.

Proteins are essential elements that control nearly all cellular functions. Proteins serve many roles, including structural support,³ signal transduction,⁴ catalysis,⁵ and immune response.⁶ Because proteins are responsible for almost all bioactivities in the cell, studies that deepen the fundamental knowledge of proteins should improve our understanding of nature and may lead to technological advances.

1-3 Protein Folding and Function

In order to perform various biological functions, proteins must fold into three-dimensional structures with high accuracy. Different protein structures give rise to various protein functions.³,⁷ For example, at least 15 distinct enzyme families require a specific protein fold named the αβ barrel to construct the appropriate active site geometry.⁸ If proteins are denatured or mutated and cannot fold correctly into the corresponding three-dimensional shape, they lose their functions or may even cause protein misfolding diseases such as Alzheimer’s,⁹ Parkinson’s,¹⁰ Huntington’s,¹¹ and Creutzfeldt-Jakob (prion) diseases.¹² Alzheimer’s disease (AD) is a clinical syndrome caused by neurodegeneration; an estimated 24.3 million people suffered from it in 2001.¹³ AD is related to the abnormal formation and accumulation of amyloid β peptide (Aβ) and tau protein.¹⁴ Parkinson’s disease (PD) is a common neurological syndrome caused by the abnormal aggregation of a stable tetrameric protein, α-synuclein (α-SYN), into insoluble fibrils.¹⁵ Prion disease is also caused by the aggregation of a helix-containing protein, the prion protein (PrP).¹⁶ These three diseases all involve the abnormal stacking of once structurally diverse proteins into β-sheet structured amyloid fibrils. The exact conformation of a protein therefore plays an important role in its function, and a thorough study of protein function at the molecular level requires detailed structural analysis.

1-4 Hierarchy of Protein Structure

In 1952, Linderstrøm-Lang proposed the hierarchy of protein structure with four levels: primary, secondary, tertiary, and quaternary.¹⁷ In Linderstrøm-Lang’s model, each level is constructed from the elements of the previous level and is characterized by specific patterns of interactions.¹⁷ The primary structure is the sequence of amino acids that make up a protein, written from the amino-terminal end (N) to the carboxyl-terminal end (C). The main-chain atoms are an NH group bound to Cα, the central carbon atom (Cα) to which the side chain (R) is attached, and a carbonyl group C’=O linked to the NH of the next residue. The backbone is thus composed of a repeating unit (NH-CαH-C’=O)ₙ, which serves as the common framework of all amino acid residues (Figure 1-2).

To describe the structural properties of a protein, the main chain can also be characterized in another way. The repeating unit can be viewed as one central carbon (Cαₙ₊₁) extending to the preceding (Cαₙ) and subsequent (Cαₙ₊₂) central carbons. As discussed earlier, the peptide C-N bond has partial double bond character.¹⁸ This character constrains each group of six main chain atoms (Cαₙ-C’O-NH-Cαₙ₊₁ and Cαₙ₊₁-C’O-NH-Cαₙ₊₂) to a rigid planar structure.² Two neighboring rigid planes are linked through covalent bonds to the shared Cα atom and can rotate about the N-Cα and Cα-C’ bonds. The dihedral angles about these two bonds are conventionally named phi (φ) and psi (ψ), respectively (Figure 1-2).

Figure 1-2. The peptide bond and the dihedral angles φ and ψ in the backbone.
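As a minimal sketch of how φ and ψ can be obtained from atomic coordinates, the following computes a dihedral angle from four points. It assumes the standard atom quadruples (C’ of the previous residue, N, Cα, C’ for φ; N, Cα, C’, N of the next residue for ψ); the function names are illustrative, not taken from this work.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond, in (-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """phi: rotation about the N-Calpha bond."""
    return dihedral_deg(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """psi: rotation about the Calpha-C' bond."""
    return dihedral_deg(n, ca, c, n_next)


# Idealized test geometry: eclipsed (0), anti (180), and a quarter turn (90).
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Using `atan2` rather than `acos` keeps the sign of the angle, which is needed to distinguish, for example, φ = -57° from φ = +57°.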

The combinations of the dihedral angles are used to describe the structural properties of the main chain. Most combinations of φ and ψ angles are not allowed due to steric clashes between the peptide backbone and the side chains.¹⁹ G. N. Ramachandran calculated the sterically allowed regions and plotted them as Ramachandran plots, with both dihedral angles ranging from -180° to 180° (Figure 1-3).¹⁹ The allowed regions depend on the permitted van der Waals contact distances and the combination of dihedral angles.¹⁹

Figure 1-3. Ramachandran plot.¹⁹ The x axis is φ and the y axis is ψ; both angles range from -180° to 180°.

Secondary structure is defined by patterns of hydrogen bonds between the backbone amide and carbonyl groups. The basic secondary structures are the α-helix and the β-sheet.²⁰ The α-helix was first described by Pauling in 1951.²¹ The α-helix is a right-handed coil with dihedral angles φ = -57° and ψ = -47°.²²,²³ The coil-like structure has 3.6 residues per turn and is characterized by consecutive main-chain i←i+4 hydrogen bonds between each carbonyl oxygen (i) and an amide hydrogen (i+4) on the adjacent helical turn (Figure 1-4).²⁴ About one third of all protein residues adopt an α-helical conformation, showing that helices play important roles in living organisms.²⁵

Figure 1-4. The structure of an α-helix (an α-helix from a four-α-helix bundle, PDB 2I7U).
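The helix geometry described above can be made concrete with a short sketch. It assumes 1-based residue numbering and an ideal, uninterrupted helix; the function names are illustrative only.

```python
RESIDUES_PER_TURN = 3.6  # value quoted in the text


def rotation_per_residue_deg() -> float:
    """Angle swept around the helix axis by each successive residue."""
    return 360.0 / RESIDUES_PER_TURN


def helix_hbond_pairs(n_residues: int) -> list[tuple[int, int]]:
    """List (i, i+4) pairs: C=O of residue i bonded to N-H of residue i+4.

    Residues are numbered from 1. The last four carbonyls have no
    partner inside the helix, so they are not listed.
    """
    return [(i, i + 4) for i in range(1, n_residues - 3)]


print(rotation_per_residue_deg())
print(helix_hbond_pairs(10))
```

For a ten-residue ideal helix, this gives six backbone hydrogen bonds, which shows why the residues at the helix ends (the capping positions) lack some of their intrahelical hydrogen bond partners.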

The β-sheet is another common secondary structure. It is an extended, sheet-like arrangement of multiple β-strands, with inter-strand hydrogen bonds between backbone C’=O and N-H groups on neighboring strands. β-Sheets can be further categorized into two types, parallel and anti-parallel, distinguished by the relative direction of neighboring strands and the resulting hydrogen bond pattern.²⁶ A parallel β-sheet is characterized by a series of twelve-membered hydrogen-bonded rings, while an anti-parallel β-sheet is characterized by an alternating series of ten- and fourteen-membered hydrogen-bonded rings. The dihedral angles of parallel and anti-parallel β-sheets are (φ = -119°, ψ = +113°) and (φ = -139°, ψ = +135°), respectively. β-Hairpins are one of the simplest super-secondary structures, consisting of two anti-parallel β-strands connected through a short loop region (Figure 1-5).²⁷⁻²⁹

Figure 1-5. The structure of a β-hairpin (the C-terminal β-hairpin from the GB1 protein, PDB 2PLP).
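The ideal dihedral angles quoted for the α-helix and the two β-sheet types can serve as reference points for a rough classification of a residue’s (φ, ψ) pair. The sketch below is only an illustration: the 30° tolerance and the nearest-reference rule are assumptions made here, not a method used in this work, and real assignments rely on hydrogen bond patterns as well.

```python
import math

# Ideal (phi, psi) values in degrees, as quoted in the text.
REFERENCE_ANGLES = {
    "alpha-helix": (-57.0, -47.0),
    "parallel beta-sheet": (-119.0, 113.0),
    "anti-parallel beta-sheet": (-139.0, 135.0),
}


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, respecting periodicity."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_conformation(phi: float, psi: float, tolerance: float = 30.0):
    """Return the closest reference conformation, or None if none is within tolerance.

    The tolerance (in degrees) is a hypothetical cutoff chosen for illustration.
    """
    best_name, best_dist = None, float("inf")
    for name, (ref_phi, ref_psi) in REFERENCE_ANGLES.items():
        dist = math.hypot(angular_difference(phi, ref_phi),
                          angular_difference(psi, ref_psi))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= tolerance else None


print(nearest_conformation(-60, -45))
print(nearest_conformation(-140, 130))
print(nearest_conformation(60, 60))
```

Because the parallel and anti-parallel reference points lie close together on the Ramachandran plot, dihedral angles alone separate them poorly; the labels here are best read as "helical region" versus "extended region".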

Tertiary structure refers to the stable three-dimensional structure formed by a polypeptide chain.³⁰ Various recurring secondary structures assemble to form the tertiary structure, which is required to perform different and precise protein functions. X-ray analysis has revealed a significant relationship between function and structure. Domains are the fundamental units of tertiary structure and are also closely related to protein function. The concept of a domain was first introduced by Wetlaufer after X-ray studies of hen lysozyme and papain,³¹,³² and proteolysis studies of immunoglobulins.³³,³⁴ Protein domains can be divided into four major classes based on their secondary structure content: all-α domains, all-β domains, α+β domains, and α/β domains.³⁵ According to the Structural Classification of Proteins (SCOP) database, which classifies proteins by sequence and structure, these four classes account for 16.2%, 22.6%, 25.4%, and 23.4% of the total 87681 structural hits, respectively.³⁶ For example, pyruvate kinase is a phosphate group-transferring enzyme that plays a crucial role in glycolysis. It contains three major domains: an all-β regulatory domain, an α/β substrate binding domain, and an α/β nucleotide binding domain. Each structurally different domain serves a different purpose in phosphate group transfer. A typical tertiary structure has its nonpolar residues buried in the interior, forming a hydrophobic core.³⁷ Polar and charged residues are more frequently found on the surface, where their hydrophilic side chains can interact with the aqueous environment.³⁷

Quaternary structure is the spatial assembly of multiple polypeptide chains.³⁸ Examples of proteins with quaternary structure include hemoglobin, DNA polymerase, and ion channels. Conformational change or re-orientation of individual polypeptides can induce changes in the quaternary structure or in the connections between polypeptides. Through such structural changes, proteins can be regulated to exert their physiological functions.

Each level of protein structure is held together by characteristic interactions and forces. Higher levels of protein structure are assembled from the structural units of the lower level (Figure 1-6). Within the protein structure hierarchy, the secondary structural level plays a key role in protein folding. Therefore, research on the factors that affect the formation of secondary structure is important for understanding protein structure formation and prediction.

Figure 1-6. Four hierarchical levels of protein structure: primary, secondary, tertiary, and quaternary (triosephosphate isomerase, PDB 8TIM).

1-5 Driving Force of Protein Folding

Proteins must fold into their native structures to carry out their functions. Four dominant forces drive protein folding, and all four are non-covalent in nature.³⁹ These forces are hydrophobic interactions, electrostatic interactions, hydrogen bonding, and van der Waals interactions.³⁷,³⁹⁻⁴⁶

Protein residues can be divided into two groups, polar and non-polar, depending on their side chains. When a protein folds, most of the non-polar residues are buried inside and form a hydrophobic core, while polar residues are mostly exposed to solvent. This arrangement is entropically favored and therefore increases the stability of proteins.³⁷,⁴⁷,⁴⁸ The hydrophobic effect was first described by Kauzmann in 1959.

Polar residues, many of which are charged, are free to interact with their environment, including solvent molecules and other polar functional groups. Electrostatic interactions can be divided into three types: ion-ion, ion-dipole, and dipole-dipole.⁴¹,⁴⁹ A charged side chain can interact with an oppositely charged functional group located on another residue or at the protein terminus. Dipoles are formed by the asymmetric distribution of electrons that results from the difference in electronegativity between the two atoms in a covalent bond. Electrostatic interactions through ionic charges or dipoles contribute to protein stability and the formation of protein structures.⁵⁰,⁵¹
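For the ion-ion case, a rough sense of scale can be obtained from Coulomb’s law. The sketch below treats the charges as points in a uniform dielectric, which is a strong simplification for a protein; the distance of 4 Å and the dielectric constants (80 for bulk water, 4 for a protein-like interior) are illustrative assumptions, not values from this work.

```python
# Coulomb's law in units convenient for biomolecules:
# E (kcal/mol) = 332.06 * q1 * q2 / (dielectric * r), with r in angstroms
# and charges in units of the elementary charge.
COULOMB_CONSTANT = 332.06  # kcal * angstrom / (mol * e^2)


def coulomb_energy(q1: float, q2: float, distance_angstrom: float,
                   dielectric: float = 80.0) -> float:
    """Interaction energy of two point charges in a uniform dielectric."""
    if distance_angstrom <= 0:
        raise ValueError("distance must be positive")
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * distance_angstrom)


# A +1 side chain (e.g. Lys) and a -1 side chain at 4 angstroms (assumed values).
print(coulomb_energy(+1, -1, 4.0))                  # water-like environment
print(coulomb_energy(+1, -1, 4.0, dielectric=4.0))  # low-dielectric environment
```

The same pair of charges interacts about twenty times more strongly in the low-dielectric setting, which illustrates why the local environment matters for electrostatic contributions to stability.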

A hydrogen bond is an interaction between the hydrogen atom of an X-H group and a highly electronegative atom Y such as nitrogen, oxygen, or fluorine.⁴⁰,⁵² The partial positive charge on the H atom interacts with the partial negative charge on the Y atom.⁴⁰,⁵² Such interactions are important for stabilizing secondary and tertiary structures.⁴⁴,⁵³,⁵⁴ The backbone hydrogen bond C=O···H-N is the most prevalent (68.1%), with C=O···side chain (10.9%), N-H···side chain (10.4%), and side chain···side chain (10.6%) hydrogen bonds accounting for the remainder of the hydrogen bonds in protein structures.⁴⁴

Another non-covalent interaction is the van der Waals interaction. The van der Waals force is a dispersion force caused by the fluctuating polarization of nearby entities.⁵⁵ In a symmetrical molecule, there is no charge separation on average. At any instant, however, the mobile electrons may move towards one end of the molecule, forming a slightly negatively charged end (δ-) and a slightly positively charged end (δ+).⁵⁵ Individual van der Waals interactions are very weak, yet a large number of such weak forces can still significantly influence protein structure and stability.⁵⁶

1-6 RNA Recognition

RNA-protein interactions are important in various fundamental biological processes, including transcription, translation,⁵⁷ and RNA processing and modification.⁵⁸ Double-helical RNA and DNA are both built from complementary base pairs: A-U and C-G in RNA, and A-T and C-G in DNA.⁵⁹ Three factors control the binding affinity between RNA and protein: electrostatic interactions between positively charged regions of the protein and the negatively charged phosphate groups on the RNA backbone, hydrogen bonding, and interactions between the RNA groove and the protein side chains. Specific proteins bind to specific sites on specific RNAs. The appropriate binding of such proteins acts as a switch for RNA activation or repression. Therefore, studies on RNA-protein recognition are important for understanding many diseases related to RNA.

Human immunodeficiency virus (HIV) is a type of RNA retrovirus that causes the

acquired immune deficiency syndrome (AIDS).⁶⁰ A retrovirus is a single-stranded RNA

virus that targets a host cell as an obligate parasite.⁶¹ In the conventional flow of genetic information, DNA is

transcribed into RNA, and RNA is translated into protein. In retroviruses, however,

RNA is reverse-transcribed into DNA by a virally encoded reverse transcriptase, and

then integrated into the genome of the host cell by a virally encoded integrase.⁶² Most

retroviruses contain three common genes in RNA genomes: gag, pol, and env. These

genes contain the information necessary for building the structural proteins and

important enzymes for new virus particles. The gag and env genes code for the core

nucleocapsid polypeptides and surface-coat proteins of the virus, respectively.⁶³ The pol

gene codes for the viral reverse transcriptase and other enzymes.⁶⁴ In the HIV-1 viral

RNA genome, there are six additional regulatory genes (tat, rev, nef, vif, vpr, and vpu)

that code for proteins that control the infection by HIV and the production of new viral

particles.⁶⁴ The tat gene encodes the Tat protein, which serves as a transcriptional

trans-activator by binding TAR RNA. The Tat protein is important for HIV-1

replication.
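The reverse-transcription step described above can be illustrated with a minimal sketch. The function name and the example sequence are hypothetical and chosen only for illustration; the sketch simply applies Watson-Crick complementarity (A-T, U-A, G-C, C-G) and ignores all enzymatic details of the viral reverse transcriptase.

```python
# Illustrative only: names and the example sequence are not from the thesis.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """Return the complementary DNA strand (5'->3') for an RNA template (5'->3')."""
    # The complementary strand is antiparallel, so the template is read in reverse.
    return "".join(RNA_TO_DNA[base] for base in reversed(rna.upper()))


example_rna = "AUGGCU"  # made-up sequence
print(reverse_transcribe(example_rna))
```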

The trans-activator of transcription (Tat) protein contains a basic region that can

recognize RNA: RKKRRQRRR (residues 49 to 57). The Tat protein targets the

trans-activating responsive element (TAR) RNA located at the 5′ end of nascent HIV-1

transcripts.⁶⁵ The TAR RNA contains a stem-loop structure composed of 59 nucleotides.

Two essential regions are the pentanucleotide loop (⁺²⁹CUGGG⁺³³) and the three-base

bulge (⁺²²UCU⁺²⁴) at the sites from +17 to +45. By interacting with this loop and bulge

region, the Tat protein alters the properties of the transcriptional complex and recruits

crucial enzymes, including the positive transcription elongation complex and RNA

polymerase II, for efficient production of full-length viral RNA.⁶⁶ The Tat-TAR binding

provides a positive feedback cycle and allows HIV to have an explosive response once

the threshold amount of Tat protein is reached.⁶⁷ Blocking this protein-RNA interaction

may repress the transcription of HIV-1 and serve as a potential treatment for

AIDS.⁶⁸
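To make the residue numbering and the highly basic character of this region concrete, the following sketch numbers the sequence RKKRRQRRR from residue 49 and sums a simplified side-chain charge. The charge table is an assumption (Arg and Lys counted as +1, Asp and Glu as -1, all others as 0, roughly as expected near neutral pH), and the helper names are hypothetical.

```python
BASIC_REGION = "RKKRRQRRR"  # sequence quoted in the text
FIRST_RESIDUE = 49          # residue number of the first letter, per the text

# Assumed, simplified side-chain charges near neutral pH.
SIDE_CHAIN_CHARGE = {"R": 1, "K": 1, "D": -1, "E": -1}


def numbered_residues(seq, start):
    """Map residue number -> one-letter amino acid code."""
    return {start + i: aa for i, aa in enumerate(seq)}


def net_side_chain_charge(seq):
    """Sum the simplified side-chain charges over the sequence."""
    return sum(SIDE_CHAIN_CHARGE.get(aa, 0) for aa in seq)


residues = numbered_residues(BASIC_REGION, FIRST_RESIDUE)
lysine_positions = [pos for pos, aa in residues.items() if aa == "K"]
print("Lys positions:", lysine_positions)
print("Net side-chain charge:", net_side_chain_charge(BASIC_REGION))
```

Under these assumptions the two lysines fall at positions 50 and 51, and the region carries eight positive side-chain charges.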

1-7 Post-Translational Modifications (PTMs)

Proteins are synthesized through the following biological steps: translation,

polymerization, termination, and processing.⁶⁹ There are only 20 amino acids encoded

by the triplet nucleotide codons in mRNA. However, about 140 amino acid

derivatives have been identified in different proteins.⁷⁰ These 20 encoded amino

acids must undergo various modifications to increase or even alter their functionalities.

Any modification that occurs after the completion of translation is considered a

post-translational modification (PTM).⁷⁰

PTMs are a series of covalent processing events including peptide bond cleavage

and functional group attachment onto individual amino acids. Some common PTMs are

phosphorylation,⁷¹ acetylation,⁷² glycosylation,⁷³ acylation,⁷⁴ and methylation.⁷⁵ PTMs

are responsible for protein function regulation and structural change.⁷⁶

Protein methylation is a common post-translational modification that affects

thermal stability,⁷⁷ cellular stress response,⁷⁸ protein aging,⁷⁹ gene regulation,⁸⁰⁻⁸² and

transcriptional regulation.⁸³ Protein methylation typically takes place on arginine (Arg)

or lysine (Lys) residues in the protein sequence.⁷⁵ Lysine can be methylated once, twice,

or three times by lysine methyltransferases into monomethyllysine (Mmk),

dimethyllysine (Dmk), and trimethyllysine (Tmk), respectively.⁸⁴ Lysine methylation

increases the effective radius of the positive charge and the hydrophobicity of the side chain. Such

methylated lysines play an important role in protein-protein and protein-nucleic acid

regulation.⁸⁵,⁸⁶ For the Tat protein, several post-translational modifications have been

identified that modulate the interactions of Tat with TAR and other essential enzyme

complexes.⁸⁷ These modifications include lysine methylation at the residue adjacent to

the basic region.⁸⁷ Accordingly, in this thesis, we investigate the effect of various types

of lysine methylation on TAR RNA recognition by Tat₄₇₋₅₇ derivatives.
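As a small worked example of the three methylation states, the sketch below computes the monoisotopic mass added by each degree of methylation, assuming each methyl group replaces one side-chain N-H hydrogen (a net addition of CH₂ per methyl group, counted relative to the protonated lysine side chain). The dictionary and function names are hypothetical; the abbreviations Mmk, Dmk, and Tmk follow the text.

```python
MASS_C = 12.0        # monoisotopic mass of carbon-12, Da (exact by definition)
MASS_H = 1.00782503  # monoisotopic mass of hydrogen-1, Da
CH2_SHIFT = MASS_C + 2 * MASS_H  # net mass added per methyl group

# Number of methyl groups on the side-chain nitrogen for each residue type.
METHYL_COUNT = {"Lys": 0, "Mmk": 1, "Dmk": 2, "Tmk": 3}


def methylation_mass_shift(name):
    """Monoisotopic mass added relative to unmethylated lysine, in Da."""
    return METHYL_COUNT[name] * CH2_SHIFT


for name in METHYL_COUNT:
    print(f"{name}: +{methylation_mass_shift(name):.4f} Da")
```

Such mass shifts are one simple way to distinguish the three methylation states, for example when checking the identity of a synthesized peptide by mass spectrometry.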

1-8 Thesis Overview

Post-translational modifications are responsible for many protein behaviors. Lysine

methylation alters the physicochemical properties of the residue and may impact both

protein function and structure. Three variations of methylated lysine have been

identified in proteins. It is logical to assume that different numbers of methyl groups

attached to the side chain amino group should have different effects on proteins. In this

study, various types of methylated lysines are placed into two basic secondary structures:

the α-helix and the simplest β-sheet model, the “β-hairpin”, to investigate the effect of lysine

methylation on structural stability (Chapter 2).

Lysine methylation also plays an important part in biological processes.⁷⁵ The

regulatory Tat protein contains a basic region (RKKRRQRRR, residue 49 to 57), which

can specifically bind to the trans-activating responsive (TAR) element to modulate

transcription.⁸⁸ The binding between HIV-1 Tat protein and TAR RNA is essential for

the HIV-1 virus to efficiently produce viral RNA.⁸⁸ To study the effect of lysine

methylation on RNA recognition and cellular uptake, two lysine residues, Lys₅₀ and

Lys₅₁, were individually replaced with monomethylated, dimethylated, and trimethylated

lysines (Chapter 3). The dissociation constants of the Tat-derived peptide-TAR

RNA complexes were determined by gel shift assay. The cellular uptake efficiency of the

Tat-derived peptides into Jurkat cells was assessed by flow cytometry.
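For orientation, the sketch below shows one common way a dissociation constant can be estimated from gel shift data. It assumes simple 1:1 binding with the peptide in large excess over the RNA, so that the fraction of RNA bound is [P]/(K_d + [P]); this model, the grid-search fit, and the concentrations (which are synthetic, not measured data) are assumptions for illustration and are not necessarily the analysis used in Chapter 3.

```python
def fraction_bound(peptide_conc, kd):
    """Fraction of RNA bound for 1:1 binding with peptide in excess."""
    return peptide_conc / (kd + peptide_conc)


def fit_kd(concs, fractions, kd_min=1e-3, kd_max=1e3, steps=6001):
    """Least-squares Kd estimate from a log-spaced grid scan (units of concs)."""
    best_kd, best_sse = None, float("inf")
    for i in range(steps):
        kd = kd_min * (kd_max / kd_min) ** (i / (steps - 1))
        sse = sum((fraction_bound(c, kd) - f) ** 2
                  for c, f in zip(concs, fractions))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


# Synthetic, noise-free example: concentrations in nM and a made-up "true" Kd.
concs = [1, 3, 10, 30, 100, 300]
true_kd = 25.0
fractions = [fraction_bound(c, true_kd) for c in concs]

estimate = fit_kd(concs, fractions)
print(f"Estimated Kd: {estimate:.1f} nM")
```

A grid scan is used here only to keep the example free of external dependencies; a nonlinear least-squares routine would normally be preferred for real data.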

1-9 References

1. Crick, F. Central dogma of molecular biology. Nature 1970, 227, 561-563.

2. Pauling, L.; Corey, R. B. The Planarity of the Amide Group in Polypeptides. J. Am.

Chem. Soc. 1952, 74, 3964-3964.

3. Hall, A. Rho GTPases and the actin cytoskeleton. Science 1998, 279, 509-514.

4. Nishizuka, Y. The Role of Protein Kinase-C in Cell-Surface Signal Transduction and Tumor Promotion. Nature 1984, 308, 693-698.

5. Radzicka, A.; Wolfenden, R. A Proficient Enzyme. Science 1995, 267, 90-93.

6. Aderem, A.; Ulevitch, R. J. Toll-like receptors in the induction of the innate immune response. Nature 2000, 406, 782-787.

7. Gavin, A. C.; Aloy, P.; Grandi, P.; Krause, R.; Boesche, M.; Marzioch, M.; Rau, C.;

Jensen, L. J.; Bastuck, S.; Dumpelfeld, B.; Edelmann, A.; Heurtier, M. A.; Hoffman, V.; Hoefert, C.; Klein, K.; Hudak, M.; Michon, A. M.; Schelder, M.; Schirle, M.;

Remor, M.; Rudi, T.; Hooper, S.; Bauer, A.; Bouwmeester, T.; Casari, G.; Drewes, G.; Neubauer, G.; Rick, J. M.; Kuster, B.; Bork, P.; Russell, R. B.; Superti-Furga, G.

Proteome survey reveals modularity of the yeast cell machinery. Nature 2006, 440, 631-636.

8. Wierenga, R. K. The TIM-barrel fold: a versatile framework for efficient enzymes.

FEBS Lett. 2001, 492, 193-198.

9. Georges, J. Alzheimer's disease in real life. Eur. J. Neurol. 2005, 12, 328-328.

10. Lee, J. C.; Gray, H. B.; Winkler, J. R. Copper(II) binding to α-synuclein, the Parkinson's protein. J. Am. Chem. Soc. 2008, 130, 6898-6899.

11. Bates, G. P. Huntington's disease - Exploiting expression. Nature 2001, 413, 691-694.

12. Prusiner, S. B.; Groth, D.; Serban, A.; Stahl, N.; Gabizon, R. Attempts to Restore Scrapie Prion Infectivity after Exposure to Protein Denaturants. Proc. Natl. Acad.

Sci. U. S. A. 1993, 90, 2793-2797.

13. Ferri, C. P.; Prince, M.; Brayne, C.; Brodaty, H.; Fratiglioni, L.; Ganguli, M.; Hall, K.; Hasegawa, K.; Hendrie, H.; Huang, Y. Q.; Jorm, A.; Mathers, C.; Menezes, P. R.;

Rimmer, E.; Scazufca, M.; Alzheimer's Disease International. Global prevalence of dementia: a Delphi consensus study. Lancet 2005, 366, 2112-2117.

14. Ballard, C.; Gauthier, S.; Corbett, A.; Brayne, C.; Aarsland, D.; Jones, E.

Alzheimer's disease. Lancet 2011, 377, 1019-1031.

15. Kahle, P. J. α-synucleinopathy models and human neuropathology: similarities and differences. Acta Neuropathol. 2008, 115, 87-95.

16. Caughey, B.; Chesebro, B. Prion protein and the transmissible spongiform encephalopathies. Trends Cell Biol. 1997, 7, 56-62.

17. Linderstrøm-Lang, K. U. Proteins and Enzymes. Lane Medical Lectures 1952, 6.

18. Invernizzi, G.; Papaleo, E.; Sabate, R.; Ventura, S. Protein aggregation:

Mechanisms and functional consequences. Int. J. Biochem. Cell Biol. 2012, 44, 1541-1554.

19. Ramachandran, G. N.; Ramakrishnan, C.; Sasisekharan, V. Stereochemistry of Polypeptide Chain Configurations. J. Mol. Biol. 1963, 7, 95-99.

20. Richardson, J. S. The anatomy and taxonomy of protein structure. Adv. Protein Chem. 1981, 34, 167-339.

21. Pauling, L.; Corey, R. B.; Branson, H. R. The Structure of Proteins - 2 Hydrogen-Bonded Helical Configurations of the Polypeptide Chain. Proc. Natl.

Acad. Sci. U. S. A. 1951, 37, 205-211.

22. Arnott, S.; Wonacott, A. J. Atomic co-ordinates for an α-helix: refinement of the crystal structure of α-poly-l-alanine. J. Mol. Biol. 1966, 21, 371-383.

23. Barlow, D. J.; Thornton, J. M. Helix geometry in proteins. J. Mol. Biol. 1988, 201, 601-619.

24. Pauling, L.; Corey, R. B. The structure of synthetic polypeptides. Proc. Natl. Acad.

Sci. U. S. A. 1951, 37, 241-250.

25. Cheng, R. P.; Girinath, P.; Suzuki, Y.; Kuo, H. T.; Hsu, H. C.; Wang, W. R.; Yang, P.

A.; Gullickson, D.; Wu, C. H.; Koyack, M. J.; Chiu, H. P.; Weng, Y. J.; Hart, P.;

Kokona, B.; Fairman, R.; Lin, T. E.; Barrett, O. Positional Effects on Helical Ala-Based Peptides. Biochemistry 2010, 49, 9372-9384.

26. Pauling, L.; Corey, R. B. The pleated sheet, a new layer configuration of polypeptide chains. Proc. Natl. Acad. Sci. U. S. A. 1951, 37, 251-256.

27. Sibanda, B. L.; Thornton, J. M. β-hairpin families in globular proteins. Nature 1985, 316, 170-174.

28. Sibanda, B. L.; Blundell, T. L.; Thornton, J. M. Conformation of β-hairpins in protein structures. A systematic classification with applications to modelling by homology, electron density fitting and protein engineering. J. Mol. Biol. 1989, 206, 759-777.

29. Sibanda, B. L.; Thornton, J. M. Conformation of β hairpins in protein structures:

classification and diversity in homologous structures. Methods Enzymol. 1991, 202, 59-82.

30. Janin, J.; Chothia, C. Domains in proteins: definitions, location, and structural principles. Methods Enzymol. 1985, 115, 420-430.

31. Phillips, D. C. 3-Dimensional Structure of an Enzyme Molecule. Sci. Am. 1966, 215, 78-90.

32. Drenth, J.; Jansonius, J. N.; Koekoek, R.; Swen, H. M.; Wolthers, B. G. Structure of Papain. Nature 1968, 218, 929-932.

33. Porter, R. R. Structural Studies of Immunoglobulins. Science 1973, 180, 713-716.

34. Edelman, G. M. Antibody Structure and Molecular Immunology. Science 1973, 180, 830-840.

35. Levitt, M.; Chothia, C. Structural Patterns in Globular Proteins. Nature 1976, 261, 552-558.

36. Murzin, A. G.; Brenner, S. E.; Hubbard, T.; Chothia, C. SCOP: a structural classification of proteins database for the investigation of sequences and structures.

J. Mol. Biol. 1995, 247, 536-540.

37. Pace, C. N.; Shirley, B. A.; McNutt, M.; Gajiwala, K. Forces contributing to the conformational stability of proteins. FASEB J. 1996, 10, 75-83.

38. Klotz, I. M.; Langerman, N. R.; Darnall, D. W. Quaternary structure of proteins.

Annu. Rev. Biochem. 1970, 39, 25-62.

39. Dill, K. A. Dominant forces in protein folding. Biochemistry 1990, 29, 7133-7155.

40. Hagler, A. T.; Huler, E.; Lifson, S. Energy functions for peptides and proteins. I.

Derivation of a consistent force field including the hydrogen bond from amide crystals. J. Am. Chem. Soc. 1974, 96, 5319-5327.

41. Perutz, M. F. Electrostatic effects in proteins. Science 1978, 201, 1187-1191.

42. Barlow, D. J.; Thornton, J. M. Ion-pairs in proteins. J. Mol. Biol. 1983, 168, 867-885.

43. Nicholls, A.; Sharp, K. A.; Honig, B. Protein folding and association: insights from the interfacial and thermodynamic properties of hydrocarbons. Proteins 1991, 11, 281-296.

44. Stickle, D. F.; Presta, L. G.; Dill, K. A.; Rose, G. D. Hydrogen bonding in globular proteins. J. Mol. Biol. 1992, 226, 1143-1159.

45. Pace, C. N.; Grimsley, G. R.; Scholtz, J. M. Protein ionizable groups: pK values and their contribution to protein stability and solubility. J. Biol. Chem. 2009, 284, 13285-13289.

46. Stigter, D.; Dill, K. A. Charge effects on folded and unfolded proteins. Biochemistry 1990, 29, 1262-1271.

47. Pace, C. N. Contribution of the hydrophobic effect to globular protein stability. J.

Mol. Biol. 1992, 226, 29-35.

48. Pace, C. N.; Fu, H.; Fryar, K. L.; Landua, J.; Trevino, S. R.; Shirley, B. A.;

Hendricks, M. M.; Iimura, S.; Gajiwala, K.; Scholtz, J. M.; Grimsley, G. R.

Contribution of hydrophobic interactions to protein stability. J. Mol. Biol. 2011, 408,

514-528.

49. Yoder, C. H. Teaching ion-ion, ion-dipole, and dipole-dipole interactions. J. Chem.

Educ. 1977, 54, 402-408.

50. Wada, A.; Nakamura, H. Nature of the charge distribution in proteins. Nature 1981, 293, 757-758.

51. Hol, W. G.; Halie, L. M.; Sander, C. Dipoles of the α-helix and β-sheet: their role in protein folding. Nature 1981, 294, 532-536.

52. Hagler, A. T.; Lifson, S. Energy functions for peptides and proteins. II. The amide hydrogen bond and calculation of amide crystal properties. J. Am. Chem. Soc. 1974, 96, 5327-5335.

53. Baker, E. N.; Hubbard, R. E. Hydrogen bonding in globular proteins. Prog. Biophys.

Mol. Biol. 1984, 44, 97-179.

54. McDonald, I. K.; Thornton, J. M. Satisfying hydrogen bonding potential in proteins.

J. Mol. Biol. 1994, 238, 777-793.

55. Feinberg, G.; Sucher, J. General Theory of the van der Waals Interaction: A Model-Independent Approach. Phys. Rev. A 1970, 2, 2395-2415.

56. Levitt, M.; Gerstein, M.; Huang, E.; Subbiah, S.; Tsai, J. Protein folding: the endgame. Annu. Rev. Biochem. 1997, 66, 549-579.

57. Matsuo, H.; Li, H.; McGuire, A. M.; Fletcher, C. M.; Gingras, A. C.; Sonenberg, N.;

Wagner, G. Structure of translation factor eIF4E bound to m7GDP and interaction with 4E-binding protein. Nat. Struct. Biol. 1997, 4, 717-724.

58. Varani, G.; Nagai, K. RNA recognition by RNP proteins during RNA processing.

Annu. Rev. Biophys. Biomol. Struct. 1998, 27, 407-445.

59. Seeman, N. C.; Rosenberg, J. M.; Rich, A. Sequence-specific recognition of double helical nucleic acids by proteins. Proc. Natl. Acad. Sci. U. S. A. 1976, 73, 804-808.

60. Weiss, R. A. How does HIV cause AIDS? Science 1993, 260, 1273-1279.

61. Yoshida, M. Discovery of HTLV-1, the first human retrovirus, its unique regulatory mechanisms, and insights into pathogenesis. Oncogene 2005, 24, 5931-5937.

62. Smith, J. A.; Daniel, R. Following the path of the virus: the exploitation of host DNA repair mechanisms by retroviruses. ACS Chem. Biol. 2006, 1, 217-226.

63. King, S. R. HIV - Virology and Mechanisms of Disease. Ann. Emerg. Med. 1994, 24, 443-449.

64. Greene, W. C. The molecular biology of human immunodeficiency virus type 1 infection. N. Engl. J. Med. 1991, 324, 308-317.

65. Weeks, K. M.; Ampe, C.; Schultz, S. C.; Steitz, T. A.; Crothers, D. M. Fragments of the HIV-1 Tat protein specifically bind TAR RNA. Science 1990, 249, 1281-1285.

66. Mujeeb, A.; Bishop, K.; Peterlin, B. M.; Turck, C.; Parslow, T. G.; James, T. L.

NMR Structure of a Biologically-Active Peptide-Containing the RNA-Binding

Domain of Human-Immunodeficiency-Virus Type-1 Tat. Proc. Natl. Acad. Sci. U. S.

A. 1994, 91, 8248-8252.

67. Cullen, B. R. Regulation of HIV-1 Gene-Expression. FASEB J. 1991, 5, 2361-2368.

68. Stevens, M.; De Clercq, E.; Balzarini, J. The regulation of HIV-1 transcription:

Molecular targets for chemotherapeutic intervention. Med. Res. Rev. 2006, 26, 595-625.

69. Merrick, W. C. Mechanism and regulation of eukaryotic protein synthesis.

Microbiol. Rev. 1992, 56, 291-315.

70. Uy, R.; Wold, F. Post-translational covalent modification of proteins. Science 1977, 198, 890-896.

71. Lipmann, F. A.; Levene, P. A. Serinephosphoric acid obtained on hydrolysis of vitellinic acid. J. Biol. Chem. 1932, 98, 109-114.

72. Choudhary, C.; Kumar, C.; Gnad, F.; Nielsen, M. L.; Rehman, M.; Walther, T. C.;

Olsen, J. V.; Mann, M. Lysine Acetylation Targets Protein Complexes and Co-Regulates Major Cellular Functions. Science 2009, 325, 834-840.

73. Moremen, K. W.; Tiemeyer, M.; Nairn, A. V. Vertebrate protein glycosylation:

diversity, synthesis and function. Nat. Rev. Mol. Cell. Bio. 2012, 13, 448-462.

74. Towler, D. A.; Gordon, J. I.; Adams, S. P.; Glaser, L. The Biology and Enzymology of Eukaryotic Protein Acylation. Annu. Rev. Biochem. 1988, 57, 69-99.

75. Paik, W. K.; Kim, S. Protein Methylation. Science 1971, 174, 114-119.

76. Seo, J.; Lee, K. J. Post-translational modifications and their biological functions:

proteomic analysis and systematic approaches. J. Biochem. Mol. Biol. 2004, 37, 35-44.

77. Febbraio, F.; Andolfo, A.; Tanfani, F.; Briante, R.; Gentile, F.; Formisano, S.;

Vaccaro, C.; Scire, A.; Bertoli, E.; Pucci, P.; Nucci, R. Thermal stability and aggregation of sulfolobus solfataricus β-glycosidase are dependent upon the N-epsilon-methylation of specific lysyl residues: critical role of in vivo post-translational modifications. J. Biol. Chem. 2004, 279, 10185-10194.

78. Desrosiers, R.; Tanguay, R. M. Methylation of Drosophila histones at proline, lysine, and arginine residues during heat shock. J. Biol. Chem. 1988, 263, 4686-4692.

79. Najbauer, J.; Orpiszewski, J.; Aswad, D. W. Molecular aging of tubulin:

accumulation of isoaspartyl sites in vitro and in vivo. Biochemistry 1996, 35, 5183-5190.

80. Kramer, J. M. Epigenetic regulation of memory: implications in human cognitive disorders. BioMol. Concepts 2013, 4, 1-12.

81. Nakayama, J.; Rice, J. C.; Strahl, B. D.; Allis, C. D.; Grewal, S. I. Role of histone H3 lysine 9 methylation in epigenetic control of heterochromatin assembly. Science

2001, 292, 110-113.

82. Grewal, S. I.; Rice, J. C. Regulation of heterochromatin by histone methylation and small RNAs. Curr. Opin. Cell. Biol. 2004, 16, 230-238.

83. Chen, D.; Ma, H.; Hong, H.; Koh, S. S.; Huang, S. M.; Schurter, B. T.; Aswad, D.

W.; Stallcup, M. R. Regulation of transcription by a protein methyltransferase.

Science 1999, 284, 2174-2177.

84. Paik, W. K.; Paik, D. C.; Kim, S. Historical review: the field of protein methylation.

Trends Biochem. Sci. 2007, 32, 146-152.

85. Martin, C.; Zhang, Y. The diverse functions of histone lysine methylation. Nat. Rev.

Mol. Cell Bio. 2005, 6, 838-849.

86. Zhang, X.; Wen, H.; Shi, X. B. Lysine methylation: beyond histones. Acta Bioch.

Bioph. Sin. 2012, 44, 14-27.

87. Hetzer, C.; Dormeyer, W.; Schnolzer, M.; Ott, M. Decoding Tat: the biology of HIV Tat posttranslational modifications. Microbes Infect. 2005, 7, 1364-1369.

88. Debaisieux, S.; Rayne, F.; Yezid, H.; Beaumelle, B. The Ins and Outs of HIV-1 Tat.

Traffic 2012, 13, 355-363.

Chapter 2

Effect of Lysine Methylation on α-Helix and

β-Sheet Propensity

2-1 Introduction

α-Helix

The most abundant secondary structure in proteins is the α-helix, which is adopted

by nearly one third of all protein residues.¹ The α-helix is characterized by consecutive,

main-chain, i←i+4 hydrogen bonds between each carbonyl oxygen (i) and an amide

hydrogen (i+4) on the adjacent helical turn.² α-Helix stability is determined by N- and

C-capping effects, side chain-helix macrodipole interactions, side chain-side chain

interactions, and the intrinsic structure-forming tendencies of the constituent amino

acids.³ The relative frequencies with which each amino acid adopts different secondary

structures were analyzed by Chou and Fasman.⁴ These statistical results revealed that

each amino acid has its own propensity for different secondary structures.⁴ The

thermodynamic helix propensities were determined by Baldwin and co-workers in

alanine-based peptides with minimal side chain interactions based on circular dichroism

spectra using modified Lifson-Roig theory.⁵,⁶ The thermodynamic propensities of the

amino acids were converted to the free energy of helix formation/propagation. Because

helix nucleation is thought to be more difficult than propagation,

capping effects were considered for helix formation.⁶ As such, the Lifson-Roig

theory was modified by Baldwin and coworkers to incorporate N- and C-capping

parameters.⁵⁻⁷ The basic assumption is that the helix propensity of each amino acid is

position independent.⁸
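The conversion of a helix propensity into a free energy mentioned above is usually written as ΔG = -RT ln w, where w is the helix propagation parameter. The sketch below assumes this standard relation; the function name, the default temperature, and the example w values are hypothetical and do not come from the thesis.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def helix_free_energy(w, temperature_k=298.15):
    """Free energy of helix propagation, dG = -RT ln(w), in kcal/mol."""
    return -R_KCAL * temperature_k * math.log(w)


# Made-up propensities: w > 1 favors helix (dG < 0), w < 1 disfavors it.
for w in (0.5, 1.0, 1.5):
    print(f"w = {w:.1f}: dG = {helix_free_energy(w):+.3f} kcal/mol")
```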

Lifson-Roig Theory

Statistical mechanical models have been used to describe the helix-coil equilibrium,

including Zimm-Bragg theory and Lifson-Roig theory.⁹,¹⁰ Both models assume that the

helix-coil equilibrium of each residue is a two-state equilibrium. Each residue can

adopt only a helix (h) or coil (c) conformation. Zimm-Bragg theory introduced two

parameters to describe the helix-coil equilibrium of each residue: nucleation (σ) and

propagation (s).⁹ The statistical weight for initiating a helical unit (hc or ch) is defined

as σs. The statistical weight of two successive coil residues (cc) is unity (set to 1). The

statistical weight of two successive helical residues (hh) is σs². Through the

Zimm-Bragg model, the helicity of a peptide can be deduced by a partition function

with a few parameters. However, the original Zimm-Bragg model neglected many other

factors that affect helicity, such as N- and C-capping, electrostatics, and the helix macrodipole.¹¹

Modified Zimm-Bragg theories with additional parameters to include other important

interactions have been proposed.¹²,¹³
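The Zimm-Bragg partition function can be evaluated numerically with a 2×2 transfer matrix. The sketch below assumes the common convention that a coil residue has weight 1, a helical residue that follows a helical residue has weight s, and a helical residue that starts a helical stretch has weight σs (so an isolated two-residue helix has total weight σs², as stated above). The function names and parameter values are hypothetical, and a brute-force enumeration is included only as a cross-check.

```python
import math
from itertools import product


def zb_partition(n, sigma, s):
    """Zimm-Bragg partition function for n residues via the 2x2 transfer matrix."""
    # vec[0], vec[1]: summed weights of chains whose last residue is h or c.
    vec = [0.0, 1.0]  # the virtual residue before the chain is coil
    for _ in range(n):
        h_prev, c_prev = vec
        vec = [h_prev * s + c_prev * sigma * s,  # current residue helical
               h_prev + c_prev]                  # current residue coil (weight 1)
    return vec[0] + vec[1]


def zb_partition_brute(n, sigma, s):
    """Same quantity by explicit enumeration of all 2**n conformations."""
    total = 0.0
    for conf in product("hc", repeat=n):
        weight, prev = 1.0, "c"
        for state in conf:
            if state == "h":
                weight *= s if prev == "h" else sigma * s
            prev = state
        total += weight
    return total


def zb_helicity(n, sigma, s, rel_step=1e-6):
    """Mean helical fraction, (1/n) d ln Z / d ln s, by central difference."""
    up = math.log(zb_partition(n, sigma, s * (1 + rel_step)))
    down = math.log(zb_partition(n, sigma, s * (1 - rel_step)))
    return (up - down) / (math.log(1 + rel_step) - math.log(1 - rel_step)) / n


# Made-up parameters: small sigma makes helix formation cooperative.
print(zb_partition(20, 0.003, 1.3), zb_helicity(20, 0.003, 1.3))
```

With σ = 1 the residues become independent and the helicity reduces to s/(1 + s), which is a convenient sanity check on the implementation.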

The Lifson-Roig theory, similar to Zimm-Bragg theory, employs two parameters

(w and v) to describe the equilibrium between α-helix and random-coil states in a

statistical mechanical manner.¹⁰ However, only a segment with at least three

consecutive residues in the helical conformation is

considered a helix, because a helix cannot exist without the

stabilization of (i, i+4) hydrogen bonding. The statistical weight of each state of each

residue is based on the residue's own state and the state of the two neighboring

residues.¹⁰ The Lifson-Roig model utilizes a 4×4 transfer matrix to describe the statistical

weight of a residue, while the Zimm-Bragg model uses a simpler 2×2 transfer matrix.

\[
\begin{array}{c|cccc}
 & hh & hc & ch & cc \\ \hline
hh & w & v & 0 & 0 \\
hc & 0 & 0 & 1 & 1 \\
ch & v & v & 0 & 0 \\
cc & 0 & 0 & 1 & 1
\end{array}
\]
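The matrix above can be used directly to compute the Lifson-Roig partition function. In the sketch below, the rows are read as the states of residues (i-1, i) and the columns as the states of residues (i, i+1); a helical residue flanked by two helical residues has weight w, any other helical residue has weight v, and a coil residue has weight 1. The chain is assumed to be flanked by coil residues at both ends. The function names and parameter values are hypothetical, and the brute-force enumeration serves only as a cross-check.

```python
from itertools import product


def lr_matrix(w, v):
    """4x4 Lifson-Roig matrix; row and column order is hh, hc, ch, cc."""
    return [[w, v, 0, 0],
            [0, 0, 1, 1],
            [v, v, 0, 0],
            [0, 0, 1, 1]]


def lr_partition(n, w, v):
    """Partition function for n residues flanked by coil at both ends."""
    m = lr_matrix(w, v)
    # Start: the virtual residue 0 is coil, so only rows "ch" and "cc" are allowed.
    vec = [0.0, 0.0, 1.0, 1.0]
    for _ in range(n):
        vec = [sum(vec[r] * m[r][c] for r in range(4)) for c in range(4)]
    # End: the virtual residue n+1 is coil, so only columns "hc" and "cc" count.
    return vec[1] + vec[3]


def lr_partition_brute(n, w, v):
    """Same quantity by explicit enumeration of all 2**n conformations."""
    total = 0.0
    for conf in product("hc", repeat=n):
        padded = ("c",) + conf + ("c",)
        weight = 1.0
        for i in range(1, n + 1):
            if padded[i] == "h":
                both_h = padded[i - 1] == "h" and padded[i + 1] == "h"
                weight *= w if both_h else v
        total += weight
    return total


# Made-up parameters for illustration.
print(lr_partition(12, 1.5, 0.05))
```

For three residues the result is 1 + 3v + 3v² + v²w, where the last term is the only conformation (hhh) in which a residue earns the weight w.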

Three parameters describe the structural state of a

residue: u, coil state; v, helix state adjacent to a coil state; w, helix state located between

helix states. The parameter v describes helix initiation,¹⁰ which is an independent and

energetically uphill process in helix formation. Baldwin and co-workers later discovered

different capping effects at the N- and C-termini and therefore introduced two additional

parameters to describe the N- and C-capping (n and c).⁵ The modified 4×4 transfer

matrix was: