
1-1 Central Dogma of Molecular Biology

DNA, RNA, and proteins are the three crucial macromolecules in living organisms. The central dogma, introduced by Crick in 1958, describes how proteins are produced from DNA through RNA (Figure 1-1).1 DNA is a biopolymer that carries genetic information. Before a cell divides, its DNA is duplicated. DNA is also used as a template to produce precursor RNA (pre-RNA) through transcription. In eukaryotic cells, both duplication and transcription of DNA occur in the cell nucleus. The pre-RNA is then processed by RNA splicing, which removes the non-coding regions, and the mature RNA is transported to the cytoplasm. There, proteins are built through translation according to the genetic code carried by the mature RNA.
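The DNA-to-RNA-to-protein flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the input is the coding (sense) strand of an intron-free gene, so splicing is skipped, and `CODON_TABLE` is a deliberately partial, hypothetical subset of the standard genetic code rather than the full 64-codon table.

```python
# Minimal sketch of the central dogma: DNA -> mRNA (transcription) -> protein (translation).
# Assumptions: input is the coding strand of an intron-free gene (no splicing step),
# and CODON_TABLE is an illustrative, partial subset of the standard genetic code.

CODON_TABLE = {
    "AUG": "M",  # Met, also the usual start codon
    "GCU": "A",  # Ala
    "AAA": "K",  # Lys
    "CGU": "R",  # Arg
    "GGC": "G",  # Gly
    "UAA": "*",  # stop
    "UAG": "*",  # stop
    "UGA": "*",  # stop
}


def transcribe(coding_dna: str) -> str:
    """mRNA has the same sequence as the coding strand, with U in place of T."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons (nucleotide triplets) from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE.get(mrna[i:i + 3], "?")  # '?' = codon not in this subset
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGGCTAAACGTGGCTAA")
    print(mrna)             # AUGGCUAAACGUGGCUAA
    print(translate(mrna))  # MAKRG
```

A real pipeline would use the complete codon table and would model splicing between transcription and translation; both are omitted here to keep the information flow visible.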

Figure 1-1. The central dogma of molecular biology: genetic information flows from DNA through RNA to proteins.1 The solid arrows indicate the information flow that occurs in all eukaryotic cells. The dashed arrow indicates the information flow that occurs only in certain viruses, through reverse transcriptases.


1-2 Proteins

Proteins are the end products of the central dogma. Based on the genetic code carried by the RNA, each protein is composed of a specific number and sequence of amino acids, most of which are L-α-amino acids. Proteins are linear biopolymers in which peptide bonds link the α-carboxyl group of one amino acid to the α-amino group of the next. The peptide bond is planar, with six atoms lying in the same plane. The length of the peptide C-N bond is 1.32 Å, which lies between that of a C-N single bond (1.49 Å) and that of a C=N double bond (1.27 Å), indicating partial double bond character.2 Each amino acid carries a different side chain functional group, which allows proteins to perform diverse biological activities.
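The claim that 1.32 Å implies partial double bond character can be made quantitative with a crude estimate. The sketch below assumes, purely for illustration, that bond character scales linearly between the single-bond (1.49 Å) and double-bond (1.27 Å) lengths quoted in the text; this linear interpolation is a swapped-in simplification, not a rigorous bond-order model, and other models give different numbers.

```python
# Crude, illustrative estimate of peptide C-N double bond character.
# Assumption: bond character varies linearly with bond length between the
# single- and double-bond reference lengths quoted in the text.

C_N_SINGLE = 1.49   # Å, C-N single bond
C_N_DOUBLE = 1.27   # Å, C=N double bond
PEPTIDE_C_N = 1.32  # Å, peptide C-N bond


def double_bond_fraction(length: float,
                         single: float = C_N_SINGLE,
                         double: float = C_N_DOUBLE) -> float:
    """Return 0.0 at the single-bond length and 1.0 at the double-bond length."""
    return (single - length) / (single - double)


if __name__ == "__main__":
    print(f"{double_bond_fraction(PEPTIDE_C_N):.2f}")  # 0.77
```

Only the qualitative conclusion matters here: the peptide bond sits much closer to a double bond than to a single bond, which is why rotation about it is restricted.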

Proteins are essential elements that control nearly all cellular functions. They serve a variety of roles, acting as structural components3 and participating in signal transduction,4 catalysis,5 and the immune response.6 Because proteins are responsible for almost all bioactivities in the cell, studies that deepen our fundamental knowledge of proteins should improve our understanding of nature and may also lead to technological advances.

1-3 Protein Folding and Function

In order to perform their biological functions, proteins must fold into three-dimensional structures with high accuracy, and different protein structures give rise to different protein functions.3, 7 For example, at least 15 distinct enzyme families require a specific protein fold, the α/β barrel, to construct the appropriate active-site geometry.8 If proteins are denatured or mutated and cannot fold correctly into their native three-dimensional shape, they lose their functions and can even cause protein misfolding diseases such as Alzheimer's,9 Parkinson's,10 Huntington's,11 and Creutzfeldt-Jakob (prion) diseases.12 Alzheimer's disease (AD) is a clinical syndrome caused by neurodegeneration; an estimated 24.3 million people suffered from it in 2001.13 AD is related to the abnormal formation and accumulation of amyloid β peptide (Aβ) and tau protein.14 Parkinson's disease (PD) is a common neurodegenerative disorder caused by the abnormal aggregation of α-synuclein (α-SYN), a stable tetrameric protein, into insoluble fibrils.15 Prion disease is likewise caused by the aggregation of an α-helix-containing protein called the prion protein (PrP).16 In all three diseases, proteins with once-diverse structures abnormally stack into β-sheet-rich amyloid fibrils. Because the exact conformation of a protein plays such an important role in its function, a thorough study of protein function at the molecular level requires detailed structural analysis.

1-4 Hierarchy of Protein Structure

In 1952, Linderstrøm-Lang proposed a hierarchy of protein structure with four levels: primary, secondary, tertiary, and quaternary.17 In Linderstrøm-Lang's model, each level is constructed from the elements of the previous level and is characterized by specific patterns of interactions.17 The primary structure is the sequence of amino acids that make up a protein, written from the amino-terminal end (N) to the carboxyl-terminal end (C). The main-chain atoms of each residue are an NH group bound to a central carbon atom (Cα), to which the side chain (R) is attached, and a carbonyl group (C'=O) that is linked to the NH of the next residue. The backbone is therefore a repeating unit, (NH-Cα-C'=O)n, which serves as the common framework of every amino acid residue (Figure 1-2). To describe the structural properties of a protein, the main chain can also be characterized in another way: each central carbon, Cα(n+1), is viewed as extending to its preceding (Cα(n)) and following (Cα(n+2)) central carbons. As discussed earlier, the peptide C-N bond has partial double bond character.18 This character holds six main-chain atoms, Cα(n)-C'(=O)-N(H)-Cα(n+1), and likewise Cα(n+1)-C'(=O)-N(H)-Cα(n+2), in a rigid planar structure.2 Two neighboring rigid planes are joined at a shared Cα atom and can rotate only about the N-Cα and Cα-C' bonds. The two conventional dihedral angles for these two bonds are named phi (φ) and psi (ψ), respectively (Figure 1-2).
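In practice, φ and ψ are computed from atomic coordinates as the dihedral angle defined by four consecutive backbone atoms: C'(i-1), N(i), Cα(i), C'(i) for φ, and N(i), Cα(i), C'(i), N(i+1) for ψ. The sketch below uses the common atan2-based formula with plain tuples so it needs no external libraries; the helper names are illustrative, and in real work a structure library would supply the coordinates.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> tuple:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> tuple:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle (degrees, between -180 and 180) about the p1-p2 bond.

    phi: p0..p3 = C'(i-1), N(i), CA(i), C'(i)
    psi: p0..p3 = N(i), CA(i), C'(i), N(i+1)
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    origin, axis = (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)
    print(dihedral((1, 0, 0), origin, axis, (-1, 1, 0)))  # 180.0 (trans)
    print(dihedral((1, 0, 0), origin, axis, (1, 1, 0)))   # 0.0 (cis)
```

The sign follows the usual convention in which a clockwise rotation of the front bond, viewed along p1 to p2, gives a positive angle; the test points above are toy coordinates, not protein atoms.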


Figure 1-2. The peptide bond and the dihedral angles φ and ψ in the backbone.

Combinations of the dihedral angles are used to describe the structural properties of the main chain. Most combinations of φ and ψ are not allowed because of steric clashes between the peptide backbone and the side chains.19 G. N. Ramachandran calculated the sterically allowed regions and plotted them as Ramachandran plots, with both dihedral angles ranging from -180° to 180° (Figure 1-3).19 The extent of the allowed regions depends on the minimum van der Waals contact distances permitted for each combination of dihedral angles.19

Figure 1-3. Ramachandran plot.19 The x-axis is φ and the y-axis is ψ; both angles range from -180° to 180°.

Secondary structure is defined by patterns of hydrogen bonds between the backbone amide and carbonyl groups. The basic secondary structures are the α-helix and the β-sheet.20 The α-helix was first described by Pauling in 1951.21 It is a right-handed coil with dihedral angles φ = -57° and ψ = -47°.22, 23 The coil has 3.6 residues per turn and is characterized by consecutive main-chain i←i+4 hydrogen bonds between the carbonyl oxygen of each residue i and the amide hydrogen of residue i+4 on the adjacent helical turn (Figure 1-4).24 About one third of all protein residues adopt an α-helical conformation, underscoring the importance of the α-helix in living organisms.25
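Two numbers from the paragraph above, 3.6 residues per turn and the i←i+4 hydrogen-bonding pattern, fix much of the helix geometry. The sketch below assumes an idealized helix with no end fraying and 1-based residue numbering; it computes the angular step between residues and lists which carbonyl (i) pairs with which amide (i+4).

```python
# Idealized alpha-helix bookkeeping (assumption: perfect helix, no end fraying,
# residues numbered from 1).

RESIDUES_PER_TURN = 3.6  # from the text


def rotation_per_residue(residues_per_turn: float = RESIDUES_PER_TURN) -> float:
    """Angular step between consecutive residues around the helix axis, in degrees."""
    return 360.0 / residues_per_turn


def helix_hbond_pairs(n_residues: int, offset: int = 4) -> list:
    """Return (C=O of residue i, N-H of residue i + offset) pairs."""
    return [(i, i + offset) for i in range(1, n_residues - offset + 1)]


if __name__ == "__main__":
    print(rotation_per_residue())  # about 100 degrees per residue
    print(helix_hbond_pairs(8))    # [(1, 5), (2, 6), (3, 7), (4, 8)]
```

Because each bond skips four residues, an n-residue helix can form at most n - 4 of these backbone hydrogen bonds, so short helices are proportionally less stabilized than long ones.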

Figure 1-4. The structure of an α-helix (an α-helix from a four-α-helix bundle, PDB 2I7U).

The β-sheet is another common secondary structure. It is a pleated, sheet-like arrangement of multiple β-strands held together by inter-strand hydrogen bonds between backbone C'=O and N-H groups on neighboring strands. β-Sheets can be further divided into two types, parallel and anti-parallel, according to whether neighboring strands run in the same or opposite directions, which determines the orientation of the hydrogen bonds.26 A parallel β-sheet is characterized by a series of twelve-membered hydrogen-bonded rings, whereas an anti-parallel β-sheet is characterized by an alternating series of ten- and fourteen-membered hydrogen-bonded rings. The dihedral angles of parallel and anti-parallel β-sheets are (φ = -119°, ψ = +113°) and (φ = -139°, ψ = +135°), respectively. β-Hairpins are among the simplest super-secondary structures, consisting of two anti-parallel β-strands connected by a short loop region (Figure 1-5).27-29
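The reference (φ, ψ) values quoted for the α-helix and the two β-sheet types can be used for a rough, illustrative classification of a single residue's backbone angles. The sketch below assigns the nearest reference point, measuring distance with periodic (wrap-around) angle differences; this nearest-point rule is an assumption for illustration only, since established assignment programs such as DSSP work from hydrogen-bond patterns instead.

```python
import math

# Reference (phi, psi) values quoted in the text, in degrees.
REFERENCE_CONFORMATIONS = {
    "alpha-helix": (-57.0, -47.0),
    "parallel beta-sheet": (-119.0, 113.0),
    "antiparallel beta-sheet": (-139.0, 135.0),
}


def angle_diff(a: float, b: float) -> float:
    """Smallest signed difference a - b, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def nearest_conformation(phi: float, psi: float) -> tuple:
    """Return (label, distance in degrees) of the closest reference point.

    Illustrative nearest-point rule only, not a real secondary-structure assignment.
    """
    best = None
    for label, (ref_phi, ref_psi) in REFERENCE_CONFORMATIONS.items():
        d = math.hypot(angle_diff(phi, ref_phi), angle_diff(psi, ref_psi))
        if best is None or d < best[1]:
            best = (label, d)
    return best


if __name__ == "__main__":
    print(nearest_conformation(-60.0, -45.0))   # closest to alpha-helix
    print(nearest_conformation(-140.0, 130.0))  # closest to antiparallel beta-sheet
```

The wrap-around difference matters because -180° and +180° describe the same rotation; a plain subtraction would place them 360° apart.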

Figure 1-5. The structure of a β-hairpin (the C-terminal β-hairpin of the GB1 protein, PDB 2PLP).

Tertiary structure refers to the stable three-dimensional structure formed by a polypeptide chain.30 Various recurring secondary structures assemble to form the tertiary structure, which is required for precise protein functions. X-ray crystallography has revealed a close relationship between structure and function. Domains are the fundamental units of tertiary structure and are also closely related to protein function. The concept of a domain was first introduced by Wetlaufer after X-ray studies of hen lysozyme and papain31, 32 and proteolysis studies of immunoglobulins.33, 34 Protein tertiary structures can be divided into four major classes based on the secondary structure content of their domains: all-α, all-β, α+β, and α/β.35 In the Structural Classification of Proteins (SCOP) database, which classifies proteins by sequence and structure, these four classes account for 16.2%, 22.6%, 25.4%, and 23.4%, respectively, of the 87681 structural hits in total.36 Pyruvate kinase, a phosphate group-transferring enzyme that plays a crucial role in glycolysis, contains three major domains: an all-β regulatory domain, an α/β substrate-binding domain, and an α/β nucleotide-binding domain. Each structurally distinct domain serves a different purpose in phosphate group transfer. A typical tertiary structure has its nonpolar residues buried in the interior, where they form a hydrophobic core.37 Polar and charged residues are found more frequently on the surface, where their hydrophilic side chains interact with the aqueous environment.37
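The SCOP percentages above can be turned into approximate counts, which also shows how much of the database the four major classes leave uncovered. The sketch below simply multiplies the quoted percentages by the quoted total of 87681; the rounding to whole entries is an assumption, and the leftover share presumably belongs to SCOP's other, smaller classes.

```python
# Approximate counts from the SCOP percentages quoted in the text.

TOTAL_HITS = 87681
CLASS_PERCENT = {
    "all-alpha": 16.2,
    "all-beta": 22.6,
    "alpha+beta": 25.4,
    "alpha/beta": 23.4,
}


def approximate_counts(total: int, percents: dict) -> dict:
    """Convert percentages of a total into rounded counts."""
    return {name: round(total * pct / 100) for name, pct in percents.items()}


counts = approximate_counts(TOTAL_HITS, CLASS_PERCENT)
covered = sum(CLASS_PERCENT.values())  # share held by the four major classes

if __name__ == "__main__":
    print(counts)  # {'all-alpha': 14204, 'all-beta': 19816, 'alpha+beta': 22271, 'alpha/beta': 20517}
    print(f"{covered:.1f}% covered, {100 - covered:.1f}% in other classes")  # 87.6% / 12.4%
```

Because the counts are reconstructed from rounded percentages, they may differ by a few entries from the database's actual numbers.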

Quaternary structure is the spatial assembly of multiple polypeptide chains.38 Examples of proteins with quaternary structure include hemoglobin, DNA polymerase, and ion channels. Conformational changes or reorientation of individual polypeptides can alter the quaternary structure and the contacts between polypeptides. Through such structural changes, protein activity can be regulated so that the protein carries out its physiological function.

Each level of protein structure is held together by characteristic interactions and forces, and higher levels of protein structure are assembled from the structural units of the level below (Figure 1-6). Within this hierarchy, the secondary structure level plays a key role in protein folding. Research on the factors that affect the formation of secondary structure is therefore important for understanding and predicting protein structure.

Figure 1-6. Four hierarchical levels of protein structure: primary, secondary, tertiary, and quaternary structure (triosephosphate isomerase, PDB 8TIM).

1-5 Driving Force of Protein Folding

Proteins must fold into their native structures to carry out their functions. Protein folding is driven by four dominant forces, all of which are non-covalent in nature:39 the hydrophobic effect, electrostatic interactions, hydrogen bonding, and van der Waals interactions.37, 39-46

Amino acid residues can be divided into two groups, polar and non-polar, depending on their side chains. When a protein folds, most of the non-polar residues are buried inside and form a hydrophobic core, while most polar residues are exposed to solvent. Burying non-polar side chains releases the ordered water molecules that surround them in the unfolded state; this process is entropically favored and therefore increases protein stability.37, 47, 48 The hydrophobic effect was first described by Kauzmann in 1959.

Polar residues, several of which are charged, are free to interact with their environment, including solvent molecules and other polar functional groups. Electrostatic interactions can be divided into three types: ion-ion, ion-dipole, and dipole-dipole.41, 49 A charged side chain can interact with an oppositely charged functional group on another residue or at a protein terminus. Dipoles arise from the asymmetric distribution of electrons caused by the difference in electronegativity between the two atoms of a covalent bond. Electrostatic interactions through ionic charges or dipoles contribute to protein stability and to the formation of protein structures.50, 51
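The strength of an ion-ion interaction depends strongly on the surrounding medium, which is one reason buried and surface salt bridges behave differently. The sketch below applies Coulomb's law with the commonly used conversion constant of approximately 332 kcal·Å/(mol·e²); the 4 Å distance and the dielectric constants (4 as a rough stand-in for a protein interior, 80 for water) are illustrative assumptions, not values from this thesis.

```python
# Coulomb's law for a pair of point charges in a uniform dielectric.
# Assumption: point charges, uniform medium; constant is approximately 332.06.

COULOMB_KCAL = 332.06  # kcal*Å/(mol*e^2), approximate


def coulomb_energy(q1: float, q2: float, r: float, dielectric: float) -> float:
    """Pairwise electrostatic energy in kcal/mol (charges in e, distance in Å)."""
    return COULOMB_KCAL * q1 * q2 / (dielectric * r)


if __name__ == "__main__":
    # Hypothetical Lys(+1)/Glu(-1) pair 4 Å apart.
    for label, eps in [("low-dielectric interior (eps = 4)", 4.0), ("water (eps = 80)", 80.0)]:
        print(f"{label}: {coulomb_energy(+1, -1, 4.0, eps):.1f} kcal/mol")
```

The twenty-fold weaker attraction in water illustrates why solvent-exposed ion pairs contribute much less to stability than the vacuum value would suggest; real proteins are not uniform dielectrics, so these numbers are only order-of-magnitude guides.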

A hydrogen bond is an interaction between a hydrogen atom in an X-H group and a highly electronegative atom Y such as nitrogen, oxygen, or fluorine.40, 52 The partial positive charge on the H atom interacts with the partial negative charge on the Y atom.40, 52 Such interactions are important for stabilizing secondary and tertiary structures.44, 53, 54 The backbone hydrogen bond C=O···H-N is the most prevalent (68.1%), while C=O···side chain (10.9%), N-H···side chain (10.4%), and side chain···side chain (10.6%) hydrogen bonds account for the remainder of the hydrogen bonds in protein structures.44
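A quick tally of the four hydrogen-bond categories quoted above confirms that they partition all hydrogen bonds and shows the share that involves at least one side chain. The category labels below are descriptive names chosen for this sketch; the percentages are those cited in the text.

```python
# Tally of the hydrogen-bond categories quoted in the text (percent of all H-bonds).

HBOND_PERCENT = {
    "backbone C=O ... backbone N-H": 68.1,
    "backbone C=O ... side chain": 10.9,
    "backbone N-H ... side chain": 10.4,
    "side chain ... side chain": 10.6,
}

total = sum(HBOND_PERCENT.values())
involving_side_chains = total - HBOND_PERCENT["backbone C=O ... backbone N-H"]

if __name__ == "__main__":
    print(f"total: {total:.1f}%")                                   # 100.0%
    print(f"involving a side chain: {involving_side_chains:.1f}%")  # 31.9%
```

Roughly two of every three hydrogen bonds are purely backbone-backbone, consistent with the central role of backbone hydrogen bonds in defining secondary structure.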

The fourth interaction is the van der Waals force, a dispersion force caused by the fluctuating polarization of nearby atoms.55 In a symmetrical molecule, the charge distribution has no dipole on average. In reality, electrons are mobile and may move toward one end of the molecule, forming a slightly negatively charged end (δ-) and a slightly positively charged end (δ+).55 This transient dipole induces an opposite dipole in a neighboring atom, and the two attract each other. Individual van der Waals interactions are very weak, yet the very large number of such contacts in a protein can significantly influence its structure and stability.56
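The text describes van der Waals attraction qualitatively. A common way to model it in molecular simulations, swapped in here for illustration, is the 12-6 Lennard-Jones potential, in which the r^-6 term represents dispersion attraction and the r^-12 term an empirical short-range repulsion. The parameter values below (well depth 0.1 kcal/mol, σ = 3.4 Å) are hypothetical, chosen only to be of a plausible order of magnitude for a pair of carbon atoms.

```python
# 12-6 Lennard-Jones potential: an illustrative model of van der Waals interactions.
# Assumption: epsilon and sigma below are hypothetical, order-of-magnitude values.

EPSILON = 0.1  # kcal/mol, depth of the energy well (hypothetical)
SIGMA = 3.4    # Å, distance at which the energy crosses zero (hypothetical)


def lennard_jones(r: float, epsilon: float = EPSILON, sigma: float = SIGMA) -> float:
    """Energy in kcal/mol: 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_minimum_distance(sigma: float = SIGMA) -> float:
    """Separation at the energy minimum, 2^(1/6) * sigma."""
    return 2.0 ** (1.0 / 6.0) * sigma


if __name__ == "__main__":
    r_min = lj_minimum_distance()
    print(f"minimum at {r_min:.2f} Å, energy {lennard_jones(r_min):.3f} kcal/mol")
```

The well depth of a single pair is only about 0.1 kcal/mol in this example, which is why the stabilizing effect emerges only when summed over the thousands of close contacts in a tightly packed protein core.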

1-6 RNA Recognition

RNA-protein interactions are important in various fundamental biological processes, including transcription, translation,57 and RNA processing and modification.58 Double-helical RNA and DNA are both built from complementary base pairs: A-U and C-G in RNA, and A-T and C-G in DNA.59 Three factors control the binding affinity between RNA and protein: electrostatic interactions between positively charged regions of the protein and the negatively charged phosphate groups of the RNA backbone, hydrogen bonding, and interactions between the RNA grooves and protein side chains. Specific proteins bind to specific sites on specific RNAs, and the binding of such proteins acts as a switch for RNA activation or repression. Studies of RNA-protein recognition are therefore important for understanding many RNA-related diseases.
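The complementary base-pairing rules listed above (A-U and C-G in RNA, A-T and C-G in DNA) determine the partner strand of any double helix. The sketch below computes the reverse complement, since the two strands of a duplex run antiparallel, and checks whether two strands form a perfect duplex. It considers Watson-Crick pairs only; non-canonical pairs such as G-U wobble pairs, which occur in RNA, are deliberately ignored.

```python
# Watson-Crick complementarity for RNA and DNA (non-canonical pairs such as G-U ignored).

PAIRS = {
    "RNA": {"A": "U", "U": "A", "G": "C", "C": "G"},
    "DNA": {"A": "T", "T": "A", "G": "C", "C": "G"},
}


def reverse_complement(seq: str, kind: str = "RNA") -> str:
    """Return the partner strand, written 5'->3', for a strand given 5'->3'."""
    table = PAIRS[kind]
    return "".join(table[base] for base in reversed(seq.upper()))


def can_pair(strand1: str, strand2: str, kind: str = "RNA") -> bool:
    """True if two 5'->3' strands form a perfect Watson-Crick duplex."""
    return len(strand1) == len(strand2) and reverse_complement(strand1, kind) == strand2.upper()


if __name__ == "__main__":
    print(reverse_complement("GGCU"))         # AGCC
    print(reverse_complement("ATGC", "DNA"))  # GCAT
    print(can_pair("GGCU", "AGCC"))           # True
```

The reversal step is what encodes the antiparallel orientation; forgetting it is a common source of errors when pairing sequences by hand.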


Human immunodeficiency virus (HIV) is an RNA retrovirus that causes acquired immune deficiency syndrome (AIDS).60 A retrovirus is a single-stranded RNA virus that, as an obligate parasite, depends on a host cell to replicate.61 In the usual flow of genetic information, DNA is transcribed into RNA, and RNA is translated into protein. In retroviruses, however, the RNA genome is reverse-transcribed into DNA by a virally encoded reverse transcriptase, and this DNA is then integrated into the genome of the host cell by a virally encoded integrase.62 Most retroviruses contain three common genes in their RNA genomes: gag, pol, and env. These genes carry the information needed to build the structural proteins and important enzymes of new virus particles. The gag and env genes code for the core nucleocapsid polypeptides and the surface-coat proteins of the virus, respectively.63 The pol gene codes for the viral reverse transcriptase and other enzymes.64 The HIV-1 RNA genome also contains six additional regulatory genes (tat, rev, nef, vif, vpr, and vpu) that code for proteins controlling HIV infection and the production of new viral particles.64 The tat gene encodes the Tat protein, which serves as a transcriptional trans-activator by binding TAR RNA and is important for HIV-1 replication.

The trans-activator of transcription (Tat) protein contains a basic region that recognizes RNA: RKKRRQRRR (residues 49 to 57). Tat targets the trans-activating responsive element (TAR) RNA located at the 5′ end of nascent HIV-1 transcripts.65 TAR RNA forms a stem-loop structure composed of 59 nucleotides. Within nucleotides +17 to +45, two regions are essential: the pentanucleotide loop (+29CUGGG+33) and the three-base bulge (+22UCU+24). By interacting with this loop and bulge region, Tat alters the properties of the transcriptional complex and recruits crucial factors, including the positive transcription elongation factor b (P-TEFb) and RNA polymerase II, for efficient production of full-length viral RNA.66 Tat-TAR binding provides a positive feedback loop that allows HIV to respond explosively once a threshold amount of Tat protein is reached.67 Blocking this protein-RNA interaction may repress HIV-1 transcription and could serve as a potential treatment for AIDS.68
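The Tat basic region given above, RKKRRQRRR (residues 49 to 57), is densely positive, which fits the electrostatic component of RNA recognition described in the previous section. The sketch below maps residue numbers onto this segment, counts its approximate side-chain charge, and checks the TAR loop and bulge lengths implied by their numbering. The charge model is a simplifying assumption: Arg and Lys are treated as fully protonated (+1) and all other residues in the segment as neutral, with termini ignored.

```python
# Bookkeeping for the Tat basic region and TAR numbering quoted in the text.
# Assumption: Arg/Lys = +1, Asp/Glu = -1, everything else neutral; termini ignored.

TAT_BASIC_REGION = "RKKRRQRRR"  # residues 49-57
FIRST_RESIDUE = 49

SIDE_CHAIN_CHARGE = {"R": 1, "K": 1, "D": -1, "E": -1}


def residue_at(position: int, seq: str = TAT_BASIC_REGION, first: int = FIRST_RESIDUE) -> str:
    """One-letter code of the residue at a given Tat sequence position."""
    index = position - first
    if not 0 <= index < len(seq):
        raise ValueError(f"position {position} is outside residues {first}-{first + len(seq) - 1}")
    return seq[index]


def net_charge(seq: str) -> int:
    """Approximate side-chain net charge near neutral pH."""
    return sum(SIDE_CHAIN_CHARGE.get(aa, 0) for aa in seq)


def lysine_positions(seq: str = TAT_BASIC_REGION, first: int = FIRST_RESIDUE) -> list:
    """Sequence positions of lysines within the segment."""
    return [first + i for i, aa in enumerate(seq) if aa == "K"]


def span_length(start: int, end: int) -> int:
    """Number of nucleotides in an inclusive TAR numbering range."""
    return end - start + 1


if __name__ == "__main__":
    print(net_charge(TAT_BASIC_REGION))              # 8
    print(lysine_positions())                        # [50, 51]
    print(span_length(29, 33), span_length(22, 24))  # 5 3 (loop, bulge)
```

Because methylation keeps the lysine ε-nitrogen positively charged, adding methyl groups to K50 or K51 would leave this net charge unchanged in the model; any effect on TAR binding would then come from changes in size, hydrophobicity, or hydrogen-bonding capacity instead.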

1-7 Post-Translational Modifications (PTMs)

Protein synthesis proceeds through the following steps: initiation, elongation (polymerization), termination, and processing.69 Only 20 amino acids are encoded by the triplet codons in mRNA, yet about 140 amino acid derivatives have been identified in different proteins.70 These derivatives arise because the 20 encoded amino acids undergo various modifications that extend or even alter their functionalities. Any modification that occurs after translation is complete is considered a post-translational modification (PTM).70

PTMs are covalent processing events, including peptide bond cleavage and the attachment of functional groups to individual amino acids. Common PTMs include phosphorylation,71 acetylation,72 glycosylation,73 acylation,74 and methylation.75 PTMs regulate protein function and can change protein structure.76

Protein methylation is a common post-translational modification that affects thermal stability,77 cellular stress response,78 protein aging,79 gene regulation,80-82 and transcriptional regulation.83 Protein methylation typically takes place on arginine (Arg) or lysine (Lys) residues.75 Lysine methyltransferases can methylate lysine once, twice, or three times, producing monomethyllysine (Mmk), dimethyllysine (Dmk), and trimethyllysine (Tmk), respectively.84 Lysine methylation increases both the effective radius of the positive charge and the hydrophobicity of the side chain. Such methylated lysines play important roles in regulating protein-protein and protein-nucleic acid interactions.85, 86 For the Tat protein, several post-translational modifications have been identified that modulate the interactions of Tat with TAR and other essential enzyme complexes.87 These modifications include lysine methylation at the residue adjacent to the basic region.87 Accordingly, in this thesis, we investigate the effect of various types of lysine methylation on TAR RNA recognition by Tat47-57 derivatives.
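The three methylated lysine forms differ in ways that can be counted directly. The sketch below tabulates, for each form, the monoisotopic mass added (each methyl group replaces an H with CH3, a net gain of CH2, about 14.01565 Da, the standard shift used in mass spectrometry) and the number of N-H hydrogens left on the protonated ε-amino group, which limits hydrogen-bond donation. It assumes the side chain is protonated, as is typical near neutral pH; the positive charge is retained in all four forms.

```python
# Properties of lysine methylation states.
# Assumption: the epsilon-amino group is protonated (NH3+ in unmodified Lys).

METHYL_SHIFT = 14.01565  # Da, monoisotopic mass of CH2 (H replaced by CH3)

LYSINE_FORMS = {0: "Lys", 1: "Mmk", 2: "Dmk", 3: "Tmk"}


def _check(n_methyl: int) -> None:
    if n_methyl not in LYSINE_FORMS:
        raise ValueError("lysine carries 0 to 3 methyl groups")


def methylation_mass_shift(n_methyl: int) -> float:
    """Monoisotopic mass added by n methyl groups, in Da."""
    _check(n_methyl)
    return n_methyl * METHYL_SHIFT


def ammonium_nh_count(n_methyl: int) -> int:
    """N-H hydrogens left on the protonated epsilon-nitrogen (potential H-bond donors)."""
    _check(n_methyl)
    return 3 - n_methyl


if __name__ == "__main__":
    for n, name in LYSINE_FORMS.items():
        print(f"{name}: +{methylation_mass_shift(n):.3f} Da, {ammonium_nh_count(n)} N-H, charge +1")
```

The last column makes one contrast explicit: trimethyllysine keeps its charge but has no N-H donors left, so it can interact electrostatically with RNA phosphates while no longer donating hydrogen bonds.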


1-8 Thesis Overview

Post-translational modifications are responsible for many protein behaviors. Lysine methylation alters the physicochemical properties of the residue and may affect both protein function and structure. Three forms of methylated lysine have been identified in proteins. It is reasonable to expect that different numbers of methyl groups attached to the side chain amino group will have different effects on proteins. In this study, the various types of methylated lysine are placed into two basic secondary structures, the α-helix and the simplest β-sheet model, the “β-hairpin”, to investigate the effect of lysine
