
Knowledge of a protein's secondary structure is of great importance in the study of protein function. Currently, the main technique for determining protein structure is X-ray crystallography, which is a slow and often difficult process. Meanwhile, the protein sequence data arising from sequencing projects is growing rapidly. Thus, it is increasingly important to predict the structure of proteins whose sequences are known. One significant step towards elucidating the structure and function of a protein is the prediction of its secondary structure.

Protein secondary structure is characterized by regular elements such as α-helices and β-sheets and by non-repetitive motifs such as tight turns, bulges and random coil structures. A tight turn in protein structure is defined as a site where a polypeptide chain reverses its overall direction, i.e., the chain folds back on itself by nearly 180°, and where no more than six amino acid residues are directly involved in forming the turn. Depending on the number of residues forming the turn, tight turns are further classified as δ-, γ-, β-, α-, and π-turns [1]. Among the tight turns, the β-turn is the most common. A β-turn involves four amino acid residues. The β-turns originally recognized by Venkatachalam [2] are stabilized by a hydrogen bond between the backbone CO(i) and the backbone NH(i + 3). However, Lewis et al. [9] found that 25% of β-turns are “open,” i.e., have no intra-turn hydrogen bond of the kind stipulated by Venkatachalam [2]. Open turns do not lend themselves to classification by dihedral angles. Therefore, the widely accepted definition of β-turns is: a β-turn comprises four consecutive residues where the distance between Cα(i) and Cα(i + 3) is less than 7 Å and the tetrapeptide chain is not in a helical conformation. The distance between the Cα atoms of the first and last residues of a tetrapeptide, i.e., Cα(i) and Cα(i + 3), is a key criterion common to all β-turns; the backbone dihedral angles of the inner residues i + 1 and i + 2 then define the different types of β-turns.
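The distance-based definition above can be illustrated with a minimal Python sketch. The function name, the input format (a list of Cα coordinates in Å and a per-residue helix flag, for example taken from a secondary structure assignment program) and the reading of "not in a helical conformation" as "not all four residues are helical" are assumptions made for illustration only.

```python
import math

def candidate_beta_turns(ca_coords, helical, cutoff=7.0):
    """Return start indices i of tetrapeptides i..i+3 that meet the criterion.

    ca_coords: list of (x, y, z) C-alpha coordinates in angstroms, one per residue.
    helical:   list of bools, True where the residue is assigned to a helix
               (hypothetical input; the source of the assignment is not specified).
    cutoff:    maximum C-alpha(i) to C-alpha(i+3) distance, 7 angstroms in the text.
    """
    turns = []
    for i in range(len(ca_coords) - 3):
        # Criterion 1: the first and last C-alpha atoms are closer than the cutoff.
        if math.dist(ca_coords[i], ca_coords[i + 3]) >= cutoff:
            continue
        # Criterion 2 (assumed interpretation): skip fully helical tetrapeptides.
        if all(helical[i:i + 4]):
            continue
        turns.append(i)
    return turns
```

For example, calling `candidate_beta_turns(coords, [False] * len(coords))` on a five-residue toy chain returns the start index of every tetrapeptide whose end-to-end Cα distance is below 7 Å.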

On average, about 25% of all protein residues are found in β-turns [5]. As one of the most common types of non-repetitive motifs in proteins, β-turns are of great significance for protein structure and function. Their biological importance is reflected in the following points. First, β-turns are four-residue chain reversals that help in the formation of higher-order structure [6]; a polypeptide chain cannot fold into a globular structure without them. Second, β-turns usually occur on the exposed surface of a protein and are therefore likely to be involved in molecular recognition processes and in interactions between receptors and substrates [1; 4]. Being at solvent-exposed surfaces, the residues that form β-turns tend to be hydrophilic. Third, β-turns play an important role in protein folding and stability [6]. Furthermore, the β-turn is a major secondary structural feature of many biologically active peptides. β-Turns form an integral component of the fundamental building block of anti-parallel β-sheets, a motif that is a good candidate for molecular recognition because it lies at solvent-exposed surfaces and whose formation is an important stage in protein folding [6]. Therefore, to improve fold recognition and the identification of structural motifs such as this building block, an accurate method for locating β-turns in a protein sequence is needed. Prediction of β-turns would thus be a small step toward predicting the three-dimensional structure of a protein from its amino acid sequence, and it would also help in identifying structural motifs such as β-hairpins. Finally, β-turns provide very useful information for defining template structures for the design of new molecules such as drugs, pesticides, and antigens.

A number of β-turn prediction methods have been developed; they can be divided into two categories: statistics-based and machine learning-based methods. The majority of statistics-based methods empirically employ “positional preference approaches” [7; 8; 9; 10; 11]. In the Chou–Fasman method [8], a set of probabilities is assigned to each residue, and the conformational parameters and positional frequencies are determined by calculating the relative frequency of each secondary structure. In the 1–4 and 2–3 correlation model [11], the coupling effects between the first and fourth residues and between the second and third residues are taken into account. In the sequence-coupled model developed by Chou [7] within the first-order Markov chain framework, the sequence correlation effect for an entire oligopeptide is considered. GORBTURN uses the positional frequencies and equivalent parameters [12] to remove potential helix- and strand-forming residues from the β-turn prediction [13]. As to machine learning-based methods, a neural network method, BTPRED, was developed by Shepherd et al. [14] to predict the location and type of β-turns in proteins. The prediction performance of these methods could not be compared objectively because they were developed on different data sets. Kaur and Raghava evaluated these methods and found that BTPRED was the most accurate among them [15].
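To make the idea of a positional preference approach concrete, the following Python sketch estimates, for each of the four turn positions, how often a residue type occurs there relative to its overall occurrence, and then scores a tetrapeptide by multiplying the four positional frequencies. The function names, the toy inputs and the choice to combine the frequencies by a simple product are assumptions for illustration; the sketch does not reproduce the published parameters or thresholds of any cited method.

```python
from collections import Counter

def positional_frequencies(turns, sequences):
    """Estimate f[pos][aa] for the four positions of a beta-turn.

    turns:     list of four-letter strings, the known turn tetrapeptides.
    sequences: list of full sequences the turns were taken from
               (so every residue seen in a turn also occurs in the totals).
    """
    totals = Counter("".join(sequences))
    at_position = [Counter(t[pos] for t in turns) for pos in range(4)]
    return [
        {aa: count / totals[aa] for aa, count in counts.items()}
        for counts in at_position
    ]

def turn_score(tetrapeptide, freqs):
    """Product of the positional frequencies of the four residues (assumed rule)."""
    score = 1.0
    for pos, aa in enumerate(tetrapeptide):
        # A residue never observed at this position contributes zero.
        score *= freqs[pos].get(aa, 0.0)
    return score
```

In use, every overlapping tetrapeptide of a query sequence would be scored with `turn_score`, and windows whose score exceeds a chosen cutoff would be reported as candidate turns; the cutoff is left open here.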

BetaTPred2, an improved neural network method, was developed by Kaur and Raghava [16]. It uses multiple sequence alignments as input instead of single amino acid sequences, which greatly improved prediction performance (Matthews correlation coefficient, MCC = 0.43). Kim [17] developed a k-nearest neighbor method combined with a filter that uses predicted protein secondary structure information.
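For reference, the Matthews correlation coefficient quoted above is computed from the counts of a binary confusion matrix as MCC = (TP·TN − FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)). A minimal Python sketch follows; the function name and the convention of returning 0 when the denominator vanishes are choices made here, not taken from the cited works.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    tp, tn, fp, fn: true positives, true negatives, false positives,
    false negatives (here: residues predicted as turn or non-turn).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention assumed here: undefined cases are reported as 0.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

The value ranges from −1 (total disagreement) through 0 (no better than random) to +1 (perfect prediction), which makes it better suited than plain accuracy to the unbalanced turn versus non-turn classes.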

The support vector machine (SVM) is an extremely successful learning method that usually outperforms other machine learning techniques such as artificial neural networks (ANNs) and nearest neighbor methods. In recent years, SVMs have performed well in diverse bioinformatics applications, including prediction of secondary structure [18; 19] and classification of protein quaternary structure [20]. In this work, we employ an SVM to predict β-turns in proteins, seeking helpful input feature vectors based only on information derived from the protein sequence. We compare the results with the β-turn prediction methods recently evaluated by Kaur and Raghava [15] and with other β-turn prediction methods that use the same data set.
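As an illustration of what a sequence-only input feature vector can look like, the sketch below encodes a residue and its neighbors as a fixed-length binary vector, the kind of input an SVM or neural network can consume. The window half-width, the 21-value slot layout (20 amino acids plus one padding flag) and the function name are assumptions for illustration; this is a commonly used encoding and not necessarily the feature set adopted in this work.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

def encode_window(sequence, center, half_width=3):
    """One-hot encode the window of residues centered on `center`.

    Each window position becomes 21 values: 20 amino-acid indicators plus
    one flag that marks positions falling outside the sequence.
    """
    vec = []
    for pos in range(center - half_width, center + half_width + 1):
        slot = [0] * 21
        if 0 <= pos < len(sequence):
            idx = AMINO_ACIDS.find(sequence[pos])
            if idx >= 0:
                slot[idx] = 1
            # Non-standard residue letters are left as an all-zero slot.
        else:
            slot[20] = 1  # padding beyond the chain termini
        vec.extend(slot)
    return vec
```

With the default half-width of 3, each residue is described by a vector of 7 × 21 = 147 values, and one such vector per residue, paired with a turn or non-turn label, would form the training set.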
