Chapter 1 Introduction

1.2 Related Works

Knowledge-based energy functions are generally derived from distributions of experimental structural data [16, 17]. A previous comparative study showed that knowledge-based functions usually perform better than physics-based ones [18]. This is part of the reason that most previous work on structure prediction has focused on knowledge-based energy functions [16, 17, 19-21]. Given a database of high-quality structural information, knowledge-based energy functions can often produce the desired results with far less computational overhead. However, there are a few exceptions. For example, Park et al. compared various knowledge-based and physics-based functions and found that the knowledge-based functions are not better than physics-based functions at ranking native structures [22]. In addition, Tobi et al. showed that it is not possible to find a pairwise knowledge-based potential with a resolution of 1 Å or better [23, 24].

Energy functions derived from experimental structure information are constrained by their underlying statistics, which means that their accuracy and applicability are intrinsically tied to the data source used for parameterization; if a particular data set overrepresents a certain class of structural properties (e.g., helical structures), the resulting energy function will reflect this statistical bias in its scoring. For simplicity and speed, knowledge-based energy functions usually use reduced representations. These energy functions may also contain pseudo-potentials that lack physical meaning.
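The dependence on the underlying statistics can be illustrated with an inverse Boltzmann construction, one common way of turning observed distributions into a potential. This is a minimal sketch only: the text does not state which construction the cited functions use, and the function name, the pseudocount, and the value of kT (about 0.593 kcal/mol near 298 K) are assumptions made for illustration.

```python
import math


def statistical_potential(observed_counts, reference_counts,
                          kT=0.593, pseudocount=1.0):
    """Per-bin energy E_i = -kT * ln(p_obs_i / p_ref_i).

    observed_counts: counts of a structural feature (e.g. a residue-pair
        distance bin) in the structure database.
    reference_counts: counts of the same feature in a reference state.
    The pseudocount (an assumption) avoids log(0) for empty bins.
    """
    n_bins = len(observed_counts)
    obs_total = sum(observed_counts) + pseudocount * n_bins
    ref_total = sum(reference_counts) + pseudocount * n_bins
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        p_obs = (obs + pseudocount) / obs_total
        p_ref = (ref + pseudocount) / ref_total
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies
```

Any feature that is overrepresented in the database relative to the reference state receives a favorable (negative) energy, which is how a bias in the data set carries over into the scoring.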

On the other hand, physics-based energy functions are based on physical mechanisms, and they do not have such inherent limitations when they are carefully parameterized. Because they are derived from ab initio quantum mechanical calculations based on the principles of physics alone, they do not have any intrinsic bias toward any particular structural properties. Physics-based approaches assume that the protein potential energy function can be broken down into bond-stretching, angle-bending, torsional, and non-bonded interaction terms. The parameters in these terms are then fitted to high-level ab initio quantum mechanical calculations and to small-molecule thermodynamic and spectroscopic data. The advantage of a physics-based energy function is the clear physical meaning of each individual term. Great effort has been invested in understanding the driving or dominant forces behind its discriminative ability.
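The decomposition into bonded and non-bonded terms can be sketched as follows. The functional forms (harmonic bonds and angles, a cosine torsion, and a Lennard-Jones plus Coulomb non-bonded term) are the conventional molecular mechanics choices and are assumed here for illustration; the text does not specify them, and all function and parameter names are hypothetical.

```python
import math


def bond_energy(r, r0, k_b):
    """Harmonic bond stretching around the equilibrium length r0."""
    return k_b * (r - r0) ** 2


def angle_energy(theta, theta0, k_theta):
    """Harmonic angle bending around the equilibrium angle theta0 (radians)."""
    return k_theta * (theta - theta0) ** 2


def torsion_energy(phi, k_phi, n, delta):
    """Periodic torsional term with multiplicity n and phase delta (radians)."""
    return k_phi * (1.0 + math.cos(n * phi - delta))


def nonbonded_energy(r, epsilon, r_min, q_i, q_j, coulomb_const=332.06):
    """Lennard-Jones 12-6 plus Coulomb interaction for one atom pair.

    coulomb_const is approximately 332 kcal*Angstrom/(mol*e^2); the exact
    value used by a given force field is an assumption here.
    """
    ratio6 = (r_min / r) ** 6
    lennard_jones = epsilon * (ratio6 ** 2 - 2.0 * ratio6)
    coulomb = coulomb_const * q_i * q_j / r
    return lennard_jones + coulomb


def total_energy(bonds, angles, torsions, pairs):
    """Sum the four classes of terms; each argument is a list of argument tuples."""
    return (
        sum(bond_energy(*b) for b in bonds)
        + sum(angle_energy(*a) for a in angles)
        + sum(torsion_energy(*t) for t in torsions)
        + sum(nonbonded_energy(*p) for p in pairs)
    )
```

Because every term is a separate function with its own parameters, the contribution of each physical interaction to the total can be inspected individually, which is the interpretability advantage described above.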

Despite their perceived advantages, physics-based energy functions have not been widely considered practical for fold recognition or protein structure prediction. This was mostly due to the high computational cost and the cumulative inaccuracies introduced in parameterizing the energy functions, compounded by the fact that most of the earlier energy functions were calibrated against rather sparse and often qualitative experimental data. Because of the continued improvement in computer speed and advances in energy function design, this situation has begun to change in recent years; physics-based energy functions are now showing signs of living up to their potential [25-33].

As more energy functions were developed, one problem that became apparent was the lack of a standardized benchmark for comparing the performance of energy functions parameterized using different properties and methodologies.

One of the earliest studies that resembled a benchmarking test for protein potentials was carried out by Novotny et al. [34], in which two proteins with the same number of residues but different folds were considered and the sequence of one was “threaded” onto the fold of the other. The resulting correct and incorrect models were then evaluated using the CHARMM potential. The conclusion of this study was frequently misinterpreted as evidence that modern molecular mechanics-based potentials were not accurate enough to discriminate between native and non-native folds.

In the spirit of the Novotny test, several groups have created decoy sets (collections of non-native or near-native conformations) as benchmarks for evaluating the usefulness of a new scoring function [22, 35-37]. These decoy sets provide an objective, common platform on which new energy functions can be evaluated. Furthermore, one may also view the decoys as the products of a previous step in a hierarchical structure prediction scheme; a high-quality scoring function could then be integrated as a filter to select the best candidate from among a set of low-resolution predictions of the native fold.
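A decoy-set evaluation of this kind can be sketched with two simple measures: the rank of the native structure among the decoys, and its Z-score relative to the decoy energy distribution. These measures are commonly used for this purpose but are assumptions here, since the text does not say which metrics the cited benchmarks report; the function names are hypothetical.

```python
import statistics


def native_rank(native_energy, decoy_energies):
    """Rank of the native structure; 1 means it has the lowest energy."""
    return 1 + sum(1 for e in decoy_energies if e < native_energy)


def native_z_score(native_energy, decoy_energies):
    """Number of standard deviations the native lies from the decoy mean.

    A more negative value means the native is better separated from
    the decoys.
    """
    mean = statistics.mean(decoy_energies)
    std = statistics.pstdev(decoy_energies)
    if std == 0.0:
        raise ValueError("decoy energies are all identical")
    return (native_energy - mean) / std


def best_candidate(energies_by_name):
    """Filter step: return the name of the lowest-energy candidate."""
    return min(energies_by_name, key=energies_by_name.get)
```

For example, a scoring function that assigns the native structure an energy of -100 and the decoys energies between -90 and -60 gives the native a rank of 1 and a negative Z-score.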

Another problem in developing a physics-based energy function is that determining the potential parameters is conceptually and practically difficult. Although it would seem the most deductive and logical approach, determining potential parameters solely from electronic structure calculations of small molecules does not necessarily give the best performance for modeling proteins in solvent. Instead of this bottom-up approach, we might ask whether we can infer physical forces from their consequences, that is, from the structures of proteins already in the structural database. Recently, structural genomics projects have started to produce thousands of 3D protein structures. If we can use this information to improve protein energy functions, the resulting functions would be practically powerful for many purposes and conceptually helpful for gaining insight into the physical principles of protein architecture.

In this thesis, we try to develop a new energy function that combines the advantages of both knowledge-based and physics-based energy functions while avoiding their disadvantages. We use simplified energy terms based on physical mechanisms to form our energy function. To optimize the parameters of our energy function for protein folding, we adopt a reduced optimization scheme in which the overall weight of each energy term is treated as that term's only adjustable parameter. For this purpose, we use an evolutionary algorithm, with well-developed decoy sets as the training set, to optimize the overall weights.
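The reduced optimization scheme can be sketched as a weighted sum of precomputed energy terms whose weights are tuned by a simple evolutionary loop. This is an illustration under stated assumptions, not the algorithm used in this thesis: the fitness measure (fraction of decoys scored above their native structure), the truncation selection, the Gaussian mutation, and all names and default values are hypothetical.

```python
import random


def weighted_energy(weights, terms):
    """Total energy as the weighted sum of the individual energy terms."""
    return sum(w * t for w, t in zip(weights, terms))


def fitness(weights, training_set):
    """Fraction of decoys whose energy is higher than that of their native.

    training_set: list of (native_terms, [decoy_terms, ...]) pairs, where
    each *_terms is a sequence of precomputed energy-term values.
    """
    wins = total = 0
    for native_terms, decoys in training_set:
        e_native = weighted_energy(weights, native_terms)
        for decoy_terms in decoys:
            total += 1
            if weighted_energy(weights, decoy_terms) > e_native:
                wins += 1
    return wins / total


def evolve_weights(training_set, n_terms, pop_size=20, generations=50,
                   sigma=0.1, seed=0):
    """Evolve non-negative term weights (assumed selection and mutation)."""
    rng = random.Random(seed)
    population = [[rng.uniform(0.0, 1.0) for _ in range(n_terms)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged (elitism).
        population.sort(key=lambda w: fitness(w, training_set), reverse=True)
        parents = population[: pop_size // 2]
        # Refill the population with Gaussian-mutated copies of parents.
        children = [
            [max(0.0, w + rng.gauss(0.0, sigma)) for w in rng.choice(parents)]
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=lambda w: fitness(w, training_set))
```

Optimizing only one weight per energy term keeps the search space small, equal in dimension to the number of terms, which is what makes a population-based search over decoy sets affordable.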
