國 立 交 通 大 學
應用化學系分子科學碩士班
碩 士 論 文
由耦合簇理論計算建立用於
電荷自洽密度泛函緊束縛法參數化程序的位能函式庫
Constructing A Library of Potential Energy
Functions from Coupled Cluster Calculations to Be
Used in the SCC-DFTB Parameterization
研 究 生:楊博宇
指導教授:魏恆理 博士
由耦合簇理論計算建立用於
電荷自洽密度泛函緊束縛法參數化程序的位能函式庫
Constructing A Library of Potential Energy Functions from Coupled Cluster Calculations to Be Used in the SCC-DFTB Parameterization
研究生:楊博宇 Student:Po-Yu Yang 指導教授:魏恆理 博士 Advisor:Dr. Henryk Witek
國 立 交 通 大 學 應用化學系分子科學碩士班
碩 士 論 文
A Thesis
Submitted to M. S. Program, Institute of Molecular Science, Department of Applied Chemistry
National Chiao Tung University In partial Fulfillment of the Requirements
For the Degree of Master
in
Molecular Science August 2011
Hsinchu, Taiwan, Republic of China
i
由耦合簇理論計算建立用於
電荷自洽密度泛函緊束縛法參數化程序的位能函式庫
學生:楊博宇 指導教授:魏恆理 博士 國立交通大學應用化學系分子科學碩士班中文摘要
電荷自洽密度泛函緊束縛法(以下簡稱SCC-DFTB)是量子化學領域中一種強大的半 經驗方法,可應用於大尺度化學分子系統的計算。但是SCC-DFTB仍然存在一些應用上 的限制,第一個是以目前的SCC-DFTB參數組無法同時得到精確的能量、平衡結構和振 動頻率,這是由於目前該方法在參數化程序中使用太少的參考位能面來擬合參數;第二 個限制則是來自目前的SCC-DFTB參數組只能應用在少部分特定元素的計算。為了克服 這些限制,在參數化的程序中必須使用多樣化的參考位能面來擬合各種元素間不同的鍵 結狀態。在我們快速的參數化程序中,大多數高階的精確量子化學方法都因為太過於昂 貴而不適用於產生這些參考位能面,因此,我們建構出一種類力場的位能函數,利用四 階泰勒近似來展開這些位能函數。藉由精確的耦合簇理論計算,我們的位能函數不僅提 供精確的位能面,並且節省了許多計算的時間。我們建構的位能函式庫總共包含了74種 常見化學分子的位能函數,這些分子不但包含了週期表中前三週期的元素,並且存在這 些元素之間最典型的鍵結狀態。在建構位能函式庫的過程中,我們設計了驗證程序來評 估其精確性;在擬合函數的程序中,我們將每個位能函數的均方根誤差值控制在小於10-4 a.u.;另外我們也將這些位能函數實作到可被Gaussian09程式調用的外部程式當中,並進 行結構優化和振動頻率計算。對於所有選定的化學分子,我們的位能函數可以得到和耦 合簇理論計算(CCSD(T)/cc-pVTZ)幾乎相同的平衡結構和振動頻率,而且振動頻率的誤 差皆小於10cm-1。
ii
Constructing A Library of Potential Energy Functions from Coupled
Cluster Calculations to Be Used in the SCC-DFTB Parameterization
Student: Po-Yu Yang Advisor: Dr. Henryk Witek
M. S. Program,
Institute of Molecular Science Department of Applied Chemistry
Abstract
The SCC-DFTB method is a powerful semi-empirical method of quantum chemistry,
which is able to treat huge molecular systems. However, SCC-DFTB has certain limitations,
which diminish its strength for particular chemical applications. The first limitation concerns
the fact that accurate energies, equilibrium geometries, and vibrational frequencies cannot be
obtained with the existing parameter sets simultaneously. In the current parameterization of
SCC-DFTB, insufficient number of reference potential energy surfaces were used for
determination of the parameters. The second limitation comes from the fact that the current
SCC-DFTB parameter sets are available only for few selected elements. To overcome those
limitations, numerous reference potential energy surfaces representing various bonding
mechanisms between atoms are required in the parameterization process. For generating
reference potential energy surfaces usable in a fast parameterization process, most of
iii
we construct a collection of force-field like potential energy functions on the 4-th order Taylor
approximation to expand those functions. Generated by accurate Coupled cluster calculations,
our potential energy function not only provide us with accurate potential energy surfaces but
also allow for substantial savings in the computational time. The final library of potential
energy functions were determined for 74 common molecules containing the elements of the
first, second, and third row of the periodic table of elements. The list of these molecules
attempts to represent the most typical bonding situations between these elements. We
conclude the derivation of the library of potential energy functions by presenting a
verification algorithm designed to validate the accuracy of our library. In the fitting process,
we are able to control the RMS error to be less than 10-4 a.u. in all the studied cases. We have
also implemented each of potential energy function as an external program, which can be
invoked from the Gaussian09 program, for performing geometry optimization and calculating
vibrational frequencies. In all of studied cases, our library of potential energy functions can
reproduce the equilibrium geometry and vibrational frequencies giving results almost
identical with those from the CCSD(T)/cc-pVTZ calculation. For all the calculated vibrational
iv
Acknowledgment
First, I want to express my deep gratitude to my advisor Prof. Henryk Witek. In scientific
advices, he taught me abundant knowledge of the quantum chemistry and mathematical. He
always has the patience to answer my questions and gave many useful suggestions when I
was confused. I will be grateful with him in my whole life!
Next, I want to thank Prof. Chu and Prof. Shiegeto for taking time to read my thesis and
giving me many valuable comments.
And I want to thank my lab mates: Chien-pin, Cristopher, Chun-hao, Amy and Shu-wei.
Thank all of you gave me a lot of help in my master studies. I especially appreciate the help
from Chien-pin, who taught me how to write the program and help me solve variable
computational troubles. I also want to thank Shu-wei, he shoulder most of the work for
laboratory.
Also, I want to thank many administrative staffs of NCTU and teacher Lai in resources
classroom who gave essential support for me. Thus, I can focus on my master studies.
Last but not the least, I want to express my gratitude to my family, especially my Mom.
Thank her take care of my life in the study period in Hsinchu. Because of her, I could pay all my
attention on school life without worrying about the difficulties in daily life. And thank Dad and
v
Content
中文摘要 ... i Abstract ... ii Acknowledgement ... iv List of Figures ... vi Chapter 1 Introduction ... 1Chapter 2 Definition of the Potential Energy Function ... 4
2.1 The Force-field-like Potential Energy Function ... 4
2.2 Taylor Series ... 4
2.3 The Types of Internal Coordinates ... 7
2.4 Symmetry Adaptation ... 9
2.5 Generation of the Data Points ... 11
Chapter 3 Coupled-Cluster Theory ... 14
Chapter 4 Verification Process and Other Techniques ... 16
4.1 Verification process ... 16
4.2 Solving the Overdetermined System ofLinear Equations ... 16
4.3 Numerical Energy Gradient ... 18
4.4 Vibrational Frequency Analysis ... 19
Chapter 5 Results and Discussions ... 23
Chapter 6 The Parameterization of the SCC-DFTB Repulsive Potentials ... 26
Reference ... 29
Appendix A ... 31
vi
List of Figures
Figure 2.1 The internal coordinates in H2, H2O and NH3 7
Figure 2.2 The internal coordinates in CH4 and C2H4 8
Figure 2.3 Do symmetry operations on water 10
Figure 2.4 The z-matrix input for water molecule 11
Figure 2.5 The dihedral angle in four atoms molecular system 12
Figure 2.6 The z-matrix input of methane molecule 12
r
1
Chapter 1
Introduction
In recent years, owing to dynamic development of computational science and quantum chemistry, molecular simulations became a powerful and useful tool used by many chemists
in their research. There exist many quantum chemical methods, which can be applied
successfully to various chemical problems. Most of these methods can be classified as
post-Hartree-Fock techniques or various variants of density functional theory (DFT).
However, if one wants to generate a potential energy surface, determine vibrational
frequencies, or obtain other physicochemical information for large molecules, those accurate
methods are probably too expensive to be used. Therefore, another class of somewhat less
accurate methods commonly known as semi-empirical methods was developed for huge
chemical systems.
Self-consistent charge density-functional tight-binding (SCC-DFTB) method(1) is one of
such powerful semi-empirical methods. It was primarily developed as an approximation of
DFT(2), (3), applicable to very large molecular systems appearing in either physical or chemical
considerations. SCC-DFTB is based on tight-binding (TB) theory(4) and on the second-order
2
to treat huge molecular systems with more than 1000 atoms. Its development can be
considered as an important contribution to computational chemistry and solid-state material
science.
However, SCC-DFTB has certain limitations, which diminish its strength for particular
chemical applications; removing some of these limitations is one of the most important
directions of development of the SCC-DFTB method. The first limitation concerns the fact
that the accurate energies, equilibrium geometries, and vibrational frequencies cannot be
obtained with SCC-DFTB parameter sets simultaneously.(5) This is related to the fact that
SCC-DFTB does not represent the potentials between two elements in a very accurate fashion.
In the current parameterization of SCC-DFTB, insufficient reference potential energy surfaces
to be used for constructing the parameters, which is the primary reason for the current
situation. The second limitation comes from the fact that currently the SCC-DFTB parameter
sets are available only for a set of few selected elements.
To overcome those limitations, we decided to generate numerous reference potential
energy surfaces required in the parameterization process. In order to keep a high level of
accuracy in the reference potential energy surfaces, high level quantum chemical calculations
are needed. Most of the available techniques are too expensive to be employed in a fast
parameterization process. To solve this problem, we construct potential energy functions by
3
required potential energy surface with similar degree of accuracy but in much shorter time.
In order to represent most typical bonding situations between the elements of the first,
second, and third row of the periodic table of elements, we construct a library of the potential
energy functions for 73 common molecules containing those elements. For generating each
potential energy function, we perform a series of accurate quantum chemical calculations
using coupled-cluster theory. The accuracy of each potential energy function is verified via
determination of equilibrium geometries and harmonic frequencies.
This chapter contains a general introduction to the topics studied in this Thesis. Details
concerning our choice of potential energy function are shown in Chapter 2. A short outline of
coupled-cluster theory is briefly discussed in Chapter 3. In Chapter 4, we describe the
verification process and discuss the simulation techniques used in our work. In Chapter 5, we
present the results and give a discussion of our work. In Chapter 6 we suggest how to apply
4
Chapter 2
Definition of the Potential Energy Function
2.1 The Force-field-like Potential Energy Function
The force-field functions used routinely in molecular mechanics methods provide a
relationship between energy and the molecular geometry close to the equilibrium.(6) Usually,
the force-field functions are derived from experimental data or accurate quantum mechanical
calculations. Following the previous description, we want to generate a potential energy
function for each molecule in form of a force-field-like function. By fitting the parameters to
accurate quantum mechanical calculations, the potential energy function can reproduce
accurate energies for a wide range of molecular geometries.
2.2 Taylor Series
According to Taylor theorem(7), all sufficiently smooth function can locally be
approximated by polynomials. Therefore, we use Taylor series to expand our potential energy functions. The Taylor expansion of a real function f x that is infinitely many times ( )
differentiable in a neighborhood of x can be expressed in form of a power series e (3) 2 3 ''( ) ( ) ( ) ( ) '( )( ) ( ) ( ) 2! 3! e e e e e e e f x f x f x f x f x xx xx xx (2.1)
5
In other words, this power series can describe the function f around a given point x . e
Replacing the formal derivatives terms ( ) ( ) ! i e f x i by adjustable parameters k i 2 3 0 1 2 3 0 ( ) ( ) ( ) ( ) ( ) e e e i i e i f x k k x x k x x k x x k x x
(2.2)yields a formally equivalent series that can be used to describe the behavior of f around
e
x . We further truncate the infinite Taylor series to a finite form, which is usually referred to
as the n-th order Taylor approximation
0 ( ) ( ) n i i e i f x k x x
(2.3) Though higher-order approximations may provide a better description of reality, they can alsoincrease the computational effort. Thus, in our work, the potential energy function is
expressed in a form of the 4th-order Taylor function, which combines reasonable accuracy
with affordable computational load.
To construct our potential energy functions, we expand the energy in a Taylor series
around the equilibrium geometry. Thus, the point x in Eq. (2.3) can be regarded as the e
equilibrium geometry. The point x is an arbitrary geometry located around the equilibrium
geometry. For representing the geometric variation of the arbitrary geometry x , the geometries are expressed by some internal coordinates. The point x and x are e
transformed into internal coordinates (ae ,be ,ce ...) and (a b c , , ...). For example, the 4-th
6 as ) )( ( ) )( ( ) )( ( ) ( ) ( ) ( ) ( ) ( ) ( ) , , ( 011 101 110 2 002 2 020 2 200 001 010 100 000 e e e e e e e e e e e e c c b b k c c a a k b b a a k c c k b b k a a k c c k b b k a a k k c b a E 4 , , 0 ( , , ) ( e) ( e) ( e) E a b c k a a b b c c
(2.4) In the fitting process, we determine the parameters k using the following steps. First,we perform CCSD(T) geometry optimization to determine the equilibrium geometry and the
corresponding energy. Second, we generate a large number of geometries around the
equilibrium geometry, and compute the corresponding CCSD(T) single point energies. The
CCSD(T) calculations, using the cc-pVTZ basis sets, are performed with the MOLPRO2k9
program.(8) Third, we transfer the equilibrium geometry and all the non-equilibrium
geometries into their corresponding internal coordinates and substitute them into the potential
energy function. For each of the non-equilibrium geometries, we can obtain one equation
from the potential energy function. If the number of equations is larger than the number of
unknown geometric parameters used for constructing the potential energy function, the
resultingoverdetermined system of linear equations may be solved by the QR decomposition
7
2.3 The Types of Internal Coordinates
Following the previous description, the internal coordinates in potential energy functions
are used to represent the geometry variation. In order to represent the molecular potential
energy surfaces more accurately by our functions, the representation of molecular motions are
also important for internal coordinates.
For small molecules, we can use bond lengths and bond angles as internal coordinates.
Below, we discuss H2, H2O and NH3 as examples of internal coordinates.
Figure 2.1
The internal coordinates in H2, H2O and NH3
In the H2 molecule, one internal coordinate is sufficient to represent the H-H stretching: the
bond length between the two hydrogens. In the H2O molecule, there are three internal
coordinates. Two of them are the O-H bond lengths, which correspond to the O-H stretching.
The third one is the angle formed between two O-H bonds, which represent the bending
between the two O-H bonds. In the NH3 molecule, there are six internal coordinates. Three of
them are the N-H bond lengths, which correspond to the N-H stretching. Other three are the
angles formed between three N-H bonds, which represent the bending and wagging of those
N-H bonds. r r1 r2 a r1 r3 r2 a1 a3 a2
8
When the size of a molecule becomes larger, the interrelation between bond lengths and
bond angles may become complicated. Therefore, in the molecular systems which have more
than five atoms, we set all the interatomic distances to be the internal coordinates. Below, we
discuss methane and ethane as examples of internal coordinates.
Figure 2.2
The internal coordinates in CH4 and C2H4
For methane, there are ten internal coordinates in total, which are separated into two types.
One type is the C-H bond lengths (red lines), which correspond to the C-H stretching. And
another is the distances between two hydrogens (green lines) which represent the bending,
rocking, wagging or twisting between the C-H bonds. For ethene, there are fifteen internal
coordinates in total, which are separated into three types. The first type is the C-C bond length
(red line), which represent the C-C stretching. The second type is the distances between any
carbon and hydrogen (green lines). And the third type is the distances between any two
hydrogens (blue lines). Using these three types of internal coordinates, we can represent all
the motions of those C-H bonds successfully.
9
any two atoms as the internal coordinates, the number of coordinates and the size of potential
energy functions may become too large and hard to determine the parameters. In order to
balance the accuracy and computational load, some of the distances are ignored. The number
of internal coordinates for each molecule is shown in Appendix A.
2.4 Symmetry Adaptation
The examples given in last section, suggest that there are symmetry links between some of internal coordinates. The coefficients of those symmetry-related internal coordinates in the
potential energy functions are identical. If we can figure out those symmetry-relations
between the internal coordinates and organize them together in the potential energy function,
then the number of parameters will be greatly reduced. This step will speed up the process of
QR decomposition.
In this work, we use symmetry operation to determine the symmetry-adapted internal
coordinates for each molecule. First, we check the point group of a molecule, and construct
the symmetry operations for this molecule. The operation may permutate some of the internal
coordinates, which allows for separating the internal coordinates into many different
symmetry classes. The water molecule has the “C2v” symmetry, with symmetry operations: E,
C2, σv’ and σv”. The operator E and σv’ can be ignored because they do not interchange any
10
From Figure 2.3, we observe the permutation between the internal coordinate r1 and r2.
Therefore, we can determine r1 and r2 are the symmetry-equivalent internal coordinates. The
original potential energy function of the water molecule is
) )( ( ) )( ( ) )( ( ) ( ) ( ) ( ) ( ) ( ) ( ) , , ( 2 2 011 1 1 101 2 2 1 1 110 2 002 2 1 1 020 2 1 1 200 001 2 2 010 1 1 100 000 2 1 e e e e e e e e e e e e a a r r k a a r r k r r r r k a a k r r k r r k a a k r r k r r k k a r r E (2.5) After collecting the symmetry-equivalent internal coordinates together, the Eq. (2.5) can be rewritten as
) )( ( ) )( ( ) )( ( ) ( ) ( ) ( ) ( ) ( ) ( ) , , ( 2 2 1 1 7 2 2 1 1 6 2 5 2 1 1 2 1 1 4 3 2 2 1 1 2 1 2 1 e e e e e e e e e e e e a a r r a a r r k r r r r k a a k r r r r k a a k r r r r k k a r r E (2.6) This process is called symmetry adaptation. It can not only reduce the computational load inthe fitting process, but it can also increase the evaluation speed of the potential energy
functions, especially for large and high-symmetry molecular system. For the water molecule,
the number of parameters is reduced from 35 to 22; for cyclopropane, the number of
r1 r2 C2 r2 r1 σv(yz) a a Figure 2.3 Do symmetry operations on water
11
O
H 1 OH1
H 1 OH2 2 HOH
parameters is reduced from 91390 to 8986. The number of parameters before and after the
symmetry adaptation is shown in Appendix A for each molecule.
2.5 Generation of the Data Points
In the fitting process, the quality and quantity of the data points in solving the
overdetermined system of linear equations are both important. If the geometries are far from
the equilibrium geometry, our potential energy functions may not generate an accurate
potential energy surface. Therefore, we use z-matrix to generate the geometries of data points,
since those geometries can be easily controlled.
The z-matrix is a way to represent a system built of atoms. It provides a description of
each atom in a molecule in terms of its atomic number, a bond length, a bond angle and a
dihedral angle. To describe a molecular system with less than four atoms, we can only use
bond lengths and bond angles. For example for water molecule,
Figure 2.4
The z-matrix input for water molecule.
where the OH1 and OH2 are the O-H bond lengths and the HOH is the angle between two
O-H bonds. In this case, the z-matrix is totally identical with the definition of our internal
OH1 OH2
12 C H 1 R1 H 1 R2 2 A1 H 1 R3 2 A2 3 1 H 1 R4 2 A3 3 2 coordinates.
In the molecular system with more than four atoms, the new term is introduced into the
z-matrix, which is called dihedral angle. In Figure 2.5, we show the definition of dihedral
angle , defined as the angle between two planes, plane 2-1-3 and plane 1-3-4.
Figure 2.5
The dihedral angle in four atoms molecular system.
The z-matrix input for methane shown in Figure 2.6 should contain four bond lengths (R1, R2,
R3, R4), three bond angles (A1, A2, A3) and 2 dihedral angles (1, 2). The z-matrix input of
methane can be shown as:
Figure 2.6
The z-matrix input of methane molecule.
1 2 r2 r3 r1 r4 ang1 ang3 ang2
13
By using the z-matrix, we can use a random generator to produce easily the data points.
In our work, the bond lengths are distributed in the range of 10% from the equilibrium
14
Chapter 3
Coupled-Cluster Theory
Coupled-cluster (CC) theory(9),(10) is a post-Hartree-Fock technique, which constructs a
multi-electron wavefunction using the exponential cluster operator applied to the
Hartree-Fock determinant HF . Over the past years, this method has developed as perhaps
the most reliable computational method for prediction of molecular properties.
Coupled-cluster theory constructs an approximate solution to the time-independent
Schrödinger equation H E (3.1) where H
is the Hamiltonian of the system and the CC wavefunction is written in an exponential form T HF
e
(3.2) The exponential term in Eq. (3.2) can be expanded as
! 3 ! 2 1 ! 3 2 0 T T T N T e N N T (3.3) In Eq. (3.3), T is a cluster operator, which includes all possible excitation operators.
n
T
T
T
T
T
1 2 3
(3.4) where 1 T
correspond to all single excitations, 2
15
so on. In the formalism of second quantization these excitation operators are conveniently
expressed as i k ik k i
a
a
c
T
1 (3.12) j i l k l k j i kl ija
a
a
a
c
T
, , , 24
1
(3.13)In the above formula, a and a are denoted as creation and annihilation operators
respectively and ,i j stand for occupied and k l for unoccupied orbitals. ,
The exponential form of cluster operator usually recovers more correlation energy than
configuration interaction (CI), and it also guarantee the size extensively of the solution, but
makes the computations very expensive. Thus, we usually reduce the number of operators in
T
. For example, if we only keep 1
T
and 2
T
in
T
then the reduced technique is called theCCSD method. If we only keep 2
T
then it is called the CCD method. One of high accurate reduced techniques is called CCSD(T), which is applied in our work. In the CCSD(T), thecluster operator not only contains 1
T
and 2
T
, but also includes the contribution from 3
T
in perturbative fashion.16
Chapter 4
Verification Process and
Other Techniques
4.1 Verification process
For verifying the accuracy of our potential energy functions, we have implemented each of them as an external FORTRAN program, which can be invoked from the Gaussian09
program(11) that is used as a convenient external tool for performing the geometry
optimization and calculating the vibrational frequencies. By comparing the equilibrium
geometries and vibrational frequencies with the results obtained with CCSD(T)/cc-pVTZ, we
get a direct evidence supporting the accuracy of our potential energy functions. The
verification process employs numerical energy gradient for optimization and double
numerical hessian for the vibrational analysis. The details are discussed in the following
sections.
4.2 Solving the Overdetermined System of Linear Equations
Following the description in Section 2.1, we introduce the QR decomposition technique
17
for the potential energy function, which depends on n unknown fit parameters, and we get a
system of m linear equations, which can be expressed in matrix formalism as
) ,..., 2 , 1 ( 1 m i k X E n j j ij i
(4.1)where Xij is a matrix of powers of the displacements of the internal coordinates, E is a i
vector of the corresponding CCSD(T) energies, and kj denotes a vector of the unknown fit
parameters. Next, we transform those m linear equations into a matrix form,
XK
E
(4.2) where 1 11 12 1 1 2 21 22 2 2 1 2 , , n n m m m mn m E X X X k E X X X k E X K E X X X k (4.3)The least squares approach for solving this problem is based on the concept of
minimization of the sum of squares of "errors" between the right- and left-hand sides of
Eq. (4.2), that is to find the minimum of the function 2
E
XK
(4.4) For finding the best solution K, I have written a program in the FORTRAN languageinvolving a routine called DGELS from Linear Algebra PACKage (LAPACK) library(12). The
DGELS routine uses QR decomposition of matrix X to solve an overdetermined system of
18
errors from the fitting process are shown in the Appendix A.
4.3 Numerical Energy Gradient
For determination of the energy gradient in the geometry optimization process, we apply
numerical differentiation in our program. To ensure good numerical accuracy of calculations,
the five-points stencil formulas are employed for each coordinate dimension. Assume that the
spacing between points in the grid is h, the one-dimensional five-point stencil of a point x
is {x2h, x h , x , xh, x2h}. The first derivative of a function f at the point x can be obtained by writing out the function f at the remaining four points of the stencil as a
Taylor series: ) ( ) ( ) ( ! 3 ) ( ! 2 ) ( ) ( ) ( 3 1 4 3 3 2 2 2 h O x x x f h x x f h x x f h x f h x f (4.6) ) ( ) ( ! 3 ) ( ! 2 ) ( ) ( ) ( 3 1 4 3 3 2 2 2 h O x x f h x x f h x x f h x f h x f (4.7) ) ( ) ( ! 3 8 ) ( ! 2 4 ) ( 2 ) ( ) 2 ( 3 2 4 3 3 2 2 2 h O x x f h x x f h x x f h x f h x f (4.8) ) ( ) ( ! 3 8 ) ( ! 2 4 ) ( 2 ) ( ) 2 ( 3 2 4 3 3 2 2 2 h O x x f h x x f h x x f h x f h x f (4.9)
Evaluating Eq. (4.6) – Eq. (4.7) and Eq. (4.8) – Eq. (4.9) gives us
) ( ) ( 3 ) ( 2 ) ( ) ( 3 1 4 3 h O x x f h x x f h h x f h x f (4.10) ) ( ) ( 3 8 ) ( 4 ) 2 ( ) 2 ( 3 2 4 3 h O x x f h x x f h h x f h x f (4.11)
We evaluate 8Eq. (4.10) – Eq (4.11) to eliminate the 3 3 ( ) f x x
term, which gives us
) ( ) ( 12 ) 2 ( ) 2 ( ) ( 8 ) ( 8 4 h O x x f h h x f h x f h x f h x f (4.12)
19 Then we get x x f ( ) as h h x f h x f h x f h x f h O h h x f h x f h x f h x f x x f 12 ) 2 ( ) 2 ( ) ( 8 ) ( 8 ) ( 12 ) 2 ( ) 2 ( ) ( 8 ) ( 8 ) ( 4 (4.13)
The magnitude of the error in this approximation is O h( 4), which evaluates to 10-12 u with
the choice of h of 0.001 a.u. in our program.
4.4 Vibrational Frequency Analysis
In quantum mechanics, the molecular motion in a normal vibration can be described as a
kind of simple harmonic motion. According to Hook’s Law and Newton’s Law, the
vibrational frequency v in harmonic oscillator can be expanded as
m k v 2 1 (4.14) where k is the force constant and m is the mass. In molecular mechanics, the force
constant can be calculated from the Hessian matrix, which is the second partial derivation of
the total energy E with respect to the Cartesian displacements of the atoms. For a molecular
system, containing N atoms, the force constants matrix F can be expand as a 3N3N
20 2 2 2 2 2 1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 1 2 2 2 2 2 2 1 1 1 1 1 1 2 2 2 2 2 1 1 1 N N ij i j N N N N N E E E E x x y x z x z E E E E y x y y z y z E F E E E E z x z y z z z E E E E z x z y z z z (4.15)
Each element of the hessian matrix can be evaluated by numerical differentiation with a
displacement h. There are two kinds of elements in the hessian matrix. The first type,
2 2 ) ( x x f
, is located on the diagonal of the matrix. Those elements can be determined as
follows:
Evaluating Eq. (4.6) + Eq. (4.7) gives us
) ( ) ( ) ( 2 ) ( ) ( 2 1 4 2 2 h O x x f h x f h x f h x f (4.16) The element 2 2 ) ( x x f can be determined as 2 4 1 2 2 2 ) ( 2 ) ( ) ( ) ( ) ( 2 ) ( ) ( ) ( h x f h x f h x f h O h x f h x f h x f x x f (4.17)
The second type has a two-dimensional form
y x y x f 2 ( , )
. It can be obtained by writing out the
function f at the six points of the two-dimensional stencil as a Taylor series:
) ( ) , ( ! 2 1 ) , ( ) , ( ) , ( 2 1 3 2 2 h O x y x f h x y x f h y x f y h x f (4.18) ) ( ) , ( ! 2 1 ) , ( ) , ( ) , ( 2 1 3 2 2 h O x y x f h x y x f h y x f y h x f (4.19) ) ( ) , ( ! 2 1 ) , ( ) , ( ) , ( 2 3 2 2 2 h O y y x f h y y x f h y x f h y x f (4.20)
21 ) ( ) , ( ! 2 1 ) , ( ) , ( ) , ( 2 2 3 2 2 h O y y x f h y y x f h y x f h y x f (4.21) ) ( ) , ( ) , ( ! 2 1 ) , ( ! 2 1 ) , ( ) , ( ) , ( ) , ( 3 3 2 2 2 2 2 2 2 2 h O y x y x f h y y x f h x y x f h y y x f h x y x f h y x f h y h x f (4.22) ) ( ) , ( ) , ( ! 2 1 ) , ( ! 2 1 ) , ( ) , ( ) , ( ) , ( 3 3 2 2 2 2 2 2 2 2 h O y x y x f h y y x f h x y x f h y y x f h x y x f h y x f h y h x f (4.23)
Evaluating Eq. (4.18) + Eq. (4.19) and ignore the error function O
) , ( 2 ) , ( ) , ( ) , ( 2 2 2 y x f y h x f y h x f x y x f h (4.24) Evaluating Eq. (4.20) + Eq. (4.21) and ignore the error function O
) , ( 2 ) , ( ) , ( ) , ( 2 2 2 y x f h y x f h y x f y y x f h (4.25)
Evaluating Eq. (4.22) + Eq. (4.23) and also ignore the error function O
y x y x f h y y x f h x y x f h y x f h y h x f h y h x f ) , ( 2 ) , ( ) , ( ) , ( 2 ) , ( ) , ( 2 2 2 2 2 2 2 2 (4.26)
From Eq. (4.24) and Eq. (4.25), we can rewrite Eq. (4.26) as
y x y x f h h y x f h y x f y h x f y h x f y x f h y h x f h y h x f ) , ( 2 ) , ( ) , ( ) , ( ) , ( ) , ( 2 ) , ( ) , ( 2 2 (4.27)
Finally, the element
y x y x f 2 ( , ) can be determined as
) , ( ) , ( ) , ( ) , ( ) , ( 2 ) , ( ) , ( 2 1 ) , ( 2 2 h y x f h y x f y h x f y h x f y x f h y h x f h y h x f h y x y x f (4.28)To determine the vibrational frequencies in Eq. (4.14), the force constants matrix has been
22 j i ij cart c w m m m F F . .. , (4.29)
Diagonalization of the mass weighted force constant Fm w c. . . yields a set of 3N eigenvectors
and 3N eigenvalues. After separating the translation and rotation modes, the roots of the
23
Chapter 5
Results and Discussions
For constructing available library in SCC-DFTB parameterization process, I have fitted the potential energy functions for 73 common molecules, representing the most common
bonding situations between the elements of the first-, second- and third-rows of the periodic
table. Our potential energy functions can reproduce the potential energy as accurate as
CCSD(T)/cc-pVTZ. Our function can be easily invoked by the SCC-DFTB parameterization
program. The input for the potential energy routine is the molecular geometry; the routine
returns the corresponding energy.
For evaluating the accuracy of our potential energy functions, I first evaluate the RMS
error of the fitting process. For the cases with large errors, I try to put more data points or
remove some data points with large displacements from the equilibrium. If the error problem
still cannot be solved, improving the definition of internal coordinates sometimes is effective.
In order to represent the variation of the geometry or the motions of atoms in the molecule
accurately, I change the types of internal coordinate or use a larger number of coordinates.
Finally in all the studied cases we are able to control the RMS error to be less than 10-4 a.u.
24
A potential energy function characterized by small RMS error does not necessarily
generate an accurate potential energy surface. Therefore, following the previous description in
Section 4.1, we also verify the accuracy of potential energy function by performing geometry
optimization and vibrational frequency analysis. Most of the potential energy functions can
reproduce the equilibrium geometry and vibrational frequency giving results almost identical
with those from the CCSD(T)/cc-pVTZ calculation. In a few cases, some vibrational modes
have larger errors. We have figured out an effective way to overcome this problem. Following
the eigenvector of such a vibrational mode, we generate an additional set of geometries. After
we put those extra data points into the potential energy linear fitting procedure, we can obtain
acceptable vibrational frequencies for each molecule. For all vibrational frequencies, we
request the error to be smaller than 10 cm-1. The detailed information about the geometries
and vibrational frequencies is shown in Appendix B.
Though our scheme could be successfully applied for generating potential energy
functions for many different molecules, there are still 5 molecules, for which the potential
energy function could not be determined in a satisfactory fashion. For benzene, furane and
pyrrole, our potential energy functions produce good equilibrium geometries but the accurate
vibrational frequencies have substantial errors. For acetone and cyclopropane, the geometry
optimization process has failed.
25
last chapter, we briefly describe the future work about applying our references potential
26
Chapter 6
The Parameterization of
the SCC-DFTB Repulsive Potentials
In the SCC-DFTB method, the total energy function is separated into an electronic part
and a repulsive part.1 The electronic part is described by a Hamiltonian, which is derived from
DFT. SCC-DFTB uses as the reference density the superposition of neutral atomic densities
together with a minimal basis of valence atomic wave functions. By applying the variational
principle, the electronic energy can be determined by solving a generalized eigenvalue
problem set up by the Hamiltonian and the overlap SCC-DFTB matrices. It is important to
mention that SCC-DFTB uses the frozen-core approximation to reduce the computational
effort. As a result, SCC-DFTB considers only the valence atomic orbitals (AOs) in the
calculation. The remaining part of energy: the core-core repulsion, nuclear-nuclear repulsion
and other energy contributions not included in the electronic part are accounted for in the
repulsive part.(13) The repulsive energy is approximated in SCC-DFTB as a sum of
short-range pair potentials between all atomic pairs.
In the SCC-DFTB calculations, the two-center Hamiltonian and overlap integrals are
27
are expressed as a collection of spline functions for each elements pair.(14) Looking up the
integrals in the pre-computed tables makes the SCC-DFTB method fast and enables one to
apply it to huge chemical systems.
In SCC-DFTB, the repulsive potential is represented on a grid defined by a set of
interatomic distances {r }; i I = [0, 0 r ), 1 I1 = [r , 1 r2)…, I = [n rn,rn1), and In1 =
[rn1, r).
For the first interval I , the repulsive potential has an exponential form 0
) exp( ) ( 0 r r S (6.1) For other intervals, the repulsive potentials is represented by a cubic spline
3 , 0 ( ) ( )k i k i i k S r a r r
(6.2) In the original parameterization procedure, the repulsive potentials are constructed using
as a reference set of data the difference between the electronic SCC-DFTB energy and the
DFT energy. The free parameters defining the spline functions are determined from a linear
regression fit. For some chemical systems, accurate energies and vibrational frequencies
cannot be obtained with the existing SCC-DFTB parameters. This problem may relate to the
fact that the linear regression fitting is not a good way to solve those nonlinear equations. In
addition, insufficient number of molecules were used in the parameterization process.
Therefore, we are planning to employ another method in the parameterization process,
28
by iterative improvement to a candidate solution. First, we guess many random candidate
solutions and construct the new repulsive potentials for each such a candidate solution.
Second, we use each of the new repulsive potentials to generate a new potential energy
surface. By comparing with the reference potential energy surface, we can compute the error
for each candidate solution. This allows us for assessing the quality of each candidate. If the
new candidate solution is characterized by better fitness than any of the previously formed
candidate solutions, it becomes a new PSO minimum used for giving the after PSO particles.
After several cycles of optimization, we can obtain a solution with small error, which is
usually a good approximation to the real global minimum. The accuracy of the reference
potential energy surfaces is important here, especially in the PSO method, thus the high
accurate quantum chemical calculations are required, e.g. CCSD(T).
The CCSD(T) calculations are very expensive and slow. In addition, we cannot predict
how many CCSD(T) energies are required to obtain a good solution in the PSO optimization.
The CCSD(T) calculations can be regarded as the bottleneck of the PSO procedure since they
seriously reduce the speed of the parameterization process. Therefore, our potential energy
29
Reference
1. M. Elstner et al., Physical Review B 58, 7260 (1998).
2. T. Frauenheim et al., Physica Status Solidi (B) 217, 41 (2000).
3. R. Dronskowski, Computational chemistry of solid state materials: a guide for materials scientists, chemists, physicists and others. (Wiley-VCH, 2005). 4. P. E. A. Turchi, A. Gonis, L. Colombo, Tight-binding approach to computational
materials science: symposium held December 1-3, 1997, Boston, Massachusetts, U.S.A. (Materials Research Society, 1998), pp. 131.
5. E. Małolepsza, H. A. Witek, K. Morokuma, Chemical Physics Letters 412, 237 (2005).
6. F. Jensen, Introduction to computational chemistry. (John Wiley & Sons, 2007), pp. 22.
7. S. L. Salas, E. Hille, G. J. Etgen, Salas and Hille's calculus: one and several variables. (Wiley, 1999).
8. H.-J. Werner et al. (Cardiff, UK, 2009).
9. C. J. Cramer, Essentials of computational chemistry: theories and models. (J. Wiley, 2002).
10. A. Szabó, N. S. Ostlund, Modern quantum chemistry: introduction to advanced electronic structure theory. (Dover Publications, 1996).
11. G. W. T. M. J. Frisch, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, G. Scalmani, V. Barone, B. Mennucci, G. A. Petersson, H. Nakatsuji, M. Caricato, X. Li, H. P. Hratchian, A. F. Izmaylov, J. Bloino, G. Zheng, J. L. Sonnenberg, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, T. Vreven, J. A. Montgomery, Jr., J. E. Peralta, F. Ogliaro, M. Bearpark, J. J. Heyd, E. Brothers, K. N. Kudin, V. N. Staroverov, R. Kobayashi, J. Normand, K. Raghavachari, A. Rendell, J. C. Burant, S. S. Iyengar, J. Tomasi, M.
30
Cossi, N. Rega, J. M. Millam, M. Klene, J. E. Knox, J. B. Cross, V. Bakken, C. Adamo, J. Jaramillo, R. Gomperts, R. E. Stratmann, O. Yazyev, A. J. Austin, R. Cammi, C. Pomelli, J. W. Ochterski, R. L. Martin, K. Morokuma, V. G. Zakrzewski, G. A. Voth, P. Salvador, J. J. Dannenberg, S. Dapprich, A. D. Daniels, Ö . Farkas, J. B. Foresman, J. V. Ortiz, J. Cioslowski, and D. J. Fox.
12. E. Anderson et al., LAPACK Users' Guide. (Society for Industrial and Applied Mathematics, 1999).
13. W. M. C. Foulkes, R. Haydock, Physical Review B 39, 12520 (1989).
14. M. Gaus, C.-P. Chou, H. Witek, M. Elstner, The Journal of Physical Chemistry A 113, 11866 (2009).
15. J. Kennedy and R. C. Eberhart, Proceedings of IEEE International Conference on Neural Networks, 1942 (1995).
31
Appendix A
I. Details of the Fitting Procedure
This section shows various information concerning the fitting procedure, which include: (1) the number of internal coordinates (IC) used for construction of the potential energy
function
(2) the number of original parameters in the potential energy function (3) the number of parameters* after the symmetry adaptation process (4) the number of data points used in the fitting work
(5) the RMS error in the linear-least square fitting process
1. Hydrogen Group
Formula IC Parameters Parameters* Data points RMS error
H2 1 5 5 11 3.599E-7
2. Lithium Group
Formula IC Parameters Parameters* Data points RMS error
Li2 1 5 5 11 1.094E-7 HLi 1 5 5 11 3.784E-7 CLi4 10 1000 83 10000 7.412E-6 Li3N 6 210 47 10000 6.649E-6 Li2O 3 35 22 1331 3.871E-6 FLi 1 5 5 11 2.712E-6 Li2S 3 35 22 1331 3.148E-6
32
ClLi 1 5 5 11 2.489E-6
3. Beryllium Group
Formula IC Parameters Parameters* Data points RMS error
BeH2 3 35 22 1331 1.082E-6
Be=O 1 5 5 11 4.249E-6
BeF2 3 35 22 1331 6.720E-6
BeCl2 3 35 22 1331 6.440E-6
4. Boron Group
Formula IC Parameters Parameters* Data points RMS error
BH3 6 210 47 10890 1.780E-6
BF3 6 210 47 10000 5.057E-5
BCl3 6 210 47 10000 1.129E-5
5.
Carbon Group
Formula IC Parameters Parameters* Data points RMS error
CH4 10 1001 83 10000 6.089E-6 CH3-CH3 28 35960 3361 11000 1.717E-5 CH2=CH2 15 3876 1056 10000 3.601E-6 CHCH 6 210 120 10000 4.756E-6 CH2=C=CH2 21 12650 2379 10000 3.085E-5 C3H6 (Cyclopropane) 36 91390 8986 10000 5.747E-6 CH3-NH2 21 12650 6490 10000 2.282E-6
33 CH2=NH 10 1001 1001 10000 4.584E-6 CHN 3 35 35 1331 7.348E-6 NH=C=NH 10 1001 525 10000 7.296E-6 C=O 1 5 5 11 8.229E-6 CO2 3 35 22 1331 1.219E-5 CH2=O 6 210 120 10000 5.946E-6 CH3-OH 15 3876 2180 12000 3.591E-6 CH3-O-CH3 27 31459 8439 12000 1.136E-5 CH3COOH 23 17549 10235 11000 8.699E-7 C2H4O (Epoxide) 21 12650 3330 10000 2.344E-6 CF4 10 1001 83 10000 9.218E-5 CCl4 10 1001 83 10000 5.104E-5
6. Nitrogen Group
Formula IC Parameters Parameters* Data points RMS error
N2 1 5 5 11 8.949E-6 NH3 6 210 47 10500 3.262E-6 N2H2 (Z form) 6 210 120 10000 6.038E-6 N2H2 (E form) 6 210 120 10000 6.891E-6 NH2-NH2 15 3876 1996 10000 5.668E-6 NH2-OH 10 1001 561 10000 5.992E-6 NH=O 3 35 35 1331 8.276E-6
34
HNO3 10 1001 1001 10000 1.824E-5
F3N 6 210 47 10000 2.717E-5
Cl3N 6 210 47 10000 2.226E-5
7. Oxygen Group
Formula IC Parameters Parameters* Data points RMS error
O2 1 5 5 11 8.484E-6 O3 3 35 22 1331 4.051E-6 H2O 3 35 22 1331 2.648E-6 H2O2 6 210 120 10000 5.404E-6 F2O 3 35 22 1331 1.368E-5 Cl2O 3 35 22 1331 6.372E-6
8. Fluorine Group
Formula IC Parameters Parameters* Data points RMS error
F2 1 5 5 11 4.520E-6 HF 1 5 5 11 2.103E-6 FNa 1 5 5 11 3.791E-6 F2Mg 3 35 22 1331 9.144E-6 AlF3 6 210 47 10000 1.096E-5 F4Si 10 1001 83 10000 1.563E-5 F3P 6 210 47 10000 2.305E-5 F2S 3 35 22 1331 1.486E-5
35
9. Sodium Group
Formula IC Parameters Parameters* Data points RMS error
HNa 1 5 5 11 6.300E-7
Na2S 3 35 22 1331 4.405E-6
NaCl 1 5 5 11 3.798E-6
10. Magnesium Group
Formula IC Parameters Parameters* Data points RMS error
H2Mg 3 35 22 1331 1.088E-6
Cl2Mg 3 35 22 1331 7.514E-6
11. Aluminum Group
Formula IC Parameters Parameters* Data points RMS error
AlH3 10 210 47 10000 2.994E-6
AlCl3 10 210 47 10000 9.676E-4
12. Silicon Group
Formula IC Parameters Parameters* Data points RMS error
H4Si 10 1001 83 10000 4.701E-6
Cl4Si 10 1001 83 10000 1.439E-5
13. Phosphorus Group
Formula IC Parameters Parameters* Data points RMS error
36
Cl3P 6 210 47 10000 1.929E-5
14. Sulfur Group
Formula IC Parameters Parameters* Data points RMS error
H2S 3 35 22 1331 2.825E-6
Cl2S 3 35 22 1331 1.411E-5
15. Chlorine Group
Formula IC Parameters Parameters* Data points RMS error
Cl2 1 5 5 11 7.902E-6
II. The Form of Potential Energy Function
This section, shows the complete form of the potential energy function and parameters
used for the following molecules: H2, H2O, and CH2O. The potential energy functions are
expressed in a form of the 4th-order Taylor function, and the parameters are determined from
the linear-least square technique. Detailed discussion of these three molecules helps
understanding the form of our potential energy functions.
1. H
2Cartesian coordinates (a.u.):
Atoms X Y Z 1H H1x H1y H1z
2H H2x H2y H2z
37
Internal coordinates
: 2 2 2 1 2 1 2 1 2 ( x x) ( y y) ( z z) r H H H H H HConstants:
1.403412 a.u. eq r Potential energy function:
2 3 4 ( ) (1) (2) ( - eq) (3) ( - eq) (4) ( - eq) (5) ( - eq) E Hartrees P P r r P r r P r r P r r
Parameters:
(1) -1.172337 (2) 0.000038 (3) 0.185324 (4) -0.216571 (5) 0.181181 P P P P P 2. H
2O
Cartesian coordinates (a.u.):
Atoms X Y Z 1O O1x O1y O1z 2H H2x H2y H2z 3H H3x H3y H3z
Internal coordinates
: 2 2 2 1 1 2 1 2 1 2 2 2 2 2 1 3 1 3 1 3 2 1 3 1 2 1 3 1 2 1 3 1 1 1 2 ( ) ( ) ( ) ( ) ( ) ( ) ( )( ) ( )( ) ( )( ) cos x x y y z z x x y y z z x x x x y y y y z z z z r O H O H O H r O H O H O H H O H O H O H O H O H O a r r Constants:
eq 1.403412 a.u. a 1.807874 a.u. eq r r2 r1 a Equilibriun energy 1st-order stretch 2nd-order stretch 3rd-order stretch 4th-order stretch38
Potential energy function:
2 1 2 2 2 2 1 2 1 1 2 3 3 2 1 2 1 1 ( ) (1) (2) ( ) (3) ( ) (4) ( ) (5) ( ) (6) ( )( ) (7) ( )( ) (8) ( ) (9) ( ) (10) ( ) ( ) ( i eq eq i i eq eq eq eq i eq eq i i i eq eq eq eq i E Hartree P P r r P a a P r r P a a P r r r r P r r a a P r r P a a P r r r r r
2 2 2 2 2 2 1 1 2 4 4 1 2 1 3 2 3 1 2 1 )( ) (11) ( ) ( ) (12) ( )( ) (13) ( )( )( ) (14) ( ) (15) ( ) ( ) ( ) ( P(16) + )( ) +P( eq eq i eq eq i eq eq i i eq eq eq i eq eq i eq eq eq eq r r r P r r a a P r r a a P r r r r a a P r r P a a r r r r r r r r
2 3 1 2 2 3 2 2 1 1 2 2 2 2 1 2 1 2 1 2 1 17) + P(18) P(19) P(20) P(21) P(22 ( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( ) ( )( ) ( )( ) ( ) ( )( ) i eq eq i i eq eq i eq eq i i eq eq eq eq eq eq eq eq eq r r a a r r a a r r a a r r r r r r r r a a r r r r a a r r r
2 2 req)(a aeq) Parameters:
3. CH
2O
Cartesian coordinates (a.u.):
Atoms X Y Z 1C C1x C1y C1z 2O O2x O2y O2z 3H H3x H3y H3z 4H H4x H4y H4z P(12)=-0.018773 P(13)=-0.033802 P(14)=0.280525 P(15)=-0.005940 P(16)=-0.002280 P(17)=-0.008572 P(18)=0.014224 P(19)=-0.004180 P(20)=0.000815 P(21)=0.004672 P(22)=0.018809 P(1)=-76.332216 P(2)=0.000099 P(3)=0.000009 P(4)=0.272209 P(5)=0.083040 P(6)=-0.00619 P(7)=0.033186 P(8)=-0.340899 P(9)=-0.026569 P(10)=-0.000822 P(11)=-0.003748 Equilibriun energy 1st-order stretch 1st-order bend 2nd-order stretch 2nd-order bend
2nd-order stretch/stretch coupling 2nd-order stretch/bend coupling 3rd-order stretch
3rd-order bend
3rd-order stretch/stretch coupling 3rd-order stretch/bend coupling
3rd-order stretch/bend coupling 3rd-order stretch/bend coupling 4th-order stretch
4th-order bend
4th-order stretch/stretch coupling 4th-order stretch/bend coupling 4th-order stretch/bend coupling 4th-order stretch/bend coupling 4th-order stretch/stretch coupling 4th-order stretch/bend coupling 4th-order stretch/bend coupling
a
b1 b2
c1
c2