This study uses a computer simulation code, which is developed by Kuo-Hsien Hsu, MUST, NCTU. The simulation code is three-dimensional, finite-element, nonlinear Poisson Boltzmann. General flowchart of the simulation is shown in Fig.
2.1.
2-1 The Poisson-Boltzmann Equation for the EDL Potential Ψ
Consider a liquid phase containing positive and negative ions in contact with a planar negatively charged surface. An EDL field will be established. According to the theory of electrostatics, the relationship between the electrical potentialΨ and the net charge density per unit volume ρe at any point in the solution is described by the three-dimensional Poisson equation.
0
ρe is the charge density, ε is the dimensionless dielectric constant of the solution, and εo is permittivity of vacuum. For any fluid consisting of two kinds of two kinds of ions equal and opposite charge (z+ and z-), the number of ions of each type is given by the Boltzmann equation.
) absolute temperature. The Boltzmann distribution is applicable only when the system is in the thermodynamic equilibrium state.
Then the net charge density in a unit volume of the fluid is given by:
Substituting Eq. [2-3] in Eq. [2-1], we can get a nonlinear second-order three-dimensional Poisson-Boltzmann equation:
Nondimensionalizing the above equation via
T
We obtain a dimensionless form of Eq. [2-4] that can be written as
( )
ψ characteristic of the EDL.2-2 Discretization Using Finite Element Method
The FEM is the computer-aid mathematical technique for obtaining approximate numerical solution to the abstract of calculus that predicts the response of physical system subjected to the external influences.
Such problems arise in many areas of engineering, science, and applied mathematics. Applications to date have occurred principally in the areas of solid mechanics, heat transfer, fluid mechanics, and electromagnetism.
There are many salient features in FEM:
1. The domain is divided into smaller regions called elements. Adjacent elements touch without overlapping, and there are no gaps between the elements. The shapes of the elements are intentionally made as simple as possible.
2. In each element the governing equations, usually in differential or variation (integral) form, are transformed into algebraic equation. The element equations are algebraically identical for all elements of the same type, which usually need to be derived for only one or two typical elements.
3. The resulting numbers are assembled (combined) into a much larger set of algebraic equations called the system equations. Such huge systems of equations can be solved economically because the matrix of coefficients is “spares”.
4. Solved the matrix problem.
FEM seeks an approximate solutionU~
, an explicit expression for U, in terms of known functions, which approximately satisfies the governing equations and boundary conditions. It obtains an approximate solution by using the classical trial-solution procedure. Normally, it is not possible to obtain strong solutions for the problem at hand. Instead one usually decretive the otherwise continuous problem and obtain so-called weak solutions; weak in the sense of approximate.
Now we star to derive the 3D Nonlinear Poisson solver formulation via FEM method.
Where x, y, z are the independent variables in the problems. The functions
(
x y z)
Nj , , are known functions called trial functions (basis).
The purpose is to determine specific numerical values for each of the parameters a. We use the Galerkin method. For each parameter aj we require that a weighted average of R (x, y, z; a) over the entire domain be zero. The weighting functions are trial functions N
(
x,y,z)
associated with each aj.Step2: Galerkin residual method
( )
ΩThe Galerkin residual method is a weighting residual method, which uses the trial function as the weighting function.N is not only the weighting function, but also the i shape function.
Where the integral
∫
Ω d means integrate over the volume of element. We use Ω tetrahedral element, which have four degree of freedom (DOF).The tetrahedral element’s shape function can desired as:
( )
1~4Step3: Integrated by part
Take Eq. [2-7] to Eq. [2-8] and integrated by part
∫∫∫
Step4: Gauss divergence theorem
Use Gauss divergence theorem to Eq. [2-10].
∫∫∫
Take trial function [2-7] into Eq. [2-11] then we can get Eq. [2-12]
∫∫∫
And Eq. [2-12] can be rewrote as follows,
dxdy
Then, take differential
( )
Finally we can get the matrix form of the Poisson-Boltzmann Equation.
{ } { }
a F e n Kije ] je je :1~[ () () = ( ) [2-20]
The matrix problem of the Galerkin method is not simple to take numerically. To obtain an accurate solution it is often necessary to employ a fine mesh containing many elements. The N×N matrix of Eq. [2-20] thus becomes very large. This sparsity is utilized fully when implementing good computer codes for the finite element method. The sparsity leads to a significant reduction in memory requirements since only the non-zero matrix elements need to be stored together with an index of where they are stored. Moreover, the sparsity implies a huge reduction in the number of arithmetic operations needed to solve the problem. For a full matrix problem this number is proportional to N3 for standard Gauss elimination, but by using
direct-banded matrix schemes, or iterative methods like conjugate gradient methods or multi-grid methods, it can be reduced to becoming proportional to N2 or even
N .
2-3 Force Calculations
From the potential distribution obtained by solving the Poisson-Boltzmann equation, the electrostatic force on the spherical particles is calculated by integrating the total stress tensor, defined as
2 ] [EE 1E EI
Tij =∆∏+ε − ⋅ [2-21]
Where E =−∇ψ is the electric field vector; I represents the identity tensor, and
∏
∆ is the osmotic pressure difference between the electrolyte at the particle surface and bulk solution, ∆ can defined as ∏
∆∏=2nbkT
(
coshψ −1)
[2-22]Where n is the bulk ion number density of the electricity neutral electrolyte. b We took integral the stress tensor over the surface of a particle, defined as =
∫
⋅S
ij ndS
T
F [2-23]
Here Fis the force acting on the spherical particle. We note here that the stress tensor is calculated from the potential but the force on the sphere also includes a contribution from the osmotic pressure generated by the electric field; however, pressure variation due to the field at the sphere surface is normally canceled by electrical effects so that we can assume the ∆ term is zero. Where n is the unit ∏ outward surface normal, and the subscript Srepresents integration over the closed surface of the particle. This integration can be performed either on the sphere surface
or on the midplane for the case of two identical spherical particles.
2-4 Conjugate Gradient Method for Linear Algebra Equation
In the linear system Ax = b suppose the coefficient matrix A is symmetric and positive definite. Solving Ax = b is equivalent to minimizing the quadratic functionQ
( )
x = xTAx−xTb2
1 . The conjugate gradient method is based on the idea
that the convergence to the solution could be accelerates if we minimize Q over just the line that points down gradient. To determine xi+1 we minimize Q over
)
where the p represent previous search directions. An added advantage to this k approach is that, if we can select thep to be linearly independent, then the dimension k of the hyperplane will grow one dimension with each iteration of the conjugate gradient method. This would imply that (assuming infinite precision arithmetic) the solution of the linear system Ax=b would be obtained in no more than N steps, where N is the number of unknowns in the system.
Let x0 be an initial estimate to the solution ofAx=b. For our first search direction we proceed down a Q-gradient and choose
0
From the discussion of the method of steepest descent, we have
0
To understand what follows in the description of the conjugate gradient method, it is
important to note that
Rather than try to establish the above orthogonality relation with a calculation, use the following calculus argument. By definition, r is the gradient of Q at1 x1, where x1 is the conjugate gradient estimate to follow the initial guessx0. Also,
0
0 p
r = is the search direction along the linex0 +α0p0. Calculus tells us that the gradient of Q at x1must be orthogonal to the search direction.
Not as a proof of the last statement about calculus and orthogonality, but to motivate the statement, consider the layers of an onion to be surfaces of Q = constant.
Imagine piercing the onion with a skewer. In general, the skewer will pass through several outer layers of the onion, tangentially touch one of the inner layers, and again pass through the other layers and then exit. The innermost layer that the skewer touches tangentially is given by
( )
x1 x1Q =
∇ [2-27]
and the direction of the skewer is r0 = p0.
The conjugate gradient method calls for defining successive approximates by
i
1. We want the span of the search directions to fill the space we are searching as the number of iterations increases.
2. Searching down Q-gradients was basically a good idea. But, to guarantee linearly independent successive search directions, we generally need to choose conjugate
gradient search directions to be perturbations of steepest descent search directions.
We have already how to define p and0 α , so 0 x1and r and 1 r1 =b−Ax1 can be considered known. To take the next step using the conjugate gradient method, we must determine values for α and 0 β so that we can calculate 1 p and1 x2. Then we will see a pattern emerge.
In taking this next step in the conjugate gradient method we are seeking to minimize Q over the plane,
which is zero provided
1 0
0
Having decided to proceed from x1to x2 along the search direction defined byp1 =r1+β0p0, the same calculus argument used to determine gives
So a step of the conjugate gradient method is complete.
2-5 Monotone Iterative Methods
The method of monotone iterations is a classical tool for the study of the existence of the solution of nonlinear PDE of certain types. It is also useful for the numerical solution of nonlinear types of problems approximated, for instance, by the finite difference, finite element or boundary element method. It is a constructive method that depends essentially on only one parameter, called the monotone parameter herein, which determines the convergence behavior of the iterative process.
Based on adaptive 1-irregular finite element meshes, we extend this classical method to device simulation the Poisson Boltzmann equation [9]. Compared with the NI method, the MI method requires no Jacobian matrix and does not encounter any convergence problems and numerical difficulties. Furthermore, the MI algorithm is practical, easy in implementation, and inherently parallel for large-scale 3D simulation.
Consider an example−∇u=eu, after discretization using Finite Element Method.
We can get Eq. [2-33]
We set a parameterλ , which define as follow:
)
'(U Fi
i =−
λ [2-34]
then set the monotone iterative form, which show as }
where variable with subscript n represents one value obtained on previous iteration.
2-6 Parallel Implementation of the P-B Equation Solver
2-6.1 Introduction to Parallel Distributed Computing
A parallel computer, as defined by Almasi and Gottlieb [10], is a collection of processing element that cooperate and communicate to solve large problems fast.
Parallel computers can be viewed as a collection of processors and memory units which are connected by an interconnection network. The purpose of parallel computing is to get work done in less time and also to solve problems which can not fit in a single computers memory.
There are two main architectural paradigms associated with parallel computing:
Distributed memory and Shared memory [11]. In Distributed Memory architecture parallel computers, memory is physically distributed among processors; each local memory is directly accessible only by its processor. Synchronization is achieved by moving data between processors by a fast interconnection network, see Fig2.2
In Shared memory architecture parallel computer the same memory is accessible by multiple processors. All processors associated with the program access the same storage as shown in Fig2.3.
In our laboratory, we had setup PC Clusters which are use distributed memory system. The advantage of this kind of architecture is that it is scalable to a large number of commodity processors. A major concern with this scheme is data decomposition; the data being operated should be divided equally to balance the load,
and also should be done in a efficient manner in order to minimize communication.
The scalability depends on the type of interconnection between the processors.
2-6.2 Parallel Implementation
In order to parallelize the Poisson-Boltzmann solver, we use Message Passing Interface (MPI) to be the basic toll. MPI is a library specification for message-passing, proposed as a standard by broadly based committee of vendors, implementers, and users. MPI library of a set of standard subroutine calls which allow parallel programs to be written in distributed memory system. We use Metis to distribute the cells and nodes. Running the solver to test and verity on PC Cluster belongs to our MuST laboratory. In distributed parallel finite element analysis, the domain considered is first decomposed into a number of subdomains. Then in general, one processor is assigned for each subdomain, though other variations are also possible. Computations are then performed on each processor on the local subdomain and communication takes place with the other processors whenever needed. The domain might be divided into a set of overlapping subdomains in which case Overlapping Schwartz methods are used for the solution of system. If the domain is partition into a set of non-overlapping subdomains then the methods used are called iterative substructuring methods. The two basic types of non-overlapping domain decomposition methods used in the finite element methods are the subdomain-by-subdomain and Schur complement method [11]. The first approach is based on multi-element group partitioning of the entire domain. The global stiffness matrix is stored as a partitioned matrix and the dominant matrix-vector operations of the stiffness matrix and the precondition solutions in the iterative algorithm are performed on the subdomain-by-subdomain basis. In the second approach the iterative algorithm is applied to the interface problem after first eliminating the internal degrees of freedom.
In our parallel system, we use the non-overlapping subdomains.
Fig 2.4 illustrates the flow chart of parallel Poisson-Boltzmann solver (PPB).
Detail is described step by step in the followings.
1. Setup initial grids and input data for every processor.
2. Initialize MPI and synchronize all processors to prepare for parallel computation.
3. Compute the coefficients of shape functions.
4. Setup initial and boundary conditions
5. Compute each processor’s coefficient matrix and communications among processor.
6. Compute each processor’s loading matrix and communications among processor.
7. Solve each processor’s matrix and communications among processors
8. To determine results of potential converge or not, if converge go to the next step, otherwise update potential and return to step 5.
9. After gathers all potential data, calculate the electric field.
10. Host gathers all data and output it.
2-7 Parallel Adaptive Mesh Refinement
In this section, the parallel mesh refinement module would be introduced and the algorithm would be outlined. And in order to understand easily, two-dimensional mesh would be used to explain additionally. Finally the parallel Poisson-Boltzmann solver would couple with parallel mesh refinement module to test and verify. The detail of parallel refinement can refer to [12].
2-7.1 The Basic Algorithm of Parallel Adaptive Mesh Refinement
In the adaptive mesh refinement module, the data to record new added nodes is based on cell. In other words, every cell would only know new added nodes that belong to it. The number of added nodes is fixed in each process. However, before to divide cells, the number of added nodes must be unique in all processors.
In this module, it needs a neighbor identifying arrays. This array defines the interfacial cell for each face. It would record the global cell number of interfacial cells for each face of the cell.
At each mesh refinement step, individual edges are marked for refinement, or no change, based on an error indicator calculated from the solution, for example, the electric field. These cells which need to refine would add new nodes on each edge.
They are called isotropic cells. For two-dimensional mesh, a parent cell [13] is divided to form four child cells. For three-dimensional mesh and tetrahedral cell, the parent cell is divided to form eight child cells, as show in Fig2.5.
When isotropic cells add nodes on their edges, the cells neighbor them would appear one to three handing nodes at the same time. The cells which have hanging nodes but not belong to isotropic cells are called anisotropic cells. In order to remove hanging nodes in anisotropic cells, it has some procedures to do. Using two-dimensional mesh as an example is shown in Fig2.6. When hanging nodes appear in a triangular cell, the cell must be divided into different way. However, for three-dimensional mesh, the division is more complex. The division would consider the number and position of the hanging nodes, to decide whether to add new nodes to obey the refinement rules. There are three results. One is to divide into eight child cells. Another is to divide into four child cells. The other is to divide into two child cells. The detail refinement rules are be show in Fig2.7.
2-7.2 Cell Quality Controls
A problem associated with repeated adaptive mesh refinement (AMR) operations.
Then, most mesh smoothing schemes tend to change the structure of a given mesh to achieve the “smoothing effect” (a better aspect ratio) by rearranging nodes in the mesh. The change made by a smoothing scheme, however, could modify the desired distribution of element density produced by AMR procedure, and the cost of performing a global mesh smoothing could be high. Alternatively, it is possible to prevent, or slow down, the degradation of cell quality during a repeated adaptive refinement process. The cell quality control scheme we have applied classifies element based on how they will be refined. This allows us to avoid creating elements with poor aspect ratios during the refinement. After identifying those elements, we can refine them with a better refinement by contrast.
Detail rules are shown in Fig2.8. For example, Fig2.9 shows a typical cell (1-2-3-4). Because it is affecting by other cell, it has hanging nodes (8, 9, A). In order to handle hanging nodes, we connect nodes (8, 9, A) to node 4. But the aspect ratios of typical cells (1-4-8-A, 5-4-8-9 and 7-4-9-A) will be worse. However, if we add three nodes (B, C and D), and connect them with other nodes. Those eight child typical cells (4-B-C-D, 1-8-A-B, 5-8-9-D, 7-9-A-C, 8-9-A-B, 8-9-B-D, 9-A-B-C and 9-B-C-D) would have better aspect ratios.
However in some cases, previous developed cell-quality-control would not be effective that is shown in Fig2.10. For example, as showed in Fig2.11. At this situation, after the last refinement there are four child cells. But at this refinement, there are three hanging nodes appear on edges (8, B, D). While cells are divided, it will be divided to four child cells. And these four child cells will not get worse aspect ratios. And, it would get better aspect ratios. So cell-quality-control will allow this
division and not to affect it.
If the boundaries of the computational domain are not straight, it is not sufficient to place the new node in the midway of the edge of the face of the parent cell. If this is done, it would dual to a rough piecewise representation of the original geometry results. What must be done is to move the new node location onto the real boundary contour surface. In the current implementation, it is assumed that the boundaries can
If the boundaries of the computational domain are not straight, it is not sufficient to place the new node in the midway of the edge of the face of the parent cell. If this is done, it would dual to a rough piecewise representation of the original geometry results. What must be done is to move the new node location onto the real boundary contour surface. In the current implementation, it is assumed that the boundaries can