
Chapter 1. Introduction

1.4 Organization of the thesis

The chapters of this thesis are organized as follows.

Chapter 2 details a parallel three-dimensional electrostatic field solver formulated with the Galerkin finite element method and based on the subdomain-by-subdomain non-overlapping domain decomposition method. After the finite element assembly procedure, the resulting matrix is stored in compressed sparse row format and is solved using either the parallel conjugate gradient method or a direct matrix solver, MUMPS. A parallel adaptive mesh refinement module (PAMR) is also coupled into the developed electrostatic field solver to obtain a better solution, especially when there is a large variation of potential in the computational domain. Some benchmark problems are presented to demonstrate the accuracy and applicability of the electrostatic field solver. At the end of this chapter, the parallel performance of the Poisson’s equation solver is studied and its time breakdown is analyzed in detail.

Chapter 3 details a parallel three-dimensional magnetostatic field solver, which is developed and parallelized following the techniques presented in Chapter 2. PAMR is also coupled into the developed magnetostatic field solver. Some benchmark problems are presented to demonstrate the accuracy and applicability of the parallel magnetostatic field solver. At the end of this chapter, the parallel performance of the solver is studied and its time breakdown is analyzed.

Chapter 4 presents the proposed parallel three-dimensional PIC-FEM method using an unstructured mesh. The PIC-FEM is developed following the main principles of the conventional PIC-MCC method. In addition, the parallel implementation of PIC-FEM is carried out via the domain decomposition method, and dynamic domain decomposition (DDD) is developed to balance the load among the processors. Two benchmark problems are presented to demonstrate the accuracy and applicability of the parallel PIC-FEM code. At the end of this chapter, the parallel performance of the PIC-FEM code using DDD is studied and its time breakdown is analyzed in detail.

In Chapter 5, the proposed parallel three-dimensional PIC-FEM code is used to simulate three different realistic problems: a three-dimensional field emission display (FED), a three-dimensional DC/RF gas discharge plasma, and a three-dimensional DC/RF magnetron plasma. Most of the simulated results are then compared with previous studies available in the literature. Finally, the conclusions of this work and some possible directions for future research are presented in Chapter 6.

Chapter 2

The Parallel Computing of Finite Element Method for Three-Dimensional Electrostatic Field Problems

This chapter begins with an introduction to the background of computational electromagnetics. For solving electrostatic problems, the finite element Galerkin weighted residual method (GWRM) is chosen and introduced for use with either a tetrahedral or a hexahedral mesh; global-coordinate and local-coordinate shape functions are used for tetrahedral and hexahedral meshes, respectively. Before the parallel computing of the FEM, the computational domain is first decomposed into a number of non-overlapping sub-domains using the multi-level graph-partitioning library METIS. A preprocessing procedure then converts the output data of the graph-partitioning tool into the input data for the field solvers. In the second step, the Poisson’s equation is formulated via GWRM using the subdomain-by-subdomain (SBS) method. After the assembly procedure of the FEM, only the non-zero entries of the system of equations are stored in compressed sparse row format, and the matrix is solved using either the parallel conjugate gradient method or a direct matrix solver, MUMPS. The parallel adaptive mesh refinement module is then coupled with the parallel Poisson’s equation solver in order to improve the resolution of the solution near regions where the property gradient is large. Some benchmark problems are presented to demonstrate the accuracy and applicability of the Poisson’s equation solver. At the end of this chapter, the parallel performance of the Poisson’s equation solver is studied and its time breakdown is analyzed in detail.

2.1 Background of Computational Electromagnetics

The whole of electromagnetic theory can be regarded as a logical deduction from Maxwell’s equations, first formulated by James Clerk Maxwell in 1873 [Cheng, 1995]. They are:

$\nabla \times \vec{E} = -\dfrac{\partial \vec{B}}{\partial t}$ (2-1)

$\nabla \times \vec{B} = \mu \vec{J} + \dfrac{1}{c^{2}}\dfrac{\partial \vec{E}}{\partial t}$ (2-2)

$\nabla \cdot \vec{E} = \dfrac{\rho}{\varepsilon}$ (2-3)

$\nabla \cdot \vec{B} = 0$ (2-4)

where $\vec{E}$ is the electric field intensity, $\vec{B}$ is the magnetic flux density, $\vec{J}$ is the current density, $\varepsilon$ is the dielectric permittivity of the medium, $\mu$ is the magnetic permeability of the medium, c is the speed of light, and $\rho$ is the volume charge density.

In this thesis, we consider only the electrostatic (ES) and magnetostatic (MS) field problems; hence, Maxwell’s equations can be simplified to the scalar and vector Poisson’s equations for the ES and MS fields, respectively. They are:

$\varepsilon \nabla^{2}\phi = -\rho$ (2-5)

$\dfrac{1}{\mu}\nabla^{2}\vec{A} = -\vec{J}$ (2-6)

where $\phi$ is the electric potential and $\vec{A}$ is the magnetic vector potential. From these two equations, it is clear that the electrostatic fields arise from static electric charges, while the magnetostatic fields are due to the motion of electric charges with uniform velocity (direct current) or to external magnets. The details of solving the magnetostatic field problems are presented in the next chapter.

Maxwell’s equations have long been studied for dealing with electromagnetic problems. Two different approaches for solving them are the analytic and the numerical approach. For simple structured devices, many analytic solutions are available and can easily be derived from series expansions, separation of variables, Bessel and Legendre polynomials, Laplace transformations, and the like [Umashankar, 1993]. However, there is almost no analytic solution available when one considers a device with a complicated structure that involves a number of conductors, dielectrics, permanent magnets, semiconductors of arbitrary shape, etc. Thus, an accurate numerical method for solving the Poisson’s equation is required to model these complicated structures.

Fortunately, with the developments in numerical techniques over the past two decades, it is now possible to solve large-scale electromagnetic problems numerically within reasonable time limits. The numerical methods can generally be divided into three separate groups: integral methods, differential methods, and variational methods. An integral method is based on the superposition principle: the contributions of all sources are integrated to obtain the potential at any point. The best-known integral method is the boundary element method [Kythe, 1995], which is particularly suitable for problems with material homogeneity. The finite difference method [Sullivan, 2000] is the most popular among the differential methods. In this method, the computational domain is usually divided into a structured mesh and the values of a scalar potential field are sought at all grid points. However, the method suffers from many problems when a geometrically complicated case is considered, especially in generating a structured mesh on an object with an arbitrary geometric shape, imposing the boundary conditions, and approximating the interfaces of multi-material regions. A variational method is based on the differential or integral form of the field equations to be solved [Zeidler, 1985]. The best-known variational method is the finite element method, which is widely used in many fields, e.g., [Beltzer, 1990], [Winslow, 1967], [Silvester and Pelosi, 1994], and [Jin, 2002], including electromagnetic problems [Winslow, 1967].

2.2 Finite Element Method (FEM)

2.2.1 Background

The FEM is a numerical technique for obtaining an approximate numerical solution of the partial differential equations (PDEs) arising from a physical system subject to its boundary conditions. In the FEM, the computational domain is first divided into smaller non-overlapping regions called elements (cells), and a trial function is postulated over each of the elements. For example, the trial solution with global coordinates for a three-dimensional tetrahedral mesh is:

$\tilde{U}(x,y,z;a) = a_{0} + a_{1}N_{1}(x,y,z) + a_{2}N_{2}(x,y,z) + \cdots + a_{n}N_{n}(x,y,z)$ (2-7)

where x, y, and z are the independent variables of the problem. The functions $N_{i}(x,y,z)$ are known functions called trial functions. The coefficients $a_{i}$ are undetermined parameters, and n is called the number of degrees of freedom (DOF). After formulating the PDEs using the Galerkin weighted residual method with the trial solution, the element equations are obtained for a typical element. These element equations can then be reused for the other elements over the domain, as shown in Fig. 2.1. Larger sets of algebraic equations, called the system equations, are then formed by assembling these element equations. Moreover, since the matrix structure of such a huge system of equations is sparse in essence, the matrix may be solved efficiently by storing only its non-zero entries.
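As a concrete illustration of Eq. (2-7), the following minimal Python sketch (not part of the thesis; the coefficient and trial-function values are hypothetical placeholders) evaluates the trial solution as a constant term plus a weighted sum of trial functions at a single point.

```python
import numpy as np

def trial_solution(a0, a, N_values):
    """Evaluate U~(x,y,z;a) = a0 + a_1*N_1 + ... + a_n*N_n at one point.

    a0       : constant term of Eq. (2-7)
    a        : (n,) undetermined coefficients a_i
    N_values : (n,) trial functions N_i(x, y, z) evaluated at that point
    """
    return a0 + np.dot(a, N_values)

# Hypothetical values for a 4-node element; in practice the N_i come from the mesh.
print(trial_solution(0.0,
                     np.array([1.0, 2.0, 0.5, -1.0]),
                     np.array([0.1, 0.2, 0.3, 0.4])))
```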

2.2.2 The Galerkin Weighted Residual Method (GWRM)

In the FEM we employ the Galerkin weighted residual method (GWRM) with three-dimensional C0-linear shape functions for both tetrahedral and hexahedral meshes. In GWRM, a weighted average of the residual over the entire domain should be zero, and the weighting function is chosen as the element shape function associated with each $a_{i}$. The representations of GWRM with the three-dimensional C0-linear global-coordinate shape function and local-coordinate shape function are given in Eqs. (2-8a) and (2-8b), respectively.

$\iiint R(x,y,z;a)\,N_{i}(x,y,z)\,dx\,dy\,dz = 0$ (2-8a)

$\iiint R(\xi,\eta,\rho;a)\,N_{i}(\xi,\eta,\rho)\,dx\,dy\,dz = 0$ (2-8b)

where $R(x,y,z;a)$ or $R(\xi,\eta,\rho;a)$ is the residual of the governing equation.

2.3 Calculation of Electrostatic (ES) Field

Now that the concept of GWRM has been introduced, this section begins with the derivation of the element equation of the Poisson’s equation for typical ES fields (as given in Eq. (2-5)). A trial solution is first constructed to approximate the exact electric potential, $\phi(\vec{r}) \approx \tilde{U}(\vec{r};a)$, which can be written with the shape functions in global coordinates for a tetrahedral mesh as

$\tilde{U}^{(e)}(\vec{r};a) = \sum_{j=1}^{4} a_{j} N_{j}^{(e)}(\vec{r})$ (2-9)

The local-coordinate shape functions for a hexahedral mesh will be presented later. The major steps for formulating Eq. (2-5) using GWRM with a three-dimensional tetrahedral mesh in global coordinates are described in detail as follows.

Step 1: Derive the typical element equation of Eqs. (2-5) using GWRM.


Step 2: Integrate by parts.


Eq. (2-12) can be reformulated using the divergence theorem; it may then be written as Eq. (2-13), where $\tau_{n}^{(e)}$ is the outward-normal component of the flux. All load terms are placed on the RHS; this includes the interior load and the boundary fluxes.

Step 3: Substitute the trial solution into element equations.

Inserting Eqs. (2-9) into Eqs. (2-13) yields,


Step 4: Substitute the 3-D C0-linear shape function for a tetrahedral mesh into element equation.

The LHS of Eq. (2-14) may be reformulated as the stiffness integral of Eq. (2-15), where V is the volume of the cell, and the subscripts i, k, l, m take the values 1, 2, 3, 4 (see Fig. 2.2) for $N_{1}^{(e)}(x,y,z)$ and are permuted cyclically for $N_{2}^{(e)}(x,y,z)$, $N_{3}^{(e)}(x,y,z)$, and $N_{4}^{(e)}(x,y,z)$.
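To make the element-level computation of Step 4 concrete, the sketch below assembles the 4x4 stiffness matrix of a single linear (C0) tetrahedron for the Poisson operator. It uses the standard textbook construction of the constant shape-function gradients from the nodal coordinates; it is an illustrative sketch and not necessarily the exact coefficient arrangement of the thesis’s Eq. (2-15).

```python
import numpy as np

def tet_stiffness(coords, eps=1.0):
    """Element stiffness matrix K^(e) of a linear (C0) tetrahedron for the Poisson operator.

    coords : (4, 3) array of node coordinates (x, y, z)
    eps    : permittivity of the element (assumed constant over the element)
    """
    # C maps the polynomial coefficients of N_i = a + b*x + c*y + d*z to nodal values.
    C = np.hstack([np.ones((4, 1)), coords])      # shape (4, 4)
    V = abs(np.linalg.det(C)) / 6.0               # element volume
    A = np.linalg.inv(C)                          # column i holds (a_i, b_i, c_i, d_i)
    grads = A[1:4, :].T                           # (4, 3): constant gradient of each N_i
    return eps * V * grads @ grads.T              # K_ij = eps * V * grad(N_i) . grad(N_j)

# Example: unit right tetrahedron
coords = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
Ke = tet_stiffness(coords)
print(Ke.shape, Ke.sum())   # rows and columns sum to ~0, as expected for a Laplacian
```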

Step 5: Evaluate the interior load term and boundary flux term of Eq. (2-14).

For coupling with the Particle-In-Cell method in a later chapter, the charges are interpolated from the particles to the nodes [Santi et al., 2003], as in Eq. (2-18), where the subscript k represents the charged-particle properties. Substituting Eq. (2-18) into the interior load term then gives the nodal (interior) load contributions.
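A minimal sketch of the particle-to-node charge interpolation described in Step 5, in the spirit of [Santi et al., 2003]: each particle’s charge is weighted to the four nodes of its containing tetrahedron by the linear shape functions evaluated at the particle position. The element and particle data here are hypothetical, and the exact weighting of the thesis’s Eq. (2-18) may differ.

```python
import numpy as np

def deposit_charge(coords, node_ids, particles, rho):
    """Interpolate particle charges to the nodes of one tetrahedral element.

    coords    : (4, 3) node coordinates of the element
    node_ids  : (4,) global node indices of the element
    particles : list of (q_k, position) pairs assumed to lie inside the element
    rho       : global nodal charge array, accumulated in place
    """
    C = np.hstack([np.ones((4, 1)), coords])      # same matrix as in the stiffness sketch
    A = np.linalg.inv(C)
    for q_k, pos in particles:
        # N_i evaluated at the particle position (barycentric weights, summing to 1)
        N = A.T @ np.concatenate(([1.0], pos))
        rho[node_ids] += q_k * N                  # charge weighted to the 4 nodes

rho = np.zeros(8)
coords = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
deposit_charge(coords, np.array([0, 1, 2, 3]),
               [(1.0, np.array([0.25, 0.25, 0.25]))], rho)
print(rho[:4])   # the four weights sum to the deposited charge
```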

Step 6: Assemble the element equations into the system equations.

Using an assembly procedure, the system equations are formed,

$K_{ij}^{(s)} a_{j} = F_{i}^{(s)}$ (2-22)

After these theoretical developments, we may apply the boundary conditions to Eq. (2-22).
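The assembly of Step 6 can be sketched as follows: element matrices and load vectors are scattered into a global sparse system according to the element connectivity, and duplicate contributions are summed. This is a generic FEM assembly sketch using scipy’s COO-to-CSR conversion, not the thesis’s actual implementation.

```python
import numpy as np
import scipy.sparse as sp

def assemble(n_nodes, elements, element_matrices, element_loads):
    """Assemble element equations K^(e) a = F^(e) into the global system K a = F."""
    rows, cols, vals = [], [], []
    F = np.zeros(n_nodes)
    for conn, Ke, Fe in zip(elements, element_matrices, element_loads):
        for i_local, i in enumerate(conn):
            F[i] += Fe[i_local]
            for j_local, j in enumerate(conn):
                rows.append(i)
                cols.append(j)
                vals.append(Ke[i_local, j_local])
    # duplicate (i, j) pairs are summed automatically on conversion to CSR
    K = sp.coo_matrix((vals, (rows, cols)), shape=(n_nodes, n_nodes)).tocsr()
    return K, F
```

The assembled CSR matrix is exactly the storage format discussed in Section 2.4, so no further conversion is needed before the iterative solve.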

For a three-dimensional hexahedral mesh with local-coordinate shape functions, steps 1-3 and 6 are the same as above for Eq. (2-5), while steps 4-5 are modified as follows.

Step 4: Substitute the 3-D C0-linear shape function for a hexahedral mesh into the element equation.

Eq. (2-15) is transformed into a form appropriate for numerical evaluation: the stiffness integral is written as a quadrature expression evaluated at the Gauss points, where n is the number of Gauss points.

Here, the derivatives of the shape functions with respect to x, y, z are handled using the chain rule of differentiation, which yields

$\left\{\dfrac{\partial N_{i}}{\partial x},\ \dfrac{\partial N_{i}}{\partial y},\ \dfrac{\partial N_{i}}{\partial z}\right\}^{T} = [J]^{-1}\left\{\dfrac{\partial N_{i}}{\partial \xi},\ \dfrac{\partial N_{i}}{\partial \eta},\ \dfrac{\partial N_{i}}{\partial \rho}\right\}^{T}$

where $[J]$ is the Jacobian matrix of the transformation between the global coordinates (x, y, z) and the local coordinates $(\xi, \eta, \rho)$.
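The quadrature evaluation of the hexahedral stiffness integral, including the Jacobian and chain rule above, can be sketched as follows for trilinear shape functions with 2x2x2 Gauss points (an assumed quadrature order; the thesis may use a different one).

```python
import numpy as np

# natural coordinates of the 8 nodes of a trilinear hexahedron
NAT = np.array([[-1, -1, -1], [1, -1, -1], [1, 1, -1], [-1, 1, -1],
                [-1, -1,  1], [1, -1,  1], [1, 1,  1], [-1, 1,  1]], dtype=float)

def dN_dnatural(xi, eta, rho):
    """Derivatives of the 8 trilinear shape functions w.r.t. (xi, eta, rho): shape (3, 8)."""
    dN = np.empty((3, 8))
    for a, (xa, ea, ra) in enumerate(NAT):
        dN[0, a] = 0.125 * xa * (1 + ea * eta) * (1 + ra * rho)
        dN[1, a] = 0.125 * ea * (1 + xa * xi) * (1 + ra * rho)
        dN[2, a] = 0.125 * ra * (1 + xa * xi) * (1 + ea * eta)
    return dN

def hex_stiffness(coords, eps=1.0):
    """8x8 stiffness matrix of a hexahedral element by 2x2x2 Gauss quadrature."""
    g = 1.0 / np.sqrt(3.0)                     # Gauss points, each with weight 1
    Ke = np.zeros((8, 8))
    for xi in (-g, g):
        for eta in (-g, g):
            for rho in (-g, g):
                dN = dN_dnatural(xi, eta, rho)
                J = dN @ coords                # (3, 3) Jacobian of the mapping
                dN_dx = np.linalg.solve(J, dN) # chain rule: [J]^{-1} dN/d(xi,eta,rho)
                Ke += eps * np.linalg.det(J) * (dN_dx.T @ dN_dx)
    return Ke
```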

Step 5: Evaluate the interior load term and boundary flux term of Eq. (2-24).

Again, for coupling with the Particle-In-Cell method in a later chapter, the charges are interpolated from the particles to the nodes, where the subscript k represents the charged-particle properties. Then $\rho_{i}^{(e)}$ is weighted to the Gauss points, giving $\rho^{(e)}(\xi_{k},\eta_{l},\rho_{m})$, which is substituted into the interior load integral term of the quadrature expression. The boundary flux integral term is treated in the same quadrature form.

2.4 Sparse Matrix Storage Schemes

The FEM formulation and assembly techniques typically lead to large, sparse matrices. It therefore becomes essential to store the sparse matrix in a suitable storage scheme, especially for an efficient matrix-by-vector product in the iterative method. Nearly all schemes have the following two storage components, e.g., [Golub and Van Loan, 1996], and [Saad, 2003]:

1. A one-dimensional array, called the primary array, with a size equal to the number of non-zero entries, which stores the values of the non-zero entries.

2. Two one-dimensional arrays of integer identifiers, called secondary arrays, for identifying which entries of the matrix are stored in the primary array.

The Compressed Sparse Row (CSR) scheme is used for storing the current FEM sparse matrix; in this scheme the entries of the primary array are stored row by row.
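A minimal sketch of the CSR layout: the primary array holds the non-zero values row by row, one secondary array holds their column indices, and a row-pointer array (of length n+1 in CSR) marks where each row starts. scipy is used here only to build the arrays; the hand-written matrix-by-vector product shows how they are traversed inside an iterative solver.

```python
import numpy as np
import scipy.sparse as sp

# A small matrix with 5 non-zero entries
A = np.array([[4., 0., 1.],
              [0., 3., 0.],
              [1., 0., 2.]])

csr = sp.csr_matrix(A)
print(csr.data)     # primary array: non-zero values, stored row by row
print(csr.indices)  # secondary array 1: column index of each stored value
print(csr.indptr)   # secondary array 2: where each row starts in data/indices

def csr_matvec(data, indices, indptr, x):
    """Matrix-by-vector product performed directly on the CSR arrays."""
    y = np.zeros(len(indptr) - 1)
    for i in range(len(y)):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

x = np.array([1., 2., 3.])
print(csr_matvec(csr.data, csr.indices, csr.indptr, x), A @ x)  # should match
```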

2.5 Preconditioned Conjugate Gradient Method (PCG)

Among the iterative methods, the preconditioned conjugate gradient (PCG) method is extremely effective for solving symmetric positive definite systems. The conjugate gradient method was developed in 1952 by Hestenes and Stiefel as an improvement on the steepest descent method [Saad, 2003]. Steepest descent approaches the solution asymptotically; its disadvantage is that the speed of convergence may be very slow if the condition number of the matrix A is large. PCG is an efficient implementation of the conjugate directions method in which the search directions are constructed by conjugation of the residuals. The theory of PCG will not be described in detail here; it can be found elsewhere, e.g., [Golub and Van Loan, 1996], [Saad, 2003], and [Barrett and Berry, 1994]. The algorithm is given in the following.

Algorithm 1. Preconditioned Conjugate Gradient Method

where k is the iteration number, r is the residual, x is the solution vector, p is the search direction, α is the step length, and β is the correction factor. The preconditioner M is defined as the diagonal of the stiffness matrix A; this choice, known as Jacobi preconditioning, is equivalent to scaling the quadratic form along the coordinate axes.
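A compact sketch of Algorithm 1 with Jacobi (diagonal) preconditioning, using the notation listed above; the tolerance, iteration cap, and test system are illustrative only.

```python
import numpy as np
import scipy.sparse as sp

def pcg(A, b, tol=1e-8, max_iter=1000):
    """Jacobi-preconditioned conjugate gradient for a symmetric positive definite CSR matrix A."""
    M_inv = 1.0 / A.diagonal()            # Jacobi preconditioner M = diag(A)
    x = np.zeros_like(b)
    r = b - A @ x                         # residual
    z = M_inv * r                         # preconditioned residual
    p = z.copy()                          # search (step) direction
    rz = r @ z
    for k in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)             # step length
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = M_inv * r
        rz_new = r @ z
        beta = rz_new / rz                # correction factor
        p = z + beta * p
        rz = rz_new
    return x

# tiny SPD test system
A = sp.csr_matrix(np.array([[4., 1., 0.], [1., 3., 1.], [0., 1., 2.]]))
b = np.array([1., 2., 3.])
print(pcg(A, b))
```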

2.6 Multi-frontal Massively Parallel Solver (MUMPS)

MUMPS [Amestoy et al., 2000] is a package for solving systems of linear equations of the form Ax = b. Unlike with PCG, the stiffness matrix A may be a square sparse matrix that is unsymmetric, symmetric positive definite, or general symmetric. MUMPS uses a multi-frontal technique, which is a direct method based on either the LU or the LDL^T factorization of the matrix. In the following, the main features and steps of MUMPS, taken from its user guide, are given in turn.

The main features of the MUMPS package include the solution of the transposed system, input of the matrix in assembled format (distributed or centralized) or elemental format, error analysis, iterative refinement, scaling of the original matrix, return of a Schur complement matrix, and several built-in ordering algorithms. The details of this technique can be found in its user guide. The system Ax = b is solved in three main steps:

1. Analysis. The host performs an ordering based on the symmetrized pattern A+AT, and carries out symbolic factorization. A mapping of the multifrontal computational graph is then computed, and symbolic information is transferred from the host to the other processors. Using this information, the processors estimate the memory necessary for factorization and solution.

2. Factorization. The original matrix is first distributed to processors that will participate in the numerical factorization. The numerical factorization on each frontal matrix is conducted by a master processor (determined by the analysis phase) and one or more slave processors (determined dynamically). Each processor allocates an array for contribution blocks and factors; the factors must be kept for the solution phase.

3. Solution. The right-hand side b is broadcast from the host to the other processors. These processors compute the solution x using the (distributed) factors computed during Step 2, and the solution is either assembled on the host or kept distributed on the processors.

Each of these phases can be called separately, and several instances of MUMPS can be handled simultaneously. MUMPS allows the host processor to participate in computations during the factorization and solve phases, just like any other processor. For both the symmetric and the unsymmetric algorithms used in the code, MUMPS has chosen a fully asynchronous approach with dynamic scheduling of the computational tasks. Asynchronous communication is used to enable overlapping between communication and computation. Dynamic scheduling was initially chosen to accommodate numerical pivoting in the factorization. Another important reason for this choice is that, with dynamic scheduling, the algorithm can adapt itself at execution time to remap work and data to more appropriate processors. In fact, MUMPS combines the main features of the static and dynamic approaches: it uses the estimation obtained during the analysis to map some of the main computational tasks, while the other tasks are dynamically scheduled at execution time. The main data structures (the original matrix and the factors) are similarly partially mapped according to the analysis phase.

MUMPS distributes the work tasks among the processors, but an identified processor (the host) is required to perform most of the analysis phase, to distribute the incoming matrix to the other processors (slaves) in the case where the matrix is centralized, and to collect the solution.

2.7 Parallel Computing of FEM

In the parallel computing of the FEM, the computational domain is first partitioned into a number of non-overlapping sub-domains equal to the number of processors. One processor is assigned to the computation of each sub-domain, and communication between processors is carried out whenever needed, e.g., [Farhat et al., 1995], and [Hodgson and Jimack, 1993]. Fig. 2.4 shows three different kinds of partitioning methods commonly used in graph-partitioning techniques [Saad, 2003]. We use element-based partitioning to partition the domain, in which no element is split between two sub-domains, e.g., [Gullerud and Dodds, 2001], and [Thiagarajan and Aravamuthan, 2002].

Different solution approaches are used for the different types of domain decomposition [Saad, 2003]. When the domain is partitioned into a set of overlapping sub-domains, overlapping Schwarz methods are used for the solution of the system. On the other hand, iterative sub-structuring methods are used when the domain is partitioned into a set of non-overlapping sub-domains. There are two typical non-overlapping domain decomposition methods used in parallel computing of the FEM: the subdomain-by-subdomain (SBS) method and the Schur complement method. We use the SBS approach for parallelizing the FEM, in which the global stiffness matrix is divided into a number of partitioned matrices, each stored on its corresponding processor. The PCG method is then performed on an SBS basis. The details of the SBS method are described in the following.

Before introducing the SBS method, the graph-partitioning library METIS [Karypis and Kumar, 1995] is first used to decompose the computational domain Ω into p non-overlapping sub-domains Ω_i (see Fig. 2.5).
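As an illustration of this decomposition step, the sketch below partitions a toy element-adjacency graph into p parts. It calls METIS through the pymetis Python wrapper, which is an assumption made here for illustration; the thesis calls the METIS library directly from its own code.

```python
import pymetis  # Python wrapper around the METIS graph-partitioning library (assumed available)

# Element-adjacency graph: element i lists the elements it shares a face with.
# This tiny 6-element chain is a placeholder for a real unstructured-mesh graph.
adjacency = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]

n_parts = 2
edge_cuts, membership = pymetis.part_graph(n_parts, adjacency=adjacency)

# membership[e] is the sub-domain (processor) that owns element e;
# every element belongs to exactly one sub-domain, so no element is split.
print(edge_cuts, membership)
```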

Eq. (2-34) below is the standard block-arrowhead structure of the stiffness matrix usually formed by the SBS method:

$\begin{bmatrix} A_{1} & & & B_{1} \\ & A_{2} & & B_{2} \\ & & \ddots & \vdots \\ B_{1}^{T} & B_{2}^{T} & \cdots & A_{s} \end{bmatrix} \begin{Bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{s} \end{Bmatrix} = \begin{Bmatrix} b_{1} \\ b_{2} \\ \vdots \\ b_{s} \end{Bmatrix}$ (2-34)

This kind of matrix structure stems from a special re-ordering of the nodes on each sub-domain used by the SBS method. For each sub-domain, the re-ordering rule is that the internal nodes are numbered first and the interfacial nodes last. The internal-node matrices ($A_{i}$, $B_{i}$, $B_{i}^{T}$, $x_{i}$, and $b_{i}$) receive contributions entirely from their corresponding sub-domain, while the interfacial matrices ($A_{s}$, $B_{s}$, $B_{s}^{T}$, $x_{s}$, and $b_{s}$) receive contributions from all sub-domains. Since all these matrices are assembled concurrently on each processor, the PCG algorithm using the SBS method is given as follows, e.g., [Saad, 2003], [Gullerud and Dodds, 2001], [Khan and Topping, 1996], and [Jimack and Touheed, 1997].
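The node re-ordering rule behind Eq. (2-34) can be sketched as follows: within each sub-domain, internal nodes are numbered first and interfacial nodes last, so the local stiffness matrix naturally splits into internal and interfacial blocks. The node lists below are hypothetical.

```python
def sbs_reorder(local_nodes, interfacial):
    """Return a permutation of a sub-domain's nodes: internal first, interfacial last."""
    internal = [n for n in local_nodes if n not in interfacial]
    boundary = [n for n in local_nodes if n in interfacial]
    return internal + boundary

local_nodes = [10, 11, 12, 13, 14, 15]
interfacial = {11, 14}                     # nodes shared with neighbouring sub-domains
print(sbs_reorder(local_nodes, interfacial))   # [10, 12, 13, 15, 11, 14]

# With this ordering, the local stiffness matrix splits into the blocks
#   [[A_i,   B_i     ],
#    [B_i^T, A_s^(i) ]]
# where A_i couples internal nodes only and A_s^(i) is this sub-domain's
# contribution to the global interfacial block A_s.
```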

Algorithm 2. Parallel Preconditioned Conjugate Gradient Method


This algorithm shows that the PCG with the SBS method is performed concurrently on each processor, while communication is required in two matrix operations: the inner product and the update [Jimack and Touheed, 1997]. The inner-product subroutine calculates the inner product of two distributed vectors across the processors and then requires a single global communication to provide each processor with a copy of the sum. Since this sum is calculated
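The distributed inner product described above can be sketched with mpi4py: each processor forms a partial dot product over the vector entries it owns, and a single global all-reduce gives every processor a copy of the sum. The ownership mask (used to avoid double-counting shared interfacial entries) is an assumed bookkeeping detail, not necessarily how the thesis code handles it.

```python
import numpy as np
from mpi4py import MPI

def distributed_inner_product(u_local, v_local, owned_mask, comm=MPI.COMM_WORLD):
    """Inner product of two distributed vectors with a single global communication.

    u_local, v_local : this processor's portion of the vectors
    owned_mask       : boolean array, True for entries this processor is responsible
                       for, so interfacial entries are counted exactly once
    """
    partial = np.dot(u_local[owned_mask], v_local[owned_mask])
    return comm.allreduce(partial, op=MPI.SUM)   # one global sum, result on all ranks

# Run with e.g. `mpirun -np 4 python this_script.py`; each rank holds its own slice.
```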