

2.11 Some Remarks

From the benchmark validations, the developed 3D parallel Poisson's equation solver is verified successfully, with and without the source term, considering both Neumann and Dirichlet boundary conditions. The 3D parallel Poisson solver coupled with PAMR is used to simulate a typical CNT emitter, and the results show good resolution of the potential distribution around the very narrow emitter tip. Such an accurate potential distribution usually plays an important role in correctly predicting the emission current from the CNT emitter tip. The parallel speedup performance reaches 84% efficiency at 32 processors, and a more robust preconditioner should be implemented in the near future.

Chapter 3

The Parallel Computing of Finite Element Method for Three-Dimensional Magnetostatic Field Problems

As will be shown in a later chapter, for simulating typical DC and RF magnetron plasmas there is usually a permanent magnet system behind the cathode electrode, which has to be simulated before the plasma modeling. In general, this magnetic field plays an important role in sustaining the magnetron plasma at very low temperature. Thus, the main purpose of developing a parallel magnetostatic field solver is to obtain the magnetic field induced by the permanent magnet system. Since both the finite element Galerkin weighted residual method (GWRM) and the subdomain-by-subdomain (SBS) method are introduced in detail in the previous chapter, in this chapter we are only interested in developing the parallel magnetostatic field solver using the GWRM and SBS coupled directly with PAMR. Some benchmark problems are presented to demonstrate the accuracy and applicability of the parallel magnetostatic field solver. At the end of this chapter, the parallel performance of the solver is studied and its time breakdown is analyzed.

3.1 Calculation of Magnetostatic (MS) Field

For the magnetostatic field problem, Maxwell's equations are reduced to a vector Poisson's equation, shown as Eqs. (2-6) in the previous chapter. The magnetic vector potential in Eqs. (2-6) is driven by the current density $\vec{J}$, which is expressed in terms of the magnetization with the following definition,

$\vec{J} = \nabla \times \vec{M}$  (3-1)

where $\vec{M}$ is the magnetization vector of the permanent magnet. Substituting Eqs. (3-1) into Eqs. (2-6) yields

$\nabla^{2}\vec{A} = -\mu\,\nabla\times\vec{M}$  (3-2)

In Cartesian coordinates, Eqs. (3-2) is equivalent to three scalar Poisson equations:

$\nabla^{2}A_{x} = -\mu\left(\nabla\times\vec{M}\right)_{x},\qquad \nabla^{2}A_{y} = -\mu\left(\nabla\times\vec{M}\right)_{y},\qquad \nabla^{2}A_{z} = -\mu\left(\nabla\times\vec{M}\right)_{z}$  (3-3)

where $\vec{M}$ and µ can be functions of position. In the following, the same development steps of the GWRM are applied to Eqs. (3-3). After employing the GWRM, we obtain an element equation similar to that of the previous Poisson's equation. For the x-component, over a four-node tetrahedral element, the element equation reads

$\sum_{j=1}^{4}\left(\int_{\Omega^{e}}\nabla N_{i}\cdot\nabla N_{j}\,d\Omega\right)A_{x,j} = \int_{\Omega^{e}} N_{i}\,\mu\left(\nabla\times\vec{M}\right)_{x}\,d\Omega + \oint_{\Gamma^{e}} N_{i}\,\frac{\partial A_{x}}{\partial n}\,d\Gamma$  (3-4)

where $N_{i}$ are the element shape functions.

Steps 1-4 and 6 of the GWRM are not repeated here; we only describe step 5 in detail for Eqs. (3-3).

Step 5: Evaluate the interior load term and boundary flux term of Eqs. (3-4).

The divergence theorem, in its integration-by-parts form, can be written as

$\int_{\Omega} u\,\nabla\cdot\vec{F}\,d\Omega = \oint_{\Gamma} u\,\vec{F}\cdot\hat{n}\,d\Gamma - \int_{\Omega}\nabla u\cdot\vec{F}\,d\Omega$  (3-5)

where u is an arbitrary scalar and $\vec{F}$ is an arbitrary vector. After employing Eqs. (3-5), the interior load integral of the x-component becomes

$\int_{\Omega^{e}} N_{i}\,\mu\left(\nabla\times\vec{M}\right)_{x}\,dx\,dy\,dz = \oint_{\Gamma^{e}} N_{i}\,\mu\left(\hat{n}\times\vec{M}\right)_{x}\,d\Gamma - \int_{\Omega^{e}}\mu\left(\nabla N_{i}\times\vec{M}\right)_{x}\,dx\,dy\,dz$  (3-6)

The y- and z-components are handled in the same manner. The final forms of the element equations of Eqs. (3-3), which combine the stiffness, interior load, and boundary flux contributions for the three components, are:

$\mathbf{K}^{(e)}\mathbf{A}_{x}^{(e)} = \mathbf{f}_{x}^{(e)},\qquad \mathbf{K}^{(e)}\mathbf{A}_{y}^{(e)} = \mathbf{f}_{y}^{(e)},\qquad \mathbf{K}^{(e)}\mathbf{A}_{z}^{(e)} = \mathbf{f}_{z}^{(e)}$  (3-9)

After this theoretical development, the boundary conditions are imposed on Eqs. (3-9), which are then ready to be solved using either the preconditioned conjugate gradient (PCG) method or the parallel sparse direct solver package MUMPS. For the parallel computation of Eqs. (3-9), the PCG based on the SBS method is used; the description is the same as in the previous chapter and is not repeated here for brevity.
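Since the PCG is the workhorse for Eqs. (3-9), a minimal serial sketch of a Jacobi-preconditioned conjugate gradient loop is given below for orientation. It is an illustration only (dense NumPy, no MPI) and does not reproduce the SBS-distributed implementation described in the previous chapter.

```python
# Minimal sketch of a Jacobi-preconditioned conjugate gradient (PCG) loop.
import numpy as np

def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A with Jacobi preconditioning."""
    x = np.zeros_like(b)
    r = b - A @ x                      # initial residual
    M_inv = 1.0 / np.diag(A)           # Jacobi preconditioner: inverse of diag(A)
    z = M_inv * r
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p                 # update solution
        r -= alpha * Ap                # update residual
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = M_inv * r                  # apply preconditioner
        rz_new = r @ z
        p = z + (rz_new / rz) * p      # update search direction
        rz = rz_new
    return x

# Usage on a small random SPD system (illustrative only)
n = 50
Q = np.random.rand(n, n)
A = Q @ Q.T + n * np.eye(n)            # symmetric positive definite
b = np.random.rand(n)
x = pcg(A, b)
print(np.linalg.norm(A @ x - b))       # residual should be near machine/tolerance level
```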

3.2 Coupling of PAMR with Parallel Magnetostatic Field Solver

In the previous chapter, PAMR was described and successfully coupled with the parallel Poisson's equation solver for electrostatic field problems. In this subsection, the same idea is followed to couple PAMR with the current parallel vector Poisson's equation solver for magnetostatic field problems. Therefore, almost the same development procedures are followed as those for the electrostatic field solver, as described next.

The PAMR presented in the previous section can be easily coupled to the current parallel magnetostatic field solvers, since both utilize a 3-D unstructured tetrahedral mesh and MPI for data communication. One can readily wrap up the PAMR as a library and insert it into the source code of any parallel numerical solver to be used.

However, some problems may occur due to memory conflicts between the inserted library and the numerical solver itself, which could reduce the problem size one can handle in practice. As such, a simple coupling procedure, written as a shell script (Fig. 3.1) that is standard on all Unix-like systems, can be prepared to link PAMR and the current parallel magnetostatic field solvers. In doing so, we can keep both source codes intact and without alterations. This is especially justified if only the steady state of the physical problem is sought, for which normally only a few mesh refinements are enough to obtain a fairly satisfactory solution. Thus, the total I/O time, which is proportional to the number of couplings in switching between the two codes, can be reduced to a minimum in practical applications. In addition, as shown in Fig. 3.1, after identifying those cells that require refinement, and before PAMR is invoked, the domain is repartitioned based on the new mesh refinement requirements. For example, the weight factors of the cells (vertices in graph theory) are set to eight for those cells flagged to be refined; otherwise, they are set to unity. With this distribution of weight factors as the input to ParMetis, an approximate (but rather good) load balance can be achieved in the PAMR module. The magnetostatic field solvers then read in the refined mesh output by the PAMR module and partition the new mesh with equal weight factors for all cells, so that the workload is balanced in the magnetostatic field solvers. A sketch of such a coupling script is given below.
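The following is a hypothetical rendering of such a coupling driver. The executable and file names (ms_solver, pamr, mesh_*.dat) and the number of refinement passes are illustrative assumptions, not the actual scripts of the thesis.

```python
# Sketch of the solve/refine coupling loop of Fig. 3.1 (names are illustrative).
N_LEVELS = 6      # number of solve/refine passes (assumed)
N_PROCS = 32      # MPI processes

def run(cmd):
    # In the real shell script this would be an mpirun invocation; here we only print it.
    print("would run:", " ".join(cmd))

for level in range(N_LEVELS):
    # 1. Solve the magnetostatic field on the current mesh and flag cells to refine.
    run(["mpirun", "-np", str(N_PROCS), "./ms_solver", f"mesh_{level}.dat"])
    # 2. Refine the flagged cells with PAMR; flagged cells are given a ParMETIS
    #    vertex weight of 8 and all others a weight of 1, as described above.
    run(["mpirun", "-np", str(N_PROCS), "./pamr", f"mesh_{level}.dat", f"mesh_{level+1}.dat"])
```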

The current parallel magnetostatic field solvers, along with PAMR, are implemented and tested on a PC-cluster system with the Linux OS at the National Center for High-Performance Computing in Taiwan (64 nodes, dual processors and 8 GB RAM per node). The standard Message Passing Interface (MPI) is used for data communication. It is thus expected that the current parallel code will be highly portable among distributed-memory parallel machines running the Linux (or an equivalent) operating system.

3.3 Validation and Parallel Performance of the Magnetostatic Field Solver

Since the parallel computing of the vector Poisson's equation solver uses the same techniques as the Poisson's equation solver presented in the previous chapter, the only difference in the computing procedure is that the Poisson's equation has to be solved three times in order to obtain the three components of the vector potential. Therefore, Fig. 2.9 also represents the procedure of the parallel vector Poisson's equation solver, and the details of Fig. 2.9 can be found in the previous chapter.
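To illustrate this point, the sketch below shows the three component solves sharing one discretized operator; the array names are hypothetical and dense algebra is used only for brevity, whereas the actual solver uses the parallel PCG described above.

```python
# The three scalar Poisson problems share one stiffness matrix K; only the load
# vector changes per component of the vector potential.
import numpy as np

def solve_vector_potential(K, f_x, f_y, f_z):
    # The same matrix (and, in practice, the same preconditioner setup) is
    # reused for all three right-hand sides.
    return [np.linalg.solve(K, f) for f in (f_x, f_y, f_z)]

K = np.array([[4.0, -1.0], [-1.0, 4.0]])            # illustrative 2x2 system
A_x, A_y, A_z = solve_vector_potential(K,
                                        np.array([1.0, 0.0]),
                                        np.array([0.0, 1.0]),
                                        np.array([1.0, 1.0]))
print(A_x, A_y, A_z)
```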

Validation of the parallel magnetostatic field solver

A permanent magnet array made up of eight segments is used as the benchmark problem for validating the parallel magnetostatic field solver. Fig. 3.2 shows a cross section of the permanent magnet array. Experimental data and an analytical solution are available for this problem, e.g., [Leupold et al., 1993], [Leupold et al., 2000], and [Halbach, 1980]. The experimental data show that the magnetic flux density in the central air gap of the permanent magnet array is about 80-90% of the ideal value of the analytic prediction. The analytic solution for this system is given as follows:

$B = B_{R}\,\ln\!\left(r_{o}/r_{i}\right)$  (3-10)

where $B_R$ is the remanence of the permanent magnet, and $r_o$ and $r_i$ are the outer and inner radii of the permanent magnet array, respectively.

Fig. 3.3 shows the mesh distribution of the permanent magnet array made up of eight segments using 0.63 million elements (~108,840 nodes). In this simulation, the remanence B_R of the permanent magnet material is 1 Tesla, the relative permeability is 1, the inner radius is 1 inch, and the outer radius is 2.74 inches. After substituting these parameters into Eqs. (3-10), the calculated magnetic flux density in the air gap is expected to be about 1 Tesla. Fig. 3.4 shows that the simulated magnitude of the magnetic flux density of the permanent magnet array at the center of the air gap is about 0.87 Tesla, which agrees with the previous experimental data. Fig. 3.5 shows the mesh distribution of the permanent magnet array made up of eight segments using PAMR, where the initial mesh is rather coarse (7,845 nodes) while the level-6 mesh is very fine (108,840 nodes) around the permanent magnet array. Table 4 lists the number of nodes/elements and the corresponding maximal magnitudes of the magnetic flux density at the center of the air gap at different levels of mesh refinement, with ε_ref set to 0.08 based on the variation of the vector potential. It also shows that a better solution is obtained after using PAMR.
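A quick arithmetic check of Eqs. (3-10) with the stated parameters (purely illustrative, not part of the solver) confirms the numbers quoted above:

```python
# Eq. (3-10) with B_R = 1 T, r_i = 1 inch, r_o = 2.74 inch, compared with the
# simulated 0.87 T at the center of the air gap.
import math

B_R, r_o, r_i = 1.0, 2.74, 1.0
B_ideal = B_R * math.log(r_o / r_i)   # ~1.008 T, i.e. about 1 T
ratio = 0.87 / B_ideal                # ~0.86, consistent with the 80-90% seen experimentally
print(round(B_ideal, 3), round(ratio, 2))
```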

Parallel performance of the magnetostatic field solver

The previous validation case is also employed to test the parallel performance of the current parallel magnetostatic field solver. Fig. 3.6 illustrates the parallel speedup as a function of the number of processors up to 32. The corresponding time breakdown of the various components of the solver along with the speedup is similar to that of the parallel electrostatic solver, since they use the same parallel CG matrix solver, and the detailed analysis of this time breakdown is not repeated here. The runtime using a single processor is about 259.86 seconds, while it is reduced to 10.82 seconds using 32 processors, which results in a parallel speedup of ~24.0.
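The quoted numbers are related as follows (a trivial check, shown only to make the efficiency figure in Section 3.4 explicit):

```python
# Speedup and parallel efficiency from the measured runtimes.
t1, t32, nprocs = 259.86, 10.82, 32
speedup = t1 / t32                # ~24.0
efficiency = speedup / nprocs     # ~0.75, i.e. the 75% quoted in Section 3.4
print(round(speedup, 2), round(efficiency, 2))
```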

3.4 Some Remarks

Following the development steps of the previous chapter, the parallel vector Poisson's equation solver is developed and verified successfully using a typical permanent magnet system. When our simulation results are compared with those from commercial software, they show a notably better resolution of the magnetic field distribution, since the parallel PAMR module is coupled into the parallel vector Poisson's equation solver. The parallel speedup performance reaches 75% efficiency at 32 processors, and a more robust preconditioner should also be implemented in the near future.

Chapter 4

An Overview of the PIC-FEM Method Using an Unstructured Mesh and Its Parallel Implementation

This chapter begins with an overview of the conventional Particle-In-Cell method with Monte-Carlo collisions (PIC-MCC), which is a well-known kinetic approach for plasma simulation. The conventional PIC-MCC is less flexible in simulating devices with complicated geometric shapes when using a structured mesh. Therefore, the first main contribution of this chapter is to develop a PIC-MCC code specifically for an unstructured tetrahedral mesh, named the PIC-FEM code.

However, the PIC-FEM code with a large number of particles makes the computation very expensive. Thus, the second main contribution of this chapter is to accelerate the code using parallel computing techniques. In the parallel computing of the PIC-FEM method, the computational domain is first decomposed into a number of sub-domains equal to the number of processors via the multi-level graph-partitioning library METIS. A dynamic domain decomposition (DDD) technique is employed to alleviate the load imbalance among sub-domains. Two benchmark problems are presented to demonstrate the accuracy and applicability of the parallel PIC-FEM code. At the end of this chapter, the parallel performance of the parallel PIC-FEM code is presented.

4.1 General Description of PIC-MCC Method

The PIC-MCC method is a particle method coupled with Maxwell's equations. The original PIC method, without the MCC module, was developed by plasma physicists and is mainly used to simulate the motion of charged particles under electromagnetic fields. It greatly reduces the computational load of considering the N² Coulomb interactions among N particles by using charge extrapolation and force interpolation. Since PIC does not consider particle collisions, it can be represented by the collisionless Boltzmann equation, i.e., Vlasov's equation [Nanbu, 2000]. More details on the PIC method can be found in Birdsall and Langdon's book [Birdsall and Langdon, 1991], Hockney and Eastwood's book [Hockney and Eastwood, 1988], and Birdsall's review [Birdsall, 1991]. In the 1980s, the Monte-Carlo collision method was included in the PIC method for modeling self-sustained plasma discharges [Birdsall, 1991]. In simulating a plasma discharge using PIC-MCC, the cell size should be a fraction of the Debye length in order to resolve the plasma sheath. Moreover, in order to resolve the plasma oscillation, the electron time step must be about one order of magnitude smaller than 1/(plasma frequency).
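To make these two constraints concrete, the sketch below evaluates the Debye length and electron plasma frequency for assumed values n_e = 10^16 m^-3 and T_e = 3 eV; these numbers are illustrative only and are not taken from the thesis.

```python
# Cell-size and time-step constraints for PIC-MCC with assumed plasma parameters.
import math

eps0 = 8.854e-12      # vacuum permittivity, F/m
e    = 1.602e-19      # elementary charge, C
m_e  = 9.109e-31      # electron mass, kg

n_e  = 1e16           # electron density, m^-3 (assumed)
T_eV = 3.0            # electron temperature, eV (assumed)

debye = math.sqrt(eps0 * T_eV * e / (n_e * e**2))   # Debye length, m
w_pe  = math.sqrt(n_e * e**2 / (eps0 * m_e))        # electron plasma (angular) frequency, rad/s

dx_max = debye          # cell size should be a fraction of the Debye length
dt_max = 0.1 / w_pe     # electron time step about an order below 1/omega_pe
print(f"lambda_D ~ {debye:.2e} m, omega_pe ~ {w_pe:.2e} rad/s, dt_e < {dt_max:.2e} s")
```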

4.2 The PIC-MCC Procedures

The conventional flowchart of the PIC-MCC scheme is shown in Fig. 4.1. It shows that, after initialization, one time step consists of eight stages as follows:

1. Charge extrapolation to the grid points
2. Calculation of the electromagnetic fields
3. Force interpolation to each particle
4. Calculation of the motion of each particle
5. Calculation of the Monte-Carlo collisions of each particle
6. Indexing (or sorting) all the particles
7. Calculation of particle reduction
8. Sampling the particles within cells to determine the macroscopic quantities

In order to significantly speed up the simulation, a sub-cycling scheme [Brackbill and Cohen, 1985] is used, since ions move very little within one electron time step because they are much heavier than electrons. In this scheme, after repeating the calculation of stages 1-8 ten times with the electron time step ∆t_e, we calculate the stages once with the ion time step ∆t_i, thus advancing the system time by ∆t_i (a structural sketch of this loop is given below). Since step 2 was introduced in Chapter 2, the other steps are described in detail in the following subsections.
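The structural sketch of the sub-cycling loop referred to above is given here; the stage functions are placeholders standing in for the thesis implementation, not actual solver code.

```python
# Structural sketch of one ion step of the PIC-MCC loop with sub-cycling.
N_SUBCYCLES = 10   # ratio dt_i / dt_e used in the text

STAGES = ["charge extrapolation", "field solve", "force interpolation",
          "particle push", "MCC", "indexing", "particle reduction", "sampling"]

def stage(name, species, dt):
    # Placeholder for stages 1-8; the real code operates on the particle arrays.
    pass

def pic_mcc_step(species, dt):
    for name in STAGES:
        stage(name, species, dt)

def advance_one_ion_step(dt_e, dt_i):
    for _ in range(N_SUBCYCLES):
        pic_mcc_step("electrons", dt_e)    # stages 1-8 repeated with dt_e
    pic_mcc_step("ions", dt_i)             # then once with dt_i

advance_one_ion_step(dt_e=1e-11, dt_i=1e-10)   # illustrative time steps
```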

4.2.1 Initialization

Before beginning stages 1-8, the geometry data and simulation conditions should be read into the code. The number of simulated particles in each cell is calculated according to the particle number density and the cell volume. Since the initial plasma is assumed to be quasi-neutral, equal numbers of simulated electrons and ions are loaded. The particle velocities are assigned to each particle based on a Maxwell-Boltzmann distribution according to the specified mean velocity and temperature. The positions of the simulated electrons are randomly allocated within the cells, and the same positions are assigned to the simulated ions.
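A minimal sketch of the velocity loading step is given below, assuming a drift-free Maxwell-Boltzmann distribution in which each Cartesian component is Gaussian with thermal speed sqrt(kT/m); the temperature and mass values are illustrative only.

```python
# Load particle velocities from a Maxwell-Boltzmann distribution at temperature T.
import numpy as np

def sample_maxwellian(n_particles, temperature_eV, mass_kg, rng=np.random.default_rng()):
    e = 1.602e-19
    v_th = np.sqrt(temperature_eV * e / mass_kg)          # thermal speed per component
    return rng.normal(0.0, v_th, size=(n_particles, 3))   # (vx, vy, vz) for each particle

v_electrons = sample_maxwellian(1000, temperature_eV=2.0, mass_kg=9.109e-31)
print(v_electrons.std(axis=0))   # each component's standard deviation ~ v_th
```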

4.2.2 Force Interpolation and Charge Extrapolation

The same weighting function should be used for force interpolation and charge extrapolation in order to eliminate the self-force and conserves momentum [Birdsall and Langdon, 1991]. For an unstructured mesh, the derived finite element volume

coordinate in previous chapter [Santi et. al., 2003] is used to as the interpolation and extrapolation weighting function. The force interpolation and charge extrapolation using volume coordinate are written in Eqs. (4-1) and (4-2), respectively.

where the subscript i represents mesh node properties and subscript k represents

particle properties.
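The following is a minimal sketch of gather/scatter with tetrahedral volume (barycentric) coordinates, i.e. the weights appearing in Eqs. (4-1) and (4-2). Node coordinates and field values are illustrative, and the weight computation shown here is one standard way to evaluate volume coordinates, not necessarily the exact routine of the thesis.

```python
# Volume-coordinate interpolation (gather) and extrapolation (scatter) on a tetrahedron.
import numpy as np

def volume_coords(nodes, p):
    """Barycentric coordinates of point p inside the tetrahedron 'nodes' (4x3 array)."""
    # Solve [1, x, y, z]^T = M L for the four weights L_i (they sum to one).
    M = np.vstack([np.ones(4), nodes.T])          # 4x4 system matrix
    rhs = np.array([1.0, p[0], p[1], p[2]])
    return np.linalg.solve(M, rhs)

nodes = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
p = np.array([0.25, 0.25, 0.25])
L = volume_coords(nodes, p)                       # weights N_i at the particle position

F_nodes = np.array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.], [1., 1., 1.]])
F_at_p = L @ F_nodes          # Eq. (4-1): force interpolated to the particle
q_nodes = L * 1.0             # Eq. (4-2): a unit charge scattered to the 4 nodes
print(L, F_at_p, q_nodes)
```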

4.2.3 Equations of Motion

The leap-frog scheme [Birdsall and Langdon, 1991] is used for solving the equations of motion (see Fig. 4.2). This scheme is second-order accurate in time through the use of a velocity that is staggered at half time steps relative to the particle position. In addition, it requires minimal storage of particle information, since only the updated particle velocity and position are stored. Eqs. (4-3) and (4-4) are the particle velocity and position derivatives in finite difference form with the leap-frog scheme, respectively:

$\dfrac{\vec{v}^{\,n+1/2}-\vec{v}^{\,n-1/2}}{\Delta t} = \dfrac{\vec{F}^{\,n}}{m}$  (4-3)

$\dfrac{\vec{x}^{\,n+1}-\vec{x}^{\,n}}{\Delta t} = \vec{v}^{\,n+1/2}$  (4-4)

where n is the time step counter. The force term on the RHS of Eqs. (4-3) is the Lorentz force; formulated with the leap-frog scheme, it yields

$\dfrac{\vec{v}^{\,n+1/2}-\vec{v}^{\,n-1/2}}{\Delta t} = \dfrac{q}{m}\left(\vec{E}^{\,n} + \dfrac{\vec{v}^{\,n+1/2}+\vec{v}^{\,n-1/2}}{2}\times\vec{B}^{\,n}\right)$  (4-5)

Eqs. (4-5) is then ready to be solved using Boris's method [Birdsall and Langdon, 1991]. In this method, the magnetic and electric fields are separated completely after introducing two new variables, $\vec{v}^{\,-}$ and $\vec{v}^{\,+}$, into Eqs. (4-5). They are:

$\vec{v}^{\,-} = \vec{v}^{\,n-1/2} + \dfrac{q\vec{E}^{\,n}}{m}\dfrac{\Delta t}{2}$  (4-6)

$\vec{v}^{\,+} = \vec{v}^{\,n+1/2} - \dfrac{q\vec{E}^{\,n}}{m}\dfrac{\Delta t}{2}$  (4-7)

After substituting Eqs. (4-6) and (4-7) into Eqs. (4-5), Eqs. (4-5) becomes

$\dfrac{\vec{v}^{\,+}-\vec{v}^{\,-}}{\Delta t} = \dfrac{q}{2m}\left(\vec{v}^{\,+}+\vec{v}^{\,-}\right)\times\vec{B}^{\,n}$  (4-8)

Therefore, from Eqs. (4-6) to (4-8), the main idea of Boris's method can be explained as follows: $\vec{v}^{\,-}$ is obtained by adding half of the electric impulse to the initial velocity via Eqs. (4-6); then $\vec{v}^{\,+}$ is obtained by rotating $\vec{v}^{\,-}$ with the magnetic field via Eqs. (4-8); finally, the updated velocity $\vec{v}^{\,n+1/2}$ is obtained by adding the other half of the electric impulse to $\vec{v}^{\,+}$ via Eqs. (4-7). The details of this method can also be found in Birdsall's book [Birdsall and Langdon, 1991].
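A minimal single-particle sketch of this half-kick / rotate / half-kick splitting is given below; the variable names and the sample field values are illustrative, not thesis data.

```python
# Standard Boris velocity push for a single particle in fields E and B.
import numpy as np

def boris_push(v, E, B, q, m, dt):
    """Advance velocity v (3-vector) by one step dt under fields E, B (3-vectors)."""
    qmdt2 = q * dt / (2.0 * m)
    v_minus = v + qmdt2 * E                      # first half electric impulse
    t = qmdt2 * B                                # rotation vector
    s = 2.0 * t / (1.0 + np.dot(t, t))
    v_prime = v_minus + np.cross(v_minus, t)
    v_plus = v_minus + np.cross(v_prime, s)      # rotation about B (pure magnetic step)
    return v_plus + qmdt2 * E                    # second half electric impulse

v_new = boris_push(np.array([1e5, 0., 0.]), np.array([0., 0., 1e3]),
                   np.array([0., 0., 0.01]), q=-1.602e-19, m=9.109e-31, dt=1e-11)
print(v_new)
```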

4.2.3.1 Particle Ray Tracing

There are two different methods for particle ray tracing on an unstructured mesh: cell-by-cell particle tracking and coordinate-based particle tracking. Cell-by-cell particle tracking takes advantage of the cell connectivity provided by the unstructured mesh data [Tseng, 2005]. The first step of the particle tracing is to determine whether the particle stays in or leaves the current cell. If the particle leaves, the second step is to determine the intersection position on the intersecting face. The further journey of the particle depends on the face condition: if it is a normal face between cells, the particle continues its movement until the time step ends; if the intersecting face is an I/O (inlet/outlet) or other specified boundary, the particle is removed; otherwise, the interaction is processed according to the wall boundary conditions. More details of this particle tracking can be found elsewhere [Tseng, 2005]; a minimal sketch of the stay-or-leave test is given below.
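The sketch below assumes the exit face is identified from the most negative volume coordinate of the particle's end position, which is one common way to exploit the cell connectivity; the thesis implementation may differ in detail.

```python
# Stay-or-leave test using the volume coordinates of the particle's end position.
import numpy as np

def next_cell(L_end, neighbours):
    """L_end: volume coordinates of the end position in the current tetrahedron.
    neighbours[i]: cell adjacent to the face opposite local node i (None = boundary)."""
    if np.all(L_end >= 0.0):
        return "stay", None                     # all weights non-negative: still inside
    exit_node = int(np.argmin(L_end))           # most negative coordinate -> exit face
    return "cross", neighbours[exit_node]       # None here means an I/O or wall face

print(next_cell(np.array([0.5, 0.3, 0.3, -0.1]), neighbours=[7, 8, None, 9]))
```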

4.2.4 Monte Carlo Collision Algorithm (MCC)

In order to simulate a self-sustained and self-consistent plasma discharge, the dominant reactions between species should be considered properly. Since we use argon gas for the plasma discharge, the simulated species are electrons (e), ions (Ar⁺), and ground-state atoms (Ar). In general, there are three different collision types for these plasma species (see Fig. 4.3) [Nanbu, 2000]: short-range collisions between unlike particles, short-range collisions between like particles, and Coulomb collisions. For low-temperature plasmas, the plasma density is usually less than 10¹⁷ m⁻³, so the Coulomb collisions are negligible. Moreover, since the argon gas is assumed to be in equilibrium and at rest, short-range collisions between like particles are also neglected in the current work. Therefore, a total of five reactions are considered in the argon Monte Carlo collision model, as listed below:

(1) e + Ar → e + Ar (Elastic Scattering)
(2) e + Ar → e + Ar* (Excitation)
(3) e + Ar → e + Ar⁺ + e (Ionization)
(4) Ar⁺ + Ar → Ar + Ar⁺ (Charge Exchange)
(5) Ar⁺ + Ar → Ar⁺ + Ar (Elastic Scattering)

In general, there are two methods for treating the e-Ar and Ar⁺-Ar collisions: the null-collision method and Nanbu's method. Since only one random number is used in Nanbu's method [Vahedi and Surendra, 1995], the collision computation becomes more efficient than with the typical null-collision method, and Nanbu's method is very simple and straightforward in treating such collisions. For treating the e-Ar collision using this method, the total collision probability $P_T$ is written as

$P_{T} = \sum_{k=1}^{K} P_{k}$  (4-9)

where $P_k$ is the probability of the kth collisional event,

$P_{k} = N_{g}\,\sigma_{k}(\varepsilon)\,v_{e}\,\Delta t$  (4-10)

with $N_g$ the number density of the background gas, $\sigma_k$ the cross section of the kth event, $\varepsilon$ the electron energy, $v_e$ the electron speed, and $\Delta t$ the time step. One then samples a candidate collisional event k from a uniform random number U as

$k = \left\lfloor K\,U \right\rfloor + 1$  (4-11)

and evaluates $P_k$ at the energy $\varepsilon$ of the electron. The kth collisional event occurs when $U > (k/K) - P_k$; otherwise, no collision occurs (as shown in Fig. 4.4). This method is also used for treating the Ar⁺-Ar collisions. In the following, each collisional event in the electron-molecule and ion-molecule collisions is described briefly. The more detailed theory can be found in Ref. [Vahedi and Surendra, 1995]. A sketch of this event-selection step is given below.
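The following is a minimal sketch of the event selection, following Eqs. (4-9)-(4-11) as reconstructed above; the probability values in the usage line are illustrative only.

```python
# Nanbu-style collision event selection with a single random number per particle.
import numpy as np

def nanbu_select(P, rng=np.random.default_rng()):
    """P: array of per-event probabilities P_k for this particle and time step."""
    K = len(P)
    U = rng.random()
    k = int(np.floor(K * U)) + 1       # candidate event, Eq. (4-11), 1-based
    if U > k / K - P[k - 1]:
        return k                       # event k occurs
    return None                        # no collision this time step

print(nanbu_select(np.array([0.01, 0.005, 0.002])))
```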

4.2.4.1 Electron-Molecule Collision

The electron-molecule collision cross sections are the same as those used by Nanbu [Nanbu, 2000], as shown in Fig. 4.5. The only difference is that we averaged 26 elastic collisions into one in order to fit the cross-section data more easily. When an elastic collision occurs, the incident electron scatters through an angle χ, which is computed following the formulation in Ref. [Vahedi and Surendra, 1995]:

$\cos\chi = \dfrac{2 + \varepsilon - 2\left(1+\varepsilon\right)^{R}}{\varepsilon}$  (4-12)

where R is a random number and ε is the energy of the incident electron. Eqs. (4-12) also holds for determining the electron scattering angle for all types of electron-neutral collisions. The azimuthal scattering angle φ is uniformly distributed on the interval [0, 2π] and is determined by

$\phi = 2\pi R$  (4-13)

where R is a random number. Once χ and φ are obtained, the direction of the scattered velocity (as shown in Fig. 4.6) is determined by [Vahedi and Surendra, 1995]

$\hat{v}_{scat} = \hat{v}_{inc}\cos\chi + \hat{v}_{inc}\times\hat{i}\;\dfrac{\sin\chi\,\sin\phi}{\sin\theta} + \hat{v}_{inc}\times\left(\hat{i}\times\hat{v}_{inc}\right)\dfrac{\sin\chi\,\cos\phi}{\sin\theta}$  (4-14)

where θ is given by $\cos\theta = \hat{v}_{inc}\cdot\hat{i}$, and $\hat{v}_{inc}$ and $\hat{v}_{scat}$ are unit vectors parallel to the incident and scattered velocities, respectively. The scattered velocity components can then be determined by taking the projections of $\hat{v}_{scat}$ on the coordinate axes. In an excitation collision, the incident electron loses the excitation threshold energy of 11.55 eV and is scattered through an angle χ determined by Eqs. (4-12).
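A minimal sketch of the post-collision direction update following Eqs. (4-12)-(4-14) is given below; the choice of the reference unit vector along x and the small-angle guard on sin θ are assumptions of this sketch, not prescriptions from the thesis.

```python
# Sample chi and phi, then rotate the incident unit vector to the scattered direction.
import numpy as np

def scatter_direction(v_inc_unit, energy_eV, rng=np.random.default_rng()):
    eps = energy_eV
    R1, R2 = rng.random(), rng.random()
    cos_chi = (2.0 + eps - 2.0 * (1.0 + eps) ** R1) / eps        # Eq. (4-12)
    sin_chi = np.sqrt(max(0.0, 1.0 - cos_chi ** 2))
    phi = 2.0 * np.pi * R2                                       # Eq. (4-13)

    i_hat = np.array([1.0, 0.0, 0.0])                            # reference direction (assumed)
    cos_theta = np.dot(v_inc_unit, i_hat)
    sin_theta = np.sqrt(max(1e-12, 1.0 - cos_theta ** 2))        # guard against sin(theta) ~ 0
    term1 = v_inc_unit * cos_chi
    term2 = np.cross(v_inc_unit, i_hat) * sin_chi * np.sin(phi) / sin_theta
    term3 = np.cross(v_inc_unit, np.cross(i_hat, v_inc_unit)) * sin_chi * np.cos(phi) / sin_theta
    return term1 + term2 + term3                                 # Eq. (4-14)

v_scat = scatter_direction(np.array([0.0, 1.0, 0.0]), energy_eV=10.0)
print(v_scat, np.linalg.norm(v_scat))   # unit length up to rounding
```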
