

3.3 Validation and Parallel Performance of the Magnetostatic Field Solver with the Parallel Vector Poisson's Equation Solver

The parallel computing techniques for the vector Poisson's equation solver are the same as those for the Poisson's equation solver presented in the previous chapter; the only difference in the computing procedure is that Poisson's equation has to be solved three times in order to obtain the three components of the vector potential.

Therefore, Fig. 2.9 is also used to represent the procedure of the parallel vector Poisson's equation solver, and the details of Fig. 2.9 can be found in the previous chapter.

Validation of the parallel magnetostatic field solver

A permanent magnet array made up of eight segments is used as the benchmark problem for validating the parallel magnetostatic field solver. Fig. 3.2 shows a cross section of the permanent magnet array. Experimental data and an analytical solution are available for this problem, e.g., [Leupold et al., 1993], [Leupold et al., 2000], and [Halbach, 1980]. The experimental data show that the magnetic flux density in the center air gap of the permanent magnet array is about 80-90% of the ideal value of the analytical prediction. The analytical solution for this system is given as follows:

B = B_R ln(r_o / r_i)    (3-10)

where B_R is the remanence of the permanent magnet, and r_o and r_i are the outer and inner radii of the permanent magnet array, respectively.

Fig. 3.3 shows the mesh distribution of the permanent magnet array made up of eight segments using 0.63 million elements (~108,840 nodes). In this simulation, the remanence B_R of the permanent magnet material is 1 Tesla, the relative permeability is 1, the inner radius is 1 inch, and the outer radius is 2.74 inches. After substituting these parameters into Eq. (3-10), the magnetic flux density in the air gap is expected to be about 1 Tesla. Fig. 3.4 shows that the simulated magnitude of the magnetic flux density at the center of the air gap is about 0.87 Tesla, which agrees with the experimental data mentioned above. Fig. 3.5 shows the mesh distribution of the permanent magnet array made up of eight segments using PAMR, where the initial mesh is rather coarse (7,845 nodes), while the level-6 mesh is very fine (108,840 nodes) around the permanent magnet array. Table 4 lists the number of nodes/elements and the corresponding maximal magnitudes of the magnetic flux density at the center of the air gap at different levels of mesh refinement, with ε_ref set to 0.08 based on the variation of the vector potential. It also shows that a better solution is obtained after using PAMR.
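As a quick cross-check of these numbers, the short sketch below evaluates Eq. (3-10) with the stated parameters and compares the ideal field with the simulated 0.87 Tesla; the variable names are illustrative only.

```python
import math

B_R = 1.0      # remanence of the permanent magnet [T]
r_i = 1.0      # inner radius [inch]
r_o = 2.74     # outer radius [inch]

B_ideal = B_R * math.log(r_o / r_i)   # Eq. (3-10), ~1.008 T
B_sim = 0.87                          # simulated flux density at the gap center [T]

print(f"ideal gap field : {B_ideal:.3f} T")
print(f"simulated field : {B_sim:.2f} T ({B_sim / B_ideal:.0%} of ideal)")
```

The ratio of roughly 86% falls inside the 80-90% range reported in the experimental references.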

Parallel performance of the magnetostatic field solver

The previous validation case is also employed to test the parallel performance of the current parallel magnetostatic field solver. Fig. 3.6 illustrates the parallel speedup as a function of the number of processors, up to 32. The corresponding time breakdown of the various components of the solver along with the speedup is similar to that of the parallel electrostatic solver, since they use the same parallel CG matrix solver, and the detailed analysis of this time breakdown is not repeated here. The runtime using a single processor is about 259.86 seconds, while it is reduced to 10.82 seconds using 32 processors, which corresponds to a parallel speedup of about 24.
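The quoted speedup, and the parallel efficiency cited in the remarks below, follow directly from the measured runtimes; a minimal check:

```python
t_serial = 259.86    # runtime on 1 processor [s]
t_parallel = 10.82   # runtime on 32 processors [s]
n_proc = 32

speedup = t_serial / t_parallel    # ~24.0
efficiency = speedup / n_proc      # ~0.75

print(f"speedup    : {speedup:.2f}")
print(f"efficiency : {efficiency:.0%}")
```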

3.4 Some Remarks

Following the development steps in the previous chapter, the parallel vector Poisson's equation solver has been developed and verified successfully using a typical permanent magnet system. When our simulation results are compared with those from commercial software, they show a better resolution of the magnetic field distribution, since the parallel PAMR module is coupled into the parallel vector Poisson's equation solver. The parallel speedup corresponds to an efficiency of 75% at 32 processors, and a more robust preconditioner should also be implemented in the near future.

Chapter 4

An Overview of the PIC-FEM Method Using an Unstructured Mesh and Its Parallel Implementation

This chapter begins with an overview of the conventional Particle-In-Cell Monte-Carlo collision method (PIC-MCC), which is a well-known kinetic approach for plasma simulation. Since the conventional PIC-MCC method is less flexible in simulating devices with complicated geometry when using a structured mesh, the first main contribution of this chapter is to develop a PIC-MCC code that uses an unstructured tetrahedral mesh, named the PIC-FEM code.

However, the PIC-FEM code with a large number of particles makes the computation very expensive. Therefore, the second main contribution of this chapter is to accelerate the code using parallel computing techniques. In the parallel computing of the PIC-FEM method, the computational domain is first decomposed into a number of sub-domains, equal to the number of processors, via the multi-level graph-partitioning library METIS. A dynamic domain decomposition (DDD) technique is employed to alleviate the load imbalance among sub-domains. Two benchmark problems are presented to demonstrate the accuracy and applicability of the parallel PIC-FEM code. At the end of this chapter, the parallel performance of the code is also presented.

4.1 General Description of PIC-MCC Method

The PIC-MCC method is a particle method coupled with Maxwell's equations. The original PIC method, without the MCC part, was developed by plasma physicists and is mainly used to simulate the motion of charged particles under electromagnetic fields. It greatly reduces the computational load of considering N² Coulomb interactions among N particles through charge extrapolation and force interpolation. Since PIC does not consider particle collisions, it can be represented by the collisionless Boltzmann equation, i.e., Vlasov's equation [Nanbu, 2000]. More details of the PIC method can be found in Birdsall and Langdon's book [Birdsall and Langdon, 1991], Hockney and Eastwood's book [Hockney and Eastwood, 1988], and Birdsall's review [Birdsall, 1991]. In the 1980s, the Monte-Carlo collision method was included in the PIC method for modeling self-sustained plasma discharges [Birdsall, 1991]. In simulating a plasma discharge using PIC-MCC, the cell size should be a fraction of the Debye length in order to resolve the plasma sheath. Moreover, in order to resolve the plasma oscillation, the electron time step must be one order of magnitude smaller than the inverse of the plasma frequency.
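These two constraints (cell size versus Debye length, electron time step versus plasma frequency) can be checked before a run. The sketch below assumes representative values of electron density and temperature, which are illustrative only and not taken from this work.

```python
import math

# physical constants (SI)
EPS0 = 8.854e-12   # vacuum permittivity [F/m]
QE = 1.602e-19     # elementary charge [C]
ME = 9.109e-31     # electron mass [kg]
KB = 1.381e-23     # Boltzmann constant [J/K]

n_e = 1.0e16       # electron number density [m^-3] (assumed)
T_e_eV = 3.0       # electron temperature [eV] (assumed)

T_e = T_e_eV * QE / KB
debye_length = math.sqrt(EPS0 * KB * T_e / (n_e * QE**2))
plasma_freq = math.sqrt(n_e * QE**2 / (EPS0 * ME))   # angular plasma frequency [rad/s]

max_cell_size = 0.5 * debye_length   # cell size should be a fraction of the Debye length
max_dt_e = 0.1 / plasma_freq         # electron time step ~ one order below 1/omega_pe

print(f"Debye length     : {debye_length:.3e} m")
print(f"max cell size    : {max_cell_size:.3e} m")
print(f"plasma frequency : {plasma_freq:.3e} rad/s")
print(f"max electron dt  : {max_dt_e:.3e} s")
```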

4.2 The PIC-MCC Procedures

The conventional flowchart of the PIC-MCC scheme is shown in Fig. 4.1. It shows that, after initialization, one time step consists of eight stages as follows:

1. Charge extrapolation to the grid points
2. Calculation of the electromagnetic fields
3. Force interpolation to each particle
4. Calculation of the motion of each particle
5. Calculation of the Monte-Carlo collisions of each particle
6. Indexing (or sorting) of all the particles
7. Particle reduction
8. Sampling of the particles within cells to determine the macroscopic quantities

In order to significantly speed up the simulation, a sub-cycling scheme [Brackbill and Cohen, 1985] is used, since ions move very slowly within one electron time step because they are much heavier than electrons.

than electron. In such scheme, after repeating the calculation of stages 1-8, 10 times for ∆ , we calculate the stages once for te ∆ , thus advancing the system time by ti ∆ . ti Since step 2 was introduced in chapter 2, the other steps will be given in detail as

follows:
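The stage routines in the sketch are placeholders only; the sub-cycling ratio of ten electron sub-steps per ion step follows the description above.

```python
# Stage implementations are omitted; the stubs below only show the call order.
def extrapolate_charge(p, m): pass       # stage 1
def solve_fields(m): pass                # stage 2 (FEM field solver, Chapter 2)
def interpolate_forces(p, m): pass       # stage 3
def push_particles(p, dt, ions): pass    # stage 4
def mcc_collisions(p, dt): pass          # stage 5
def index_particles(p, m): pass          # stage 6
def reduce_particles(p): pass            # stage 7
def sample_macroscopic(p, m): pass       # stage 8

SUBCYCLES = 10   # electron sub-steps per ion step, as used in this work

def advance_one_ion_step(particles, mesh, dt_e):
    """Advance the system by one ion time step dt_i = SUBCYCLES * dt_e."""
    for sub in range(SUBCYCLES):
        last = (sub == SUBCYCLES - 1)          # ions are pushed on the last sub-step
        extrapolate_charge(particles, mesh)
        solve_fields(mesh)
        interpolate_forces(particles, mesh)
        push_particles(particles, dt_e, ions=last)
        mcc_collisions(particles, dt_e)
        index_particles(particles, mesh)
        reduce_particles(particles)
        sample_macroscopic(particles, mesh)
```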

4.2.1 Initialization

Before beginning stages 1-8, the geometry data and simulation conditions are read into the code. The number of simulated particles in each cell is calculated according to the particle number density and the cell volume. Since the initial plasma is assumed to be quasi-neutral, the same number of simulated particles is loaded for electrons and ions. Velocities are assigned to each particle by sampling a Maxwell-Boltzmann distribution at the specified temperature. The positions of the simulated electrons are randomly allocated within the cells, and the same positions are assigned to the simulated ions.
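A minimal sketch of this initialization step, sampling thermal velocities component-wise from a Maxwell-Boltzmann distribution; the electron temperature used in the example is an illustrative assumption.

```python
import numpy as np

QE = 1.602e-19   # elementary charge [C]
ME = 9.109e-31   # electron mass [kg]

def sample_maxwellian_velocities(n_particles, temperature_eV, mass, rng):
    """Each velocity component is Gaussian with variance kT/m (Maxwell-Boltzmann)."""
    v_thermal = np.sqrt(temperature_eV * QE / mass)   # thermal speed per component [m/s]
    return rng.normal(0.0, v_thermal, size=(n_particles, 3))

rng = np.random.default_rng(0)
electron_velocities = sample_maxwellian_velocities(1000, 3.0, ME, rng)   # 3 eV electrons (assumed)
```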

4.2.2 Force Interpolation and Charge Extrapolation

The same weighting function should be used for force interpolation and charge extrapolation in order to eliminate the self-force and conserve momentum [Birdsall and Langdon, 1991]. For an unstructured mesh, the finite element volume coordinates derived in the previous chapter [Santi et al., 2003] are used as the interpolation and extrapolation weighting functions. The force interpolation and charge extrapolation using volume coordinates are written in Eqs. (4-1) and (4-2), respectively, where the subscript i represents mesh node properties and the subscript k represents particle properties.
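Since Eqs. (4-1) and (4-2) use the finite element volume (barycentric) coordinates as weights, a sketch of both operations for a single tetrahedral cell is given below. The helper that computes the volume coordinates follows the standard barycentric formulation and is an illustrative assumption, not the code used in this work.

```python
import numpy as np

def volume_coordinates(point, nodes):
    """Barycentric (volume) coordinates of `point` in a tetrahedron with 4x3 `nodes`."""
    # Solve [x; y; z; 1] = sum_i L_i * [x_i; y_i; z_i; 1] for the weights L_i.
    a = np.vstack([nodes.T, np.ones(4)])     # 4x4 system
    return np.linalg.solve(a, np.append(point, 1.0))   # L_1..L_4, summing to 1

def interpolate_force(point, nodes, node_E, charge):
    """Force interpolation (Eq. 4-1 style): weight nodal fields to the particle."""
    L = volume_coordinates(point, nodes)
    return charge * (L @ node_E)             # node_E is the 4x3 nodal electric field

def extrapolate_charge(point, nodes, charge, node_rho):
    """Charge extrapolation (Eq. 4-2 style): distribute particle charge to the 4 nodes."""
    L = volume_coordinates(point, nodes)
    node_rho += charge * L                   # accumulated nodal charge (before volume scaling)
```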

4.2.3 Equations of Motion

The leap-frog scheme [Birdsall and Langdon, 1991] is used for solving the equations of motion (see Fig. 4.2); this scheme is second-order accurate in time through the use of a velocity that is staggered at half time steps relative to the particle position. In addition, this scheme requires minimal storage of particle velocity and position, since only the updated particle velocity and position are stored.

Eqs. (4-3) and (4-4) are the particle position and velocity derivatives in finite difference form with the leap-frog scheme, respectively:

(x^{n+1} − x^n) / Δt = v^{n+1/2}    (4-3)

m (v^{n+1/2} − v^{n−1/2}) / Δt = F^n    (4-4)

where n is the time step counter. The force term on the RHS of Eq. (4-4) can also be formulated using the leap-frog scheme, which yields

m (v^{n+1/2} − v^{n−1/2}) / Δt = q [ E^n + ((v^{n+1/2} + v^{n−1/2}) / 2) × B^n ]    (4-5)

Eq. (4-5) is then solved using Boris's method [Birdsall and Langdon, 1991]. In this method, the magnetic and electric fields are separated completely after introducing two new variables, v^− and v^+, into Eq. (4-5). They are defined by:

v^{n−1/2} = v^− − (q E^n / m)(Δt / 2)    (4-6)

v^{n+1/2} = v^+ + (q E^n / m)(Δt / 2)    (4-7)

After substituting Eqs. (4-6) and (4-7) into Eq. (4-5), Eq. (4-5) becomes

(v^+ − v^−) / Δt = (q / 2m)(v^+ + v^−) × B^n    (4-8)

Therefore, from Eqs. (4-6) to (4-8), the main idea of Boris's method can be summarized as follows: v^− is obtained by adding half of the electric impulse to the initial velocity via Eq. (4-6); v^+ is then obtained by rotating v^− with the magnetic field via Eq. (4-8); finally, the updated velocity v^{n+1/2} is obtained by adding the other half of the electric impulse to v^+ via Eq. (4-7). The details of this method can be found in Birdsall's book [Birdsall and Langdon, 1991].
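A minimal sketch of the velocity update described by Eqs. (4-5) to (4-8), following the standard Birdsall and Langdon formulation; the variable names are illustrative.

```python
import numpy as np

def boris_push(v, E, B, q, m, dt):
    """Advance velocity v^{n-1/2} to v^{n+1/2} with the Boris scheme."""
    qmdt2 = q * dt / (2.0 * m)
    v_minus = v + qmdt2 * E                  # half electric impulse, Eq. (4-6)
    t = qmdt2 * B                            # rotation vector
    s = 2.0 * t / (1.0 + np.dot(t, t))
    v_prime = v_minus + np.cross(v_minus, t)
    v_plus = v_minus + np.cross(v_prime, s)  # rotation about B, Eq. (4-8)
    return v_plus + qmdt2 * E                # remaining half electric impulse, Eq. (4-7)
```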

4.2.3.1 Particle Ray Tracing

There are two different methods for particle ray tracing on an unstructured mesh: cell-by-cell particle tracking and coordinate-based particle tracking. The cell-by-cell particle tracking takes advantage of the cell connectivity provided by the unstructured mesh data [Tseng, 2005]. The first step of the particle tracing is to determine whether the particle stays in or leaves the current cell. If the particle leaves, the second step is to determine the intersection position on the intersecting face. The further journey of the particle depends on the face condition. If it is a normal face between cells, the particle continues its movement until the time step ends. If the intersecting face is an I/O or other specified boundary, the particle is removed; otherwise, the interaction is processed according to the wall boundary conditions. More details of this particle tracking can be found elsewhere [Tseng, 2005].
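A simplified sketch of the cell-by-cell idea on a tetrahedral mesh, using the sign of the volume coordinates of the end position to walk through neighboring cells; the actual implementation uses face intersections as described above, and the mesh helper routines here are assumptions for illustration.

```python
import numpy as np

def volume_coordinates(point, nodes):
    """Barycentric (volume) coordinates of `point` in a tetrahedron (4x3 `nodes`)."""
    a = np.vstack([nodes.T, np.ones(4)])
    return np.linalg.solve(a, np.append(point, 1.0))

def locate_after_move(cell_id, end_point, mesh):
    """Walk cell by cell from `cell_id` until the cell containing `end_point` is found.

    `mesh.cell_nodes(c)` returns the 4x3 node coordinates of cell c and
    `mesh.neighbor(c, face)` returns the neighbor across a face (or -1 at a boundary);
    both are assumed helper routines, not the thesis data structures.
    """
    while True:
        L = volume_coordinates(end_point, mesh.cell_nodes(cell_id))
        face = int(np.argmin(L))
        if L[face] >= 0.0:
            return cell_id                       # all weights non-negative: point is inside
        neighbor = mesh.neighbor(cell_id, face)  # exit through the face opposite node `face`
        if neighbor < 0:
            return -1                            # hit an I/O or wall boundary face
        cell_id = neighbor                       # continue the walk in the neighbor cell
```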

4.2.4 Monte Carlo Collision Algorithm (MCC)

In order to simulate a self-sustained and self-consistent plasma discharge, the dominant reactions between species should be considered properly. Since argon gas is used for the plasma discharge, the simulation species are electrons (e), ions (Ar+), and ground-state atoms (Ar). In general, there are three different collision algorithms for these plasma species (see Fig. 4.3) [Nanbu, 2000]: short-range collisions between unlike particles, short-range collisions between like particles, and Coulomb collisions. For low-temperature plasmas, the plasma density is usually less than 10^17 m^-3, so the Coulomb collisions are negligible. Moreover, since the argon gas is assumed to be in equilibrium and at rest, the short-range collisions between like particles are also neglected in the current work. Therefore, a total of five reactions are included in the argon Monte Carlo collision model, listed as follows:

(1) e + Ar → e + Ar (Elastic Scattering)
(2) e + Ar → e + Ar* (Excitation)
(3) e + Ar → e + Ar+ + e (Ionization)
(4) Ar+ + Ar → Ar + Ar+ (Charge Exchange)
(5) Ar+ + Ar → Ar+ + Ar (Elastic Scattering)

In general, there are two methods for treating e-Ar and Ar+-Ar collisions: the null-collision method and Nanbu's method. Since only one random number is used in Nanbu's method [Vahedi and Surendra, 1995], the collision computation becomes more efficient than with the typical null-collision method, and Nanbu's method is simple and straightforward in treating such collisions. For treating the e-Ar collision using this method, the total collision probability P_T is written as

P_T = Σ_{k=1}^{Κ} P_k    (4-9)

where P_k is the probability of the kth collisional event,

P_k = N_g σ_k(ε) v_e Δt    (4-10)

where N_g is the number density of the background gas, σ_k(ε) is the cross section of the kth event, v_e is the electron speed, and Δt is the time step. One can then sample a collisional event k randomly from

k = ⌊Κ U⌋ + 1    (4-11)

where U is a uniform random number in [0, 1), and evaluate P_k at the energy ε of the electron. The kth collisional event occurs when U > k/Κ − P_k; otherwise, no collision occurs (as shown in Fig. 4.4). This method is also used for treating the Ar+-Ar collisions. In the following, each collisional event in the electron-molecule and ion-molecule collisions is described briefly. The more detailed theory can be found in Ref. [Vahedi and Surendra, 1995].
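A sketch of the event selection of Eqs. (4-9) to (4-11) using a single random number; the cross-section lookup is a placeholder assumption.

```python
import numpy as np

def nanbu_select_event(energy_eV, speed, n_gas, dt, cross_sections, rng):
    """Select a collision event with one random number (Nanbu's method).

    `cross_sections` is a list of K callables sigma_k(energy) [m^2]; returns the
    zero-based index of the event that occurs, or -1 for no collision.
    """
    K = len(cross_sections)
    U = rng.random()
    k = int(K * U)                      # Eq. (4-11): k = floor(K*U) + 1, zero-based here
    P_k = n_gas * cross_sections[k](energy_eV) * speed * dt   # Eq. (4-10)
    # Event k occurs when U falls inside the top P_k of its 1/K slot (Fig. 4.4).
    if U > (k + 1) / K - P_k:
        return k
    return -1
```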

4.2.4.1 Electron-Molecule Collision

The electron-molecule collision cross sections are the same as the ones used by Nanbu [Nanbu, 2000], as shown in Fig. 4.5. The only difference is that we averaged the 26 elastic collisions into one in order to fit the cross-section data easily. When an elastic collision occurs, the incident electron scatters through an angle χ, which is computed following the formulation in Ref. [Vahedi and Surendra, 1995]:

cos χ = [2 + ε − 2(1 + ε)^R] / ε    (4-12)

where R is a random number and ε is the energy of the incident electron. Eq. (4-12) also holds for determining the electron scattering angle for all other types of electron-neutral collisions. The azimuthal scattering angle φ is uniformly distributed on the interval [0, 2π] and is determined by

φ = 2πR    (4-13)

where R is a random number. Once χ and φ are obtained, the direction of the scattered velocity (as shown in Fig. 4.6) is determined by [Vahedi and Surendra, 1995]

v_scat = v_inc cos χ + (v_inc × i) (sin χ sin φ / sin θ) + v_inc × (i × v_inc) (sin χ cos φ / sin θ)    (4-14)

where θ is given by cos θ = v_inc · i, and v_inc and v_scat are unit vectors parallel to the incident and scattered velocities, respectively. The scattered velocity components can then be determined by taking the projection of v_scat on the coordinate axes. In an excitation collision, the incident electron loses the excitation threshold energy of 11.55 eV and is scattered through an angle χ determined by Eq. (4-12).
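A sketch of sampling the scattering angles from Eqs. (4-12) and (4-13); the isotropic fallback at vanishing energy is the analytical limit of Eq. (4-12) and is added here only to avoid division by zero.

```python
import numpy as np

def sample_electron_scattering_angles(energy_eV, rng):
    """Polar angle chi from Eq. (4-12), azimuthal angle phi from Eq. (4-13)."""
    R1, R2 = rng.random(), rng.random()
    if energy_eV < 1.0e-6:
        cos_chi = 1.0 - 2.0 * R1   # limit of Eq. (4-12) as energy -> 0 (isotropic)
    else:
        cos_chi = (2.0 + energy_eV - 2.0 * (1.0 + energy_eV) ** R1) / energy_eV
    phi = 2.0 * np.pi * R2
    return np.arccos(np.clip(cos_chi, -1.0, 1.0)), phi
```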

In an ionizing event, the incident electron strips an electron off the neutral, and the neutral becomes an ion, continuing on its trajectory virtually undisturbed. The energy balance equation under this assumption yields

ε_scat + ε_ej = ε_inc − ε_ion    (4-15)

where ε_scat and ε_ej are the energies of the scattered and ejected electrons, ε_inc is the energy of the incident electron striking the atom, and ε_ion is the ionization threshold energy. The energy of the ejected electron is given by

ε_ej ≈ R (ε_inc − ε_ion) / 2    (4-17)

The energy of the scattered electron is then obtained from Eq. (4-15). After the energy assignment, each of the scattered and ejected electrons scatters through angles χ and φ determined by Eqs. (4-12) and (4-13), respectively. The velocity of the created ion is sampled from a full three-dimensional (3V) Maxwellian distribution at the molecule temperature.

4.2.4.2 Ion-Molecule Collision

Fig. 4.7 shows the ion-neutral cross sections used in the model. In a simple charge exchange collision, an electron is assumed to hop from the neutral onto the ion. Therefore, the velocity of the new ion is assigned the velocity of the incident neutral, and the new neutral takes the velocity of the incident ion. The hard-sphere collision assumption is used in treating the ion-neutral elastic scattering collision, and the energy of the scattered ion is determined by [Vahedi and Surendra, 1995]

ε_scat = (1 − α_L) ε_inc    (4-18)

where ε_inc and ε_scat are the energies of the incident and scattered ions, respectively. The energy loss factor α_L is given by

α_L = sin² χ    (4-19)

where χ is the scattering angle in the laboratory frame, which is determined by

cos χ = √(1 − R)    (4-20)

where R is a random number. The azimuthal scattering angle φ is determined by Eq. (4-13).
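A simplified sketch of the two ion-neutral events described above, assuming the background gas is at rest: charge exchange swaps velocities, while elastic scattering rescales the ion energy according to Eqs. (4-18) to (4-20). Only the energy (speed) update is shown; the directional rotation by χ and φ is omitted, and the function signature is an assumption.

```python
import numpy as np

def ion_neutral_collision(v_ion, v_neutral, kind, rng):
    """Return the post-collision ion velocity for 'exchange' or 'elastic' events."""
    if kind == "exchange":
        return v_neutral.copy()            # ion takes the velocity of the incident neutral
    # Hard-sphere elastic scattering in the laboratory frame.
    R = rng.random()
    cos_chi = np.sqrt(1.0 - R)             # Eq. (4-20)
    alpha_L = 1.0 - cos_chi**2             # Eq. (4-19): sin^2(chi)
    g = v_ion - v_neutral                  # relative velocity (neutral ~ at rest)
    speed_scale = np.sqrt(1.0 - alpha_L)   # Eq. (4-18): energy scaled by (1 - alpha_L)
    return v_neutral + speed_scale * g     # rescaled relative speed; rotation omitted
```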

4.2.5 Indexing

The location of each particle with respect to the cells after movement is important information for particle reduction. The relations between particles and cells are reordered according to the numbering of particles and cells using a simple algorithm [Tseng, 2005]. Before the particle reduction, the particles to be removed are chosen randomly within the current cell, and the number of removed particles can easily be determined according to this numbering system.

4.2.6 Particle Reduction

When simulating a plasma discharge, the number of simulated particles usually increases rapidly due to ionization, which makes the computational load very heavy. In this case, a particle reduction technique that switches the particle weighting factor should be employed in order to speed up the code, e.g., [Nanbu, 2000], [Shon et al., 2001]. The particle reduction technique we used is an improvement of Nanbu's particle reduction technique [Nanbu, 2000]. The main idea of this technique is given as follows.

When the current number of simulated ions N_i(t) reaches 1.2 times the initial number N_i(0), the particle weighting factor is increased from the initial value W_0 to 1.2 W_0 by removing excess simulated particles cell by cell. The numbers of removed particles in a cell are chosen in proportion to the number density in each cell. This procedure is repeated whenever N_i(t) exceeds the threshold again. Note that this particle reduction has no effect on the charge density distribution, and the number of real electrons is also unchanged before and after the switching.
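A sketch of the cell-by-cell reduction step described above: when the simulated ion count exceeds 1.2 times the initial count, the weighting factor is multiplied by 1.2 and a proportional fraction of particles is removed at random in each cell; array names are illustrative.

```python
import numpy as np

def reduce_particles(cell_ids, weight, n_initial, rng):
    """Remove particles cell by cell so that roughly 1/1.2 of them are kept.

    `cell_ids` maps each simulated particle to its cell; returns a boolean
    keep-mask and the updated weighting factor.
    """
    if cell_ids.size <= 1.2 * n_initial:
        return np.ones(cell_ids.size, dtype=bool), weight
    keep = np.ones(cell_ids.size, dtype=bool)
    for cell in np.unique(cell_ids):
        idx = np.flatnonzero(cell_ids == cell)
        n_keep = int(round(idx.size / 1.2))          # keep in proportion to cell population
        drop = rng.choice(idx, size=idx.size - n_keep, replace=False)
        keep[drop] = False
    return keep, weight * 1.2                        # real particle density is preserved
```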

4.2.7 Data Sampling

A steady state of the plasma system is identified by monitoring the total number of particles in the computational domain. Once a steady state is reached, we can obtain data with small statistical fluctuations by time-averaging a set of temporal data sampled at equal time intervals. The data for the electron density, ion density, charge density, electric potential, and electric field are sampled at the grid points, while the electron temperature, ion temperature, and electron energy distribution function are sampled in each cell.

4.3 Parallel Computing of the PIC-FEM Method

Since the computational domain has been partitioned using the multi-level graph-partitioning tool, the PIC-MCC code is ready to be parallelized under the single-program multiple-data (SPMD) paradigm. Under this paradigm, the PIC-MCC code is executed in serial on each processor's own sub-domain. Communication is required when a particle meets an inter-processor boundary (IPB) and also in the field solver. High parallel performance can only be achieved if communication is minimized and the computational load is evenly distributed among processors. Therefore, a dynamic domain decomposition (DDD) technique should be used to re-partition the domain, using weights based on the number of particles in each sub-domain, when the load becomes unbalanced. The reason for selecting the number of particles as the weight for DDD is that the particle computation is the most expensive component of the PIC-MCC method.

4.3.1 The Parallel PIC-FEM Method

Figs. 4.8 and 4.9 show a simplified flowchart of the parallel PIC-FEM method proposed in the current study. A converter is designed for preprocessing the mesh (grid) data obtained from the domain decomposition and for providing a processor neighbor-identifying array. The main function of this converter is to reorder the fully unstructured mesh data into globally sequential but locally unstructured mesh data, so that the relationship between global and local cell data can be obtained by a simple arithmetic operation thanks to this special cell-numbering design [Tseng, 2005]. In addition, the node numbers in each sub-domain must be re-ordered following the requirements of the FEM SBS method. Except for the parallel computing of the FEM, most of the particle computation follows the experience from the previous parallel DSMC method [Tseng, 2005].

Again referring to Fig. 4.8, the master processor first reads in the preprocessed mesh data and then distributes it to all the other processors. Once all processors have the mesh data, the PIC-MCC code is executed in serial on each processor's own sub-domain.

After force interpolation, the particles start to move and are tracked cell by cell.

When a particle meets the IPB during its journey within a simulation time step, the particle-related data are stored into a buffer array and numbered sequentially for communication efficiency [Tseng, 2005]. After all the particles in a processor have moved, a local communication among processors takes place to exchange the buffer arrays. The received particle data are then unpacked, and each particle continues its journey in its new sub-domain until the simulation time step ends.
