

2.10 Validation and Parallel Performance of the Electrostatic Field Solver with

Fig. 2.9 shows a simplified flowchart of the parallel computing of the FEM proposed in the current chapter, which incorporates the multi-level graph-partitioning library. After the preprocessed cell and node data are read on a master processor (CPU 0), they are distributed to all other processors according to the designated initial domain decomposition. With these data, every processor concurrently constructs the shape functions and the coefficient matrix, and then imposes the boundary conditions. Once every processor has this information, the system is solved using the parallel PCG method. The final results are output when the L2 error norm is less than the specified convergence criterion.
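The per-processor solve described above can be illustrated with a serial sketch of the preconditioned conjugate gradient iteration and its L2-norm stopping test (a minimal Python/NumPy sketch using a simple Jacobi preconditioner; the actual solver distributes the matrix by subdomain and exchanges interface data via MPI, which is omitted here):

```python
import numpy as np

def pcg(A, b, tol=1e-8, max_iter=200):
    """Jacobi-preconditioned conjugate gradient for a SPD matrix A."""
    M_inv = 1.0 / np.diag(A)          # diagonal (Jacobi) preconditioner
    x = np.zeros_like(b)
    r = b - A @ x                     # initial residual
    z = M_inv * r                     # preconditioned residual
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:   # L2 error-norm convergence check
            break
        z = M_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# usage: a small SPD system (1-D Laplacian stencil)
n = 50
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = pcg(A, b)
```

In the parallel version, the matrix-vector product `A @ p` requires a send/receive exchange of interface values between neighboring subdomains, and the inner products (`p @ Ap`, `r @ z`) become allreduce operations, which is why those two communication patterns dominate the timings reported below.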

Validation of the parallel electrostatic field solver

Many analytical solutions of Poisson’s equation are available for comparison, either with or without a source term. In the current study, we have selected one problem without a source term and another with a constant source term. The former is a grounded conducting sphere with diameter Dsphere = 2 m immersed in a uniform electric field (Er = 10 V/m, ~40,000 elements, 20 processors), while the latter is a uniform charge distribution between two infinite, grounded conducting plates at L = 0 m and L = 0.02 m (quasi 1-D, number density of singly charged ions = 10^16 m^-3, ~8,500 elements, 20 processors). About 56,000 particles are used. The charge weighting used in this case is based on the volume coordinates that originate from the finite element method. The simulation results of these two problems are in excellent agreement with the analytical solutions, as shown in Fig. 2.10(a) and Fig. 2.10(b), respectively. These results validate the accuracy of the current parallel Poisson’s equation solver.

Parallel performance of the electrostatic field solver

The simulation of a typical single CNT field emitter within a periodic cell using 0.47 million elements (~97,000 nodes), as shown in Fig. 2.11, is employed to test the parallel performance of the current Poisson’s equation solver. This mesh size is typical of the production runs presented in Chapter 5. Only one quarter of the volume is simulated by taking advantage of the symmetry of the problem. A gate voltage of 150 volts is applied, while the cathode is grounded and the anode is held at 400 volts. At the planes of symmetry, Neumann boundary conditions are used. A very refined grid (Fig. 2.12) is used near the silicon tip to improve the accuracy of the predicted electric field. No parallel adaptive mesh refinement is used in this simulation since, at this stage, we are only interested in the parallel performance of the Poisson’s equation solver.

Fig. 2.13 illustrates the parallel speedup as a function of the number of processors up to 32. The corresponding time breakdown of the various components of the solver, along with the speedup, is summarized in Table 2. The runtime on a single processor is about 138.17 seconds, which is reduced to 5.13 seconds on 32 processors, a parallel speedup of ~26.93. Most of the time is consumed in the parallel CG matrix solver, in which the percentage of communication time generally increases with the number of processors used. Note that the communication time, including the send/receive and allreduce commands required in a parallel CG solver, is relatively short (~3.53 seconds, or 4.5% of the total time) at 2 processors, which is attributed to the fast access to the same memory in the dual-processor-per-node architecture of this cluster system. An appreciable portion of the runtime is spent in communication for a large number of processors, e.g., 35.4% at 16 processors. A further improvement of the solver efficiency by adding a robust parallel preconditioner to the parallel CG solver is anticipated and will be reported elsewhere in the future.
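The quoted speedup and the implied parallel efficiency follow directly from the measured runtimes (values taken from Table 2):

```python
# runtimes from Table 2 (seconds) on 1 and 32 processors
t_serial, t_parallel, n_proc = 138.17, 5.13, 32

speedup = t_serial / t_parallel        # ~26.93
efficiency = speedup / n_proc          # fraction of ideal linear speedup
print(f"speedup = {speedup:.2f}, efficiency = {efficiency:.1%}")
```

The resulting efficiency of roughly 84% at 32 processors is consistent with the growing communication fraction noted above.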

Nevertheless, the present results clearly show that the parallel implementation of the Poisson’s equation using a subdomain-by-subdomain procedure performs very well for the typical problem size we employ in the field emission prediction. A smaller problem size is not tested in the current study since it is irrelevant for this kind of application. It is expected that the parallel speedup can be even better if a larger problem size is simulated, e.g., for an array of field emitters. Thus, the current parallel implementation can greatly help to reduce the runtime required for the parametric study of optimizing the field emitter design.

Performance of parallel adaptive mesh refinement

A case with the same boundary conditions as the above parallel-performance test case is used to demonstrate the improvement of the prediction using parallel adaptive mesh refinement. Fig. 2.14 shows a close-up view of the mesh distribution near the single CNT tip using PAMR, where the initial mesh is rather coarse (7,006 nodes), while the level-5 mesh is very fine (61,241 nodes) near the tip. In this case, an element is refined into eight child elements if the standard deviation of the potentials among the nodes of the element exceeds a preset criterion, εref, which is set to 0.08 here. Table 3 lists the number of nodes/elements and the corresponding maximal electric field in the simulation domain at different levels of mesh refinement. In addition, the data in parentheses are obtained by using the a posteriori error estimator proposed by Zienkiewicz and Zhu [Zienkiewicz and Zhu, 1987]. We have employed a very simple gradient recovery scheme, averaging the cell values of the FE solution, to extract the “exact” solution of the electric field in each cell. A prescribed global relative error εpre of 0.0003 is used to control the level of accuracy. The absolute error in each element is then compared with the current mean absolute error at each level, based on εpre, to decide whether refinement is required. From Table 3, it is clear that the results are nearly the same using either the variation of potential or the error estimator, although the variation-of-potential approach is more cost-effective. Therefore, mesh refinement based on the variation of potential is used throughout the present study, unless otherwise specified.
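The variation-of-potential criterion amounts to a simple per-element test on the nodal potentials. A minimal sketch (Python/NumPy, with εref = 0.08 as above; the function name is illustrative, not from the solver):

```python
import numpy as np

def needs_refinement(node_potentials, eps_ref=0.08):
    """Flag a tetrahedral element for 1-to-8 refinement when the standard
    deviation of its four nodal potentials exceeds the preset criterion."""
    return float(np.std(node_potentials)) > eps_ref

# usage: an element near the tip (large potential variation) is refined,
# while a far-field element with nearly uniform potential is kept
near_tip = needs_refinement([0.0, 0.5, 1.0, 0.3])        # True
far_field = needs_refinement([0.10, 0.11, 0.10, 0.12])   # False
```

The test is purely local to each element, which is what makes it cheaper than the Zienkiewicz-Zhu estimator: no gradient recovery or global error averaging is needed to decide whether an element is split.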

After level-5 refinement, the maximum value of the electric field near the tip reaches an approximately constant value of 11.323 V/nm. Note that the parallel performance of the PAMR module is not discussed here for brevity, but it is reported in detail elsewhere [Lain et al., 2006]. All the cases shown in the succeeding sections apply this mesh refinement module for a better resolution near the emitter tip.