Generally, numerical methods for fluid dynamics are categorized as continuum-based and particle-based methods. Due to the expected rarefaction in the EBPVD, the Navier-Stokes equations fail to solving rarefied gas dynamics problem, current research will be investigated by DSMC method [Bird, 1994], which is a particle-based method.
The degree of rarefaction of a gas is generally expressed through the Knudsen number (Kn) which is the ratio of the mean free path λ to the characteristic dimension L, i.e.
Kn= λL
(2.1) The traditional requirement for the Navier-Stokes equations to be valid is that the Knudsen number should be less than 0.1.
2-1 DSMC Method
The direct simulation Monte Carlo (DSMC) method is widely used as an effective numerical technique to simulate rarefied, nonequilibrium gas flow. In the DSMC method, a large number of particles are generated in the flow field to represent real physical molecules. Their initial properties are determined by the macroscopic quantities such as density, temperature and velocity, according to equilibrium distributions. The time step employed is sufficiently small so that the movement of the particles and the interaction between them can be decoupled. In each time step, the particle trajectory is computed, and its location is updated. The entire computational domain is divided into a network of
cells. Each cell serves as a separated region for the molecular interaction. A particle is only allowed to collide with another particle belonging to the same cell. Therefore, the size of the computational cell must be of the magnitude of a mean free path. Probabilities of particle collision are determined by the collision rates from kinetic theory. During each collision, modeling particles exchange momentum and energy, and chemical reactions may also take place. Once a steady state is established, time averaging is performed in each cell to evaluate the macroscopic mean values of the flow properties. Important steps of the DSMC method include setting up the initial conditions, moving all the simulated particles, indexing all the particles, colliding between particles, and sampling the particles within cells to determine the macroscopic quantities. The details of the procedures show in Fig. 2.1.
2-2 Parallel DSMC Method
Although the large number of molecules in a real gas is replaced with a reduced number of model particles, there are still a large number of particles must be simulated, leading to tremendous computer power requirements and needing to cost a lot of computational time. As a result, parallel DSMC method is developed to solve the problem.
Fig. 2.2 illustrates a simplified flow chart of the 3-D parallel DSMC method used in the current study. The DSMC algorithm is readily parallelized through physical domain decomposition. The cells of the computational grid are distributed among the processors.
Each processor executes the DSMC algorithm in serial for all particles and cells in its domain. Data communication occurs when particles cross the domain (processor) boundaries and are then transferred between processors.
First, we construct an unstructured tetrahedral using a commercial meshing tool. The output grid data are then processed using a conversion program to transform them into a globally sequential but locally unstructured mesh data [Wu and Lian, 2003] conforming
the partitioning information, as schematically shown in Fig. 2.3. In addition, a processor neighbor-identifying array is created for each processor, which is used to identify the surrounding processors due to the unstructured format of the processor distribution in the domain. The resulting globally sequential but locally unstructured mesh data is then imported into the parallel DSMC code.
After reading the mesh data on a master processor (cpu0), the mesh data are then distributed to all other processors according to the predetermined domain decomposition.
All the particles on each processor then start to move as in sequential algorithm. The particle data are sent to a buffer and is numbered sequentially when hitting the inter-processor boundary (IPB) during its journey within a simulation time step. After all the particles on a processor are moved, the destination processor for each particle in the buffer is identified via a simple arithmetic computation, owing to the approach adopted for the cell numbering, and are then packed into arrays. Considering communication efficiency the packed arrays are then sent as a whole to its surrounding processors in turn based on the tagged numbers. Once a processor sends out all the packed arrays, it waits to receive the packed arrays from its surrounding processors in turn. This “send” and
“receive” operation serves practically as a synchronization step during each simulation time step. Received particle data are then unpacked and each particle continues to finish its journey for the remaining time step. The above procedures are repeated twice since there might be some particles cross the IPB twice during a simulation time step.
After all particles on each processors have come to their final locations at the end of a time step, the program then carries out the indexing of all particles and the collisions of particles in each computational cell on each processor as usual in a sequential DSMC code. The particles in each cell are then sampled at the preset appropriate time.
Higher parallel efficiency can only be achieved if communication is minimized and the computational load is evenly distributed among processors. To minimize communication for between processors, the spatial domain decomposition should adapt according to the workload distribution as simulation continues, which requires dynamic domain decomposition. For the DSMC algorithm, the workload (or equivalently particle numbers) on each processor changes frequently, especially during the transient period of a simulation, while the workload attains a roughly constant value during steady-state sampling.
Although DSMC possesses nearly 100% parallelism (except for initialization and final output), both the values of speedup and efficiency are expected to be lower than the ideal values due to the load unbalancing and communication as mentioned previously. It is needed to make the load balanced.
Parallel DSMC Code (PDSC) is the main solver used in this thesis, which utilizes unstructured tetrahedral mesh. Fig. 2.4 is the features of PDSC and brief introduction is listed in the following paragraphs.
2-2-1 Unstructured Tetrahedral Mesh
Reasons of PDSC using unstructured tetrahedral mesh are: (a) it can be easily used for flows with complicated boundary conditions, (b) parallel processing can be easier
implemented via graph-partitioning technique, which can handle irregular inter-processor boundary of dynamic domain decomposition, (c) it can be coupled with unstructured node-based numerical method (e.g. N-S equations).
According to these advantages of using unstructured mesh, a special particle ray-tracing technique has to be designed to efficiently track the particle movement for the special grid system, unstructured grid, which we use in the current study. Briefly speaking, the movement of a particle is determined by the velocity and initial position of the particle. If the intersecting face is an I/O boundary, the particle will be removed. If not, then process the interaction according to the specified wall boundary condition. The details of particle ray-tracing techniques of two- and three-dimensional domain are described in Ref. [Tseng, 2000; Lian, 2001].
2-2-2 Collision Cross-Section Data
In real molecular collision, the force between molecules is strongly repulsive at short distance and weakly attractive at larger distance. Models for analytical and numerical studies involve some degree of approximation. These models are developed to imitate the real particle collision according to experiment. There are three molecular collision models, which are the Hard Sphere (HS), Variable Hard Sphere (VHS) and Variable Soft Sphere (VSS) molecular models, in the standard DSMC method [Bird, 1994]. The total collision cross section of the hard sphere model is proportional to the square of the constant diameter. It has the advantage of easily calculated collision mechanics because of the isotropic scattering that means all directions are equally possible for the post-collision velocity in the center-mass frame of reference. But the
cross-section should vary with relative velocity in reality. The variable hard sphere (VHS) model proposes the collision diameter is a function of relative speed, which can predict the viscosity more accurately. The cross-section is determined from the viscosity coefficient, but the ratio of the momentum to the viscosity cross-section follows the hard sphere value. Thus, the variable soft sphere (VSS) model is developed to predict the correct viscosity and diffusion coefficients, which the scattering of post-collision is not isotropic anymore. The VSS model can reproduce the viscosity and diffusion coefficients correctly. The relevant parameters of using VSS model for the DSMC method can be found in Bird’s book [Bird, 1994]. This reference provides some usual gaseous species.
When the flow involves some special species, it has problem to obtain the relevant parameters of the VSS molecular model. To overcome this problem, a quantum chemistry method is proposed to calculate the intermolecular energy surface according to the distance between molecules [Wu and Hsu, 2003]. Then the simulated intermolecular energy potential is fitted through the Lennard-Jones (L-J) potential to obtain the constants.
Based on these constants and gas kinetic theory, the transport coefficients, which are viscosity and diffusion coefficients, are derived. Finally, the parameters of the VSS model are derived by fitting these computed coefficients to those derived from the VSS model.
2-2-3 Pressure Boundary Treatment
In order to perform accurate simulation for inflow/outflow pressure boundaries, general procedure for treating these conditions by using the concept of particles flux conservation is developed in PDSC [Wu, et al., 2001]. This function is useful for applications of micro-manifold, micro-nozzle and slider air bearing.
2-2-4 Unstructured Adaptive Mesh with Variable Time-Step Scheme
The accuracy of a DSMC simulation is directly related to the number of simulated particles per cell throughout the cells. As the number of simulated particles increases, the statistical uncertainties of the macroscopic properties reduce due to better collision condition. The number of simulated particle per cell is shown to inversely proportional linear and square of gas density for two- and three-dimensional flows, respectively. That is, the simulated molecules are fewer in higher density regions, while lower density regions are over resolved. More computational time is spent calculating the lower density regions than is needed. A strategy to increase the computational speed without sacrificing the accuracy of the solution is to reduce the number of simulated particles by using cell/particle weighting, but maintaining near-uniform particle distribution per cell.
Kannenberg and Boyd [Kannenberg and Boyd, 2000] presented strategies for efficient particle resolution in DSMC. The authors manipulated variations of particle weight, variations of time-step and grid arrangement to obtain a more uniform particle count throughout the flow field. It was shown that careless use of cell/particle weighting often introduces some detrimental effects to the statistical accuracy, which is caused by repeatedly cloning the particles in the flow field. Nevertheless, variable time-step method represents one of the simplest and most efficient ways of particle weighting that avoids the problem of particle cloning, if careful grid manipulation is done. To obtain a more uniform distribution of model particles per cell throughout the computational domain, a variable time-step scheme is highly recommended.
The number of simulated particles per cell is related to the number density, cell volume and particle weight by the following relation;
P
P W
N = nV (2.2)
Np is the number of simulated molecules of pth cell. Wp is the particle weight which is defined as ratio of the number of real particles to the number of simulated molecules, and n and V are the number density and the volume of the computational cell, respectively. If
the number density is assumed to be a constant, the simulated particle count decreases by decreasing the cell volume or increasing the particle weight. As mentioned previously, mesh refinement can help to obtain a better cell-size distribution, ideally on the order of the local mean free path. The volume of a cell can then be related to density by the fact that the mean free path is inversely proportional to the number density. Thus, the relationship between these variables is shown as Eq. (2.3),
−1
∝
∝
∆x λ n (2.3)
For two-dimensional flow, the cell volume is given by
2
For three-dimensional flow, the cell volume is given by
3
Substituting Eqs. (2.3)~(2.5) into Eq. (2.2), give the following relation between number of simulated particles and flow density (assuming constant particle weight):
D
According to Eq. (2.6), the number of simulated particle is inversely linear and square proportional to the number density with respect to the two- and three-dimensional flows.
That is, the lower density regions have larger simulated particle numbers and the higher density regions have fewer simulated molecular numbers. This effect is more obvious in three-dimensional simulation. This will lead to computational waste and incorrect results at lower and higher density regions, respectively. To avoid this problem, a variable time-step scheme is proposed to obtain a more uniform particle distribution as follows:
From Eq. (2.6), the density distribution is inversely proportional to the dimension of the cell. Thus, the first step of variable time-step scheme is to find out the cell, which has the minimum cell volume ( ), and to calculate the local time-step of the cell as Eqs. (2.7) and (2.8). The time-step is also proportional to
Vmin
∆x and inversely proportional to the number density,
D
Umean and 2kT m are the mean and thermal velocity, respectively. Then, each local time-step∆tj of each cell can be assigned based on ∆tmin and the cell volume Vj as Eq. the variable time-step scheme for a simulated particle moves across the cell interface is illustrated in Fig. 2.5.
Basic idea of variable time-step method in PDSC is to enforce the flux conservation (mass, momentum and energy) of moving simulated particle when crossing the interface between two neighboring cells. If we scale the local cell time-step to the local cell size (or local mean free path), then the best way to enforce flux conservation is to change the particle weight factor without destroying or cloning the particles during particle movement across the cell interface. The cloning of particle can generally induce unpredictable random-walk effects in a statistical simulation like DSMC. One of the advantages in implementing the variable time-step scheme is to reduce both the simulated particle numbers and transient time-step to steady state, when the sampling normally starts in DSMC. This will result in appreciable time saving for the steady DSMC simulation. The net flux of the physical particles, including mass, momentum and kinetic energy, should be enforced conservation when a simulated particle crosses the cell interface from the cell 1 and to the cell 2. Thus,
u conserved flux quantity and time-step, respectively, and the numbers at subscript represents cell numbers. Note that A represents the area of cell interface between cell 1 and 2. N2 is number of the simulated particle in cell 2, which originated from cell 1.
There are several choices of the corresponding parameters to satisfy Eq. (2.9), with which we can play. The best choice is to set N2=1 (without particle cloning or destroy) and to keep without changing the velocity across the cell interface, Eq. (2.9) can be rewritten as Eq. (2.10)
2 1=Φ Φ
2
∆ will be the same for all cells throughout the computational domain.
Inserting Eqs. (2.8)~(2.10) into Eq. (2.2), the number of simulated particles per cell is,
D
Using this approach, resulting number of simulated particles per cell for the three-dimensional flow scales with ∆x (~3 V , Vc c is the cell volume) if cell size is proportional to the local mean free path, which otherwise scales with (∆x)2. In doing so, the simulated particle will only have to adapt its weight that is proportional to the size of time-step, which is approximately commensurable to the local mean free path if solution-based adaptive mesh is used. Of course, the remaining time for a simulated particle, when crossing cell interface, should be rescaled according to the ratio of time-steps in original and destination cells. In the PDSC, the procedure of variable time-step scheme is listed in the following;
1. Chose a minimum cell volume to calculate the reference time-step.
2. Assign the time-step for each cell based on the cell size.
3. Determine the particle weight for each cell by Eq. (2.10)
4. The time-step has to be modified if the particle crosses the cell interface.
By using this variable time-step scheme, the simulated particle number and transient time will be reduced to speed up the computing.
2-2-5 Parallel with Dynamic Domain Decomposition
To save the enormous computational cost of the standard DSMC code, a parallel DSMC with dynamic domain decomposition. Message passing interface (MPI) is used for data communication. This function can automatically repartition the graph domain according to the loading of each processor, which is the particle number of each cell, to achieve the load balancing of the simulation. It also can be used for other particle simulation and equation solvers.
This section presents an overview of the algorithms implemented of dynamic load decomposition scheme. The parallel performance will become worse resulting from the communication and the load unbalancing. Dynamic domain decomposition scheme for an unstructured mesh is implemented to speed up the parallel computing. Basic concept is the domain will be repartition when the loading of each processor is becoming unbalancing. The simulator aims to balance the number of particles on the sub-domains.
The flowchart with dynamic domain decomposition is shown as Fig. 2.6. The procedures of the flowchart are almost the same except some processes of dynamic domain decomposition method. There are three main processes, which are decision policy for repartitioning, repartition the domain and cell/particle migration, for dynamic domain decomposition.
2-2-6 Conservative Weighting Scheme
When the flow involving trace particle species, the simulation needs lots of simulated particles to satisfy the DSMC limit, which will lead to immense computational time. A weighting scheme is developed to deal with this kind of flows. The basic concept
is assigning the lower weight for trace particle species to create more simulated particles.
This method does not use particle cloning and destroying to avoid the statistical error.
To overcome the trace problem mentioned in the above, recently Boyd [Boyd, 1996]
proposed a conservative weighting scheme, which is described briefly in the following. In this method, each species has its weighting. Non-trace and trace species have larger (W1) and smaller (W2) weights (W2/W1 <1), respectively. The first stage of the conservative weighting scheme is to split the particle of abundant species (W1) into a particle with weight W2 (trace species) and a particle with weight of W1-W2 when two particles (trace and abundant species) collide. Then, a collision is then performed using the conventional DSMC procedure for the two particles that have the same weight W2. The final stage is to merge together the two particles that were split such that the each linear momentum in three physical directions is exactly conserved. The momentum conservative equation is shown as Eq. (2.12); collision partners, respectively. Eq. (2.12) means the non-trace particle (W
u2 m1 m2
1) is split into two parts imaginatively; one particle has weighting W2 and the other has weighting W1-W2. The first part will has elastic collision with the trace particle, but the second part
1) is split into two parts imaginatively; one particle has weighting W2 and the other has weighting W1-W2. The first part will has elastic collision with the trace particle, but the second part