• 沒有找到結果。

Chapter 4 Parallel Computing of DSMC

4.3 Verifications of the Parallel DS MC

1 (

min max

max W W

I =W − (4.5)

where Wmax and Wmin are the maximum and minimum particle numbers across the processor, respectively. W is the average particle numbers of each sub-domain. Figure 4.18 shows the variation of maximal imbalance with processor numbers for three different problem sizes. The solid and dash line represent the parallel simulation with static and dynamic domain decomposition, respectively. In general, the maximal imbalance developed by static domain decomposition (0.5-2) is much higher (2-6 times) than that by dynamic domain decomposition (0.1-0.5). In addition, workload imbalance developed very fast among processors as the number of processors increases, if dynamic domain decomposition is not used. Although the maximal degree of imbalance for the small problem size deteriorates with increasing number of processors, it is fairly constant within 0.4 for the medium and large problem sizes, which again demonstrates the effectiveness of the dynamic domain decomposition.

4.2.3 Simulations on IBM-S MP

Figure 4.19 presents the parallel performance by using static and dynamic domain decomposition on IBM -SMP with respect to small, medium and large problem size, respectively. The number of processors is up to 128. The trend are very similar to the simulation on IBM -SP2 and do not repeat here again.

In conclusion, a proposed DSM C code with parallel, dynamic domain decomposition method is implemented here to speed up the computing efficiency and presented by solving a two-dimensional driven cavity flow. The results show the present researches are very well and powerful, and the present method has potential to simulate more realistic physical problems.

4.3 Verifications of the Parallel DS MC

The simulated cases in Chapter 3, that is, a two-dimensional hypersonic flow past a cylinder, a two-dimensional hypersonic flow past a 15°-compression ramp, a three-dimensional supersonic flow past sphere, are again used here to compare the results using statistic and dynamic domain decomposition methods. Results are then compared with experimental data and previous simulation wherever available, while the description of flow physics of the test problems will be as brief as possible since it only serves to verify the applicability and its accuracy of the proposed method. Flow conditions and results for each case are described in the following in turn.

4.3.1 Two-Dimensional Flows Hypersonic Flow over a Cylinder Flow and Simulation Conditions

Flow conditions of this case are shown in Section 3.3.1. The adaptive mesh is used here to improve the accuracy of the solutions. The flow domain is divided to 64 sub-domains for the parallel computing and run on IBM -SP2 machine at NCHC in Taiwan.

Domain Decomposition

Figure 4.20 shows the partition history of the dynamic domain decomposition for this simulation. Large variation of sub-domain area in the initial domain decomposition results from the use of a solution-based adaptive mesh, which is obtained from a mesh adaptation module based on a preliminary parallel simulation. Constant time-step scheme is used in this simulation. For the initial domain decomposition, we have assigned the uniform weight of each cell to form the initial domain decomposition. The cell number in each sub-domain is approximately the same initially, although the size of each sub-domain is highly different. This is because the cells in front of the cylinder are smaller than those in the wake regions of the adaptive mesh. Figure 4.20(c) shows that the final decomposition, which adapts to the flow dynamics as simulation continues, is totally different from the initial decomposition. Although the interfaces are not smooth and there are several cells which are isolated from their parent sub-domain. These fragments can lead to more interfaces between sub-domains and increasing the cut edges. But the algorithm is still working for our code. The number of simulated particles is about 1.3 million and the number of sampling is 10,000 in this simulation.

Comparison of Properties Contours

Figures 4.21 and 4.22 illustrate the contours of normalized density and temperatures with static and dynamic domain decomposition methods. From these figures, the simulated results using static or dynamic domain decomposition are almost the same. It proves that the dynamic domain decomposition method can provide accurate results and reduces lots of computational time.

Centerline Properties Distribution

Figure 4.23 illustrates the computed centerline densities and temperatures (rotational and translational) using dynamic domain decomposition, respectively, along with the simulation data without dynamic domain decomposition and previous experimental data [79]. Density increases rapidly along the centerline in front of the

cylinder and becomes relatively small in the wake region. Temperature also increases rapidly along the centerline but decreases rapidly after the bow shock. Strong non-equilibrium between rotational and translational temperatures is found after the cylinder due to the highly rarefied conditions in the wake region. Nevertheless, agreement between the current simulation with static/dynamic domain decomposition and experimental data wherever available is excellent, considering the experimental uncertainties. In addition, the running times of static and dynamic domain decomposition are about 6,400 and 2,700 seconds. Thus, simulation using dynamic domain decomposition reduces the running time up to 60% in this case, as compared with that using static domain decomposition.

Hypersonic Flow Over a 15o-Compression Ramp Flow and Simulation Conditions

A flow over a 15o-compression ramp is also proposed as the second benchmark here to identify the validity of dynamic domain decomposition method. The problem and flow conditions are described in Section 3.3.1. The flow domain is also divided to 64 sub-domains and run on IBM -SP2 machine at NCHC.

Domain Decomposition

Figure 4.24 shows the initial and final domain decomposition for this simulation.

Similar to previous case, constant time-step scheme is used in this simulation. The initial computational domain is also partitioned by uniform weight. Thus, each sub-domain has an approximately cell number. Some large domains above the flat plate are found due to the rarefied conditions in the region (Fig. 4.24(a)). In addition, some small domains at the end of the ramp plate and shock regions are found, where the density is very high due to the oblique shock originating approximately from the ramp corner. The simulated particles are about two million and the number of sampling is 10,000.

Comparison of Properties Contours

Figures 4.25 and 4.26 illustrate the contours of normalized density and temperatures with static and dynamic domain decomposition methods. Again, the results by using dynamic domain decomposition scheme agree excellent with those with static domain decomposition scheme. Thus, the implemented dynamic domain decomposition method is a very powerful tool for simulations.

Centerline Properties Distribution

Figure 4.27 presents the pressure, shear stress and heat transfer coefficients along the solid wall using static and dynamic domain decomposition methods. Data include computational results without dynamic domain decomposition, previous DSM C results by Robinson [24] and experimental data by Holden [81]. Results show that pressure coefficient increases rapidly near the tip, declines slowly along the flat plate to a lowest value near the ramp corner and finally increases dramatically along the ramp wall.

Results generally agree with previous simulation and experimental data, although the current simulated data seems to agree favorably with experimental data along the ramp wall, as compared with those of Robinson's [24]. Final rapid decline of our simulated data should be attributed to the vacuum condition we have imposed at the outflow boundary. In addition, the results of along the solid wall that are obtained with and without dynamic domain decomposition coincides excellently with each other shear stress coefficient and heat transfer coefficient although there is no experimental or simulated results. The running time of static and dynamic domain decomposition are about 14,000 and 7,500 seconds. The simulation time required for dynamic domain decomposition is about 100% shorter than that required for static dynamic domain decomposition, which again justifies the current parallel implementation.

4.3.2 Three-Dimensional Flows Supersonic Flow over a Sphere Flow and Simulation Conditions

For a three-dimensional simulation, a supersonic flow over a sphere is proposed here again to recognize the validity of dynamic domain decomposition method. The reasons to choose this as the test problems are, first, there exist experimental data and, second, it is a good test for checking if the simulation can reproduce the flow symmetry.

Third, it can prove that the dynamic domain decomposition method can be easily extended to three-dimensional flow. Specific details describing the flow conditions are presented in Section 3.3.2. The flow domain is divided to 8 sub-domains and run on IBM -SP2 machine at NCHC. The number of simulated particles is about 1.7 million at steady state and the number of cells is approximately 164,000 after two levels of mesh refinement. 10,000 sampling are used to sample for obtaining averaged flow properties.

Domain Decomposition

Figure 4.28 illustrates the initial and final domain decomposition for the supersonic flow past a sphere on a reduced (1/16) computational domain surface. The initial domain decomposition (Fig. 4.28(a)) is obtained assigning equal weight to each cell. At

the final domain decomposition (Fig. 4.28(b)), the sub-main size in front of the sphere enlarges as compared with the initial sub-domain size, due to the application of variable time-step method and increased density in the stagnation and bow shock region. In this case, the running times of static and dynamic domain decomposition are about 15,000 and 11,100 seconds, respectively. The computational time is saved up to 35% using dynamic domain decomposition, as compared with that using static domain decomposition. We would expect much higher time saving of more processors are used.

The number of simulated particles is about 1.7 million and the number of sampling is 10,000.

Comparison of Properties Contours

Figures 4.29 and 4.30 illustrate the contours of the normalized density and temperatures on x-y plane with static and dynamic domain decomposition methods. In these figures, the results by using dynamic domain decomposition show excellent agreement with those by using static domain decomposition.

Centerline Properties Distribution

Normalized density along the stagnation line is presented in Fig.4.31. The previous experimental data by Russell [82] is included for comparisons. All the results using static and dynamic domain decomposition methods agree well with the experimental data [82].

4.4 Concluding Remarks

This chapter has presented a description of the dynamic domain decomposition method for the unstructured DSM C code. The aim of successful dynamically repartition the computational domain is implemented on parallel machines, including IBM series and PC cluster. The parallel performance of the dynamic domain decomposition (DDD) method is consistently and significantly superior to the static domain decomposition (SDD) method. The computational time of the 3-D supersonic sphere flow reduces about 35% by using dynamic domain decomposition method because more balancing load is achieved. Briefly speaking, the DSM C method with dynamic domain decomposition provides an efficient parallel computing and the results are almost the same as those with static domain decomposition. And the algorithm of this method can easily be applied to other particle methods without many modifications.

Chapter 5 Conservative Weighting Scheme

For traditional DSM C code, the weighting of each species is the same (constant weighting). That means each simulated particle represents the same number of real physical particles. It works perfectly when the quantities of components have the same order. But in many applications the species has very small quantity and it is very important. For example, the product of chemical vapor deposition (CVD) is an important factor for film deposition. In order to obtain enough sampling of the trace species, the number of non-trace particles will become very huge by using the constant weighting scheme. This will lead to an enormous computational cost. Thus, a species-dependent weighting scheme is necessary when the flow has trace species, which is introduced in this chapter.

5.1 Conservative Weighting S cheme (CWS )

To overcome the trace problem mentioned in the above, recently Boyd [68]

proposed a conservative weighting scheme, which is described briefly in the following.

In this method, each species has its weighting. Non-trace and trace species have larger (W1) and smaller (W2) weights (W2/W1 <1), respectively. The first stage of the conservative weighting scheme is to split the particle of abundant species (W1) into a particle with weight W2 (trace species) and a particle with weight of W1-W2 when two particles (trace and abundant species) collide. Then, a collision is then performed using the conventional DSM C procedure for the two particles that have the same weight W2. The final stage is to merge together the two particles that were split such that the each linear momentum in three physical directions is exactly conserved. The momentum conservative equation is shown as Eq. (5.1);

)] collision partners, respectively. Equation (5.1) means the non-trace particle (W1) is split into two parts imaginatively; one particle has weighting W2 and the other has weighting W1-W2. The first part will has elastic collision with the trace particle, but the second part remains the same situation. Thus, Eq. (5.1) can be rewritten as Eq. (5.2)

)]

u and 1 u are the post-collision velocities of the first part of non-trace particle and the 2' trace particle, respectively. Then the two parts of non-trace particle are merged as one particle

u and 1 u are the real final velocities of the non-trace and trace particles, respectively. 2' The momentum is conserved by Eq.(5.1)~Eq.(5.3).

Unfortunately, it does not explicitly conserve total energy. But the energy difference (loss) caused by this split-merge process is found to be proportional to the weight ratio W2/W1 (<1). The energy lost for each collision is calculated by Eq. (5.4),

] Thus, the conservative weighting scheme proposed by Boyd [68] nearly conserves total energy as this weight ratio is much smaller than unity. The split-merge process described in the above can be summarized as Figure 5.1.

It was argued that if the split-merge scheme is employed at each collision, then energy is continuously lost from the system because of energy loss [68]. Boyd also proposed some practical remedy to keep this energy loss to a minimum by adding lost energy to the central-mass energy in a subsequent collision. In general, this energy should only be added to collisions between particles both having the maximum weight used in the simulation to keep this effect a minimum; that is, between two non-trace particles. Thus, energy conservation is essentially maintained for each iterative step of the simulation.

5.2 Verifications of Conservative Weighting S cheme

In the following we are going to perform two simulations to verify the conservative weighting scheme. The first test case is the single-cell simulation using both conventional (constant weighting) and conservative weighting schemes without

chemical reaction. By comparing the velocity distribution and relative error, it can be clearly shown that CWS is much superior to the constant weighting scheme if trace species is involved. The second test case is a hypersonic flow over a 3-D cylinder. From this case we can see the CWS can get the same simulated results by using much less particles and computational time.

5.2.1 Maxwell-Boltzmann Distribution of A Single Cell Two-Component Mixture without Chemical Reaction

The first simulation includes two species: Ar and He with different weight ratios.

In this simulation, we would like to know the effectiveness and accuracy of the conservative weighting scheme. We perform the simulation with different weight ratios using both constant weighting scheme and conservative weighting scheme (CWS). The total simulated particles are 10,000 in the cell, and the weight ratios are WAr/WHe=0.1 (10%), 0.05 (5%) and 0.01 (1%), respectively. Note that the number in the parenthesis represents mole fraction of Ar. Thus, Ar is a potential trace species in this case. The temperature in the cell is preset as 1,000 K. Figures 5.2~5.4, with weight ratios of 1/9, 1/19 and 1/99, respectively, show the velocity distribution after 20,000 time-steps and the relative error as a function of simulation time. Note that the relative error is defined as the root mean square of the sum of deviation from the M -B distribution relative to the M axwell-Boltzmann distributions by dividing the velocity range (-5,000~5,000 m/s) into 50 sections. In these figures, it is clearly that the constant weighting scheme performs better than the CWS for the simulations with weight ratios of 1/9 and 1/19.

This is because CWS has energy lost from non-trace gas, and we must add the lost energy to non-trace gas. When the weight ratio is not small enough, the energy loss will be relatively large, and hence it will affect the energy distribution. In turn, the velocity distribution will deviate strongly from the M axwell-Boltzmann distribution, which it should be in the current test case. But when the weight ratio is less than 0.05, as shown in Figure 5.4, the CWS can perform as accurately as the constant weighting scheme.

Also the relative errors of both species by the CWS decrease to a reasonable low value in a few time-steps, while that of trace species by the constant weighting method maintains unacceptably large as simulation continues. This result shows not only the CWS is as accurate as the constant weighting scheme but is also very effective for dealing the case involving trace species with the weight ratio lower than 0.05.

Three-Component Mixture Without Chemical Reaction

This simulation includes three species, Ne, Ar and He, with weight ratio 1:99:9900

in order. The simulation conditions are similar to those of two-species case. The total simulated particles are still 10,000 and the temperature in the cell is kept at 1,000 K.

Figure 5.5(a) shows the velocity distribution after 20,000 time-steps. It clearly demonstrates that the constant weighting scheme is incapable of handling the most trace species because very few particles of Ne exist in the cell. Figure 5.5(b) shows the relative errors of both the trace species and the most trace species by the constant weighting scheme are too large and decrease very slowly. Thus, it is clear that the CWS performs much better than the constant weighting scheme if the trace species are involved in D SM C simulation.

5.2.2 Hypersonic Flow over a Quasi-2-D Cylinder

The second test case is a hypersonic flow over a quasi-2-D cylinder. The 1-level adaptive tetrahedral mesh is shown in Fig. 5.6. The flow conditions are listed as follows;

the diameter of the cylinder is 1 m, the total number density and mach number of the inflow are 1.29E19 and 20, respectively, the mole fraction of all these cases is 1:99, the temperatures of free-stream, stagnation and surface are 20, 1620 and 291.6 K in order.

Resulting Knudsen number based on free-stream condition and the diameter of the cylinder is 0.1. The conservative weighting scheme and the traditional scheme are used in these cases and the corresponding particle number are about 0.24 million and 10 million, respectively. Figure 5.7~5.9 are the contours of the number density for each species with Ar-He, Ar-Ne and O2-N2, respectively. From these figures, it is clear that the CWS can use few particle and computational time to obtain the same results by comparing those of traditional method.

5.3 Concluding Remarks

In the current study, the conservative weighting scheme (CWS) is assessed to handle the trace species in the DSM C simulation without chemical reaction using a single-cell and a quasi-2-D hypersonic cylinder flow by assigning lower particle weighting for trace species. The CWS is shown to be effective and accurate in D SM C simulation without chemical reaction for 2-species and 3-species mixtures with concentration ratio (trace to abundant) less than 0.05, which the constant weighting scheme is difficult to use practically. For the quasi-2-D, two-species hypersonic cylinder

In the current study, the conservative weighting scheme (CWS) is assessed to handle the trace species in the DSM C simulation without chemical reaction using a single-cell and a quasi-2-D hypersonic cylinder flow by assigning lower particle weighting for trace species. The CWS is shown to be effective and accurate in D SM C simulation without chemical reaction for 2-species and 3-species mixtures with concentration ratio (trace to abundant) less than 0.05, which the constant weighting scheme is difficult to use practically. For the quasi-2-D, two-species hypersonic cylinder