### Abstract

| Language | English |
|---|---|
| Article number | 106861 |
| Number of pages | 14 |
| Journal | Computer Physics Communications |
| Early online date | 14 Aug 2019 |
| DOIs | 10.1016/j.cpc.2019.106861 |
| Publication status | E-pub ahead of print - 14 Aug 2019 |

### Keywords

- GPU
- CUDA
- discrete velocity method
- gas-kinetic equation
- high performance computing

### Cite this

**GPU acceleration of an iterative scheme for gas-kinetic model equations with memory reduction techniques.** / Zhu, Lianhua; Wang, Peng; Chen, Songze; Guo, Zhaoli; Zhang, Yonghao.

Research output: Contribution to journal › Article

TY - JOUR

T1 - GPU acceleration of an iterative scheme for gas-kinetic model equations with memory reduction techniques

AU - Zhu, Lianhua

AU - Wang, Peng

AU - Chen, Songze

AU - Guo, Zhaoli

AU - Zhang, Yonghao

PY - 2019/8/14

Y1 - 2019/8/14

N2 - This paper presents a Graphics Processing Unit (GPU) acceleration of an iteration-based discrete velocity method (DVM) for gas-kinetic model equations. Unlike previous GPU parallelizations of explicit kinetic schemes, this work is based on a fast-converging iterative scheme. Memory reduction techniques previously proposed for DVM are applied to GPU computing, enabling full three-dimensional (3D) solutions of kinetic model equations, which would otherwise need terabytes of memory, on contemporary GPUs with their usually limited memory capacity. The GPU algorithm is validated against direct simulation Monte Carlo (DSMC) results for the 3D lid-driven cavity flow and the supersonic rarefied gas flow past a cube, with up to 0.7 trillion phase-space grid points. Performance profiling on three GPU models shows that the two main kernel functions can utilize 56% to 79% of the GPU computing and memory resources. The performance of the GPU algorithm is compared with a typical parallel CPU implementation of the same algorithm using the Message Passing Interface (MPI). The comparison shows that, for the 3D lid-driven cavity flow, the GPU program achieves speedups of 1.2 to 2.8 on the K40 and 1.2 to 2.4 on the K80 relative to the MPI-parallelized CPU program running on 96 CPU cores.

KW - GPU

KW - CUDA

KW - discrete velocity method

KW - gas-kinetic equation

KW - high performance computing

U2 - 10.1016/j.cpc.2019.106861

DO - 10.1016/j.cpc.2019.106861

M3 - Article

JO - Computer Physics Communications

T2 - Computer Physics Communications

JF - Computer Physics Communications

SN - 0010-4655

M1 - 106861

ER -