GPU acceleration of an iterative scheme for gas-kinetic model equations with memory reduction techniques

Lianhua Zhu, Peng Wang, Songze Chen, Zhaoli Guo, Yonghao Zhang

Research output: Contribution to journal (Article)

Abstract

This paper presents a Graphics Processing Unit (GPU) acceleration of an iteration-based discrete velocity method (DVM) for gas-kinetic model equations. Unlike previous GPU parallelizations of explicit kinetic schemes, this work is based on a fast-converging iterative scheme. The memory reduction techniques previously proposed for DVM are applied to GPU computing, enabling full three-dimensional (3D) solutions of kinetic model equations on contemporary GPUs, which usually have limited memory capacity, for problems that would otherwise need terabytes of memory. The GPU algorithm is validated against direct simulation Monte Carlo (DSMC) results for the 3D lid-driven cavity flow and the supersonic rarefied gas flow past a cube, with up to 0.7 trillion phase-space grid points. Performance profiling on three GPU models shows that the two main kernel functions can utilize 56% to 79% of the GPU computing and memory resources. The performance of the GPU algorithm is compared with a typical parallel CPU implementation of the same algorithm using the Message Passing Interface (MPI). For the 3D lid-driven cavity flow, the GPU program achieves speedups of 1.2 to 2.8 on a K40 and 1.2 to 2.4 on a K80, relative to the MPI-parallelized CPU program running on 96 CPU cores.
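To give a sense of scale for the "terabytes of memory" remark, the following is a rough, illustrative estimate rather than a figure from the paper. It assumes that a naive solver stores one double-precision value per phase-space grid point, and it uses 12 GB as an assumed device memory size (the nominal figure for an NVIDIA K40, which the abstract does not state). The function names are hypothetical.

```python
# Back-of-the-envelope memory estimate for storing a full distribution function.
# Assumptions (not from the abstract): one double-precision value per
# phase-space grid point, and a 12 GB device memory (nominal K40 figure).

BYTES_PER_DOUBLE = 8


def distribution_memory_bytes(n_phase_points: int, copies: int = 1) -> int:
    """Bytes needed to hold `copies` full distribution-function arrays."""
    return n_phase_points * BYTES_PER_DOUBLE * copies


def device_memories_needed(n_bytes: int, device_bytes: int) -> float:
    """Ratio of the required storage to one device's memory."""
    return n_bytes / device_bytes


n_points = 700_000_000_000   # 0.7 trillion phase-space grid points
k40_bytes = 12 * 10**9       # assumed 12 GB device memory

total = distribution_memory_bytes(n_points)
print(f"one copy: {total / 1e12:.1f} TB")
print(f"equivalent device memories: {device_memories_needed(total, k40_bytes):.0f}")
```

Under these assumptions a single copy of the distribution function takes about 5.6 TB, roughly 467 times the assumed device memory, which is why storing the full phase-space array on one GPU is not an option at this resolution.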
Language: English
Article number: 106861
Number of pages: 14
Journal: Computer Physics Communications
Volume: 245
Early online date: 14 Aug 2019
DOIs: 10.1016/j.cpc.2019.106861
Publication status: E-pub ahead of print - 14 Aug 2019

Fingerprint

Kinetic theory of gases
Data storage equipment
kinetics
gases
cavity flow
messages
Program processors
Message passing
kernel functions
rarefied gases
Graphics processing unit
Kinetics
iteration
gas flow
resources
simulation
grids
Flow of gases

Keywords

  • GPU
  • CUDA
  • discrete velocity method
  • gas-kinetic equation
  • high performance computing

Cite this

@article{a07777caac214314ad81f55ba6206375,
title = "GPU acceleration of an iterative scheme for gas-kinetic model equations with memory reduction techniques",
abstract = "This paper presents a Graphics Processing Unit (GPU) acceleration of an iteration-based discrete velocity method (DVM) for gas-kinetic model equations. Unlike previous GPU parallelizations of explicit kinetic schemes, this work is based on a fast-converging iterative scheme. The memory reduction techniques previously proposed for DVM are applied to GPU computing, enabling full three-dimensional (3D) solutions of kinetic model equations on contemporary GPUs, which usually have limited memory capacity, for problems that would otherwise need terabytes of memory. The GPU algorithm is validated against direct simulation Monte Carlo (DSMC) results for the 3D lid-driven cavity flow and the supersonic rarefied gas flow past a cube, with up to 0.7 trillion phase-space grid points. Performance profiling on three GPU models shows that the two main kernel functions can utilize 56{\%} to 79{\%} of the GPU computing and memory resources. The performance of the GPU algorithm is compared with a typical parallel CPU implementation of the same algorithm using the Message Passing Interface (MPI). For the 3D lid-driven cavity flow, the GPU program achieves speedups of 1.2 to 2.8 on a K40 and 1.2 to 2.4 on a K80, relative to the MPI-parallelized CPU program running on 96 CPU cores.",
keywords = "GPU, CUDA, discrete velocity method, gas-kinetic equation, high performance computing",
author = "Lianhua Zhu and Peng Wang and Songze Chen and Zhaoli Guo and Yonghao Zhang",
year = "2019",
month = "8",
day = "14",
doi = "10.1016/j.cpc.2019.106861",
language = "English",
volume = "245",
journal = "Computer Physics Communications",
issn = "0010-4655",

}

GPU acceleration of an iterative scheme for gas-kinetic model equations with memory reduction techniques. / Zhu, Lianhua; Wang, Peng; Chen, Songze; Guo, Zhaoli; Zhang, Yonghao.

In: Computer Physics Communications, Vol. 245, 106861, 31.12.2019.

TY - JOUR

T1 - GPU acceleration of an iterative scheme for gas-kinetic model equations with memory reduction techniques

AU - Zhu, Lianhua

AU - Wang, Peng

AU - Chen, Songze

AU - Guo, Zhaoli

AU - Zhang, Yonghao

PY - 2019/8/14

Y1 - 2019/8/14

AB - This paper presents a Graphics Processing Unit (GPU) acceleration of an iteration-based discrete velocity method (DVM) for gas-kinetic model equations. Unlike previous GPU parallelizations of explicit kinetic schemes, this work is based on a fast-converging iterative scheme. The memory reduction techniques previously proposed for DVM are applied to GPU computing, enabling full three-dimensional (3D) solutions of kinetic model equations on contemporary GPUs, which usually have limited memory capacity, for problems that would otherwise need terabytes of memory. The GPU algorithm is validated against direct simulation Monte Carlo (DSMC) results for the 3D lid-driven cavity flow and the supersonic rarefied gas flow past a cube, with up to 0.7 trillion phase-space grid points. Performance profiling on three GPU models shows that the two main kernel functions can utilize 56% to 79% of the GPU computing and memory resources. The performance of the GPU algorithm is compared with a typical parallel CPU implementation of the same algorithm using the Message Passing Interface (MPI). For the 3D lid-driven cavity flow, the GPU program achieves speedups of 1.2 to 2.8 on a K40 and 1.2 to 2.4 on a K80, relative to the MPI-parallelized CPU program running on 96 CPU cores.

KW - GPU

KW - CUDA

KW - discrete velocity method

KW - gas-kinetic equation

KW - high performance computing

U2 - 10.1016/j.cpc.2019.106861

DO - 10.1016/j.cpc.2019.106861

M3 - Article

VL - 245

JO - Computer Physics Communications

T2 - Computer Physics Communications

JF - Computer Physics Communications

SN - 0010-4655

M1 - 106861

ER -