Scaling soft matter physics to thousands of graphics processing units in parallel

Alan Gray, Alistair Hart, Oliver Henrich, Kevin Stratford

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

We describe a multi-graphics processing unit (GPU) implementation of the Ludwig application, which specialises in simulating a variety of complex fluids via lattice Boltzmann fluid dynamics coupled to additional physics describing complex fluid constituents. We describe our methodology in augmenting the original central processing unit (CPU) version with GPU functionality in a maintainable fashion. We present several optimisations that maximise performance on the GPU architecture through tuning for the GPU memory hierarchy. We describe how we implement particles within the fluid in such a way as to avoid a major diversion of the CPU and GPU codebases, whilst minimising data transfer at each time step. We detail our halo-exchange communication phase for the code, which exploits overlapping to allow efficient parallel scaling to many GPUs. We present results showing that the application demonstrates excellent scaling to at least 8192 GPUs in parallel, the largest system tested at the time of writing. The GPU version (on NVIDIA K20X GPUs) is around 3.5-5 times faster than the CPU version (on fully utilised AMD Opteron 6274 16-core CPUs), comparing equal numbers of CPUs and GPUs.
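
The halo-exchange overlap mentioned in the abstract can be illustrated with a minimal CUDA and MPI sketch. This is not the Ludwig implementation or its API; the kernel name relax, the halo width h, the periodic one-dimensional decomposition and the buffer layout are all assumptions made for the example. The pattern it shows is the one described above: interior lattice sites are updated in one CUDA stream while the boundary data are copied to the host, exchanged with neighbouring MPI ranks and copied back in a second stream, so communication is hidden behind computation.

    /* Minimal CUDA + MPI sketch of the overlap pattern: not the Ludwig code,
     * and every name here (relax, h, the 1D periodic decomposition) is an
     * illustrative assumption. Layout of d_f: [0,h) left ghost | [h,h+n)
     * owned sites | [h+n,h+n+h) right ghost. */
    #include <mpi.h>
    #include <cuda_runtime.h>

    __global__ void relax(double *f, int m)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < m) f[i] = 0.5 * f[i];      /* stand-in for collision/propagation */
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        int left  = (rank - 1 + size) % size;          /* periodic 1D neighbours */
        int right = (rank + 1) % size;

        const int n = 1 << 20;                         /* owned sites per rank */
        const int h = 1024;                            /* halo width in sites  */
        double *d_f, *h_send, *h_recv;
        cudaMalloc((void **)&d_f, (n + 2 * h) * sizeof(double));
        cudaMemset(d_f, 0, (n + 2 * h) * sizeof(double));
        cudaMallocHost((void **)&h_send, h * sizeof(double)); /* pinned buffers */
        cudaMallocHost((void **)&h_recv, h * sizeof(double)); /* for async copy */

        cudaStream_t s_bulk, s_halo;
        cudaStreamCreate(&s_bulk);
        cudaStreamCreate(&s_halo);

        /* 1. Interior sites (those touching neither edge) start immediately. */
        relax<<<(n - 2 * h + 255) / 256, 256, 0, s_bulk>>>(d_f + 2 * h, n - 2 * h);

        /* 2. Meanwhile, stage the rightmost owned slab to the host ...       */
        cudaMemcpyAsync(h_send, d_f + n, h * sizeof(double),
                        cudaMemcpyDeviceToHost, s_halo);
        cudaStreamSynchronize(s_halo);                 /* bulk kernel keeps running */

        /* 3. ... exchange it with the neighbouring ranks over MPI ...        */
        MPI_Sendrecv(h_send, h, MPI_DOUBLE, right, 0,
                     h_recv, h, MPI_DOUBLE, left, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* 4. ... return it to the left ghost layer and update the edge sites
         *    that depended on it (the opposite direction is analogous).      */
        cudaMemcpyAsync(d_f, h_recv, h * sizeof(double),
                        cudaMemcpyHostToDevice, s_halo);
        relax<<<(h + 255) / 256, 256, 0, s_halo>>>(d_f + h, h);

        cudaDeviceSynchronize();   /* join both streams before the next time step */

        cudaFree(d_f);
        cudaFreeHost(h_send);
        cudaFreeHost(h_recv);
        cudaStreamDestroy(s_bulk);
        cudaStreamDestroy(s_halo);
        MPI_Finalize();
        return 0;
    }

Keeping the bulk update and the halo traffic in separate streams is what allows the device-host transfers and MPI messages to proceed while the interior kernel runs, which is the basic mechanism behind the parallel scaling the abstract reports.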

Language: English
Pages: 274-283
Number of pages: 10
Journal: International Journal of High Performance Computing Applications
Volume: 29
Issue number: 3
Early online date: 25 Mar 2015
DOI: 10.1177/1094342015576848
ISSN: 1094-3420
Publication status: Published - 1 Aug 2015

Keywords

  • compute unified device architecture
  • fluid dynamics
  • Lattice Boltzmann
  • molecular dynamics
  • MPI
  • parallel scaling

Cite this

Gray, Alan; Hart, Alistair; Henrich, Oliver; Stratford, Kevin. Scaling soft matter physics to thousands of graphics processing units in parallel. International Journal of High Performance Computing Applications, Vol. 29, No. 3, 01.08.2015, p. 274-283. DOI: 10.1177/1094342015576848
