PARCEL 1.2 (Jan. 31, 2023)

PARCEL 1.2 is available.

This update includes the following changes:

  • NVIDIA GPUs are now supported via CUDA.

Download

Source code: download

Manual

Japanese version: html
English version: html

License

The source code is licensed under the GNU Lesser General Public License (LGPL).

Old Version

Click Here

Overview of PARCEL

Matrix solvers for systems of simultaneous linear equations are classified into direct and iterative solvers. In most extreme-scale problems, iterative solvers based on Krylov subspace methods are essential from the viewpoints of computational cost and memory usage. The PARallel Computing ELements (PARCEL) library provides highly efficient parallel Krylov subspace solvers for modern massively parallel supercomputers, which are characterized by accelerated computation and comparatively slow improvement in inter-node communication performance. PARCEL is based on a hybrid parallel programming model with MPI+OpenMP, and supports the latest communication-avoiding (CA) Krylov methods (the Chebyshev basis CG method [5,6] and the CA-GMRES method [3,4]) in addition to the conventional Krylov subspace methods (the CG method, the BiCGstab method, and the GMRES method) [1,2]. The CA-Krylov methods reduce the number of collective communications in order to avoid a communication bottleneck. The fine-block preconditioner [7] for SIMD operations and the Neumann series preconditioner [2] for GPU operations are available in addition to the conventional preconditioners (the point Jacobi preconditioner, the ILU preconditioners, the block Jacobi preconditioner, and the additive Schwarz preconditioner) [1,2]. QR factorization routines (the classical Gram-Schmidt method [1,2], the modified Gram-Schmidt method [1,2], the tall-skinny QR method [3], the Cholesky QR method [8], and the Cholesky QR2 method [9]) and an eigenvalue solver based on the CA-Arnoldi method [3], which are implemented for the CA-GMRES method, are also available. PARCEL supports three matrix formats (the Compressed Row Storage (CRS) format, the Diagonal (DIA) format, and the Domain Decomposition Method (DDM) format) and two data types (double precision and quadruple precision), and can be called from programs written in C and Fortran.
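
As a minimal illustration of the Krylov subspace methods listed above, the following C sketch implements the textbook (unpreconditioned) CG iteration [1,2] for a matrix in CRS format. It is a generic reference sketch, not PARCEL's API; the struct layout and function names are illustrative assumptions.

    /* Textbook unpreconditioned CG [1,2] for a symmetric positive definite
       matrix in CRS format. Illustrative sketch only: the struct layout and
       the serial loops are assumptions, not PARCEL's actual interface. */
    #include <stdlib.h>
    #include <math.h>

    typedef struct {           /* CRS (Compressed Row Storage) matrix */
        int n;                 /* number of rows */
        const int *row_ptr;    /* size n+1: start of each row in val/col_idx */
        const int *col_idx;    /* column index of each stored nonzero */
        const double *val;     /* nonzero values */
    } CrsMatrix;

    static void spmv(const CrsMatrix *A, const double *x, double *y)
    {
        for (int i = 0; i < A->n; i++) {
            double s = 0.0;
            for (int k = A->row_ptr[i]; k < A->row_ptr[i + 1]; k++)
                s += A->val[k] * x[A->col_idx[k]];
            y[i] = s;
        }
    }

    static double dot(int n, const double *a, const double *b)
    {
        double s = 0.0;               /* in a distributed run this becomes */
        for (int i = 0; i < n; i++)   /* a local partial sum followed by   */
            s += a[i] * b[i];         /* an MPI all-reduce                 */
        return s;
    }

    /* Solve A x = b to relative tolerance tol; returns the iteration count. */
    int cg_solve(const CrsMatrix *A, const double *b, double *x,
                 double tol, int max_iter)
    {
        int n = A->n;
        double *r = malloc(n * sizeof *r);  /* residual         */
        double *p = malloc(n * sizeof *p);  /* search direction */
        double *q = malloc(n * sizeof *q);  /* q = A p          */

        spmv(A, x, q);
        for (int i = 0; i < n; i++) { r[i] = b[i] - q[i]; p[i] = r[i]; }
        double rr = dot(n, r, r), rr0 = rr;

        int it = 0;
        while (it < max_iter && sqrt(rr) > tol * sqrt(rr0)) {
            spmv(A, p, q);
            double alpha = rr / dot(n, p, q);   /* collective communication #1 */
            for (int i = 0; i < n; i++) {
                x[i] += alpha * p[i];
                r[i] -= alpha * q[i];
            }
            double rr_new = dot(n, r, r);       /* collective communication #2 */
            double beta = rr_new / rr;
            for (int i = 0; i < n; i++)
                p[i] = r[i] + beta * p[i];
            rr = rr_new;
            it++;
        }
        free(r); free(p); free(q);
        return it;
    }

In a distributed-memory run, each dot product becomes an MPI all-reduce, so the loop above incurs two collective communications per iteration; reducing the number of such collectives is exactly what the CA variants (the Chebyshev basis CG and CA-GMRES methods) are designed for.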

Performance

≪ Performance comparison of preconditioned CG solvers using 32 CPUs on BDEC-01/Wisteria-O (A64FX) at Univ. Tokyo ≫
Three-dimensional Poisson equation with a problem size of 768×768×768

Solver   Format   Preconditioner   Parallelization   Elapsed time [s]   Memory usage [GB]   Iterations
PETSc    CRS      Block Jacobi     all-MPI                      75.08                 382         1632
PARCEL   CRS      Block Jacobi     MPI+OpenMP                   73.30                 194         1633
PARCEL   CRS      Fine-block       MPI+OpenMP                   36.68                 194         1905
PARCEL   DDM      Fine-block       MPI+OpenMP                   11.31                 166         1571
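
All of the benchmarks in this section solve a three-dimensional Poisson equation on a 768×768×768 grid, i.e. roughly 4.5×10^8 unknowns. A natural way to set up such a system is the standard 7-point finite-difference stencil, sketched below in CRS format; the stencil choice and Dirichlet boundaries are assumptions for illustration, and this is not PARCEL's input routine.

    /* Assemble the 7-point finite-difference stencil for the 3D Poisson
       equation on an nx*ny*nz grid in CRS format (Dirichlet boundaries).
       Illustrative sketch only. row_ptr is 64-bit because at 768^3 the
       roughly 3.2e9 nonzeros overflow a 32-bit index. */
    #include <stdlib.h>

    void assemble_poisson7(int nx, int ny, int nz,
                           long **row_ptr, int **col_idx, double **val)
    {
        long n = (long)nx * ny * nz;
        long *rp  = malloc((n + 1) * sizeof *rp);
        int *ci   = malloc(7 * n * sizeof *ci);   /* <= 7 nonzeros per row */
        double *v = malloc(7 * n * sizeof *v);

        long nnz = 0;
        rp[0] = 0;
        for (int k = 0; k < nz; k++)
        for (int j = 0; j < ny; j++)
        for (int i = 0; i < nx; i++) {
            long row = ((long)k * ny + j) * nx + i;
            /* off-diagonal neighbours carry coefficient -1 */
            if (k > 0)      { ci[nnz] = (int)(row - (long)nx * ny); v[nnz++] = -1.0; }
            if (j > 0)      { ci[nnz] = (int)(row - nx);            v[nnz++] = -1.0; }
            if (i > 0)      { ci[nnz] = (int)(row - 1);             v[nnz++] = -1.0; }
            ci[nnz] = (int)row; v[nnz++] = 6.0;   /* diagonal entry */
            if (i < nx - 1) { ci[nnz] = (int)(row + 1);             v[nnz++] = -1.0; }
            if (j < ny - 1) { ci[nnz] = (int)(row + nx);            v[nnz++] = -1.0; }
            if (k < nz - 1) { ci[nnz] = (int)(row + (long)nx * ny); v[nnz++] = -1.0; }
            rp[row + 1] = nnz;
        }
        *row_ptr = rp; *col_idx = ci; *val = v;
    }

Because every row's nonzeros sit at the fixed offsets ±1, ±nx, and ±nx·ny, the same matrix also maps naturally onto the DIA format, which stores those seven diagonals as dense arrays and avoids the indirect col_idx accesses.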

≪ Performance comparison of preconditioned CG solvers using 32 CPUs on HPE SGI8600 (Intel Xeon Gold 6248R) at Japan Atomic Energy Agency ≫
Three-dimensional Poisson equation with a problem size of 768×768×768

Solver   Format   Preconditioner   Parallelization   Elapsed time [s]   Memory usage [GB]   Iterations
PETSc    CRS      Block Jacobi     all-MPI                     137.03                 369         1437
PARCEL   CRS      Block Jacobi     MPI+OpenMP                  125.45                 158         1438
PARCEL   CRS      Fine-block       MPI+OpenMP                  167.97                 158         1903
PARCEL   DDM      Fine-block       MPI+OpenMP                   83.03                 126         1536

≪ Performance comparison of preconditioned CG solvers using 32 GPUs on HPE SGI8600 (Intel Xeon Gold 6248R, NVIDIA Tesla V100 SXM2) at Japan Atomic Energy Agency ≫
Three-dimensional Poisson equation with a problem size of 768×768×768

Solver   Format   Preconditioner   Parallelization   Elapsed time [s]   Memory usage CPU/GPU [GB]   Iterations
PETSc    -        Block Jacobi     MPI+CUDA                     58.31                   282 / 222          882
AmgX     -        Jacobi           MPI+CUDA                      9.45                    40 / 167          982
PARCEL   CRS      Block Jacobi     MPI+CUDA                     58.25                   191 / 269          883
PARCEL   CRS      Neumann series   MPI+CUDA                     11.88                   122 / 269          982
PARCEL   DDM      Neumann series   MPI+CUDA                      7.89                    91 / 151          982
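
The Neumann series preconditioner appearing in the two fastest PARCEL columns is a polynomial preconditioner [2]: after diagonal scaling, write D^-1 A = I - N and approximate A^-1 ≈ (I + N + N^2 + ... + N^m) D^-1. Applying it therefore needs only sparse matrix-vector products and vector updates, with none of the sequential triangular solves of ILU-type preconditioners, which is what makes it attractive on GPUs. A minimal sketch, reusing the CrsMatrix type and spmv() from the CG example above (the truncation order m and the scratch-array interface are illustrative assumptions):

    /* Apply a truncated Neumann series preconditioner: z approximates
       A^-1 r with A^-1 ~ (I + N + ... + N^m) D^-1, where N = I - D^-1 A [2].
       Each term costs one SpMV plus vector updates and no triangular solve,
       so on a GPU every loop below maps onto a simple kernel.
       diag_inv holds 1/a_ii; t and u are scratch arrays of length n. */
    void neumann_apply(const CrsMatrix *A, const double *diag_inv,
                       int m, const double *r, double *z,
                       double *t, double *u)
    {
        int n = A->n;
        for (int i = 0; i < n; i++)        /* k = 0 term: t = z = D^-1 r */
            t[i] = z[i] = diag_inv[i] * r[i];

        for (int k = 1; k <= m; k++) {
            spmv(A, t, u);                 /* u = A t */
            for (int i = 0; i < n; i++) {
                t[i] -= diag_inv[i] * u[i];  /* t <- (I - D^-1 A) t = N t */
                z[i] += t[i];                /* accumulate the k-th term  */
            }
        }
    }

Since every operation here is an SpMV or an element-wise vector update, the preconditioner parallelizes as well as the CG iteration itself, which is consistent with the GPU timings above.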

References

[1] R. Barrett, M. Berry, T. F. Chan, et al., "Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods", SIAM (1994).
[2] Y. Saad, "Iterative Methods for Sparse Linear Systems", SIAM (2003).
[3] M. Hoemmen, "Communication-Avoiding Krylov Subspace Methods", Ph.D. thesis, University of California, Berkeley (2010).
[4] Y. Idomura, T. Ina, A. Mayumi, et al., "Application of a Communication-Avoiding Generalized Minimal Residual Method to a Gyrokinetic Five Dimensional Eulerian Code on Many Core Platforms", ScalA'17: 8th Workshop on Latest Advances in Scalable Algorithms for Large Scale Systems, pp. 1-8 (2017).
[5] R. Suda, L. Cong, D. Watanabe, et al., "Communication-Avoiding CG Method: New Direction of Krylov Subspace Methods towards Exa-scale Computing", RIMS Kokyuroku, pp. 102-111 (2016).
[6] Y. Idomura, T. Ina, S. Yamashita, et al., "Communication Avoiding Multigrid Preconditioned Conjugate Gradient Method for Extreme Scale Multiphase CFD Simulations", ScalA'18: 9th Workshop on Latest Advances in Scalable Algorithms for Large Scale Systems, pp. 17-24 (2018).
[7] T. Ina, Y. Idomura, T. Imamura, S. Yamashita, and N. Onodera, "Iterative Methods with Mixed-Precision Preconditioning for Ill-Conditioned Linear Systems in Multiphase CFD Simulations", ScalA'21: 12th Workshop on Latest Advances in Scalable Algorithms for Large Scale Systems (2021).
[8] A. Stathopoulos and K. Wu, "A Block Orthogonalization Procedure with Constant Synchronization Requirements", SIAM J. Sci. Comput. 23, pp. 2165-2182 (2002).
[9] T. Fukaya, Y. Nakatsukasa, Y. Yanagisawa, et al., "CholeskyQR2: A Simple and Communication-Avoiding Algorithm for Computing a Tall-Skinny QR Factorization on a Large-Scale Parallel System", ScalA'14: 5th Workshop on Latest Advances in Scalable Algorithms for Large Scale Systems, pp. 31-38 (2014).

Developer

Computer Science Research and Development Office, Center for Computational Science & e-Systems, Japan Atomic Energy Agency

Contact

ccse-quad(at)ml.jaea.go.jp
※ Replace (at) with @.

 

