This chapter describes the iterative methods contained in this library. The following notation is used, represents the linear system which will be solved, with being an nn sparse matrix and representing the initial vector. represents an appropriate preconditioner.
The Conjugate Gradient (CG) method is an iterative Krylov subspace algorithm for solving symmetric matrices.
The Biconjugate Gradient Stabilized (Bi-CGSTAB) method is an iterative Krylov subspace algorithm for solving nonsymmetric matrices.
The Generalized Minimum Residual (GMRES(m)) method is an iterative Krylov subspace algorithm for solving nonsymmetric matrices. GMRES(m) is restarted periodically after iterations if it did not converge within iterations, the solution obtained until iterations will be used as an input for the new restart cycle. PARCEL provides variants of GMRES with the Classical Gram-Schmidt (sometimes referred to as standard Gram-Schmidt) and Modified Gram-Schmidt (MGS) orthogonalization methods.
is an upper triangular matrix with representing its elements, represents a vector consisting of the first i elements of the vector .
The Communication Avoiding Generalized Minimum Residual (CA-GMRES(s,t)) method is an iterative Krylov subspace algorithm for solving nonsymmetric matrices. The calculation which is done in iterations in the GMRES(m) method is equivalent to one iteration of the CA-GMRES method. CA-GMRES(s,t) is restarted periodically after iterations if it did not converge within iterations, the solution obtained until iterations will be used as an input for the new restart cycle. CA-GMRES(s,t) is equivalent to GMRES(m) if and are chosen so that . The convergence property of CA-GMRES(s,t) is the same as GMRES(m) in exact arithmetic. In addition, the number of global collective communication calls is reduced by communication avoiding QR factorization. As CA-GMRES(s,t) produces basis vectors at once, when is too large, the linear independency of the basic vectors may become worse because of round off errors, leading to worse convergence property than GMRES(m). In order to improve the orthogonality of the basic vectors, the PARCEL implementation of CA-GMRES(s,t) provides an option to use the Newton basis in addition to the monomial basis.
represents a unit vector, represents the -th row and -th column element of the matrix , and represents the eigenvalues obtained from the Hessenberg matrix, which is generated by the iteration process.
The elements of the matrix are represented as
The elements of the matrix are represented as .
The Chebyshev basis Conjugate Gradient (CBCG) method is an iterative Krylov subspace algorithm for solving symmetric matrices. The CBCG method calculates iterations of the CG method in one iteration. By this approach, the CBCG method reduces the global collective communication calls. In order to use the Chebyshev basis the largest and smallest eigenvalues are needed. PARCEL provides two methods to obtain these eigenvalues, the so called power method and the communication avoiding Arnoldi method.
, represent the largest and smallest eigenvalues of .
In iterative algorithms, the application of a preconditioner of the form can help to reduce the number of iterations until convergence. The notation [] means [ with respect to the vector ]. Although the more accurate approximation leads to the less number of iterations until convergence, the cost of preconditioner is normally increased. In non-parallel computing, one of the most common preconditioners is the incomplete LU factorization (ILU). However, in parallel computing, parallel preconditioners are needed. PARCEL provides the following parallel preconditioners: Point Jacobi, Block Jacobi and Additive Schwarz.
The point Jacobi preconditioner is one of the most simplest preconditioners. only consists of the diagonal elements of the matrix . Compared to other preconditioners the efficiency of the point Jacobi preconditioner is very low. However, it can be easily applied in parallel and it does not require any additional communication between processors.
LU factorization decompose a square matrix into a lower triangular matrix and an upper triangular matrix . For a typical sparse matrix, the LU factors can be much less sparse than the original matrix, which is called as fill-in. This increases the memory and computational requirements. In order to avoid this issue, PARCEL provides the zero fill-in incomplete LU factorization (ILU(0)).
The diagonal incomplete LU factorization (D-ILU) computes a diagonal matrix , a lower triangular matrix and an upper triangular matrix for the input matrix . The preconditioner is constructed with , and () with: Two different conditions exist in order to construct the diagonal matrix :
Condition 1: The diagonal elements of equal those of :
Condition 2: The row sum of equals the row sum of :
Given the conditions above can be computed as follows:
If condition 1 should be fullfilled the following computation is used:
If condition 2 should be fullfilled the following computation is used:
The Block Jacobi Preconditioner constructs out of the diagonal blocks of the matrix . For each block, the incomplete ILU factorization is computed. When the preconditioner is applied, each block can be processed independently, resulting in a high level of parallelism. In addition, no communication is needed when each block is defined within a sub-matrix on each processor.
The Additive Schwarz Preconditioner constructs out of the overlapping diagonal blocks of . For each block the incomplete LU factorization is computed. Compared to the Block Jacobi Preconditioner, the Additive Schwarz Preconditioner may require additional communication. PARCEL provides four different overlapping methods, BASIC, RESTRICT, INTERPOLATE and NONE.
Fig.1 Additive Schwarz Preconditioner with overlapping diagonal blocks |
Solve with an extended including an overlapping region with the neighboring process, in the overlapping region is determined by summing up the results between the neighboring region.
Fig.2 BASIC |
Solve with an extended including an overlapping region with the neighboring process, in the overlapping region is determined without taking account of the result in the neighboring region.
Fig.3 RESTRICT |
Solve with an extended without an overlapping region with the neighboring process, in the overlapping region is determined by summing up the results between the neighboring regions.
Fig.4 INTERPOLATE |
Solve with an extended without an overlapping region with the neighboring process, in the overlapping region is determined without taking account of the result in the neighboring regions.
Fig.5 NONE |
In order to save memory in sparse matrix computation, only the non-zero elements and their locations are stored. PARCEL supports different sparse matrix formats.
The CRS format compresses the row indices of the non-zero elements of the sparse matrix. Together with the compressed rows, the column index and the value for each non-zero element are stored in one dimensional arrays.
Values of non-zero elements ::
Compressed row indices ::
Column indices of non-zero elements ::
Values of non-zero elements ::
Compressed row indices ::
Column indices of non-zero elements ::
Values of non-zero elements ::
Compressed row indices ::
Column indices of non-zero elements ::
The Diagonal Format stores the non-zero elements of a matrix as diagonals. This format provides high performance for band block diagonal matrices. In addition to the element values, an offset value for each diagonal of the matrix needs to be stored inside an one dimensional array. A negative offset indicates that the diagonal is below the main diagonal of the matrix, a positive offset indicates that the diagonal is above the main diagonal. The main diagonal has the offset zero.
Elements ::
Offsets ::
Number of diagonals ::
Elements ::
Offsets ::
Number of diagonals ::
Elements ::
Offsets ::
Number of diagonals ::
This chapter explains how to compile and install the PARCEL library.
After extracting the PARCEL library, the directory structure should be the same as in Fig.6.
Fig.6 The PARCEL directory structure |
PARCEL requires a C compiler, a FORTRAN compiler, which supports preprocessor, MPI, LAPACK, and OpenMP. When compiling with , the option has to be specified to to activate preprocessor. Compiler options can be changed inside the file arch/make_config. An example of a make_config file is shown in Fig.7. The make command should be executed from the top directory of the extracted file. In order to re-compile, followed by should be executed. If the command was successful, each directory should contain an executable file. 64-bit integer is only needed when working with over roughly 2 billion unknowns.
Fig.7 Example of a make_config file |
In order to call PARCEL library, programs need to be linked against the library files inside the directory.
The PARCEL routines with direct call interfaces are shown below.
call parcel_dcg( icomm, vecx,vecb, n,gn,nnz,istart, crsA,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret )
Solve a simultaneous linear equation system Ax = b by the conjugate gradient method (CG method). Non-zero elements are stored in the CRS format.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
icomm | integer | in | MPI communicator |
vecx(n) | double precision | in/out | in: initial vector, out: solution vector |
vecb(n) | double precision | in | Right hand side vector |
n | integer*4 / integer*8 | in | Size of vector on each process |
gn | integer*4 / integer*8 | in | Total size of vector |
nnz | integer*4 / integer*8 | in | Number of non-zero elements on each process |
istart | integer*4 / integer*8 | in | Start line of matrix on each process |
crsA(nnz) | double precision | in | Non-zero elements of matrix stored in the CRS format |
crsRow_ptr(n+1) | integer*4 / integer*8 | in | Pointer table in the CRS format |
crsCol(nnz) | integer*4 / integer*8 | in | Column numbers of non-zero elements in the CRS format |
ipreflag | integer | in | Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz) |
ILU_method | integer | in | Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match) |
addL | integer | in | Overlapping width in the additive Schwartz method |
iflagAS | integer | in | Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE) |
itrmax | integer | in/out | in: maximum number of iterations, out: number of iterations |
rtolmax | double precision | in | Convergence criterion (norm of relative residual error) |
reshistory(itrmax) | double precision | out | History of relative residual error |
iovlflag | integer | in | Flag for communication-computation overlap (0: none, 1: all processes, 2: each process) |
iret | integer | out | Error code (0:normal) |
call parcel_dcg_dia( icomm, vecx,vecb, n,gn,istart, diaA,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret )
Solve a simultaneous linear equation system Ax = b by the conjugate gradient method (CG method). Non-zero elements are stored in the Diagonal format.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
icomm | integer | in | MPI communicator |
vecx(n) | double precision | in/out | in: initial vector, out: solution vector |
vecb(n) | double precision | in | Right hand side vector |
n | integer*4 / integer*8 | in | Size of vector on each process |
gn | integer*4 / integer*8 | in | Total size of vector |
istart | integer*4 / integer*8 | in | Start line of matrix on each process |
diaA(n*nnd) | double precision | in | Non-zero elements of matrix stored in the Diagonal format |
offset(nnd) | integer*4 / integer*8 | in | Offset of each non-zero diagonal elements array in the Diagonal format |
nnd | integer*4 / integer*8 | in | Number of diagonal elements arrays in the Diagonal format |
ipreflag | integer | in | Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz) |
ILU_method | integer | in | Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match) |
addL | integer | in | Overlapping width in the additive Schwartz method |
iflagAS | integer | in | Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE) |
itrmax | integer | in/out | in: maximum number of iterations, out: number of iterations |
rtolmax | double precision | in | Convergence criterion (norm of relative residual error) |
reshistory(itrmax) | double precision | out | History of relative residual error |
iovlflag | integer | in | Flag for communication-computation overlap (0: none, 1: all processes, 2: each process) |
iret | integer | out | Error code (0:normal) |
call parcel_ddcg( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,nnz,istart, crsA_hi,crsA_lo,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )
Solve a simultaneous linear equation system Ax = b by the conjugate gradient method (CG method). Non-zero elements are stored in the CRS format.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
icomm | integer | in | MPI communicator |
vecx_hi(n) | double precision | in/out | in: Initial vector (upper bits), out: Solution vector (upper bits) |
vecx_lo(n) | double precision | in/out | in: initial value of the iteration (lower bit), out: solution vector (lower bit) |
vecb_hi(n) | double precision | in | Right hand side vector (upper bit) |
vecb_lo(n) | double precision | in | Right hand side vector (lower bit) |
n | integer*4 / integer*8 | in | Size of vector on each process |
gn | integer*4 / integer*8 | in | Total size of vector |
nnz | integer*4 / integer*8 | in | Number of non-zero elements on each process |
istart | integer*4 / integer*8 | in | Start line of matrix on each process |
crsA_hi(nnz) | double precision | in | Non-zero elements (upper bits) of matrix stored in the CRS format |
crsA_lo(nnz) | double precision | in | Non-zero elements (lower bit) of matrix stored in the CRS format |
crsRow_ptr(n+1) | integer*4 / integer*8 | in | Pointer table in the CRS format |
crsCol(nnz) | integer*4 / integer*8 | in | Column numbers of non-zero elements in the CRS format |
ipreflag | integer | in | Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz) |
ILU_method | integer | in | Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match) |
addL | integer | in | Overlapping width in the additive Schwartz method |
iflagAS | integer | in | Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE) |
itrmax | integer | in/out | in: maximum number of iterations, out: number of iterations |
rtolmax | double precision | in | Convergence criterion (norm of relative residual error) |
reshistory(itrmax) | double precision | out | History of relative residual error |
iovlflag | integer | in | Flag for communication-computation overlap (0: none, 1: all processes, 2: each process) |
iret | integer | out | Error code (0:normal) |
PRECISION_A | integer | in | Precision of matrix (1: double precision, 2: quad precision) |
PRECISION_b | integer | in | Precision of right hand side vector (1: double precision, 2: quad precision) |
PRECISION_x | integer | in | Precision of solution vector (1: double precision, 2: quad precision) |
PRECISION_PRECONDITION | integer | in | Precision of preconditioner (1: double precision, 2: quad precision) |
call parcel_ddcg_dia( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,istart, diaA_hi,diaA_lo,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )
Solve a simultaneous linear equation system Ax = b by the conjugate gradient method (CG method). Non-zero elements are stored in the Diagonal format.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
icomm | integer | in | MPI communicator |
vecx_hi(n) | double precision | in/out | in: Initial vector (upper bits), out: Solution vector (upper bits) |
vecx_lo(n) | double precision | in/out | in: initial value of the iteration (lower bit), out: solution vector (lower bit) |
vecb_hi(n) | double precision | in | Right hand side vector (upper bit) |
vecb_lo(n) | double precision | in | Right hand side vector (lower bit) |
n | integer*4 / integer*8 | in | Size of vector on each process |
gn | integer*4 / integer*8 | in | Total size of vector |
istart | integer*4 / integer*8 | in | Start line of matrix on each process |
diaA_hi(n*nnd) | double precision | in | Non-zero elements (upper bits) of matrix stored in the Diagonal format |
diaA_lo(n*nnd) | double precision | in | Non-zero elements (lower bit) of matrix stored in the Diagonal format |
offset(nnd) | integer*4 / integer*8 | in | Offset of each non-zero diagonal elements array in the Diagonal format |
nnd | integer*4 / integer*8 | in | Number of diagonal elements arrays in the Diagonal format |
ipreflag | integer | in | Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz) |
ILU_method | integer | in | Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match) |
addL | integer | in | Overlapping width in the additive Schwartz method |
iflagAS | integer | in | Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE) |
itrmax | integer | in/out | in: maximum number of iterations, out: number of iterations |
rtolmax | double precision | in | Convergence criterion (norm of relative residual error) |
reshistory(itrmax) | double precision | out | History of relative residual error |
iovlflag | integer | in | Flag for communication-computation overlap (0: none, 1: all processes, 2: each process) |
iret | integer | out | Error code (0:normal) |
PRECISION_A | integer | in | Precision of matrix (1: double precision, 2: quad precision) |
PRECISION_b | integer | in | Precision of right hand side vector (1: double precision, 2: quad precision) |
PRECISION_x | integer | in | Precision of solution vector (1: double precision, 2: quad precision) |
PRECISION_PRECONDITION | integer | in | Precision of preconditioner (1: double precision, 2: quad precision) |
call parcel_dbicgstab( icomm, vecx,vecb, n,gn,nnz,istart, crsA,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret )
Solve a simultaneous linear equation system Ax = b by the stabilized biconjugate gradient method (Bi-CGSTAB method). Non-zero elements are storedin the CRS format.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
icomm | integer | in | MPI communicator |
vecx(n) | double precision | in/out | in: initial vector, out: solution vector |
vecb(n) | double precision | in | Right hand side vector |
n | integer*4 / integer*8 | in | Size of vector on each process |
gn | integer*4 / integer*8 | in | Total size of vector |
nnz | integer*4 / integer*8 | in | Number of non-zero elements on each process |
istart | integer*4 / integer*8 | in | Start line of matrix on each process |
crsA(nnz) | double precision | in | Non-zero elements of matrix stored in the CRS format |
crsRow_ptr(n+1) | integer*4 / integer*8 | in | Pointer table in the CRS format |
crsCol(nnz) | integer*4 / integer*8 | in | Column numbers of non-zero elements in the CRS format |
ipreflag | integer | in | Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz) |
ILU_method | integer | in | Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match) |
addL | integer | in | Overlapping width in the additive Schwartz method |
iflagAS | integer | in | Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE) |
itrmax | integer | in/out | in: maximum number of iterations, out: number of iterations |
rtolmax | double precision | in | Convergence criterion (norm of relative residual error) |
reshistory(itrmax) | double precision | out | History of relative residual error |
iovlflag | integer | in | Flag for communication-computation overlap (0: none, 1: all processes, 2: each process) |
iret | integer | out | Error code (0:normal) |
call parcel_dbicgstab_dia( icomm, vecx,vecb, n,gn,istart, diaA,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret )
Solve a simultaneous linear equation system Ax = b by the stabilized biconjugate gradient method (Bi-CGSTAB method). Non-zero elements are stored in the Diagonal format
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
icomm | integer | in | MPI communicator |
vecx(n) | double precision | in/out | in: initial vector, out: solution vector |
vecb(n) | double precision | in | Right hand side vector |
n | integer*4 / integer*8 | in | Size of vector on each process |
gn | integer*4 / integer*8 | in | Total size of vector |
istart | integer*4 / integer*8 | in | Start line of matrix on each process |
diaA(n*nnd) | double precision | in | Non-zero elements of matrix stored in the Diagonal format |
offset(nnd) | integer*4 / integer*8 | in | Offset of each non-zero diagonal elements array in the Diagonal format |
nnd | integer*4 / integer*8 | in | Number of diagonal elements arrays in the Diagonal format |
ipreflag | integer | in | Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz) |
ILU_method | integer | in | Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match) |
addL | integer | in | Overlapping width in the additive Schwartz method |
iflagAS | integer | in | Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE) |
itrmax | integer | in/out | in: maximum number of iterations, out: number of iterations |
rtolmax | double precision | in | Convergence criterion (norm of relative residual error) |
reshistory(itrmax) | double precision | out | History of relative residual error |
iovlflag | integer | in | Flag for communication-computation overlap (0: none, 1: all processes, 2: each process) |
iret | integer | out | Error code (0:normal) |
call parcel_ddbicgstab( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,nnz,istart, crsA_hi,crsA_lo,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )
Solve a simultaneous linear equation system Ax = b by the stabilized biconjugate gradient method (Bi-CGSTAB method). Non-zero elements are stored in the CRS format.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
icomm | integer | in | MPI communicator |
vecx_hi(n) | double precision | in/out | in: Initial vector (upper bits), out: Solution vector (upper bits) |
vecx_lo(n) | double precision | in/out | in: initial value of the iteration (lower bit), out: solution vector (lower bit) |
vecb_hi(n) | double precision | in | Right hand side vector (upper bit) |
vecb_lo(n) | double precision | in | Right hand side vector (lower bit) |
n | integer*4 / integer*8 | in | Size of vector on each process |
gn | integer*4 / integer*8 | in | Total size of vector |
nnz | integer*4 / integer*8 | in | Number of non-zero elements on each process |
istart | integer*4 / integer*8 | in | Start line of matrix on each process |
crsA_hi(nnz) | double precision | in | Non-zero elements (upper bits) of matrix stored in the CRS format |
crsA_lo(nnz) | double precision | in | Non-zero elements (lower bit) of matrix stored in the CRS format |
crsRow_ptr(n+1) | integer*4 / integer*8 | in | Pointer table in the CRS format |
crsCol(nnz) | integer*4 / integer*8 | in | Column numbers of non-zero elements in the CRS format |
ipreflag | integer | in | Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz) |
ILU_method | integer | in | Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match) |
addL | integer | in | Overlapping width in the additive Schwartz method |
iflagAS | integer | in | Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE) |
itrmax | integer | in/out | in: maximum number of iterations, out: number of iterations |
rtolmax | double precision | in | Convergence criterion (norm of relative residual error) |
reshistory(itrmax) | double precision | out | History of relative residual error |
iovlflag | integer | in | Flag for communication-computation overlap (0: none, 1: all processes, 2: each process) |
iret | integer | out | Error code (0:normal) |
PRECISION_A | integer | in | Precision of matrix (1: double precision, 2: quad precision) |
PRECISION_b | integer | in | Precision of right hand side vector (1: double precision, 2: quad precision) |
PRECISION_x | integer | in | Precision of solution vector (1: double precision, 2: quad precision) |
PRECISION_PRECONDITION | integer | in | Precision of preconditioner (1: double precision, 2: quad precision) |
call parcel_ddbicgstab_dia( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,istart, diaA_hi,diaA_lo,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )
Solve a simultaneous linear equation system Ax = b by the stabilized biconjugate gradient method (Bi-CGSTAB method). Non-zero elements are stored in the Diagonal format
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
icomm | integer | in | MPI communicator |
vecx_hi(n) | double precision | in/out | in: Initial vector (upper bits), out: Solution vector (upper bits) |
vecx_lo(n) | double precision | in/out | in: initial value of the iteration (lower bit), out: solution vector (lower bit) |
vecb_hi(n) | double precision | in | Right hand side vector (upper bit) |
vecb_lo(n) | double precision | in | Right hand side vector (lower bit) |
n | integer*4 / integer*8 | in | Size of vector on each process |
gn | integer*4 / integer*8 | in | Total size of vector |
istart | integer*4 / integer*8 | in | Start line of matrix on each process |
diaA_hi(n*nnd) | double precision | in | Non-zero elements (upper bits) of matrix stored in the Diagonal format |
diaA_lo(n*nnd) | double precision | in | Non-zero elements (lower bit) of matrix stored in the Diagonal format |
offset(nnd) | integer*4 / integer*8 | in | Offset of each non-zero diagonal elements array in the Diagonal format |
nnd | integer*4 / integer*8 | in | Number of diagonal elements arrays in the Diagonal format |
ipreflag | integer | in | Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz) |
ILU_method | integer | in | Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match) |
addL | integer | in | Overlapping width in the additive Schwartz method |
iflagAS | integer | in | Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE) |
itrmax | integer | in/out | in: maximum number of iterations, out: number of iterations |
rtolmax | double precision | in | Convergence criterion (norm of relative residual error) |
reshistory(itrmax) | double precision | out | History of relative residual error |
iovlflag | integer | in | Flag for communication-computation overlap (0: none, 1: all processes, 2: each process) |
iret | integer | out | Error code (0:normal) |
PRECISION_A | integer | in | Precision of matrix (1: double precision, 2: quad precision) |
PRECISION_b | integer | in | Precision of right hand side vector (1: double precision, 2: quad precision) |
PRECISION_x | integer | in | Precision of solution vector (1: double precision, 2: quad precision) |
PRECISION_PRECONDITION | integer | in | Precision of preconditioner (1: double precision, 2: quad precision) |
call parcel_dgmres( icomm, vecx,vecb, n,gn,nnz,istart, crsA,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, GMRES_m,GMRES_GSflag )
Solve a simultaneous linear equation system Ax = b by the generalized minimum residual method (GMRES (m) method). Non-zero elements are stored in the CRS format.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
icomm | integer | in | MPI communicator |
vecx(n) | double precision | in/out | in: initial vector, out: solution vector |
vecb(n) | double precision | in | Right hand side vector |
n | integer*4 / integer*8 | in | Size of vector on each process |
gn | integer*4 / integer*8 | in | Total size of vector |
nnz | integer*4 / integer*8 | in | Number of non-zero elements on each process |
istart | integer*4 / integer*8 | in | Start line of matrix on each process |
crsA(nnz) | double precision | in | Non-zero elements of matrix stored in the CRS format |
crsRow_ptr(n+1) | integer*4 / integer*8 | in | Pointer table in the CRS format |
crsCol(nnz) | integer*4 / integer*8 | in | Column numbers of non-zero elements in the CRS format |
ipreflag | integer | in | Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz) |
ILU_method | integer | in | Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match) |
addL | integer | in | Overlapping width in the additive Schwartz method |
iflagAS | integer | in | Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE) |
itrmax | integer | in/out | in: maximum number of iterations, out: number of iterations |
rtolmax | double precision | in | Convergence criterion (norm of relative residual error) |
reshistory(itrmax) | double precision | out | History of relative residual error |
iovlflag | integer | in | Flag for communication-computation overlap (0: none, 1: all processes, 2: each process) |
iret | integer | out | Error code (0:normal) |
GMRES_m | integer | in | Number of iterations until restart |
GMRES_GSflag | integer | in | Flag for orthogonalization algorithm (1: MGS, 2: CGS) |
call parcel_dgmres_dia( icomm, vecx,vecb, n,gn,istart, diaA,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, GMRES_m,GMRES_GSflag )
Solve a simultaneous linear equation system Ax = b by the generalized minimum residual method (GMRES (m) method). Non-zero elements are stored in the Diagonal format
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
icomm | integer | in | MPI communicator |
vecx(n) | double precision | in/out | in: initial vector, out: solution vector |
vecb(n) | double precision | in | Right hand side vector |
n | integer*4 / integer*8 | in | Size of vector on each process |
gn | integer*4 / integer*8 | in | Total size of vector |
istart | integer*4 / integer*8 | in | Start line of matrix on each process |
diaA(n*nnd) | double precision | in | Non-zero elements of matrix stored in the Diagonal format |
offset(nnd) | integer*4 / integer*8 | in | Offset of each non-zero diagonal elements array in the Diagonal format |
nnd | integer*4 / integer*8 | in | Number of diagonal elements arrays in the Diagonal format |
ipreflag | integer | in | Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz) |
ILU_method | integer | in | Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match) |
addL | integer | in | Overlapping width in the additive Schwartz method |
iflagAS | integer | in | Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE) |
itrmax | integer | in/out | in: maximum number of iterations, out: number of iterations |
rtolmax | double precision | in | Convergence criterion (norm of relative residual error) |
reshistory(itrmax) | double precision | out | History of relative residual error |
iovlflag | integer | in | Flag for communication-computation overlap (0: none, 1: all processes, 2: each process) |
iret | integer | out | Error code (0:normal) |
GMRES_m | integer | in | Number of iterations until restart |
GMRES_GSflag | integer | in | Flag for orthogonalization algorithm (1: MGS, 2: CGS) |
call parcel_ddgmres( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,nnz,istart, crsA_hi,crsA_lo,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, GMRES_m,GMRES_GSflag, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )
Solve a simultaneous linear equation system Ax = b by the generalized minimum residual method (GMRES (m) method). Non-zero elements are stored in the CRS format.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
icomm | integer | in | MPI communicator |
vecx_hi(n) | double precision | in/out | in: Initial vector (upper bits), out: Solution vector (upper bits) |
vecx_lo(n) | double precision | in/out | in: initial value of the iteration (lower bit), out: solution vector (lower bit) |
vecb_hi(n) | double precision | in | Right hand side vector (upper bit) |
vecb_lo(n) | double precision | in | Right hand side vector (lower bit) |
n | integer*4 / integer*8 | in | Size of vector on each process |
gn | integer*4 / integer*8 | in | Total size of vector |
nnz | integer*4 / integer*8 | in | Number of non-zero elements on each process |
istart | integer*4 / integer*8 | in | Start line of matrix on each process |
crsA_hi(nnz) | double precision | in | Non-zero elements (upper bits) of matrix stored in the CRS format |
crsA_lo(nnz) | double precision | in | Non-zero elements (lower bit) of matrix stored in the CRS format |
crsRow_ptr(n+1) | integer*4 / integer*8 | in | Pointer table in the CRS format |
crsCol(nnz) | integer*4 / integer*8 | in | Column numbers of non-zero elements in the CRS format |
ipreflag | integer | in | Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz) |
ILU_method | integer | in | Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match) |
addL | integer | in | Overlapping width in the additive Schwartz method |
iflagAS | integer | in | Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE) |
itrmax | integer | in/out | in: maximum number of iterations, out: number of iterations |
rtolmax | double precision | in | Convergence criterion (norm of relative residual error) |
reshistory(itrmax) | double precision | out | History of relative residual error |
iovlflag | integer | in | Flag for communication-computation overlap (0: none, 1: all processes, 2: each process) |
iret | integer | out | Error code (0:normal) |
GMRES_m | integer | in | Number of iterations until restart |
GMRES_GSflag | integer | in | Flag for orthogonalization algorithm (1: MGS, 2: CGS) |
PRECISION_A | integer | in | Precision of matrix (1: double precision, 2: quad precision) |
PRECISION_b | integer | in | Precision of right hand side vector (1: double precision, 2: quad precision) |
PRECISION_x | integer | in | Precision of solution vector (1: double precision, 2: quad precision) |
PRECISION_PRECONDITION | integer | in | Precision of preconditioner (1: double precision, 2: quad precision) |
call parcel_ddgmres_dia( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,istart, diaA_hi,diaA_lo,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )
Solve a simultaneous linear equation system Ax = b by the generalized minimum residual method (GMRES (m) method). Non-zero elements are stored in the Diagonal format
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
icomm | integer | in | MPI communicator |
vecx_hi(n) | double precision | in/out | in: Initial vector (upper bits), out: Solution vector (upper bits) |
vecx_lo(n) | double precision | in/out | in: initial value of the iteration (lower bit), out: solution vector (lower bit) |
vecb_hi(n) | double precision | in | Right hand side vector (upper bit) |
vecb_lo(n) | double precision | in | Right hand side vector (lower bit) |
n | integer*4 / integer*8 | in | Size of vector on each process |
gn | integer*4 / integer*8 | in | Total size of vector |
istart | integer*4 / integer*8 | in | Start line of matrix on each process |
diaA_hi(n*nnd) | double precision | in | Non-zero elements (upper bits) of matrix stored in the Diagonal format |
diaA_lo(n*nnd) | double precision | in | Non-zero elements (lower bit) of matrix stored in the Diagonal format |
offset(nnd) | integer*4 / integer*8 | in | Offset of each non-zero diagonal elements array in the Diagonal format |
nnd | integer*4 / integer*8 | in | Number of diagonal elements arrays in the Diagonal format |
ipreflag | integer | in | Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz) |
ILU_method | integer | in | Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match) |
addL | integer | in | Overlapping width in the additive Schwartz method |
iflagAS | integer | in | Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE) |
itrmax | integer | in/out | in: maximum number of iterations, out: number of iterations |
rtolmax | double precision | in | Convergence criterion (norm of relative residual error) |
reshistory(itrmax) | double precision | out | History of relative residual error |
iovlflag | integer | in | Flag for communication-computation overlap (0: none, 1: all processes, 2: each process) |
iret | integer | out | Error code (0:normal) |
GMRES_m | integer | in | Number of iterations until restart |
GMRES_GSflag | integer | in | Flag for orthogonalization algorithm (1: MGS, 2: CGS) |
PRECISION_A | integer | in | Precision of matrix (1: double precision, 2: quad precision) |
PRECISION_b | integer | in | Precision of right hand side vector (1: double precision, 2: quad precision) |
PRECISION_x | integer | in | Precision of solution vector (1: double precision, 2: quad precision) |
PRECISION_PRECONDITION | integer | in | Precision of preconditioner (1: double precision, 2: quad precision) |
call parcel_dcbcg( icomm, vecx,vecb, n,gn,nnz,istart, crsA,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, CBCG_kstep,CBCG_Eigenflag,power_method_itrmax, CAARNOLDI_sstep,CAARNOLDI_tstep, CAARNOLDI_basis,CAARNOLDI_QRflag )
The system of equations Ax = b is solved by the Chebyshev basis conjugate gradient method (CBCG method). Non-zero elements are stored in the CRS format.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
icomm | integer | in | MPI communicator |
vecx(n) | double precision | in/out | in: initial vector, out: solution vector |
vecb(n) | double precision | in | Right hand side vector |
n | integer*4 / integer*8 | in | Size of vector on each process |
gn | integer*4 / integer*8 | in | Total size of vector |
nnz | integer*4 / integer*8 | in | Number of non-zero elements on each process |
istart | integer*4 / integer*8 | in | Start line of matrix on each process |
crsA(nnz) | double precision | in | Non-zero elements of matrix stored in the CRS format |
crsRow_ptr(n+1) | integer*4 / integer*8 | in | Pointer table in the CRS format |
crsCol(nnz) | integer*4 / integer*8 | in | Column numbers of non-zero elements in the CRS format |
ipreflag | integer | in | Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz) |
ILU_method | integer | in | Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match) |
addL | integer | in | Overlapping width in the additive Schwartz method |
iflagAS | integer | in | Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE) |
itrmax | integer | in/out | in: maximum number of iterations, out: number of iterations |
rtolmax | double precision | in | Convergence criterion (norm of relative residual error) |
reshistory(itrmax) | double precision | out | History of relative residual error |
iovlflag | integer | in | Flag for communication-computation overlap (0: none, 1: all processes, 2: each process) |
iret | integer | out | Error code (0:normal) |
CBCG_kstep | integer | in | Number of communication avoiding steps |
CBCG_Eigenflag | integer | in | Flag for eigenvalue computation (1: power method, 2: CA-ARNOLDI) |
power_method_itrmax | integer | in | Maximum number of iterations in the power method |
CAARNOLDI_sstep | integer | in | Number of communication avoiding steps s of the CA-ARNOLDI method |
CAARNOLDI_tstep | integer | in | Number of outer iterations t in the CA-ARNOLDI method (restart length = st) |
CAARNOLDI_basis | integer | in | Flag for basis vector in the CA-ARNOLDI method (0: monomial basis, 1: Newton basis) |
CAARNOLDI_QRflag | integer | in | Flag for QR factorization in the CA-ARNOLDI method (1: MGS, 2: CGS, 3: TSQR, 4: Cholesky QR, 5: Cholesky QR2) |
call parcel_dcbcg_dia( icomm, vecx,vecb, n,gn,istart, diaA,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, CBCG_kstep,CBCG_Eigenflag,power_method_itrmax, CAARNOLDI_sstep,CAARNOLDI_tstep, CAARNOLDI_basis,CAARNOLDI_QRflag )
The system of equations Ax = b is solved by the Chebyshev basis conjugate gradient method (CBCG method). Non-zero elements are stored in the Diagonal format
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
icomm | integer | in | MPI communicator |
vecx(n) | double precision | in/out | in: initial vector, out: solution vector |
vecb(n) | double precision | in | Right hand side vector |
n | integer*4 / integer*8 | in | Size of vector on each process |
gn | integer*4 / integer*8 | in | Total size of vector |
istart | integer*4 / integer*8 | in | Start line of matrix on each process |
diaA(n*nnd) | double precision | in | Non-zero elements of matrix stored in the Diagonal format |
offset(nnd) | integer*4 / integer*8 | in | Offset of each non-zero diagonal elements array in the Diagonal format |
nnd | integer*4 / integer*8 | in | Number of diagonal elements arrays in the Diagonal format |
ipreflag | integer | in | Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz) |
ILU_method | integer | in | Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match) |
addL | integer | in | Overlapping width in the additive Schwartz method |
iflagAS | integer | in | Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE) |
itrmax | integer | in/out | in: maximum number of iterations, out: number of iterations |
rtolmax | double precision | in | Convergence criterion (norm of relative residual error) |
reshistory(itrmax) | double precision | out | History of relative residual error |
iovlflag | integer | in | Flag for communication-computation overlap (0: none, 1: all processes, 2: each process) |
iret | integer | out | Error code (0:normal) |
CBCG_kstep | integer | in | Number of communication avoiding steps |
CBCG_Eigenflag | integer | in | Flag for eigenvalue computation (1: power method, 2: CA-ARNOLDI) |
power_method_itrmax | integer | in | Maximum number of iterations in the power method |
CAARNOLDI_sstep | integer | in | Number of communication avoiding steps s of the CA-ARNOLDI method |
CAARNOLDI_tstep | integer | in | Number of outer iterations t in the CA-ARNOLDI method (restart length = st) |
CAARNOLDI_basis | integer | in | Flag for basis vector in the CA-ARNOLDI method (0: monomial basis, 1: Newton basis) |
CAARNOLDI_QRflag | integer | in | Flag for QR factorization in the CA-ARNOLDI method (1: MGS, 2: CGS, 3: TSQR, 4: Cholesky QR, 5: Cholesky QR2) |
call parcel_dcagmres( icomm, vecx,vecb, n,gn,nnz,istart, crsA,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, CAGMRES_sstep, CAGMRES_tstep, CAGMRES_basis, CAGMRES_QRflag )
Solve a simultaneous linear equation system Ax = b by the communication reduced generalized minimum residual method (CA-GMRES (s, t) method). Separate non-zero element valuesin the CRS format.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
icomm | integer | in | MPI communicator |
vecx(n) | double precision | in/out | in: initial vector, out: solution vector |
vecb(n) | double precision | in | Right hand side vector |
n | integer*4 / integer*8 | in | Size of vector on each process |
gn | integer*4 / integer*8 | in | Total size of vector |
nnz | integer*4 / integer*8 | in | Number of non-zero elements on each process |
istart | integer*4 / integer*8 | in | Start line of matrix on each process |
crsA(nnz) | double precision | in | Non-zero elements of matrix stored in the CRS format |
crsRow_ptr(n+1) | integer*4 / integer*8 | in | Pointer table in the CRS format |
crsCol(nnz) | integer*4 / integer*8 | in | Column numbers of non-zero elements in the CRS format |
ipreflag | integer | in | Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz) |
ILU_method | integer | in | Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match) |
addL | integer | in | Overlapping width in the additive Schwartz method |
iflagAS | integer | in | Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE) |
itrmax | integer | in/out | in: maximum number of iterations, out: number of iterations |
rtolmax | double precision | in | Convergence criterion (norm of relative residual error) |
reshistory(itrmax) | double precision | out | History of relative residual error |
iovlflag | integer | in | Flag for communication-computation overlap (0: none, 1: all processes, 2: each process) |
iret | integer | out | Error code (0:normal) |
CAGMRES_sstep | integer | in | Number of communication avoiding steps s in the CA-GMRES method |
CAGMRES_tstep | integer | in | Number of outer iterations t in the CA-GMRES method (restart length = st) |
CAGMRES_basis | integer | in | Flag for basis vector in the CA-GMRES method (0: monomial basis, 1: Newton basis) |
CAGMRES_QRflag | integer | in | Flag for QR factorization in the CA-GMRES method (1: MGS, 2: CGS, 3: TSQR, 4: CholeskyQR, 5: CholeskyQR2) |
call parcel_dcagmres_dia( icomm, vecx,vecb, n,gn,istart, diaA,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, CAGMRES_sstep, CAGMRES_tstep, CAGMRES_basis, CAGMRES_QRflag )
Solve a simultaneous linear equation system Ax = b by the communication reduced generalized minimum residual method (CA-GMRES (s, t) method). Separate non-zero element values in the Diagonal format
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
icomm | integer | in | MPI communicator |
vecx(n) | double precision | in/out | in: initial vector, out: solution vector |
vecb(n) | double precision | in | Right hand side vector |
n | integer*4 / integer*8 | in | Size of vector on each process |
gn | integer*4 / integer*8 | in | Total size of vector |
istart | integer*4 / integer*8 | in | Start line of matrix on each process |
diaA(n*nnd) | double precision | in | Non-zero elements of matrix stored in the Diagonal format |
offset(nnd) | integer*4 / integer*8 | in | Offset of each non-zero diagonal elements array in the Diagonal format |
nnd | integer*4 / integer*8 | in | Number of diagonal elements arrays in the Diagonal format |
ipreflag | integer | in | Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz) |
ILU_method | integer | in | Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match) |
addL | integer | in | Overlapping width in the additive Schwartz method |
iflagAS | integer | in | Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE) |
itrmax | integer | in/out | in: maximum number of iterations, out: number of iterations |
rtolmax | double precision | in | Convergence criterion (norm of relative residual error) |
reshistory(itrmax) | double precision | out | History of relative residual error |
iovlflag | integer | in | Flag for communication-computation overlap (0: none, 1: all processes, 2: each process) |
iret | integer | out | Error code (0:normal) |
CAGMRES_sstep | integer | in | Small communication step of CA-GMRES method |
CAGMRES_tstep | integer | in | List step of the CA-GMRES method |
CAGMRES_basis | integer | in | Base generation flag of CA-GMRES method (0: single basis, 1: newton basis) |
CAGMRES_QRflag | integer | in | QR Decomposition flag of CA-GMRES method (1: MGS, 2: CGS, 3: TSQR, 4: CholeskyQR, 5: CholeskyQR2) |
The PARCEL routines with indirect call interfaces are shown below.
call PARCEL_KSP_Default_Setting(method)
Apply initial settings to Struct in the PARCEL format.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
method | type(KSP) | out | Struct in the PARCEL format |
call set_KSP_CRS(icomm,vecx,vecb,n,gn,nnz,istart,crsA,crsRow_ptr,crsCol,itrmax,ipreflag,ILU_method,addL,method)
Set matrix in the CRS format.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
icomm | integer | in | MPI communicator |
vecx(n) | double precision | in | in: initial vector, out: solution vector |
vecb(n) | double precision | in | Right hand side vector |
n | integer*4 / integer*8 | in | Size of vector on each process |
gn | integer*4 / integer*8 | in | Total size of vector |
nnz | integer*4 / integer*8 | in | Number of non-zero elements on each process |
istart | integer*4 / integer*8 | in | Start line of matrix on each process |
crsA(nnz) | double precision | in | Non-zero elements of matrix stored in the CRS format |
crsRow_ptr(n+1) | integer*4 / integer*8 | in | Pointer table in the CRS format |
crsCol(nnz) | integer*4 / integer*8 | in | Column numbers of non-zero elements in the CRS format |
itrmax | integer | in | in: maximum number of iterations, out: number of iterations |
ipreflag | integer | in | Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz) |
ILU_method | integer | in | Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match) |
addL | integer | in | Overlapping width in the additive Schwartz method |
method | type(KSP) | in/out | Struct in the PARCEL format |
call set_KSP_DIA(icomm,vecx,vecb,n,gn,istart,diaA,offset,nnd,itrmax,ipreflag,ILU_method,addL,method)
Set matrix in the Diagonal format.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
icomm | integer | in | MPI communicator |
vecx(n) | double precision | in | in: initial vector, out: solution vector |
vecb(n) | double precision | in | Right hand side vector |
n | integer*4 / integer*8 | in | Size of vector on each process |
gn | integer*4 / integer*8 | in | Total size of vector |
istart | integer*4 / integer*8 | in | Start line of matrix on each process |
diaA(n*nnd) | double precision | in | Non-zero elements of matrix stored in the Diagonal format |
offset(nnd) | integer*4 / integer*8 | in | Offset of each non-zero diagonal elements array in the Diagonal format |
nnd | integer*4 / integer*8 | in | Number of diagonal elements arrays in the Diagonal format |
itrmax | integer | in | in: maximum number of iterations, out: number of iterations |
ipreflag | integer | in | Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz) |
ILU_method | integer | in | Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match) |
addL | integer | in | Overlapping width in the additive Schwartz method |
method | type(KSP) | in/out | Struct in the PARCEL format |
call set_vecx_KSP(n,x,method)
Set initial vector.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
n | integer*4 / integer*8 | in | Size of vector on each process |
x(n) | double precision | in | Initial vector |
method | type(KSP) | in/out | Struct in the PARCEL format |
call set_vecb_KSP(n,b,method)
Set right hand side vector b.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
n | integer*4 / integer*8 | in | Size of vector on each process |
b(n) | double precision | in | Right hand side vector |
method | type(KSP) | in/out | Struct in the PARCEL format |
call get_vecx_KSP(n,x,method)
Get solution vector.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
n | integer*4 / integer*8 | in | Size of vector on each process |
x(n) | double precision | out | Solution vector |
method | type(KSP) | in | Struct in the PARCEL format |
call get_reshistory_KSP(niter,reshistory, method)
Get convergence history.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
niter | integer | out | Number of iterations until convergence |
reshistory(method%itrmax) | double precision | out | Convergence history |
method | type(KSP) | in | Struct in the PARCEL format |
call reset_CRS_Mat(itrmax,ipreflag,ILU_method,addL,crsA,method)
Change the matrix elements, preconditioner, and the maximum number of iterations of solvers in the CRS format without changing the matrix structure or the positions of non-zero elements.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
itrmax | integer | in | Maximum number of iterations (-1: keep the existing setting) |
ipreflag | integer | in | Flag for preconditioner (-1: keep the existing setting, 0: None, 1: Point Jacobi, 2: Block Jacobi, 3: Additive Schwartz) |
ILU_method | integer | in | Flag for incomplete LU factorization (-1: keep the existing setting, 1: match the diagonal components, 2: match the element sum in the row) |
addL | integer | in | Overlapping width in the additive Schwartz method (-1: keep the existing setting) |
crsA(method%nnz) | double precision | in | Non-zero elements of matrix stored in the CRS format |
method | type(KSP) | in/out | Struct in the PARCEL format |
call reset_DIA_Mat(itrmax,ipreflag,ILU_method,addL,diaMat,method)
Change the matrix elements, preconditioner, and the maximum number of iterations of solvers in the Diagonal format without changing the matrix structure or the positions of non-zero elements.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
itrmax | integer | in | Maximum number of iterations (-1: keep the existing setting) |
ipreflag | integer | in | Flag for preconditioner (-1: keep the existing setting, 0: None, 1: Point Jacobi, 2: Block Jacobi, 3: Additive Schwartz) |
ILU_method | integer | in | Flag for incomplete LU factorization (-1: keep the existing setting, 1: match the diagonal components, 2: match the element sum in the row) |
addL | integer | in | Overlapping width in the additive Schwartz method (-1: keep the existing setting) |
diaMat(method%n*method%nnd) | double precision | in | Non-zero elements of matrix stored in the CRS format |
method | type(KSP) | in/out | Struct in the PARCEL format |
call set_KSP_solver(method,solver)
Set Krylov subspace method.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
method | type(KSP) | out | Struct in the PARCEL format |
solver | integer | in | Flag for Krylov subspace method (1: CG, 2: BiCGGSTAB, 3: GMRES, 4: CA-GMRES, 5: CBCG) |
call set_KSP_abstolmax(method,abstolmax)
Set maximun tolerance in absolute value of residual error.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
method | type(KSP) | out | Struct in the PARCEL format |
abstolmax | double precision | in | Convergence criterion (norm of residual error) |
call set_KSP_rtolmax(method,rtolmax)
Set maximum tolerance in relative error.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
method | type(KSP) | out | Struct in the PARCEL format |
rtolmax | double precision | in | Convergence criterion (norm of relative residual error) |
call set_KSP_iovlflag(method,iovlflag)
Set communication-computation overlap method.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
method | type(KSP) | out | Struct in the PARCEL format |
iovlflag | integer | in | Flag for communication-computation overlap (0: none, 1: all processes, 2: each process) |
call set_KSP_GMRES_m(method,GMRES_m)
Set the number of iterations m until restart in the GMRES(m) method.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
method | type(KSP) | out | Struct in the PARCEL format |
GMRES_m | integer | in | Number of iterations until restart |
call set_KSP_GMRES_GSflag(method,GMRES_GSflag)
Set orthogonalization algorithm in the GMRES(m) method.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
method | type(KSP) | out | Struct in the PARCEL format |
GMRES_GSflag | integer | in | Flag for orthogonalization algorithm (1: MGS, 2: CGS) |
call set_KSP_CAGMRES_sstep(method,CAGMRES_sstep)
Set number of communication avoiding steps s of CA-GMRES.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
method | type(KSP) | out | Struct in the PARCEL format |
CAGMRES_sstep | integer | in | Number of communication steps of the CA-GMRES method |
call set_KSP_CAGMRES_tstep(method,CAGMRES_tstep)
Set number of outer iterations t of CA-GMRES.)
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
method | type(KSP) | out | Struct in the PARCEL format |
CAGMRES_tstep | integer | in | Number of outer iterations t in the CA-GMRES method method (restart length = st) |
call set_KSP_CAGMRES_basis(method,CAGMRES_basis)
Set basis vector in the CA-GMRES method.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
method | type(KSP) | out | Struct in the PARCEL format |
CAGMRES_basis | integer | in | Flag for basis vector in the CA-GMRES method (0: monomial basis, 1: Newton basis) |
call set_KSP_CAGMRES_QRflag(method,CAGMRES_QRflag)
Set QR factorization algorithm in the CA-GMRES method.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
method | type(KSP) | out | Struct in the PARCEL format |
CAGMRES_QRflag | integer | in | Flag for QR factorization in the CA-GMRES method (1: MGS, 2: CGS, 3: TSQR, 4: CholeskyQR, 5: CholeskyQR2) |
call set_KSP_CBCG_kstep(method,CBCG_kstep)
Set number of communication avoiding steps in the CBCG method.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
method | type(KSP) | out | Struct in the PARCEL format |
CBCG_kstep | integer | in | Number of communication avoiding steps |
call set_KSP_CBCG_Eigenflag(method,CBCG_Eigenflag)
Set eigenvalue computation algorithm in the CBCG method.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
method | type(KSP) | out | Struct in the PARCEL format |
CBCG_Eigenflag | integer | in | Flag for eigenvalue computation (1: power method, 2: CA-ARNOLDI) |
call set_KSP_CAARNOLDI_sstep(method,CAARNOLDI_sstep)
Set number of communication avoiding steps s in the CA-ARNOLDI method.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
method | type(KSP) | out | Struct in the PARCEL format |
CAARNOLDI_sstep | integer | in | Number of communication avoiding steps s of the CA-ARNOLDI method |
call set_KSP_CAARNOLDI_tstep(method,CAARNOLDI_tstep)
Set number of outer iterations t in the CA-ARNOLDI method.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
method | type(KSP) | out | Struct in the PARCEL format |
CAARNOLDI_tstep | integer | in | Number of outer iterations t in the CA-ARNOLDI method (restart length = st) |
call set_KSP_CAARNOLDI_basis(method,CAARNOLDI_basis)
Set basis vector in the CA-ARNOLDI method.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
method | type(KSP) | out | Struct in the PARCEL format |
CAARNOLDI_basis | integer | in | Flag for basis vector in the CA-ARNOLDI method (0: monomial basis, 1: Newton basis) |
call set_KSP_CAARNOLDI_QRflag(method,CAARNOLDI_QRflag)
Set QR factorization algorithm in the CA-ARNOLDI method.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
method | type(KSP) | out | Struct in the PARCEL format |
CAARNOLDI_QRflag | integer | in | Flag for QR factorization in the CA-ARNOLDI method (1: MGS, 2: CGS, 3: TSQR, 4: Cholesky QR, 5: Cholesky QR2) |
call set_KSP_power_method_itrmax(method,power_method_itrmax)
Set maximum number of iterations in the power method.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
method | type(KSP) | out | Struct in the PARCEL format |
power_method_itrmax | integer | in | Maximum number of iterations in the power method |
call START_KSP_SOLVER(method)
Solve a simultaneous linear equation system Ax = b.
Argument name (dimension) | Type | Input/Output | Description |
---|---|---|---|
method | type(KSP) | in/out | Struct in the PARCEL format |
The use of the PARCEL routines is explained by FORTRAN sample programs of a CG solver. Details of the usage of the PARCEL in C programs can be found in C sample programs stored in SPARSE/example_C in the archive. The PARCEL can be called also from C programs in the same manner, provided that a MPI communicator is prepared for FORTRAN routines. In the C sample programs, a MPI communicator created in C is converted to FORTRAN via MPI_Comm_c2f in the MPI library. In the C sample program for the indirect interface, a struct in the PARCEL format is generated in FORTRAN, and then, it is used in C programs.
Sample codes in the CRS format are shown in Fig.8 and Fig.9. Here, make_matrix_CRS denotes an arbitrary routine, which generates a matrix in the CRS format in PARCEL. Fig.8 and Fig.9 show examples with the direct and indirect interfaces, respectively. In Fig.9, a compiling option to include SPARSE/src is needed. When only values of matrix elements in , elements in , and the initial values in are changed with the same matrix structure or the non-zero element positions, in Fig.8, parcel_dcg requires overhead for the initialization such as generating list of processes which exchange data with each other.
program main use mpi implicit none integer n,gn,nnz,istart real*8,allocatable :: crsA(:) integer,allocatable :: crsRow_ptr(:),crsCol(:) real*8,allocatable :: vecx(:) real*8,allocatable :: vecb(:) integer itrmax real*8 rtolmax real*8, allocatable :: reshistory(:) integer ipreflag integer ILU_method integer iflagAS integer addL integer iovlflag integer iret integer ierr call MPI_Init(ierr) call make_matrix_CRS(n,gn,nnz,istart,crsA,crsRow_ptr,crsCol) allocate(vecx(n)) allocate(vecb(n)) allocate(reshistory(itrmax)) ipreflag=0 ILU_method=1 addL=0 iflagAS=1 itrmax=100 rtolmax=1.0d-8 iovlflag=0 vecb=1.0d0 vecx=1.0d0 call parcel_dcg( & MPI_COMM_WORLD, & vecx,vecb, & n,gn,nnz,istart, & crsA,crsRow_ptr,crsCol, & ipreflag,ILU_method,addL,iflagAS, & itrmax,rtolmax, & reshistory, & iovlflag,iret & ) call MPI_Finalize(ierr) deallocate(vecx) deallocate(vecb) deallocate(crsA) deallocate(crsRow_ptr) deallocate(crsCol) deallocate(reshistory) end program main
Fig.8 Sample code in the CRS format (direct interface) |
program main use mpi use krylov use krylov_kernel implicit none integer n,gn,nnz,istart real*8,allocatable :: crsA(:) integer,allocatable :: crsRow_ptr(:),crsCol(:) real*8,allocatable :: vecx(:) real*8,allocatable :: vecb(:) integer itrmax real*8 rtolmax real*8, allocatable :: reshistory(:) integer ipreflag integer ILU_method integer iflagAS integer addL integer iovlflag integer iret integer ierr type(KSP) :: method call MPI_Init(ierr) call make_matrix_CRS(n,gn,nnz,istart,crsA,crsRow_ptr,crsCol) allocate(vecx(n)) allocate(vecb(n)) allocate(reshistory(itrmax)) ipreflag=0 ILU_method=1 addL=0 iflagAS=1 itrmax=100 rtolmax=1.0d-8 iovlflag=0 vecb=1.0d0 vecx=1.0d0 call PARCEL_KSP_Default_Setting(method) call set_KSP_CRS(& MPI_COMM_WORLD, & vecx,vecb, & n,gn,nnz,istart, & crsA,crsRow_ptr,crsCol, & itrmax,ipreflag,ILU_method,addL,& method) call set_KSP_solver(method,1) call START_KSP_SOLVER(method) call get_vecx_KSP(n,vecx,method) call free_KSP(method) call MPI_Finalize(ierr) deallocate(vecx) deallocate(vecb) deallocate(reshistory) end program main
Fig.9 Sample code in the CRS format (indirect interface) |
Sample codes in the Diagonal format are shown in Fig.10 and Fig.11. Here, make_matrix_DIA denotes an arbitrary routine, which generates a matrix in the Diagonal format in PARCEL. Fig.10 and Fig.11 show examples with the direct and indirect interfaces, respectively. In Fig.11, a compiling option to include SPARSE/src is needed. When only values of matrix elements in , elements in , and the initial values in are changed with the same matrix structure or the non-zero element positions, in Fig.10, parcel_dcg requires overhead for the initialization such as generating list of processes which exchange data with each other.
program main use mpi implicit none integer n,gn,nnd,istart real*8,allocatable :: diaA(:) integer,allocatable :: offset(:) real*8,allocatable :: vecx(:) real*8,allocatable :: vecb(:) integer itrmax real*8 rtolmax real*8, allocatable :: reshistory(:) integer ipreflag integer ILU_method integer iflagAS integer addL integer iovlflag integer iret integer ierr call MPI_Init(ierr) call make_matrix_DIA(n,gn,nnd,istart,diaA,offset) allocate(vecx(n)) allocate(vecb(n)) allocate(reshistory(itrmax)) ipreflag=0 ILU_method=1 addL=0 iflagAS=1 itrmax=100 rtolmax=1.0d-8 iovlflag=0 vecb=1.0d0 vecx=1.0d0 call parcel_dcg_dia(& MPI_COMM_WORLD, & vecx,vecb,& n,gn,istart,& diaA,offset,nnd,& ipreflag,ILU_method,addL,iflagAS,& itrmax,rtolmax,& reshistory,& iovlflag,iret& ) call MPI_Finalize(ierr) deallocate(vecx) deallocate(vecb) deallocate(diaA) deallocate(offset) deallocate(reshistory) end program main
Fig.10 Sample code in the Diagonal format (direct interface) |
program main use mpi use krylov use krylov_kernel implicit none integer n,gn,nnd,istart real*8,allocatable :: diaA(:) integer,allocatable :: offset(:) real*8,allocatable :: vecx(:) real*8,allocatable :: vecb(:) integer itrmax real*8 rtolmax real*8, allocatable :: reshistory(:) integer ipreflag integer ILU_method integer iflagAS integer addL integer iovlflag integer iret integer ierr call MPI_Init(ierr) call make_matrix_DIA(n,gn,nnd,istart,diaA,offset) allocate(vecx(n)) allocate(vecb(n)) allocate(reshistory(itrmax)) ipreflag=0 ILU_method=1 addL=0 iflagAS=1 itrmax=100 rtolmax=1.0d-8 iovlflag=0 vecb=1.0d0 vecx=1.0d0 call PARCEL_KSP_Default_Setting(method) call set_KSP_DIA(& MPI_COMM_WORLD, & vecx,vecb, & n,gn,istart, & diaA,offset,nnd,& itrmax,ipreflag,ILU_method,addL,& method) call set_KSP_solver(method,1) call START_KSP_SOLVER(method) call get_vecx_KSP(n,vecx,method) call free_KSP(method) call MPI_Finalize(ierr) deallocate(vecx) deallocate(vecb) deallocate(reshistory) end program main
Fig.11 Sample code in the Diagonal format (indirect interface) |