1 Krylov subspace methods

This chapter describes the iterative methods contained in this library. The following notation is used, $Ax = b$ represents the linear system which will be solved, with $A$ being an n $\times$ n sparse matrix and $x_0$ representing the initial vector. $K$ represents an appropriate preconditioner.

1.1 The Preconditioned Conjugate Gradient Method

The Conjugate Gradient (CG) method is an iterative Krylov subspace algorithm for solving symmetric matrices.

Algorithm description

$\mathbf{{Compute\ }}\mathbf{r}^{\mathbf{0}}\mathbf{= b - A}\mathbf{x}^{\mathbf{0}}\mathbf{{\ for\ some\ initial\ guess\ }}\mathbf{x}^{\mathbf{- 1}}\mathbf{=}\mathbf{x}^{\mathbf{0}}\mathbf{\ }$
$\mathbf{p}^{\mathbf{- 1}}\mathbf{= 0}$
$\mathbf{\alpha}_{\mathbf{- 1}}\mathbf{= 0\ }\mathbf{\ }$
$\mathbf{\beta}_{\mathbf{- 1}}\mathbf{= 0}$
$\mathbf{{solve\ s\ from\ }}\mathbf{K}\mathbf{s = \ }\mathbf{r}^{\mathbf{0}}$
$\mathbf{\rho}_{\mathbf{0}}\mathbf{=}\left\langle \mathbf{s,}\mathbf{r}^{\mathbf{0}} \right\rangle$
$\mathbf{for\ i = 0,1,2,3}\mathbf{\ldots}$
　　 $\mathbf{p}^{\mathbf{i}}\mathbf{= s +}\mathbf{\beta}_{\mathbf{i - 1}}\mathbf{p}^{\mathbf{i - 1}}$
　　 $\mathbf{q}^{\mathbf{i}}\mathbf{= A}\mathbf{p}^{\mathbf{i}}$
　　 $\mathbf{\gamma =}\left\langle \mathbf{p}^{\mathbf{i}}\mathbf{,}\mathbf{q}^{\mathbf{i}} \right\rangle$
　　 $\mathbf{x}^{\mathbf{i}}\mathbf{=}\mathbf{x}^{\mathbf{i - 1}}\mathbf{+}\mathbf{\alpha}_{\mathbf{i - 1}}\mathbf{p}^{\mathbf{i - 1}}$
　　 $\mathbf{\alpha}_{\mathbf{i}}\mathbf{=}\mathbf{\rho}_{\mathbf{i}}\mathbf{\ /\ \gamma}$
　　 $\mathbf{r}^{\mathbf{i + 1}}\mathbf{=}\mathbf{r}^{\mathbf{i}}\mathbf{-}\mathbf{\alpha}_{\mathbf{i}}\mathbf{q}^{\mathbf{i}}$
　　 $\mathbf{{solve\ s\ from\ }}\mathbf{K}\mathbf{s = \ }\mathbf{r}^{\mathbf{i + 1}}$
　　 $\mathbf{\rho}_{\mathbf{i + 1}}\mathbf{=}\left\langle \mathbf{s,}\mathbf{r}^{\mathbf{i + 1}} \right\rangle$
　　 $\mathbf{{if\ }}\left\| \mathbf{r}^{\mathbf{i + 1}} \right\|\mathbf{\ /\ }\left\| \mathbf{b} \right\|\mathbf{{\ small\ enough\ the}}\mathbf{n}$
　　　　 $\mathbf{x}^{\mathbf{i}}\mathbf{=}\mathbf{x}^{\mathbf{i - 1}}\mathbf{+}\mathbf{\alpha}_{\mathbf{i - 1}}\mathbf{p}^{\mathbf{i - 1}}$
　　　　 $\mathbf{{qui}}\mathbf{t}$
　　 $\mathbf{{endi}}\mathbf{f}$
　　 $\mathbf{\beta}_{\mathbf{i}}\mathbf{=}\mathbf{\rho}_{\mathbf{i + 1}}\mathbf{\ /\ }\mathbf{\rho}_{\mathbf{i}}$
$\mathbf{{\ en}}\mathbf{d}$

1.2 The Preconditioned Biconjugate Gradient Stabilized Method

The Biconjugate Gradient Stabilized (Bi-CGSTAB) method is an iterative Krylov subspace algorithm for solving nonsymmetric matrices.

Algorithm description

$\mathbf{{Compute\ }}\mathbf{r}_{\mathbf{0}}\mathbf{= b - A}\mathbf{x}_{\mathbf{0}}\mathbf{{\ for\ some\ initial\ guess\ }}\mathbf{x}_{\mathbf{- 1}}\mathbf{=}\mathbf{x}_{\mathbf{0}}$
$\mathbf{p}_{\mathbf{0}}\mathbf{=}\mathbf{r}_{\mathbf{0}}$
$\mathbf{c}_{\mathbf{1}}\mathbf{=}\left\langle \mathbf{r}_{\mathbf{0}}\mathbf{,}\mathbf{r}_{\mathbf{0}} \right\rangle$
$\mathbf{for\ i = 0,1,2,3}\mathbf{\ldots}$
　　 $\mathbf{{solve\ }}\widehat{\mathbf{p}}\mathbf{{\ from\ K}}\widehat{\mathbf{p}}\mathbf{= \ }\mathbf{p}_{\mathbf{i}}$
　　 $\mathbf{q = A}\widehat{\mathbf{p}}$
　　 $\mathbf{c}_{\mathbf{2}}\mathbf{=}\left\langle \mathbf{r}_{\mathbf{0}}\mathbf{,q} \right\rangle$
　　 $\mathbf{\alpha =}\mathbf{c}_{\mathbf{1}}\mathbf{\ /\ }\mathbf{c}_{\mathbf{2}}$
　　 $\mathbf{e =}\mathbf{r}_{\mathbf{i}}\mathbf{- \alpha q}$
　　 $\mathbf{{solve\ }}\widehat{\mathbf{e}}\mathbf{{\ from\ K}}\widehat{\mathbf{e}}\mathbf{= e}$
　　 $\mathbf{v = A}\widehat{\mathbf{e}}$
　　 $\mathbf{c}_{\mathbf{3}}\mathbf{=}\left\langle \mathbf{e,v} \right\rangle\mathbf{\ /\ }\left\langle \mathbf{v,v} \right\rangle$
　　 $\mathbf{x}_{\mathbf{i + 1}}\mathbf{=}\mathbf{x}_{\mathbf{i}}\mathbf{+ \alpha}\widehat{\mathbf{p}}\mathbf{+}\mathbf{c}_{\mathbf{3}}\widehat{\mathbf{e}}$
　　 $\mathbf{r}_{\mathbf{i + 1}}\mathbf{= e -}\mathbf{c}_{\mathbf{3}}\mathbf{v}$
　　 $\mathbf{c}_{\mathbf{1}}\mathbf{=}\left\langle \mathbf{r}_{\mathbf{0}}\mathbf{,}\mathbf{r}_{\mathbf{i + 1}} \right\rangle$
　　 $\mathbf{{if\ }}\left\| \mathbf{r}_{\mathbf{i + 1}} \right\|\mathbf{\ /\ }\left\| \mathbf{b} \right\|\mathbf{{\ small\ enough\ }}\mathbf{{the}}\mathbf{n}$
　　　　 $\mathbf{{quit}}$
　　 $\mathbf{{endif}}$
　　 $\mathbf{\beta =}\mathbf{c}_{\mathbf{1}}\mathbf{\ /\ }\left( \mathbf{c}_{\mathbf{2}}\mathbf{c}_{\mathbf{3}} \right)$
　　 $\mathbf{p}_{\mathbf{i + 1}}\mathbf{=}\mathbf{r}_{\mathbf{i + 1}}\mathbf{+ \beta}\left( \mathbf{p}_{\mathbf{i}}\mathbf{-}\mathbf{c}_{\mathbf{3}}\mathbf{q} \right)$
$\mathbf{{end}}$

1.3 The Preconditioned Generalized Minimum Residual Method

The Generalized Minimum Residual (GMRES(m)) method is an iterative Krylov subspace algorithm for solving nonsymmetric matrices. GMRES(m) is restarted periodically after $m$ iterations if it did not converge within $m$ iterations, the solution obtained until $m$ iterations will be used as an input for the new restart cycle. PARCEL provides variants of GMRES with the Classical Gram-Schmidt (sometimes referred to as standard Gram-Schmidt) and Modified Gram-Schmidt (MGS) orthogonalization methods.

Algorithm description

$\mathbf{H}_{\mathbf{n}}$ is an upper triangular matrix with $\mathbf{h}_{\mathbf{j,k}}$ representing its elements, $\mathbf{\ }\mathbf{e}_{\mathbf{i}}$ represents a vector consisting of the first i elements of the vector $\mathbf{e}$ .

$\mathbf{for\ j = 0,1,2,3}\mathbf{\ldots}$
　　 $\mathbf{r = b - A}\mathbf{x}_{\mathbf{0}}$
　　 $\mathbf{v}_{\mathbf{1}}\mathbf{= - r\ /\ }\left\| \mathbf{r} \right\|$
　　 $\mathbf{e =}\left( \mathbf{-}\left\| \mathbf{r} \right\|\mathbf{,0,\ldots,0} \right)^{\mathbf{T}}$
　　 $\mathbf{n = m}$
　　 $\mathbf{for\ i = 1,2,3}\mathbf{\ldots}\mathbf{m}$
　　　　 $\mathbf{{solve\ }}{\widehat{\mathbf{v}}}_{\mathbf{i}}\mathbf{{\ from\ K}}{\widehat{\mathbf{v}}}_{\mathbf{i}}\mathbf{=}\mathbf{v}_{\mathbf{i}}$
　　　　 $\mathbf{\omega = A}{\widehat{\mathbf{v}}}_{\mathbf{i}}$
　　　　 $\mathbf{for\ k = 1,2,3}\mathbf{\ldots}\mathbf{i}$
　　　　　　 $\mathbf{h}_{\mathbf{k,i}}\mathbf{=}\left\langle \mathbf{\omega,}\mathbf{v}_{\mathbf{k}} \right\rangle$
　　　　　　 $\mathbf{\omega = \omega -}\mathbf{h}_{\mathbf{k,i}}\mathbf{v}_{\mathbf{k}}$
　　　　 $\mathbf{{end}}$
　　　　 $\mathbf{h}_{\mathbf{i + 1,i}}\mathbf{=}\left\| \mathbf{\omega} \right\|$
　　　　 $\mathbf{v}_{\mathbf{i + 1}}\mathbf{= \omega/}\left\| \mathbf{\omega} \right\|$
　　　　 $\mathbf{for\ k = 1,2,3}\mathbf{\ldots}\mathbf{i - 1}$
　　　　　　 $\begin{pmatrix} \mathbf{h}_{\mathbf{k,i}} \\ \mathbf{h}_{\mathbf{k + 1,i}} \\ \end{pmatrix}\mathbf{=}\begin{pmatrix} \mathbf{c}_{\mathbf{i}} & \mathbf{- s}_{\mathbf{i}} \\ \mathbf{s}_{\mathbf{i}} & \mathbf{c}_{\mathbf{i}} \\ \end{pmatrix}\begin{pmatrix} \mathbf{h}_{\mathbf{k,i}} \\ \mathbf{h}_{\mathbf{k + 1,i}} \\ \end{pmatrix}$
　　　　 $\mathbf{{end}}$
　　　　 $\mathbf{c}_{\mathbf{i}}\mathbf{=}\frac{\mathbf{1}}{\sqrt{\mathbf{1 +}\left( \frac{\mathbf{h}_{\mathbf{i + 1,i}}}{\mathbf{h}_{\mathbf{i,i}}} \right)^{\mathbf{2}}}}$
　　　　 $\mathbf{s}_{\mathbf{i}}\mathbf{= -}\frac{\mathbf{h}_{\mathbf{i + 1,i}}}{\mathbf{h}_{\mathbf{i,i}}}\frac{\mathbf{1}}{\sqrt{\mathbf{1 +}\left( \frac{\mathbf{h}_{\mathbf{i + 1,i}}}{\mathbf{h}_{\mathbf{i,i}}} \right)^{\mathbf{2}}}}$
　　　　 $\begin{pmatrix} \mathbf{e}_{\mathbf{i}} \\ \mathbf{e}_{\mathbf{i + 1}} \\ \end{pmatrix}\mathbf{=}\begin{pmatrix} \mathbf{c}_{\mathbf{i}} & \mathbf{- s}_{\mathbf{i}} \\ \mathbf{s}_{\mathbf{i}} & \mathbf{c}_{\mathbf{i}} \\ \end{pmatrix}\begin{pmatrix} \mathbf{e}_{\mathbf{i}} \\ \mathbf{e}_{\mathbf{i + 1}} \\ \end{pmatrix}$
　　　　 $\mathbf{h}_{\mathbf{i,i}}\mathbf{=}\mathbf{c}_{\mathbf{i}}\mathbf{h}_{\mathbf{i,i}}\mathbf{- \ }\mathbf{s}_{\mathbf{i}}\mathbf{h}_{\mathbf{i + 1,i}}$
　　　　 $\mathbf{h}_{\mathbf{i + 1,i}}\mathbf{= 0}$
　　　　 $\mathbf{{if\ }}\left\| \mathbf{e}_{\mathbf{i + 1}} \right\|\mathbf{\ /\ }\left\| \mathbf{b} \right\|\mathbf{{\ small\ enough\ }}\mathbf{{then}}$
　　　　　　 $\mathbf{n = i}$
　　　　　　 $\mathbf{{exit}}$
　　　　 $\mathbf{{endif}}$
　　 $\mathbf{{end}}$
　　 $\mathbf{y}_{\mathbf{n}}\mathbf{=}\mathbf{H}_{\mathbf{n}}^{\mathbf{- 1}}\mathbf{e}_{\mathbf{n}}$
　　 $\mathbf{{solve\ }}\widehat{\mathbf{x}}\mathbf{{\ from\ K}}\widehat{\mathbf{x}}\mathbf{=}\sum_{\mathbf{k = 1}}^{\mathbf{n}}{\mathbf{y}_{\mathbf{k}}\mathbf{v}_{\mathbf{k}}}$
　　 $\mathbf{x}_{\mathbf{n}}\mathbf{=}\mathbf{x}_{\mathbf{0}}\mathbf{+}\widehat{\mathbf{x}}$
　　 $\mathbf{x}_{\mathbf{0}}\mathbf{=}\mathbf{x}_{\mathbf{n}}$
$\mathbf{{end}}$

1.4 The Preconditioned Communication Avoiding Generalized Minimum Residual Method

The Communication Avoiding Generalized Minimum Residual (CA-GMRES(s,t)) method is an iterative Krylov subspace algorithm for solving nonsymmetric matrices. The calculation which is done in $s$ iterations in the GMRES(m) method is equivalent to one iteration of the CA-GMRES method. CA-GMRES(s,t) is restarted periodically after $t$ iterations if it did not converge within $t$ iterations, the solution obtained until $m$ iterations will be used as an input for the new restart cycle. CA-GMRES(s,t) is equivalent to GMRES(m) if $s$ and $t$ are chosen so that $m = t \times s$ . The convergence property of CA-GMRES(s,t) is the same as GMRES(m) in exact arithmetic. In addition, the number of global collective communication calls is reduced by communication avoiding QR factorization. As CA-GMRES(s,t) produces $s$ basis vectors at once, when $s$ is too large, the linear independency of the basic vectors may become worse because of round off errors, leading to worse convergence property than GMRES(m). In order to improve the orthogonality of the basic vectors, the PARCEL implementation of CA-GMRES(s,t) provides an option to use the Newton basis in addition to the monomial basis.

Algorithm description

$\mathbf{e}$ represents a unit vector, ${\widetilde{\mathbf{\rho}}}_{\mathbf{k}}\mathbf{=}\mathbf{R}_{\mathbf{k}}$ represents the $s$ -th row and $s$ -th column element of the matrix $\mathbf{R}_{\mathbf{k}}$ , and $\mathbf{E}$ represents the eigenvalues obtained from the Hessenberg matrix, which is generated by the iteration process.

$\mathbf{B}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{= \lbrack}\mathbf{e}_{\mathbf{2}}\mathbf{,}\mathbf{e}_{\mathbf{3}}\mathbf{,\ldots,}\mathbf{e}_{\mathbf{s + 1}}\mathbf{\rbrack}$
$\mathbf{for\ j = 0,1,2,3\ldots}$
　　 $\mathbf{r = b - A}\mathbf{x}_{\mathbf{0}}$
　　 $\mathbf{v}_{\mathbf{1}}\mathbf{= r\ /\ }\left\| \mathbf{r} \right\|$
　　 $\mathbf{\zeta =}\left( \left\| \mathbf{r} \right\|\mathbf{,0,\ldots,0} \right)^{\mathbf{T}}$
　　 $\mathbf{for\ k = 0,1\ldots,t - 1}$
　　　　 $\mathbf{Fix\ basis\ conversion\ matrix\ \lbrack}\mathbf{B}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{,E\rbrack}$
　　　　 $\mathbf{Comupute\ \ basic\ vector\ \ \lbrack\ s,}{\acute{\mathbf{V}}}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{,}\mathbf{B}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{\ \rbrack}$
　　　　 $\mathbf{if(k.eq.0)}$
　　　　　　 $\mathbf{{QR\ decomposition\ }}\mathbf{V}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{=}\mathbf{Q}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{R}_{\mathbf{0}}^{\mathbf{\sim}}$
　　　　　　 $\mathfrak{Q}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{=}\mathbf{Q}_{\mathbf{0}}^{\mathbf{\sim}}$
　　　　　　 $\mathfrak{H}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{=}\mathbf{R}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{B}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{R}_{\mathbf{0}}^{\mathbf{- 1}}$
　　　　　　 $\mathcal{H}_{\mathbf{0}}\mathbf{=}\mathfrak{H}_{\mathbf{0}}^{\mathbf{\sim}}$
　　　　　　 $\mathbf{for\ o = 1,2\ldots,s}$
　　　　　　 $\mathbf{Givens\ rotation\ \lbrack o,}\mathcal{H}_{\mathbf{0}}\mathbf{,\zeta\rbrack}$
　　　　　　 $\mathbf{{if\ }}\left\| \mathbf{\zeta}_{\mathbf{o + 1}} \right\|\mathbf{\ /\ }\left\| \mathbf{b} \right\|\mathbf{{\ small\ enough\ then}}$
　　　　　　　　 $\mathbf{update\ \ solution\ \ vector\lbrack s,k,o,}{\acute{\mathbf{V}}}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{,}{\acute{\mathbf{Q}}}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{,}\mathbf{R}_{\mathbf{0}}\mathbf{,}\mathcal{H}_{\mathbf{0}}\mathbf{,\zeta,}\mathbf{x}_{\mathbf{0}}\mathbf{\ \rbrack}$
　　　　　　　　 $\mathbf{{quit}}$
　　　　　　 $\mathbf{{endif}}$
　　　　 $\mathbf{{else}}$
　　　　　　 ${\acute{\mathfrak{R}}}_{\mathbf{k - 1,k}}^{\mathbf{\sim}}\mathbf{=}\left( \mathfrak{Q}_{\mathbf{k - 1}}^{\mathbf{\sim}} \right)^{\mathbf{H}}{\acute{\mathbf{V}}}_{\mathbf{k}}^{\mathbf{\sim}}$
　　　　　　 ${\acute{\mathbf{V}}}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{=}{\acute{\mathbf{V}}}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{-}\mathfrak{Q}_{\mathbf{k - 1}}^{\mathbf{\sim}}{\acute{\mathfrak{R}}}_{\mathbf{k - 1,k}}^{\mathbf{\sim}}$
　　　　　　 $\mathbf{{QR\ decomposition\ }}{\acute{\mathbf{V}}}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{=}{\acute{\mathbf{Q}}}_{\mathbf{k}}^{\mathbf{\sim}}{\acute{\mathbf{R}}}_{\mathbf{k}}^{\mathbf{\sim}}$
　　　　　　 $\mathbf{R}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{=}\begin{pmatrix} \mathbf{R}_{\mathbf{k}} & \mathbf{z}_{\mathbf{k}} \\ \mathbf{0}_{\mathbf{1,k}} & \mathbf{\rho} \\ \end{pmatrix}$
　　　　　　 $\mathfrak{H}_{\mathbf{k - 1,k}}^{\mathbf{\sim}}\mathbf{= -}\mathfrak{H}_{\mathbf{k - 1}}\mathfrak{R}_{\mathbf{k - 1,k}}\mathbf{R}_{\mathbf{k}}^{\mathbf{- 1}}\mathbf{+}\mathfrak{R}_{\mathbf{k - 1,k}}^{\mathbf{\sim}}\mathbf{B}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{R}_{\mathbf{k}}^{\mathbf{- 1}}$
　　　　　　 $\mathbf{H}_{\mathbf{k}}\mathbf{=}\mathbf{R}_{\mathbf{k}}\mathbf{B}_{\mathbf{k}}\mathbf{R}_{\mathbf{k}}^{\mathbf{- 1}}\mathbf{+}{\widetilde{\mathbf{\rho}}}_{\mathbf{k}}^{\mathbf{- 1}}\mathbf{b}_{\mathbf{k}}\mathbf{z}_{\mathbf{k}}\mathbf{e}_{\mathbf{s}}^{\mathbf{T}}\mathbf{-}\mathbf{h}_{\mathbf{k - 1}}\mathbf{e}_{\mathbf{1}}\mathbf{e}_{\mathbf{s(k - 1)}}^{\mathbf{T}}\mathfrak{R}_{\mathbf{k - 1,k}}\mathbf{R}_{\mathbf{k}}^{\mathbf{- 1}}$
　　　　　　 $\mathbf{h}_{\mathbf{k}}\mathbf{=}{\widetilde{\mathbf{\rho}}}_{\mathbf{k}}^{\mathbf{- 1}}\mathbf{\rho}_{\mathbf{k}}\mathbf{b}_{\mathbf{k}}$
　　　　　　 $\mathfrak{H}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{=}\begin{pmatrix} \mathfrak{H}_{\mathbf{k - 1}} & \mathfrak{H}_{\mathbf{k - 1,k}}^{\mathbf{\sim}} \\ \begin{matrix} \mathbf{h}_{\mathbf{k - 1}}\mathbf{e}_{\mathbf{1}}\mathbf{e}_{\mathbf{s(k - 1)}}^{\mathbf{T}} \\ \mathbf{0}_{\mathbf{1,s(k - 1)}} \\ \end{matrix} & \begin{matrix} \mathbf{H}_{\mathbf{k}} \\ \mathbf{h}_{\mathbf{k}}\mathbf{e}_{\mathbf{s}}^{\mathbf{T}} \\ \end{matrix} \\ \end{pmatrix}$
　　　　　　 $\mathcal{H}_{\mathbf{k}}\mathbf{=}\begin{pmatrix} \mathfrak{H}_{\mathbf{k - 1,k}}^{\mathbf{\sim}} \\ \mathbf{H}_{\mathbf{k}} \\ \mathbf{h}_{\mathbf{k}}\mathbf{e}_{\mathbf{s}}^{\mathbf{T}} \\ \end{pmatrix}$
　　　　　　 $\mathbf{for\ o = 1 + sk,\ldots,s(k + 1)}$
　　　　　　　　 $\mathbf{Givens\ rotation\ \lbrack o,}\mathcal{H}_{\mathbf{k}}\mathbf{,\zeta\rbrack}$
　　　　　　　　 $\mathbf{{if\ }}\left\| \mathbf{\zeta}_{\mathbf{o + 1}} \right\|\mathbf{\ /\ }\left\| \mathbf{b} \right\|\mathbf{{\ small\ enough\ then}}$
　　　　　　　　　　 $\mathbf{update\ \ solution\ \ vector\lbrack s,k,o,}{\acute{\mathbf{V}}}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{,}\mathfrak{Q}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{,}{\acute{\mathbf{R}}}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{,}\mathcal{H}_{\mathbf{k}}\mathbf{,\zeta,}\mathbf{x}_{\mathbf{0}}\mathbf{\ \rbrack}$
　　　　　　　　　　 $\mathbf{{quit}}$
　　　　　　　　 $\mathbf{{endif}}$
　　　　　　 $\mathbf{{end}}$
　　　　 $\mathbf{{endif}}$
　　 $\mathbf{{end}}$
　　 $\mathbf{update\ \ solution\ \ vector\lbrack s,t - 1,st,}{\acute{\mathbf{V}}}_{\mathbf{t - 1}}^{\mathbf{\sim}}\mathbf{,}\mathfrak{Q}_{\mathbf{t - 1}}^{\mathbf{\sim}}\mathbf{,}{\acute{\mathbf{R}}}_{\mathbf{t - 1}}^{\mathbf{\sim}}\mathbf{,}\mathcal{H}_{\mathbf{t - 1}}\mathbf{,\zeta,}\mathbf{x}_{\mathbf{0}}\mathbf{\ \rbrack}$
$\mathbf{{end}}$

Fix basis conversion matrix $\mathbf{\lbrack s,B,E\rbrack}$

$\mathbf{{if\ compute\ Newton\ Basis}}$
　　 $\mathbf{i = 0}$
　　 $\mathbf{{while}}\left( \mathbf{i \leq s - 1} \right)$
　　　　 $\mathbf{{if}}\left( \mathbf{i.eq.s - 1} \right)\mathbf{{then}}$
　　　　　　 $\mathbf{B}_{\mathbf{i,i}}\mathbf{= Re\lbrack}\mathbf{E}_{\mathbf{i}}\mathbf{\rbrack}$
　　　　 $\mathbf{{else}}$
　　　　　　 $\mathbf{{if}}\left( \mathbf{{Im}}\left\lbrack \mathbf{E}_{\mathbf{i}} \right\rbrack\mathbf{.eq.0} \right)\mathbf{{then}}$
　　　　　　　　 $\mathbf{B}_{\mathbf{i,i}}\mathbf{= Re\lbrack}\mathbf{E}_{\mathbf{i}}\mathbf{\rbrack}$
　　　　　　 $\mathbf{{else}}$
　　　　　　　　 $\mathbf{B}_{\mathbf{i,i}}\mathbf{= Re\lbrack}\mathbf{E}_{\mathbf{i}}\mathbf{\rbrack}$
　　　　　　　　 $\mathbf{B}_{\mathbf{i + 1,i + 1}}\mathbf{= Re\lbrack}\mathbf{E}_{\mathbf{i}}\mathbf{\rbrack}$
　　　　　　　　 $\mathbf{B}_{\mathbf{i,i + 1}}\mathbf{= -}\left( \mathbf{Im\lbrack}\mathbf{E}_{\mathbf{i}}\mathbf{\rbrack} \right)^{\mathbf{2}}$
　　　　　　　　 $\mathbf{i = i + 1}$
　　　　　　 $\mathbf{{endif}}$
　　　　 $\mathbf{{endif}}$
　　　　 $\mathbf{i = i + 1}$
　　 $\mathbf{{end\ \ }}$
$\mathbf{{end\ \ }}$

Comupute basis vector $\mathbf{\lbrack\ s,v\ ,B\rbrack}$

The elements of the matrix $\mathbf{B}$ are represented as $\mathbf{b}_{\mathbf{k,i}}$

$\mathbf{for\ k = 1,2,3\ldots s}$
　　 $\mathbf{{solve\ }}{\widehat{\mathbf{v}}}_{\mathbf{k - 1}}\mathbf{{\ from\ K}}{\widehat{\mathbf{v}}}_{\mathbf{k - 1}}\mathbf{=}\mathbf{v}_{\mathbf{k - 1}}$
　　 $\mathbf{if(k.ne.1)then}$
　　　　 $\mathbf{\alpha =}\mathbf{b}_{\mathbf{k - 1,k - 1}}$
　　　　 $\mathbf{\beta =}\mathbf{b}_{\mathbf{k - 2,k - 1}}$
　　　　 $\mathbf{v}_{\mathbf{k}}\mathbf{= A}{\widehat{\mathbf{v}}}_{\mathbf{k - 1}}\mathbf{- \alpha}\mathbf{v}_{\mathbf{k - 1}}\mathbf{+ \beta}\mathbf{v}_{\mathbf{k - 2}}$
　　 $\mathbf{{else}}$
　　　　 $\mathbf{\alpha =}\mathbf{b}_{\mathbf{k - 1,k - 1}}$
　　　　 $\mathbf{v}_{\mathbf{k}}\mathbf{= A}{\widehat{\mathbf{v}}}_{\mathbf{k - 1}}\mathbf{- \alpha}\mathbf{v}_{\mathbf{k - 1}}$
　　 $\mathbf{{endif}}$
$\mathbf{{en}}\mathbf{d}$

Givens rotation $\mathbf{\lbrack i,}\mathcal{H,}\mathbf{\zeta\rbrack}$

　The elements of the matrix $\mathcal{H}$ are represented as $\mathcal{h}_{\mathbf{k,i}}$ .

$\mathbf{for\ k = 1,2,3\ldots i - 1}$
　　 $\begin{pmatrix} \mathcal{h}_{\mathbf{k,i}} \\ \mathcal{h}_{\mathbf{k + 1,i}} \\ \end{pmatrix}\mathbf{=}\begin{pmatrix} \mathbf{c}_{\mathbf{i}} & \mathbf{- s}_{\mathbf{i}} \\ \mathbf{s}_{\mathbf{i}} & \mathbf{c}_{\mathbf{i}} \\ \end{pmatrix}\begin{pmatrix} \mathcal{h}_{\mathbf{k,i}} \\ \mathcal{h}_{\mathbf{k + 1,i}} \\ \end{pmatrix}$
$\mathbf{\text{end}}$
$\mathbf{c}_{\mathbf{i}}\mathbf{=}\frac{\mathbf{1}}{\sqrt{\mathbf{1 +}\left( \frac{\mathcal{h}_{\mathbf{i + 1,i}}}{\mathcal{h}_{\mathbf{i,i}}} \right)^{\mathbf{2}}}}$
$\mathbf{s}_{\mathbf{i}}\mathbf{= -}\frac{\mathcal{h}_{\mathbf{i + 1,i}}}{\mathcal{h}_{\mathbf{i,i}}}\frac{\mathbf{1}}{\sqrt{\mathbf{1 +}\left( \frac{\mathcal{h}_{\mathbf{i + 1,i}}}{\mathcal{h}_{\mathbf{i,i}}} \right)^{\mathbf{2}}}}$
$\begin{pmatrix} \mathbf{\zeta}_{\mathbf{i}} \\ \mathbf{\zeta}_{\mathbf{i + 1}} \\ \end{pmatrix}\mathbf{=}\begin{pmatrix} \mathbf{c}_{\mathbf{i}} & \mathbf{- s}_{\mathbf{i}} \\ \mathbf{s}_{\mathbf{i}} & \mathbf{c}_{\mathbf{i}} \\ \end{pmatrix}\begin{pmatrix} \mathbf{\zeta}_{\mathbf{i}} \\ \mathbf{\zeta}_{\mathbf{i + 1}} \\ \end{pmatrix}$
$\mathcal{h}_{\mathbf{i,i}}\mathbf{=}\mathbf{c}_{\mathbf{i}}\mathcal{h}_{\mathbf{i,i}}\mathbf{- \ }\mathbf{s}_{\mathbf{i}}\mathcal{h}_{\mathbf{i + 1,i}}$
$\mathcal{h}_{\mathbf{i + 1,i}}\mathbf{= 0}$

Update solution vector $\mathbf{\lbrack s,k,n,V,Q,R,}\mathcal{H,}\mathbf{\zeta,}\mathbf{x}_{\mathbf{0}}\mathbf{\ \rbrack}$

$\mathbf{y =}\mathcal{H}^{\mathbf{- 1}}\mathbf{\zeta}$
$\mathbf{{solve\ }}\widehat{\mathbf{x}}\mathbf{{\ from\ K}}\widehat{\mathbf{x}}\mathbf{=}\sum_{\mathbf{k = 0}}^{\mathbf{s}\mathbf{m}\mathbf{- 1}}\mathbf{Q}_{\mathbf{k}}\mathbf{y}_{\mathbf{k}}\mathbf{+}\sum_{\mathbf{l =}\mathbf{{sm}}}^{\mathbf{n - 1}}\mathbf{V}_{\mathbf{l}\mathbf{- sm}}\sum_{\mathbf{k =}\mathbf{{sm}}}^{\mathbf{n - 1}}{\mathbf{R}_{\mathbf{l - sm}\mathbf{,}\mathbf{k}\mathbf{- sm}}^{\mathbf{- 1}}\mathbf{y}_{\mathbf{k}}}$
$\mathbf{x}_{\mathbf{0}}\mathbf{=}\mathbf{x}_{\mathbf{0}}\mathbf{+}\widehat{\mathbf{x}}$

1.5 The Preconditioned Chebyshev basis Conjugate Gradient Method

The Chebyshev basis Conjugate Gradient (CBCG) method is an iterative Krylov subspace algorithm for solving symmetric matrices. The CBCG method calculates $k$ iterations of the CG method in one iteration. By this approach, the CBCG method reduces the global collective communication calls. In order to use the Chebyshev basis the largest and smallest eigenvalues are needed. PARCEL provides two methods to obtain these eigenvalues, the so called power method and the communication avoiding Arnoldi method.

Algorithm

$\mathbf{{Compute\ }}\mathbf{r}_{\mathbf{0}}\mathbf{= b - A}\mathbf{x}_{\mathbf{0}}\mathbf{{\ for\ some\ initial\ gue}}\mathbf{{ss\ }}\mathbf{x}_{\mathbf{- 1}}\mathbf{=}\mathbf{x}_{\mathbf{0}}$
$\mathbf{Compute\ Chebyshev\ basis\ \lbrack}\mathbf{S}_{\mathbf{0}}\mathbf{,}{\mathbf{A}\mathbf{S}}_{\mathbf{0}}\mathbf{,}\mathbf{r}_{\mathbf{0}}\mathbf{,}\mathbf{\lambda}_{\mathbf{\max}}\mathbf{,}\mathbf{\lambda}_{\mathbf{\min}}\mathbf{\rbrack}$
$\mathbf{Q}_{\mathbf{0}}\mathbf{=}\mathbf{S}_{\mathbf{0}}$
$\mathbf{for\ i = 0,1,2,3\ldots}$
$\mathbf{{Compute\ }}\mathbf{Q}_{\mathbf{i}}^{\mathbf{T}}\mathbf{A}\mathbf{Q}_{\mathbf{i}}\mathbf{\ ,\ }\mathbf{Q}_{\mathbf{i}}^{\mathbf{T}}\mathbf{r}_{\mathbf{{ik}}}$
　　 $\mathbf{\alpha}_{\mathbf{i}}\mathbf{= \ }\left( \mathbf{Q}_{\mathbf{i}}^{\mathbf{T}}\mathbf{A}\mathbf{Q}_{\mathbf{i}} \right)^{\mathbf{- 1}}\left( \mathbf{Q}_{\mathbf{i}}^{\mathbf{T}}\mathbf{r}_{\mathbf{{ik}}} \right)$
　　 $\mathbf{x}_{\left( \mathbf{i + 1} \right)\mathbf{k}}\mathbf{=}\mathbf{x}_{\mathbf{{ik}}}\mathbf{+}\mathbf{Q}_{\mathbf{i}}\mathbf{\alpha}_{\mathbf{i}}$
　　 $\mathbf{r}_{\left( \mathbf{i + 1} \right)\mathbf{k}}\mathbf{=}\mathbf{r}_{\mathbf{{ik}}}\mathbf{- A}\mathbf{Q}_{\mathbf{i}}\mathbf{\alpha}_{\mathbf{i}}$
　　 $\mathbf{{if\ }}\left\| \mathbf{r}_{\left( \mathbf{i + 1} \right)\mathbf{k}} \right\|\mathbf{\ /\ }\left\| \mathbf{b} \right\|\mathbf{{\ small\ enough\ }}\mathbf{{then}}$
　　 $\mathbf{Compute\ Chebyshev\ basis\ \lbrack}\mathbf{S}_{\mathbf{i + 1}}\mathbf{,}{\mathbf{A}\mathbf{S}}_{\mathbf{i + 1}}\mathbf{,}\mathbf{r}_{\left( \mathbf{i + 1} \right)\mathbf{k}}\mathbf{,}\mathbf{\lambda}_{\mathbf{\max}}\mathbf{,}\mathbf{\lambda}_{\mathbf{\min}}\mathbf{\rbrack}$
　　 $\mathbf{{Compute\ }}\mathbf{Q}_{\mathbf{i}}^{\mathbf{T}}\mathbf{A}\mathbf{S}_{\mathbf{i + 1}}$
　　 $\mathbf{B}_{\mathbf{i}}\mathbf{=}\left( \mathbf{Q}_{\mathbf{i}}^{\mathbf{T}}\mathbf{A}\mathbf{Q}_{\mathbf{i}} \right)^{\mathbf{- 1}}\left( \mathbf{Q}_{\mathbf{i}}^{\mathbf{T}}\mathbf{A}\mathbf{S}_{\mathbf{i + 1}} \right)\mathbf{\ }$
　　 $\mathbf{Q}_{\mathbf{i + 1}}\mathbf{=}\mathbf{S}_{\mathbf{i + 1}}\mathbf{-}\mathbf{Q}_{\mathbf{i}}\mathbf{B}_{\mathbf{i}}$
　　 $\mathbf{A}\mathbf{Q}_{\mathbf{i + 1}}\mathbf{=}\mathbf{{AS}}_{\mathbf{i + 1}}\mathbf{-}\mathbf{{AQ}}_{\mathbf{i}}\mathbf{B}_{\mathbf{i}}$
$\mathbf{{end}}$

Compute Chebyshev basis $\mathbf{\lbrack S,AS,r,}\mathbf{\lambda}_{\mathbf{\max}}\mathbf{,}\mathbf{\lambda}_{\mathbf{\min}}\mathbf{\rbrack}$

$\mathbf{\lambda}_{\mathbf{\max}}$ , $\mathbf{\lambda}_{\mathbf{\min}}$ represent the largest and smallest eigenvalues of $\mathbf{AK}^{-1}$ .

$\mathbf{\eta = 2\ /\ (}\mathbf{\lambda}_{\mathbf{\max}}\mathbf{-}\mathbf{\lambda}_{\mathbf{\min}}\mathbf{)}$
$\mathbf{\zeta =}\left( \mathbf{\lambda}_{\mathbf{\max}}\mathbf{+}\mathbf{\lambda}_{\mathbf{\min}} \right)\mathbf{\ /\ (}\mathbf{\lambda}_{\mathbf{\max}}\mathbf{-}\mathbf{\lambda}_{\mathbf{\min}}\mathbf{)}$
$\mathbf{s}_{\mathbf{0}}\mathbf{=}\mathbf{r}_{\mathbf{0}}\mathbf{\ }$
$\mathbf{{solve\ }}{\widetilde{\mathbf{s}}}_{\mathbf{0}}\mathbf{{\ from\ K}}{\widetilde{\mathbf{s}}}_{\mathbf{0}}\mathbf{= \ }\mathbf{s}_{\mathbf{0}}$
$\mathbf{s}_{\mathbf{1}}\mathbf{= \eta A}{\widetilde{\mathbf{s}}}_{\mathbf{0}}\mathbf{- \zeta}\mathbf{s}_{\mathbf{0}}$
$\mathbf{{solve\ }}{\widetilde{\mathbf{s}}}_{\mathbf{1}}\mathbf{{\ from\ K}}{\widetilde{\mathbf{s}}}_{\mathbf{1}}\mathbf{= \ }\mathbf{s}_{\mathbf{1}}$
$\mathbf{for\ j = 2,3,\ldots,k}$
　　 $\mathbf{s}_{\mathbf{j}}\mathbf{= 2\eta A}{\widetilde{\mathbf{s}}}_{\mathbf{j}}\mathbf{- 2\zeta}\mathbf{s}_{\mathbf{j - 1}}\mathbf{-}\mathbf{s}_{\mathbf{j - 2}}$
　　 $\mathbf{{solve\ }}{\widetilde{\mathbf{s}}}_{\mathbf{j}}\mathbf{{\ from\ K}}{\widetilde{\mathbf{s}}}_{\mathbf{j}}\mathbf{= \ }\mathbf{s}_{\mathbf{j}}$
$\mathbf{{end}}$
$\mathbf{S = (}{\widetilde{\mathbf{s}}}_{\mathbf{0}}\mathbf{,}{\widetilde{\mathbf{s}}}_{\mathbf{1}}\mathbf{,\ldots,}{\widetilde{\mathbf{s}}}_{\mathbf{k - 1}}\mathbf{)}$
$\mathbf{AS = (A}{\widetilde{\mathbf{s}}}_{\mathbf{0}}\mathbf{,}{\mathbf{A}\widetilde{\mathbf{s}}}_{\mathbf{1}}\mathbf{,\ldots,}{\mathbf{A}\widetilde{\mathbf{s}}}_{\mathbf{k - 1}}\mathbf{)}$

2 Preconditioning

In iterative algorithms, the application of a preconditioner of the form $K \approx A$ can help to reduce the number of iterations until convergence. The notation [ ${{solve\ }}\widehat{{p}}{{\ from\ K}}\widehat{{p}}{= \ }{p}$ ] means [ $approximately solve {A}\widehat{p}{= \ }{p}$ with respect to the vector $\widehat{p}$ ]. Although the more accurate approximation leads to the less number of iterations until convergence, the cost of preconditioner is normally increased. In non-parallel computing, one of the most common preconditioners is the incomplete LU factorization (ILU). However, in parallel computing, parallel preconditioners are needed. PARCEL provides the following parallel preconditioners: Point Jacobi, Block Jacobi and Additive Schwarz.

2.1 Point Jacobi Preconditioner

The point Jacobi preconditioner is one of the most simplest preconditioners. $K$ only consists of the diagonal elements of the matrix $A$ . Compared to other preconditioners the efficiency of the point Jacobi preconditioner is very low. However, it can be easily applied in parallel and it does not require any additional communication between processors.

2.2 Zero Fill-in Incomplete LU Factorization Preconditioners (ILU(0))

LU factorization decompose a square matrix $A$ into a lower triangular matrix $L$ and an upper triangular matrix $U$ . For a typical sparse matrix, the LU factors can be much less sparse than the original matrix, which is called as fill-in. This increases the memory and computational requirements. In order to avoid this issue, PARCEL provides the zero fill-in incomplete LU factorization (ILU(0)).

2.3 Diagonal Incomplete LU Factorization Preconditioners (D-ILU)

The diagonal incomplete LU factorization (D-ILU) computes a diagonal matrix $D$ , a lower triangular matrix $L$ and an upper triangular matrix $U$ for the input matrix $A$ . $A = L_{A} + D_{A} + U_{A}$ The preconditioner $K$ is constructed with $L_{A}$ , $U_{A}$ and $D$ ( $D \neq D_{A}$ ) with: $K = \left( D + L_{A} \right)D^{- 1}\left( D + U_{A} \right)$ Two different conditions exist in order to construct the diagonal matrix $D$ :

Condition 1: The diagonal elements of $K$ equal those of $A$ : $A_{{ii}} = K_{{ii}}\ \ (i = 1,\ldots,n)$
Condition 2: The row sum of $K$ equals the row sum of $A$ : $\sum_{j}^{}A_{{ij}} = \sum_{j}^{}K_{{ij}}\ \ (i = 1,\ldots,n)$

Given the conditions above $D$ can be computed as follows:

If condition 1 should be fullfilled the following computation is used: $D_{{ii}} = A_{{ii}} - \sum_{j < i}^{}{A_{{ij}}D_{jj}^{-1}A_{{ji}}}\ \ (i = 1,\ldots,n)$
If condition 2 should be fullfilled the following computation is used: $D_{{ii}} = A_{{ii}} - \sum_{j < i}^{}{\sum_{k > j}^{}{A_{{ij}}D_{{jj}}^{-1}A_{jk}}}\ \ (i = 1,\ldots,n)$

2.4 Block Jacobi Preconditioner

The Block Jacobi Preconditioner constructs $K$ out of the diagonal blocks of the matrix $A$ . For each block, the incomplete ILU factorization is computed. When the preconditioner is applied, each block can be processed independently, resulting in a high level of parallelism. In addition, no communication is needed when each block is defined within a sub-matrix on each processor.

2.5 Additive Schwarz Preconditioner

The Additive Schwarz Preconditioner constructs $K$ out of the overlapping diagonal blocks of $A$ . For each block the incomplete LU factorization is computed. Compared to the Block Jacobi Preconditioner, the Additive Schwarz Preconditioner may require additional communication. PARCEL provides four different overlapping methods, BASIC, RESTRICT, INTERPOLATE and NONE.

Fig.1 Additive Schwarz Preconditioner with overlapping diagonal blocks

BASIC

Solve $Ks=r$ with an extended $r$ including an overlapping region with the neighboring process, $s$ in the overlapping region is determined by summing up the results between the neighboring region.

Fig.2 BASIC

RESTRICT

　 Solve $Ks=r$ with an extended $r$ including an overlapping region with the neighboring process, $s$ in the overlapping region is determined without taking account of the result in the neighboring region.

Fig.3 RESTRICT

INTERPOLATE

Solve $Ks=r$ with an extended $r$ without an overlapping region with the neighboring process, $s$ in the overlapping region is determined by summing up the results between the neighboring regions.　

Fig.4 INTERPOLATE

NONE

Solve $Ks=r$ with an extended $r$ without an overlapping region with the neighboring process, $s$ in the overlapping region is determined without taking account of the result in the neighboring regions.　

Fig.5 NONE

3 Sparse Matrix Data Formats

In order to save memory in sparse matrix computation, only the non-zero elements and their locations are stored. PARCEL supports different sparse matrix formats.

3.1 Compressed Row Storage (CRS) Format

　The CRS format compresses the row indices of the non-zero elements of the sparse matrix. Together with the compressed rows, the column index and the value for each non-zero element are stored in one dimensional arrays. Compresed row indeces are stored in an incremental order.

$A = \left( \begin{array}{ccc} a & b & c & 0\\ 0 & d & 0 & 0\\ e & 0 & f & g\\ 0 & h & i & j \end{array} \right)$

One process case

In case a single process processes the whole matrix from above

Values of non-zero elements :: $crsA=\{a,b,c,d,e,f,g,h,i,j\}$

Compressed row indices :: $crsRow\_ptr=\{1,4,5,8,11\}$

Column indices of non-zero elements :: $crsCol=\{1,2,3,2,1,3,4,2,3,4\}$

Two process case

Rank 0 process

Values of non-zero elements :: $crsA=\{a,b,c,d\}$

Compressed row indices :: $crsRow\_ptr=\{1,4,5\}$

Column indices of non-zero elements :: $crsCol=\{1,2,3,2\}$

Rank 1 process

Values of non-zero elements :: $crsA=\{e,f,g,h,i,j\}$

Compressed row indices :: $crsRow\_ptr=\{1,4,7\}$

Column indices of non-zero elements :: $crsCol=\{1,3,4,2,3,4\}$

3.2 Diagonal Format

　The Diagonal Format stores the non-zero elements of a matrix as diagonals. This format provides high performance for band block diagonal matrices. In addition to the element values, an offset value for each diagonal of the matrix needs to be stored inside an one dimensional array. A negative offset indicates that the diagonal is below the main diagonal of the matrix, a positive offset indicates that the diagonal is above the main diagonal. The main diagonal has the offset zero. Offset values are stored in an incremental order.

$A = \left( \begin{array}{ccc} a & b & c & 0\\ 0 & d & 0 & 0\\ e & 0 & f & g\\ 0 & h & i & j \end{array} \right)$

Single process case

In case the matrix above is processed by a single process

Elements :: $diaA=\{0,0,e,h,0,0,0,i,a,d,f,j,b,0,g,0,c,0,0,0\}$

Offsets :: $offset=\{-2,-1,0,1,2\}$

Number of diagonals :: $nnd=5$

Two process case

Rank 0 process

Elements :: $diaA=\{a,d,b,0,c,0\}$

Offsets :: $offset=\{0,1,2\}$

Number of diagonals :: $nnd=3$

Rank 1 process

Elements :: $diaA=\{e,h,0,i,f,j,g,0\}$

Offsets :: $offset=\{-2,-1,0,1\}$

Number of diagonals :: $nnd=4$

4 The structure of PARCEL

　This chapter explains how to compile and install the PARCEL library.

4.1 Directory Structure

　After extracting the PARCEL library, the directory structure should be the same as in Fig.6.

Fig.6 The PARCEL directory structure

4.2 Compiling the PARCEL library

PARCEL requires a C compiler, a FORTRAN compiler, which supports preprocessor, MPI, LAPACK, and OpenMP. When compiling with $gfortran$ , the option $-cpp$ has to be specified to to activate preprocessor. Compiler options can be changed inside the file arch/make_config. An example of a make_config file is shown in Fig.7. The make command should be executed from the top directory of the extracted file. In order to re-compile, $make\ clean$ followed by $make$ should be executed. If the $make$ command was successful, each $example$ directory should contain an executable file. 64-bit integer is needed when working with over roughly 2 billion unknowns. For example, FDEFINE has to be specified for a large scale test, example5, in this package.

Option	Explanation
CC	C compiler command name
FC	FORTRAN compiler command name
CFLAGS	C compiler options
FFLAGS	FORTRAN compiler options
FP_MODE	Option to suppress optimizations with arithmetic order change (Only for quad precision)
INCLUDEDIR	Include directories
LD_MPI	MPI library
LD_LAPACK	LAPACK library
FDEFINE	64bit integer becomes available with -DPARCEL_INT8.

Fig.7 Example of a make_config file

4.3 Usage of the PARCEL library

In order to call PARCEL library, programs need to be linked against the $*.a$ library files inside the $src$ directory.

5 PARCEL routines (direct interface)

　The PARCEL routines with direct call interfaces are shown below.

5.1 parcel_dcg

Interface

call parcel_dcg( icomm, vecx,vecb, n,gn,nnz,istart, crsA,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret )

Function

Solve a simultaneous linear equation system Ax = b by the conjugate gradient method (CG method). Non-zero elements are stored in the CRS format.

Argument name (dimension)	Type	Input/Output	Description
icomm	integer	in	MPI communicator
vecx(n)	double precision	in/out	in: initial vector, out: solution vector
vecb(n)	double precision	in	Right hand side vector
n	integer4 / integer8	in	Size of vector on each process
gn	integer4 / integer8	in	Total size of vector
nnz	integer4 / integer8	in	Number of non-zero elements on each process
istart	integer4 / integer8	in	Start line of matrix on each process
crsA(nnz)	double precision	in	Non-zero elements of matrix stored in the CRS format
crsRow_ptr(n+1)	integer4 / integer8	in	Pointer table in the CRS format
crsCol(nnz)	integer4 / integer8	in	Column numbers of non-zero elements in the CRS format
ipreflag	integer	in	Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method	integer	in	Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL	integer	in	Overlapping width in the additive Schwartz method
iflagAS	integer	in	Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax	integer	in/out	in: maximum number of iterations, out: number of iterations
rtolmax	double precision	in	Convergence criterion (norm of relative residual error)
reshistory(itrmax)	double precision	out	History of relative residual error
iovlflag	integer	in	Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret	integer	out	Error code (0:normal)

5.2 parcel_dcg_dia

Interface

call parcel_dcg_dia( icomm, vecx,vecb, n,gn,istart, diaA,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret )

Function

Solve a simultaneous linear equation system Ax = b by the conjugate gradient method (CG method). Non-zero elements are stored in the Diagonal format.

Argument name (dimension)	Type	Input/Output	Description
icomm	integer	in	MPI communicator
vecx(n)	double precision	in/out	in: initial vector, out: solution vector
vecb(n)	double precision	in	Right hand side vector
n	integer4 / integer8	in	Size of vector on each process
gn	integer4 / integer8	in	Total size of vector
istart	integer4 / integer8	in	Start line of matrix on each process
diaA(n*nnd)	double precision	in	Non-zero elements of matrix stored in the Diagonal format
offset(nnd)	integer4 / integer8	in	Offset of each non-zero diagonal elements array in the Diagonal format
nnd	integer4 / integer8	in	Number of diagonal elements arrays in the Diagonal format
ipreflag	integer	in	Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method	integer	in	Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL	integer	in	Overlapping width in the additive Schwartz method
iflagAS	integer	in	Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax	integer	in/out	in: maximum number of iterations, out: number of iterations
rtolmax	double precision	in	Convergence criterion (norm of relative residual error)
reshistory(itrmax)	double precision	out	History of relative residual error
iovlflag	integer	in	Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret	integer	out	Error code (0:normal)

5.3 parcel_ddcg

Interface

call parcel_ddcg( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,nnz,istart, crsA_hi,crsA_lo,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )

Function

Solve a simultaneous linear equation system Ax = b by the conjugate gradient method (CG method). Non-zero elements are stored in the CRS format.

Argument name (dimension)	Type	Input/Output	Description
icomm	integer	in	MPI communicator
vecx_hi(n)	double precision	in/out	in: Initial vector (upper bits), out: Solution vector (upper bits)
vecx_lo(n)	double precision	in/out	in: initial value of the iteration (lower bit), out: solution vector (lower bit)
vecb_hi(n)	double precision	in	Right hand side vector (upper bit)
vecb_lo(n)	double precision	in	Right hand side vector (lower bit)
n	integer4 / integer8	in	Size of vector on each process
gn	integer4 / integer8	in	Total size of vector
nnz	integer4 / integer8	in	Number of non-zero elements on each process
istart	integer4 / integer8	in	Start line of matrix on each process
crsA_hi(nnz)	double precision	in	Non-zero elements (upper bits) of matrix stored in the CRS format
crsA_lo(nnz)	double precision	in	Non-zero elements (lower bit) of matrix stored in the CRS format
crsRow_ptr(n+1)	integer4 / integer8	in	Pointer table in the CRS format
crsCol(nnz)	integer4 / integer8	in	Column numbers of non-zero elements in the CRS format
ipreflag	integer	in	Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method	integer	in	Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL	integer	in	Overlapping width in the additive Schwartz method
iflagAS	integer	in	Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax	integer	in/out	in: maximum number of iterations, out: number of iterations
rtolmax	double precision	in	Convergence criterion (norm of relative residual error)
reshistory(itrmax)	double precision	out	History of relative residual error
iovlflag	integer	in	Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret	integer	out	Error code (0:normal)
PRECISION_A	integer	in	Precision of matrix (1: double precision, 2: quad precision)
PRECISION_b	integer	in	Precision of right hand side vector (1: double precision, 2: quad precision)
PRECISION_x	integer	in	Precision of solution vector (1: double precision, 2: quad precision)
PRECISION_PRECONDITION	integer	in	Precision of preconditioner (1: double precision, 2: quad precision)

5.4 parcel_ddcg_dia

Interface

call parcel_ddcg_dia( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,istart, diaA_hi,diaA_lo,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )

Function

Solve a simultaneous linear equation system Ax = b by the conjugate gradient method (CG method). Non-zero elements are stored in the Diagonal format.

Argument name (dimension)	Type	Input/Output	Description
icomm	integer	in	MPI communicator
vecx_hi(n)	double precision	in/out	in: Initial vector (upper bits), out: Solution vector (upper bits)
vecx_lo(n)	double precision	in/out	in: initial value of the iteration (lower bit), out: solution vector (lower bit)
vecb_hi(n)	double precision	in	Right hand side vector (upper bit)
vecb_lo(n)	double precision	in	Right hand side vector (lower bit)
n	integer4 / integer8	in	Size of vector on each process
gn	integer4 / integer8	in	Total size of vector
istart	integer4 / integer8	in	Start line of matrix on each process
diaA_hi(n*nnd)	double precision	in	Non-zero elements (upper bits) of matrix stored in the Diagonal format
diaA_lo(n*nnd)	double precision	in	Non-zero elements (lower bit) of matrix stored in the Diagonal format
offset(nnd)	integer4 / integer8	in	Offset of each non-zero diagonal elements array in the Diagonal format
nnd	integer4 / integer8	in	Number of diagonal elements arrays in the Diagonal format
ipreflag	integer	in	Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method	integer	in	Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL	integer	in	Overlapping width in the additive Schwartz method
iflagAS	integer	in	Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax	integer	in/out	in: maximum number of iterations, out: number of iterations
rtolmax	double precision	in	Convergence criterion (norm of relative residual error)
reshistory(itrmax)	double precision	out	History of relative residual error
iovlflag	integer	in	Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret	integer	out	Error code (0:normal)
PRECISION_A	integer	in	Precision of matrix (1: double precision, 2: quad precision)
PRECISION_b	integer	in	Precision of right hand side vector (1: double precision, 2: quad precision)
PRECISION_x	integer	in	Precision of solution vector (1: double precision, 2: quad precision)
PRECISION_PRECONDITION	integer	in	Precision of preconditioner (1: double precision, 2: quad precision)

5.5 parcel_dbicgstab

Interface

call parcel_dbicgstab( icomm, vecx,vecb, n,gn,nnz,istart, crsA,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret )

Function

Solve a simultaneous linear equation system Ax = b by the stabilized biconjugate gradient method (Bi-CGSTAB method). Non-zero elements are storedin the CRS format.

Argument name (dimension)	Type	Input/Output	Description
icomm	integer	in	MPI communicator
vecx(n)	double precision	in/out	in: initial vector, out: solution vector
vecb(n)	double precision	in	Right hand side vector
n	integer4 / integer8	in	Size of vector on each process
gn	integer4 / integer8	in	Total size of vector
nnz	integer4 / integer8	in	Number of non-zero elements on each process
istart	integer4 / integer8	in	Start line of matrix on each process
crsA(nnz)	double precision	in	Non-zero elements of matrix stored in the CRS format
crsRow_ptr(n+1)	integer4 / integer8	in	Pointer table in the CRS format
crsCol(nnz)	integer4 / integer8	in	Column numbers of non-zero elements in the CRS format
ipreflag	integer	in	Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method	integer	in	Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL	integer	in	Overlapping width in the additive Schwartz method
iflagAS	integer	in	Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax	integer	in/out	in: maximum number of iterations, out: number of iterations
rtolmax	double precision	in	Convergence criterion (norm of relative residual error)
reshistory(itrmax)	double precision	out	History of relative residual error
iovlflag	integer	in	Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret	integer	out	Error code (0:normal)

5.6 parcel_dbicgstab_dia

Interface

call parcel_dbicgstab_dia( icomm, vecx,vecb, n,gn,istart, diaA,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret )

Function

Solve a simultaneous linear equation system Ax = b by the stabilized biconjugate gradient method (Bi-CGSTAB method). Non-zero elements are stored in the Diagonal format

Argument name (dimension)	Type	Input/Output	Description
icomm	integer	in	MPI communicator
vecx(n)	double precision	in/out	in: initial vector, out: solution vector
vecb(n)	double precision	in	Right hand side vector
n	integer4 / integer8	in	Size of vector on each process
gn	integer4 / integer8	in	Total size of vector
istart	integer4 / integer8	in	Start line of matrix on each process
diaA(n*nnd)	double precision	in	Non-zero elements of matrix stored in the Diagonal format
offset(nnd)	integer4 / integer8	in	Offset of each non-zero diagonal elements array in the Diagonal format
nnd	integer4 / integer8	in	Number of diagonal elements arrays in the Diagonal format
ipreflag	integer	in	Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method	integer	in	Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL	integer	in	Overlapping width in the additive Schwartz method
iflagAS	integer	in	Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax	integer	in/out	in: maximum number of iterations, out: number of iterations
rtolmax	double precision	in	Convergence criterion (norm of relative residual error)
reshistory(itrmax)	double precision	out	History of relative residual error
iovlflag	integer	in	Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret	integer	out	Error code (0:normal)

5.7 parcel_ddbicgstab

Interface

call parcel_ddbicgstab( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,nnz,istart, crsA_hi,crsA_lo,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )

Function

Solve a simultaneous linear equation system Ax = b by the stabilized biconjugate gradient method (Bi-CGSTAB method). Non-zero elements are stored in the CRS format.

Argument name (dimension)	Type	Input/Output	Description
icomm	integer	in	MPI communicator
vecx_hi(n)	double precision	in/out	in: Initial vector (upper bits), out: Solution vector (upper bits)
vecx_lo(n)	double precision	in/out	in: initial value of the iteration (lower bit), out: solution vector (lower bit)
vecb_hi(n)	double precision	in	Right hand side vector (upper bit)
vecb_lo(n)	double precision	in	Right hand side vector (lower bit)
n	integer4 / integer8	in	Size of vector on each process
gn	integer4 / integer8	in	Total size of vector
nnz	integer4 / integer8	in	Number of non-zero elements on each process
istart	integer4 / integer8	in	Start line of matrix on each process
crsA_hi(nnz)	double precision	in	Non-zero elements (upper bits) of matrix stored in the CRS format
crsA_lo(nnz)	double precision	in	Non-zero elements (lower bit) of matrix stored in the CRS format
crsRow_ptr(n+1)	integer4 / integer8	in	Pointer table in the CRS format
crsCol(nnz)	integer4 / integer8	in	Column numbers of non-zero elements in the CRS format
ipreflag	integer	in	Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method	integer	in	Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL	integer	in	Overlapping width in the additive Schwartz method
iflagAS	integer	in	Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax	integer	in/out	in: maximum number of iterations, out: number of iterations
rtolmax	double precision	in	Convergence criterion (norm of relative residual error)
reshistory(itrmax)	double precision	out	History of relative residual error
iovlflag	integer	in	Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret	integer	out	Error code (0:normal)
PRECISION_A	integer	in	Precision of matrix (1: double precision, 2: quad precision)
PRECISION_b	integer	in	Precision of right hand side vector (1: double precision, 2: quad precision)
PRECISION_x	integer	in	Precision of solution vector (1: double precision, 2: quad precision)
PRECISION_PRECONDITION	integer	in	Precision of preconditioner (1: double precision, 2: quad precision)

5.8 parcel_ddbicgstab_dia

Interface

call parcel_ddbicgstab_dia( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,istart, diaA_hi,diaA_lo,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )

Function

Solve a simultaneous linear equation system Ax = b by the stabilized biconjugate gradient method (Bi-CGSTAB method). Non-zero elements are stored in the Diagonal format

Argument name (dimension)	Type	Input/Output	Description
icomm	integer	in	MPI communicator
vecx_hi(n)	double precision	in/out	in: Initial vector (upper bits), out: Solution vector (upper bits)
vecx_lo(n)	double precision	in/out	in: initial value of the iteration (lower bit), out: solution vector (lower bit)
vecb_hi(n)	double precision	in	Right hand side vector (upper bit)
vecb_lo(n)	double precision	in	Right hand side vector (lower bit)
n	integer4 / integer8	in	Size of vector on each process
gn	integer4 / integer8	in	Total size of vector
istart	integer4 / integer8	in	Start line of matrix on each process
diaA_hi(n*nnd)	double precision	in	Non-zero elements (upper bits) of matrix stored in the Diagonal format
diaA_lo(n*nnd)	double precision	in	Non-zero elements (lower bit) of matrix stored in the Diagonal format
offset(nnd)	integer4 / integer8	in	Offset of each non-zero diagonal elements array in the Diagonal format
nnd	integer4 / integer8	in	Number of diagonal elements arrays in the Diagonal format
ipreflag	integer	in	Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method	integer	in	Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL	integer	in	Overlapping width in the additive Schwartz method
iflagAS	integer	in	Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax	integer	in/out	in: maximum number of iterations, out: number of iterations
rtolmax	double precision	in	Convergence criterion (norm of relative residual error)
reshistory(itrmax)	double precision	out	History of relative residual error
iovlflag	integer	in	Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret	integer	out	Error code (0:normal)
PRECISION_A	integer	in	Precision of matrix (1: double precision, 2: quad precision)
PRECISION_b	integer	in	Precision of right hand side vector (1: double precision, 2: quad precision)
PRECISION_x	integer	in	Precision of solution vector (1: double precision, 2: quad precision)
PRECISION_PRECONDITION	integer	in	Precision of preconditioner (1: double precision, 2: quad precision)

5.9 parcel_dgmres

Interface

call parcel_dgmres( icomm, vecx,vecb, n,gn,nnz,istart, crsA,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, GMRES_m,GMRES_GSflag )

Function

Solve a simultaneous linear equation system Ax = b by the generalized minimum residual method (GMRES (m) method). Non-zero elements are stored in the CRS format.

Argument name (dimension)	Type	Input/Output	Description
icomm	integer	in	MPI communicator
vecx(n)	double precision	in/out	in: initial vector, out: solution vector
vecb(n)	double precision	in	Right hand side vector
n	integer4 / integer8	in	Size of vector on each process
gn	integer4 / integer8	in	Total size of vector
nnz	integer4 / integer8	in	Number of non-zero elements on each process
istart	integer4 / integer8	in	Start line of matrix on each process
crsA(nnz)	double precision	in	Non-zero elements of matrix stored in the CRS format
crsRow_ptr(n+1)	integer4 / integer8	in	Pointer table in the CRS format
crsCol(nnz)	integer4 / integer8	in	Column numbers of non-zero elements in the CRS format
ipreflag	integer	in	Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method	integer	in	Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL	integer	in	Overlapping width in the additive Schwartz method
iflagAS	integer	in	Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax	integer	in/out	in: maximum number of iterations, out: number of iterations
rtolmax	double precision	in	Convergence criterion (norm of relative residual error)
reshistory(itrmax)	double precision	out	History of relative residual error
iovlflag	integer	in	Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret	integer	out	Error code (0:normal)
GMRES_m	integer	in	Number of iterations until restart
GMRES_GSflag	integer	in	Flag for orthogonalization algorithm (1: MGS, 2: CGS)

5.10 parcel_dgmres_dia

Interface

call parcel_dgmres_dia( icomm, vecx,vecb, n,gn,istart, diaA,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, GMRES_m,GMRES_GSflag )

Function

Solve a simultaneous linear equation system Ax = b by the generalized minimum residual method (GMRES (m) method). Non-zero elements are stored in the Diagonal format

Argument name (dimension)	Type	Input/Output	Description
icomm	integer	in	MPI communicator
vecx(n)	double precision	in/out	in: initial vector, out: solution vector
vecb(n)	double precision	in	Right hand side vector
n	integer4 / integer8	in	Size of vector on each process
gn	integer4 / integer8	in	Total size of vector
istart	integer4 / integer8	in	Start line of matrix on each process
diaA(n*nnd)	double precision	in	Non-zero elements of matrix stored in the Diagonal format
offset(nnd)	integer4 / integer8	in	Offset of each non-zero diagonal elements array in the Diagonal format
nnd	integer4 / integer8	in	Number of diagonal elements arrays in the Diagonal format
ipreflag	integer	in	Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method	integer	in	Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL	integer	in	Overlapping width in the additive Schwartz method
iflagAS	integer	in	Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax	integer	in/out	in: maximum number of iterations, out: number of iterations
rtolmax	double precision	in	Convergence criterion (norm of relative residual error)
reshistory(itrmax)	double precision	out	History of relative residual error
iovlflag	integer	in	Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret	integer	out	Error code (0:normal)
GMRES_m	integer	in	Number of iterations until restart
GMRES_GSflag	integer	in	Flag for orthogonalization algorithm (1: MGS, 2: CGS)

5.11 parcel_ddgmres

Interface

call parcel_ddgmres( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,nnz,istart, crsA_hi,crsA_lo,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, GMRES_m,GMRES_GSflag, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )

Function

Solve a simultaneous linear equation system Ax = b by the generalized minimum residual method (GMRES (m) method). Non-zero elements are stored in the CRS format.

Argument name (dimension)	Type	Input/Output	Description
icomm	integer	in	MPI communicator
vecx_hi(n)	double precision	in/out	in: Initial vector (upper bits), out: Solution vector (upper bits)
vecx_lo(n)	double precision	in/out	in: initial value of the iteration (lower bit), out: solution vector (lower bit)
vecb_hi(n)	double precision	in	Right hand side vector (upper bit)
vecb_lo(n)	double precision	in	Right hand side vector (lower bit)
n	integer4 / integer8	in	Size of vector on each process
gn	integer4 / integer8	in	Total size of vector
nnz	integer4 / integer8	in	Number of non-zero elements on each process
istart	integer4 / integer8	in	Start line of matrix on each process
crsA_hi(nnz)	double precision	in	Non-zero elements (upper bits) of matrix stored in the CRS format
crsA_lo(nnz)	double precision	in	Non-zero elements (lower bit) of matrix stored in the CRS format
crsRow_ptr(n+1)	integer4 / integer8	in	Pointer table in the CRS format
crsCol(nnz)	integer4 / integer8	in	Column numbers of non-zero elements in the CRS format
ipreflag	integer	in	Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method	integer	in	Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL	integer	in	Overlapping width in the additive Schwartz method
iflagAS	integer	in	Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax	integer	in/out	in: maximum number of iterations, out: number of iterations
rtolmax	double precision	in	Convergence criterion (norm of relative residual error)
reshistory(itrmax)	double precision	out	History of relative residual error
iovlflag	integer	in	Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret	integer	out	Error code (0:normal)
GMRES_m	integer	in	Number of iterations until restart
GMRES_GSflag	integer	in	Flag for orthogonalization algorithm (1: MGS, 2: CGS)
PRECISION_A	integer	in	Precision of matrix (1: double precision, 2: quad precision)
PRECISION_b	integer	in	Precision of right hand side vector (1: double precision, 2: quad precision)
PRECISION_x	integer	in	Precision of solution vector (1: double precision, 2: quad precision)
PRECISION_PRECONDITION	integer	in	Precision of preconditioner (1: double precision, 2: quad precision)

5.12 parcel_ddgmres_dia

Interface

call parcel_ddgmres_dia( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,istart, diaA_hi,diaA_lo,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )

Function

Solve a simultaneous linear equation system Ax = b by the generalized minimum residual method (GMRES (m) method). Non-zero elements are stored in the Diagonal format

Argument name (dimension)	Type	Input/Output	Description
icomm	integer	in	MPI communicator
vecx_hi(n)	double precision	in/out	in: Initial vector (upper bits), out: Solution vector (upper bits)
vecx_lo(n)	double precision	in/out	in: initial value of the iteration (lower bit), out: solution vector (lower bit)
vecb_hi(n)	double precision	in	Right hand side vector (upper bit)
vecb_lo(n)	double precision	in	Right hand side vector (lower bit)
n	integer4 / integer8	in	Size of vector on each process
gn	integer4 / integer8	in	Total size of vector
istart	integer4 / integer8	in	Start line of matrix on each process
diaA_hi(n*nnd)	double precision	in	Non-zero elements (upper bits) of matrix stored in the Diagonal format
diaA_lo(n*nnd)	double precision	in	Non-zero elements (lower bit) of matrix stored in the Diagonal format
offset(nnd)	integer4 / integer8	in	Offset of each non-zero diagonal elements array in the Diagonal format
nnd	integer4 / integer8	in	Number of diagonal elements arrays in the Diagonal format
ipreflag	integer	in	Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method	integer	in	Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL	integer	in	Overlapping width in the additive Schwartz method
iflagAS	integer	in	Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax	integer	in/out	in: maximum number of iterations, out: number of iterations
rtolmax	double precision	in	Convergence criterion (norm of relative residual error)
reshistory(itrmax)	double precision	out	History of relative residual error
iovlflag	integer	in	Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret	integer	out	Error code (0:normal)
GMRES_m	integer	in	Number of iterations until restart
GMRES_GSflag	integer	in	Flag for orthogonalization algorithm (1: MGS, 2: CGS)
PRECISION_A	integer	in	Precision of matrix (1: double precision, 2: quad precision)
PRECISION_b	integer	in	Precision of right hand side vector (1: double precision, 2: quad precision)
PRECISION_x	integer	in	Precision of solution vector (1: double precision, 2: quad precision)
PRECISION_PRECONDITION	integer	in	Precision of preconditioner (1: double precision, 2: quad precision)

5.13 parcel_dcbcg

Interface

call parcel_dcbcg( icomm, vecx,vecb, n,gn,nnz,istart, crsA,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, CBCG_kstep,CBCG_Eigenflag,power_method_itrmax, CAARNOLDI_sstep,CAARNOLDI_tstep, CAARNOLDI_basis,CAARNOLDI_QRflag )

Function

The system of equations Ax = b is solved by the Chebyshev basis conjugate gradient method (CBCG method). Non-zero elements are stored in the CRS format.

Argument name (dimension)	Type	Input/Output	Description
icomm	integer	in	MPI communicator
vecx(n)	double precision	in/out	in: initial vector, out: solution vector
vecb(n)	double precision	in	Right hand side vector
n	integer4 / integer8	in	Size of vector on each process
gn	integer4 / integer8	in	Total size of vector
nnz	integer4 / integer8	in	Number of non-zero elements on each process
istart	integer4 / integer8	in	Start line of matrix on each process
crsA(nnz)	double precision	in	Non-zero elements of matrix stored in the CRS format
crsRow_ptr(n+1)	integer4 / integer8	in	Pointer table in the CRS format
crsCol(nnz)	integer4 / integer8	in	Column numbers of non-zero elements in the CRS format
ipreflag	integer	in	Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method	integer	in	Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL	integer	in	Overlapping width in the additive Schwartz method
iflagAS	integer	in	Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax	integer	in/out	in: maximum number of iterations, out: number of iterations
rtolmax	double precision	in	Convergence criterion (norm of relative residual error)
reshistory(itrmax)	double precision	out	History of relative residual error
iovlflag	integer	in	Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret	integer	out	Error code (0:normal)
CBCG_kstep	integer	in	Number of communication avoiding steps
CBCG_Eigenflag	integer	in	Flag for eigenvalue computation (1: power method, 2: CA-ARNOLDI)
power_method_itrmax	integer	in	Maximum number of iterations in the power method
CAARNOLDI_sstep	integer	in	Number of communication avoiding steps s of the CA-ARNOLDI method
CAARNOLDI_tstep	integer	in	Number of outer iterations t in the CA-ARNOLDI method (restart length = st)
CAARNOLDI_basis	integer	in	Flag for basis vector in the CA-ARNOLDI method (0: monomial basis, 1: Newton basis)
CAARNOLDI_QRflag	integer	in	Flag for QR factorization in the CA-ARNOLDI method (1: MGS, 2: CGS, 3: TSQR, 4: Cholesky QR, 5: Cholesky QR2)

5.14 parcel_dcbcg_dia

Interface

call parcel_dcbcg_dia( icomm, vecx,vecb, n,gn,istart, diaA,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, CBCG_kstep,CBCG_Eigenflag,power_method_itrmax, CAARNOLDI_sstep,CAARNOLDI_tstep, CAARNOLDI_basis,CAARNOLDI_QRflag )

Function

The system of equations Ax = b is solved by the Chebyshev basis conjugate gradient method (CBCG method). Non-zero elements are stored in the Diagonal format

Argument name (dimension)	Type	Input/Output	Description
icomm	integer	in	MPI communicator
vecx(n)	double precision	in/out	in: initial vector, out: solution vector
vecb(n)	double precision	in	Right hand side vector
n	integer4 / integer8	in	Size of vector on each process
gn	integer4 / integer8	in	Total size of vector
istart	integer4 / integer8	in	Start line of matrix on each process
diaA(n*nnd)	double precision	in	Non-zero elements of matrix stored in the Diagonal format
offset(nnd)	integer4 / integer8	in	Offset of each non-zero diagonal elements array in the Diagonal format
nnd	integer4 / integer8	in	Number of diagonal elements arrays in the Diagonal format
ipreflag	integer	in	Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method	integer	in	Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL	integer	in	Overlapping width in the additive Schwartz method
iflagAS	integer	in	Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax	integer	in/out	in: maximum number of iterations, out: number of iterations
rtolmax	double precision	in	Convergence criterion (norm of relative residual error)
reshistory(itrmax)	double precision	out	History of relative residual error
iovlflag	integer	in	Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret	integer	out	Error code (0:normal)
CBCG_kstep	integer	in	Number of communication avoiding steps
CBCG_Eigenflag	integer	in	Flag for eigenvalue computation (1: power method, 2: CA-ARNOLDI)
power_method_itrmax	integer	in	Maximum number of iterations in the power method
CAARNOLDI_sstep	integer	in	Number of communication avoiding steps s of the CA-ARNOLDI method
CAARNOLDI_tstep	integer	in	Number of outer iterations t in the CA-ARNOLDI method (restart length = st)
CAARNOLDI_basis	integer	in	Flag for basis vector in the CA-ARNOLDI method (0: monomial basis, 1: Newton basis)
CAARNOLDI_QRflag	integer	in	Flag for QR factorization in the CA-ARNOLDI method (1: MGS, 2: CGS, 3: TSQR, 4: Cholesky QR, 5: Cholesky QR2)

5.15 parcel_dcagmres

Interface

call parcel_dcagmres( icomm, vecx,vecb, n,gn,nnz,istart, crsA,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, CAGMRES_sstep, CAGMRES_tstep, CAGMRES_basis, CAGMRES_QRflag )

Function

Solve a simultaneous linear equation system Ax = b by the communication reduced generalized minimum residual method (CA-GMRES (s, t) method). Separate non-zero element valuesin the CRS format.

Argument name (dimension)	Type	Input/Output	Description
icomm	integer	in	MPI communicator
vecx(n)	double precision	in/out	in: initial vector, out: solution vector
vecb(n)	double precision	in	Right hand side vector
n	integer4 / integer8	in	Size of vector on each process
gn	integer4 / integer8	in	Total size of vector
nnz	integer4 / integer8	in	Number of non-zero elements on each process
istart	integer4 / integer8	in	Start line of matrix on each process
crsA(nnz)	double precision	in	Non-zero elements of matrix stored in the CRS format
crsRow_ptr(n+1)	integer4 / integer8	in	Pointer table in the CRS format
crsCol(nnz)	integer4 / integer8	in	Column numbers of non-zero elements in the CRS format
ipreflag	integer	in	Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method	integer	in	Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL	integer	in	Overlapping width in the additive Schwartz method
iflagAS	integer	in	Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax	integer	in/out	in: maximum number of iterations, out: number of iterations
rtolmax	double precision	in	Convergence criterion (norm of relative residual error)
reshistory(itrmax)	double precision	out	History of relative residual error
iovlflag	integer	in	Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret	integer	out	Error code (0:normal)
CAGMRES_sstep	integer	in	Number of communication avoiding steps s in the CA-GMRES method
CAGMRES_tstep	integer	in	Number of outer iterations t in the CA-GMRES method (restart length = st)
CAGMRES_basis	integer	in	Flag for basis vector in the CA-GMRES method (0: monomial basis, 1: Newton basis)
CAGMRES_QRflag	integer	in	Flag for QR factorization in the CA-GMRES method (1: MGS, 2: CGS, 3: TSQR, 4: CholeskyQR, 5: CholeskyQR2)

5.16 parcel_dcagmres_dia

Interface

call parcel_dcagmres_dia( icomm, vecx,vecb, n,gn,istart, diaA,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, CAGMRES_sstep, CAGMRES_tstep, CAGMRES_basis, CAGMRES_QRflag )

Function

Solve a simultaneous linear equation system Ax = b by the communication reduced generalized minimum residual method (CA-GMRES (s, t) method). Separate non-zero element values in the Diagonal format

Argument name (dimension)	Type	Input/Output	Description
icomm	integer	in	MPI communicator
vecx(n)	double precision	in/out	in: initial vector, out: solution vector
vecb(n)	double precision	in	Right hand side vector
n	integer4 / integer8	in	Size of vector on each process
gn	integer4 / integer8	in	Total size of vector
istart	integer4 / integer8	in	Start line of matrix on each process
diaA(n*nnd)	double precision	in	Non-zero elements of matrix stored in the Diagonal format
offset(nnd)	integer4 / integer8	in	Offset of each non-zero diagonal elements array in the Diagonal format
nnd	integer4 / integer8	in	Number of diagonal elements arrays in the Diagonal format
ipreflag	integer	in	Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method	integer	in	Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL	integer	in	Overlapping width in the additive Schwartz method
iflagAS	integer	in	Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax	integer	in/out	in: maximum number of iterations, out: number of iterations
rtolmax	double precision	in	Convergence criterion (norm of relative residual error)
reshistory(itrmax)	double precision	out	History of relative residual error
iovlflag	integer	in	Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret	integer	out	Error code (0:normal)
CAGMRES_sstep	integer	in	Small communication step of CA-GMRES method
CAGMRES_tstep	integer	in	List step of the CA-GMRES method
CAGMRES_basis	integer	in	Base generation flag of CA-GMRES method (0: single basis, 1: newton basis)
CAGMRES_QRflag	integer	in	QR Decomposition flag of CA-GMRES method (1: MGS, 2: CGS, 3: TSQR, 4: CholeskyQR, 5: CholeskyQR2)

6 PARCEL routines (indirect interface)

　The PARCEL routines with indirect call interfaces are shown below.

6.1 PARCEL_KSP_Default_Setting

Interface

call PARCEL_KSP_Default_Setting(method)

Function

Apply initial settings to Struct in the PARCEL format.

Argument name (dimension)	Type	Input/Output	Description
method	type(KSP)	out	Struct in the PARCEL format

6.2 set_KSP_CRS

Interface

call set_KSP_CRS(icomm,vecx,vecb,n,gn,nnz,istart,crsA,crsRow_ptr,crsCol,itrmax,ipreflag,ILU_method,addL,method)

Function

　Set matrix in the CRS format.

Argument name (dimension)	Type	Input/Output	Description
icomm	integer	in	MPI communicator
vecx(n)	double precision	in	in: initial vector, out: solution vector
vecb(n)	double precision	in	Right hand side vector
n	integer4 / integer8	in	Size of vector on each process
gn	integer4 / integer8	in	Total size of vector
nnz	integer4 / integer8	in	Number of non-zero elements on each process
istart	integer4 / integer8	in	Start line of matrix on each process
crsA(nnz)	double precision	in	Non-zero elements of matrix stored in the CRS format
crsRow_ptr(n+1)	integer4 / integer8	in	Pointer table in the CRS format
crsCol(nnz)	integer4 / integer8	in	Column numbers of non-zero elements in the CRS format
itrmax	integer	in	in: maximum number of iterations, out: number of iterations
ipreflag	integer	in	Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method	integer	in	Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL	integer	in	Overlapping width in the additive Schwartz method
method	type(KSP)	in/out	Struct in the PARCEL format

6.3 set_KSP_DIA

Interface

call set_KSP_DIA(icomm,vecx,vecb,n,gn,istart,diaA,offset,nnd,itrmax,ipreflag,ILU_method,addL,method)

Function

Set matrix in the Diagonal format.

Argument name (dimension)	Type	Input/Output	Description
icomm	integer	in	MPI communicator
vecx(n)	double precision	in	in: initial vector, out: solution vector
vecb(n)	double precision	in	Right hand side vector
n	integer4 / integer8	in	Size of vector on each process
gn	integer4 / integer8	in	Total size of vector
istart	integer4 / integer8	in	Start line of matrix on each process
diaA(n*nnd)	double precision	in	Non-zero elements of matrix stored in the Diagonal format
offset(nnd)	integer4 / integer8	in	Offset of each non-zero diagonal elements array in the Diagonal format
nnd	integer4 / integer8	in	Number of diagonal elements arrays in the Diagonal format
itrmax	integer	in	in: maximum number of iterations, out: number of iterations
ipreflag	integer	in	Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method	integer	in	Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL	integer	in	Overlapping width in the additive Schwartz method
method	type(KSP)	in/out	Struct in the PARCEL format

6.4 set_vecx_KSP

Interface

　call set_vecx_KSP(n,x,method)

Function

Set initial vector.

Argument name (dimension)	Type	Input/Output	Description
n	integer4 / integer8	in	Size of vector on each process
x(n)	double precision	in	Initial vector
method	type(KSP)	in/out	Struct in the PARCEL format

6.5 set_vecb_KSP

Interface

　call set_vecb_KSP(n,b,method)

Function

Set right hand side vector b.

Argument name (dimension)	Type	Input/Output	Description
n	integer4 / integer8	in	Size of vector on each process
b(n)	double precision	in	Right hand side vector
method	type(KSP)	in/out	Struct in the PARCEL format

6.6 get_vecx_KSP

Interface

　call get_vecx_KSP(n,x,method)

Function

Get solution vector.

Argument name (dimension)	Type	Input/Output	Description
n	integer4 / integer8	in	Size of vector on each process
x(n)	double precision	out	Solution vector
method	type(KSP)	in	Struct in the PARCEL format

6.7 get_reshistory_KSP

Interface

call get_reshistory_KSP(niter,reshistory, method)

Function

Get convergence history.

Argument name (dimension)	Type	Input/Output	Description
niter	integer	out	Number of iterations until convergence
reshistory(method%itrmax)	double precision	out	Convergence history
method	type(KSP)	in	Struct in the PARCEL format

6.8 reset_CRS_Mat

Interface

call reset_CRS_Mat(itrmax,ipreflag,ILU_method,addL,crsA,method)

Function

　Change the matrix elements, preconditioner, and the maximum number of iterations of solvers in the CRS format without changing the matrix structure or the positions of non-zero elements.

Argument name (dimension)	Type	Input/Output	Description
itrmax	integer	in	Maximum number of iterations (-1: keep the existing setting)
ipreflag	integer	in	Flag for preconditioner (-1: keep the existing setting, 0: None, 1: Point Jacobi, 2: Block Jacobi, 3: Additive Schwartz)
ILU_method	integer	in	Flag for incomplete LU factorization (-1: keep the existing setting, 1: match the diagonal components, 2: match the element sum in the row)
addL	integer	in	Overlapping width in the additive Schwartz method (-1: keep the existing setting)
crsA(method%nnz)	double precision	in	Non-zero elements of matrix stored in the CRS format
method	type(KSP)	in/out	Struct in the PARCEL format

6.9 reset_DIA_Mat

Interface

call reset_DIA_Mat(itrmax,ipreflag,ILU_method,addL,diaMat,method)

Function

　Change the matrix elements, preconditioner, and the maximum number of iterations of solvers in the Diagonal format without changing the matrix structure or the positions of non-zero elements.

Argument name (dimension)	Type	Input/Output	Description
itrmax	integer	in	Maximum number of iterations (-1: keep the existing setting)
ipreflag	integer	in	Flag for preconditioner (-1: keep the existing setting, 0: None, 1: Point Jacobi, 2: Block Jacobi, 3: Additive Schwartz)
ILU_method	integer	in	Flag for incomplete LU factorization (-1: keep the existing setting, 1: match the diagonal components, 2: match the element sum in the row)
addL	integer	in	Overlapping width in the additive Schwartz method (-1: keep the existing setting)
diaMat(method%n*method%nnd)	double precision	in	Non-zero elements of matrix stored in the CRS format
method	type(KSP)	in/out	Struct in the PARCEL format

6.10 set_KSP_solver

Interface

call set_KSP_solver(method,solver)

Function

Set Krylov subspace method.

Argument name (dimension)	Type	Input/Output	Description
method	type(KSP)	out	Struct in the PARCEL format
solver	integer	in	Flag for Krylov subspace method (1: CG, 2: BiCGGSTAB, 3: GMRES, 4: CA-GMRES, 5: CBCG)

6.11 set_KSP_abstolmax(method,abstolmax)

Interface

call set_KSP_abstolmax(method,abstolmax)

Function

Set maximun tolerance in absolute value of residual error.

Argument name (dimension)	Type	Input/Output	Description
method	type(KSP)	out	Struct in the PARCEL format
abstolmax	double precision	in	Convergence criterion (norm of residual error)

6.12 set_KSP_rtolmax

Interface

call set_KSP_rtolmax(method,rtolmax)

Function

Set maximum tolerance in relative error.

Argument name (dimension)	Type	Input/Output	Description
method	type(KSP)	out	Struct in the PARCEL format
rtolmax	double precision	in	Convergence criterion (norm of relative residual error)

6.13 set_KSP_iovlflag

Interface

call set_KSP_iovlflag(method,iovlflag)

Function

Set communication-computation overlap method.

Argument name (dimension)	Type	Input/Output	Description
method	type(KSP)	out	Struct in the PARCEL format
iovlflag	integer	in	Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)

6.14 set_KSP_GMRES_m(method,GMRES_m)

Interface

call set_KSP_GMRES_m(method,GMRES_m)

Function

Set the number of iterations m until restart in the GMRES(m) method.

Argument name (dimension)	Type	Input/Output	Description
method	type(KSP)	out	Struct in the PARCEL format
GMRES_m	integer	in	Number of iterations until restart

6.15 set_KSP_GMRES_GSflag

Interface

call set_KSP_GMRES_GSflag(method,GMRES_GSflag)

Function

Set orthogonalization algorithm in the GMRES(m) method.

Argument name (dimension)	Type	Input/Output	Description
method	type(KSP)	out	Struct in the PARCEL format
GMRES_GSflag	integer	in	Flag for orthogonalization algorithm (1: MGS, 2: CGS)

6.16 set_KSP_CAGMRES_sstep

Interface

call set_KSP_CAGMRES_sstep(method,CAGMRES_sstep)

Function

Set number of communication avoiding steps s of CA-GMRES.

Argument name (dimension)	Type	Input/Output	Description
method	type(KSP)	out	Struct in the PARCEL format
CAGMRES_sstep	integer	in	Number of communication steps of the CA-GMRES method

6.17 set_KSP_CAGMRES_tstep

Interface

call set_KSP_CAGMRES_tstep(method,CAGMRES_tstep)

Function

Set number of outer iterations t of CA-GMRES.)

Argument name (dimension)	Type	Input/Output	Description
method	type(KSP)	out	Struct in the PARCEL format
CAGMRES_tstep	integer	in	Number of outer iterations t in the CA-GMRES method method (restart length = st)

6.18 set_KSP_CAGMRES_basis

Interface

call set_KSP_CAGMRES_basis(method,CAGMRES_basis)

Function

Set basis vector in the CA-GMRES method.

Argument name (dimension)	Type	Input/Output	Description
method	type(KSP)	out	Struct in the PARCEL format
CAGMRES_basis	integer	in	Flag for basis vector in the CA-GMRES method (0: monomial basis, 1: Newton basis)

6.19 set_KSP_CAGMRES_QRflag

Interface

call set_KSP_CAGMRES_QRflag(method,CAGMRES_QRflag)

Function

Set QR factorization algorithm in the CA-GMRES method.

Argument name (dimension)	Type	Input/Output	Description
method	type(KSP)	out	Struct in the PARCEL format
CAGMRES_QRflag	integer	in	Flag for QR factorization in the CA-GMRES method (1: MGS, 2: CGS, 3: TSQR, 4: CholeskyQR, 5: CholeskyQR2)

6.20 set_KSP_CBCG_kstep

Interface

call set_KSP_CBCG_kstep(method,CBCG_kstep)

Function

Set number of communication avoiding steps in the CBCG method.

Argument name (dimension)	Type	Input/Output	Description
method	type(KSP)	out	Struct in the PARCEL format
CBCG_kstep	integer	in	Number of communication avoiding steps

6.21 set_KSP_CBCG_Eigenflag

Interface

call set_KSP_CBCG_Eigenflag(method,CBCG_Eigenflag)

Function

Set eigenvalue computation algorithm in the CBCG method.

Argument name (dimension)	Type	Input/Output	Description
method	type(KSP)	out	Struct in the PARCEL format
CBCG_Eigenflag	integer	in	Flag for eigenvalue computation (1: power method, 2: CA-ARNOLDI)

6.22 set_KSP_CAARNOLDI_sstep

Interface

call set_KSP_CAARNOLDI_sstep(method,CAARNOLDI_sstep)

Function

Set number of communication avoiding steps s in the CA-ARNOLDI method.

Argument name (dimension)	Type	Input/Output	Description
method	type(KSP)	out	Struct in the PARCEL format
CAARNOLDI_sstep	integer	in	Number of communication avoiding steps s of the CA-ARNOLDI method

6.23 set_KSP_CAARNOLDI_tstep

Interface

call set_KSP_CAARNOLDI_tstep(method,CAARNOLDI_tstep)

Function

Set number of outer iterations t in the CA-ARNOLDI method.

Argument name (dimension)	Type	Input/Output	Description
method	type(KSP)	out	Struct in the PARCEL format
CAARNOLDI_tstep	integer	in	Number of outer iterations t in the CA-ARNOLDI method (restart length = st)

6.24 set_KSP_CAARNOLDI_basis

Interface

call set_KSP_CAARNOLDI_basis(method,CAARNOLDI_basis)

Function

Set basis vector in the CA-ARNOLDI method.

Argument name (dimension)	Type	Input/Output	Description
method	type(KSP)	out	Struct in the PARCEL format
CAARNOLDI_basis	integer	in	Flag for basis vector in the CA-ARNOLDI method (0: monomial basis, 1: Newton basis)

6.25 set_KSP_CAARNOLDI_QRflag

Interface

call set_KSP_CAARNOLDI_QRflag(method,CAARNOLDI_QRflag)

Function

Set QR factorization algorithm in the CA-ARNOLDI method.

Argument name (dimension)	Type	Input/Output	Description
method	type(KSP)	out	Struct in the PARCEL format
CAARNOLDI_QRflag	integer	in	Flag for QR factorization in the CA-ARNOLDI method (1: MGS, 2: CGS, 3: TSQR, 4: Cholesky QR, 5: Cholesky QR2)

6.26 set_KSP_power_method_itrmax

Interface

call set_KSP_power_method_itrmax(method,power_method_itrmax)

Function

Set maximum number of iterations in the power method.

Argument name (dimension)	Type	Input/Output	Description
method	type(KSP)	out	Struct in the PARCEL format
power_method_itrmax	integer	in	Maximum number of iterations in the power method

6.27 START_KSP_SOLVER

Interface

call START_KSP_SOLVER(method)

Function

Solve a simultaneous linear equation system Ax = b.

Argument name (dimension)	Type	Input/Output	Description
method	type(KSP)	in/out	Struct in the PARCEL format

7 How to use

　The use of the PARCEL routines is explained by FORTRAN sample programs of a CG solver. Details of the usage of the PARCEL in C programs can be found in C sample programs stored in SPARSE/example_C in the archive. The PARCEL can be called also from C programs in the same manner, provided that a MPI communicator is prepared for FORTRAN routines. In the C sample programs, a MPI communicator created in C is converted to FORTRAN via MPI_Comm_c2f in the MPI library. In the C sample program for the indirect interface, a struct in the PARCEL format is generated in FORTRAN, and then, it is used in C programs.

7.1 Compressed Row Storage (CRS) Format

　Sample codes in the CRS format are shown in Fig.8 and Fig.9. Here, make_matrix_CRS denotes an arbitrary routine, which generates a matrix in the CRS format in PARCEL. Fig.8 and Fig.9 show examples with the direct and indirect interfaces, respectively. In Fig.9, a compiling option to include SPARSE/src is needed. When only values of matrix elements in $A$ , elements in $b$ , and the initial values in $x$ are changed with the same matrix structure or the non-zero element positions, in Fig.8, parcel_dcg requires overhead for the initialization such as generating list of processes which exchange data with each other.

Sample code 1 in the CRS format

  program main
    use mpi
    implicit none
 
    integer n,gn,nnz,istart
    real*8,allocatable  :: crsA(:)
    integer,allocatable :: crsRow_ptr(:),crsCol(:)
    real*8,allocatable  :: vecx(:)
    real*8,allocatable  :: vecb(:)
    integer itrmax
    real*8  rtolmax
    real*8, allocatable :: reshistory(:)
    integer ipreflag
    integer ILU_method
    integer iflagAS
    integer addL
    integer iovlflag
    integer iret
    integer ierr
 
    call MPI_Init(ierr)
 
    call make_matrix_CRS(n,gn,nnz,istart,crsA,crsRow_ptr,crsCol)
 
    allocate(vecx(n))
    allocate(vecb(n))
    allocate(reshistory(itrmax))
    ipreflag=0
    ILU_method=1
    addL=0
    iflagAS=1
    itrmax=100
    rtolmax=1.0d-8
    iovlflag=0
 
    vecb=1.0d0
    vecx=1.0d0
    call parcel_dcg( &
         MPI_COMM_WORLD, &
         vecx,vecb, &
         n,gn,nnz,istart, &
         crsA,crsRow_ptr,crsCol, &
         ipreflag,ILU_method,addL,iflagAS, &
         itrmax,rtolmax, &
         reshistory, &
         iovlflag,iret &
         )
    call MPI_Finalize(ierr)
 
    deallocate(vecx)
    deallocate(vecb)
 
    deallocate(crsA)
    deallocate(crsRow_ptr)
    deallocate(crsCol)
 
    deallocate(reshistory)
 
  end program main

Fig.8 Sample code in the CRS format (direct interface)

Sample code 2 in the CRS format

  program main
    use mpi
    use krylov
    use krylov_kernel
    implicit none
 
    integer n,gn,nnz,istart
    real*8,allocatable  :: crsA(:)
    integer,allocatable :: crsRow_ptr(:),crsCol(:)
    real*8,allocatable  :: vecx(:)
    real*8,allocatable  :: vecb(:)
    integer itrmax
    real*8  rtolmax
    real*8, allocatable :: reshistory(:)
    integer ipreflag
    integer ILU_method
    integer iflagAS
    integer addL
    integer iovlflag
    integer iret
    integer ierr
    type(KSP) :: method
 
    call MPI_Init(ierr)
 
    call make_matrix_CRS(n,gn,nnz,istart,crsA,crsRow_ptr,crsCol)
 
    allocate(vecx(n))
    allocate(vecb(n))
    allocate(reshistory(itrmax))
 
    ipreflag=0
    ILU_method=1
    addL=0
    iflagAS=1
    itrmax=100
    rtolmax=1.0d-8
    iovlflag=0
    vecb=1.0d0
    vecx=1.0d0
 
    call PARCEL_KSP_Default_Setting(method)
    call set_KSP_CRS(&
         MPI_COMM_WORLD, &
         vecx,vecb, &
         n,gn,nnz,istart, &
         crsA,crsRow_ptr,crsCol, &
         itrmax,ipreflag,ILU_method,addL,&
         method)
    call set_KSP_solver(method,1)
    call START_KSP_SOLVER(method)
    call get_vecx_KSP(n,vecx,method)
    call free_KSP(method)
 
    call MPI_Finalize(ierr)
 
    deallocate(vecx)
    deallocate(vecb)
 
    deallocate(reshistory)
 
  end program main

Fig.9 Sample code in the CRS format (indirect interface)

7.2 Diagonal Format

　Sample codes in the Diagonal format are shown in Fig.10 and Fig.11. Here, make_matrix_DIA denotes an arbitrary routine, which generates a matrix in the Diagonal format in PARCEL. Fig.10 and Fig.11 show examples with the direct and indirect interfaces, respectively. In Fig.11, a compiling option to include SPARSE/src is needed. When only values of matrix elements in $A$ , elements in $b$ , and the initial values in $x$ are changed with the same matrix structure or the non-zero element positions, in Fig.10, parcel_dcg requires overhead for the initialization such as generating list of processes which exchange data with each other.

Sample code 1 in the Diagonal format

  program main
    use mpi
    implicit none
 
    integer n,gn,nnd,istart
    real*8,allocatable  :: diaA(:)
    integer,allocatable :: offset(:)
    real*8,allocatable  :: vecx(:)
    real*8,allocatable  :: vecb(:)
    integer itrmax
    real*8  rtolmax
    real*8, allocatable :: reshistory(:)
    integer ipreflag
    integer ILU_method
    integer iflagAS
    integer addL
    integer iovlflag
    integer iret
    integer ierr
 
    call MPI_Init(ierr)
 
    call make_matrix_DIA(n,gn,nnd,istart,diaA,offset)
 
    allocate(vecx(n))
    allocate(vecb(n))
    allocate(reshistory(itrmax))
 
    ipreflag=0
    ILU_method=1
    addL=0
    iflagAS=1
    itrmax=100
    rtolmax=1.0d-8
    iovlflag=0
    vecb=1.0d0
    vecx=1.0d0
 
    call parcel_dcg_dia(&
         MPI_COMM_WORLD, &
         vecx,vecb,&
         n,gn,istart,&
         diaA,offset,nnd,&
         ipreflag,ILU_method,addL,iflagAS,&
         itrmax,rtolmax,&
         reshistory,&
         iovlflag,iret&
         )
 
    call MPI_Finalize(ierr)
 
    deallocate(vecx)
    deallocate(vecb)
 
    deallocate(diaA)
    deallocate(offset)
 
    deallocate(reshistory)
 
  end program main

Fig.10 Sample code in the Diagonal format (direct interface)

Sample code 2 in the Diagonal format

  program main
    use mpi
    use krylov
    use krylov_kernel
    implicit none
 
    integer n,gn,nnd,istart
    real*8,allocatable  :: diaA(:)
    integer,allocatable :: offset(:)
    real*8,allocatable  :: vecx(:)
    real*8,allocatable  :: vecb(:)
    integer itrmax
    real*8  rtolmax
    real*8, allocatable :: reshistory(:)
    integer ipreflag
    integer ILU_method
    integer iflagAS
    integer addL
    integer iovlflag
    integer iret
    integer ierr
 
    call MPI_Init(ierr)
 
    call make_matrix_DIA(n,gn,nnd,istart,diaA,offset)
 
    allocate(vecx(n))
    allocate(vecb(n))
    allocate(reshistory(itrmax))
 
    ipreflag=0
    ILU_method=1
    addL=0
    iflagAS=1
    itrmax=100
    rtolmax=1.0d-8
    iovlflag=0
    vecb=1.0d0
    vecx=1.0d0
 
    call PARCEL_KSP_Default_Setting(method)
    call set_KSP_DIA(&
         MPI_COMM_WORLD, &
         vecx,vecb, &
         n,gn,istart, &
         diaA,offset,nnd,&
         itrmax,ipreflag,ILU_method,addL,&
         method)
    call set_KSP_solver(method,1)
    call START_KSP_SOLVER(method)
    call get_vecx_KSP(n,vecx,method)
    call free_KSP(method)
 
    call MPI_Finalize(ierr)
 
    deallocate(vecx)
    deallocate(vecb)
 
    deallocate(reshistory)
 
  end program main

Fig.11 Sample code in the Diagonal format (indirect interface)