# 1 Krylov subspace methods

This chapter describes the iterative methods contained in this library. The following notation is used, $Ax = b$ represents the linear system which will be solved, with $A$ being an n$\times$n sparse matrix and $x_0$ representing the initial vector. $K$ represents an appropriate preconditioner.

## 1.1 The Preconditioned Conjugate Gradient Method

The Conjugate Gradient (CG) method is an iterative Krylov subspace algorithm for solving symmetric matrices.

• Algorithm description
1. $\mathbf{{Compute\ }}\mathbf{r}^{\mathbf{0}}\mathbf{= b - A}\mathbf{x}^{\mathbf{0}}\mathbf{{\ for\ some\ initial\ guess\ }}\mathbf{x}^{\mathbf{- 1}}\mathbf{=}\mathbf{x}^{\mathbf{0}}\mathbf{\ }$
2. $\mathbf{p}^{\mathbf{- 1}}\mathbf{= 0}$
3. $\mathbf{\alpha}_{\mathbf{- 1}}\mathbf{= 0\ }\mathbf{\ }$
4. $\mathbf{\beta}_{\mathbf{- 1}}\mathbf{= 0}$
5. $\mathbf{{solve\ s\ from\ }}\mathbf{K}\mathbf{s = \ }\mathbf{r}^{\mathbf{0}}$
6. $\mathbf{\rho}_{\mathbf{0}}\mathbf{=}\left\langle \mathbf{s,}\mathbf{r}^{\mathbf{0}} \right\rangle$
7. $\mathbf{for\ i = 0,1,2,3}\mathbf{\ldots}$
8. 　　$\mathbf{p}^{\mathbf{i}}\mathbf{= s +}\mathbf{\beta}_{\mathbf{i - 1}}\mathbf{p}^{\mathbf{i - 1}}$
9. 　　$\mathbf{q}^{\mathbf{i}}\mathbf{= A}\mathbf{p}^{\mathbf{i}}$
10. 　　$\mathbf{\gamma =}\left\langle \mathbf{p}^{\mathbf{i}}\mathbf{,}\mathbf{q}^{\mathbf{i}} \right\rangle$
11. 　　$\mathbf{x}^{\mathbf{i}}\mathbf{=}\mathbf{x}^{\mathbf{i - 1}}\mathbf{+}\mathbf{\alpha}_{\mathbf{i - 1}}\mathbf{p}^{\mathbf{i - 1}}$
12. 　　$\mathbf{\alpha}_{\mathbf{i}}\mathbf{=}\mathbf{\rho}_{\mathbf{i}}\mathbf{\ /\ \gamma}$
13. 　　$\mathbf{r}^{\mathbf{i + 1}}\mathbf{=}\mathbf{r}^{\mathbf{i}}\mathbf{-}\mathbf{\alpha}_{\mathbf{i}}\mathbf{q}^{\mathbf{i}}$
14. 　　$\mathbf{{solve\ s\ from\ }}\mathbf{K}\mathbf{s = \ }\mathbf{r}^{\mathbf{i + 1}}$
15. 　　$\mathbf{\rho}_{\mathbf{i + 1}}\mathbf{=}\left\langle \mathbf{s,}\mathbf{r}^{\mathbf{i + 1}} \right\rangle$
16. 　　$\mathbf{{if\ }}\left\| \mathbf{r}^{\mathbf{i + 1}} \right\|\mathbf{\ /\ }\left\| \mathbf{b} \right\|\mathbf{{\ small\ enough\ the}}\mathbf{n}$
17. 　　　　$\mathbf{x}^{\mathbf{i}}\mathbf{=}\mathbf{x}^{\mathbf{i - 1}}\mathbf{+}\mathbf{\alpha}_{\mathbf{i - 1}}\mathbf{p}^{\mathbf{i - 1}}$
18. 　　　　$\mathbf{{qui}}\mathbf{t}$
19. 　　$\mathbf{{endi}}\mathbf{f}$
20. 　　$\mathbf{\beta}_{\mathbf{i}}\mathbf{=}\mathbf{\rho}_{\mathbf{i + 1}}\mathbf{\ /\ }\mathbf{\rho}_{\mathbf{i}}$
21. $\mathbf{{\ en}}\mathbf{d}$

## 1.2 The Preconditioned Biconjugate Gradient Stabilized Method

The Biconjugate Gradient Stabilized (Bi-CGSTAB) method is an iterative Krylov subspace algorithm for solving nonsymmetric matrices.

• Algorithm description
1. $\mathbf{{Compute\ }}\mathbf{r}_{\mathbf{0}}\mathbf{= b - A}\mathbf{x}_{\mathbf{0}}\mathbf{{\ for\ some\ initial\ guess\ }}\mathbf{x}_{\mathbf{- 1}}\mathbf{=}\mathbf{x}_{\mathbf{0}}$
2. $\mathbf{p}_{\mathbf{0}}\mathbf{=}\mathbf{r}_{\mathbf{0}}$
3. $\mathbf{c}_{\mathbf{1}}\mathbf{=}\left\langle \mathbf{r}_{\mathbf{0}}\mathbf{,}\mathbf{r}_{\mathbf{0}} \right\rangle$
4. $\mathbf{for\ i = 0,1,2,3}\mathbf{\ldots}$
5. 　　$\mathbf{{solve\ }}\widehat{\mathbf{p}}\mathbf{{\ from\ K}}\widehat{\mathbf{p}}\mathbf{= \ }\mathbf{p}_{\mathbf{i}}$
6. 　　$\mathbf{q = A}\widehat{\mathbf{p}}$
7. 　　$\mathbf{c}_{\mathbf{2}}\mathbf{=}\left\langle \mathbf{r}_{\mathbf{0}}\mathbf{,q} \right\rangle$
8. 　　$\mathbf{\alpha =}\mathbf{c}_{\mathbf{1}}\mathbf{\ /\ }\mathbf{c}_{\mathbf{2}}$
9. 　　$\mathbf{e =}\mathbf{r}_{\mathbf{i}}\mathbf{- \alpha q}$
10. 　　$\mathbf{{solve\ }}\widehat{\mathbf{e}}\mathbf{{\ from\ K}}\widehat{\mathbf{e}}\mathbf{= e}$
11. 　　$\mathbf{v = A}\widehat{\mathbf{e}}$
12. 　　$\mathbf{c}_{\mathbf{3}}\mathbf{=}\left\langle \mathbf{e,v} \right\rangle\mathbf{\ /\ }\left\langle \mathbf{v,v} \right\rangle$
13. 　　$\mathbf{x}_{\mathbf{i + 1}}\mathbf{=}\mathbf{x}_{\mathbf{i}}\mathbf{+ \alpha}\widehat{\mathbf{p}}\mathbf{+}\mathbf{c}_{\mathbf{3}}\widehat{\mathbf{e}}$
14. 　　$\mathbf{r}_{\mathbf{i + 1}}\mathbf{= e -}\mathbf{c}_{\mathbf{3}}\mathbf{v}$
15. 　　$\mathbf{c}_{\mathbf{1}}\mathbf{=}\left\langle \mathbf{r}_{\mathbf{0}}\mathbf{,}\mathbf{r}_{\mathbf{i + 1}} \right\rangle$
16. 　　$\mathbf{{if\ }}\left\| \mathbf{r}_{\mathbf{i + 1}} \right\|\mathbf{\ /\ }\left\| \mathbf{b} \right\|\mathbf{{\ small\ enough\ }}\mathbf{{the}}\mathbf{n}$
17. 　　　　$\mathbf{{quit}}$
18. 　　$\mathbf{{endif}}$
19. 　　$\mathbf{\beta =}\mathbf{c}_{\mathbf{1}}\mathbf{\ /\ }\left( \mathbf{c}_{\mathbf{2}}\mathbf{c}_{\mathbf{3}} \right)$
20. 　　$\mathbf{p}_{\mathbf{i + 1}}\mathbf{=}\mathbf{r}_{\mathbf{i + 1}}\mathbf{+ \beta}\left( \mathbf{p}_{\mathbf{i}}\mathbf{-}\mathbf{c}_{\mathbf{3}}\mathbf{q} \right)$
21. $\mathbf{{end}}$

## 1.3 The Preconditioned Generalized Minimum Residual Method

The Generalized Minimum Residual (GMRES(m)) method is an iterative Krylov subspace algorithm for solving nonsymmetric matrices. GMRES(m) is restarted periodically after $m$ iterations if it did not converge within $m$ iterations, the solution obtained until $m$ iterations will be used as an input for the new restart cycle. PARCEL provides variants of GMRES with the Classical Gram-Schmidt (sometimes referred to as standard Gram-Schmidt) and Modified Gram-Schmidt (MGS) orthogonalization methods.

• Algorithm description

$\mathbf{H}_{\mathbf{n}}$ is an upper triangular matrix with $\mathbf{h}_{\mathbf{j,k}}$ representing its elements, $\mathbf{\ }\mathbf{e}_{\mathbf{i}}$ represents a vector consisting of the first i elements of the vector $\mathbf{e}$.

1. $\mathbf{for\ j = 0,1,2,3}\mathbf{\ldots}$
2. 　　$\mathbf{r = b - A}\mathbf{x}_{\mathbf{0}}$
3. 　　$\mathbf{v}_{\mathbf{1}}\mathbf{= - r\ /\ }\left\| \mathbf{r} \right\|$
4. 　　$\mathbf{e =}\left( \mathbf{-}\left\| \mathbf{r} \right\|\mathbf{,0,\ldots,0} \right)^{\mathbf{T}}$
5. 　　$\mathbf{n = m}$
6. 　　$\mathbf{for\ i = 1,2,3}\mathbf{\ldots}\mathbf{m}$
7. 　 　　　$\mathbf{{solve\ }}{\widehat{\mathbf{v}}}_{\mathbf{i}}\mathbf{{\ from\ K}}{\widehat{\mathbf{v}}}_{\mathbf{i}}\mathbf{=}\mathbf{v}_{\mathbf{i}}$
8. 　　 　　$\mathbf{\omega = A}{\widehat{\mathbf{v}}}_{\mathbf{i}}$
9. 　　　 　$\mathbf{for\ k = 1,2,3}\mathbf{\ldots}\mathbf{i}$
10. 　　　　　　$\mathbf{h}_{\mathbf{k,i}}\mathbf{=}\left\langle \mathbf{\omega,}\mathbf{v}_{\mathbf{k}} \right\rangle$
11. 　　　　　　$\mathbf{\omega = \omega -}\mathbf{h}_{\mathbf{k,i}}\mathbf{v}_{\mathbf{k}}$
12. 　　　　$\mathbf{{end}}$
13. 　　　　$\mathbf{h}_{\mathbf{i + 1,i}}\mathbf{=}\left\| \mathbf{\omega} \right\|$
14. 　　　　$\mathbf{v}_{\mathbf{i + 1}}\mathbf{= \omega/}\left\| \mathbf{\omega} \right\|$
15. 　　　　$\mathbf{for\ k = 1,2,3}\mathbf{\ldots}\mathbf{i - 1}$
16. 　　　　　　$\begin{pmatrix} \mathbf{h}_{\mathbf{k,i}} \\ \mathbf{h}_{\mathbf{k + 1,i}} \\ \end{pmatrix}\mathbf{=}\begin{pmatrix} \mathbf{c}_{\mathbf{i}} & \mathbf{- s}_{\mathbf{i}} \\ \mathbf{s}_{\mathbf{i}} & \mathbf{c}_{\mathbf{i}} \\ \end{pmatrix}\begin{pmatrix} \mathbf{h}_{\mathbf{k,i}} \\ \mathbf{h}_{\mathbf{k + 1,i}} \\ \end{pmatrix}$
17. 　　　　$\mathbf{{end}}$
18. 　　　　$\mathbf{c}_{\mathbf{i}}\mathbf{=}\frac{\mathbf{1}}{\sqrt{\mathbf{1 +}\left( \frac{\mathbf{h}_{\mathbf{i + 1,i}}}{\mathbf{h}_{\mathbf{i,i}}} \right)^{\mathbf{2}}}}$
19. 　　　　$\mathbf{s}_{\mathbf{i}}\mathbf{= -}\frac{\mathbf{h}_{\mathbf{i + 1,i}}}{\mathbf{h}_{\mathbf{i,i}}}\frac{\mathbf{1}}{\sqrt{\mathbf{1 +}\left( \frac{\mathbf{h}_{\mathbf{i + 1,i}}}{\mathbf{h}_{\mathbf{i,i}}} \right)^{\mathbf{2}}}}$
20. 　　　　$\begin{pmatrix} \mathbf{e}_{\mathbf{i}} \\ \mathbf{e}_{\mathbf{i + 1}} \\ \end{pmatrix}\mathbf{=}\begin{pmatrix} \mathbf{c}_{\mathbf{i}} & \mathbf{- s}_{\mathbf{i}} \\ \mathbf{s}_{\mathbf{i}} & \mathbf{c}_{\mathbf{i}} \\ \end{pmatrix}\begin{pmatrix} \mathbf{e}_{\mathbf{i}} \\ \mathbf{e}_{\mathbf{i + 1}} \\ \end{pmatrix}$
21. 　　　　$\mathbf{h}_{\mathbf{i,i}}\mathbf{=}\mathbf{c}_{\mathbf{i}}\mathbf{h}_{\mathbf{i,i}}\mathbf{- \ }\mathbf{s}_{\mathbf{i}}\mathbf{h}_{\mathbf{i + 1,i}}$
22. 　　　　$\mathbf{h}_{\mathbf{i + 1,i}}\mathbf{= 0}$
23. 　　　　$\mathbf{{if\ }}\left\| \mathbf{e}_{\mathbf{i + 1}} \right\|\mathbf{\ /\ }\left\| \mathbf{b} \right\|\mathbf{{\ small\ enough\ }}\mathbf{{then}}$
24. 　　　　　　$\mathbf{n = i}$
25. 　　　　　　$\mathbf{{exit}}$
26. 　　　　$\mathbf{{endif}}$
27. 　　$\mathbf{{end}}$
28. 　　$\mathbf{y}_{\mathbf{n}}\mathbf{=}\mathbf{H}_{\mathbf{n}}^{\mathbf{- 1}}\mathbf{e}_{\mathbf{n}}$
29. 　　$\mathbf{{solve\ }}\widehat{\mathbf{x}}\mathbf{{\ from\ K}}\widehat{\mathbf{x}}\mathbf{=}\sum_{\mathbf{k = 1}}^{\mathbf{n}}{\mathbf{y}_{\mathbf{k}}\mathbf{v}_{\mathbf{k}}}$
30. 　　$\mathbf{x}_{\mathbf{n}}\mathbf{=}\mathbf{x}_{\mathbf{0}}\mathbf{+}\widehat{\mathbf{x}}$
31. 　　$\mathbf{x}_{\mathbf{0}}\mathbf{=}\mathbf{x}_{\mathbf{n}}$
32. $\mathbf{{end}}$

## 1.4 The Preconditioned Communication Avoiding Generalized Minimum Residual Method

The Communication Avoiding Generalized Minimum Residual (CA-GMRES(s,t)) method is an iterative Krylov subspace algorithm for solving nonsymmetric matrices. The calculation which is done in $s$ iterations in the GMRES(m) method is equivalent to one iteration of the CA-GMRES method. CA-GMRES(s,t) is restarted periodically after $t$ iterations if it did not converge within $t$ iterations, the solution obtained until $m$ iterations will be used as an input for the new restart cycle. CA-GMRES(s,t) is equivalent to GMRES(m) if $s$ and $t$ are chosen so that $m = t \times s$. The convergence property of CA-GMRES(s,t) is the same as GMRES(m) in exact arithmetic. In addition, the number of global collective communication calls is reduced by communication avoiding QR factorization. As CA-GMRES(s,t) produces $s$ basis vectors at once, when $s$ is too large, the linear independency of the basic vectors may become worse because of round off errors, leading to worse convergence property than GMRES(m). In order to improve the orthogonality of the basic vectors, the PARCEL implementation of CA-GMRES(s,t) provides an option to use the Newton basis in addition to the monomial basis.

• Algorithm description

$\mathbf{e}$ represents a unit vector, ${\widetilde{\mathbf{\rho}}}_{\mathbf{k}}\mathbf{=}\mathbf{R}_{\mathbf{k}}$ represents the $s$-th row and $s$-th column element of the matrix $\mathbf{R}_{\mathbf{k}}$, and $\mathbf{E}$ represents the eigenvalues obtained from the Hessenberg matrix, which is generated by the iteration process.

1. $\mathbf{B}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{= \lbrack}\mathbf{e}_{\mathbf{2}}\mathbf{,}\mathbf{e}_{\mathbf{3}}\mathbf{,\ldots,}\mathbf{e}_{\mathbf{s + 1}}\mathbf{\rbrack}$
2. $\mathbf{for\ j = 0,1,2,3\ldots}$
3. 　　$\mathbf{r = b - A}\mathbf{x}_{\mathbf{0}}$
4. 　　$\mathbf{v}_{\mathbf{1}}\mathbf{= r\ /\ }\left\| \mathbf{r} \right\|$
5. 　　$\mathbf{\zeta =}\left( \left\| \mathbf{r} \right\|\mathbf{,0,\ldots,0} \right)^{\mathbf{T}}$
6. 　　$\mathbf{for\ k = 0,1\ldots,t - 1}$
7. 　　　　$\mathbf{Fix\ basis\ conversion\ matrix\ \lbrack}\mathbf{B}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{,E\rbrack}$
8. 　　　　$\mathbf{Comupute\ \ basic\ vector\ \ \lbrack\ s,}{\acute{\mathbf{V}}}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{,}\mathbf{B}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{\ \rbrack}$
9. 　　　　$\mathbf{if(k.eq.0)}$
10. 　　　　　　$\mathbf{{QR\ decomposition\ }}\mathbf{V}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{=}\mathbf{Q}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{R}_{\mathbf{0}}^{\mathbf{\sim}}$
11. 　　　　　　$\mathfrak{Q}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{=}\mathbf{Q}_{\mathbf{0}}^{\mathbf{\sim}}$
12. 　　　　　　$\mathfrak{H}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{=}\mathbf{R}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{B}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{R}_{\mathbf{0}}^{\mathbf{- 1}}$
13. 　　　　　　$\mathcal{H}_{\mathbf{0}}\mathbf{=}\mathfrak{H}_{\mathbf{0}}^{\mathbf{\sim}}$
14. 　　　　　　$\mathbf{for\ o = 1,2\ldots,s}$
15. 　　　　　　$\mathbf{Givens\ rotation\ \lbrack o,}\mathcal{H}_{\mathbf{0}}\mathbf{,\zeta\rbrack}$
16. 　　　　　　$\mathbf{{if\ }}\left\| \mathbf{\zeta}_{\mathbf{o + 1}} \right\|\mathbf{\ /\ }\left\| \mathbf{b} \right\|\mathbf{{\ small\ enough\ then}}$
17. 　　　　　　　　$\mathbf{update\ \ solution\ \ vector\lbrack s,k,o,}{\acute{\mathbf{V}}}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{,}{\acute{\mathbf{Q}}}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{,}\mathbf{R}_{\mathbf{0}}\mathbf{,}\mathcal{H}_{\mathbf{0}}\mathbf{,\zeta,}\mathbf{x}_{\mathbf{0}}\mathbf{\ \rbrack}$
18. 　　　　　　　　$\mathbf{{quit}}$
19. 　　　　　　$\mathbf{{endif}}$
20. 　　　　$\mathbf{{else}}$
21. 　　　　　　${\acute{\mathfrak{R}}}_{\mathbf{k - 1,k}}^{\mathbf{\sim}}\mathbf{=}\left( \mathfrak{Q}_{\mathbf{k - 1}}^{\mathbf{\sim}} \right)^{\mathbf{H}}{\acute{\mathbf{V}}}_{\mathbf{k}}^{\mathbf{\sim}}$
22. 　　　　　　${\acute{\mathbf{V}}}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{=}{\acute{\mathbf{V}}}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{-}\mathfrak{Q}_{\mathbf{k - 1}}^{\mathbf{\sim}}{\acute{\mathfrak{R}}}_{\mathbf{k - 1,k}}^{\mathbf{\sim}}$
23. 　　　　　　$\mathbf{{QR\ decomposition\ }}{\acute{\mathbf{V}}}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{=}{\acute{\mathbf{Q}}}_{\mathbf{k}}^{\mathbf{\sim}}{\acute{\mathbf{R}}}_{\mathbf{k}}^{\mathbf{\sim}}$
24. 　　　　　　$\mathbf{R}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{=}\begin{pmatrix} \mathbf{R}_{\mathbf{k}} & \mathbf{z}_{\mathbf{k}} \\ \mathbf{0}_{\mathbf{1,k}} & \mathbf{\rho} \\ \end{pmatrix}$
25. 　　　　　　$\mathfrak{H}_{\mathbf{k - 1,k}}^{\mathbf{\sim}}\mathbf{= -}\mathfrak{H}_{\mathbf{k - 1}}\mathfrak{R}_{\mathbf{k - 1,k}}\mathbf{R}_{\mathbf{k}}^{\mathbf{- 1}}\mathbf{+}\mathfrak{R}_{\mathbf{k - 1,k}}^{\mathbf{\sim}}\mathbf{B}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{R}_{\mathbf{k}}^{\mathbf{- 1}}$
26. 　　　　　　$\mathbf{H}_{\mathbf{k}}\mathbf{=}\mathbf{R}_{\mathbf{k}}\mathbf{B}_{\mathbf{k}}\mathbf{R}_{\mathbf{k}}^{\mathbf{- 1}}\mathbf{+}{\widetilde{\mathbf{\rho}}}_{\mathbf{k}}^{\mathbf{- 1}}\mathbf{b}_{\mathbf{k}}\mathbf{z}_{\mathbf{k}}\mathbf{e}_{\mathbf{s}}^{\mathbf{T}}\mathbf{-}\mathbf{h}_{\mathbf{k - 1}}\mathbf{e}_{\mathbf{1}}\mathbf{e}_{\mathbf{s(k - 1)}}^{\mathbf{T}}\mathfrak{R}_{\mathbf{k - 1,k}}\mathbf{R}_{\mathbf{k}}^{\mathbf{- 1}}$
27. 　　　　　　$\mathbf{h}_{\mathbf{k}}\mathbf{=}{\widetilde{\mathbf{\rho}}}_{\mathbf{k}}^{\mathbf{- 1}}\mathbf{\rho}_{\mathbf{k}}\mathbf{b}_{\mathbf{k}}$
28. 　　　　　　$\mathfrak{H}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{=}\begin{pmatrix} \mathfrak{H}_{\mathbf{k - 1}} & \mathfrak{H}_{\mathbf{k - 1,k}}^{\mathbf{\sim}} \\ \begin{matrix} \mathbf{h}_{\mathbf{k - 1}}\mathbf{e}_{\mathbf{1}}\mathbf{e}_{\mathbf{s(k - 1)}}^{\mathbf{T}} \\ \mathbf{0}_{\mathbf{1,s(k - 1)}} \\ \end{matrix} & \begin{matrix} \mathbf{H}_{\mathbf{k}} \\ \mathbf{h}_{\mathbf{k}}\mathbf{e}_{\mathbf{s}}^{\mathbf{T}} \\ \end{matrix} \\ \end{pmatrix}$
29. 　　　　　　$\mathcal{H}_{\mathbf{k}}\mathbf{=}\begin{pmatrix} \mathfrak{H}_{\mathbf{k - 1,k}}^{\mathbf{\sim}} \\ \mathbf{H}_{\mathbf{k}} \\ \mathbf{h}_{\mathbf{k}}\mathbf{e}_{\mathbf{s}}^{\mathbf{T}} \\ \end{pmatrix}$
30. 　　　　　　$\mathbf{for\ o = 1 + sk,\ldots,s(k + 1)}$
31. 　　　　　　　　$\mathbf{Givens\ rotation\ \lbrack o,}\mathcal{H}_{\mathbf{k}}\mathbf{,\zeta\rbrack}$
32. 　　　　　　　　$\mathbf{{if\ }}\left\| \mathbf{\zeta}_{\mathbf{o + 1}} \right\|\mathbf{\ /\ }\left\| \mathbf{b} \right\|\mathbf{{\ small\ enough\ then}}$
33. 　　　　　　　　　　$\mathbf{update\ \ solution\ \ vector\lbrack s,k,o,}{\acute{\mathbf{V}}}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{,}\mathfrak{Q}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{,}{\acute{\mathbf{R}}}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{,}\mathcal{H}_{\mathbf{k}}\mathbf{,\zeta,}\mathbf{x}_{\mathbf{0}}\mathbf{\ \rbrack}$
34. 　　　　　　　　　　$\mathbf{{quit}}$
35. 　　　　　　　　$\mathbf{{endif}}$
36. 　　　　　　$\mathbf{{end}}$
37. 　　　　$\mathbf{{endif}}$
38. 　　$\mathbf{{end}}$
39. 　　$\mathbf{update\ \ solution\ \ vector\lbrack s,t - 1,st,}{\acute{\mathbf{V}}}_{\mathbf{t - 1}}^{\mathbf{\sim}}\mathbf{,}\mathfrak{Q}_{\mathbf{t - 1}}^{\mathbf{\sim}}\mathbf{,}{\acute{\mathbf{R}}}_{\mathbf{t - 1}}^{\mathbf{\sim}}\mathbf{,}\mathcal{H}_{\mathbf{t - 1}}\mathbf{,\zeta,}\mathbf{x}_{\mathbf{0}}\mathbf{\ \rbrack}$
40. $\mathbf{{end}}$

• Fix basis conversion matrix $\mathbf{\lbrack s,B,E\rbrack}$
1. $\mathbf{{if\ compute\ Newton\ Basis}}$
2. 　　$\mathbf{i = 0}$
3. 　　$\mathbf{{while}}\left( \mathbf{i \leq s - 1} \right)$
4. 　　　　$\mathbf{{if}}\left( \mathbf{i.eq.s - 1} \right)\mathbf{{then}}$
5. 　　　　　　$\mathbf{B}_{\mathbf{i,i}}\mathbf{= Re\lbrack}\mathbf{E}_{\mathbf{i}}\mathbf{\rbrack}$
6. 　　　　$\mathbf{{else}}$
7. 　　　　　　$\mathbf{{if}}\left( \mathbf{{Im}}\left\lbrack \mathbf{E}_{\mathbf{i}} \right\rbrack\mathbf{.eq.0} \right)\mathbf{{then}}$
8. 　　　　　　　　$\mathbf{B}_{\mathbf{i,i}}\mathbf{= Re\lbrack}\mathbf{E}_{\mathbf{i}}\mathbf{\rbrack}$
9. 　　　　　　$\mathbf{{else}}$
10. 　　　　　　　　$\mathbf{B}_{\mathbf{i,i}}\mathbf{= Re\lbrack}\mathbf{E}_{\mathbf{i}}\mathbf{\rbrack}$
11. 　　　　　　　　$\mathbf{B}_{\mathbf{i + 1,i + 1}}\mathbf{= Re\lbrack}\mathbf{E}_{\mathbf{i}}\mathbf{\rbrack}$
12. 　　　　　　　　$\mathbf{B}_{\mathbf{i,i + 1}}\mathbf{= -}\left( \mathbf{Im\lbrack}\mathbf{E}_{\mathbf{i}}\mathbf{\rbrack} \right)^{\mathbf{2}}$
13. 　　　　　　　　$\mathbf{i = i + 1}$
14. 　　　　　　$\mathbf{{endif}}$
15. 　　　　$\mathbf{{endif}}$
16. 　　　　$\mathbf{i = i + 1}$
17. 　　$\mathbf{{end\ \ }}$
18. $\mathbf{{end\ \ }}$

• Comupute basis vector $\mathbf{\lbrack\ s,v\ ,B\rbrack}$

The elements of the matrix $\mathbf{B}$ are represented as $\mathbf{b}_{\mathbf{k,i}}$

1. $\mathbf{for\ k = 1,2,3\ldots s}$
2. 　　$\mathbf{{solve\ }}{\widehat{\mathbf{v}}}_{\mathbf{k - 1}}\mathbf{{\ from\ K}}{\widehat{\mathbf{v}}}_{\mathbf{k - 1}}\mathbf{=}\mathbf{v}_{\mathbf{k - 1}}$
3. 　　$\mathbf{if(k.ne.1)then}$
4. 　　　　$\mathbf{\alpha =}\mathbf{b}_{\mathbf{k - 1,k - 1}}$
5. 　　　　$\mathbf{\beta =}\mathbf{b}_{\mathbf{k - 2,k - 1}}$
6. 　　　　$\mathbf{v}_{\mathbf{k}}\mathbf{= A}{\widehat{\mathbf{v}}}_{\mathbf{k - 1}}\mathbf{- \alpha}\mathbf{v}_{\mathbf{k - 1}}\mathbf{+ \beta}\mathbf{v}_{\mathbf{k - 2}}$
7. 　　$\mathbf{{else}}$
8. 　　　　$\mathbf{\alpha =}\mathbf{b}_{\mathbf{k - 1,k - 1}}$
9. 　　　　$\mathbf{v}_{\mathbf{k}}\mathbf{= A}{\widehat{\mathbf{v}}}_{\mathbf{k - 1}}\mathbf{- \alpha}\mathbf{v}_{\mathbf{k - 1}}$
10. 　　$\mathbf{{endif}}$
11. $\mathbf{{en}}\mathbf{d}$

• Givens rotation $\mathbf{\lbrack i,}\mathcal{H,}\mathbf{\zeta\rbrack}$

The elements of the matrix $\mathcal{H}$ are represented as $\mathcal{h}_{\mathbf{k,i}}$.

1. $\mathbf{for\ k = 1,2,3\ldots i - 1}$
2. 　　$\begin{pmatrix} \mathcal{h}_{\mathbf{k,i}} \\ \mathcal{h}_{\mathbf{k + 1,i}} \\ \end{pmatrix}\mathbf{=}\begin{pmatrix} \mathbf{c}_{\mathbf{i}} & \mathbf{- s}_{\mathbf{i}} \\ \mathbf{s}_{\mathbf{i}} & \mathbf{c}_{\mathbf{i}} \\ \end{pmatrix}\begin{pmatrix} \mathcal{h}_{\mathbf{k,i}} \\ \mathcal{h}_{\mathbf{k + 1,i}} \\ \end{pmatrix}$
3. $\mathbf{\text{end}}$
4. $\mathbf{c}_{\mathbf{i}}\mathbf{=}\frac{\mathbf{1}}{\sqrt{\mathbf{1 +}\left( \frac{\mathcal{h}_{\mathbf{i + 1,i}}}{\mathcal{h}_{\mathbf{i,i}}} \right)^{\mathbf{2}}}}$
5. $\mathbf{s}_{\mathbf{i}}\mathbf{= -}\frac{\mathcal{h}_{\mathbf{i + 1,i}}}{\mathcal{h}_{\mathbf{i,i}}}\frac{\mathbf{1}}{\sqrt{\mathbf{1 +}\left( \frac{\mathcal{h}_{\mathbf{i + 1,i}}}{\mathcal{h}_{\mathbf{i,i}}} \right)^{\mathbf{2}}}}$
6. $\begin{pmatrix} \mathbf{\zeta}_{\mathbf{i}} \\ \mathbf{\zeta}_{\mathbf{i + 1}} \\ \end{pmatrix}\mathbf{=}\begin{pmatrix} \mathbf{c}_{\mathbf{i}} & \mathbf{- s}_{\mathbf{i}} \\ \mathbf{s}_{\mathbf{i}} & \mathbf{c}_{\mathbf{i}} \\ \end{pmatrix}\begin{pmatrix} \mathbf{\zeta}_{\mathbf{i}} \\ \mathbf{\zeta}_{\mathbf{i + 1}} \\ \end{pmatrix}$
7. $\mathcal{h}_{\mathbf{i,i}}\mathbf{=}\mathbf{c}_{\mathbf{i}}\mathcal{h}_{\mathbf{i,i}}\mathbf{- \ }\mathbf{s}_{\mathbf{i}}\mathcal{h}_{\mathbf{i + 1,i}}$
8. $\mathcal{h}_{\mathbf{i + 1,i}}\mathbf{= 0}$

• Update solution vector $\mathbf{\lbrack s,k,n,V,Q,R,}\mathcal{H,}\mathbf{\zeta,}\mathbf{x}_{\mathbf{0}}\mathbf{\ \rbrack}$
1. $\mathbf{y =}\mathcal{H}^{\mathbf{- 1}}\mathbf{\zeta}$
2. $\mathbf{{solve\ }}\widehat{\mathbf{x}}\mathbf{{\ from\ K}}\widehat{\mathbf{x}}\mathbf{=}\sum_{\mathbf{k = 0}}^{\mathbf{s}\mathbf{m}\mathbf{- 1}}\mathbf{Q}_{\mathbf{k}}\mathbf{y}_{\mathbf{k}}\mathbf{+}\sum_{\mathbf{l =}\mathbf{{sm}}}^{\mathbf{n - 1}}\mathbf{V}_{\mathbf{l}\mathbf{- sm}}\sum_{\mathbf{k =}\mathbf{{sm}}}^{\mathbf{n - 1}}{\mathbf{R}_{\mathbf{l - sm}\mathbf{,}\mathbf{k}\mathbf{- sm}}^{\mathbf{- 1}}\mathbf{y}_{\mathbf{k}}}$
3. $\mathbf{x}_{\mathbf{0}}\mathbf{=}\mathbf{x}_{\mathbf{0}}\mathbf{+}\widehat{\mathbf{x}}$

## 1.5 The Preconditioned Chebyshev basis Conjugate Gradient Method

The Chebyshev basis Conjugate Gradient (CBCG) method is an iterative Krylov subspace algorithm for solving symmetric matrices. The CBCG method calculates $k$ iterations of the CG method in one iteration. By this approach, the CBCG method reduces the global collective communication calls. In order to use the Chebyshev basis the largest and smallest eigenvalues are needed. PARCEL provides two methods to obtain these eigenvalues, the so called power method and the communication avoiding Arnoldi method.

• Algorithm
1. $\mathbf{{Compute\ }}\mathbf{r}_{\mathbf{0}}\mathbf{= b - A}\mathbf{x}_{\mathbf{0}}\mathbf{{\ for\ some\ initial\ gue}}\mathbf{{ss\ }}\mathbf{x}_{\mathbf{- 1}}\mathbf{=}\mathbf{x}_{\mathbf{0}}$
2. $\mathbf{Compute\ Chebyshev\ basis\ \lbrack}\mathbf{S}_{\mathbf{0}}\mathbf{,}{\mathbf{A}\mathbf{S}}_{\mathbf{0}}\mathbf{,}\mathbf{r}_{\mathbf{0}}\mathbf{,}\mathbf{\lambda}_{\mathbf{\max}}\mathbf{,}\mathbf{\lambda}_{\mathbf{\min}}\mathbf{\rbrack}$
3. $\mathbf{Q}_{\mathbf{0}}\mathbf{=}\mathbf{S}_{\mathbf{0}}$
4. $\mathbf{for\ i = 0,1,2,3\ldots}$
5. $\mathbf{{Compute\ }}\mathbf{Q}_{\mathbf{i}}^{\mathbf{T}}\mathbf{A}\mathbf{Q}_{\mathbf{i}}\mathbf{\ ,\ }\mathbf{Q}_{\mathbf{i}}^{\mathbf{T}}\mathbf{r}_{\mathbf{{ik}}}$
6. 　　$\mathbf{\alpha}_{\mathbf{i}}\mathbf{= \ }\left( \mathbf{Q}_{\mathbf{i}}^{\mathbf{T}}\mathbf{A}\mathbf{Q}_{\mathbf{i}} \right)^{\mathbf{- 1}}\left( \mathbf{Q}_{\mathbf{i}}^{\mathbf{T}}\mathbf{r}_{\mathbf{{ik}}} \right)$
7. 　　$\mathbf{x}_{\left( \mathbf{i + 1} \right)\mathbf{k}}\mathbf{=}\mathbf{x}_{\mathbf{{ik}}}\mathbf{+}\mathbf{Q}_{\mathbf{i}}\mathbf{\alpha}_{\mathbf{i}}$
8. 　　$\mathbf{r}_{\left( \mathbf{i + 1} \right)\mathbf{k}}\mathbf{=}\mathbf{r}_{\mathbf{{ik}}}\mathbf{- A}\mathbf{Q}_{\mathbf{i}}\mathbf{\alpha}_{\mathbf{i}}$
9. 　　$\mathbf{{if\ }}\left\| \mathbf{r}_{\left( \mathbf{i + 1} \right)\mathbf{k}} \right\|\mathbf{\ /\ }\left\| \mathbf{b} \right\|\mathbf{{\ small\ enough\ }}\mathbf{{then}}$
10. 　　$\mathbf{Compute\ Chebyshev\ basis\ \lbrack}\mathbf{S}_{\mathbf{i + 1}}\mathbf{,}{\mathbf{A}\mathbf{S}}_{\mathbf{i + 1}}\mathbf{,}\mathbf{r}_{\left( \mathbf{i + 1} \right)\mathbf{k}}\mathbf{,}\mathbf{\lambda}_{\mathbf{\max}}\mathbf{,}\mathbf{\lambda}_{\mathbf{\min}}\mathbf{\rbrack}$
11. 　　$\mathbf{{Compute\ }}\mathbf{Q}_{\mathbf{i}}^{\mathbf{T}}\mathbf{A}\mathbf{S}_{\mathbf{i + 1}}$
12. 　　$\mathbf{B}_{\mathbf{i}}\mathbf{=}\left( \mathbf{Q}_{\mathbf{i}}^{\mathbf{T}}\mathbf{A}\mathbf{Q}_{\mathbf{i}} \right)^{\mathbf{- 1}}\left( \mathbf{Q}_{\mathbf{i}}^{\mathbf{T}}\mathbf{A}\mathbf{S}_{\mathbf{i + 1}} \right)\mathbf{\ }$
13. 　　$\mathbf{Q}_{\mathbf{i + 1}}\mathbf{=}\mathbf{S}_{\mathbf{i + 1}}\mathbf{-}\mathbf{Q}_{\mathbf{i}}\mathbf{B}_{\mathbf{i}}$
14. 　　$\mathbf{A}\mathbf{Q}_{\mathbf{i + 1}}\mathbf{=}\mathbf{{AS}}_{\mathbf{i + 1}}\mathbf{-}\mathbf{{AQ}}_{\mathbf{i}}\mathbf{B}_{\mathbf{i}}$
15. $\mathbf{{end}}$

• Compute Chebyshev basis $\mathbf{\lbrack S,AS,r,}\mathbf{\lambda}_{\mathbf{\max}}\mathbf{,}\mathbf{\lambda}_{\mathbf{\min}}\mathbf{\rbrack}$

$\mathbf{\lambda}_{\mathbf{\max}}$, $\mathbf{\lambda}_{\mathbf{\min}}$represent the largest and smallest eigenvalues of $\mathbf{AK}^{-1}$.

1. $\mathbf{\eta = 2\ /\ (}\mathbf{\lambda}_{\mathbf{\max}}\mathbf{-}\mathbf{\lambda}_{\mathbf{\min}}\mathbf{)}$
2. $\mathbf{\zeta =}\left( \mathbf{\lambda}_{\mathbf{\max}}\mathbf{+}\mathbf{\lambda}_{\mathbf{\min}} \right)\mathbf{\ /\ (}\mathbf{\lambda}_{\mathbf{\max}}\mathbf{-}\mathbf{\lambda}_{\mathbf{\min}}\mathbf{)}$
3. $\mathbf{s}_{\mathbf{0}}\mathbf{=}\mathbf{r}_{\mathbf{0}}\mathbf{\ }$
4. $\mathbf{{solve\ }}{\widetilde{\mathbf{s}}}_{\mathbf{0}}\mathbf{{\ from\ K}}{\widetilde{\mathbf{s}}}_{\mathbf{0}}\mathbf{= \ }\mathbf{s}_{\mathbf{0}}$
5. $\mathbf{s}_{\mathbf{1}}\mathbf{= \eta A}{\widetilde{\mathbf{s}}}_{\mathbf{0}}\mathbf{- \zeta}\mathbf{s}_{\mathbf{0}}$
6. $\mathbf{{solve\ }}{\widetilde{\mathbf{s}}}_{\mathbf{1}}\mathbf{{\ from\ K}}{\widetilde{\mathbf{s}}}_{\mathbf{1}}\mathbf{= \ }\mathbf{s}_{\mathbf{1}}$
7. $\mathbf{for\ j = 2,3,\ldots,k}$
8. 　　$\mathbf{s}_{\mathbf{j}}\mathbf{= 2\eta A}{\widetilde{\mathbf{s}}}_{\mathbf{j}}\mathbf{- 2\zeta}\mathbf{s}_{\mathbf{j - 1}}\mathbf{-}\mathbf{s}_{\mathbf{j - 2}}$
9. 　　$\mathbf{{solve\ }}{\widetilde{\mathbf{s}}}_{\mathbf{j}}\mathbf{{\ from\ K}}{\widetilde{\mathbf{s}}}_{\mathbf{j}}\mathbf{= \ }\mathbf{s}_{\mathbf{j}}$
10. $\mathbf{{end}}$
11. $\mathbf{S = (}{\widetilde{\mathbf{s}}}_{\mathbf{0}}\mathbf{,}{\widetilde{\mathbf{s}}}_{\mathbf{1}}\mathbf{,\ldots,}{\widetilde{\mathbf{s}}}_{\mathbf{k - 1}}\mathbf{)}$
12. $\mathbf{AS = (A}{\widetilde{\mathbf{s}}}_{\mathbf{0}}\mathbf{,}{\mathbf{A}\widetilde{\mathbf{s}}}_{\mathbf{1}}\mathbf{,\ldots,}{\mathbf{A}\widetilde{\mathbf{s}}}_{\mathbf{k - 1}}\mathbf{)}$

# 2 Preconditioning

In iterative algorithms, the application of a preconditioner of the form $K \approx A$ can help to reduce the number of iterations until convergence. The notation [${{solve\ }}\widehat{{p}}{{\ from\ K}}\widehat{{p}}{= \ }{p}$] means [$approximately solve {A}\widehat{p}{= \ }{p}$ with respect to the vector $\widehat{p}$]. Although the more accurate approximation leads to the less number of iterations until convergence, the cost of preconditioner is normally increased. In non-parallel computing, one of the most common preconditioners is the incomplete LU factorization (ILU). However, in parallel computing, parallel preconditioners are needed. PARCEL provides the following parallel preconditioners: Point Jacobi, Block Jacobi and Additive Schwarz.

## 2.1 Point Jacobi Preconditioner

The point Jacobi preconditioner is one of the most simplest preconditioners. $K$ only consists of the diagonal elements of the matrix $A$. Compared to other preconditioners the efficiency of the point Jacobi preconditioner is very low. However, it can be easily applied in parallel and it does not require any additional communication between processors.

## 2.2 Zero Fill-in Incomplete LU Factorization Preconditioners (ILU(0))

LU factorization decompose a square matrix $A$ into a lower triangular matrix $L$ and an upper triangular matrix $U$. For a typical sparse matrix, the LU factors can be much less sparse than the original matrix, which is called as fill-in. This increases the memory and computational requirements. In order to avoid this issue, PARCEL provides the zero fill-in incomplete LU factorization (ILU(0)).

## 2.3 Diagonal Incomplete LU Factorization Preconditioners (D-ILU)

The diagonal incomplete LU factorization (D-ILU) computes a diagonal matrix $D$, a lower triangular matrix $L$ and an upper triangular matrix $U$ for the input matrix $A$. $A = L_{A} + D_{A} + U_{A}$ The preconditioner $K$ is constructed with $L_{A}$, $U_{A}$ and $D$ ($D \neq D_{A}$) with: $K = \left( D + L_{A} \right)D^{- 1}\left( D + U_{A} \right)$ Two different conditions exist in order to construct the diagonal matrix $D$:

• Condition 1: The diagonal elements of $K$ equal those of $A$: $A_{{ii}} = K_{{ii}}\ \ (i = 1,\ldots,n)$

• Condition 2: The row sum of $K$ equals the row sum of $A$: $\sum_{j}^{}A_{{ij}} = \sum_{j}^{}K_{{ij}}\ \ (i = 1,\ldots,n)$

Given the conditions above $D$ can be computed as follows:

• If condition 1 should be fullfilled the following computation is used: $D_{{ii}} = A_{{ii}} - \sum_{j < i}^{}{A_{{ij}}D_{jj}^{-1}A_{{ji}}}\ \ (i = 1,\ldots,n)$

• If condition 2 should be fullfilled the following computation is used: $D_{{ii}} = A_{{ii}} - \sum_{j < i}^{}{\sum_{k > j}^{}{A_{{ij}}D_{{jj}}^{-1}A_{jk}}}\ \ (i = 1,\ldots,n)$

## 2.4 Block Jacobi Preconditioner

The Block Jacobi Preconditioner constructs $K$ out of the diagonal blocks of the matrix $A$. For each block, the incomplete ILU factorization is computed. When the preconditioner is applied, each block can be processed independently, resulting in a high level of parallelism. In addition, no communication is needed when each block is defined within a sub-matrix on each processor.

The Additive Schwarz Preconditioner constructs $K$ out of the overlapping diagonal blocks of $A$. For each block the incomplete LU factorization is computed. Compared to the Block Jacobi Preconditioner, the Additive Schwarz Preconditioner may require additional communication. PARCEL provides four different overlapping methods, BASIC, RESTRICT, INTERPOLATE and NONE. Fig.1 Additive Schwarz Preconditioner with overlapping diagonal blocks

### BASIC

Solve $Ks=r$ with an extended $r$ including an overlapping region with the neighboring process, $s$ in the overlapping region is determined by summing up the results between the neighboring region. Fig.2 BASIC

### RESTRICT

Solve $Ks=r$ with an extended $r$ including an overlapping region with the neighboring process, $s$ in the overlapping region is determined without taking account of the result in the neighboring region. Fig.3 RESTRICT

### INTERPOLATE

Solve $Ks=r$ with an extended $r$ without an overlapping region with the neighboring process, $s$ in the overlapping region is determined by summing up the results between the neighboring regions. Fig.4 INTERPOLATE

### NONE

Solve $Ks=r$ with an extended $r$ without an overlapping region with the neighboring process, $s$ in the overlapping region is determined without taking account of the result in the neighboring regions. Fig.5 NONE

# 3 Sparse Matrix Data Formats

In order to save memory in sparse matrix computation, only the non-zero elements and their locations are stored. PARCEL supports different sparse matrix formats.

## 3.1 Compressed Row Storage (CRS) Format

The CRS format compresses the row indices of the non-zero elements of the sparse matrix. Together with the compressed rows, the column index and the value for each non-zero element are stored in one dimensional arrays. Compresed row indeces are stored in an incremental order.

$A = \left( \begin{array}{ccc} a & b & c & 0\\ 0 & d & 0 & 0\\ e & 0 & f & g\\ 0 & h & i & j \end{array} \right)$

### One process case

• In case a single process processes the whole matrix from above

Values of non-zero elements :: $crsA=\{a,b,c,d,e,f,g,h,i,j\}$

Compressed row indices :: $crsRow\_ptr=\{1,4,5,8,11\}$

Column indices of non-zero elements :: $crsCol=\{1,2,3,2,1,3,4,2,3,4\}$

### Two process case

• Rank 0 process

Values of non-zero elements :: $crsA=\{a,b,c,d\}$

Compressed row indices :: $crsRow\_ptr=\{1,4,5\}$

Column indices of non-zero elements :: $crsCol=\{1,2,3,2\}$

• Rank 1 process

Values of non-zero elements :: $crsA=\{e,f,g,h,i,j\}$

Compressed row indices :: $crsRow\_ptr=\{1,4,7\}$

Column indices of non-zero elements :: $crsCol=\{1,3,4,2,3,4\}$

## 3.2 Diagonal Format

The Diagonal Format stores the non-zero elements of a matrix as diagonals. This format provides high performance for band block diagonal matrices. In addition to the element values, an offset value for each diagonal of the matrix needs to be stored inside an one dimensional array. A negative offset indicates that the diagonal is below the main diagonal of the matrix, a positive offset indicates that the diagonal is above the main diagonal. The main diagonal has the offset zero. Offset values are stored in an incremental order.

$A = \left( \begin{array}{ccc} a & b & c & 0\\ 0 & d & 0 & 0\\ e & 0 & f & g\\ 0 & h & i & j \end{array} \right)$

### Single process case

• In case the matrix above is processed by a single process

Elements :: $diaA=\{0,0,e,h,0,0,0,i,a,d,f,j,b,0,g,0,c,0,0,0\}$

Offsets :: $offset=\{-2,-1,0,1,2\}$

Number of diagonals :: $nnd=5$

### Two process case

• Rank 0 process

Elements :: $diaA=\{a,d,b,0,c,0\}$

Offsets :: $offset=\{0,1,2\}$

Number of diagonals :: $nnd=3$

• Rank 1 process

Elements :: $diaA=\{e,h,0,i,f,j,g,0\}$

Offsets :: $offset=\{-2,-1,0,1\}$

Number of diagonals :: $nnd=4$

# 4 The structure of PARCEL

This chapter explains how to compile and install the PARCEL library.

## 4.1 Directory Structure

After extracting the PARCEL library, the directory structure should be the same as in Fig.6. Fig.6 The PARCEL directory structure

## 4.2 Compiling the PARCEL library

PARCEL requires a C compiler, a FORTRAN compiler, which supports preprocessor, MPI, LAPACK, and OpenMP. When compiling with $gfortran$, the option $-cpp$ has to be specified to to activate preprocessor. Compiler options can be changed inside the file arch/make_config. An example of a make_config file is shown in Fig.7. The make command should be executed from the top directory of the extracted file. In order to re-compile, $make\ clean$ followed by $make$ should be executed. If the $make$ command was successful, each $example$ directory should contain an executable file. 64-bit integer is needed when working with over roughly 2 billion unknowns. For example, FDEFINE has to be specified for a large scale test, example5, in this package. Option 　Explanation
CC 　C compiler command name
FC 　FORTRAN compiler command name
CFLAGS 　C compiler options
FFLAGS 　FORTRAN compiler options
FP_MODE 　Option to suppress optimizations with arithmetic order change (Only for quad precision)
INCLUDEDIR 　Include directories
LD_MPI 　MPI library
LD_LAPACK 　LAPACK library
FDEFINE 　64bit integer becomes available with -DPARCEL_INT8.

Fig.7 Example of a make_config file

## 4.3 Usage of the PARCEL library

In order to call PARCEL library, programs need to be linked against the $*.a$ library files inside the $src$ directory.

# 5 PARCEL routines (direct interface)

The PARCEL routines with direct call interfaces are shown below.

## 5.1 parcel_dcg

• Interface

call parcel_dcg( icomm, vecx,vecb, n,gn,nnz,istart, crsA,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret )

• Function

Solve a simultaneous linear equation system Ax = b by the conjugate gradient method (CG method). Non-zero elements are stored in the CRS format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
nnz integer*4 / integer*8 in Number of non-zero elements on each process
istart integer*4 / integer*8 in Start line of matrix on each process
crsA(nnz) double precision in Non-zero elements of matrix stored in the CRS format
crsRow_ptr(n+1) integer*4 / integer*8 in Pointer table in the CRS format
crsCol(nnz) integer*4 / integer*8 in Column numbers of non-zero elements in the CRS format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)

## 5.2 parcel_dcg_dia

• Interface

call parcel_dcg_dia( icomm, vecx,vecb, n,gn,istart, diaA,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret )

• Function

Solve a simultaneous linear equation system Ax = b by the conjugate gradient method (CG method). Non-zero elements are stored in the Diagonal format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
istart integer*4 / integer*8 in Start line of matrix on each process
diaA(n*nnd) double precision in Non-zero elements of matrix stored in the Diagonal format
offset(nnd) integer*4 / integer*8 in Offset of each non-zero diagonal elements array in the Diagonal format
nnd integer*4 / integer*8 in Number of diagonal elements arrays in the Diagonal format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)

## 5.3 parcel_ddcg

• Interface

call parcel_ddcg( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,nnz,istart, crsA_hi,crsA_lo,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )

• Function

Solve a simultaneous linear equation system Ax = b by the conjugate gradient method (CG method). Non-zero elements are stored in the CRS format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx_hi(n) double precision in/out in: Initial vector (upper bits), out: Solution vector (upper bits)
vecx_lo(n) double precision in/out in: initial value of the iteration (lower bit), out: solution vector (lower bit)
vecb_hi(n) double precision in Right hand side vector (upper bit)
vecb_lo(n) double precision in Right hand side vector (lower bit)
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
nnz integer*4 / integer*8 in Number of non-zero elements on each process
istart integer*4 / integer*8 in Start line of matrix on each process
crsA_hi(nnz) double precision in Non-zero elements (upper bits) of matrix stored in the CRS format
crsA_lo(nnz) double precision in Non-zero elements (lower bit) of matrix stored in the CRS format
crsRow_ptr(n+1) integer*4 / integer*8 in Pointer table in the CRS format
crsCol(nnz) integer*4 / integer*8 in Column numbers of non-zero elements in the CRS format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
PRECISION_A integer in Precision of matrix (1: double precision, 2: quad precision)
PRECISION_b integer in Precision of right hand side vector (1: double precision, 2: quad precision)
PRECISION_x integer in Precision of solution vector (1: double precision, 2: quad precision)
PRECISION_PRECONDITION integer in Precision of preconditioner (1: double precision, 2: quad precision)

## 5.4 parcel_ddcg_dia

• Interface

call parcel_ddcg_dia( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,istart, diaA_hi,diaA_lo,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )

• Function

Solve a simultaneous linear equation system Ax = b by the conjugate gradient method (CG method). Non-zero elements are stored in the Diagonal format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx_hi(n) double precision in/out in: Initial vector (upper bits), out: Solution vector (upper bits)
vecx_lo(n) double precision in/out in: initial value of the iteration (lower bit), out: solution vector (lower bit)
vecb_hi(n) double precision in Right hand side vector (upper bit)
vecb_lo(n) double precision in Right hand side vector (lower bit)
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
istart integer*4 / integer*8 in Start line of matrix on each process
diaA_hi(n*nnd) double precision in Non-zero elements (upper bits) of matrix stored in the Diagonal format
diaA_lo(n*nnd) double precision in Non-zero elements (lower bit) of matrix stored in the Diagonal format
offset(nnd) integer*4 / integer*8 in Offset of each non-zero diagonal elements array in the Diagonal format
nnd integer*4 / integer*8 in Number of diagonal elements arrays in the Diagonal format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
PRECISION_A integer in Precision of matrix (1: double precision, 2: quad precision)
PRECISION_b integer in Precision of right hand side vector (1: double precision, 2: quad precision)
PRECISION_x integer in Precision of solution vector (1: double precision, 2: quad precision)
PRECISION_PRECONDITION integer in Precision of preconditioner (1: double precision, 2: quad precision)

## 5.5 parcel_dbicgstab

• Interface

call parcel_dbicgstab( icomm, vecx,vecb, n,gn,nnz,istart, crsA,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret )

• Function

Solve a simultaneous linear equation system Ax = b by the stabilized biconjugate gradient method (Bi-CGSTAB method). Non-zero elements are stored​in the CRS format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
nnz integer*4 / integer*8 in Number of non-zero elements on each process
istart integer*4 / integer*8 in Start line of matrix on each process
crsA(nnz) double precision in Non-zero elements of matrix stored in the CRS format
crsRow_ptr(n+1) integer*4 / integer*8 in Pointer table in the CRS format
crsCol(nnz) integer*4 / integer*8 in Column numbers of non-zero elements in the CRS format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)

## 5.6 parcel_dbicgstab_dia

• Interface

call parcel_dbicgstab_dia( icomm, vecx,vecb, n,gn,istart, diaA,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret )

• Function

Solve a simultaneous linear equation system Ax = b by the stabilized biconjugate gradient method (Bi-CGSTAB method). Non-zero elements are stored in the Diagonal format

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
istart integer*4 / integer*8 in Start line of matrix on each process
diaA(n*nnd) double precision in Non-zero elements of matrix stored in the Diagonal format
offset(nnd) integer*4 / integer*8 in Offset of each non-zero diagonal elements array in the Diagonal format
nnd integer*4 / integer*8 in Number of diagonal elements arrays in the Diagonal format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)

## 5.7 parcel_ddbicgstab

• Interface

call parcel_ddbicgstab( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,nnz,istart, crsA_hi,crsA_lo,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )

• Function

Solve a simultaneous linear equation system Ax = b by the stabilized biconjugate gradient method (Bi-CGSTAB method). Non-zero elements are stored in the CRS format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx_hi(n) double precision in/out in: Initial vector (upper bits), out: Solution vector (upper bits)
vecx_lo(n) double precision in/out in: initial value of the iteration (lower bit), out: solution vector (lower bit)
vecb_hi(n) double precision in Right hand side vector (upper bit)
vecb_lo(n) double precision in Right hand side vector (lower bit)
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
nnz integer*4 / integer*8 in Number of non-zero elements on each process
istart integer*4 / integer*8 in Start line of matrix on each process
crsA_hi(nnz) double precision in Non-zero elements (upper bits) of matrix stored in the CRS format
crsA_lo(nnz) double precision in Non-zero elements (lower bit) of matrix stored in the CRS format
crsRow_ptr(n+1) integer*4 / integer*8 in Pointer table in the CRS format
crsCol(nnz) integer*4 / integer*8 in Column numbers of non-zero elements in the CRS format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
PRECISION_A integer in Precision of matrix (1: double precision, 2: quad precision)
PRECISION_b integer in Precision of right hand side vector (1: double precision, 2: quad precision)
PRECISION_x integer in Precision of solution vector (1: double precision, 2: quad precision)
PRECISION_PRECONDITION integer in Precision of preconditioner (1: double precision, 2: quad precision)

## 5.8 parcel_ddbicgstab_dia

• Interface

call parcel_ddbicgstab_dia( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,istart, diaA_hi,diaA_lo,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )

• Function

Solve a simultaneous linear equation system Ax = b by the stabilized biconjugate gradient method (Bi-CGSTAB method). Non-zero elements are stored in the Diagonal format

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx_hi(n) double precision in/out in: Initial vector (upper bits), out: Solution vector (upper bits)
vecx_lo(n) double precision in/out in: initial value of the iteration (lower bit), out: solution vector (lower bit)
vecb_hi(n) double precision in Right hand side vector (upper bit)
vecb_lo(n) double precision in Right hand side vector (lower bit)
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
istart integer*4 / integer*8 in Start line of matrix on each process
diaA_hi(n*nnd) double precision in Non-zero elements (upper bits) of matrix stored in the Diagonal format
diaA_lo(n*nnd) double precision in Non-zero elements (lower bit) of matrix stored in the Diagonal format
offset(nnd) integer*4 / integer*8 in Offset of each non-zero diagonal elements array in the Diagonal format
nnd integer*4 / integer*8 in Number of diagonal elements arrays in the Diagonal format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
PRECISION_A integer in Precision of matrix (1: double precision, 2: quad precision)
PRECISION_b integer in Precision of right hand side vector (1: double precision, 2: quad precision)
PRECISION_x integer in Precision of solution vector (1: double precision, 2: quad precision)
PRECISION_PRECONDITION integer in Precision of preconditioner (1: double precision, 2: quad precision)

## 5.9 parcel_dgmres

• Interface

call parcel_dgmres( icomm, vecx,vecb, n,gn,nnz,istart, crsA,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, GMRES_m,GMRES_GSflag )

• Function

Solve a simultaneous linear equation system Ax = b by the generalized minimum residual method (GMRES (m) method). Non-zero elements are stored in the CRS format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
nnz integer*4 / integer*8 in Number of non-zero elements on each process
istart integer*4 / integer*8 in Start line of matrix on each process
crsA(nnz) double precision in Non-zero elements of matrix stored in the CRS format
crsRow_ptr(n+1) integer*4 / integer*8 in Pointer table in the CRS format
crsCol(nnz) integer*4 / integer*8 in Column numbers of non-zero elements in the CRS format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
GMRES_m integer in Number of iterations until restart
GMRES_GSflag integer in Flag for orthogonalization algorithm (1: MGS, 2: CGS)

## 5.10 parcel_dgmres_dia

• Interface

call parcel_dgmres_dia( icomm, vecx,vecb, n,gn,istart, diaA,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, GMRES_m,GMRES_GSflag )

• Function

Solve a simultaneous linear equation system Ax = b by the generalized minimum residual method (GMRES (m) method). Non-zero elements are stored in the Diagonal format

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
istart integer*4 / integer*8 in Start line of matrix on each process
diaA(n*nnd) double precision in Non-zero elements of matrix stored in the Diagonal format
offset(nnd) integer*4 / integer*8 in Offset of each non-zero diagonal elements array in the Diagonal format
nnd integer*4 / integer*8 in Number of diagonal elements arrays in the Diagonal format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
GMRES_m integer in Number of iterations until restart
GMRES_GSflag integer in Flag for orthogonalization algorithm (1: MGS, 2: CGS)

## 5.11 parcel_ddgmres

• Interface

call parcel_ddgmres( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,nnz,istart, crsA_hi,crsA_lo,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, GMRES_m,GMRES_GSflag, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )

• Function

Solve a simultaneous linear equation system Ax = b by the generalized minimum residual method (GMRES (m) method). Non-zero elements are stored in the CRS format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx_hi(n) double precision in/out in: Initial vector (upper bits), out: Solution vector (upper bits)
vecx_lo(n) double precision in/out in: initial value of the iteration (lower bit), out: solution vector (lower bit)
vecb_hi(n) double precision in Right hand side vector (upper bit)
vecb_lo(n) double precision in Right hand side vector (lower bit)
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
nnz integer*4 / integer*8 in Number of non-zero elements on each process
istart integer*4 / integer*8 in Start line of matrix on each process
crsA_hi(nnz) double precision in Non-zero elements (upper bits) of matrix stored in the CRS format
crsA_lo(nnz) double precision in Non-zero elements (lower bit) of matrix stored in the CRS format
crsRow_ptr(n+1) integer*4 / integer*8 in Pointer table in the CRS format
crsCol(nnz) integer*4 / integer*8 in Column numbers of non-zero elements in the CRS format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
GMRES_m integer in Number of iterations until restart
GMRES_GSflag integer in Flag for orthogonalization algorithm (1: MGS, 2: CGS)
PRECISION_A integer in Precision of matrix (1: double precision, 2: quad precision)
PRECISION_b integer in Precision of right hand side vector (1: double precision, 2: quad precision)
PRECISION_x integer in Precision of solution vector (1: double precision, 2: quad precision)
PRECISION_PRECONDITION integer in Precision of preconditioner (1: double precision, 2: quad precision)

## 5.12 parcel_ddgmres_dia

• Interface

call parcel_ddgmres_dia( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,istart, diaA_hi,diaA_lo,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )

• Function

Solve a simultaneous linear equation system Ax = b by the generalized minimum residual method (GMRES (m) method). Non-zero elements are stored in the Diagonal format

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx_hi(n) double precision in/out in: Initial vector (upper bits), out: Solution vector (upper bits)
vecx_lo(n) double precision in/out in: initial value of the iteration (lower bit), out: solution vector (lower bit)
vecb_hi(n) double precision in Right hand side vector (upper bit)
vecb_lo(n) double precision in Right hand side vector (lower bit)
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
istart integer*4 / integer*8 in Start line of matrix on each process
diaA_hi(n*nnd) double precision in Non-zero elements (upper bits) of matrix stored in the Diagonal format
diaA_lo(n*nnd) double precision in Non-zero elements (lower bit) of matrix stored in the Diagonal format
offset(nnd) integer*4 / integer*8 in Offset of each non-zero diagonal elements array in the Diagonal format
nnd integer*4 / integer*8 in Number of diagonal elements arrays in the Diagonal format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
GMRES_m integer in Number of iterations until restart
GMRES_GSflag integer in Flag for orthogonalization algorithm (1: MGS, 2: CGS)
PRECISION_A integer in Precision of matrix (1: double precision, 2: quad precision)
PRECISION_b integer in Precision of right hand side vector (1: double precision, 2: quad precision)
PRECISION_x integer in Precision of solution vector (1: double precision, 2: quad precision)
PRECISION_PRECONDITION integer in Precision of preconditioner (1: double precision, 2: quad precision)

## 5.13 parcel_dcbcg

• Interface

call parcel_dcbcg( icomm, vecx,vecb, n,gn,nnz,istart, crsA,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, CBCG_kstep,CBCG_Eigenflag,power_method_itrmax, CAARNOLDI_sstep,CAARNOLDI_tstep, CAARNOLDI_basis,CAARNOLDI_QRflag )

• Function

The system of equations Ax = b is solved by the Chebyshev basis conjugate gradient method (CBCG method). Non-zero elements are stored in the CRS format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
nnz integer*4 / integer*8 in Number of non-zero elements on each process
istart integer*4 / integer*8 in Start line of matrix on each process
crsA(nnz) double precision in Non-zero elements of matrix stored in the CRS format
crsRow_ptr(n+1) integer*4 / integer*8 in Pointer table in the CRS format
crsCol(nnz) integer*4 / integer*8 in Column numbers of non-zero elements in the CRS format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
CBCG_kstep integer in Number of communication avoiding steps
CBCG_Eigenflag integer in Flag for eigenvalue computation (1: power method, 2: CA-ARNOLDI)
power_method_itrmax integer in Maximum number of iterations in the power method
CAARNOLDI_sstep integer in Number of communication avoiding steps s of the CA-ARNOLDI method
CAARNOLDI_tstep integer in Number of outer iterations t in the CA-ARNOLDI method (restart length = st)
CAARNOLDI_basis integer in Flag for basis vector in the CA-ARNOLDI method (0: monomial basis, 1: Newton basis)
CAARNOLDI_QRflag integer in Flag for QR factorization in the CA-ARNOLDI method (1: MGS, 2: CGS, 3: TSQR, 4: Cholesky QR, 5: Cholesky QR2)

## 5.14 parcel_dcbcg_dia

• Interface

call parcel_dcbcg_dia( icomm, vecx,vecb, n,gn,istart, diaA,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, CBCG_kstep,CBCG_Eigenflag,power_method_itrmax, CAARNOLDI_sstep,CAARNOLDI_tstep, CAARNOLDI_basis,CAARNOLDI_QRflag )

• Function

The system of equations Ax = b is solved by the Chebyshev basis conjugate gradient method (CBCG method). Non-zero elements are stored in the Diagonal format

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
istart integer*4 / integer*8 in Start line of matrix on each process
diaA(n*nnd) double precision in Non-zero elements of matrix stored in the Diagonal format
offset(nnd) integer*4 / integer*8 in Offset of each non-zero diagonal elements array in the Diagonal format
nnd integer*4 / integer*8 in Number of diagonal elements arrays in the Diagonal format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
CBCG_kstep integer in Number of communication avoiding steps
CBCG_Eigenflag integer in Flag for eigenvalue computation (1: power method, 2: CA-ARNOLDI)
power_method_itrmax integer in Maximum number of iterations in the power method
CAARNOLDI_sstep integer in Number of communication avoiding steps s of the CA-ARNOLDI method
CAARNOLDI_tstep integer in Number of outer iterations t in the CA-ARNOLDI method (restart length = st)
CAARNOLDI_basis integer in Flag for basis vector in the CA-ARNOLDI method (0: monomial basis, 1: Newton basis)
CAARNOLDI_QRflag integer in Flag for QR factorization in the CA-ARNOLDI method (1: MGS, 2: CGS, 3: TSQR, 4: Cholesky QR, 5: Cholesky QR2)

## 5.15 parcel_dcagmres

• Interface

call parcel_dcagmres( icomm, vecx,vecb, n,gn,nnz,istart, crsA,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, CAGMRES_sstep, CAGMRES_tstep, CAGMRES_basis, CAGMRES_QRflag )

• Function

Solve a simultaneous linear equation system Ax = b by the communication reduced generalized minimum residual method (CA-GMRES (s, t) method). Separate non-zero element values​in the CRS format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
nnz integer*4 / integer*8 in Number of non-zero elements on each process
istart integer*4 / integer*8 in Start line of matrix on each process
crsA(nnz) double precision in Non-zero elements of matrix stored in the CRS format
crsRow_ptr(n+1) integer*4 / integer*8 in Pointer table in the CRS format
crsCol(nnz) integer*4 / integer*8 in Column numbers of non-zero elements in the CRS format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
CAGMRES_sstep integer in Number of communication avoiding steps s in the CA-GMRES method
CAGMRES_tstep integer in Number of outer iterations t in the CA-GMRES method (restart length = st)
CAGMRES_basis integer in Flag for basis vector in the CA-GMRES method (0: monomial basis, 1: Newton basis)
CAGMRES_QRflag integer in Flag for QR factorization in the CA-GMRES method (1: MGS, 2: CGS, 3: TSQR, 4: CholeskyQR, 5: CholeskyQR2)

## 5.16 parcel_dcagmres_dia

• Interface

call parcel_dcagmres_dia( icomm, vecx,vecb, n,gn,istart, diaA,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, CAGMRES_sstep, CAGMRES_tstep, CAGMRES_basis, CAGMRES_QRflag )

• Function

Solve a simultaneous linear equation system Ax = b by the communication reduced generalized minimum residual method (CA-GMRES (s, t) method). Separate non-zero element values in the Diagonal format

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
istart integer*4 / integer*8 in Start line of matrix on each process
diaA(n*nnd) double precision in Non-zero elements of matrix stored in the Diagonal format
offset(nnd) integer*4 / integer*8 in Offset of each non-zero diagonal elements array in the Diagonal format
nnd integer*4 / integer*8 in Number of diagonal elements arrays in the Diagonal format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
CAGMRES_sstep integer in Small communication step of CA-GMRES method
CAGMRES_tstep integer in List step of the CA-GMRES method
CAGMRES_basis integer in Base generation flag of CA-GMRES method (0: single basis, 1: newton basis)
CAGMRES_QRflag integer in QR Decomposition flag of CA-GMRES method (1: MGS, 2: CGS, 3: TSQR, 4: CholeskyQR, 5: CholeskyQR2)

# 6 PARCEL routines (indirect interface)

The PARCEL routines with indirect call interfaces are shown below.

## 6.1 PARCEL_KSP_Default_Setting

• Interface

call PARCEL_KSP_Default_Setting(method)

• Function

Apply initial settings to Struct in the PARCEL format.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format

## 6.2 set_KSP_CRS

• Interface

• Function

Set matrix in the CRS format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
nnz integer*4 / integer*8 in Number of non-zero elements on each process
istart integer*4 / integer*8 in Start line of matrix on each process
crsA(nnz) double precision in Non-zero elements of matrix stored in the CRS format
crsRow_ptr(n+1) integer*4 / integer*8 in Pointer table in the CRS format
crsCol(nnz) integer*4 / integer*8 in Column numbers of non-zero elements in the CRS format
itrmax integer in in: maximum number of iterations, out: number of iterations
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
method type(KSP) in/out Struct in the PARCEL format

## 6.3 set_KSP_DIA

• Interface

• Function

Set matrix in the Diagonal format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
istart integer*4 / integer*8 in Start line of matrix on each process
diaA(n*nnd) double precision in Non-zero elements of matrix stored in the Diagonal format
offset(nnd) integer*4 / integer*8 in Offset of each non-zero diagonal elements array in the Diagonal format
nnd integer*4 / integer*8 in Number of diagonal elements arrays in the Diagonal format
itrmax integer in in: maximum number of iterations, out: number of iterations
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
method type(KSP) in/out Struct in the PARCEL format

## 6.4 set_vecx_KSP

• Interface

call set_vecx_KSP(n,x,method)

• Function

Set initial vector.

Argument name (dimension) Type Input/Output Description
n integer*4 / integer*8 in Size of vector on each process
x(n) double precision in Initial vector
method type(KSP) in/out Struct in the PARCEL format

## 6.5 set_vecb_KSP

• Interface

call set_vecb_KSP(n,b,method)

• Function

Set right hand side vector b.

Argument name (dimension) Type Input/Output Description
n integer*4 / integer*8 in Size of vector on each process
b(n) double precision in Right hand side vector
method type(KSP) in/out Struct in the PARCEL format

## 6.6 get_vecx_KSP

• Interface

call get_vecx_KSP(n,x,method)

• Function

Get solution vector.

Argument name (dimension) Type Input/Output Description
n integer*4 / integer*8 in Size of vector on each process
x(n) double precision out Solution vector
method type(KSP) in Struct in the PARCEL format

## 6.7 get_reshistory_KSP

• Interface

call get_reshistory_KSP(niter,reshistory, method)

• Function

Get convergence history.

Argument name (dimension) Type Input/Output Description
niter integer out Number of iterations until convergence
reshistory(method%itrmax) double precision out Convergence history
method type(KSP) in Struct in the PARCEL format

## 6.8 reset_CRS_Mat

• Interface

• Function

Change the matrix elements, preconditioner, and the maximum number of iterations of solvers in the CRS format without changing the matrix structure or the positions of non-zero elements.

Argument name (dimension) Type Input/Output Description
itrmax integer in Maximum number of iterations (-1: keep the existing setting)
ipreflag integer in Flag for preconditioner (-1: keep the existing setting, 0: None, 1: Point Jacobi, 2: Block Jacobi, 3: Additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization (-1: keep the existing setting, 1: match the diagonal components, 2: match the element sum in the row)
addL integer in Overlapping width in the additive Schwartz method (-1: keep the existing setting)
crsA(method%nnz) double precision in Non-zero elements of matrix stored in the CRS format
method type(KSP) in/out Struct in the PARCEL format

## 6.9 reset_DIA_Mat

• Interface

• Function

Change the matrix elements, preconditioner, and the maximum number of iterations of solvers in the Diagonal format without changing the matrix structure or the positions of non-zero elements.

Argument name (dimension) Type Input/Output Description
itrmax integer in Maximum number of iterations (-1: keep the existing setting)
ipreflag integer in Flag for preconditioner (-1: keep the existing setting, 0: None, 1: Point Jacobi, 2: Block Jacobi, 3: Additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization (-1: keep the existing setting, 1: match the diagonal components, 2: match the element sum in the row)
addL integer in Overlapping width in the additive Schwartz method (-1: keep the existing setting)
diaMat(method%n*method%nnd) double precision in Non-zero elements of matrix stored in the CRS format
method type(KSP) in/out Struct in the PARCEL format

## 6.10 set_KSP_solver

• Interface

call set_KSP_solver(method,solver)

• Function

Set Krylov subspace method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
solver integer in Flag for Krylov subspace method (1: CG, 2: BiCGGSTAB, 3: GMRES, 4: CA-GMRES, 5: CBCG)

## 6.11 set_KSP_abstolmax(method,abstolmax)

• Interface

call set_KSP_abstolmax(method,abstolmax)

• Function

Set maximun tolerance in absolute value of residual error.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
abstolmax double precision in Convergence criterion (norm of residual error)

## 6.12 set_KSP_rtolmax

• Interface

call set_KSP_rtolmax(method,rtolmax)

• Function

Set maximum tolerance in relative error.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
rtolmax double precision in Convergence criterion (norm of relative residual error)

## 6.13 set_KSP_iovlflag

• Interface

call set_KSP_iovlflag(method,iovlflag)

• Function

Set communication-computation overlap method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)

## 6.14 set_KSP_GMRES_m(method,GMRES_m)

• Interface

call set_KSP_GMRES_m(method,GMRES_m)

• Function

Set the number of iterations m until restart in the GMRES(m) method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
GMRES_m integer in Number of iterations until restart

## 6.15 set_KSP_GMRES_GSflag

• Interface

call set_KSP_GMRES_GSflag(method,GMRES_GSflag)

• Function

Set orthogonalization algorithm in the GMRES(m) method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
GMRES_GSflag integer in Flag for orthogonalization algorithm (1: MGS, 2: CGS)

## 6.16 set_KSP_CAGMRES_sstep

• Interface

call set_KSP_CAGMRES_sstep(method,CAGMRES_sstep)

• Function

Set number of communication avoiding steps s of CA-GMRES.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CAGMRES_sstep integer in Number of communication steps of the CA-GMRES method

## 6.17 set_KSP_CAGMRES_tstep

• Interface

call set_KSP_CAGMRES_tstep(method,CAGMRES_tstep)

• Function

Set number of outer iterations t of CA-GMRES.)

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CAGMRES_tstep integer in Number of outer iterations t in the CA-GMRES method method (restart length = st)

## 6.18 set_KSP_CAGMRES_basis

• Interface

call set_KSP_CAGMRES_basis(method,CAGMRES_basis)

• Function

Set basis vector in the CA-GMRES method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CAGMRES_basis integer in Flag for basis vector in the CA-GMRES method (0: monomial basis, 1: Newton basis)

## 6.19 set_KSP_CAGMRES_QRflag

• Interface

call set_KSP_CAGMRES_QRflag(method,CAGMRES_QRflag)

• Function

Set QR factorization algorithm in the CA-GMRES method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CAGMRES_QRflag integer in Flag for QR factorization in the CA-GMRES method (1: MGS, 2: CGS, 3: TSQR, 4: CholeskyQR, 5: CholeskyQR2)

## 6.20 set_KSP_CBCG_kstep

• Interface

call set_KSP_CBCG_kstep(method,CBCG_kstep)

• Function

Set number of communication avoiding steps in the CBCG method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CBCG_kstep integer in Number of communication avoiding steps

## 6.21 set_KSP_CBCG_Eigenflag

• Interface

call set_KSP_CBCG_Eigenflag(method,CBCG_Eigenflag)

• Function

Set eigenvalue computation algorithm in the CBCG method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CBCG_Eigenflag integer in Flag for eigenvalue computation (1: power method, 2: CA-ARNOLDI)

## 6.22 set_KSP_CAARNOLDI_sstep

• Interface

call set_KSP_CAARNOLDI_sstep(method,CAARNOLDI_sstep)

• Function

Set number of communication avoiding steps s in the CA-ARNOLDI method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CAARNOLDI_sstep integer in Number of communication avoiding steps s of the CA-ARNOLDI method

## 6.23 set_KSP_CAARNOLDI_tstep

• Interface

call set_KSP_CAARNOLDI_tstep(method,CAARNOLDI_tstep)

• Function

Set number of outer iterations t in the CA-ARNOLDI method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CAARNOLDI_tstep integer in Number of outer iterations t in the CA-ARNOLDI method (restart length = st)

## 6.24 set_KSP_CAARNOLDI_basis

• Interface

call set_KSP_CAARNOLDI_basis(method,CAARNOLDI_basis)

• Function

Set basis vector in the CA-ARNOLDI method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CAARNOLDI_basis integer in Flag for basis vector in the CA-ARNOLDI method (0: monomial basis, 1: Newton basis)

## 6.25 set_KSP_CAARNOLDI_QRflag

• Interface

call set_KSP_CAARNOLDI_QRflag(method,CAARNOLDI_QRflag)

• Function

Set QR factorization algorithm in the CA-ARNOLDI method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CAARNOLDI_QRflag integer in Flag for QR factorization in the CA-ARNOLDI method (1: MGS, 2: CGS, 3: TSQR, 4: Cholesky QR, 5: Cholesky QR2)

## 6.26 set_KSP_power_method_itrmax

• Interface

call set_KSP_power_method_itrmax(method,power_method_itrmax)

• Function

Set maximum number of iterations in the power method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
power_method_itrmax integer in Maximum number of iterations in the power method

## 6.27 START_KSP_SOLVER

• Interface

call START_KSP_SOLVER(method)

• Function

Solve a simultaneous linear equation system Ax = b.

Argument name (dimension) Type Input/Output Description
method type(KSP) in/out Struct in the PARCEL format

# 7 How to use

The use of the PARCEL routines is explained by FORTRAN sample programs of a CG solver. Details of the usage of the PARCEL in C programs can be found in C sample programs stored in SPARSE/example_C in the archive. The PARCEL can be called also from C programs in the same manner, provided that a MPI communicator is prepared for FORTRAN routines. In the C sample programs, a MPI communicator created in C is converted to FORTRAN via MPI_Comm_c2f in the MPI library. In the C sample program for the indirect interface, a struct in the PARCEL format is generated in FORTRAN, and then, it is used in C programs.

## 7.1 Compressed Row Storage (CRS) Format

Sample codes in the CRS format are shown in Fig.8 and Fig.9. Here, make_matrix_CRS denotes an arbitrary routine, which generates a matrix in the CRS format in PARCEL. Fig.8 and Fig.9 show examples with the direct and indirect interfaces, respectively. In Fig.9, a compiling option to include SPARSE/src is needed. When only values of matrix elements in $A$, elements in $b$, and the initial values in $x$ are changed with the same matrix structure or the non-zero element positions, in Fig.8, parcel_dcg requires overhead for the initialization such as generating list of processes which exchange data with each other.

• Sample code 1 in the CRS format

  program main
use mpi
implicit none

integer n,gn,nnz,istart
real*8,allocatable  :: crsA(:)
integer,allocatable :: crsRow_ptr(:),crsCol(:)
real*8,allocatable  :: vecx(:)
real*8,allocatable  :: vecb(:)
integer itrmax
real*8  rtolmax
real*8, allocatable :: reshistory(:)
integer ipreflag
integer ILU_method
integer iflagAS
integer iovlflag
integer iret
integer ierr

call MPI_Init(ierr)

call make_matrix_CRS(n,gn,nnz,istart,crsA,crsRow_ptr,crsCol)

allocate(vecx(n))
allocate(vecb(n))
allocate(reshistory(itrmax))
ipreflag=0
ILU_method=1
iflagAS=1
itrmax=100
rtolmax=1.0d-8
iovlflag=0

vecb=1.0d0
vecx=1.0d0
call parcel_dcg( &
MPI_COMM_WORLD, &
vecx,vecb, &
n,gn,nnz,istart, &
crsA,crsRow_ptr,crsCol, &
itrmax,rtolmax, &
reshistory, &
iovlflag,iret &
)
call MPI_Finalize(ierr)

deallocate(vecx)
deallocate(vecb)

deallocate(crsA)
deallocate(crsRow_ptr)
deallocate(crsCol)

deallocate(reshistory)

end program main


 Fig.8 Sample code in the CRS format (direct interface)
• Sample code 2 in the CRS format

  program main
use mpi
use krylov
use krylov_kernel
implicit none

integer n,gn,nnz,istart
real*8,allocatable  :: crsA(:)
integer,allocatable :: crsRow_ptr(:),crsCol(:)
real*8,allocatable  :: vecx(:)
real*8,allocatable  :: vecb(:)
integer itrmax
real*8  rtolmax
real*8, allocatable :: reshistory(:)
integer ipreflag
integer ILU_method
integer iflagAS
integer iovlflag
integer iret
integer ierr
type(KSP) :: method

call MPI_Init(ierr)

call make_matrix_CRS(n,gn,nnz,istart,crsA,crsRow_ptr,crsCol)

allocate(vecx(n))
allocate(vecb(n))
allocate(reshistory(itrmax))

ipreflag=0
ILU_method=1
iflagAS=1
itrmax=100
rtolmax=1.0d-8
iovlflag=0
vecb=1.0d0
vecx=1.0d0

call PARCEL_KSP_Default_Setting(method)
call set_KSP_CRS(&
MPI_COMM_WORLD, &
vecx,vecb, &
n,gn,nnz,istart, &
crsA,crsRow_ptr,crsCol, &
method)
call set_KSP_solver(method,1)
call START_KSP_SOLVER(method)
call get_vecx_KSP(n,vecx,method)
call free_KSP(method)

call MPI_Finalize(ierr)

deallocate(vecx)
deallocate(vecb)

deallocate(reshistory)

end program main


 Fig.9 Sample code in the CRS format (indirect interface)

## 7.2 Diagonal Format

Sample codes in the Diagonal format are shown in Fig.10 and Fig.11. Here, make_matrix_DIA denotes an arbitrary routine, which generates a matrix in the Diagonal format in PARCEL. Fig.10 and Fig.11 show examples with the direct and indirect interfaces, respectively. In Fig.11, a compiling option to include SPARSE/src is needed. When only values of matrix elements in $A$, elements in $b$, and the initial values in $x$ are changed with the same matrix structure or the non-zero element positions, in Fig.10, parcel_dcg requires overhead for the initialization such as generating list of processes which exchange data with each other.

• Sample code 1 in the Diagonal format

  program main
use mpi
implicit none

integer n,gn,nnd,istart
real*8,allocatable  :: diaA(:)
integer,allocatable :: offset(:)
real*8,allocatable  :: vecx(:)
real*8,allocatable  :: vecb(:)
integer itrmax
real*8  rtolmax
real*8, allocatable :: reshistory(:)
integer ipreflag
integer ILU_method
integer iflagAS
integer iovlflag
integer iret
integer ierr

call MPI_Init(ierr)

call make_matrix_DIA(n,gn,nnd,istart,diaA,offset)

allocate(vecx(n))
allocate(vecb(n))
allocate(reshistory(itrmax))

ipreflag=0
ILU_method=1
iflagAS=1
itrmax=100
rtolmax=1.0d-8
iovlflag=0
vecb=1.0d0
vecx=1.0d0

call parcel_dcg_dia(&
MPI_COMM_WORLD, &
vecx,vecb,&
n,gn,istart,&
diaA,offset,nnd,&
itrmax,rtolmax,&
reshistory,&
iovlflag,iret&
)

call MPI_Finalize(ierr)

deallocate(vecx)
deallocate(vecb)

deallocate(diaA)
deallocate(offset)

deallocate(reshistory)

end program main


 Fig.10 Sample code in the Diagonal format (direct interface)
• Sample code 2 in the Diagonal format

  program main
use mpi
use krylov
use krylov_kernel
implicit none

integer n,gn,nnd,istart
real*8,allocatable  :: diaA(:)
integer,allocatable :: offset(:)
real*8,allocatable  :: vecx(:)
real*8,allocatable  :: vecb(:)
integer itrmax
real*8  rtolmax
real*8, allocatable :: reshistory(:)
integer ipreflag
integer ILU_method
integer iflagAS
integer iovlflag
integer iret
integer ierr

call MPI_Init(ierr)

call make_matrix_DIA(n,gn,nnd,istart,diaA,offset)

allocate(vecx(n))
allocate(vecb(n))
allocate(reshistory(itrmax))

ipreflag=0
ILU_method=1
iflagAS=1
itrmax=100
rtolmax=1.0d-8
iovlflag=0
vecb=1.0d0
vecx=1.0d0

call PARCEL_KSP_Default_Setting(method)
call set_KSP_DIA(&
MPI_COMM_WORLD, &
vecx,vecb, &
n,gn,istart, &
diaA,offset,nnd,&
method)
call set_KSP_solver(method,1)
call START_KSP_SOLVER(method)
call get_vecx_KSP(n,vecx,method)
call free_KSP(method)

call MPI_Finalize(ierr)

deallocate(vecx)
deallocate(vecb)

deallocate(reshistory)

end program main


 Fig.11 Sample code in the Diagonal format (indirect interface)