1 Krylov subspace methods

This chapter describes the iterative methods contained in this library. The following notation is used, Ax=bAx = b represents the linear system which will be solved, with AA being an nร—\timesn sparse matrix and x0x_0 representing the initial vector. KK represents an appropriate preconditioner.

1.1 The Preconditioned Conjugate Gradient Method

The Conjugate Gradient (CG) method is an iterative Krylov subspace algorithm for solving symmetric matrices.

  1. ๐‚๐จ๐ฆ๐ฉ๐ฎ๐ญ๐ž๐ซ0=๐›โˆ’๐€๐ฑ0๐Ÿ๐จ๐ซ๐ฌ๐จ๐ฆ๐ž๐ข๐ง๐ข๐ญ๐ข๐š๐ฅ๐ ๐ฎ๐ž๐ฌ๐ฌ๐ฑโˆ’1=๐ฑ0\mathbf{{Compute\ }}\mathbf{r}^{\mathbf{0}}\mathbf{= b - A}\mathbf{x}^{\mathbf{0}}\mathbf{{\ for\ some\ initial\ guess\ }}\mathbf{x}^{\mathbf{- 1}}\mathbf{=}\mathbf{x}^{\mathbf{0}}\mathbf{\ }
  2. ๐ฉโˆ’1=0\mathbf{p}^{\mathbf{- 1}}\mathbf{= 0}
  3. ๐›‚โˆ’1=0\mathbf{\alpha}_{\mathbf{- 1}}\mathbf{= 0\ }\mathbf{\ }
  4. ๐›ƒโˆ’1=0\mathbf{\beta}_{\mathbf{- 1}}\mathbf{= 0}
  5. ๐ฌ๐จ๐ฅ๐ฏ๐ž๐ฌ๐Ÿ๐ซ๐จ๐ฆ๐Š๐ฌ=๐ซ0\mathbf{{solve\ s\ from\ }}\mathbf{K}\mathbf{s = \ }\mathbf{r}^{\mathbf{0}}
  6. ๐›’0=โŸจ๐ฌ,๐ซ0โŸฉ\mathbf{\rho}_{\mathbf{0}}\mathbf{=}\left\langle \mathbf{s,}\mathbf{r}^{\mathbf{0}} \right\rangle
  7. ๐Ÿ๐จ๐ซ๐ข=0,1,2,3โ€ฆ\mathbf{for\ i = 0,1,2,3}\mathbf{\ldots}
  8. ใ€€ใ€€๐ฉ๐ข=๐ฌ+๐›ƒ๐ขโˆ’1๐ฉ๐ขโˆ’1\mathbf{p}^{\mathbf{i}}\mathbf{= s +}\mathbf{\beta}_{\mathbf{i - 1}}\mathbf{p}^{\mathbf{i - 1}}
  9. ใ€€ใ€€๐ช๐ข=๐€๐ฉ๐ข\mathbf{q}^{\mathbf{i}}\mathbf{= A}\mathbf{p}^{\mathbf{i}}
  10. ใ€€ใ€€๐›„=โŸจ๐ฉ๐ข,๐ช๐ขโŸฉ\mathbf{\gamma =}\left\langle \mathbf{p}^{\mathbf{i}}\mathbf{,}\mathbf{q}^{\mathbf{i}} \right\rangle
  11. ใ€€ใ€€๐ฑ๐ข=๐ฑ๐ขโˆ’1+๐›‚๐ขโˆ’1๐ฉ๐ขโˆ’1\mathbf{x}^{\mathbf{i}}\mathbf{=}\mathbf{x}^{\mathbf{i - 1}}\mathbf{+}\mathbf{\alpha}_{\mathbf{i - 1}}\mathbf{p}^{\mathbf{i - 1}}
  12. ใ€€ใ€€๐›‚๐ข=๐›’๐ข/๐›„\mathbf{\alpha}_{\mathbf{i}}\mathbf{=}\mathbf{\rho}_{\mathbf{i}}\mathbf{\ /\ \gamma}
  13. ใ€€ใ€€๐ซ๐ข+1=๐ซ๐ขโˆ’๐›‚๐ข๐ช๐ข\mathbf{r}^{\mathbf{i + 1}}\mathbf{=}\mathbf{r}^{\mathbf{i}}\mathbf{-}\mathbf{\alpha}_{\mathbf{i}}\mathbf{q}^{\mathbf{i}}
  14. ใ€€ใ€€๐ฌ๐จ๐ฅ๐ฏ๐ž๐ฌ๐Ÿ๐ซ๐จ๐ฆ๐Š๐ฌ=๐ซ๐ข+1\mathbf{{solve\ s\ from\ }}\mathbf{K}\mathbf{s = \ }\mathbf{r}^{\mathbf{i + 1}}
  15. ใ€€ใ€€๐›’๐ข+1=โŸจ๐ฌ,๐ซ๐ข+1โŸฉ\mathbf{\rho}_{\mathbf{i + 1}}\mathbf{=}\left\langle \mathbf{s,}\mathbf{r}^{\mathbf{i + 1}} \right\rangle
  16. ใ€€ใ€€๐ข๐Ÿโˆฅ๐ซ๐ข+1โˆฅ/โˆฅ๐›โˆฅ๐ฌ๐ฆ๐š๐ฅ๐ฅ๐ž๐ง๐จ๐ฎ๐ ๐ก๐ญ๐ก๐ž๐ง\mathbf{{if\ }}\left\| \mathbf{r}^{\mathbf{i + 1}} \right\|\mathbf{\ /\ }\left\| \mathbf{b} \right\|\mathbf{{\ small\ enough\ the}}\mathbf{n}
  17. ใ€€ใ€€ใ€€ใ€€๐ฑ๐ข=๐ฑ๐ขโˆ’1+๐›‚๐ขโˆ’1๐ฉ๐ขโˆ’1\mathbf{x}^{\mathbf{i}}\mathbf{=}\mathbf{x}^{\mathbf{i - 1}}\mathbf{+}\mathbf{\alpha}_{\mathbf{i - 1}}\mathbf{p}^{\mathbf{i - 1}}
  18. ใ€€ใ€€ใ€€ใ€€๐ช๐ฎ๐ข๐ญ\mathbf{{qui}}\mathbf{t}
  19. ใ€€ใ€€๐ž๐ง๐๐ข๐Ÿ\mathbf{{endi}}\mathbf{f}
  20. ใ€€ใ€€๐›ƒ๐ข=๐›’๐ข+1/๐›’๐ข\mathbf{\beta}_{\mathbf{i}}\mathbf{=}\mathbf{\rho}_{\mathbf{i + 1}}\mathbf{\ /\ }\mathbf{\rho}_{\mathbf{i}}
  21. ๐ž๐ง๐\mathbf{{\ en}}\mathbf{d}

1.2 The Preconditioned Biconjugate Gradient Stabilized Method

The Biconjugate Gradient Stabilized (Bi-CGSTAB) method is an iterative Krylov subspace algorithm for solving nonsymmetric matrices.

  1. ๐‚๐จ๐ฆ๐ฉ๐ฎ๐ญ๐ž๐ซ0=๐›โˆ’๐€๐ฑ0๐Ÿ๐จ๐ซ๐ฌ๐จ๐ฆ๐ž๐ข๐ง๐ข๐ญ๐ข๐š๐ฅ๐ ๐ฎ๐ž๐ฌ๐ฌ๐ฑโˆ’1=๐ฑ0\mathbf{{Compute\ }}\mathbf{r}_{\mathbf{0}}\mathbf{= b - A}\mathbf{x}_{\mathbf{0}}\mathbf{{\ for\ some\ initial\ guess\ }}\mathbf{x}_{\mathbf{- 1}}\mathbf{=}\mathbf{x}_{\mathbf{0}}
  2. ๐ฉ0=๐ซ0\mathbf{p}_{\mathbf{0}}\mathbf{=}\mathbf{r}_{\mathbf{0}}
  3. ๐œ1=โŸจ๐ซ0,๐ซ0โŸฉ\mathbf{c}_{\mathbf{1}}\mathbf{=}\left\langle \mathbf{r}_{\mathbf{0}}\mathbf{,}\mathbf{r}_{\mathbf{0}} \right\rangle
  4. ๐Ÿ๐จ๐ซ๐ข=0,1,2,3โ€ฆ\mathbf{for\ i = 0,1,2,3}\mathbf{\ldots}
  5. ใ€€ใ€€๐ฌ๐จ๐ฅ๐ฏ๐ž๐ฉฬ‚๐Ÿ๐ซ๐จ๐ฆ๐Š๐ฉฬ‚=๐ฉ๐ข\mathbf{{solve\ }}\widehat{\mathbf{p}}\mathbf{{\ from\ K}}\widehat{\mathbf{p}}\mathbf{= \ }\mathbf{p}_{\mathbf{i}}
  6. ใ€€ใ€€๐ช=๐€๐ฉฬ‚\mathbf{q = A}\widehat{\mathbf{p}}
  7. ใ€€ใ€€๐œ2=โŸจ๐ซ0,๐ชโŸฉ\mathbf{c}_{\mathbf{2}}\mathbf{=}\left\langle \mathbf{r}_{\mathbf{0}}\mathbf{,q} \right\rangle
  8. ใ€€ใ€€๐›‚=๐œ1/๐œ2\mathbf{\alpha =}\mathbf{c}_{\mathbf{1}}\mathbf{\ /\ }\mathbf{c}_{\mathbf{2}}
  9. ใ€€ใ€€๐ž=๐ซ๐ขโˆ’๐›‚๐ช\mathbf{e =}\mathbf{r}_{\mathbf{i}}\mathbf{- \alpha q}
  10. ใ€€ใ€€๐ฌ๐จ๐ฅ๐ฏ๐ž๐žฬ‚๐Ÿ๐ซ๐จ๐ฆ๐Š๐žฬ‚=๐ž\mathbf{{solve\ }}\widehat{\mathbf{e}}\mathbf{{\ from\ K}}\widehat{\mathbf{e}}\mathbf{= e}
  11. ใ€€ใ€€๐ฏ=๐€๐žฬ‚\mathbf{v = A}\widehat{\mathbf{e}}
  12. ใ€€ใ€€๐œ3=โŸจ๐ž,๐ฏโŸฉ/โŸจ๐ฏ,๐ฏโŸฉ\mathbf{c}_{\mathbf{3}}\mathbf{=}\left\langle \mathbf{e,v} \right\rangle\mathbf{\ /\ }\left\langle \mathbf{v,v} \right\rangle
  13. ใ€€ใ€€๐ฑ๐ข+1=๐ฑ๐ข+๐›‚๐ฉฬ‚+๐œ3๐žฬ‚\mathbf{x}_{\mathbf{i + 1}}\mathbf{=}\mathbf{x}_{\mathbf{i}}\mathbf{+ \alpha}\widehat{\mathbf{p}}\mathbf{+}\mathbf{c}_{\mathbf{3}}\widehat{\mathbf{e}}
  14. ใ€€ใ€€๐ซ๐ข+1=๐žโˆ’๐œ3๐ฏ\mathbf{r}_{\mathbf{i + 1}}\mathbf{= e -}\mathbf{c}_{\mathbf{3}}\mathbf{v}
  15. ใ€€ใ€€๐œ1=โŸจ๐ซ0,๐ซ๐ข+1โŸฉ\mathbf{c}_{\mathbf{1}}\mathbf{=}\left\langle \mathbf{r}_{\mathbf{0}}\mathbf{,}\mathbf{r}_{\mathbf{i + 1}} \right\rangle
  16. ใ€€ใ€€๐ข๐Ÿโˆฅ๐ซ๐ข+1โˆฅ/โˆฅ๐›โˆฅ๐ฌ๐ฆ๐š๐ฅ๐ฅ๐ž๐ง๐จ๐ฎ๐ ๐ก๐ญ๐ก๐ž๐ง\mathbf{{if\ }}\left\| \mathbf{r}_{\mathbf{i + 1}} \right\|\mathbf{\ /\ }\left\| \mathbf{b} \right\|\mathbf{{\ small\ enough\ }}\mathbf{{the}}\mathbf{n}
  17. ใ€€ใ€€ใ€€ใ€€๐ช๐ฎ๐ข๐ญ\mathbf{{quit}}
  18. ใ€€ใ€€๐ž๐ง๐๐ข๐Ÿ\mathbf{{endif}}
  19. ใ€€ใ€€๐›ƒ=๐œ1/(๐œ2๐œ3)\mathbf{\beta =}\mathbf{c}_{\mathbf{1}}\mathbf{\ /\ }\left( \mathbf{c}_{\mathbf{2}}\mathbf{c}_{\mathbf{3}} \right)
  20. ใ€€ใ€€๐ฉ๐ข+1=๐ซ๐ข+1+๐›ƒ(๐ฉ๐ขโˆ’๐œ3๐ช)\mathbf{p}_{\mathbf{i + 1}}\mathbf{=}\mathbf{r}_{\mathbf{i + 1}}\mathbf{+ \beta}\left( \mathbf{p}_{\mathbf{i}}\mathbf{-}\mathbf{c}_{\mathbf{3}}\mathbf{q} \right)
  21. ๐ž๐ง๐\mathbf{{end}}

1.3 The Preconditioned Generalized Minimum Residual Method

The Generalized Minimum Residual (GMRES(m)) method is an iterative Krylov subspace algorithm for solving nonsymmetric matrices. GMRES(m) is restarted periodically after mm iterations if it did not converge within mm iterations, the solution obtained until mm iterations will be used as an input for the new restart cycle. PARCEL provides variants of GMRES with the Classical Gram-Schmidt (sometimes referred to as standard Gram-Schmidt) and Modified Gram-Schmidt (MGS) orthogonalization methods.

๐‡๐ง\mathbf{H}_{\mathbf{n}} is an upper triangular matrix with ๐ก๐ฃ,๐ค\mathbf{h}_{\mathbf{j,k}} representing its elements, ๐ž๐ข\mathbf{\ }\mathbf{e}_{\mathbf{i}} represents a vector consisting of the first i elements of the vector ๐ž\mathbf{e}.

  1. ๐Ÿ๐จ๐ซ๐ฃ=0,1,2,3โ€ฆ\mathbf{for\ j = 0,1,2,3}\mathbf{\ldots}
  2. ใ€€ใ€€๐ซ=๐›โˆ’๐€๐ฑ0\mathbf{r = b - A}\mathbf{x}_{\mathbf{0}}
  3. ใ€€ใ€€๐ฏ1=โˆ’๐ซ/โˆฅ๐ซโˆฅ\mathbf{v}_{\mathbf{1}}\mathbf{= - r\ /\ }\left\| \mathbf{r} \right\|
  4. ใ€€ใ€€๐ž=(โˆ’โˆฅ๐ซโˆฅ,0,โ€ฆ,0)๐“\mathbf{e =}\left( \mathbf{-}\left\| \mathbf{r} \right\|\mathbf{,0,\ldots,0} \right)^{\mathbf{T}}
  5. ใ€€ใ€€๐ง=๐ฆ\mathbf{n = m}
  6. ใ€€ใ€€๐Ÿ๐จ๐ซ๐ข=1,2,3โ€ฆ๐ฆ\mathbf{for\ i = 1,2,3}\mathbf{\ldots}\mathbf{m}
  7. ใ€€ ใ€€ใ€€ใ€€๐ฌ๐จ๐ฅ๐ฏ๐ž๐ฏฬ‚๐ข๐Ÿ๐ซ๐จ๐ฆ๐Š๐ฏฬ‚๐ข=๐ฏ๐ข\mathbf{{solve\ }}{\widehat{\mathbf{v}}}_{\mathbf{i}}\mathbf{{\ from\ K}}{\widehat{\mathbf{v}}}_{\mathbf{i}}\mathbf{=}\mathbf{v}_{\mathbf{i}}
  8. ใ€€ใ€€ ใ€€ใ€€๐›š=๐€๐ฏฬ‚๐ข\mathbf{\omega = A}{\widehat{\mathbf{v}}}_{\mathbf{i}}
  9. ใ€€ใ€€ใ€€ ใ€€๐Ÿ๐จ๐ซ๐ค=1,2,3โ€ฆ๐ข\mathbf{for\ k = 1,2,3}\mathbf{\ldots}\mathbf{i}
  10. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐ก๐ค,๐ข=โŸจ๐›š,๐ฏ๐คโŸฉ\mathbf{h}_{\mathbf{k,i}}\mathbf{=}\left\langle \mathbf{\omega,}\mathbf{v}_{\mathbf{k}} \right\rangle
  11. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐›š=๐›šโˆ’๐ก๐ค,๐ข๐ฏ๐ค\mathbf{\omega = \omega -}\mathbf{h}_{\mathbf{k,i}}\mathbf{v}_{\mathbf{k}}
  12. ใ€€ใ€€ใ€€ใ€€๐ž๐ง๐\mathbf{{end}}
  13. ใ€€ใ€€ใ€€ใ€€๐ก๐ข+1,๐ข=โˆฅ๐›šโˆฅ\mathbf{h}_{\mathbf{i + 1,i}}\mathbf{=}\left\| \mathbf{\omega} \right\|
  14. ใ€€ใ€€ใ€€ใ€€๐ฏ๐ข+1=๐›š/โˆฅ๐›šโˆฅ\mathbf{v}_{\mathbf{i + 1}}\mathbf{= \omega/}\left\| \mathbf{\omega} \right\|
  15. ใ€€ใ€€ใ€€ใ€€๐Ÿ๐จ๐ซ๐ค=1,2,3โ€ฆ๐ขโˆ’1\mathbf{for\ k = 1,2,3}\mathbf{\ldots}\mathbf{i - 1}
  16. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€(๐ก๐ค,๐ข๐ก๐ค+1,๐ข)=(๐œ๐ขโˆ’๐ฌ๐ข๐ฌ๐ข๐œ๐ข)(๐ก๐ค,๐ข๐ก๐ค+1,๐ข)\begin{pmatrix} \mathbf{h}_{\mathbf{k,i}} \\ \mathbf{h}_{\mathbf{k + 1,i}} \\ \end{pmatrix}\mathbf{=}\begin{pmatrix} \mathbf{c}_{\mathbf{i}} & \mathbf{- s}_{\mathbf{i}} \\ \mathbf{s}_{\mathbf{i}} & \mathbf{c}_{\mathbf{i}} \\ \end{pmatrix}\begin{pmatrix} \mathbf{h}_{\mathbf{k,i}} \\ \mathbf{h}_{\mathbf{k + 1,i}} \\ \end{pmatrix}
  17. ใ€€ใ€€ใ€€ใ€€๐ž๐ง๐\mathbf{{end}}
  18. ใ€€ใ€€ใ€€ใ€€๐œ๐ข=11+(๐ก๐ข+1,๐ข๐ก๐ข,๐ข)2\mathbf{c}_{\mathbf{i}}\mathbf{=}\frac{\mathbf{1}}{\sqrt{\mathbf{1 +}\left( \frac{\mathbf{h}_{\mathbf{i + 1,i}}}{\mathbf{h}_{\mathbf{i,i}}} \right)^{\mathbf{2}}}}
  19. ใ€€ใ€€ใ€€ใ€€๐ฌ๐ข=โˆ’๐ก๐ข+1,๐ข๐ก๐ข,๐ข11+(๐ก๐ข+1,๐ข๐ก๐ข,๐ข)2\mathbf{s}_{\mathbf{i}}\mathbf{= -}\frac{\mathbf{h}_{\mathbf{i + 1,i}}}{\mathbf{h}_{\mathbf{i,i}}}\frac{\mathbf{1}}{\sqrt{\mathbf{1 +}\left( \frac{\mathbf{h}_{\mathbf{i + 1,i}}}{\mathbf{h}_{\mathbf{i,i}}} \right)^{\mathbf{2}}}}
  20. ใ€€ใ€€ใ€€ใ€€(๐ž๐ข๐ž๐ข+1)=(๐œ๐ขโˆ’๐ฌ๐ข๐ฌ๐ข๐œ๐ข)(๐ž๐ข๐ž๐ข+1)\begin{pmatrix} \mathbf{e}_{\mathbf{i}} \\ \mathbf{e}_{\mathbf{i + 1}} \\ \end{pmatrix}\mathbf{=}\begin{pmatrix} \mathbf{c}_{\mathbf{i}} & \mathbf{- s}_{\mathbf{i}} \\ \mathbf{s}_{\mathbf{i}} & \mathbf{c}_{\mathbf{i}} \\ \end{pmatrix}\begin{pmatrix} \mathbf{e}_{\mathbf{i}} \\ \mathbf{e}_{\mathbf{i + 1}} \\ \end{pmatrix}
  21. ใ€€ใ€€ใ€€ใ€€๐ก๐ข,๐ข=๐œ๐ข๐ก๐ข,๐ขโˆ’๐ฌ๐ข๐ก๐ข+1,๐ข\mathbf{h}_{\mathbf{i,i}}\mathbf{=}\mathbf{c}_{\mathbf{i}}\mathbf{h}_{\mathbf{i,i}}\mathbf{- \ }\mathbf{s}_{\mathbf{i}}\mathbf{h}_{\mathbf{i + 1,i}}
  22. ใ€€ใ€€ใ€€ใ€€๐ก๐ข+1,๐ข=0\mathbf{h}_{\mathbf{i + 1,i}}\mathbf{= 0}
  23. ใ€€ใ€€ใ€€ใ€€๐ข๐Ÿโˆฅ๐ž๐ข+1โˆฅ/โˆฅ๐›โˆฅ๐ฌ๐ฆ๐š๐ฅ๐ฅ๐ž๐ง๐จ๐ฎ๐ ๐ก๐ญ๐ก๐ž๐ง\mathbf{{if\ }}\left\| \mathbf{e}_{\mathbf{i + 1}} \right\|\mathbf{\ /\ }\left\| \mathbf{b} \right\|\mathbf{{\ small\ enough\ }}\mathbf{{then}}
  24. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐ง=๐ข\mathbf{n = i}
  25. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐ž๐ฑ๐ข๐ญ\mathbf{{exit}}
  26. ใ€€ใ€€ใ€€ใ€€๐ž๐ง๐๐ข๐Ÿ\mathbf{{endif}}
  27. ใ€€ใ€€๐ž๐ง๐\mathbf{{end}}
  28. ใ€€ใ€€๐ฒ๐ง=๐‡๐งโˆ’1๐ž๐ง\mathbf{y}_{\mathbf{n}}\mathbf{=}\mathbf{H}_{\mathbf{n}}^{\mathbf{- 1}}\mathbf{e}_{\mathbf{n}}
  29. ใ€€ใ€€๐ฌ๐จ๐ฅ๐ฏ๐ž๐ฑฬ‚๐Ÿ๐ซ๐จ๐ฆ๐Š๐ฑฬ‚=โˆ‘๐ค=1๐ง๐ฒ๐ค๐ฏ๐ค\mathbf{{solve\ }}\widehat{\mathbf{x}}\mathbf{{\ from\ K}}\widehat{\mathbf{x}}\mathbf{=}\sum_{\mathbf{k = 1}}^{\mathbf{n}}{\mathbf{y}_{\mathbf{k}}\mathbf{v}_{\mathbf{k}}}
  30. ใ€€ใ€€๐ฑ๐ง=๐ฑ0+๐ฑฬ‚\mathbf{x}_{\mathbf{n}}\mathbf{=}\mathbf{x}_{\mathbf{0}}\mathbf{+}\widehat{\mathbf{x}}
  31. ใ€€ใ€€๐ฑ0=๐ฑ๐ง\mathbf{x}_{\mathbf{0}}\mathbf{=}\mathbf{x}_{\mathbf{n}}
  32. ๐ž๐ง๐\mathbf{{end}}

1.4 The Preconditioned Communication Avoiding Generalized Minimum Residual Method

The Communication Avoiding Generalized Minimum Residual (CA-GMRES(s,t)) method is an iterative Krylov subspace algorithm for solving nonsymmetric matrices. The calculation which is done in ss iterations in the GMRES(m) method is equivalent to one iteration of the CA-GMRES method. CA-GMRES(s,t) is restarted periodically after tt iterations if it did not converge within tt iterations, the solution obtained until mm iterations will be used as an input for the new restart cycle. CA-GMRES(s,t) is equivalent to GMRES(m) if ss and tt are chosen so that m=tร—sm = t \times s. The convergence property of CA-GMRES(s,t) is the same as GMRES(m) in exact arithmetic. In addition, the number of global collective communication calls is reduced by communication avoiding QR factorization. As CA-GMRES(s,t) produces ss basis vectors at once, when ss is too large, the linear independency of the basic vectors may become worse because of round off errors, leading to worse convergence property than GMRES(m). In order to improve the orthogonality of the basic vectors, the PARCEL implementation of CA-GMRES(s,t) provides an option to use the Newton basis in addition to the monomial basis.

๐ž\mathbf{e} represents a unit vector, ๐›’ฬƒ๐ค=๐‘๐ค{\widetilde{\mathbf{\rho}}}_{\mathbf{k}}\mathbf{=}\mathbf{R}_{\mathbf{k}} represents the ss-th row and ss-th column element of the matrix ๐‘๐ค\mathbf{R}_{\mathbf{k}}, and ๐„\mathbf{E} represents the eigenvalues obtained from the Hessenberg matrix, which is generated by the iteration process.

  1. ๐๐คโˆผ=[๐ž2,๐ž3,โ€ฆ,๐ž๐ฌ+1]\mathbf{B}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{= \lbrack}\mathbf{e}_{\mathbf{2}}\mathbf{,}\mathbf{e}_{\mathbf{3}}\mathbf{,\ldots,}\mathbf{e}_{\mathbf{s + 1}}\mathbf{\rbrack}
  2. ๐Ÿ๐จ๐ซ๐ฃ=0,1,2,3โ€ฆ\mathbf{for\ j = 0,1,2,3\ldots}
  3. ใ€€ใ€€๐ซ=๐›โˆ’๐€๐ฑ0\mathbf{r = b - A}\mathbf{x}_{\mathbf{0}}
  4. ใ€€ใ€€๐ฏ1=๐ซ/โˆฅ๐ซโˆฅ\mathbf{v}_{\mathbf{1}}\mathbf{= r\ /\ }\left\| \mathbf{r} \right\|
  5. ใ€€ใ€€๐›‡=(โˆฅ๐ซโˆฅ,0,โ€ฆ,0)๐“\mathbf{\zeta =}\left( \left\| \mathbf{r} \right\|\mathbf{,0,\ldots,0} \right)^{\mathbf{T}}
  6. ใ€€ใ€€๐Ÿ๐จ๐ซ๐ค=0,1โ€ฆ,๐ญโˆ’1\mathbf{for\ k = 0,1\ldots,t - 1}
  7. ใ€€ใ€€ใ€€ใ€€๐…๐ข๐ฑ๐›๐š๐ฌ๐ข๐ฌ๐œ๐จ๐ง๐ฏ๐ž๐ซ๐ฌ๐ข๐จ๐ง๐ฆ๐š๐ญ๐ซ๐ข๐ฑ[๐๐คโˆผ,๐„]\mathbf{Fix\ basis\ conversion\ matrix\ \lbrack}\mathbf{B}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{,E\rbrack}
  8. ใ€€ใ€€ใ€€ใ€€๐‚๐จ๐ฆ๐ฎ๐ฉ๐ฎ๐ญ๐ž๐›๐š๐ฌ๐ข๐œ๐ฏ๐ž๐œ๐ญ๐จ๐ซ[๐ฌ,๐•ฬ๐คโˆผ,๐๐คโˆผ]\mathbf{Comupute\ \ basic\ vector\ \ \lbrack\ s,}{\acute{\mathbf{V}}}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{,}\mathbf{B}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{\ \rbrack}
  9. ใ€€ใ€€ใ€€ใ€€๐ข๐Ÿ(๐ค.๐ž๐ช.0)\mathbf{if(k.eq.0)}
  10. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐๐‘๐๐ž๐œ๐จ๐ฆ๐ฉ๐จ๐ฌ๐ข๐ญ๐ข๐จ๐ง๐•0โˆผ=๐0โˆผ๐‘0โˆผ\mathbf{{QR\ decomposition\ }}\mathbf{V}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{=}\mathbf{Q}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{R}_{\mathbf{0}}^{\mathbf{\sim}}
  11. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐””0โˆผ=๐0โˆผ\mathfrak{Q}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{=}\mathbf{Q}_{\mathbf{0}}^{\mathbf{\sim}}
  12. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€โ„Œ0โˆผ=๐‘0โˆผ๐0โˆผ๐‘0โˆ’1\mathfrak{H}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{=}\mathbf{R}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{B}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{R}_{\mathbf{0}}^{\mathbf{- 1}}
  13. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€โ„‹0=โ„Œ0โˆผ\mathcal{H}_{\mathbf{0}}\mathbf{=}\mathfrak{H}_{\mathbf{0}}^{\mathbf{\sim}}
  14. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐Ÿ๐จ๐ซ๐จ=1,2โ€ฆ,๐ฌ\mathbf{for\ o = 1,2\ldots,s}
  15. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐†๐ข๐ฏ๐ž๐ง๐ฌ๐ซ๐จ๐ญ๐š๐ญ๐ข๐จ๐ง[๐จ,โ„‹0,๐›‡]\mathbf{Givens\ rotation\ \lbrack o,}\mathcal{H}_{\mathbf{0}}\mathbf{,\zeta\rbrack}
  16. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐ข๐Ÿโˆฅ๐›‡๐จ+1โˆฅ/โˆฅ๐›โˆฅ๐ฌ๐ฆ๐š๐ฅ๐ฅ๐ž๐ง๐จ๐ฎ๐ ๐ก๐ญ๐ก๐ž๐ง\mathbf{{if\ }}\left\| \mathbf{\zeta}_{\mathbf{o + 1}} \right\|\mathbf{\ /\ }\left\| \mathbf{b} \right\|\mathbf{{\ small\ enough\ then}}
  17. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐ฎ๐ฉ๐๐š๐ญ๐ž๐ฌ๐จ๐ฅ๐ฎ๐ญ๐ข๐จ๐ง๐ฏ๐ž๐œ๐ญ๐จ๐ซ[๐ฌ,๐ค,๐จ,๐•ฬ0โˆผ,๐ฬ0โˆผ,๐‘0,โ„‹0,๐›‡,๐ฑ0]\mathbf{update\ \ solution\ \ vector\lbrack s,k,o,}{\acute{\mathbf{V}}}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{,}{\acute{\mathbf{Q}}}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{,}\mathbf{R}_{\mathbf{0}}\mathbf{,}\mathcal{H}_{\mathbf{0}}\mathbf{,\zeta,}\mathbf{x}_{\mathbf{0}}\mathbf{\ \rbrack}
  18. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐ช๐ฎ๐ข๐ญ\mathbf{{quit}}
  19. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐ž๐ง๐๐ข๐Ÿ\mathbf{{endif}}
  20. ใ€€ใ€€ใ€€ใ€€๐ž๐ฅ๐ฌ๐ž\mathbf{{else}}
  21. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€โ„œฬ๐คโˆ’1,๐คโˆผ=(๐””๐คโˆ’1โˆผ)๐‡๐•ฬ๐คโˆผ{\acute{\mathfrak{R}}}_{\mathbf{k - 1,k}}^{\mathbf{\sim}}\mathbf{=}\left( \mathfrak{Q}_{\mathbf{k - 1}}^{\mathbf{\sim}} \right)^{\mathbf{H}}{\acute{\mathbf{V}}}_{\mathbf{k}}^{\mathbf{\sim}}
  22. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐•ฬ๐คโˆผ=๐•ฬ๐คโˆผโˆ’๐””๐คโˆ’1โˆผโ„œฬ๐คโˆ’1,๐คโˆผ{\acute{\mathbf{V}}}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{=}{\acute{\mathbf{V}}}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{-}\mathfrak{Q}_{\mathbf{k - 1}}^{\mathbf{\sim}}{\acute{\mathfrak{R}}}_{\mathbf{k - 1,k}}^{\mathbf{\sim}}
  23. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐๐‘๐๐ž๐œ๐จ๐ฆ๐ฉ๐จ๐ฌ๐ข๐ญ๐ข๐จ๐ง๐•ฬ๐คโˆผ=๐ฬ๐คโˆผ๐‘ฬ๐คโˆผ\mathbf{{QR\ decomposition\ }}{\acute{\mathbf{V}}}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{=}{\acute{\mathbf{Q}}}_{\mathbf{k}}^{\mathbf{\sim}}{\acute{\mathbf{R}}}_{\mathbf{k}}^{\mathbf{\sim}}
  24. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐‘๐คโˆผ=(๐‘๐ค๐ณ๐ค01,๐ค๐›’)\mathbf{R}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{=}\begin{pmatrix} \mathbf{R}_{\mathbf{k}} & \mathbf{z}_{\mathbf{k}} \\ \mathbf{0}_{\mathbf{1,k}} & \mathbf{\rho} \\ \end{pmatrix}
  25. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€โ„Œ๐คโˆ’1,๐คโˆผ=โˆ’โ„Œ๐คโˆ’1โ„œ๐คโˆ’1,๐ค๐‘๐คโˆ’1+โ„œ๐คโˆ’1,๐คโˆผ๐๐คโˆผ๐‘๐คโˆ’1\mathfrak{H}_{\mathbf{k - 1,k}}^{\mathbf{\sim}}\mathbf{= -}\mathfrak{H}_{\mathbf{k - 1}}\mathfrak{R}_{\mathbf{k - 1,k}}\mathbf{R}_{\mathbf{k}}^{\mathbf{- 1}}\mathbf{+}\mathfrak{R}_{\mathbf{k - 1,k}}^{\mathbf{\sim}}\mathbf{B}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{R}_{\mathbf{k}}^{\mathbf{- 1}}
  26. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐‡๐ค=๐‘๐ค๐๐ค๐‘๐คโˆ’1+๐›’ฬƒ๐คโˆ’1๐›๐ค๐ณ๐ค๐ž๐ฌ๐“โˆ’๐ก๐คโˆ’1๐ž1๐ž๐ฌ(๐คโˆ’1)๐“โ„œ๐คโˆ’1,๐ค๐‘๐คโˆ’1\mathbf{H}_{\mathbf{k}}\mathbf{=}\mathbf{R}_{\mathbf{k}}\mathbf{B}_{\mathbf{k}}\mathbf{R}_{\mathbf{k}}^{\mathbf{- 1}}\mathbf{+}{\widetilde{\mathbf{\rho}}}_{\mathbf{k}}^{\mathbf{- 1}}\mathbf{b}_{\mathbf{k}}\mathbf{z}_{\mathbf{k}}\mathbf{e}_{\mathbf{s}}^{\mathbf{T}}\mathbf{-}\mathbf{h}_{\mathbf{k - 1}}\mathbf{e}_{\mathbf{1}}\mathbf{e}_{\mathbf{s(k - 1)}}^{\mathbf{T}}\mathfrak{R}_{\mathbf{k - 1,k}}\mathbf{R}_{\mathbf{k}}^{\mathbf{- 1}}
  27. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐ก๐ค=๐›’ฬƒ๐คโˆ’1๐›’๐ค๐›๐ค\mathbf{h}_{\mathbf{k}}\mathbf{=}{\widetilde{\mathbf{\rho}}}_{\mathbf{k}}^{\mathbf{- 1}}\mathbf{\rho}_{\mathbf{k}}\mathbf{b}_{\mathbf{k}}
  28. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€โ„Œ๐คโˆผ=(โ„Œ๐คโˆ’1โ„Œ๐คโˆ’1,๐คโˆผ๐ก๐คโˆ’1๐ž1๐ž๐ฌ(๐คโˆ’1)๐“01,๐ฌ(๐คโˆ’1)๐‡๐ค๐ก๐ค๐ž๐ฌ๐“)\mathfrak{H}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{=}\begin{pmatrix} \mathfrak{H}_{\mathbf{k - 1}} & \mathfrak{H}_{\mathbf{k - 1,k}}^{\mathbf{\sim}} \\ \begin{matrix} \mathbf{h}_{\mathbf{k - 1}}\mathbf{e}_{\mathbf{1}}\mathbf{e}_{\mathbf{s(k - 1)}}^{\mathbf{T}} \\ \mathbf{0}_{\mathbf{1,s(k - 1)}} \\ \end{matrix} & \begin{matrix} \mathbf{H}_{\mathbf{k}} \\ \mathbf{h}_{\mathbf{k}}\mathbf{e}_{\mathbf{s}}^{\mathbf{T}} \\ \end{matrix} \\ \end{pmatrix}
  29. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€โ„‹๐ค=(โ„Œ๐คโˆ’1,๐คโˆผ๐‡๐ค๐ก๐ค๐ž๐ฌ๐“)\mathcal{H}_{\mathbf{k}}\mathbf{=}\begin{pmatrix} \mathfrak{H}_{\mathbf{k - 1,k}}^{\mathbf{\sim}} \\ \mathbf{H}_{\mathbf{k}} \\ \mathbf{h}_{\mathbf{k}}\mathbf{e}_{\mathbf{s}}^{\mathbf{T}} \\ \end{pmatrix}
  30. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐Ÿ๐จ๐ซ๐จ=1+๐ฌ๐ค,โ€ฆ,๐ฌ(๐ค+1)\mathbf{for\ o = 1 + sk,\ldots,s(k + 1)}
  31. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐†๐ข๐ฏ๐ž๐ง๐ฌ๐ซ๐จ๐ญ๐š๐ญ๐ข๐จ๐ง[๐จ,โ„‹๐ค,๐›‡]\mathbf{Givens\ rotation\ \lbrack o,}\mathcal{H}_{\mathbf{k}}\mathbf{,\zeta\rbrack}
  32. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐ข๐Ÿโˆฅ๐›‡๐จ+1โˆฅ/โˆฅ๐›โˆฅ๐ฌ๐ฆ๐š๐ฅ๐ฅ๐ž๐ง๐จ๐ฎ๐ ๐ก๐ญ๐ก๐ž๐ง\mathbf{{if\ }}\left\| \mathbf{\zeta}_{\mathbf{o + 1}} \right\|\mathbf{\ /\ }\left\| \mathbf{b} \right\|\mathbf{{\ small\ enough\ then}}
  33. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐ฎ๐ฉ๐๐š๐ญ๐ž๐ฌ๐จ๐ฅ๐ฎ๐ญ๐ข๐จ๐ง๐ฏ๐ž๐œ๐ญ๐จ๐ซ[๐ฌ,๐ค,๐จ,๐•ฬ๐คโˆผ,๐””๐คโˆผ,๐‘ฬ๐คโˆผ,โ„‹๐ค,๐›‡,๐ฑ0]\mathbf{update\ \ solution\ \ vector\lbrack s,k,o,}{\acute{\mathbf{V}}}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{,}\mathfrak{Q}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{,}{\acute{\mathbf{R}}}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{,}\mathcal{H}_{\mathbf{k}}\mathbf{,\zeta,}\mathbf{x}_{\mathbf{0}}\mathbf{\ \rbrack}
  34. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐ช๐ฎ๐ข๐ญ\mathbf{{quit}}
  35. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐ž๐ง๐๐ข๐Ÿ\mathbf{{endif}}
  36. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐ž๐ง๐\mathbf{{end}}
  37. ใ€€ใ€€ใ€€ใ€€๐ž๐ง๐๐ข๐Ÿ\mathbf{{endif}}
  38. ใ€€ใ€€๐ž๐ง๐\mathbf{{end}}
  39. ใ€€ใ€€๐ฎ๐ฉ๐๐š๐ญ๐ž๐ฌ๐จ๐ฅ๐ฎ๐ญ๐ข๐จ๐ง๐ฏ๐ž๐œ๐ญ๐จ๐ซ[๐ฌ,๐ญโˆ’1,๐ฌ๐ญ,๐•ฬ๐ญโˆ’1โˆผ,๐””๐ญโˆ’1โˆผ,๐‘ฬ๐ญโˆ’1โˆผ,โ„‹๐ญโˆ’1,๐›‡,๐ฑ0]\mathbf{update\ \ solution\ \ vector\lbrack s,t - 1,st,}{\acute{\mathbf{V}}}_{\mathbf{t - 1}}^{\mathbf{\sim}}\mathbf{,}\mathfrak{Q}_{\mathbf{t - 1}}^{\mathbf{\sim}}\mathbf{,}{\acute{\mathbf{R}}}_{\mathbf{t - 1}}^{\mathbf{\sim}}\mathbf{,}\mathcal{H}_{\mathbf{t - 1}}\mathbf{,\zeta,}\mathbf{x}_{\mathbf{0}}\mathbf{\ \rbrack}
  40. ๐ž๐ง๐\mathbf{{end}}

  1. ๐ข๐Ÿ๐œ๐จ๐ฆ๐ฉ๐ฎ๐ญ๐ž๐๐ž๐ฐ๐ญ๐จ๐ง๐๐š๐ฌ๐ข๐ฌ\mathbf{{if\ compute\ Newton\ Basis}}
  2. ใ€€ใ€€๐ข=0\mathbf{i = 0}
  3. ใ€€ใ€€๐ฐ๐ก๐ข๐ฅ๐ž(๐ขโ‰ค๐ฌโˆ’1)\mathbf{{while}}\left( \mathbf{i \leq s - 1} \right)
  4. ใ€€ใ€€ใ€€ใ€€๐ข๐Ÿ(๐ข.๐ž๐ช.๐ฌโˆ’1)๐ญ๐ก๐ž๐ง\mathbf{{if}}\left( \mathbf{i.eq.s - 1} \right)\mathbf{{then}}
  5. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐๐ข,๐ข=๐‘๐ž[๐„๐ข]\mathbf{B}_{\mathbf{i,i}}\mathbf{= Re\lbrack}\mathbf{E}_{\mathbf{i}}\mathbf{\rbrack}
  6. ใ€€ใ€€ใ€€ใ€€๐ž๐ฅ๐ฌ๐ž\mathbf{{else}}
  7. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐ข๐Ÿ(๐ˆ๐ฆ[๐„๐ข].๐ž๐ช.0)๐ญ๐ก๐ž๐ง\mathbf{{if}}\left( \mathbf{{Im}}\left\lbrack \mathbf{E}_{\mathbf{i}} \right\rbrack\mathbf{.eq.0} \right)\mathbf{{then}}
  8. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐๐ข,๐ข=๐‘๐ž[๐„๐ข]\mathbf{B}_{\mathbf{i,i}}\mathbf{= Re\lbrack}\mathbf{E}_{\mathbf{i}}\mathbf{\rbrack}
  9. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐ž๐ฅ๐ฌ๐ž\mathbf{{else}}
  10. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐๐ข,๐ข=๐‘๐ž[๐„๐ข]\mathbf{B}_{\mathbf{i,i}}\mathbf{= Re\lbrack}\mathbf{E}_{\mathbf{i}}\mathbf{\rbrack}
  11. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐๐ข+1,๐ข+1=๐‘๐ž[๐„๐ข]\mathbf{B}_{\mathbf{i + 1,i + 1}}\mathbf{= Re\lbrack}\mathbf{E}_{\mathbf{i}}\mathbf{\rbrack}
  12. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐๐ข,๐ข+1=โˆ’(๐ˆ๐ฆ[๐„๐ข])2\mathbf{B}_{\mathbf{i,i + 1}}\mathbf{= -}\left( \mathbf{Im\lbrack}\mathbf{E}_{\mathbf{i}}\mathbf{\rbrack} \right)^{\mathbf{2}}
  13. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐ข=๐ข+1\mathbf{i = i + 1}
  14. ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€๐ž๐ง๐๐ข๐Ÿ\mathbf{{endif}}
  15. ใ€€ใ€€ใ€€ใ€€๐ž๐ง๐๐ข๐Ÿ\mathbf{{endif}}
  16. ใ€€ใ€€ใ€€ใ€€๐ข=๐ข+1\mathbf{i = i + 1}
  17. ใ€€ใ€€๐ž๐ง๐\mathbf{{end\ \ }}
  18. ๐ž๐ง๐\mathbf{{end\ \ }}

The elements of the matrix ๐\mathbf{B} are represented as ๐›๐ค,๐ข\mathbf{b}_{\mathbf{k,i}}

  1. ๐Ÿ๐จ๐ซ๐ค=1,2,3โ€ฆ๐ฌ\mathbf{for\ k = 1,2,3\ldots s}
  2. ใ€€ใ€€๐ฌ๐จ๐ฅ๐ฏ๐ž๐ฏฬ‚๐คโˆ’1๐Ÿ๐ซ๐จ๐ฆ๐Š๐ฏฬ‚๐คโˆ’1=๐ฏ๐คโˆ’1\mathbf{{solve\ }}{\widehat{\mathbf{v}}}_{\mathbf{k - 1}}\mathbf{{\ from\ K}}{\widehat{\mathbf{v}}}_{\mathbf{k - 1}}\mathbf{=}\mathbf{v}_{\mathbf{k - 1}}
  3. ใ€€ใ€€๐ข๐Ÿ(๐ค.๐ง๐ž.1)๐ญ๐ก๐ž๐ง\mathbf{if(k.ne.1)then}
  4. ใ€€ใ€€ใ€€ใ€€๐›‚=๐›๐คโˆ’1,๐คโˆ’1\mathbf{\alpha =}\mathbf{b}_{\mathbf{k - 1,k - 1}}
  5. ใ€€ใ€€ใ€€ใ€€๐›ƒ=๐›๐คโˆ’2,๐คโˆ’1\mathbf{\beta =}\mathbf{b}_{\mathbf{k - 2,k - 1}}
  6. ใ€€ใ€€ใ€€ใ€€๐ฏ๐ค=๐€๐ฏฬ‚๐คโˆ’1โˆ’๐›‚๐ฏ๐คโˆ’1+๐›ƒ๐ฏ๐คโˆ’2\mathbf{v}_{\mathbf{k}}\mathbf{= A}{\widehat{\mathbf{v}}}_{\mathbf{k - 1}}\mathbf{- \alpha}\mathbf{v}_{\mathbf{k - 1}}\mathbf{+ \beta}\mathbf{v}_{\mathbf{k - 2}}
  7. ใ€€ใ€€๐ž๐ฅ๐ฌ๐ž\mathbf{{else}}
  8. ใ€€ใ€€ใ€€ใ€€๐›‚=๐›๐คโˆ’1,๐คโˆ’1\mathbf{\alpha =}\mathbf{b}_{\mathbf{k - 1,k - 1}}
  9. ใ€€ใ€€ใ€€ใ€€๐ฏ๐ค=๐€๐ฏฬ‚๐คโˆ’1โˆ’๐›‚๐ฏ๐คโˆ’1\mathbf{v}_{\mathbf{k}}\mathbf{= A}{\widehat{\mathbf{v}}}_{\mathbf{k - 1}}\mathbf{- \alpha}\mathbf{v}_{\mathbf{k - 1}}
  10. ใ€€ใ€€๐ž๐ง๐๐ข๐Ÿ\mathbf{{endif}}
  11. ๐ž๐ง๐\mathbf{{en}}\mathbf{d}

ใ€€The elements of the matrix โ„‹\mathcal{H} are represented as ๐’ฝ๐ค,๐ข\mathcal{h}_{\mathbf{k,i}}.

  1. ๐Ÿ๐จ๐ซ๐ค=1,2,3โ€ฆ๐ขโˆ’1\mathbf{for\ k = 1,2,3\ldots i - 1}
  2. ใ€€ใ€€(๐’ฝ๐ค,๐ข๐’ฝ๐ค+1,๐ข)=(๐œ๐ขโˆ’๐ฌ๐ข๐ฌ๐ข๐œ๐ข)(๐’ฝ๐ค,๐ข๐’ฝ๐ค+1,๐ข)\begin{pmatrix} \mathcal{h}_{\mathbf{k,i}} \\ \mathcal{h}_{\mathbf{k + 1,i}} \\ \end{pmatrix}\mathbf{=}\begin{pmatrix} \mathbf{c}_{\mathbf{i}} & \mathbf{- s}_{\mathbf{i}} \\ \mathbf{s}_{\mathbf{i}} & \mathbf{c}_{\mathbf{i}} \\ \end{pmatrix}\begin{pmatrix} \mathcal{h}_{\mathbf{k,i}} \\ \mathcal{h}_{\mathbf{k + 1,i}} \\ \end{pmatrix}
  3. end\mathbf{\text{end}}
  4. ๐œ๐ข=11+(๐’ฝ๐ข+1,๐ข๐’ฝ๐ข,๐ข)2\mathbf{c}_{\mathbf{i}}\mathbf{=}\frac{\mathbf{1}}{\sqrt{\mathbf{1 +}\left( \frac{\mathcal{h}_{\mathbf{i + 1,i}}}{\mathcal{h}_{\mathbf{i,i}}} \right)^{\mathbf{2}}}}
  5. ๐ฌ๐ข=โˆ’๐’ฝ๐ข+1,๐ข๐’ฝ๐ข,๐ข11+(๐’ฝ๐ข+1,๐ข๐’ฝ๐ข,๐ข)2\mathbf{s}_{\mathbf{i}}\mathbf{= -}\frac{\mathcal{h}_{\mathbf{i + 1,i}}}{\mathcal{h}_{\mathbf{i,i}}}\frac{\mathbf{1}}{\sqrt{\mathbf{1 +}\left( \frac{\mathcal{h}_{\mathbf{i + 1,i}}}{\mathcal{h}_{\mathbf{i,i}}} \right)^{\mathbf{2}}}}
  6. (๐›‡๐ข๐›‡๐ข+1)=(๐œ๐ขโˆ’๐ฌ๐ข๐ฌ๐ข๐œ๐ข)(๐›‡๐ข๐›‡๐ข+1)\begin{pmatrix} \mathbf{\zeta}_{\mathbf{i}} \\ \mathbf{\zeta}_{\mathbf{i + 1}} \\ \end{pmatrix}\mathbf{=}\begin{pmatrix} \mathbf{c}_{\mathbf{i}} & \mathbf{- s}_{\mathbf{i}} \\ \mathbf{s}_{\mathbf{i}} & \mathbf{c}_{\mathbf{i}} \\ \end{pmatrix}\begin{pmatrix} \mathbf{\zeta}_{\mathbf{i}} \\ \mathbf{\zeta}_{\mathbf{i + 1}} \\ \end{pmatrix}
  7. ๐’ฝ๐ข,๐ข=๐œ๐ข๐’ฝ๐ข,๐ขโˆ’๐ฌ๐ข๐’ฝ๐ข+1,๐ข\mathcal{h}_{\mathbf{i,i}}\mathbf{=}\mathbf{c}_{\mathbf{i}}\mathcal{h}_{\mathbf{i,i}}\mathbf{- \ }\mathbf{s}_{\mathbf{i}}\mathcal{h}_{\mathbf{i + 1,i}}
  8. ๐’ฝ๐ข+1,๐ข=0\mathcal{h}_{\mathbf{i + 1,i}}\mathbf{= 0}

  1. ๐ฒ=โ„‹โˆ’1๐›‡\mathbf{y =}\mathcal{H}^{\mathbf{- 1}}\mathbf{\zeta}
  2. ๐ฌ๐จ๐ฅ๐ฏ๐ž๐ฑฬ‚๐Ÿ๐ซ๐จ๐ฆ๐Š๐ฑฬ‚=โˆ‘๐ค=0๐ฌ๐ฆโˆ’1๐๐ค๐ฒ๐ค+โˆ‘๐ฅ=๐ฌ๐ฆ๐งโˆ’1๐•๐ฅโˆ’๐ฌ๐ฆโˆ‘๐ค=๐ฌ๐ฆ๐งโˆ’1๐‘๐ฅโˆ’๐ฌ๐ฆ,๐คโˆ’๐ฌ๐ฆโˆ’1๐ฒ๐ค\mathbf{{solve\ }}\widehat{\mathbf{x}}\mathbf{{\ from\ K}}\widehat{\mathbf{x}}\mathbf{=}\sum_{\mathbf{k = 0}}^{\mathbf{s}\mathbf{m}\mathbf{- 1}}\mathbf{Q}_{\mathbf{k}}\mathbf{y}_{\mathbf{k}}\mathbf{+}\sum_{\mathbf{l =}\mathbf{{sm}}}^{\mathbf{n - 1}}\mathbf{V}_{\mathbf{l}\mathbf{- sm}}\sum_{\mathbf{k =}\mathbf{{sm}}}^{\mathbf{n - 1}}{\mathbf{R}_{\mathbf{l - sm}\mathbf{,}\mathbf{k}\mathbf{- sm}}^{\mathbf{- 1}}\mathbf{y}_{\mathbf{k}}}
  3. ๐ฑ0=๐ฑ0+๐ฑฬ‚\mathbf{x}_{\mathbf{0}}\mathbf{=}\mathbf{x}_{\mathbf{0}}\mathbf{+}\widehat{\mathbf{x}}

1.5 The Preconditioned Chebyshev basis Conjugate Gradient Method

The Chebyshev basis Conjugate Gradient (CBCG) method is an iterative Krylov subspace algorithm for solving symmetric matrices. The CBCG method calculates kk iterations of the CG method in one iteration. By this approach, the CBCG method reduces the global collective communication calls. In order to use the Chebyshev basis the largest and smallest eigenvalues are needed. PARCEL provides two methods to obtain these eigenvalues, the so called power method and the communication avoiding Arnoldi method.

  1. ๐‚๐จ๐ฆ๐ฉ๐ฎ๐ญ๐ž๐ซ0=๐›โˆ’๐€๐ฑ0๐Ÿ๐จ๐ซ๐ฌ๐จ๐ฆ๐ž๐ข๐ง๐ข๐ญ๐ข๐š๐ฅ๐ ๐ฎ๐ž๐ฌ๐ฌ๐ฑโˆ’1=๐ฑ0\mathbf{{Compute\ }}\mathbf{r}_{\mathbf{0}}\mathbf{= b - A}\mathbf{x}_{\mathbf{0}}\mathbf{{\ for\ some\ initial\ gue}}\mathbf{{ss\ }}\mathbf{x}_{\mathbf{- 1}}\mathbf{=}\mathbf{x}_{\mathbf{0}}
  2. ๐‚๐จ๐ฆ๐ฉ๐ฎ๐ญ๐ž๐‚๐ก๐ž๐›๐ฒ๐ฌ๐ก๐ž๐ฏ๐›๐š๐ฌ๐ข๐ฌ[๐’0,๐€๐’0,๐ซ0,๐›Œmax,๐›Œmin]\mathbf{Compute\ Chebyshev\ basis\ \lbrack}\mathbf{S}_{\mathbf{0}}\mathbf{,}{\mathbf{A}\mathbf{S}}_{\mathbf{0}}\mathbf{,}\mathbf{r}_{\mathbf{0}}\mathbf{,}\mathbf{\lambda}_{\mathbf{\max}}\mathbf{,}\mathbf{\lambda}_{\mathbf{\min}}\mathbf{\rbrack}
  3. ๐0=๐’0\mathbf{Q}_{\mathbf{0}}\mathbf{=}\mathbf{S}_{\mathbf{0}}
  4. ๐Ÿ๐จ๐ซ๐ข=0,1,2,3โ€ฆ\mathbf{for\ i = 0,1,2,3\ldots}
  5. ๐‚๐จ๐ฆ๐ฉ๐ฎ๐ญ๐ž๐๐ข๐“๐€๐๐ข,๐๐ข๐“๐ซ๐ข๐ค\mathbf{{Compute\ }}\mathbf{Q}_{\mathbf{i}}^{\mathbf{T}}\mathbf{A}\mathbf{Q}_{\mathbf{i}}\mathbf{\ ,\ }\mathbf{Q}_{\mathbf{i}}^{\mathbf{T}}\mathbf{r}_{\mathbf{{ik}}}
  6. ใ€€ใ€€๐›‚๐ข=(๐๐ข๐“๐€๐๐ข)โˆ’1(๐๐ข๐“๐ซ๐ข๐ค)\mathbf{\alpha}_{\mathbf{i}}\mathbf{= \ }\left( \mathbf{Q}_{\mathbf{i}}^{\mathbf{T}}\mathbf{A}\mathbf{Q}_{\mathbf{i}} \right)^{\mathbf{- 1}}\left( \mathbf{Q}_{\mathbf{i}}^{\mathbf{T}}\mathbf{r}_{\mathbf{{ik}}} \right)
  7. ใ€€ใ€€๐ฑ(๐ข+1)๐ค=๐ฑ๐ข๐ค+๐๐ข๐›‚๐ข\mathbf{x}_{\left( \mathbf{i + 1} \right)\mathbf{k}}\mathbf{=}\mathbf{x}_{\mathbf{{ik}}}\mathbf{+}\mathbf{Q}_{\mathbf{i}}\mathbf{\alpha}_{\mathbf{i}}
  8. ใ€€ใ€€๐ซ(๐ข+1)๐ค=๐ซ๐ข๐คโˆ’๐€๐๐ข๐›‚๐ข\mathbf{r}_{\left( \mathbf{i + 1} \right)\mathbf{k}}\mathbf{=}\mathbf{r}_{\mathbf{{ik}}}\mathbf{- A}\mathbf{Q}_{\mathbf{i}}\mathbf{\alpha}_{\mathbf{i}}
  9. ใ€€ใ€€๐ข๐Ÿโˆฅ๐ซ(๐ข+1)๐คโˆฅ/โˆฅ๐›โˆฅ๐ฌ๐ฆ๐š๐ฅ๐ฅ๐ž๐ง๐จ๐ฎ๐ ๐ก๐ญ๐ก๐ž๐ง\mathbf{{if\ }}\left\| \mathbf{r}_{\left( \mathbf{i + 1} \right)\mathbf{k}} \right\|\mathbf{\ /\ }\left\| \mathbf{b} \right\|\mathbf{{\ small\ enough\ }}\mathbf{{then}}
  10. ใ€€ใ€€๐‚๐จ๐ฆ๐ฉ๐ฎ๐ญ๐ž๐‚๐ก๐ž๐›๐ฒ๐ฌ๐ก๐ž๐ฏ๐›๐š๐ฌ๐ข๐ฌ[๐’๐ข+1,๐€๐’๐ข+1,๐ซ(๐ข+1)๐ค,๐›Œmax,๐›Œmin]\mathbf{Compute\ Chebyshev\ basis\ \lbrack}\mathbf{S}_{\mathbf{i + 1}}\mathbf{,}{\mathbf{A}\mathbf{S}}_{\mathbf{i + 1}}\mathbf{,}\mathbf{r}_{\left( \mathbf{i + 1} \right)\mathbf{k}}\mathbf{,}\mathbf{\lambda}_{\mathbf{\max}}\mathbf{,}\mathbf{\lambda}_{\mathbf{\min}}\mathbf{\rbrack}
  11. ใ€€ใ€€๐‚๐จ๐ฆ๐ฉ๐ฎ๐ญ๐ž๐๐ข๐“๐€๐’๐ข+1\mathbf{{Compute\ }}\mathbf{Q}_{\mathbf{i}}^{\mathbf{T}}\mathbf{A}\mathbf{S}_{\mathbf{i + 1}}
  12. ใ€€ใ€€๐๐ข=(๐๐ข๐“๐€๐๐ข)โˆ’1(๐๐ข๐“๐€๐’๐ข+1)\mathbf{B}_{\mathbf{i}}\mathbf{=}\left( \mathbf{Q}_{\mathbf{i}}^{\mathbf{T}}\mathbf{A}\mathbf{Q}_{\mathbf{i}} \right)^{\mathbf{- 1}}\left( \mathbf{Q}_{\mathbf{i}}^{\mathbf{T}}\mathbf{A}\mathbf{S}_{\mathbf{i + 1}} \right)\mathbf{\ }
  13. ใ€€ใ€€๐๐ข+1=๐’๐ข+1โˆ’๐๐ข๐๐ข\mathbf{Q}_{\mathbf{i + 1}}\mathbf{=}\mathbf{S}_{\mathbf{i + 1}}\mathbf{-}\mathbf{Q}_{\mathbf{i}}\mathbf{B}_{\mathbf{i}}
  14. ใ€€ใ€€๐€๐๐ข+1=๐€๐’๐ข+1โˆ’๐€๐๐ข๐๐ข\mathbf{A}\mathbf{Q}_{\mathbf{i + 1}}\mathbf{=}\mathbf{{AS}}_{\mathbf{i + 1}}\mathbf{-}\mathbf{{AQ}}_{\mathbf{i}}\mathbf{B}_{\mathbf{i}}
  15. ๐ž๐ง๐\mathbf{{end}}

๐›Œmax\mathbf{\lambda}_{\mathbf{\max}}, ๐›Œmin\mathbf{\lambda}_{\mathbf{\min}}represent the largest and smallest eigenvalues of ๐€๐Šโˆ’1\mathbf{AK}^{-1}.

  1. ๐›ˆ=2/(๐›Œmaxโˆ’๐›Œmin)\mathbf{\eta = 2\ /\ (}\mathbf{\lambda}_{\mathbf{\max}}\mathbf{-}\mathbf{\lambda}_{\mathbf{\min}}\mathbf{)}
  2. ๐›‡=(๐›Œmax+๐›Œmin)/(๐›Œmaxโˆ’๐›Œmin)\mathbf{\zeta =}\left( \mathbf{\lambda}_{\mathbf{\max}}\mathbf{+}\mathbf{\lambda}_{\mathbf{\min}} \right)\mathbf{\ /\ (}\mathbf{\lambda}_{\mathbf{\max}}\mathbf{-}\mathbf{\lambda}_{\mathbf{\min}}\mathbf{)}
  3. ๐ฌ0=๐ซ0\mathbf{s}_{\mathbf{0}}\mathbf{=}\mathbf{r}_{\mathbf{0}}\mathbf{\ }
  4. ๐ฌ๐จ๐ฅ๐ฏ๐ž๐ฌฬƒ0๐Ÿ๐ซ๐จ๐ฆ๐Š๐ฌฬƒ0=๐ฌ0\mathbf{{solve\ }}{\widetilde{\mathbf{s}}}_{\mathbf{0}}\mathbf{{\ from\ K}}{\widetilde{\mathbf{s}}}_{\mathbf{0}}\mathbf{= \ }\mathbf{s}_{\mathbf{0}}
  5. ๐ฌ1=๐›ˆ๐€๐ฌฬƒ0โˆ’๐›‡๐ฌ0\mathbf{s}_{\mathbf{1}}\mathbf{= \eta A}{\widetilde{\mathbf{s}}}_{\mathbf{0}}\mathbf{- \zeta}\mathbf{s}_{\mathbf{0}}
  6. ๐ฌ๐จ๐ฅ๐ฏ๐ž๐ฌฬƒ1๐Ÿ๐ซ๐จ๐ฆ๐Š๐ฌฬƒ1=๐ฌ1\mathbf{{solve\ }}{\widetilde{\mathbf{s}}}_{\mathbf{1}}\mathbf{{\ from\ K}}{\widetilde{\mathbf{s}}}_{\mathbf{1}}\mathbf{= \ }\mathbf{s}_{\mathbf{1}}
  7. ๐Ÿ๐จ๐ซ๐ฃ=2,3,โ€ฆ,๐ค\mathbf{for\ j = 2,3,\ldots,k}
  8. ใ€€ใ€€๐ฌ๐ฃ=2๐›ˆ๐€๐ฌฬƒ๐ฃโˆ’2๐›‡๐ฌ๐ฃโˆ’1โˆ’๐ฌ๐ฃโˆ’2\mathbf{s}_{\mathbf{j}}\mathbf{= 2\eta A}{\widetilde{\mathbf{s}}}_{\mathbf{j}}\mathbf{- 2\zeta}\mathbf{s}_{\mathbf{j - 1}}\mathbf{-}\mathbf{s}_{\mathbf{j - 2}}
  9. ใ€€ใ€€๐ฌ๐จ๐ฅ๐ฏ๐ž๐ฌฬƒ๐ฃ๐Ÿ๐ซ๐จ๐ฆ๐Š๐ฌฬƒ๐ฃ=๐ฌ๐ฃ\mathbf{{solve\ }}{\widetilde{\mathbf{s}}}_{\mathbf{j}}\mathbf{{\ from\ K}}{\widetilde{\mathbf{s}}}_{\mathbf{j}}\mathbf{= \ }\mathbf{s}_{\mathbf{j}}
  10. ๐ž๐ง๐\mathbf{{end}}
  11. ๐’=(๐ฌฬƒ0,๐ฌฬƒ1,โ€ฆ,๐ฌฬƒ๐คโˆ’1)\mathbf{S = (}{\widetilde{\mathbf{s}}}_{\mathbf{0}}\mathbf{,}{\widetilde{\mathbf{s}}}_{\mathbf{1}}\mathbf{,\ldots,}{\widetilde{\mathbf{s}}}_{\mathbf{k - 1}}\mathbf{)}
  12. ๐€๐’=(๐€๐ฌฬƒ0,๐€๐ฌฬƒ1,โ€ฆ,๐€๐ฌฬƒ๐คโˆ’1)\mathbf{AS = (A}{\widetilde{\mathbf{s}}}_{\mathbf{0}}\mathbf{,}{\mathbf{A}\widetilde{\mathbf{s}}}_{\mathbf{1}}\mathbf{,\ldots,}{\mathbf{A}\widetilde{\mathbf{s}}}_{\mathbf{k - 1}}\mathbf{)}

2 Preconditioning

In iterative algorithms, the application of a preconditioner of the form Kโ‰ˆAK \approx A can help to reduce the number of iterations until convergence. The notation [solvepฬ‚fromKpฬ‚=p{{solve\ }}\widehat{{p}}{{\ from\ K}}\widehat{{p}}{= \ }{p}] means [approximatelysolveApฬ‚=papproximately solve {A}\widehat{p}{= \ }{p} with respect to the vector pฬ‚\widehat{p}]. Although the more accurate approximation leads to the less number of iterations until convergence, the cost of preconditioner is normally increased. In non-parallel computing, one of the most common preconditioners is the incomplete LU factorization (ILU). However, in parallel computing, parallel preconditioners are needed. PARCEL provides the following parallel preconditioners: Point Jacobi, Block Jacobi and Additive Schwarz.

2.1 Point Jacobi Preconditioner

The point Jacobi preconditioner is one of the most simplest preconditioners. KK only consists of the diagonal elements of the matrix AA. Compared to other preconditioners the efficiency of the point Jacobi preconditioner is very low. However, it can be easily applied in parallel and it does not require any additional communication between processors.

2.2 Zero Fill-in Incomplete LU Factorization Preconditioners (ILU(0))

LU factorization decompose a square matrix AA into a lower triangular matrix LL and an upper triangular matrix UU. For a typical sparse matrix, the LU factors can be much less sparse than the original matrix, which is called as fill-in. This increases the memory and computational requirements. In order to avoid this issue, PARCEL provides the zero fill-in incomplete LU factorization (ILU(0)).

2.3 Diagonal Incomplete LU Factorization Preconditioners (D-ILU)

The diagonal incomplete LU factorization (D-ILU) computes a diagonal matrix DD, a lower triangular matrix LL and an upper triangular matrix UU for the input matrix AA. A=LA+DA+UAA = L_{A} + D_{A} + U_{A} The preconditioner KK is constructed with LAL_{A}, UAU_{A} and DD (Dโ‰ DAD \neq D_{A}) with: K=(D+LA)Dโˆ’1(D+UA)K = \left( D + L_{A} \right)D^{- 1}\left( D + U_{A} \right) Two different conditions exist in order to construct the diagonal matrix DD:

Given the conditions above DD can be computed as follows:

2.4 Block Jacobi Preconditioner

The Block Jacobi Preconditioner constructs KK out of the diagonal blocks of the matrix AA. For each block, the incomplete ILU factorization is computed. When the preconditioner is applied, each block can be processed independently, resulting in a high level of parallelism. In addition, no communication is needed when each block is defined within a sub-matrix on each processor.

2.5 Additive Schwarz Preconditioner

The Additive Schwarz Preconditioner constructs KK out of the overlapping diagonal blocks of AA. For each block the incomplete LU factorization is computed. Compared to the Block Jacobi Preconditioner, the Additive Schwarz Preconditioner may require additional communication. PARCEL provides four different overlapping methods, BASIC, RESTRICT, INTERPOLATE and NONE.

Fig.1 Additive Schwarz Preconditioner with overlapping diagonal blocks

BASIC

Solve Ks=rKs=r with an extended rr including an overlapping region with the neighboring process, ss in the overlapping region is determined by summing up the results between the neighboring region.

Fig.2 BASIC

RESTRICT

ใ€€ Solve Ks=rKs=r with an extended rr including an overlapping region with the neighboring process, ss in the overlapping region is determined without taking account of the result in the neighboring region.

Fig.3 RESTRICT

INTERPOLATE

Solve Ks=rKs=r with an extended rr without an overlapping region with the neighboring process, ss in the overlapping region is determined by summing up the results between the neighboring regions.ใ€€

Fig.4 INTERPOLATE

NONE

Solve Ks=rKs=r with an extended rr without an overlapping region with the neighboring process, ss in the overlapping region is determined without taking account of the result in the neighboring regions.ใ€€

Fig.5 NONE

3 Sparse Matrix Data Formats

In order to save memory in sparse matrix computation, only the non-zero elements and their locations are stored. PARCEL supports different sparse matrix formats.

3.1 Compressed Row Storage (CRS) Format

ใ€€The CRS format compresses the row indices of the non-zero elements of the sparse matrix. Together with the compressed rows, the column index and the value for each non-zero element are stored in one dimensional arrays. Compresed row indeces are stored in an incremental order.

A=(abc00d00e0fg0hij) A = \left( \begin{array}{ccc} a & b & c & 0\\ 0 & d & 0 & 0\\ e & 0 & f & g\\ 0 & h & i & j \end{array} \right)


One process case

Values of non-zero elements :: crsA={a,b,c,d,e,f,g,h,i,j}crsA=\{a,b,c,d,e,f,g,h,i,j\}

Compressed row indices :: crsRow_ptr={1,4,5,8,11}crsRow\_ptr=\{1,4,5,8,11\}

Column indices of non-zero elements :: crsCol={1,2,3,2,1,3,4,2,3,4}crsCol=\{1,2,3,2,1,3,4,2,3,4\}


Two process case

Values of non-zero elements :: crsA={a,b,c,d}crsA=\{a,b,c,d\}

Compressed row indices :: crsRow_ptr={1,4,5}crsRow\_ptr=\{1,4,5\}

Column indices of non-zero elements :: crsCol={1,2,3,2}crsCol=\{1,2,3,2\}

Values of non-zero elements :: crsA={e,f,g,h,i,j}crsA=\{e,f,g,h,i,j\}

Compressed row indices :: crsRow_ptr={1,4,7}crsRow\_ptr=\{1,4,7\}

Column indices of non-zero elements :: crsCol={1,3,4,2,3,4}crsCol=\{1,3,4,2,3,4\}


3.2 Diagonal Format

ใ€€The Diagonal Format stores the non-zero elements of a matrix as diagonals. This format provides high performance for band block diagonal matrices. In addition to the element values, an offset value for each diagonal of the matrix needs to be stored inside an one dimensional array. A negative offset indicates that the diagonal is below the main diagonal of the matrix, a positive offset indicates that the diagonal is above the main diagonal. The main diagonal has the offset zero. Offset values are stored in an incremental order.

A=(abc00d00e0fg0hij) A = \left( \begin{array}{ccc} a & b & c & 0\\ 0 & d & 0 & 0\\ e & 0 & f & g\\ 0 & h & i & j \end{array} \right)


Single process case

Elements :: diaA={0,0,e,h,0,0,0,i,a,d,f,j,b,0,g,0,c,0,0,0}diaA=\{0,0,e,h,0,0,0,i,a,d,f,j,b,0,g,0,c,0,0,0\}

Offsets :: offset={โˆ’2,โˆ’1,0,1,2}offset=\{-2,-1,0,1,2\}

Number of diagonals :: nnd=5nnd=5


Two process case

Elements :: diaA={a,d,b,0,c,0}diaA=\{a,d,b,0,c,0\}

Offsets :: offset={0,1,2}offset=\{0,1,2\}

Number of diagonals :: nnd=3nnd=3

Elements :: diaA={e,h,0,i,f,j,g,0}diaA=\{e,h,0,i,f,j,g,0\}

Offsets :: offset={โˆ’2,โˆ’1,0,1}offset=\{-2,-1,0,1\}

Number of diagonals :: nnd=4nnd=4


4 The structure of PARCEL

ใ€€This chapter explains how to compile and install the PARCEL library.

4.1 Directory Structure

ใ€€After extracting the PARCEL library, the directory structure should be the same as in Fig.6.

Fig.6 The PARCEL directory structure

4.2 Compiling the PARCEL library

PARCEL requires a C compiler, a FORTRAN compiler, which supports preprocessor, MPI, LAPACK, and OpenMP. When compiling with gfortrangfortran, the option โˆ’cpp-cpp has to be specified to to activate preprocessor. Compiler options can be changed inside the file arch/make_config. An example of a make_config file is shown in Fig.7. The make command should be executed from the top directory of the extracted file. In order to re-compile, makecleanmake\ clean followed by makemake should be executed. If the makemake command was successful, each exampleexample directory should contain an executable file. 64-bit integer is needed when working with over roughly 2 billion unknowns. For example, FDEFINE has to be specified for a large scale test, example5, in this package.


Option ใ€€Explanation
CC ใ€€C compiler command name
FC ใ€€FORTRAN compiler command name
CFLAGS ใ€€C compiler options
FFLAGS ใ€€FORTRAN compiler options
FP_MODE ใ€€Option to suppress optimizations with arithmetic order change (Only for quad precision)
INCLUDEDIR ใ€€Include directories
LD_MPI ใ€€MPI library
LD_LAPACK ใ€€LAPACK library
FDEFINE ใ€€64bit integer becomes available with -DPARCEL_INT8.

Fig.7 Example of a make_config file

4.3 Usage of the PARCEL library

In order to call PARCEL library, programs need to be linked against the *.a*.a library files inside the srcsrc directory.

5 PARCEL routines (direct interface)

ใ€€The PARCEL routines with direct call interfaces are shown below.

5.1 parcel_dcg


call parcel_dcg( icomm, vecx,vecb, n,gn,nnz,istart, crsA,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret )


Solve a simultaneous linear equation system Ax = b by the conjugate gradient method (CG method). Non-zero elements are stored in the CRS format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
nnz integer*4 / integer*8 in Number of non-zero elements on each process
istart integer*4 / integer*8 in Start line of matrix on each process
crsA(nnz) double precision in Non-zero elements of matrix stored in the CRS format
crsRow_ptr(n+1) integer*4 / integer*8 in Pointer table in the CRS format
crsCol(nnz) integer*4 / integer*8 in Column numbers of non-zero elements in the CRS format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)

5.2 parcel_dcg_dia


call parcel_dcg_dia( icomm, vecx,vecb, n,gn,istart, diaA,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret )


Solve a simultaneous linear equation system Ax = b by the conjugate gradient method (CG method). Non-zero elements are stored in the Diagonal format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
istart integer*4 / integer*8 in Start line of matrix on each process
diaA(n*nnd) double precision in Non-zero elements of matrix stored in the Diagonal format
offset(nnd) integer*4 / integer*8 in Offset of each non-zero diagonal elements array in the Diagonal format
nnd integer*4 / integer*8 in Number of diagonal elements arrays in the Diagonal format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)

5.3 parcel_ddcg


call parcel_ddcg( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,nnz,istart, crsA_hi,crsA_lo,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )


Solve a simultaneous linear equation system Ax = b by the conjugate gradient method (CG method). Non-zero elements are stored in the CRS format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx_hi(n) double precision in/out in: Initial vector (upper bits), out: Solution vector (upper bits)
vecx_lo(n) double precision in/out in: initial value of the iteration (lower bit), out: solution vector (lower bit)
vecb_hi(n) double precision in Right hand side vector (upper bit)
vecb_lo(n) double precision in Right hand side vector (lower bit)
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
nnz integer*4 / integer*8 in Number of non-zero elements on each process
istart integer*4 / integer*8 in Start line of matrix on each process
crsA_hi(nnz) double precision in Non-zero elements (upper bits) of matrix stored in the CRS format
crsA_lo(nnz) double precision in Non-zero elements (lower bit) of matrix stored in the CRS format
crsRow_ptr(n+1) integer*4 / integer*8 in Pointer table in the CRS format
crsCol(nnz) integer*4 / integer*8 in Column numbers of non-zero elements in the CRS format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
PRECISION_A integer in Precision of matrix (1: double precision, 2: quad precision)
PRECISION_b integer in Precision of right hand side vector (1: double precision, 2: quad precision)
PRECISION_x integer in Precision of solution vector (1: double precision, 2: quad precision)
PRECISION_PRECONDITION integer in Precision of preconditioner (1: double precision, 2: quad precision)

5.4 parcel_ddcg_dia


call parcel_ddcg_dia( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,istart, diaA_hi,diaA_lo,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )


Solve a simultaneous linear equation system Ax = b by the conjugate gradient method (CG method). Non-zero elements are stored in the Diagonal format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx_hi(n) double precision in/out in: Initial vector (upper bits), out: Solution vector (upper bits)
vecx_lo(n) double precision in/out in: initial value of the iteration (lower bit), out: solution vector (lower bit)
vecb_hi(n) double precision in Right hand side vector (upper bit)
vecb_lo(n) double precision in Right hand side vector (lower bit)
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
istart integer*4 / integer*8 in Start line of matrix on each process
diaA_hi(n*nnd) double precision in Non-zero elements (upper bits) of matrix stored in the Diagonal format
diaA_lo(n*nnd) double precision in Non-zero elements (lower bit) of matrix stored in the Diagonal format
offset(nnd) integer*4 / integer*8 in Offset of each non-zero diagonal elements array in the Diagonal format
nnd integer*4 / integer*8 in Number of diagonal elements arrays in the Diagonal format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
PRECISION_A integer in Precision of matrix (1: double precision, 2: quad precision)
PRECISION_b integer in Precision of right hand side vector (1: double precision, 2: quad precision)
PRECISION_x integer in Precision of solution vector (1: double precision, 2: quad precision)
PRECISION_PRECONDITION integer in Precision of preconditioner (1: double precision, 2: quad precision)

5.5 parcel_dbicgstab


call parcel_dbicgstab( icomm, vecx,vecb, n,gn,nnz,istart, crsA,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret )


Solve a simultaneous linear equation system Ax = b by the stabilized biconjugate gradient method (Bi-CGSTAB method). Non-zero elements are storedโ€‹in the CRS format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
nnz integer*4 / integer*8 in Number of non-zero elements on each process
istart integer*4 / integer*8 in Start line of matrix on each process
crsA(nnz) double precision in Non-zero elements of matrix stored in the CRS format
crsRow_ptr(n+1) integer*4 / integer*8 in Pointer table in the CRS format
crsCol(nnz) integer*4 / integer*8 in Column numbers of non-zero elements in the CRS format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)

5.6 parcel_dbicgstab_dia


call parcel_dbicgstab_dia( icomm, vecx,vecb, n,gn,istart, diaA,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret )


Solve a simultaneous linear equation system Ax = b by the stabilized biconjugate gradient method (Bi-CGSTAB method). Non-zero elements are stored in the Diagonal format

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
istart integer*4 / integer*8 in Start line of matrix on each process
diaA(n*nnd) double precision in Non-zero elements of matrix stored in the Diagonal format
offset(nnd) integer*4 / integer*8 in Offset of each non-zero diagonal elements array in the Diagonal format
nnd integer*4 / integer*8 in Number of diagonal elements arrays in the Diagonal format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)

5.7 parcel_ddbicgstab


call parcel_ddbicgstab( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,nnz,istart, crsA_hi,crsA_lo,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )


Solve a simultaneous linear equation system Ax = b by the stabilized biconjugate gradient method (Bi-CGSTAB method). Non-zero elements are stored in the CRS format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx_hi(n) double precision in/out in: Initial vector (upper bits), out: Solution vector (upper bits)
vecx_lo(n) double precision in/out in: initial value of the iteration (lower bit), out: solution vector (lower bit)
vecb_hi(n) double precision in Right hand side vector (upper bit)
vecb_lo(n) double precision in Right hand side vector (lower bit)
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
nnz integer*4 / integer*8 in Number of non-zero elements on each process
istart integer*4 / integer*8 in Start line of matrix on each process
crsA_hi(nnz) double precision in Non-zero elements (upper bits) of matrix stored in the CRS format
crsA_lo(nnz) double precision in Non-zero elements (lower bit) of matrix stored in the CRS format
crsRow_ptr(n+1) integer*4 / integer*8 in Pointer table in the CRS format
crsCol(nnz) integer*4 / integer*8 in Column numbers of non-zero elements in the CRS format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
PRECISION_A integer in Precision of matrix (1: double precision, 2: quad precision)
PRECISION_b integer in Precision of right hand side vector (1: double precision, 2: quad precision)
PRECISION_x integer in Precision of solution vector (1: double precision, 2: quad precision)
PRECISION_PRECONDITION integer in Precision of preconditioner (1: double precision, 2: quad precision)

5.8 parcel_ddbicgstab_dia


call parcel_ddbicgstab_dia( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,istart, diaA_hi,diaA_lo,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )


Solve a simultaneous linear equation system Ax = b by the stabilized biconjugate gradient method (Bi-CGSTAB method). Non-zero elements are stored in the Diagonal format

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx_hi(n) double precision in/out in: Initial vector (upper bits), out: Solution vector (upper bits)
vecx_lo(n) double precision in/out in: initial value of the iteration (lower bit), out: solution vector (lower bit)
vecb_hi(n) double precision in Right hand side vector (upper bit)
vecb_lo(n) double precision in Right hand side vector (lower bit)
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
istart integer*4 / integer*8 in Start line of matrix on each process
diaA_hi(n*nnd) double precision in Non-zero elements (upper bits) of matrix stored in the Diagonal format
diaA_lo(n*nnd) double precision in Non-zero elements (lower bit) of matrix stored in the Diagonal format
offset(nnd) integer*4 / integer*8 in Offset of each non-zero diagonal elements array in the Diagonal format
nnd integer*4 / integer*8 in Number of diagonal elements arrays in the Diagonal format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
PRECISION_A integer in Precision of matrix (1: double precision, 2: quad precision)
PRECISION_b integer in Precision of right hand side vector (1: double precision, 2: quad precision)
PRECISION_x integer in Precision of solution vector (1: double precision, 2: quad precision)
PRECISION_PRECONDITION integer in Precision of preconditioner (1: double precision, 2: quad precision)

5.9 parcel_dgmres


call parcel_dgmres( icomm, vecx,vecb, n,gn,nnz,istart, crsA,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, GMRES_m,GMRES_GSflag )


Solve a simultaneous linear equation system Ax = b by the generalized minimum residual method (GMRES (m) method). Non-zero elements are stored in the CRS format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
nnz integer*4 / integer*8 in Number of non-zero elements on each process
istart integer*4 / integer*8 in Start line of matrix on each process
crsA(nnz) double precision in Non-zero elements of matrix stored in the CRS format
crsRow_ptr(n+1) integer*4 / integer*8 in Pointer table in the CRS format
crsCol(nnz) integer*4 / integer*8 in Column numbers of non-zero elements in the CRS format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
GMRES_m integer in Number of iterations until restart
GMRES_GSflag integer in Flag for orthogonalization algorithm (1: MGS, 2: CGS)

5.10 parcel_dgmres_dia


call parcel_dgmres_dia( icomm, vecx,vecb, n,gn,istart, diaA,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, GMRES_m,GMRES_GSflag )


Solve a simultaneous linear equation system Ax = b by the generalized minimum residual method (GMRES (m) method). Non-zero elements are stored in the Diagonal format

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
istart integer*4 / integer*8 in Start line of matrix on each process
diaA(n*nnd) double precision in Non-zero elements of matrix stored in the Diagonal format
offset(nnd) integer*4 / integer*8 in Offset of each non-zero diagonal elements array in the Diagonal format
nnd integer*4 / integer*8 in Number of diagonal elements arrays in the Diagonal format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
GMRES_m integer in Number of iterations until restart
GMRES_GSflag integer in Flag for orthogonalization algorithm (1: MGS, 2: CGS)

5.11 parcel_ddgmres


call parcel_ddgmres( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,nnz,istart, crsA_hi,crsA_lo,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, GMRES_m,GMRES_GSflag, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )


Solve a simultaneous linear equation system Ax = b by the generalized minimum residual method (GMRES (m) method). Non-zero elements are stored in the CRS format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx_hi(n) double precision in/out in: Initial vector (upper bits), out: Solution vector (upper bits)
vecx_lo(n) double precision in/out in: initial value of the iteration (lower bit), out: solution vector (lower bit)
vecb_hi(n) double precision in Right hand side vector (upper bit)
vecb_lo(n) double precision in Right hand side vector (lower bit)
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
nnz integer*4 / integer*8 in Number of non-zero elements on each process
istart integer*4 / integer*8 in Start line of matrix on each process
crsA_hi(nnz) double precision in Non-zero elements (upper bits) of matrix stored in the CRS format
crsA_lo(nnz) double precision in Non-zero elements (lower bit) of matrix stored in the CRS format
crsRow_ptr(n+1) integer*4 / integer*8 in Pointer table in the CRS format
crsCol(nnz) integer*4 / integer*8 in Column numbers of non-zero elements in the CRS format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
GMRES_m integer in Number of iterations until restart
GMRES_GSflag integer in Flag for orthogonalization algorithm (1: MGS, 2: CGS)
PRECISION_A integer in Precision of matrix (1: double precision, 2: quad precision)
PRECISION_b integer in Precision of right hand side vector (1: double precision, 2: quad precision)
PRECISION_x integer in Precision of solution vector (1: double precision, 2: quad precision)
PRECISION_PRECONDITION integer in Precision of preconditioner (1: double precision, 2: quad precision)

5.12 parcel_ddgmres_dia


call parcel_ddgmres_dia( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,istart, diaA_hi,diaA_lo,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )


Solve a simultaneous linear equation system Ax = b by the generalized minimum residual method (GMRES (m) method). Non-zero elements are stored in the Diagonal format

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx_hi(n) double precision in/out in: Initial vector (upper bits), out: Solution vector (upper bits)
vecx_lo(n) double precision in/out in: initial value of the iteration (lower bit), out: solution vector (lower bit)
vecb_hi(n) double precision in Right hand side vector (upper bit)
vecb_lo(n) double precision in Right hand side vector (lower bit)
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
istart integer*4 / integer*8 in Start line of matrix on each process
diaA_hi(n*nnd) double precision in Non-zero elements (upper bits) of matrix stored in the Diagonal format
diaA_lo(n*nnd) double precision in Non-zero elements (lower bit) of matrix stored in the Diagonal format
offset(nnd) integer*4 / integer*8 in Offset of each non-zero diagonal elements array in the Diagonal format
nnd integer*4 / integer*8 in Number of diagonal elements arrays in the Diagonal format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
GMRES_m integer in Number of iterations until restart
GMRES_GSflag integer in Flag for orthogonalization algorithm (1: MGS, 2: CGS)
PRECISION_A integer in Precision of matrix (1: double precision, 2: quad precision)
PRECISION_b integer in Precision of right hand side vector (1: double precision, 2: quad precision)
PRECISION_x integer in Precision of solution vector (1: double precision, 2: quad precision)
PRECISION_PRECONDITION integer in Precision of preconditioner (1: double precision, 2: quad precision)

5.13 parcel_dcbcg


call parcel_dcbcg( icomm, vecx,vecb, n,gn,nnz,istart, crsA,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, CBCG_kstep,CBCG_Eigenflag,power_method_itrmax, CAARNOLDI_sstep,CAARNOLDI_tstep, CAARNOLDI_basis,CAARNOLDI_QRflag )


The system of equations Ax = b is solved by the Chebyshev basis conjugate gradient method (CBCG method). Non-zero elements are stored in the CRS format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
nnz integer*4 / integer*8 in Number of non-zero elements on each process
istart integer*4 / integer*8 in Start line of matrix on each process
crsA(nnz) double precision in Non-zero elements of matrix stored in the CRS format
crsRow_ptr(n+1) integer*4 / integer*8 in Pointer table in the CRS format
crsCol(nnz) integer*4 / integer*8 in Column numbers of non-zero elements in the CRS format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
CBCG_kstep integer in Number of communication avoiding steps
CBCG_Eigenflag integer in Flag for eigenvalue computation (1: power method, 2: CA-ARNOLDI)
power_method_itrmax integer in Maximum number of iterations in the power method
CAARNOLDI_sstep integer in Number of communication avoiding steps s of the CA-ARNOLDI method
CAARNOLDI_tstep integer in Number of outer iterations t in the CA-ARNOLDI method (restart length = st)
CAARNOLDI_basis integer in Flag for basis vector in the CA-ARNOLDI method (0: monomial basis, 1: Newton basis)
CAARNOLDI_QRflag integer in Flag for QR factorization in the CA-ARNOLDI method (1: MGS, 2: CGS, 3: TSQR, 4: Cholesky QR, 5: Cholesky QR2)

5.14 parcel_dcbcg_dia


call parcel_dcbcg_dia( icomm, vecx,vecb, n,gn,istart, diaA,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, CBCG_kstep,CBCG_Eigenflag,power_method_itrmax, CAARNOLDI_sstep,CAARNOLDI_tstep, CAARNOLDI_basis,CAARNOLDI_QRflag )


The system of equations Ax = b is solved by the Chebyshev basis conjugate gradient method (CBCG method). Non-zero elements are stored in the Diagonal format

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
istart integer*4 / integer*8 in Start line of matrix on each process
diaA(n*nnd) double precision in Non-zero elements of matrix stored in the Diagonal format
offset(nnd) integer*4 / integer*8 in Offset of each non-zero diagonal elements array in the Diagonal format
nnd integer*4 / integer*8 in Number of diagonal elements arrays in the Diagonal format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
CBCG_kstep integer in Number of communication avoiding steps
CBCG_Eigenflag integer in Flag for eigenvalue computation (1: power method, 2: CA-ARNOLDI)
power_method_itrmax integer in Maximum number of iterations in the power method
CAARNOLDI_sstep integer in Number of communication avoiding steps s of the CA-ARNOLDI method
CAARNOLDI_tstep integer in Number of outer iterations t in the CA-ARNOLDI method (restart length = st)
CAARNOLDI_basis integer in Flag for basis vector in the CA-ARNOLDI method (0: monomial basis, 1: Newton basis)
CAARNOLDI_QRflag integer in Flag for QR factorization in the CA-ARNOLDI method (1: MGS, 2: CGS, 3: TSQR, 4: Cholesky QR, 5: Cholesky QR2)

5.15 parcel_dcagmres


call parcel_dcagmres( icomm, vecx,vecb, n,gn,nnz,istart, crsA,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, CAGMRES_sstep, CAGMRES_tstep, CAGMRES_basis, CAGMRES_QRflag )


Solve a simultaneous linear equation system Ax = b by the communication reduced generalized minimum residual method (CA-GMRES (s, t) method). Separate non-zero element valuesโ€‹in the CRS format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
nnz integer*4 / integer*8 in Number of non-zero elements on each process
istart integer*4 / integer*8 in Start line of matrix on each process
crsA(nnz) double precision in Non-zero elements of matrix stored in the CRS format
crsRow_ptr(n+1) integer*4 / integer*8 in Pointer table in the CRS format
crsCol(nnz) integer*4 / integer*8 in Column numbers of non-zero elements in the CRS format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
CAGMRES_sstep integer in Number of communication avoiding steps s in the CA-GMRES method
CAGMRES_tstep integer in Number of outer iterations t in the CA-GMRES method (restart length = st)
CAGMRES_basis integer in Flag for basis vector in the CA-GMRES method (0: monomial basis, 1: Newton basis)
CAGMRES_QRflag integer in Flag for QR factorization in the CA-GMRES method (1: MGS, 2: CGS, 3: TSQR, 4: CholeskyQR, 5: CholeskyQR2)

5.16 parcel_dcagmres_dia


call parcel_dcagmres_dia( icomm, vecx,vecb, n,gn,istart, diaA,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, CAGMRES_sstep, CAGMRES_tstep, CAGMRES_basis, CAGMRES_QRflag )


Solve a simultaneous linear equation system Ax = b by the communication reduced generalized minimum residual method (CA-GMRES (s, t) method). Separate non-zero element values in the Diagonal format

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
istart integer*4 / integer*8 in Start line of matrix on each process
diaA(n*nnd) double precision in Non-zero elements of matrix stored in the Diagonal format
offset(nnd) integer*4 / integer*8 in Offset of each non-zero diagonal elements array in the Diagonal format
nnd integer*4 / integer*8 in Number of diagonal elements arrays in the Diagonal format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
CAGMRES_sstep integer in Small communication step of CA-GMRES method
CAGMRES_tstep integer in List step of the CA-GMRES method
CAGMRES_basis integer in Base generation flag of CA-GMRES method (0: single basis, 1: newton basis)
CAGMRES_QRflag integer in QR Decomposition flag of CA-GMRES method (1: MGS, 2: CGS, 3: TSQR, 4: CholeskyQR, 5: CholeskyQR2)

6 PARCEL routines (indirect interface)

ใ€€The PARCEL routines with indirect call interfaces are shown below.

6.1 PARCEL_KSP_Default_Setting


call PARCEL_KSP_Default_Setting(method)


Apply initial settings to Struct in the PARCEL format.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format

6.2 set_KSP_CRS


call set_KSP_CRS(icomm,vecx,vecb,n,gn,nnz,istart,crsA,crsRow_ptr,crsCol,itrmax,ipreflag,ILU_method,addL,method)


ใ€€Set matrix in the CRS format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
nnz integer*4 / integer*8 in Number of non-zero elements on each process
istart integer*4 / integer*8 in Start line of matrix on each process
crsA(nnz) double precision in Non-zero elements of matrix stored in the CRS format
crsRow_ptr(n+1) integer*4 / integer*8 in Pointer table in the CRS format
crsCol(nnz) integer*4 / integer*8 in Column numbers of non-zero elements in the CRS format
itrmax integer in in: maximum number of iterations, out: number of iterations
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL integer in Overlapping width in the additive Schwartz method
method type(KSP) in/out Struct in the PARCEL format

6.3 set_KSP_DIA


call set_KSP_DIA(icomm,vecx,vecb,n,gn,istart,diaA,offset,nnd,itrmax,ipreflag,ILU_method,addL,method)


Set matrix in the Diagonal format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
istart integer*4 / integer*8 in Start line of matrix on each process
diaA(n*nnd) double precision in Non-zero elements of matrix stored in the Diagonal format
offset(nnd) integer*4 / integer*8 in Offset of each non-zero diagonal elements array in the Diagonal format
nnd integer*4 / integer*8 in Number of diagonal elements arrays in the Diagonal format
itrmax integer in in: maximum number of iterations, out: number of iterations
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match),Only 1 and 2 are available for additive Schwartz)
addL integer in Overlapping width in the additive Schwartz method
method type(KSP) in/out Struct in the PARCEL format

6.4 set_vecx_KSP


ใ€€call set_vecx_KSP(n,x,method)


Set initial vector.

Argument name (dimension) Type Input/Output Description
n integer*4 / integer*8 in Size of vector on each process
x(n) double precision in Initial vector
method type(KSP) in/out Struct in the PARCEL format

6.5 set_vecb_KSP


ใ€€call set_vecb_KSP(n,b,method)


Set right hand side vector b.

Argument name (dimension) Type Input/Output Description
n integer*4 / integer*8 in Size of vector on each process
b(n) double precision in Right hand side vector
method type(KSP) in/out Struct in the PARCEL format

6.6 get_vecx_KSP


ใ€€call get_vecx_KSP(n,x,method)


Get solution vector.

Argument name (dimension) Type Input/Output Description
n integer*4 / integer*8 in Size of vector on each process
x(n) double precision out Solution vector
method type(KSP) in Struct in the PARCEL format

6.7 get_reshistory_KSP


call get_reshistory_KSP(niter,reshistory, method)


Get convergence history.

Argument name (dimension) Type Input/Output Description
niter integer out Number of iterations until convergence
reshistory(method%itrmax) double precision out Convergence history
method type(KSP) in Struct in the PARCEL format

6.8 reset_CRS_Mat


call reset_CRS_Mat(itrmax,ipreflag,ILU_method,addL,crsA,method)


ใ€€Change the matrix elements, preconditioner, and the maximum number of iterations of solvers in the CRS format without changing the matrix structure or the positions of non-zero elements.

Argument name (dimension) Type Input/Output Description
itrmax integer in Maximum number of iterations (-1: keep the existing setting)
ipreflag integer in Flag for preconditioner (-1: keep the existing setting, 0: None, 1: Point Jacobi, 2: Block Jacobi, 3: Additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization (-1: keep the existing setting, 1: match the diagonal components, 2: match the element sum in the row)
addL integer in Overlapping width in the additive Schwartz method (-1: keep the existing setting)
crsA(method%nnz) double precision in Non-zero elements of matrix stored in the CRS format
method type(KSP) in/out Struct in the PARCEL format

6.9 reset_DIA_Mat


call reset_DIA_Mat(itrmax,ipreflag,ILU_method,addL,diaMat,method)


ใ€€Change the matrix elements, preconditioner, and the maximum number of iterations of solvers in the Diagonal format without changing the matrix structure or the positions of non-zero elements.

Argument name (dimension) Type Input/Output Description
itrmax integer in Maximum number of iterations (-1: keep the existing setting)
ipreflag integer in Flag for preconditioner (-1: keep the existing setting, 0: None, 1: Point Jacobi, 2: Block Jacobi, 3: Additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization (-1: keep the existing setting, 1: match the diagonal components, 2: match the element sum in the row)
addL integer in Overlapping width in the additive Schwartz method (-1: keep the existing setting)
diaMat(method%n*method%nnd) double precision in Non-zero elements of matrix stored in the CRS format
method type(KSP) in/out Struct in the PARCEL format

6.10 set_KSP_solver


call set_KSP_solver(method,solver)


Set Krylov subspace method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
solver integer in Flag for Krylov subspace method (1: CG, 2: BiCGGSTAB, 3: GMRES, 4: CA-GMRES, 5: CBCG)

6.11 set_KSP_abstolmax(method,abstolmax)


call set_KSP_abstolmax(method,abstolmax)


Set maximun tolerance in absolute value of residual error.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
abstolmax double precision in Convergence criterion (norm of residual error)

6.12 set_KSP_rtolmax


call set_KSP_rtolmax(method,rtolmax)


Set maximum tolerance in relative error.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
rtolmax double precision in Convergence criterion (norm of relative residual error)

6.13 set_KSP_iovlflag


call set_KSP_iovlflag(method,iovlflag)


Set communication-computation overlap method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)

6.14 set_KSP_GMRES_m(method,GMRES_m)


call set_KSP_GMRES_m(method,GMRES_m)


Set the number of iterations m until restart in the GMRES(m) method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
GMRES_m integer in Number of iterations until restart

6.15 set_KSP_GMRES_GSflag


call set_KSP_GMRES_GSflag(method,GMRES_GSflag)


Set orthogonalization algorithm in the GMRES(m) method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
GMRES_GSflag integer in Flag for orthogonalization algorithm (1: MGS, 2: CGS)

6.16 set_KSP_CAGMRES_sstep


call set_KSP_CAGMRES_sstep(method,CAGMRES_sstep)


Set number of communication avoiding steps s of CA-GMRES.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CAGMRES_sstep integer in Number of communication steps of the CA-GMRES method

6.17 set_KSP_CAGMRES_tstep


call set_KSP_CAGMRES_tstep(method,CAGMRES_tstep)


Set number of outer iterations t of CA-GMRES.)

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CAGMRES_tstep integer in Number of outer iterations t in the CA-GMRES method method (restart length = st)

6.18 set_KSP_CAGMRES_basis


call set_KSP_CAGMRES_basis(method,CAGMRES_basis)


Set basis vector in the CA-GMRES method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CAGMRES_basis integer in Flag for basis vector in the CA-GMRES method (0: monomial basis, 1: Newton basis)

6.19 set_KSP_CAGMRES_QRflag


call set_KSP_CAGMRES_QRflag(method,CAGMRES_QRflag)


Set QR factorization algorithm in the CA-GMRES method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CAGMRES_QRflag integer in Flag for QR factorization in the CA-GMRES method (1: MGS, 2: CGS, 3: TSQR, 4: CholeskyQR, 5: CholeskyQR2)

6.20 set_KSP_CBCG_kstep


call set_KSP_CBCG_kstep(method,CBCG_kstep)


Set number of communication avoiding steps in the CBCG method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CBCG_kstep integer in Number of communication avoiding steps

6.21 set_KSP_CBCG_Eigenflag


call set_KSP_CBCG_Eigenflag(method,CBCG_Eigenflag)


Set eigenvalue computation algorithm in the CBCG method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CBCG_Eigenflag integer in Flag for eigenvalue computation (1: power method, 2: CA-ARNOLDI)

6.22 set_KSP_CAARNOLDI_sstep


call set_KSP_CAARNOLDI_sstep(method,CAARNOLDI_sstep)


Set number of communication avoiding steps s in the CA-ARNOLDI method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CAARNOLDI_sstep integer in Number of communication avoiding steps s of the CA-ARNOLDI method

6.23 set_KSP_CAARNOLDI_tstep


call set_KSP_CAARNOLDI_tstep(method,CAARNOLDI_tstep)


Set number of outer iterations t in the CA-ARNOLDI method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CAARNOLDI_tstep integer in Number of outer iterations t in the CA-ARNOLDI method (restart length = st)

6.24 set_KSP_CAARNOLDI_basis


call set_KSP_CAARNOLDI_basis(method,CAARNOLDI_basis)


Set basis vector in the CA-ARNOLDI method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CAARNOLDI_basis integer in Flag for basis vector in the CA-ARNOLDI method (0: monomial basis, 1: Newton basis)

6.25 set_KSP_CAARNOLDI_QRflag


call set_KSP_CAARNOLDI_QRflag(method,CAARNOLDI_QRflag)


Set QR factorization algorithm in the CA-ARNOLDI method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CAARNOLDI_QRflag integer in Flag for QR factorization in the CA-ARNOLDI method (1: MGS, 2: CGS, 3: TSQR, 4: Cholesky QR, 5: Cholesky QR2)

6.26 set_KSP_power_method_itrmax


call set_KSP_power_method_itrmax(method,power_method_itrmax)


Set maximum number of iterations in the power method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
power_method_itrmax integer in Maximum number of iterations in the power method

6.27 START_KSP_SOLVER


call START_KSP_SOLVER(method)


Solve a simultaneous linear equation system Ax = b.

Argument name (dimension) Type Input/Output Description
method type(KSP) in/out Struct in the PARCEL format

7 How to use

ใ€€The use of the PARCEL routines is explained by FORTRAN sample programs of a CG solver. Details of the usage of the PARCEL in C programs can be found in C sample programs stored in SPARSE/example_C in the archive. The PARCEL can be called also from C programs in the same manner, provided that a MPI communicator is prepared for FORTRAN routines. In the C sample programs, a MPI communicator created in C is converted to FORTRAN via MPI_Comm_c2f in the MPI library. In the C sample program for the indirect interface, a struct in the PARCEL format is generated in FORTRAN, and then, it is used in C programs.

7.1 Compressed Row Storage (CRS) Format

ใ€€Sample codes in the CRS format are shown in Fig.8 and Fig.9. Here, make_matrix_CRS denotes an arbitrary routine, which generates a matrix in the CRS format in PARCEL. Fig.8 and Fig.9 show examples with the direct and indirect interfaces, respectively. In Fig.9, a compiling option to include SPARSE/src is needed. When only values of matrix elements in AA, elements in bb, and the initial values in xx are changed with the same matrix structure or the non-zero element positions, in Fig.8, parcel_dcg requires overhead for the initialization such as generating list of processes which exchange data with each other.


ย  program main
ย    use mpi
ย    implicit none
ย 
ย    integer n,gn,nnz,istart
ย    real*8,allocatable  :: crsA(:)
ย    integer,allocatable :: crsRow_ptr(:),crsCol(:)
ย    real*8,allocatable  :: vecx(:)
ย    real*8,allocatable  :: vecb(:)
ย    integer itrmax
ย    real*8  rtolmax
ย    real*8, allocatable :: reshistory(:)
ย    integer ipreflag
ย    integer ILU_method
ย    integer iflagAS
ย    integer addL
ย    integer iovlflag
ย    integer iret
ย    integer ierr
ย 
ย    call MPI_Init(ierr)
ย 
ย    call make_matrix_CRS(n,gn,nnz,istart,crsA,crsRow_ptr,crsCol)
ย 
ย    allocate(vecx(n))
ย    allocate(vecb(n))
ย    allocate(reshistory(itrmax))
ย    ipreflag=0
ย    ILU_method=1
ย    addL=0
ย    iflagAS=1
ย    itrmax=100
ย    rtolmax=1.0d-8
ย    iovlflag=0
ย 
ย    vecb=1.0d0
ย    vecx=1.0d0
ย    call parcel_dcg( &
ย         MPI_COMM_WORLD, &
ย         vecx,vecb, &
ย         n,gn,nnz,istart, &
ย         crsA,crsRow_ptr,crsCol, &
ย         ipreflag,ILU_method,addL,iflagAS, &
ย         itrmax,rtolmax, &
ย         reshistory, &
ย         iovlflag,iret &
ย         )
ย    call MPI_Finalize(ierr)
ย 
ย    deallocate(vecx)
ย    deallocate(vecb)
ย 
ย    deallocate(crsA)
ย    deallocate(crsRow_ptr)
ย    deallocate(crsCol)
ย 
ย    deallocate(reshistory)
ย 
ย  end program main


Fig.8 Sample code in the CRS format (direct interface)

ย  program main
ย    use mpi
ย    use krylov
ย    use krylov_kernel
ย    implicit none
ย 
ย    integer n,gn,nnz,istart
ย    real*8,allocatable  :: crsA(:)
ย    integer,allocatable :: crsRow_ptr(:),crsCol(:)
ย    real*8,allocatable  :: vecx(:)
ย    real*8,allocatable  :: vecb(:)
ย    integer itrmax
ย    real*8  rtolmax
ย    real*8, allocatable :: reshistory(:)
ย    integer ipreflag
ย    integer ILU_method
ย    integer iflagAS
ย    integer addL
ย    integer iovlflag
ย    integer iret
ย    integer ierr
ย    type(KSP) :: method
ย 
ย    call MPI_Init(ierr)
ย 
ย    call make_matrix_CRS(n,gn,nnz,istart,crsA,crsRow_ptr,crsCol)
ย 
ย    allocate(vecx(n))
ย    allocate(vecb(n))
ย    allocate(reshistory(itrmax))
ย 
ย    ipreflag=0
ย    ILU_method=1
ย    addL=0
ย    iflagAS=1
ย    itrmax=100
ย    rtolmax=1.0d-8
ย    iovlflag=0
ย    vecb=1.0d0
ย    vecx=1.0d0
ย 
ย    call PARCEL_KSP_Default_Setting(method)
ย    call set_KSP_CRS(&
ย         MPI_COMM_WORLD, &
ย         vecx,vecb, &
ย         n,gn,nnz,istart, &
ย         crsA,crsRow_ptr,crsCol, &
ย         itrmax,ipreflag,ILU_method,addL,&
ย         method)
ย    call set_KSP_solver(method,1)
ย    call START_KSP_SOLVER(method)
ย    call get_vecx_KSP(n,vecx,method)
ย    call free_KSP(method)
ย 
ย    call MPI_Finalize(ierr)
ย 
ย    deallocate(vecx)
ย    deallocate(vecb)
ย 
ย    deallocate(reshistory)
ย 
ย  end program main


Fig.9 Sample code in the CRS format (indirect interface)

7.2 Diagonal Format

ใ€€Sample codes in the Diagonal format are shown in Fig.10 and Fig.11. Here, make_matrix_DIA denotes an arbitrary routine, which generates a matrix in the Diagonal format in PARCEL. Fig.10 and Fig.11 show examples with the direct and indirect interfaces, respectively. In Fig.11, a compiling option to include SPARSE/src is needed. When only values of matrix elements in AA, elements in bb, and the initial values in xx are changed with the same matrix structure or the non-zero element positions, in Fig.10, parcel_dcg requires overhead for the initialization such as generating list of processes which exchange data with each other.


ย  program main
ย    use mpi
ย    implicit none
ย 
ย    integer n,gn,nnd,istart
ย    real*8,allocatable  :: diaA(:)
ย    integer,allocatable :: offset(:)
ย    real*8,allocatable  :: vecx(:)
ย    real*8,allocatable  :: vecb(:)
ย    integer itrmax
ย    real*8  rtolmax
ย    real*8, allocatable :: reshistory(:)
ย    integer ipreflag
ย    integer ILU_method
ย    integer iflagAS
ย    integer addL
ย    integer iovlflag
ย    integer iret
ย    integer ierr
ย 
ย    call MPI_Init(ierr)
ย 
ย    call make_matrix_DIA(n,gn,nnd,istart,diaA,offset)
ย 
ย    allocate(vecx(n))
ย    allocate(vecb(n))
ย    allocate(reshistory(itrmax))
ย 
ย    ipreflag=0
ย    ILU_method=1
ย    addL=0
ย    iflagAS=1
ย    itrmax=100
ย    rtolmax=1.0d-8
ย    iovlflag=0
ย    vecb=1.0d0
ย    vecx=1.0d0
ย 
ย    call parcel_dcg_dia(&
ย         MPI_COMM_WORLD, &
ย         vecx,vecb,&
ย         n,gn,istart,&
ย         diaA,offset,nnd,&
ย         ipreflag,ILU_method,addL,iflagAS,&
ย         itrmax,rtolmax,&
ย         reshistory,&
ย         iovlflag,iret&
ย         )
ย 
ย    call MPI_Finalize(ierr)
ย 
ย    deallocate(vecx)
ย    deallocate(vecb)
ย 
ย    deallocate(diaA)
ย    deallocate(offset)
ย 
ย    deallocate(reshistory)
ย 
ย  end program main


Fig.10 Sample code in the Diagonal format (direct interface)

ย  program main
ย    use mpi
ย    use krylov
ย    use krylov_kernel
ย    implicit none
ย 
ย    integer n,gn,nnd,istart
ย    real*8,allocatable  :: diaA(:)
ย    integer,allocatable :: offset(:)
ย    real*8,allocatable  :: vecx(:)
ย    real*8,allocatable  :: vecb(:)
ย    integer itrmax
ย    real*8  rtolmax
ย    real*8, allocatable :: reshistory(:)
ย    integer ipreflag
ย    integer ILU_method
ย    integer iflagAS
ย    integer addL
ย    integer iovlflag
ย    integer iret
ย    integer ierr
ย 
ย    call MPI_Init(ierr)
ย 
ย    call make_matrix_DIA(n,gn,nnd,istart,diaA,offset)
ย 
ย    allocate(vecx(n))
ย    allocate(vecb(n))
ย    allocate(reshistory(itrmax))
ย 
ย    ipreflag=0
ย    ILU_method=1
ย    addL=0
ย    iflagAS=1
ย    itrmax=100
ย    rtolmax=1.0d-8
ย    iovlflag=0
ย    vecb=1.0d0
ย    vecx=1.0d0
ย 
ย    call PARCEL_KSP_Default_Setting(method)
ย    call set_KSP_DIA(&
ย         MPI_COMM_WORLD, &
ย         vecx,vecb, &
ย         n,gn,istart, &
ย         diaA,offset,nnd,&
ย         itrmax,ipreflag,ILU_method,addL,&
ย         method)
ย    call set_KSP_solver(method,1)
ย    call START_KSP_SOLVER(method)
ย    call get_vecx_KSP(n,vecx,method)
ย    call free_KSP(method)
ย 
ย    call MPI_Finalize(ierr)
ย 
ย    deallocate(vecx)
ย    deallocate(vecb)
ย 
ย    deallocate(reshistory)
ย 
ย  end program main


Fig.11 Sample code in the Diagonal format (indirect interface)