1 Krylov subspace methods

This chapter describes the iterative methods contained in this library. The following notation is used, Ax=bAx = b represents the linear system which will be solved, with AA being an n×\timesn sparse matrix and x0x_0 representing the initial vector. KK represents an appropriate preconditioner.

1.1 The Preconditioned Conjugate Gradient Method

The Conjugate Gradient (CG) method is an iterative Krylov subspace algorithm for solving symmetric matrices.

  1. 𝐂𝐨𝐦𝐩𝐮𝐭𝐞𝐫0=𝐛𝐀𝐱0𝐟𝐨𝐫𝐬𝐨𝐦𝐞𝐢𝐧𝐢𝐭𝐢𝐚𝐥𝐠𝐮𝐞𝐬𝐬𝐱1=𝐱0\mathbf{{Compute\ }}\mathbf{r}^{\mathbf{0}}\mathbf{= b - A}\mathbf{x}^{\mathbf{0}}\mathbf{{\ for\ some\ initial\ guess\ }}\mathbf{x}^{\mathbf{- 1}}\mathbf{=}\mathbf{x}^{\mathbf{0}}\mathbf{\ }
  2. 𝐩1=0\mathbf{p}^{\mathbf{- 1}}\mathbf{= 0}
  3. 𝛂1=0\mathbf{\alpha}_{\mathbf{- 1}}\mathbf{= 0\ }\mathbf{\ }
  4. 𝛃1=0\mathbf{\beta}_{\mathbf{- 1}}\mathbf{= 0}
  5. 𝐬𝐨𝐥𝐯𝐞𝐬𝐟𝐫𝐨𝐦𝐊𝐬=𝐫0\mathbf{{solve\ s\ from\ }}\mathbf{K}\mathbf{s = \ }\mathbf{r}^{\mathbf{0}}
  6. 𝛒0=𝐬,𝐫0\mathbf{\rho}_{\mathbf{0}}\mathbf{=}\left\langle \mathbf{s,}\mathbf{r}^{\mathbf{0}} \right\rangle
  7. 𝐟𝐨𝐫𝐢=0,1,2,3\mathbf{for\ i = 0,1,2,3}\mathbf{\ldots}
  8.   𝐩𝐢=𝐬+𝛃𝐢1𝐩𝐢1\mathbf{p}^{\mathbf{i}}\mathbf{= s +}\mathbf{\beta}_{\mathbf{i - 1}}\mathbf{p}^{\mathbf{i - 1}}
  9.   𝐪𝐢=𝐀𝐩𝐢\mathbf{q}^{\mathbf{i}}\mathbf{= A}\mathbf{p}^{\mathbf{i}}
  10.   𝛄=𝐩𝐢,𝐪𝐢\mathbf{\gamma =}\left\langle \mathbf{p}^{\mathbf{i}}\mathbf{,}\mathbf{q}^{\mathbf{i}} \right\rangle
  11.   𝐱𝐢=𝐱𝐢1+𝛂𝐢1𝐩𝐢1\mathbf{x}^{\mathbf{i}}\mathbf{=}\mathbf{x}^{\mathbf{i - 1}}\mathbf{+}\mathbf{\alpha}_{\mathbf{i - 1}}\mathbf{p}^{\mathbf{i - 1}}
  12.   𝛂𝐢=𝛒𝐢/𝛄\mathbf{\alpha}_{\mathbf{i}}\mathbf{=}\mathbf{\rho}_{\mathbf{i}}\mathbf{\ /\ \gamma}
  13.   𝐫𝐢+1=𝐫𝐢𝛂𝐢𝐪𝐢\mathbf{r}^{\mathbf{i + 1}}\mathbf{=}\mathbf{r}^{\mathbf{i}}\mathbf{-}\mathbf{\alpha}_{\mathbf{i}}\mathbf{q}^{\mathbf{i}}
  14.   𝐬𝐨𝐥𝐯𝐞𝐬𝐟𝐫𝐨𝐦𝐊𝐬=𝐫𝐢+1\mathbf{{solve\ s\ from\ }}\mathbf{K}\mathbf{s = \ }\mathbf{r}^{\mathbf{i + 1}}
  15.   𝛒𝐢+1=𝐬,𝐫𝐢+1\mathbf{\rho}_{\mathbf{i + 1}}\mathbf{=}\left\langle \mathbf{s,}\mathbf{r}^{\mathbf{i + 1}} \right\rangle
  16.   𝐢𝐟𝐫𝐢+1/𝐛𝐬𝐦𝐚𝐥𝐥𝐞𝐧𝐨𝐮𝐠𝐡𝐭𝐡𝐞𝐧\mathbf{{if\ }}\left\| \mathbf{r}^{\mathbf{i + 1}} \right\|\mathbf{\ /\ }\left\| \mathbf{b} \right\|\mathbf{{\ small\ enough\ the}}\mathbf{n}
  17.     𝐱𝐢=𝐱𝐢1+𝛂𝐢1𝐩𝐢1\mathbf{x}^{\mathbf{i}}\mathbf{=}\mathbf{x}^{\mathbf{i - 1}}\mathbf{+}\mathbf{\alpha}_{\mathbf{i - 1}}\mathbf{p}^{\mathbf{i - 1}}
  18.     𝐪𝐮𝐢𝐭\mathbf{{qui}}\mathbf{t}
  19.   𝐞𝐧𝐝𝐢𝐟\mathbf{{endi}}\mathbf{f}
  20.   𝛃𝐢=𝛒𝐢+1/𝛒𝐢\mathbf{\beta}_{\mathbf{i}}\mathbf{=}\mathbf{\rho}_{\mathbf{i + 1}}\mathbf{\ /\ }\mathbf{\rho}_{\mathbf{i}}
  21. 𝐞𝐧𝐝\mathbf{{\ en}}\mathbf{d}

1.2 The Preconditioned Biconjugate Gradient Stabilized Method

The Biconjugate Gradient Stabilized (Bi-CGSTAB) method is an iterative Krylov subspace algorithm for solving nonsymmetric matrices.

  1. 𝐂𝐨𝐦𝐩𝐮𝐭𝐞𝐫0=𝐛𝐀𝐱0𝐟𝐨𝐫𝐬𝐨𝐦𝐞𝐢𝐧𝐢𝐭𝐢𝐚𝐥𝐠𝐮𝐞𝐬𝐬𝐱1=𝐱0\mathbf{{Compute\ }}\mathbf{r}_{\mathbf{0}}\mathbf{= b - A}\mathbf{x}_{\mathbf{0}}\mathbf{{\ for\ some\ initial\ guess\ }}\mathbf{x}_{\mathbf{- 1}}\mathbf{=}\mathbf{x}_{\mathbf{0}}
  2. 𝐩0=𝐫0\mathbf{p}_{\mathbf{0}}\mathbf{=}\mathbf{r}_{\mathbf{0}}
  3. 𝐜1=𝐫0,𝐫0\mathbf{c}_{\mathbf{1}}\mathbf{=}\left\langle \mathbf{r}_{\mathbf{0}}\mathbf{,}\mathbf{r}_{\mathbf{0}} \right\rangle
  4. 𝐟𝐨𝐫𝐢=0,1,2,3\mathbf{for\ i = 0,1,2,3}\mathbf{\ldots}
  5.   𝐬𝐨𝐥𝐯𝐞𝐩̂𝐟𝐫𝐨𝐦𝐊𝐩̂=𝐩𝐢\mathbf{{solve\ }}\widehat{\mathbf{p}}\mathbf{{\ from\ K}}\widehat{\mathbf{p}}\mathbf{= \ }\mathbf{p}_{\mathbf{i}}
  6.   𝐪=𝐀𝐩̂\mathbf{q = A}\widehat{\mathbf{p}}
  7.   𝐜2=𝐫0,𝐪\mathbf{c}_{\mathbf{2}}\mathbf{=}\left\langle \mathbf{r}_{\mathbf{0}}\mathbf{,q} \right\rangle
  8.   𝛂=𝐜1/𝐜2\mathbf{\alpha =}\mathbf{c}_{\mathbf{1}}\mathbf{\ /\ }\mathbf{c}_{\mathbf{2}}
  9.   𝐞=𝐫𝐢𝛂𝐪\mathbf{e =}\mathbf{r}_{\mathbf{i}}\mathbf{- \alpha q}
  10.   𝐬𝐨𝐥𝐯𝐞𝐞̂𝐟𝐫𝐨𝐦𝐊𝐞̂=𝐞\mathbf{{solve\ }}\widehat{\mathbf{e}}\mathbf{{\ from\ K}}\widehat{\mathbf{e}}\mathbf{= e}
  11.   𝐯=𝐀𝐞̂\mathbf{v = A}\widehat{\mathbf{e}}
  12.   𝐜3=𝐞,𝐯/𝐯,𝐯\mathbf{c}_{\mathbf{3}}\mathbf{=}\left\langle \mathbf{e,v} \right\rangle\mathbf{\ /\ }\left\langle \mathbf{v,v} \right\rangle
  13.   𝐱𝐢+1=𝐱𝐢+𝛂𝐩̂+𝐜3𝐞̂\mathbf{x}_{\mathbf{i + 1}}\mathbf{=}\mathbf{x}_{\mathbf{i}}\mathbf{+ \alpha}\widehat{\mathbf{p}}\mathbf{+}\mathbf{c}_{\mathbf{3}}\widehat{\mathbf{e}}
  14.   𝐫𝐢+1=𝐞𝐜3𝐯\mathbf{r}_{\mathbf{i + 1}}\mathbf{= e -}\mathbf{c}_{\mathbf{3}}\mathbf{v}
  15.   𝐜1=𝐫0,𝐫𝐢+1\mathbf{c}_{\mathbf{1}}\mathbf{=}\left\langle \mathbf{r}_{\mathbf{0}}\mathbf{,}\mathbf{r}_{\mathbf{i + 1}} \right\rangle
  16.   𝐢𝐟𝐫𝐢+1/𝐛𝐬𝐦𝐚𝐥𝐥𝐞𝐧𝐨𝐮𝐠𝐡𝐭𝐡𝐞𝐧\mathbf{{if\ }}\left\| \mathbf{r}_{\mathbf{i + 1}} \right\|\mathbf{\ /\ }\left\| \mathbf{b} \right\|\mathbf{{\ small\ enough\ }}\mathbf{{the}}\mathbf{n}
  17.     𝐪𝐮𝐢𝐭\mathbf{{quit}}
  18.   𝐞𝐧𝐝𝐢𝐟\mathbf{{endif}}
  19.   𝛃=𝐜1/(𝐜2𝐜3)\mathbf{\beta =}\mathbf{c}_{\mathbf{1}}\mathbf{\ /\ }\left( \mathbf{c}_{\mathbf{2}}\mathbf{c}_{\mathbf{3}} \right)
  20.   𝐩𝐢+1=𝐫𝐢+1+𝛃(𝐩𝐢𝐜3𝐪)\mathbf{p}_{\mathbf{i + 1}}\mathbf{=}\mathbf{r}_{\mathbf{i + 1}}\mathbf{+ \beta}\left( \mathbf{p}_{\mathbf{i}}\mathbf{-}\mathbf{c}_{\mathbf{3}}\mathbf{q} \right)
  21. 𝐞𝐧𝐝\mathbf{{end}}

1.3 The Preconditioned Generalized Minimum Residual Method

The Generalized Minimum Residual (GMRES(m)) method is an iterative Krylov subspace algorithm for solving nonsymmetric matrices. GMRES(m) is restarted periodically after mm iterations if it did not converge within mm iterations, the solution obtained until mm iterations will be used as an input for the new restart cycle. PARCEL provides variants of GMRES with the Classical Gram-Schmidt (sometimes referred to as standard Gram-Schmidt) and Modified Gram-Schmidt (MGS) orthogonalization methods.

𝐇𝐧\mathbf{H}_{\mathbf{n}} is an upper triangular matrix with 𝐡𝐣,𝐤\mathbf{h}_{\mathbf{j,k}} representing its elements, 𝐞𝐢\mathbf{\ }\mathbf{e}_{\mathbf{i}} represents a vector consisting of the first i elements of the vector 𝐞\mathbf{e}.

  1. 𝐟𝐨𝐫𝐣=0,1,2,3\mathbf{for\ j = 0,1,2,3}\mathbf{\ldots}
  2.   𝐫=𝐛𝐀𝐱0\mathbf{r = b - A}\mathbf{x}_{\mathbf{0}}
  3.   𝐯1=𝐫/𝐫\mathbf{v}_{\mathbf{1}}\mathbf{= - r\ /\ }\left\| \mathbf{r} \right\|
  4.   𝐞=(𝐫,0,,0)𝐓\mathbf{e =}\left( \mathbf{-}\left\| \mathbf{r} \right\|\mathbf{,0,\ldots,0} \right)^{\mathbf{T}}
  5.   𝐧=𝐦\mathbf{n = m}
  6.   𝐟𝐨𝐫𝐢=1,2,3𝐦\mathbf{for\ i = 1,2,3}\mathbf{\ldots}\mathbf{m}
  7.      𝐬𝐨𝐥𝐯𝐞𝐯̂𝐢𝐟𝐫𝐨𝐦𝐊𝐯̂𝐢=𝐯𝐢\mathbf{{solve\ }}{\widehat{\mathbf{v}}}_{\mathbf{i}}\mathbf{{\ from\ K}}{\widehat{\mathbf{v}}}_{\mathbf{i}}\mathbf{=}\mathbf{v}_{\mathbf{i}}
  8.      𝛚=𝐀𝐯̂𝐢\mathbf{\omega = A}{\widehat{\mathbf{v}}}_{\mathbf{i}}
  9.      𝐟𝐨𝐫𝐤=1,2,3𝐢\mathbf{for\ k = 1,2,3}\mathbf{\ldots}\mathbf{i}
  10.       𝐡𝐤,𝐢=𝛚,𝐯𝐤\mathbf{h}_{\mathbf{k,i}}\mathbf{=}\left\langle \mathbf{\omega,}\mathbf{v}_{\mathbf{k}} \right\rangle
  11.       𝛚=𝛚𝐡𝐤,𝐢𝐯𝐤\mathbf{\omega = \omega -}\mathbf{h}_{\mathbf{k,i}}\mathbf{v}_{\mathbf{k}}
  12.     𝐞𝐧𝐝\mathbf{{end}}
  13.     𝐡𝐢+1,𝐢=𝛚\mathbf{h}_{\mathbf{i + 1,i}}\mathbf{=}\left\| \mathbf{\omega} \right\|
  14.     𝐯𝐢+1=𝛚/𝛚\mathbf{v}_{\mathbf{i + 1}}\mathbf{= \omega/}\left\| \mathbf{\omega} \right\|
  15.     𝐟𝐨𝐫𝐤=1,2,3𝐢1\mathbf{for\ k = 1,2,3}\mathbf{\ldots}\mathbf{i - 1}
  16.       (𝐡𝐤,𝐢𝐡𝐤+1,𝐢)=(𝐜𝐢𝐬𝐢𝐬𝐢𝐜𝐢)(𝐡𝐤,𝐢𝐡𝐤+1,𝐢)\begin{pmatrix} \mathbf{h}_{\mathbf{k,i}} \\ \mathbf{h}_{\mathbf{k + 1,i}} \\ \end{pmatrix}\mathbf{=}\begin{pmatrix} \mathbf{c}_{\mathbf{i}} & \mathbf{- s}_{\mathbf{i}} \\ \mathbf{s}_{\mathbf{i}} & \mathbf{c}_{\mathbf{i}} \\ \end{pmatrix}\begin{pmatrix} \mathbf{h}_{\mathbf{k,i}} \\ \mathbf{h}_{\mathbf{k + 1,i}} \\ \end{pmatrix}
  17.     𝐞𝐧𝐝\mathbf{{end}}
  18.     𝐜𝐢=11+(𝐡𝐢+1,𝐢𝐡𝐢,𝐢)2\mathbf{c}_{\mathbf{i}}\mathbf{=}\frac{\mathbf{1}}{\sqrt{\mathbf{1 +}\left( \frac{\mathbf{h}_{\mathbf{i + 1,i}}}{\mathbf{h}_{\mathbf{i,i}}} \right)^{\mathbf{2}}}}
  19.     𝐬𝐢=𝐡𝐢+1,𝐢𝐡𝐢,𝐢11+(𝐡𝐢+1,𝐢𝐡𝐢,𝐢)2\mathbf{s}_{\mathbf{i}}\mathbf{= -}\frac{\mathbf{h}_{\mathbf{i + 1,i}}}{\mathbf{h}_{\mathbf{i,i}}}\frac{\mathbf{1}}{\sqrt{\mathbf{1 +}\left( \frac{\mathbf{h}_{\mathbf{i + 1,i}}}{\mathbf{h}_{\mathbf{i,i}}} \right)^{\mathbf{2}}}}
  20.     (𝐞𝐢𝐞𝐢+1)=(𝐜𝐢𝐬𝐢𝐬𝐢𝐜𝐢)(𝐞𝐢𝐞𝐢+1)\begin{pmatrix} \mathbf{e}_{\mathbf{i}} \\ \mathbf{e}_{\mathbf{i + 1}} \\ \end{pmatrix}\mathbf{=}\begin{pmatrix} \mathbf{c}_{\mathbf{i}} & \mathbf{- s}_{\mathbf{i}} \\ \mathbf{s}_{\mathbf{i}} & \mathbf{c}_{\mathbf{i}} \\ \end{pmatrix}\begin{pmatrix} \mathbf{e}_{\mathbf{i}} \\ \mathbf{e}_{\mathbf{i + 1}} \\ \end{pmatrix}
  21.     𝐡𝐢,𝐢=𝐜𝐢𝐡𝐢,𝐢𝐬𝐢𝐡𝐢+1,𝐢\mathbf{h}_{\mathbf{i,i}}\mathbf{=}\mathbf{c}_{\mathbf{i}}\mathbf{h}_{\mathbf{i,i}}\mathbf{- \ }\mathbf{s}_{\mathbf{i}}\mathbf{h}_{\mathbf{i + 1,i}}
  22.     𝐡𝐢+1,𝐢=0\mathbf{h}_{\mathbf{i + 1,i}}\mathbf{= 0}
  23.     𝐢𝐟𝐞𝐢+1/𝐛𝐬𝐦𝐚𝐥𝐥𝐞𝐧𝐨𝐮𝐠𝐡𝐭𝐡𝐞𝐧\mathbf{{if\ }}\left\| \mathbf{e}_{\mathbf{i + 1}} \right\|\mathbf{\ /\ }\left\| \mathbf{b} \right\|\mathbf{{\ small\ enough\ }}\mathbf{{then}}
  24.       𝐧=𝐢\mathbf{n = i}
  25.       𝐞𝐱𝐢𝐭\mathbf{{exit}}
  26.     𝐞𝐧𝐝𝐢𝐟\mathbf{{endif}}
  27.   𝐞𝐧𝐝\mathbf{{end}}
  28.   𝐲𝐧=𝐇𝐧1𝐞𝐧\mathbf{y}_{\mathbf{n}}\mathbf{=}\mathbf{H}_{\mathbf{n}}^{\mathbf{- 1}}\mathbf{e}_{\mathbf{n}}
  29.   𝐬𝐨𝐥𝐯𝐞𝐱̂𝐟𝐫𝐨𝐦𝐊𝐱̂=𝐤=1𝐧𝐲𝐤𝐯𝐤\mathbf{{solve\ }}\widehat{\mathbf{x}}\mathbf{{\ from\ K}}\widehat{\mathbf{x}}\mathbf{=}\sum_{\mathbf{k = 1}}^{\mathbf{n}}{\mathbf{y}_{\mathbf{k}}\mathbf{v}_{\mathbf{k}}}
  30.   𝐱𝐧=𝐱0+𝐱̂\mathbf{x}_{\mathbf{n}}\mathbf{=}\mathbf{x}_{\mathbf{0}}\mathbf{+}\widehat{\mathbf{x}}
  31.   𝐱0=𝐱𝐧\mathbf{x}_{\mathbf{0}}\mathbf{=}\mathbf{x}_{\mathbf{n}}
  32. 𝐞𝐧𝐝\mathbf{{end}}

1.4 The Preconditioned Communication Avoiding Generalized Minimum Residual Method

The Communication Avoiding Generalized Minimum Residual (CA-GMRES(s,t)) method is an iterative Krylov subspace algorithm for solving nonsymmetric matrices. The calculation which is done in ss iterations in the GMRES(m) method is equivalent to one iteration of the CA-GMRES method. CA-GMRES(s,t) is restarted periodically after tt iterations if it did not converge within tt iterations, the solution obtained until mm iterations will be used as an input for the new restart cycle. CA-GMRES(s,t) is equivalent to GMRES(m) if ss and tt are chosen so that m=t×sm = t \times s. The convergence property of CA-GMRES(s,t) is the same as GMRES(m) in exact arithmetic. In addition, the number of global collective communication calls is reduced by communication avoiding QR factorization. As CA-GMRES(s,t) produces ss basis vectors at once, when ss is too large, the linear independency of the basic vectors may become worse because of round off errors, leading to worse convergence property than GMRES(m). In order to improve the orthogonality of the basic vectors, the PARCEL implementation of CA-GMRES(s,t) provides an option to use the Newton basis in addition to the monomial basis.

𝐞\mathbf{e} represents a unit vector, 𝛒̃𝐤=𝐑𝐤{\widetilde{\mathbf{\rho}}}_{\mathbf{k}}\mathbf{=}\mathbf{R}_{\mathbf{k}} represents the ss-th row and ss-th column element of the matrix 𝐑𝐤\mathbf{R}_{\mathbf{k}}, and 𝐄\mathbf{E} represents the eigenvalues obtained from the Hessenberg matrix, which is generated by the iteration process.

  1. 𝐁𝐤=[𝐞2,𝐞3,,𝐞𝐬+1]\mathbf{B}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{= \lbrack}\mathbf{e}_{\mathbf{2}}\mathbf{,}\mathbf{e}_{\mathbf{3}}\mathbf{,\ldots,}\mathbf{e}_{\mathbf{s + 1}}\mathbf{\rbrack}
  2. 𝐟𝐨𝐫𝐣=0,1,2,3\mathbf{for\ j = 0,1,2,3\ldots}
  3.   𝐫=𝐛𝐀𝐱0\mathbf{r = b - A}\mathbf{x}_{\mathbf{0}}
  4.   𝐯1=𝐫/𝐫\mathbf{v}_{\mathbf{1}}\mathbf{= r\ /\ }\left\| \mathbf{r} \right\|
  5.   𝛇=(𝐫,0,,0)𝐓\mathbf{\zeta =}\left( \left\| \mathbf{r} \right\|\mathbf{,0,\ldots,0} \right)^{\mathbf{T}}
  6.   𝐟𝐨𝐫𝐤=0,1,𝐭1\mathbf{for\ k = 0,1\ldots,t - 1}
  7.     𝐅𝐢𝐱𝐛𝐚𝐬𝐢𝐬𝐜𝐨𝐧𝐯𝐞𝐫𝐬𝐢𝐨𝐧𝐦𝐚𝐭𝐫𝐢𝐱[𝐁𝐤,𝐄]\mathbf{Fix\ basis\ conversion\ matrix\ \lbrack}\mathbf{B}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{,E\rbrack}
  8.     𝐂𝐨𝐦𝐮𝐩𝐮𝐭𝐞𝐛𝐚𝐬𝐢𝐜𝐯𝐞𝐜𝐭𝐨𝐫[𝐬,𝐕́𝐤,𝐁𝐤]\mathbf{Comupute\ \ basic\ vector\ \ \lbrack\ s,}{\acute{\mathbf{V}}}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{,}\mathbf{B}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{\ \rbrack}
  9.     𝐢𝐟(𝐤.𝐞𝐪.0)\mathbf{if(k.eq.0)}
  10.       𝐐𝐑𝐝𝐞𝐜𝐨𝐦𝐩𝐨𝐬𝐢𝐭𝐢𝐨𝐧𝐕0=𝐐0𝐑0\mathbf{{QR\ decomposition\ }}\mathbf{V}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{=}\mathbf{Q}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{R}_{\mathbf{0}}^{\mathbf{\sim}}
  11.       𝔔0=𝐐0\mathfrak{Q}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{=}\mathbf{Q}_{\mathbf{0}}^{\mathbf{\sim}}
  12.       0=𝐑0𝐁0𝐑01\mathfrak{H}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{=}\mathbf{R}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{B}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{R}_{\mathbf{0}}^{\mathbf{- 1}}
  13.       0=0\mathcal{H}_{\mathbf{0}}\mathbf{=}\mathfrak{H}_{\mathbf{0}}^{\mathbf{\sim}}
  14.       𝐟𝐨𝐫𝐨=1,2,𝐬\mathbf{for\ o = 1,2\ldots,s}
  15.       𝐆𝐢𝐯𝐞𝐧𝐬𝐫𝐨𝐭𝐚𝐭𝐢𝐨𝐧[𝐨,0,𝛇]\mathbf{Givens\ rotation\ \lbrack o,}\mathcal{H}_{\mathbf{0}}\mathbf{,\zeta\rbrack}
  16.       𝐢𝐟𝛇𝐨+1/𝐛𝐬𝐦𝐚𝐥𝐥𝐞𝐧𝐨𝐮𝐠𝐡𝐭𝐡𝐞𝐧\mathbf{{if\ }}\left\| \mathbf{\zeta}_{\mathbf{o + 1}} \right\|\mathbf{\ /\ }\left\| \mathbf{b} \right\|\mathbf{{\ small\ enough\ then}}
  17.         𝐮𝐩𝐝𝐚𝐭𝐞𝐬𝐨𝐥𝐮𝐭𝐢𝐨𝐧𝐯𝐞𝐜𝐭𝐨𝐫[𝐬,𝐤,𝐨,𝐕́0,𝐐́0,𝐑0,0,𝛇,𝐱0]\mathbf{update\ \ solution\ \ vector\lbrack s,k,o,}{\acute{\mathbf{V}}}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{,}{\acute{\mathbf{Q}}}_{\mathbf{0}}^{\mathbf{\sim}}\mathbf{,}\mathbf{R}_{\mathbf{0}}\mathbf{,}\mathcal{H}_{\mathbf{0}}\mathbf{,\zeta,}\mathbf{x}_{\mathbf{0}}\mathbf{\ \rbrack}
  18.         𝐪𝐮𝐢𝐭\mathbf{{quit}}
  19.       𝐞𝐧𝐝𝐢𝐟\mathbf{{endif}}
  20.     𝐞𝐥𝐬𝐞\mathbf{{else}}
  21.       ́𝐤1,𝐤=(𝔔𝐤1)𝐇𝐕́𝐤{\acute{\mathfrak{R}}}_{\mathbf{k - 1,k}}^{\mathbf{\sim}}\mathbf{=}\left( \mathfrak{Q}_{\mathbf{k - 1}}^{\mathbf{\sim}} \right)^{\mathbf{H}}{\acute{\mathbf{V}}}_{\mathbf{k}}^{\mathbf{\sim}}
  22.       𝐕́𝐤=𝐕́𝐤𝔔𝐤1́𝐤1,𝐤{\acute{\mathbf{V}}}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{=}{\acute{\mathbf{V}}}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{-}\mathfrak{Q}_{\mathbf{k - 1}}^{\mathbf{\sim}}{\acute{\mathfrak{R}}}_{\mathbf{k - 1,k}}^{\mathbf{\sim}}
  23.       𝐐𝐑𝐝𝐞𝐜𝐨𝐦𝐩𝐨𝐬𝐢𝐭𝐢𝐨𝐧𝐕́𝐤=𝐐́𝐤𝐑́𝐤\mathbf{{QR\ decomposition\ }}{\acute{\mathbf{V}}}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{=}{\acute{\mathbf{Q}}}_{\mathbf{k}}^{\mathbf{\sim}}{\acute{\mathbf{R}}}_{\mathbf{k}}^{\mathbf{\sim}}
  24.       𝐑𝐤=(𝐑𝐤𝐳𝐤01,𝐤𝛒)\mathbf{R}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{=}\begin{pmatrix} \mathbf{R}_{\mathbf{k}} & \mathbf{z}_{\mathbf{k}} \\ \mathbf{0}_{\mathbf{1,k}} & \mathbf{\rho} \\ \end{pmatrix}
  25.       𝐤1,𝐤=𝐤1𝐤1,𝐤𝐑𝐤1+𝐤1,𝐤𝐁𝐤𝐑𝐤1\mathfrak{H}_{\mathbf{k - 1,k}}^{\mathbf{\sim}}\mathbf{= -}\mathfrak{H}_{\mathbf{k - 1}}\mathfrak{R}_{\mathbf{k - 1,k}}\mathbf{R}_{\mathbf{k}}^{\mathbf{- 1}}\mathbf{+}\mathfrak{R}_{\mathbf{k - 1,k}}^{\mathbf{\sim}}\mathbf{B}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{R}_{\mathbf{k}}^{\mathbf{- 1}}
  26.       𝐇𝐤=𝐑𝐤𝐁𝐤𝐑𝐤1+𝛒̃𝐤1𝐛𝐤𝐳𝐤𝐞𝐬𝐓𝐡𝐤1𝐞1𝐞𝐬(𝐤1)𝐓𝐤1,𝐤𝐑𝐤1\mathbf{H}_{\mathbf{k}}\mathbf{=}\mathbf{R}_{\mathbf{k}}\mathbf{B}_{\mathbf{k}}\mathbf{R}_{\mathbf{k}}^{\mathbf{- 1}}\mathbf{+}{\widetilde{\mathbf{\rho}}}_{\mathbf{k}}^{\mathbf{- 1}}\mathbf{b}_{\mathbf{k}}\mathbf{z}_{\mathbf{k}}\mathbf{e}_{\mathbf{s}}^{\mathbf{T}}\mathbf{-}\mathbf{h}_{\mathbf{k - 1}}\mathbf{e}_{\mathbf{1}}\mathbf{e}_{\mathbf{s(k - 1)}}^{\mathbf{T}}\mathfrak{R}_{\mathbf{k - 1,k}}\mathbf{R}_{\mathbf{k}}^{\mathbf{- 1}}
  27.       𝐡𝐤=𝛒̃𝐤1𝛒𝐤𝐛𝐤\mathbf{h}_{\mathbf{k}}\mathbf{=}{\widetilde{\mathbf{\rho}}}_{\mathbf{k}}^{\mathbf{- 1}}\mathbf{\rho}_{\mathbf{k}}\mathbf{b}_{\mathbf{k}}
  28.       𝐤=(𝐤1𝐤1,𝐤𝐡𝐤1𝐞1𝐞𝐬(𝐤1)𝐓01,𝐬(𝐤1)𝐇𝐤𝐡𝐤𝐞𝐬𝐓)\mathfrak{H}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{=}\begin{pmatrix} \mathfrak{H}_{\mathbf{k - 1}} & \mathfrak{H}_{\mathbf{k - 1,k}}^{\mathbf{\sim}} \\ \begin{matrix} \mathbf{h}_{\mathbf{k - 1}}\mathbf{e}_{\mathbf{1}}\mathbf{e}_{\mathbf{s(k - 1)}}^{\mathbf{T}} \\ \mathbf{0}_{\mathbf{1,s(k - 1)}} \\ \end{matrix} & \begin{matrix} \mathbf{H}_{\mathbf{k}} \\ \mathbf{h}_{\mathbf{k}}\mathbf{e}_{\mathbf{s}}^{\mathbf{T}} \\ \end{matrix} \\ \end{pmatrix}
  29.       𝐤=(𝐤1,𝐤𝐇𝐤𝐡𝐤𝐞𝐬𝐓)\mathcal{H}_{\mathbf{k}}\mathbf{=}\begin{pmatrix} \mathfrak{H}_{\mathbf{k - 1,k}}^{\mathbf{\sim}} \\ \mathbf{H}_{\mathbf{k}} \\ \mathbf{h}_{\mathbf{k}}\mathbf{e}_{\mathbf{s}}^{\mathbf{T}} \\ \end{pmatrix}
  30.       𝐟𝐨𝐫𝐨=1+𝐬𝐤,,𝐬(𝐤+1)\mathbf{for\ o = 1 + sk,\ldots,s(k + 1)}
  31.         𝐆𝐢𝐯𝐞𝐧𝐬𝐫𝐨𝐭𝐚𝐭𝐢𝐨𝐧[𝐨,𝐤,𝛇]\mathbf{Givens\ rotation\ \lbrack o,}\mathcal{H}_{\mathbf{k}}\mathbf{,\zeta\rbrack}
  32.         𝐢𝐟𝛇𝐨+1/𝐛𝐬𝐦𝐚𝐥𝐥𝐞𝐧𝐨𝐮𝐠𝐡𝐭𝐡𝐞𝐧\mathbf{{if\ }}\left\| \mathbf{\zeta}_{\mathbf{o + 1}} \right\|\mathbf{\ /\ }\left\| \mathbf{b} \right\|\mathbf{{\ small\ enough\ then}}
  33.           𝐮𝐩𝐝𝐚𝐭𝐞𝐬𝐨𝐥𝐮𝐭𝐢𝐨𝐧𝐯𝐞𝐜𝐭𝐨𝐫[𝐬,𝐤,𝐨,𝐕́𝐤,𝔔𝐤,𝐑́𝐤,𝐤,𝛇,𝐱0]\mathbf{update\ \ solution\ \ vector\lbrack s,k,o,}{\acute{\mathbf{V}}}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{,}\mathfrak{Q}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{,}{\acute{\mathbf{R}}}_{\mathbf{k}}^{\mathbf{\sim}}\mathbf{,}\mathcal{H}_{\mathbf{k}}\mathbf{,\zeta,}\mathbf{x}_{\mathbf{0}}\mathbf{\ \rbrack}
  34.           𝐪𝐮𝐢𝐭\mathbf{{quit}}
  35.         𝐞𝐧𝐝𝐢𝐟\mathbf{{endif}}
  36.       𝐞𝐧𝐝\mathbf{{end}}
  37.     𝐞𝐧𝐝𝐢𝐟\mathbf{{endif}}
  38.   𝐞𝐧𝐝\mathbf{{end}}
  39.   𝐮𝐩𝐝𝐚𝐭𝐞𝐬𝐨𝐥𝐮𝐭𝐢𝐨𝐧𝐯𝐞𝐜𝐭𝐨𝐫[𝐬,𝐭1,𝐬𝐭,𝐕́𝐭1,𝔔𝐭1,𝐑́𝐭1,𝐭1,𝛇,𝐱0]\mathbf{update\ \ solution\ \ vector\lbrack s,t - 1,st,}{\acute{\mathbf{V}}}_{\mathbf{t - 1}}^{\mathbf{\sim}}\mathbf{,}\mathfrak{Q}_{\mathbf{t - 1}}^{\mathbf{\sim}}\mathbf{,}{\acute{\mathbf{R}}}_{\mathbf{t - 1}}^{\mathbf{\sim}}\mathbf{,}\mathcal{H}_{\mathbf{t - 1}}\mathbf{,\zeta,}\mathbf{x}_{\mathbf{0}}\mathbf{\ \rbrack}
  40. 𝐞𝐧𝐝\mathbf{{end}}

  1. 𝐢𝐟𝐜𝐨𝐦𝐩𝐮𝐭𝐞𝐍𝐞𝐰𝐭𝐨𝐧𝐁𝐚𝐬𝐢𝐬\mathbf{{if\ compute\ Newton\ Basis}}
  2.   𝐢=0\mathbf{i = 0}
  3.   𝐰𝐡𝐢𝐥𝐞(𝐢𝐬1)\mathbf{{while}}\left( \mathbf{i \leq s - 1} \right)
  4.     𝐢𝐟(𝐢.𝐞𝐪.𝐬1)𝐭𝐡𝐞𝐧\mathbf{{if}}\left( \mathbf{i.eq.s - 1} \right)\mathbf{{then}}
  5.       𝐁𝐢,𝐢=𝐑𝐞[𝐄𝐢]\mathbf{B}_{\mathbf{i,i}}\mathbf{= Re\lbrack}\mathbf{E}_{\mathbf{i}}\mathbf{\rbrack}
  6.     𝐞𝐥𝐬𝐞\mathbf{{else}}
  7.       𝐢𝐟(𝐈𝐦[𝐄𝐢].𝐞𝐪.0)𝐭𝐡𝐞𝐧\mathbf{{if}}\left( \mathbf{{Im}}\left\lbrack \mathbf{E}_{\mathbf{i}} \right\rbrack\mathbf{.eq.0} \right)\mathbf{{then}}
  8.         𝐁𝐢,𝐢=𝐑𝐞[𝐄𝐢]\mathbf{B}_{\mathbf{i,i}}\mathbf{= Re\lbrack}\mathbf{E}_{\mathbf{i}}\mathbf{\rbrack}
  9.       𝐞𝐥𝐬𝐞\mathbf{{else}}
  10.         𝐁𝐢,𝐢=𝐑𝐞[𝐄𝐢]\mathbf{B}_{\mathbf{i,i}}\mathbf{= Re\lbrack}\mathbf{E}_{\mathbf{i}}\mathbf{\rbrack}
  11.         𝐁𝐢+1,𝐢+1=𝐑𝐞[𝐄𝐢]\mathbf{B}_{\mathbf{i + 1,i + 1}}\mathbf{= Re\lbrack}\mathbf{E}_{\mathbf{i}}\mathbf{\rbrack}
  12.         𝐁𝐢,𝐢+1=(𝐈𝐦[𝐄𝐢])2\mathbf{B}_{\mathbf{i,i + 1}}\mathbf{= -}\left( \mathbf{Im\lbrack}\mathbf{E}_{\mathbf{i}}\mathbf{\rbrack} \right)^{\mathbf{2}}
  13.         𝐢=𝐢+1\mathbf{i = i + 1}
  14.       𝐞𝐧𝐝𝐢𝐟\mathbf{{endif}}
  15.     𝐞𝐧𝐝𝐢𝐟\mathbf{{endif}}
  16.     𝐢=𝐢+1\mathbf{i = i + 1}
  17.   𝐞𝐧𝐝\mathbf{{end\ \ }}
  18. 𝐞𝐧𝐝\mathbf{{end\ \ }}

The elements of the matrix 𝐁\mathbf{B} are represented as 𝐛𝐤,𝐢\mathbf{b}_{\mathbf{k,i}}

  1. 𝐟𝐨𝐫𝐤=1,2,3𝐬\mathbf{for\ k = 1,2,3\ldots s}
  2.   𝐬𝐨𝐥𝐯𝐞𝐯̂𝐤1𝐟𝐫𝐨𝐦𝐊𝐯̂𝐤1=𝐯𝐤1\mathbf{{solve\ }}{\widehat{\mathbf{v}}}_{\mathbf{k - 1}}\mathbf{{\ from\ K}}{\widehat{\mathbf{v}}}_{\mathbf{k - 1}}\mathbf{=}\mathbf{v}_{\mathbf{k - 1}}
  3.   𝐢𝐟(𝐤.𝐧𝐞.1)𝐭𝐡𝐞𝐧\mathbf{if(k.ne.1)then}
  4.     𝛂=𝐛𝐤1,𝐤1\mathbf{\alpha =}\mathbf{b}_{\mathbf{k - 1,k - 1}}
  5.     𝛃=𝐛𝐤2,𝐤1\mathbf{\beta =}\mathbf{b}_{\mathbf{k - 2,k - 1}}
  6.     𝐯𝐤=𝐀𝐯̂𝐤1𝛂𝐯𝐤1+𝛃𝐯𝐤2\mathbf{v}_{\mathbf{k}}\mathbf{= A}{\widehat{\mathbf{v}}}_{\mathbf{k - 1}}\mathbf{- \alpha}\mathbf{v}_{\mathbf{k - 1}}\mathbf{+ \beta}\mathbf{v}_{\mathbf{k - 2}}
  7.   𝐞𝐥𝐬𝐞\mathbf{{else}}
  8.     𝛂=𝐛𝐤1,𝐤1\mathbf{\alpha =}\mathbf{b}_{\mathbf{k - 1,k - 1}}
  9.     𝐯𝐤=𝐀𝐯̂𝐤1𝛂𝐯𝐤1\mathbf{v}_{\mathbf{k}}\mathbf{= A}{\widehat{\mathbf{v}}}_{\mathbf{k - 1}}\mathbf{- \alpha}\mathbf{v}_{\mathbf{k - 1}}
  10.   𝐞𝐧𝐝𝐢𝐟\mathbf{{endif}}
  11. 𝐞𝐧𝐝\mathbf{{en}}\mathbf{d}

 The elements of the matrix \mathcal{H} are represented as 𝒽𝐤,𝐢\mathcal{h}_{\mathbf{k,i}}.

  1. 𝐟𝐨𝐫𝐤=1,2,3𝐢1\mathbf{for\ k = 1,2,3\ldots i - 1}
  2.   (𝒽𝐤,𝐢𝒽𝐤+1,𝐢)=(𝐜𝐢𝐬𝐢𝐬𝐢𝐜𝐢)(𝒽𝐤,𝐢𝒽𝐤+1,𝐢)\begin{pmatrix} \mathcal{h}_{\mathbf{k,i}} \\ \mathcal{h}_{\mathbf{k + 1,i}} \\ \end{pmatrix}\mathbf{=}\begin{pmatrix} \mathbf{c}_{\mathbf{i}} & \mathbf{- s}_{\mathbf{i}} \\ \mathbf{s}_{\mathbf{i}} & \mathbf{c}_{\mathbf{i}} \\ \end{pmatrix}\begin{pmatrix} \mathcal{h}_{\mathbf{k,i}} \\ \mathcal{h}_{\mathbf{k + 1,i}} \\ \end{pmatrix}
  3. end\mathbf{\text{end}}
  4. 𝐜𝐢=11+(𝒽𝐢+1,𝐢𝒽𝐢,𝐢)2\mathbf{c}_{\mathbf{i}}\mathbf{=}\frac{\mathbf{1}}{\sqrt{\mathbf{1 +}\left( \frac{\mathcal{h}_{\mathbf{i + 1,i}}}{\mathcal{h}_{\mathbf{i,i}}} \right)^{\mathbf{2}}}}
  5. 𝐬𝐢=𝒽𝐢+1,𝐢𝒽𝐢,𝐢11+(𝒽𝐢+1,𝐢𝒽𝐢,𝐢)2\mathbf{s}_{\mathbf{i}}\mathbf{= -}\frac{\mathcal{h}_{\mathbf{i + 1,i}}}{\mathcal{h}_{\mathbf{i,i}}}\frac{\mathbf{1}}{\sqrt{\mathbf{1 +}\left( \frac{\mathcal{h}_{\mathbf{i + 1,i}}}{\mathcal{h}_{\mathbf{i,i}}} \right)^{\mathbf{2}}}}
  6. (𝛇𝐢𝛇𝐢+1)=(𝐜𝐢𝐬𝐢𝐬𝐢𝐜𝐢)(𝛇𝐢𝛇𝐢+1)\begin{pmatrix} \mathbf{\zeta}_{\mathbf{i}} \\ \mathbf{\zeta}_{\mathbf{i + 1}} \\ \end{pmatrix}\mathbf{=}\begin{pmatrix} \mathbf{c}_{\mathbf{i}} & \mathbf{- s}_{\mathbf{i}} \\ \mathbf{s}_{\mathbf{i}} & \mathbf{c}_{\mathbf{i}} \\ \end{pmatrix}\begin{pmatrix} \mathbf{\zeta}_{\mathbf{i}} \\ \mathbf{\zeta}_{\mathbf{i + 1}} \\ \end{pmatrix}
  7. 𝒽𝐢,𝐢=𝐜𝐢𝒽𝐢,𝐢𝐬𝐢𝒽𝐢+1,𝐢\mathcal{h}_{\mathbf{i,i}}\mathbf{=}\mathbf{c}_{\mathbf{i}}\mathcal{h}_{\mathbf{i,i}}\mathbf{- \ }\mathbf{s}_{\mathbf{i}}\mathcal{h}_{\mathbf{i + 1,i}}
  8. 𝒽𝐢+1,𝐢=0\mathcal{h}_{\mathbf{i + 1,i}}\mathbf{= 0}

  1. 𝐲=1𝛇\mathbf{y =}\mathcal{H}^{\mathbf{- 1}}\mathbf{\zeta}
  2. 𝐬𝐨𝐥𝐯𝐞𝐱̂𝐟𝐫𝐨𝐦𝐊𝐱̂=𝐤=0𝐬𝐦1𝐐𝐤𝐲𝐤+𝐥=𝐬𝐦𝐧1𝐕𝐥𝐬𝐦𝐤=𝐬𝐦𝐧1𝐑𝐥𝐬𝐦,𝐤𝐬𝐦1𝐲𝐤\mathbf{{solve\ }}\widehat{\mathbf{x}}\mathbf{{\ from\ K}}\widehat{\mathbf{x}}\mathbf{=}\sum_{\mathbf{k = 0}}^{\mathbf{s}\mathbf{m}\mathbf{- 1}}\mathbf{Q}_{\mathbf{k}}\mathbf{y}_{\mathbf{k}}\mathbf{+}\sum_{\mathbf{l =}\mathbf{{sm}}}^{\mathbf{n - 1}}\mathbf{V}_{\mathbf{l}\mathbf{- sm}}\sum_{\mathbf{k =}\mathbf{{sm}}}^{\mathbf{n - 1}}{\mathbf{R}_{\mathbf{l - sm}\mathbf{,}\mathbf{k}\mathbf{- sm}}^{\mathbf{- 1}}\mathbf{y}_{\mathbf{k}}}
  3. 𝐱0=𝐱0+𝐱̂\mathbf{x}_{\mathbf{0}}\mathbf{=}\mathbf{x}_{\mathbf{0}}\mathbf{+}\widehat{\mathbf{x}}

1.5 The Preconditioned Chebyshev basis Conjugate Gradient Method

The Chebyshev basis Conjugate Gradient (CBCG) method is an iterative Krylov subspace algorithm for solving symmetric matrices. The CBCG method calculates kk iterations of the CG method in one iteration. By this approach, the CBCG method reduces the global collective communication calls. In order to use the Chebyshev basis the largest and smallest eigenvalues are needed. PARCEL provides two methods to obtain these eigenvalues, the so called power method and the communication avoiding Arnoldi method.

  1. 𝐂𝐨𝐦𝐩𝐮𝐭𝐞𝐫0=𝐛𝐀𝐱0𝐟𝐨𝐫𝐬𝐨𝐦𝐞𝐢𝐧𝐢𝐭𝐢𝐚𝐥𝐠𝐮𝐞𝐬𝐬𝐱1=𝐱0\mathbf{{Compute\ }}\mathbf{r}_{\mathbf{0}}\mathbf{= b - A}\mathbf{x}_{\mathbf{0}}\mathbf{{\ for\ some\ initial\ gue}}\mathbf{{ss\ }}\mathbf{x}_{\mathbf{- 1}}\mathbf{=}\mathbf{x}_{\mathbf{0}}
  2. 𝐂𝐨𝐦𝐩𝐮𝐭𝐞𝐂𝐡𝐞𝐛𝐲𝐬𝐡𝐞𝐯𝐛𝐚𝐬𝐢𝐬[𝐒0,𝐀𝐒0,𝐫0,𝛌max,𝛌min]\mathbf{Compute\ Chebyshev\ basis\ \lbrack}\mathbf{S}_{\mathbf{0}}\mathbf{,}{\mathbf{A}\mathbf{S}}_{\mathbf{0}}\mathbf{,}\mathbf{r}_{\mathbf{0}}\mathbf{,}\mathbf{\lambda}_{\mathbf{\max}}\mathbf{,}\mathbf{\lambda}_{\mathbf{\min}}\mathbf{\rbrack}
  3. 𝐐0=𝐒0\mathbf{Q}_{\mathbf{0}}\mathbf{=}\mathbf{S}_{\mathbf{0}}
  4. 𝐟𝐨𝐫𝐢=0,1,2,3\mathbf{for\ i = 0,1,2,3\ldots}
  5. 𝐂𝐨𝐦𝐩𝐮𝐭𝐞𝐐𝐢𝐓𝐀𝐐𝐢,𝐐𝐢𝐓𝐫𝐢𝐤\mathbf{{Compute\ }}\mathbf{Q}_{\mathbf{i}}^{\mathbf{T}}\mathbf{A}\mathbf{Q}_{\mathbf{i}}\mathbf{\ ,\ }\mathbf{Q}_{\mathbf{i}}^{\mathbf{T}}\mathbf{r}_{\mathbf{{ik}}}
  6.   𝛂𝐢=(𝐐𝐢𝐓𝐀𝐐𝐢)1(𝐐𝐢𝐓𝐫𝐢𝐤)\mathbf{\alpha}_{\mathbf{i}}\mathbf{= \ }\left( \mathbf{Q}_{\mathbf{i}}^{\mathbf{T}}\mathbf{A}\mathbf{Q}_{\mathbf{i}} \right)^{\mathbf{- 1}}\left( \mathbf{Q}_{\mathbf{i}}^{\mathbf{T}}\mathbf{r}_{\mathbf{{ik}}} \right)
  7.   𝐱(𝐢+1)𝐤=𝐱𝐢𝐤+𝐐𝐢𝛂𝐢\mathbf{x}_{\left( \mathbf{i + 1} \right)\mathbf{k}}\mathbf{=}\mathbf{x}_{\mathbf{{ik}}}\mathbf{+}\mathbf{Q}_{\mathbf{i}}\mathbf{\alpha}_{\mathbf{i}}
  8.   𝐫(𝐢+1)𝐤=𝐫𝐢𝐤𝐀𝐐𝐢𝛂𝐢\mathbf{r}_{\left( \mathbf{i + 1} \right)\mathbf{k}}\mathbf{=}\mathbf{r}_{\mathbf{{ik}}}\mathbf{- A}\mathbf{Q}_{\mathbf{i}}\mathbf{\alpha}_{\mathbf{i}}
  9.   𝐢𝐟𝐫(𝐢+1)𝐤/𝐛𝐬𝐦𝐚𝐥𝐥𝐞𝐧𝐨𝐮𝐠𝐡𝐭𝐡𝐞𝐧\mathbf{{if\ }}\left\| \mathbf{r}_{\left( \mathbf{i + 1} \right)\mathbf{k}} \right\|\mathbf{\ /\ }\left\| \mathbf{b} \right\|\mathbf{{\ small\ enough\ }}\mathbf{{then}}
  10.   𝐂𝐨𝐦𝐩𝐮𝐭𝐞𝐂𝐡𝐞𝐛𝐲𝐬𝐡𝐞𝐯𝐛𝐚𝐬𝐢𝐬[𝐒𝐢+1,𝐀𝐒𝐢+1,𝐫(𝐢+1)𝐤,𝛌max,𝛌min]\mathbf{Compute\ Chebyshev\ basis\ \lbrack}\mathbf{S}_{\mathbf{i + 1}}\mathbf{,}{\mathbf{A}\mathbf{S}}_{\mathbf{i + 1}}\mathbf{,}\mathbf{r}_{\left( \mathbf{i + 1} \right)\mathbf{k}}\mathbf{,}\mathbf{\lambda}_{\mathbf{\max}}\mathbf{,}\mathbf{\lambda}_{\mathbf{\min}}\mathbf{\rbrack}
  11.   𝐂𝐨𝐦𝐩𝐮𝐭𝐞𝐐𝐢𝐓𝐀𝐒𝐢+1\mathbf{{Compute\ }}\mathbf{Q}_{\mathbf{i}}^{\mathbf{T}}\mathbf{A}\mathbf{S}_{\mathbf{i + 1}}
  12.   𝐁𝐢=(𝐐𝐢𝐓𝐀𝐐𝐢)1(𝐐𝐢𝐓𝐀𝐒𝐢+1)\mathbf{B}_{\mathbf{i}}\mathbf{=}\left( \mathbf{Q}_{\mathbf{i}}^{\mathbf{T}}\mathbf{A}\mathbf{Q}_{\mathbf{i}} \right)^{\mathbf{- 1}}\left( \mathbf{Q}_{\mathbf{i}}^{\mathbf{T}}\mathbf{A}\mathbf{S}_{\mathbf{i + 1}} \right)\mathbf{\ }
  13.   𝐐𝐢+1=𝐒𝐢+1𝐐𝐢𝐁𝐢\mathbf{Q}_{\mathbf{i + 1}}\mathbf{=}\mathbf{S}_{\mathbf{i + 1}}\mathbf{-}\mathbf{Q}_{\mathbf{i}}\mathbf{B}_{\mathbf{i}}
  14.   𝐀𝐐𝐢+1=𝐀𝐒𝐢+1𝐀𝐐𝐢𝐁𝐢\mathbf{A}\mathbf{Q}_{\mathbf{i + 1}}\mathbf{=}\mathbf{{AS}}_{\mathbf{i + 1}}\mathbf{-}\mathbf{{AQ}}_{\mathbf{i}}\mathbf{B}_{\mathbf{i}}
  15. 𝐞𝐧𝐝\mathbf{{end}}

𝛌max\mathbf{\lambda}_{\mathbf{\max}}, 𝛌min\mathbf{\lambda}_{\mathbf{\min}}represent the largest and smallest eigenvalues of 𝐀𝐊1\mathbf{AK}^{-1}.

  1. 𝛈=2/(𝛌max𝛌min)\mathbf{\eta = 2\ /\ (}\mathbf{\lambda}_{\mathbf{\max}}\mathbf{-}\mathbf{\lambda}_{\mathbf{\min}}\mathbf{)}
  2. 𝛇=(𝛌max+𝛌min)/(𝛌max𝛌min)\mathbf{\zeta =}\left( \mathbf{\lambda}_{\mathbf{\max}}\mathbf{+}\mathbf{\lambda}_{\mathbf{\min}} \right)\mathbf{\ /\ (}\mathbf{\lambda}_{\mathbf{\max}}\mathbf{-}\mathbf{\lambda}_{\mathbf{\min}}\mathbf{)}
  3. 𝐬0=𝐫0\mathbf{s}_{\mathbf{0}}\mathbf{=}\mathbf{r}_{\mathbf{0}}\mathbf{\ }
  4. 𝐬𝐨𝐥𝐯𝐞𝐬̃0𝐟𝐫𝐨𝐦𝐊𝐬̃0=𝐬0\mathbf{{solve\ }}{\widetilde{\mathbf{s}}}_{\mathbf{0}}\mathbf{{\ from\ K}}{\widetilde{\mathbf{s}}}_{\mathbf{0}}\mathbf{= \ }\mathbf{s}_{\mathbf{0}}
  5. 𝐬1=𝛈𝐀𝐬̃0𝛇𝐬0\mathbf{s}_{\mathbf{1}}\mathbf{= \eta A}{\widetilde{\mathbf{s}}}_{\mathbf{0}}\mathbf{- \zeta}\mathbf{s}_{\mathbf{0}}
  6. 𝐬𝐨𝐥𝐯𝐞𝐬̃1𝐟𝐫𝐨𝐦𝐊𝐬̃1=𝐬1\mathbf{{solve\ }}{\widetilde{\mathbf{s}}}_{\mathbf{1}}\mathbf{{\ from\ K}}{\widetilde{\mathbf{s}}}_{\mathbf{1}}\mathbf{= \ }\mathbf{s}_{\mathbf{1}}
  7. 𝐟𝐨𝐫𝐣=2,3,,𝐤\mathbf{for\ j = 2,3,\ldots,k}
  8.   𝐬𝐣=2𝛈𝐀𝐬̃𝐣2𝛇𝐬𝐣1𝐬𝐣2\mathbf{s}_{\mathbf{j}}\mathbf{= 2\eta A}{\widetilde{\mathbf{s}}}_{\mathbf{j}}\mathbf{- 2\zeta}\mathbf{s}_{\mathbf{j - 1}}\mathbf{-}\mathbf{s}_{\mathbf{j - 2}}
  9.   𝐬𝐨𝐥𝐯𝐞𝐬̃𝐣𝐟𝐫𝐨𝐦𝐊𝐬̃𝐣=𝐬𝐣\mathbf{{solve\ }}{\widetilde{\mathbf{s}}}_{\mathbf{j}}\mathbf{{\ from\ K}}{\widetilde{\mathbf{s}}}_{\mathbf{j}}\mathbf{= \ }\mathbf{s}_{\mathbf{j}}
  10. 𝐞𝐧𝐝\mathbf{{end}}
  11. 𝐒=(𝐬̃0,𝐬̃1,,𝐬̃𝐤1)\mathbf{S = (}{\widetilde{\mathbf{s}}}_{\mathbf{0}}\mathbf{,}{\widetilde{\mathbf{s}}}_{\mathbf{1}}\mathbf{,\ldots,}{\widetilde{\mathbf{s}}}_{\mathbf{k - 1}}\mathbf{)}
  12. 𝐀𝐒=(𝐀𝐬̃0,𝐀𝐬̃1,,𝐀𝐬̃𝐤1)\mathbf{AS = (A}{\widetilde{\mathbf{s}}}_{\mathbf{0}}\mathbf{,}{\mathbf{A}\widetilde{\mathbf{s}}}_{\mathbf{1}}\mathbf{,\ldots,}{\mathbf{A}\widetilde{\mathbf{s}}}_{\mathbf{k - 1}}\mathbf{)}

2 Preconditioning

In iterative algorithms, the application of a preconditioner of the form KAK \approx A can help to reduce the number of iterations until convergence. The notation [solvep̂fromKp̂=p{{solve\ }}\widehat{{p}}{{\ from\ K}}\widehat{{p}}{= \ }{p}] means [approximatelysolveAp̂=papproximately solve {A}\widehat{p}{= \ }{p} with respect to the vector p̂\widehat{p}]. Although the more accurate approximation leads to the less number of iterations until convergence, the cost of preconditioner is normally increased. In non-parallel computing, one of the most common preconditioners is the incomplete LU factorization (ILU). However, in parallel computing, parallel preconditioners are needed. PARCEL provides the following parallel preconditioners: Point Jacobi, Block Jacobi and Additive Schwarz.

2.1 Point Jacobi Preconditioner

The point Jacobi preconditioner is one of the most simplest preconditioners. KK only consists of the diagonal elements of the matrix AA. Compared to other preconditioners the efficiency of the point Jacobi preconditioner is very low. However, it can be easily applied in parallel and it does not require any additional communication between processors.

2.2 Zero Fill-in Incomplete LU Factorization Preconditioners (ILU(0))

LU factorization decompose a square matrix AA into a lower triangular matrix LL and an upper triangular matrix UU. For a typical sparse matrix, the LU factors can be much less sparse than the original matrix, which is called as fill-in. This increases the memory and computational requirements. In order to avoid this issue, PARCEL provides the zero fill-in incomplete LU factorization (ILU(0)).

2.3 Diagonal Incomplete LU Factorization Preconditioners (D-ILU)

The diagonal incomplete LU factorization (D-ILU) computes a diagonal matrix DD, a lower triangular matrix LL and an upper triangular matrix UU for the input matrix AA. A=LA+DA+UAA = L_{A} + D_{A} + U_{A} The preconditioner KK is constructed with LAL_{A}, UAU_{A} and DD (DDAD \neq D_{A}) with: K=(D+LA)D1(D+UA)K = \left( D + L_{A} \right)D^{- 1}\left( D + U_{A} \right) Two different conditions exist in order to construct the diagonal matrix DD:

Given the conditions above DD can be computed as follows:

2.4 Block Jacobi Preconditioner

The Block Jacobi Preconditioner constructs KK out of the diagonal blocks of the matrix AA. For each block, the incomplete ILU factorization is computed. When the preconditioner is applied, each block can be processed independently, resulting in a high level of parallelism. In addition, no communication is needed when each block is defined within a sub-matrix on each processor.

2.5 Additive Schwarz Preconditioner

The Additive Schwarz Preconditioner constructs KK out of the overlapping diagonal blocks of AA. For each block the incomplete LU factorization is computed. Compared to the Block Jacobi Preconditioner, the Additive Schwarz Preconditioner may require additional communication. PARCEL provides four different overlapping methods, BASIC, RESTRICT, INTERPOLATE and NONE.

Fig.1 Additive Schwarz Preconditioner with overlapping diagonal blocks

BASIC

Solve Ks=rKs=r with an extended rr including an overlapping region with the neighboring process, ss in the overlapping region is determined by summing up the results between the neighboring region.

Fig.2 BASIC

RESTRICT

  Solve Ks=rKs=r with an extended rr including an overlapping region with the neighboring process, ss in the overlapping region is determined without taking account of the result in the neighboring region.

Fig.3 RESTRICT

INTERPOLATE

Solve Ks=rKs=r with an extended rr without an overlapping region with the neighboring process, ss in the overlapping region is determined by summing up the results between the neighboring regions. 

Fig.4 INTERPOLATE

NONE

Solve Ks=rKs=r with an extended rr without an overlapping region with the neighboring process, ss in the overlapping region is determined without taking account of the result in the neighboring regions. 

Fig.5 NONE

3 Sparse Matrix Data Formats

In order to save memory in sparse matrix computation, only the non-zero elements and their locations are stored. PARCEL supports different sparse matrix formats.

3.1 Compressed Row Storage (CRS) Format

 The CRS format compresses the row indices of the non-zero elements of the sparse matrix. Together with the compressed rows, the column index and the value for each non-zero element are stored in one dimensional arrays.

A=(abc00d00e0fg0hij) A = \left( \begin{array}{ccc} a & b & c & 0\\ 0 & d & 0 & 0\\ e & 0 & f & g\\ 0 & h & i & j \end{array} \right)


One process case

Values of non-zero elements :: crsA={a,b,c,d,e,f,g,h,i,j}crsA=\{a,b,c,d,e,f,g,h,i,j\}

Compressed row indices :: crsRow_ptr={1,4,5,8,11}crsRow\_ptr=\{1,4,5,8,11\}

Column indices of non-zero elements :: crsCol={1,2,3,2,1,3,4,2,3,4}crsCol=\{1,2,3,2,1,3,4,2,3,4\}


Two process case

Values of non-zero elements :: crsA={a,b,c,d}crsA=\{a,b,c,d\}

Compressed row indices :: crsRow_ptr={1,4,5}crsRow\_ptr=\{1,4,5\}

Column indices of non-zero elements :: crsCol={1,2,3,2}crsCol=\{1,2,3,2\}

Values of non-zero elements :: crsA={e,f,g,h,i,j}crsA=\{e,f,g,h,i,j\}

Compressed row indices :: crsRow_ptr={1,4,7}crsRow\_ptr=\{1,4,7\}

Column indices of non-zero elements :: crsCol={1,3,4,2,3,4}crsCol=\{1,3,4,2,3,4\}


3.2 Diagonal Format

 The Diagonal Format stores the non-zero elements of a matrix as diagonals. This format provides high performance for band block diagonal matrices. In addition to the element values, an offset value for each diagonal of the matrix needs to be stored inside an one dimensional array. A negative offset indicates that the diagonal is below the main diagonal of the matrix, a positive offset indicates that the diagonal is above the main diagonal. The main diagonal has the offset zero.

A=(abc00d00e0fg0hij) A = \left( \begin{array}{ccc} a & b & c & 0\\ 0 & d & 0 & 0\\ e & 0 & f & g\\ 0 & h & i & j \end{array} \right)


Single process case

Elements :: diaA={0,0,e,h,0,0,0,i,a,d,f,j,b,0,g,0,c,0,0,0}diaA=\{0,0,e,h,0,0,0,i,a,d,f,j,b,0,g,0,c,0,0,0\}

Offsets :: offset={2,1,0,1,2}offset=\{-2,-1,0,1,2\}

Number of diagonals :: nnd=5nnd=5


Two process case

Elements :: diaA={a,d,b,0,c,0}diaA=\{a,d,b,0,c,0\}

Offsets :: offset={0,1,2}offset=\{0,1,2\}

Number of diagonals :: nnd=3nnd=3

Elements :: diaA={e,h,0,i,f,j,g,0}diaA=\{e,h,0,i,f,j,g,0\}

Offsets :: offset={2,1,0,1}offset=\{-2,-1,0,1\}

Number of diagonals :: nnd=4nnd=4


4 The structure of PARCEL

 This chapter explains how to compile and install the PARCEL library.

4.1 Directory Structure

 After extracting the PARCEL library, the directory structure should be the same as in Fig.6.

Fig.6 The PARCEL directory structure

4.2 Compiling the PARCEL library

PARCEL requires a C compiler, a FORTRAN compiler, which supports preprocessor, MPI, LAPACK, and OpenMP. When compiling with gfortrangfortran, the option cpp-cpp has to be specified to to activate preprocessor. Compiler options can be changed inside the file arch/make_config. An example of a make_config file is shown in Fig.7. The make command should be executed from the top directory of the extracted file. In order to re-compile, makecleanmake\ clean followed by makemake should be executed. If the makemake command was successful, each exampleexample directory should contain an executable file. 64-bit integer is only needed when working with over roughly 2 billion unknowns.


Option  Explanation
CC  C compiler command name
FC  FORTRAN compiler command name
CFLAGS  C compiler options
FFLAGS  FORTRAN compiler options
FP_MODE  Option to suppress optimizations with arithmetic order change (Only for quad precision)
INCLUDEDIR  Include directories
LD_MPI  MPI library
LD_LAPACK  LAPACK library
FDEFINE  64bit integer becomes available with -DPARCEL_INT8.

Fig.7 Example of a make_config file

4.3 Usage of the PARCEL library

In order to call PARCEL library, programs need to be linked against the *.a*.a library files inside the srcsrc directory.

5 PARCEL routines (direct interface)

 The PARCEL routines with direct call interfaces are shown below.

5.1 parcel_dcg


call parcel_dcg( icomm, vecx,vecb, n,gn,nnz,istart, crsA,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret )


Solve a simultaneous linear equation system Ax = b by the conjugate gradient method (CG method). Non-zero elements are stored in the CRS format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
nnz integer*4 / integer*8 in Number of non-zero elements on each process
istart integer*4 / integer*8 in Start line of matrix on each process
crsA(nnz) double precision in Non-zero elements of matrix stored in the CRS format
crsRow_ptr(n+1) integer*4 / integer*8 in Pointer table in the CRS format
crsCol(nnz) integer*4 / integer*8 in Column numbers of non-zero elements in the CRS format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)

5.2 parcel_dcg_dia


call parcel_dcg_dia( icomm, vecx,vecb, n,gn,istart, diaA,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret )


Solve a simultaneous linear equation system Ax = b by the conjugate gradient method (CG method). Non-zero elements are stored in the Diagonal format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
istart integer*4 / integer*8 in Start line of matrix on each process
diaA(n*nnd) double precision in Non-zero elements of matrix stored in the Diagonal format
offset(nnd) integer*4 / integer*8 in Offset of each non-zero diagonal elements array in the Diagonal format
nnd integer*4 / integer*8 in Number of diagonal elements arrays in the Diagonal format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)

5.3 parcel_ddcg


call parcel_ddcg( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,nnz,istart, crsA_hi,crsA_lo,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )


Solve a simultaneous linear equation system Ax = b by the conjugate gradient method (CG method). Non-zero elements are stored in the CRS format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx_hi(n) double precision in/out in: Initial vector (upper bits), out: Solution vector (upper bits)
vecx_lo(n) double precision in/out in: initial value of the iteration (lower bit), out: solution vector (lower bit)
vecb_hi(n) double precision in Right hand side vector (upper bit)
vecb_lo(n) double precision in Right hand side vector (lower bit)
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
nnz integer*4 / integer*8 in Number of non-zero elements on each process
istart integer*4 / integer*8 in Start line of matrix on each process
crsA_hi(nnz) double precision in Non-zero elements (upper bits) of matrix stored in the CRS format
crsA_lo(nnz) double precision in Non-zero elements (lower bit) of matrix stored in the CRS format
crsRow_ptr(n+1) integer*4 / integer*8 in Pointer table in the CRS format
crsCol(nnz) integer*4 / integer*8 in Column numbers of non-zero elements in the CRS format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
PRECISION_A integer in Precision of matrix (1: double precision, 2: quad precision)
PRECISION_b integer in Precision of right hand side vector (1: double precision, 2: quad precision)
PRECISION_x integer in Precision of solution vector (1: double precision, 2: quad precision)
PRECISION_PRECONDITION integer in Precision of preconditioner (1: double precision, 2: quad precision)

5.4 parcel_ddcg_dia


call parcel_ddcg_dia( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,istart, diaA_hi,diaA_lo,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )


Solve a simultaneous linear equation system Ax = b by the conjugate gradient method (CG method). Non-zero elements are stored in the Diagonal format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx_hi(n) double precision in/out in: Initial vector (upper bits), out: Solution vector (upper bits)
vecx_lo(n) double precision in/out in: initial value of the iteration (lower bit), out: solution vector (lower bit)
vecb_hi(n) double precision in Right hand side vector (upper bit)
vecb_lo(n) double precision in Right hand side vector (lower bit)
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
istart integer*4 / integer*8 in Start line of matrix on each process
diaA_hi(n*nnd) double precision in Non-zero elements (upper bits) of matrix stored in the Diagonal format
diaA_lo(n*nnd) double precision in Non-zero elements (lower bit) of matrix stored in the Diagonal format
offset(nnd) integer*4 / integer*8 in Offset of each non-zero diagonal elements array in the Diagonal format
nnd integer*4 / integer*8 in Number of diagonal elements arrays in the Diagonal format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
PRECISION_A integer in Precision of matrix (1: double precision, 2: quad precision)
PRECISION_b integer in Precision of right hand side vector (1: double precision, 2: quad precision)
PRECISION_x integer in Precision of solution vector (1: double precision, 2: quad precision)
PRECISION_PRECONDITION integer in Precision of preconditioner (1: double precision, 2: quad precision)

5.5 parcel_dbicgstab


call parcel_dbicgstab( icomm, vecx,vecb, n,gn,nnz,istart, crsA,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret )


Solve a simultaneous linear equation system Ax = b by the stabilized biconjugate gradient method (Bi-CGSTAB method). Non-zero elements are stored​in the CRS format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
nnz integer*4 / integer*8 in Number of non-zero elements on each process
istart integer*4 / integer*8 in Start line of matrix on each process
crsA(nnz) double precision in Non-zero elements of matrix stored in the CRS format
crsRow_ptr(n+1) integer*4 / integer*8 in Pointer table in the CRS format
crsCol(nnz) integer*4 / integer*8 in Column numbers of non-zero elements in the CRS format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)

5.6 parcel_dbicgstab_dia


call parcel_dbicgstab_dia( icomm, vecx,vecb, n,gn,istart, diaA,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret )


Solve a simultaneous linear equation system Ax = b by the stabilized biconjugate gradient method (Bi-CGSTAB method). Non-zero elements are stored in the Diagonal format

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
istart integer*4 / integer*8 in Start line of matrix on each process
diaA(n*nnd) double precision in Non-zero elements of matrix stored in the Diagonal format
offset(nnd) integer*4 / integer*8 in Offset of each non-zero diagonal elements array in the Diagonal format
nnd integer*4 / integer*8 in Number of diagonal elements arrays in the Diagonal format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)

5.7 parcel_ddbicgstab


call parcel_ddbicgstab( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,nnz,istart, crsA_hi,crsA_lo,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )


Solve a simultaneous linear equation system Ax = b by the stabilized biconjugate gradient method (Bi-CGSTAB method). Non-zero elements are stored in the CRS format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx_hi(n) double precision in/out in: Initial vector (upper bits), out: Solution vector (upper bits)
vecx_lo(n) double precision in/out in: initial value of the iteration (lower bit), out: solution vector (lower bit)
vecb_hi(n) double precision in Right hand side vector (upper bit)
vecb_lo(n) double precision in Right hand side vector (lower bit)
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
nnz integer*4 / integer*8 in Number of non-zero elements on each process
istart integer*4 / integer*8 in Start line of matrix on each process
crsA_hi(nnz) double precision in Non-zero elements (upper bits) of matrix stored in the CRS format
crsA_lo(nnz) double precision in Non-zero elements (lower bit) of matrix stored in the CRS format
crsRow_ptr(n+1) integer*4 / integer*8 in Pointer table in the CRS format
crsCol(nnz) integer*4 / integer*8 in Column numbers of non-zero elements in the CRS format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
PRECISION_A integer in Precision of matrix (1: double precision, 2: quad precision)
PRECISION_b integer in Precision of right hand side vector (1: double precision, 2: quad precision)
PRECISION_x integer in Precision of solution vector (1: double precision, 2: quad precision)
PRECISION_PRECONDITION integer in Precision of preconditioner (1: double precision, 2: quad precision)

5.8 parcel_ddbicgstab_dia


call parcel_ddbicgstab_dia( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,istart, diaA_hi,diaA_lo,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )


Solve a simultaneous linear equation system Ax = b by the stabilized biconjugate gradient method (Bi-CGSTAB method). Non-zero elements are stored in the Diagonal format

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx_hi(n) double precision in/out in: Initial vector (upper bits), out: Solution vector (upper bits)
vecx_lo(n) double precision in/out in: initial value of the iteration (lower bit), out: solution vector (lower bit)
vecb_hi(n) double precision in Right hand side vector (upper bit)
vecb_lo(n) double precision in Right hand side vector (lower bit)
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
istart integer*4 / integer*8 in Start line of matrix on each process
diaA_hi(n*nnd) double precision in Non-zero elements (upper bits) of matrix stored in the Diagonal format
diaA_lo(n*nnd) double precision in Non-zero elements (lower bit) of matrix stored in the Diagonal format
offset(nnd) integer*4 / integer*8 in Offset of each non-zero diagonal elements array in the Diagonal format
nnd integer*4 / integer*8 in Number of diagonal elements arrays in the Diagonal format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
PRECISION_A integer in Precision of matrix (1: double precision, 2: quad precision)
PRECISION_b integer in Precision of right hand side vector (1: double precision, 2: quad precision)
PRECISION_x integer in Precision of solution vector (1: double precision, 2: quad precision)
PRECISION_PRECONDITION integer in Precision of preconditioner (1: double precision, 2: quad precision)

5.9 parcel_dgmres


call parcel_dgmres( icomm, vecx,vecb, n,gn,nnz,istart, crsA,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, GMRES_m,GMRES_GSflag )


Solve a simultaneous linear equation system Ax = b by the generalized minimum residual method (GMRES (m) method). Non-zero elements are stored in the CRS format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
nnz integer*4 / integer*8 in Number of non-zero elements on each process
istart integer*4 / integer*8 in Start line of matrix on each process
crsA(nnz) double precision in Non-zero elements of matrix stored in the CRS format
crsRow_ptr(n+1) integer*4 / integer*8 in Pointer table in the CRS format
crsCol(nnz) integer*4 / integer*8 in Column numbers of non-zero elements in the CRS format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
GMRES_m integer in Number of iterations until restart
GMRES_GSflag integer in Flag for orthogonalization algorithm (1: MGS, 2: CGS)

5.10 parcel_dgmres_dia


call parcel_dgmres_dia( icomm, vecx,vecb, n,gn,istart, diaA,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, GMRES_m,GMRES_GSflag )


Solve a simultaneous linear equation system Ax = b by the generalized minimum residual method (GMRES (m) method). Non-zero elements are stored in the Diagonal format

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
istart integer*4 / integer*8 in Start line of matrix on each process
diaA(n*nnd) double precision in Non-zero elements of matrix stored in the Diagonal format
offset(nnd) integer*4 / integer*8 in Offset of each non-zero diagonal elements array in the Diagonal format
nnd integer*4 / integer*8 in Number of diagonal elements arrays in the Diagonal format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
GMRES_m integer in Number of iterations until restart
GMRES_GSflag integer in Flag for orthogonalization algorithm (1: MGS, 2: CGS)

5.11 parcel_ddgmres


call parcel_ddgmres( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,nnz,istart, crsA_hi,crsA_lo,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, GMRES_m,GMRES_GSflag, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )


Solve a simultaneous linear equation system Ax = b by the generalized minimum residual method (GMRES (m) method). Non-zero elements are stored in the CRS format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx_hi(n) double precision in/out in: Initial vector (upper bits), out: Solution vector (upper bits)
vecx_lo(n) double precision in/out in: initial value of the iteration (lower bit), out: solution vector (lower bit)
vecb_hi(n) double precision in Right hand side vector (upper bit)
vecb_lo(n) double precision in Right hand side vector (lower bit)
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
nnz integer*4 / integer*8 in Number of non-zero elements on each process
istart integer*4 / integer*8 in Start line of matrix on each process
crsA_hi(nnz) double precision in Non-zero elements (upper bits) of matrix stored in the CRS format
crsA_lo(nnz) double precision in Non-zero elements (lower bit) of matrix stored in the CRS format
crsRow_ptr(n+1) integer*4 / integer*8 in Pointer table in the CRS format
crsCol(nnz) integer*4 / integer*8 in Column numbers of non-zero elements in the CRS format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
GMRES_m integer in Number of iterations until restart
GMRES_GSflag integer in Flag for orthogonalization algorithm (1: MGS, 2: CGS)
PRECISION_A integer in Precision of matrix (1: double precision, 2: quad precision)
PRECISION_b integer in Precision of right hand side vector (1: double precision, 2: quad precision)
PRECISION_x integer in Precision of solution vector (1: double precision, 2: quad precision)
PRECISION_PRECONDITION integer in Precision of preconditioner (1: double precision, 2: quad precision)

5.12 parcel_ddgmres_dia


call parcel_ddgmres_dia( icomm, vecx_hi,vecx_lo, vecb_hi,vecb_lo, n,gn,istart, diaA_hi,diaA_lo,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, PRECISION_A, PRECISION_b, PRECISION_x, PRECISION_PRECONDITION )


Solve a simultaneous linear equation system Ax = b by the generalized minimum residual method (GMRES (m) method). Non-zero elements are stored in the Diagonal format

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx_hi(n) double precision in/out in: Initial vector (upper bits), out: Solution vector (upper bits)
vecx_lo(n) double precision in/out in: initial value of the iteration (lower bit), out: solution vector (lower bit)
vecb_hi(n) double precision in Right hand side vector (upper bit)
vecb_lo(n) double precision in Right hand side vector (lower bit)
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
istart integer*4 / integer*8 in Start line of matrix on each process
diaA_hi(n*nnd) double precision in Non-zero elements (upper bits) of matrix stored in the Diagonal format
diaA_lo(n*nnd) double precision in Non-zero elements (lower bit) of matrix stored in the Diagonal format
offset(nnd) integer*4 / integer*8 in Offset of each non-zero diagonal elements array in the Diagonal format
nnd integer*4 / integer*8 in Number of diagonal elements arrays in the Diagonal format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
GMRES_m integer in Number of iterations until restart
GMRES_GSflag integer in Flag for orthogonalization algorithm (1: MGS, 2: CGS)
PRECISION_A integer in Precision of matrix (1: double precision, 2: quad precision)
PRECISION_b integer in Precision of right hand side vector (1: double precision, 2: quad precision)
PRECISION_x integer in Precision of solution vector (1: double precision, 2: quad precision)
PRECISION_PRECONDITION integer in Precision of preconditioner (1: double precision, 2: quad precision)

5.13 parcel_dcbcg


call parcel_dcbcg( icomm, vecx,vecb, n,gn,nnz,istart, crsA,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, CBCG_kstep,CBCG_Eigenflag,power_method_itrmax, CAARNOLDI_sstep,CAARNOLDI_tstep, CAARNOLDI_basis,CAARNOLDI_QRflag )


The system of equations Ax = b is solved by the Chebyshev basis conjugate gradient method (CBCG method). Non-zero elements are stored in the CRS format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
nnz integer*4 / integer*8 in Number of non-zero elements on each process
istart integer*4 / integer*8 in Start line of matrix on each process
crsA(nnz) double precision in Non-zero elements of matrix stored in the CRS format
crsRow_ptr(n+1) integer*4 / integer*8 in Pointer table in the CRS format
crsCol(nnz) integer*4 / integer*8 in Column numbers of non-zero elements in the CRS format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
CBCG_kstep integer in Number of communication avoiding steps
CBCG_Eigenflag integer in Flag for eigenvalue computation (1: power method, 2: CA-ARNOLDI)
power_method_itrmax integer in Maximum number of iterations in the power method
CAARNOLDI_sstep integer in Number of communication avoiding steps s of the CA-ARNOLDI method
CAARNOLDI_tstep integer in Number of outer iterations t in the CA-ARNOLDI method (restart length = st)
CAARNOLDI_basis integer in Flag for basis vector in the CA-ARNOLDI method (0: monomial basis, 1: Newton basis)
CAARNOLDI_QRflag integer in Flag for QR factorization in the CA-ARNOLDI method (1: MGS, 2: CGS, 3: TSQR, 4: Cholesky QR, 5: Cholesky QR2)

5.14 parcel_dcbcg_dia


call parcel_dcbcg_dia( icomm, vecx,vecb, n,gn,istart, diaA,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, CBCG_kstep,CBCG_Eigenflag,power_method_itrmax, CAARNOLDI_sstep,CAARNOLDI_tstep, CAARNOLDI_basis,CAARNOLDI_QRflag )


The system of equations Ax = b is solved by the Chebyshev basis conjugate gradient method (CBCG method). Non-zero elements are stored in the Diagonal format

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
istart integer*4 / integer*8 in Start line of matrix on each process
diaA(n*nnd) double precision in Non-zero elements of matrix stored in the Diagonal format
offset(nnd) integer*4 / integer*8 in Offset of each non-zero diagonal elements array in the Diagonal format
nnd integer*4 / integer*8 in Number of diagonal elements arrays in the Diagonal format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
CBCG_kstep integer in Number of communication avoiding steps
CBCG_Eigenflag integer in Flag for eigenvalue computation (1: power method, 2: CA-ARNOLDI)
power_method_itrmax integer in Maximum number of iterations in the power method
CAARNOLDI_sstep integer in Number of communication avoiding steps s of the CA-ARNOLDI method
CAARNOLDI_tstep integer in Number of outer iterations t in the CA-ARNOLDI method (restart length = st)
CAARNOLDI_basis integer in Flag for basis vector in the CA-ARNOLDI method (0: monomial basis, 1: Newton basis)
CAARNOLDI_QRflag integer in Flag for QR factorization in the CA-ARNOLDI method (1: MGS, 2: CGS, 3: TSQR, 4: Cholesky QR, 5: Cholesky QR2)

5.15 parcel_dcagmres


call parcel_dcagmres( icomm, vecx,vecb, n,gn,nnz,istart, crsA,crsRow_ptr,crsCol, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, CAGMRES_sstep, CAGMRES_tstep, CAGMRES_basis, CAGMRES_QRflag )


Solve a simultaneous linear equation system Ax = b by the communication reduced generalized minimum residual method (CA-GMRES (s, t) method). Separate non-zero element values​in the CRS format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
nnz integer*4 / integer*8 in Number of non-zero elements on each process
istart integer*4 / integer*8 in Start line of matrix on each process
crsA(nnz) double precision in Non-zero elements of matrix stored in the CRS format
crsRow_ptr(n+1) integer*4 / integer*8 in Pointer table in the CRS format
crsCol(nnz) integer*4 / integer*8 in Column numbers of non-zero elements in the CRS format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
CAGMRES_sstep integer in Number of communication avoiding steps s in the CA-GMRES method
CAGMRES_tstep integer in Number of outer iterations t in the CA-GMRES method (restart length = st)
CAGMRES_basis integer in Flag for basis vector in the CA-GMRES method (0: monomial basis, 1: Newton basis)
CAGMRES_QRflag integer in Flag for QR factorization in the CA-GMRES method (1: MGS, 2: CGS, 3: TSQR, 4: CholeskyQR, 5: CholeskyQR2)

5.16 parcel_dcagmres_dia


call parcel_dcagmres_dia( icomm, vecx,vecb, n,gn,istart, diaA,offset,nnd, ipreflag,ILU_method,addL,iflagAS, itrmax,rtolmax, reshistory, iovlflag,iret, CAGMRES_sstep, CAGMRES_tstep, CAGMRES_basis, CAGMRES_QRflag )


Solve a simultaneous linear equation system Ax = b by the communication reduced generalized minimum residual method (CA-GMRES (s, t) method). Separate non-zero element values in the Diagonal format

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in/out in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
istart integer*4 / integer*8 in Start line of matrix on each process
diaA(n*nnd) double precision in Non-zero elements of matrix stored in the Diagonal format
offset(nnd) integer*4 / integer*8 in Offset of each non-zero diagonal elements array in the Diagonal format
nnd integer*4 / integer*8 in Number of diagonal elements arrays in the Diagonal format
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match)
addL integer in Overlapping width in the additive Schwartz method
iflagAS integer in Flag for the additive Schwartz method (1: BASIC, 2: RESTRICT, 3: INTERPOLATE, 4: NONE)
itrmax integer in/out in: maximum number of iterations, out: number of iterations
rtolmax double precision in Convergence criterion (norm of relative residual error)
reshistory(itrmax) double precision out History of relative residual error
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)
iret integer out Error code (0:normal)
CAGMRES_sstep integer in Small communication step of CA-GMRES method
CAGMRES_tstep integer in List step of the CA-GMRES method
CAGMRES_basis integer in Base generation flag of CA-GMRES method (0: single basis, 1: newton basis)
CAGMRES_QRflag integer in QR Decomposition flag of CA-GMRES method (1: MGS, 2: CGS, 3: TSQR, 4: CholeskyQR, 5: CholeskyQR2)

6 PARCEL routines (indirect interface)

 The PARCEL routines with indirect call interfaces are shown below.

6.1 PARCEL_KSP_Default_Setting


call PARCEL_KSP_Default_Setting(method)


Apply initial settings to Struct in the PARCEL format.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format

6.2 set_KSP_CRS


call set_KSP_CRS(icomm,vecx,vecb,n,gn,nnz,istart,crsA,crsRow_ptr,crsCol,itrmax,ipreflag,ILU_method,addL,method)


 Set matrix in the CRS format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
nnz integer*4 / integer*8 in Number of non-zero elements on each process
istart integer*4 / integer*8 in Start line of matrix on each process
crsA(nnz) double precision in Non-zero elements of matrix stored in the CRS format
crsRow_ptr(n+1) integer*4 / integer*8 in Pointer table in the CRS format
crsCol(nnz) integer*4 / integer*8 in Column numbers of non-zero elements in the CRS format
itrmax integer in in: maximum number of iterations, out: number of iterations
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match)
addL integer in Overlapping width in the additive Schwartz method
method type(KSP) in/out Struct in the PARCEL format

6.3 set_KSP_DIA


call set_KSP_DIA(icomm,vecx,vecb,n,gn,istart,diaA,offset,nnd,itrmax,ipreflag,ILU_method,addL,method)


Set matrix in the Diagonal format.

Argument name (dimension) Type Input/Output Description
icomm integer in MPI communicator
vecx(n) double precision in in: initial vector, out: solution vector
vecb(n) double precision in Right hand side vector
n integer*4 / integer*8 in Size of vector on each process
gn integer*4 / integer*8 in Total size of vector
istart integer*4 / integer*8 in Start line of matrix on each process
diaA(n*nnd) double precision in Non-zero elements of matrix stored in the Diagonal format
offset(nnd) integer*4 / integer*8 in Offset of each non-zero diagonal elements array in the Diagonal format
nnd integer*4 / integer*8 in Number of diagonal elements arrays in the Diagonal format
itrmax integer in in: maximum number of iterations, out: number of iterations
ipreflag integer in Flag for preconditioner (0: none, 1: point Jacobi, 2: block Jacobi, 3: additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization(0:ILU(0),1:D-ILU(Diagonal components match),2:D-ILU(Element sums in a row match)
addL integer in Overlapping width in the additive Schwartz method
method type(KSP) in/out Struct in the PARCEL format

6.4 set_vecx_KSP


 call set_vecx_KSP(n,x,method)


Set initial vector.

Argument name (dimension) Type Input/Output Description
n integer*4 / integer*8 in Size of vector on each process
x(n) double precision in Initial vector
method type(KSP) in/out Struct in the PARCEL format

6.5 set_vecb_KSP


 call set_vecb_KSP(n,b,method)


Set right hand side vector b.

Argument name (dimension) Type Input/Output Description
n integer*4 / integer*8 in Size of vector on each process
b(n) double precision in Right hand side vector
method type(KSP) in/out Struct in the PARCEL format

6.6 get_vecx_KSP


 call get_vecx_KSP(n,x,method)


Get solution vector.

Argument name (dimension) Type Input/Output Description
n integer*4 / integer*8 in Size of vector on each process
x(n) double precision out Solution vector
method type(KSP) in Struct in the PARCEL format

6.7 get_reshistory_KSP


call get_reshistory_KSP(niter,reshistory, method)


Get convergence history.

Argument name (dimension) Type Input/Output Description
niter integer out Number of iterations until convergence
reshistory(method%itrmax) double precision out Convergence history
method type(KSP) in Struct in the PARCEL format

6.8 reset_CRS_Mat


call reset_CRS_Mat(itrmax,ipreflag,ILU_method,addL,crsA,method)


 Change the matrix elements, preconditioner, and the maximum number of iterations of solvers in the CRS format without changing the matrix structure or the positions of non-zero elements.

Argument name (dimension) Type Input/Output Description
itrmax integer in Maximum number of iterations (-1: keep the existing setting)
ipreflag integer in Flag for preconditioner (-1: keep the existing setting, 0: None, 1: Point Jacobi, 2: Block Jacobi, 3: Additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization (-1: keep the existing setting, 1: match the diagonal components, 2: match the element sum in the row)
addL integer in Overlapping width in the additive Schwartz method (-1: keep the existing setting)
crsA(method%nnz) double precision in Non-zero elements of matrix stored in the CRS format
method type(KSP) in/out Struct in the PARCEL format

6.9 reset_DIA_Mat


call reset_DIA_Mat(itrmax,ipreflag,ILU_method,addL,diaMat,method)


 Change the matrix elements, preconditioner, and the maximum number of iterations of solvers in the Diagonal format without changing the matrix structure or the positions of non-zero elements.

Argument name (dimension) Type Input/Output Description
itrmax integer in Maximum number of iterations (-1: keep the existing setting)
ipreflag integer in Flag for preconditioner (-1: keep the existing setting, 0: None, 1: Point Jacobi, 2: Block Jacobi, 3: Additive Schwartz)
ILU_method integer in Flag for incomplete LU factorization (-1: keep the existing setting, 1: match the diagonal components, 2: match the element sum in the row)
addL integer in Overlapping width in the additive Schwartz method (-1: keep the existing setting)
diaMat(method%n*method%nnd) double precision in Non-zero elements of matrix stored in the CRS format
method type(KSP) in/out Struct in the PARCEL format

6.10 set_KSP_solver


call set_KSP_solver(method,solver)


Set Krylov subspace method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
solver integer in Flag for Krylov subspace method (1: CG, 2: BiCGGSTAB, 3: GMRES, 4: CA-GMRES, 5: CBCG)

6.11 set_KSP_abstolmax(method,abstolmax)


call set_KSP_abstolmax(method,abstolmax)


Set maximun tolerance in absolute value of residual error.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
abstolmax double precision in Convergence criterion (norm of residual error)

6.12 set_KSP_rtolmax


call set_KSP_rtolmax(method,rtolmax)


Set maximum tolerance in relative error.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
rtolmax double precision in Convergence criterion (norm of relative residual error)

6.13 set_KSP_iovlflag


call set_KSP_iovlflag(method,iovlflag)


Set communication-computation overlap method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
iovlflag integer in Flag for communication-computation overlap (0: none, 1: all processes, 2: each process)

6.14 set_KSP_GMRES_m(method,GMRES_m)


call set_KSP_GMRES_m(method,GMRES_m)


Set the number of iterations m until restart in the GMRES(m) method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
GMRES_m integer in Number of iterations until restart

6.15 set_KSP_GMRES_GSflag


call set_KSP_GMRES_GSflag(method,GMRES_GSflag)


Set orthogonalization algorithm in the GMRES(m) method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
GMRES_GSflag integer in Flag for orthogonalization algorithm (1: MGS, 2: CGS)

6.16 set_KSP_CAGMRES_sstep


call set_KSP_CAGMRES_sstep(method,CAGMRES_sstep)


Set number of communication avoiding steps s of CA-GMRES.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CAGMRES_sstep integer in Number of communication steps of the CA-GMRES method

6.17 set_KSP_CAGMRES_tstep


call set_KSP_CAGMRES_tstep(method,CAGMRES_tstep)


Set number of outer iterations t of CA-GMRES.)

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CAGMRES_tstep integer in Number of outer iterations t in the CA-GMRES method method (restart length = st)

6.18 set_KSP_CAGMRES_basis


call set_KSP_CAGMRES_basis(method,CAGMRES_basis)


Set basis vector in the CA-GMRES method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CAGMRES_basis integer in Flag for basis vector in the CA-GMRES method (0: monomial basis, 1: Newton basis)

6.19 set_KSP_CAGMRES_QRflag


call set_KSP_CAGMRES_QRflag(method,CAGMRES_QRflag)


Set QR factorization algorithm in the CA-GMRES method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CAGMRES_QRflag integer in Flag for QR factorization in the CA-GMRES method (1: MGS, 2: CGS, 3: TSQR, 4: CholeskyQR, 5: CholeskyQR2)

6.20 set_KSP_CBCG_kstep


call set_KSP_CBCG_kstep(method,CBCG_kstep)


Set number of communication avoiding steps in the CBCG method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CBCG_kstep integer in Number of communication avoiding steps

6.21 set_KSP_CBCG_Eigenflag


call set_KSP_CBCG_Eigenflag(method,CBCG_Eigenflag)


Set eigenvalue computation algorithm in the CBCG method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CBCG_Eigenflag integer in Flag for eigenvalue computation (1: power method, 2: CA-ARNOLDI)

6.22 set_KSP_CAARNOLDI_sstep


call set_KSP_CAARNOLDI_sstep(method,CAARNOLDI_sstep)


Set number of communication avoiding steps s in the CA-ARNOLDI method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CAARNOLDI_sstep integer in Number of communication avoiding steps s of the CA-ARNOLDI method

6.23 set_KSP_CAARNOLDI_tstep


call set_KSP_CAARNOLDI_tstep(method,CAARNOLDI_tstep)


Set number of outer iterations t in the CA-ARNOLDI method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CAARNOLDI_tstep integer in Number of outer iterations t in the CA-ARNOLDI method (restart length = st)

6.24 set_KSP_CAARNOLDI_basis


call set_KSP_CAARNOLDI_basis(method,CAARNOLDI_basis)


Set basis vector in the CA-ARNOLDI method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CAARNOLDI_basis integer in Flag for basis vector in the CA-ARNOLDI method (0: monomial basis, 1: Newton basis)

6.25 set_KSP_CAARNOLDI_QRflag


call set_KSP_CAARNOLDI_QRflag(method,CAARNOLDI_QRflag)


Set QR factorization algorithm in the CA-ARNOLDI method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
CAARNOLDI_QRflag integer in Flag for QR factorization in the CA-ARNOLDI method (1: MGS, 2: CGS, 3: TSQR, 4: Cholesky QR, 5: Cholesky QR2)

6.26 set_KSP_power_method_itrmax


call set_KSP_power_method_itrmax(method,power_method_itrmax)


Set maximum number of iterations in the power method.

Argument name (dimension) Type Input/Output Description
method type(KSP) out Struct in the PARCEL format
power_method_itrmax integer in Maximum number of iterations in the power method

6.27 START_KSP_SOLVER


call START_KSP_SOLVER(method)


Solve a simultaneous linear equation system Ax = b.

Argument name (dimension) Type Input/Output Description
method type(KSP) in/out Struct in the PARCEL format

7 How to use

 The use of the PARCEL routines is explained by FORTRAN sample programs of a CG solver. Details of the usage of the PARCEL in C programs can be found in C sample programs stored in SPARSE/example_C in the archive. The PARCEL can be called also from C programs in the same manner, provided that a MPI communicator is prepared for FORTRAN routines. In the C sample programs, a MPI communicator created in C is converted to FORTRAN via MPI_Comm_c2f in the MPI library. In the C sample program for the indirect interface, a struct in the PARCEL format is generated in FORTRAN, and then, it is used in C programs.

7.1 Compressed Row Storage (CRS) Format

 Sample codes in the CRS format are shown in Fig.8 and Fig.9. Here, make_matrix_CRS denotes an arbitrary routine, which generates a matrix in the CRS format in PARCEL. Fig.8 and Fig.9 show examples with the direct and indirect interfaces, respectively. In Fig.9, a compiling option to include SPARSE/src is needed. When only values of matrix elements in AA, elements in bb, and the initial values in xx are changed with the same matrix structure or the non-zero element positions, in Fig.8, parcel_dcg requires overhead for the initialization such as generating list of processes which exchange data with each other.


  program main
    use mpi
    implicit none
 
    integer n,gn,nnz,istart
    real*8,allocatable  :: crsA(:)
    integer,allocatable :: crsRow_ptr(:),crsCol(:)
    real*8,allocatable  :: vecx(:)
    real*8,allocatable  :: vecb(:)
    integer itrmax
    real*8  rtolmax
    real*8, allocatable :: reshistory(:)
    integer ipreflag
    integer ILU_method
    integer iflagAS
    integer addL
    integer iovlflag
    integer iret
    integer ierr
 
    call MPI_Init(ierr)
 
    call make_matrix_CRS(n,gn,nnz,istart,crsA,crsRow_ptr,crsCol)
 
    allocate(vecx(n))
    allocate(vecb(n))
    allocate(reshistory(itrmax))
    ipreflag=0
    ILU_method=1
    addL=0
    iflagAS=1
    itrmax=100
    rtolmax=1.0d-8
    iovlflag=0
 
    vecb=1.0d0
    vecx=1.0d0
    call parcel_dcg( &
         MPI_COMM_WORLD, &
         vecx,vecb, &
         n,gn,nnz,istart, &
         crsA,crsRow_ptr,crsCol, &
         ipreflag,ILU_method,addL,iflagAS, &
         itrmax,rtolmax, &
         reshistory, &
         iovlflag,iret &
         )
    call MPI_Finalize(ierr)
 
    deallocate(vecx)
    deallocate(vecb)
 
    deallocate(crsA)
    deallocate(crsRow_ptr)
    deallocate(crsCol)
 
    deallocate(reshistory)
 
  end program main


Fig.8 Sample code in the CRS format (direct interface)

  program main
    use mpi
    use krylov
    use krylov_kernel
    implicit none
 
    integer n,gn,nnz,istart
    real*8,allocatable  :: crsA(:)
    integer,allocatable :: crsRow_ptr(:),crsCol(:)
    real*8,allocatable  :: vecx(:)
    real*8,allocatable  :: vecb(:)
    integer itrmax
    real*8  rtolmax
    real*8, allocatable :: reshistory(:)
    integer ipreflag
    integer ILU_method
    integer iflagAS
    integer addL
    integer iovlflag
    integer iret
    integer ierr
    type(KSP) :: method
 
    call MPI_Init(ierr)
 
    call make_matrix_CRS(n,gn,nnz,istart,crsA,crsRow_ptr,crsCol)
 
    allocate(vecx(n))
    allocate(vecb(n))
    allocate(reshistory(itrmax))
 
    ipreflag=0
    ILU_method=1
    addL=0
    iflagAS=1
    itrmax=100
    rtolmax=1.0d-8
    iovlflag=0
    vecb=1.0d0
    vecx=1.0d0
 
    call PARCEL_KSP_Default_Setting(method)
    call set_KSP_CRS(&
         MPI_COMM_WORLD, &
         vecx,vecb, &
         n,gn,nnz,istart, &
         crsA,crsRow_ptr,crsCol, &
         itrmax,ipreflag,ILU_method,addL,&
         method)
    call set_KSP_solver(method,1)
    call START_KSP_SOLVER(method)
    call get_vecx_KSP(n,vecx,method)
    call free_KSP(method)
 
    call MPI_Finalize(ierr)
 
    deallocate(vecx)
    deallocate(vecb)
 
    deallocate(reshistory)
 
  end program main


Fig.9 Sample code in the CRS format (indirect interface)

7.2 Diagonal Format

 Sample codes in the Diagonal format are shown in Fig.10 and Fig.11. Here, make_matrix_DIA denotes an arbitrary routine, which generates a matrix in the Diagonal format in PARCEL. Fig.10 and Fig.11 show examples with the direct and indirect interfaces, respectively. In Fig.11, a compiling option to include SPARSE/src is needed. When only values of matrix elements in AA, elements in bb, and the initial values in xx are changed with the same matrix structure or the non-zero element positions, in Fig.10, parcel_dcg requires overhead for the initialization such as generating list of processes which exchange data with each other.


  program main
    use mpi
    implicit none
 
    integer n,gn,nnd,istart
    real*8,allocatable  :: diaA(:)
    integer,allocatable :: offset(:)
    real*8,allocatable  :: vecx(:)
    real*8,allocatable  :: vecb(:)
    integer itrmax
    real*8  rtolmax
    real*8, allocatable :: reshistory(:)
    integer ipreflag
    integer ILU_method
    integer iflagAS
    integer addL
    integer iovlflag
    integer iret
    integer ierr
 
    call MPI_Init(ierr)
 
    call make_matrix_DIA(n,gn,nnd,istart,diaA,offset)
 
    allocate(vecx(n))
    allocate(vecb(n))
    allocate(reshistory(itrmax))
 
    ipreflag=0
    ILU_method=1
    addL=0
    iflagAS=1
    itrmax=100
    rtolmax=1.0d-8
    iovlflag=0
    vecb=1.0d0
    vecx=1.0d0
 
    call parcel_dcg_dia(&
         MPI_COMM_WORLD, &
         vecx,vecb,&
         n,gn,istart,&
         diaA,offset,nnd,&
         ipreflag,ILU_method,addL,iflagAS,&
         itrmax,rtolmax,&
         reshistory,&
         iovlflag,iret&
         )
 
    call MPI_Finalize(ierr)
 
    deallocate(vecx)
    deallocate(vecb)
 
    deallocate(diaA)
    deallocate(offset)
 
    deallocate(reshistory)
 
  end program main


Fig.10 Sample code in the Diagonal format (direct interface)

  program main
    use mpi
    use krylov
    use krylov_kernel
    implicit none
 
    integer n,gn,nnd,istart
    real*8,allocatable  :: diaA(:)
    integer,allocatable :: offset(:)
    real*8,allocatable  :: vecx(:)
    real*8,allocatable  :: vecb(:)
    integer itrmax
    real*8  rtolmax
    real*8, allocatable :: reshistory(:)
    integer ipreflag
    integer ILU_method
    integer iflagAS
    integer addL
    integer iovlflag
    integer iret
    integer ierr
 
    call MPI_Init(ierr)
 
    call make_matrix_DIA(n,gn,nnd,istart,diaA,offset)
 
    allocate(vecx(n))
    allocate(vecb(n))
    allocate(reshistory(itrmax))
 
    ipreflag=0
    ILU_method=1
    addL=0
    iflagAS=1
    itrmax=100
    rtolmax=1.0d-8
    iovlflag=0
    vecb=1.0d0
    vecx=1.0d0
 
    call PARCEL_KSP_Default_Setting(method)
    call set_KSP_DIA(&
         MPI_COMM_WORLD, &
         vecx,vecb, &
         n,gn,istart, &
         diaA,offset,nnd,&
         itrmax,ipreflag,ILU_method,addL,&
         method)
    call set_KSP_solver(method,1)
    call START_KSP_SOLVER(method)
    call get_vecx_KSP(n,vecx,method)
    call free_KSP(method)
 
    call MPI_Finalize(ierr)
 
    deallocate(vecx)
    deallocate(vecb)
 
    deallocate(reshistory)
 
  end program main


Fig.11 Sample code in the Diagonal format (indirect interface)