1 Krylov subspace methods

This chapter describes the iterative methods contained in this library. The following notation is used: Ax = b represents the linear system to be solved, where A is an n × n sparse matrix and x_0 is the initial vector. K represents an appropriate preconditioner.


1.1 Preconditioned Conjugate Gradient Method [1]

The Conjugate Gradient (CG) method is an iterative Krylov subspace algorithm for solving linear systems with symmetric matrices.

  1. Compute r^0 = b - A x^0 for some initial guess x^0 (x^{-1} = x^0)
  2. p^{-1} = 0
  3. α_{-1} = 0
  4. β_{-1} = 0
  5. solve s from K s = r^0
  6. ρ_0 = ⟨s, r^0⟩
  7. for i = 0, 1, 2, 3 ...
  8.   p^i = s + β_{i-1} p^{i-1}
  9.   q^i = A p^i
  10.   γ = ⟨p^i, q^i⟩
  11.   x^i = x^{i-1} + α_{i-1} p^{i-1}
  12.   α_i = ρ_i / γ
  13.   r^{i+1} = r^i - α_i q^i
  14.   solve s from K s = r^{i+1}
  15.   ρ_{i+1} = ⟨s, r^{i+1}⟩
  16.   if ‖r^{i+1}‖ / ‖b‖ small enough then
  17.     x^{i+1} = x^i + α_i p^i
  18.     quit
  19.   endif
  20.   β_i = ρ_{i+1} / ρ_i
  21. end
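
For illustration, the following is a minimal dense-matrix C++ sketch of a Jacobi-preconditioned CG solver, written in the standard (non-lagged) form of the recurrence above. It is an illustration only, not the PARCEL implementation, and all names are illustrative.

  // Minimal dense preconditioned CG sketch (illustration only, not the PARCEL code).
  // The preconditioner K is the point Jacobi preconditioner K = diag(A).
  #include <cstddef>
  #include <cstdio>
  #include <cmath>
  #include <vector>

  using Vec = std::vector<double>;
  using Mat = std::vector<Vec>;

  static double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
  }

  static Vec matvec(const Mat& A, const Vec& x) {
    Vec y(A.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
      for (std::size_t j = 0; j < x.size(); ++j) y[i] += A[i][j] * x[j];
    return y;
  }

  // Solve A x = b with Jacobi-preconditioned CG.
  void pcg(const Mat& A, const Vec& b, Vec& x, double tol, int max_iter) {
    const std::size_t n = b.size();
    Vec r(n), s(n), p(n, 0.0), q(n);
    Vec Ax = matvec(A, x);
    for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - Ax[i];     // r = b - A x
    for (std::size_t i = 0; i < n; ++i) s[i] = r[i] / A[i][i];   // solve K s = r (point Jacobi)
    double rho = dot(s, r);
    const double bnorm = std::sqrt(dot(b, b));
    double beta = 0.0;
    for (int it = 0; it < max_iter; ++it) {
      for (std::size_t i = 0; i < n; ++i) p[i] = s[i] + beta * p[i]; // p = s + beta p
      q = matvec(A, p);                                             // q = A p
      double alpha = rho / dot(p, q);                               // alpha = rho / <p,q>
      for (std::size_t i = 0; i < n; ++i) x[i] += alpha * p[i];     // x = x + alpha p
      for (std::size_t i = 0; i < n; ++i) r[i] -= alpha * q[i];     // r = r - alpha q
      if (std::sqrt(dot(r, r)) / bnorm < tol) break;                // relative residual check
      for (std::size_t i = 0; i < n; ++i) s[i] = r[i] / A[i][i];    // solve K s = r
      double rho_new = dot(s, r);
      beta = rho_new / rho;                                         // beta = rho_{i+1} / rho_i
      rho = rho_new;
    }
  }

  int main() {
    // Small symmetric positive definite test system.
    Mat A = {{4, 1, 0}, {1, 3, 1}, {0, 1, 2}};
    Vec b = {1, 2, 0}, x(3, 0.0);
    pcg(A, b, x, 1.0e-12, 100);
    std::printf("x = %f %f %f\n", x[0], x[1], x[2]);
    return 0;
  }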


1.2 Preconditioned Biconjugate Gradient Stabilized Method [1]

The Biconjugate Gradient Stabilized (Bi-CGSTAB) method is an iterative Krylov subspace algorithm for solving linear systems with nonsymmetric matrices.

  1. Compute r_0 = b - A x_0 for some initial guess x_0 (x_{-1} = x_0)
  2. p_0 = r_0
  3. c_1 = ⟨r_0, r_0⟩
  4. for i = 0, 1, 2, 3 ...
  5.   solve p̂ from K p̂ = p_i
  6.   q = A p̂
  7.   c_2 = ⟨r_0, q⟩
  8.   α = c_1 / c_2
  9.   e = r_i - α q
  10.   solve ê from K ê = e
  11.   v = A ê
  12.   c_3 = ⟨e, v⟩ / ⟨v, v⟩
  13.   x_{i+1} = x_i + α p̂ + c_3 ê
  14.   r_{i+1} = e - c_3 v
  15.   c_1 = ⟨r_0, r_{i+1}⟩
  16.   if ‖r_{i+1}‖ / ‖b‖ small enough then
  17.     quit
  18.   endif
  19.   β = c_1 / (c_2 c_3)
  20.   p_{i+1} = r_{i+1} + β (p_i - c_3 q)
  21. end
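
A corresponding dense-matrix C++ sketch of the Bi-CGSTAB iteration above, again with a point Jacobi preconditioner (K = diag(A)), is shown below. It is an illustration only, not the PARCEL code.

  // Minimal dense preconditioned Bi-CGSTAB sketch (illustration only, not the PARCEL code).
  #include <cstddef>
  #include <cmath>
  #include <numeric>
  #include <vector>

  using Vec = std::vector<double>;
  using Mat = std::vector<Vec>;

  static double dot(const Vec& a, const Vec& b) {
    return std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
  }

  static Vec matvec(const Mat& A, const Vec& x) {
    Vec y(A.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
      for (std::size_t j = 0; j < x.size(); ++j) y[i] += A[i][j] * x[j];
    return y;
  }

  // Solve A x = b with Jacobi-preconditioned Bi-CGSTAB, following the listing above.
  void pbicgstab(const Mat& A, const Vec& b, Vec& x, double tol, int max_iter) {
    const std::size_t n = b.size();
    Vec Ax = matvec(A, x), r0(n), r(n), p(n), phat(n), ehat(n), e(n);
    for (std::size_t i = 0; i < n; ++i) r0[i] = r[i] = b[i] - Ax[i]; // r_0 = b - A x_0
    p = r;                                                           // p_0 = r_0
    double c1 = dot(r0, r0);
    const double bnorm = std::sqrt(dot(b, b));
    for (int it = 0; it < max_iter; ++it) {
      for (std::size_t i = 0; i < n; ++i) phat[i] = p[i] / A[i][i];  // solve K p^ = p
      Vec q = matvec(A, phat);                                       // q = A p^
      double c2 = dot(r0, q);
      double alpha = c1 / c2;
      for (std::size_t i = 0; i < n; ++i) e[i] = r[i] - alpha * q[i]; // e = r - alpha q
      for (std::size_t i = 0; i < n; ++i) ehat[i] = e[i] / A[i][i];   // solve K e^ = e
      Vec v = matvec(A, ehat);                                        // v = A e^
      double c3 = dot(e, v) / dot(v, v);
      for (std::size_t i = 0; i < n; ++i) x[i] += alpha * phat[i] + c3 * ehat[i];
      for (std::size_t i = 0; i < n; ++i) r[i] = e[i] - c3 * v[i];    // r_{i+1} = e - c3 v
      double c1_new = dot(r0, r);
      if (std::sqrt(dot(r, r)) / bnorm < tol) break;                  // relative residual check
      double beta = c1_new / (c2 * c3);                               // beta = c1 / (c2 c3)
      for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * (p[i] - c3 * q[i]);
      c1 = c1_new;
    }
  }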


1.3 Preconditioned Generalized Minimum Residual Method [1]

The Generalized Minimum Residual (GMRES(m)) method is an iterative Krylov subspace algorithm for solving linear systems with nonsymmetric matrices. GMRES(m) is restarted periodically after m iterations; if it does not converge within m iterations, the solution obtained after those m iterations is used as the initial guess for the next restart cycle. PARCEL provides variants of GMRES with the Classical Gram-Schmidt (sometimes referred to as standard Gram-Schmidt) and Modified Gram-Schmidt (MGS) orthogonalization methods.

H_n is an upper triangular matrix with elements h_{j,k}, and e_i denotes the vector consisting of the first i elements of the vector e.

  1. for j = 0, 1, 2, 3 ...
  2.   r = b - A x_0
  3.   v_1 = -r / ‖r‖
  4.   e = (-‖r‖, 0, ..., 0)^T
  5.   n = m
  6.   for i = 1, 2, 3 ... m
  7.     solve v̂_i from K v̂_i = v_i
  8.     ω = A v̂_i
  9.     for k = 1, 2, 3 ... i
  10.       h_{k,i} = ⟨ω, v_k⟩
  11.       ω = ω - h_{k,i} v_k
  12.     end
  13.     h_{i+1,i} = ‖ω‖
  14.     v_{i+1} = ω / ‖ω‖
  15.     for k = 1, 2, 3 ... i-1
  16.       (h_{k,i}, h_{k+1,i})^T = [ c_k  -s_k ; s_k  c_k ] (h_{k,i}, h_{k+1,i})^T
  17.     end
  18.     c_i = 1 / sqrt(1 + (h_{i+1,i} / h_{i,i})^2)
  19.     s_i = -(h_{i+1,i} / h_{i,i}) / sqrt(1 + (h_{i+1,i} / h_{i,i})^2)
  20.     (e_i, e_{i+1})^T = [ c_i  -s_i ; s_i  c_i ] (e_i, e_{i+1})^T
  21.     h_{i,i} = c_i h_{i,i} - s_i h_{i+1,i}
  22.     h_{i+1,i} = 0
  23.     if |e_{i+1}| / ‖b‖ small enough then
  24.       n = i
  25.       exit
  26.     endif
  27.   end
  28.   y_n = H_n^{-1} e_n
  29.   solve x̂ from K x̂ = Σ_{k=1}^{n} y_k v_k
  30.   x_n = x_0 + x̂
  31.   x_0 = x_n
  32. end
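
The Givens rotation update in steps 15-22 of the listing above can be written compactly as follows (a self-contained C++ sketch with 0-based indexing; illustrative only, not the PARCEL code):

  // Sketch of the Givens rotation step used in GMRES (illustration only).
  // h holds entries h[0..i+1] of column i of the Hessenberg matrix, c/s store
  // the rotation coefficients, and e is the right-hand side of the small
  // least-squares problem.
  #include <cmath>
  #include <vector>

  void givens_update(std::vector<double>& h,
                     std::vector<double>& c,
                     std::vector<double>& s,
                     std::vector<double>& e,
                     int i) {
    // Apply the rotations of the previous columns to the new column.
    for (int k = 0; k < i; ++k) {
      double t1 = c[k] * h[k] - s[k] * h[k + 1];
      double t2 = s[k] * h[k] + c[k] * h[k + 1];
      h[k] = t1;
      h[k + 1] = t2;
    }
    // Form the new rotation that eliminates h[i+1].
    double t = h[i + 1] / h[i];
    c[i] = 1.0 / std::sqrt(1.0 + t * t);
    s[i] = -t * c[i];
    // Apply it to the right-hand side and to the column itself.
    double e1 = c[i] * e[i] - s[i] * e[i + 1];
    double e2 = s[i] * e[i] + c[i] * e[i + 1];
    e[i] = e1;  e[i + 1] = e2;
    h[i] = c[i] * h[i] - s[i] * h[i + 1];
    h[i + 1] = 0.0;
    // |e[i+1]| is now the residual norm used in the convergence test.
  }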


1.4 Preconditioned Communication Avoiding Generalized Minimum Residual Method [2,3]

The Communication Avoiding Generalized Minimum Residual (CA-GMRES(s,t)) method is an iterative Krylov subspace algorithm for solving linear systems with nonsymmetric matrices. One iteration of CA-GMRES(s,t) corresponds to the computation performed in s iterations of the GMRES(m) method. CA-GMRES(s,t) is restarted periodically after t iterations; if it does not converge within t iterations, the solution obtained up to that point is used as the initial guess for the next restart cycle. CA-GMRES(s,t) is equivalent to GMRES(m) if s and t are chosen so that m = t × s, and in exact arithmetic its convergence property is the same as that of GMRES(m). In addition, the number of global collective communication calls is reduced by a communication avoiding QR factorization. Since CA-GMRES(s,t) produces s basis vectors at once, the linear independence of the basis vectors may degrade due to round-off errors when s is too large, leading to a worse convergence property than GMRES(m). In order to improve the orthogonality of the basis vectors, the PARCEL implementation of CA-GMRES(s,t) provides an option to use the Newton basis in addition to the monomial basis.

e denotes a unit vector, ρ̃_k denotes the element in the s-th row and s-th column of the matrix R_k, and E denotes the eigenvalues obtained from the Hessenberg matrix generated by the iteration process.

  1. B̃_k = [e_2, e_3, ..., e_{s+1}]
  2. for j = 0, 1, 2, 3 ...
  3.   r = b - A x_0
  4.   v_1 = r / ‖r‖
  5.   ζ = (‖r‖, 0, ..., 0)^T
  6.   for k = 0, 1, ..., t-1
  7.     Fix basis conversion matrix [B̃_k, E]
  8.     Compute basis vectors [s, Ṽ′_k, B̃_k]
  9.     if (k .eq. 0)
  10.       QR decomposition Ṽ_0 = Q̃_0 R̃_0
  11.       𝔔̃_0 = Q̃_0
  12.       ℌ̃_0 = R̃_0 B̃_0 R_0^{-1}
  13.       ℋ_0 = ℌ̃_0
  14.       for o = 1, 2, ..., s
  15.         Givens rotation [o, ℋ_0, ζ]
  16.         if ‖ζ_{o+1}‖ / ‖b‖ small enough then
  17.           update solution vector [s, k, o, Ṽ′_0, Q̃′_0, R_0, ℋ_0, ζ, x_0]
  18.           quit
  19.         endif
  20.     else
  21.       ℜ̃′_{k-1,k} = (𝔔̃_{k-1})^H Ṽ′_k
  22.       Ṽ′_k = Ṽ′_k - 𝔔̃_{k-1} ℜ̃′_{k-1,k}
  23.       QR decomposition Ṽ′_k = Q̃′_k R̃′_k
  24.       R̃_k = [ R_k  z_k ; 0_{1,k}  ρ ]
  25.       ℌ̃_{k-1,k} = -ℌ_{k-1} ℜ_{k-1,k} R_k^{-1} + ℜ̃_{k-1,k} B̃_k R_k^{-1}
  26.       H_k = R_k B_k R_k^{-1} + ρ̃_k^{-1} b_k z_k e_s^T - h_{k-1} e_1 e_{s(k-1)}^T ℜ_{k-1,k} R_k^{-1}
  27.       h_k = ρ̃_k^{-1} ρ_k b_k
  28.       ℌ̃_k = [ ℌ_{k-1}  ℌ̃_{k-1,k} ; h_{k-1} e_1 e_{s(k-1)}^T  H_k ; 0_{1,s(k-1)}  h_k e_s^T ]
  29.       ℋ_k = [ ℌ̃_{k-1,k} ; H_k ; h_k e_s^T ]
  30.       for o = 1+sk, ..., s(k+1)
  31.         Givens rotation [o, ℋ_k, ζ]
  32.         if ‖ζ_{o+1}‖ / ‖b‖ small enough then
  33.           update solution vector [s, k, o, Ṽ′_k, 𝔔̃_k, R̃′_k, ℋ_k, ζ, x_0]
  34.           quit
  35.         endif
  36.       end
  37.     endif
  38.   end
  39.   update solution vector [s, t-1, st, Ṽ′_{t-1}, 𝔔̃_{t-1}, R̃′_{t-1}, ℋ_{t-1}, ζ, x_0]
  40. end

Fix basis conversion matrix [B̃_k, E]:

  1. if compute Newton basis
  2.   i = 0
  3.   while (i ≤ s-1)
  4.     if (i .eq. s-1) then
  5.       B_{i,i} = Re[E_i]
  6.     else
  7.       if (Im[E_i] .eq. 0) then
  8.         B_{i,i} = Re[E_i]
  9.       else
  10.         B_{i,i} = Re[E_i]
  11.         B_{i+1,i+1} = Re[E_i]
  12.         B_{i,i+1} = -(Im[E_i])^2
  13.         i = i + 1
  14.       endif
  15.     endif
  16.     i = i + 1
  17.   end
  18. end

Compute basis vectors [s, Ṽ′_k, B̃_k]: the elements of the basis conversion matrix B are denoted b_{k,i}.

  1. for k = 1, 2, 3 ... s
  2.   solve v̂_{k-1} from K v̂_{k-1} = v_{k-1}
  3.   if (k .ne. 1) then
  4.     α = b_{k-1,k-1}
  5.     β = b_{k-2,k-1}
  6.     v_k = A v̂_{k-1} - α v_{k-1} + β v_{k-2}
  7.   else
  8.     α = b_{k-1,k-1}
  9.     v_k = A v̂_{k-1} - α v_{k-1}
  10.   endif
  11. end
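
A compact C++ sketch of this basis-vector generation (dense matrix, point Jacobi preconditioner, 0-based indexing of B; illustrative only, not the PARCEL code):

  // Generate the s+1 basis vectors V = [v_0, v_1, ..., v_s] following the listing above:
  // v_k = A v^_{k-1} - alpha v_{k-1} + beta v_{k-2}, with alpha, beta taken from the
  // basis conversion matrix B and K the point Jacobi preconditioner (illustration only).
  #include <cstddef>
  #include <vector>

  using Vec = std::vector<double>;
  using Mat = std::vector<Vec>;

  std::vector<Vec> basis_vectors(const Mat& A, const Mat& B, const Vec& v0, int s) {
    const std::size_t n = v0.size();
    std::vector<Vec> V(s + 1, Vec(n, 0.0));
    V[0] = v0;
    for (int k = 1; k <= s; ++k) {
      Vec vhat(n);
      for (std::size_t i = 0; i < n; ++i) vhat[i] = V[k - 1][i] / A[i][i]; // solve K v^ = v_{k-1}
      for (std::size_t i = 0; i < n; ++i) {
        double av = 0.0;
        for (std::size_t j = 0; j < n; ++j) av += A[i][j] * vhat[j];      // A v^_{k-1}
        V[k][i] = av - B[k - 1][k - 1] * V[k - 1][i];                      // - alpha v_{k-1}
        if (k > 1) V[k][i] += B[k - 2][k - 1] * V[k - 2][i];               // + beta v_{k-2}
      }
    }
    return V;
  }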

Givens rotation [o, ℋ, ζ]: the elements of the matrix ℋ are denoted 𝒽_{k,i}.

  1. for k = 1, 2, 3 ... i-1
  2.   (𝒽_{k,i}, 𝒽_{k+1,i})^T = [ c_k  -s_k ; s_k  c_k ] (𝒽_{k,i}, 𝒽_{k+1,i})^T
  3. end
  4. c_i = 1 / sqrt(1 + (𝒽_{i+1,i} / 𝒽_{i,i})^2)
  5. s_i = -(𝒽_{i+1,i} / 𝒽_{i,i}) / sqrt(1 + (𝒽_{i+1,i} / 𝒽_{i,i})^2)
  6. (ζ_i, ζ_{i+1})^T = [ c_i  -s_i ; s_i  c_i ] (ζ_i, ζ_{i+1})^T
  7. 𝒽_{i,i} = c_i 𝒽_{i,i} - s_i 𝒽_{i+1,i}
  8. 𝒽_{i+1,i} = 0

Update solution vector:

  1. y = ℋ^{-1} ζ
  2. solve x̂ from K x̂ = Σ_{k=0}^{sm-1} Q_k y_k + Σ_{l=sm}^{n-1} V_{l-sm} Σ_{k=sm}^{n-1} R^{-1}_{l-sm,k-sm} y_k
  3. x_0 = x_0 + x̂


2 Preconditioning

In iterative algorithms, applying a preconditioner of the form K ≈ A can reduce the number of iterations needed for convergence. The notation "solve p̂ from K p̂ = p" means "approximately solve A p̂ = p with respect to the vector p̂". A more accurate approximation generally reduces the number of iterations, but increases the cost of constructing and applying the preconditioner. In sequential computing, one of the most common preconditioners is the incomplete LU factorization (ILU). In parallel computing, however, parallel preconditioners are needed. PARCEL provides the following parallel preconditioners: Point Jacobi, Block Jacobi and Additive Schwarz.


2.1 Point Jacobi Preconditioner [1]

The point Jacobi preconditioner is one of the simplest preconditioners. K consists only of the diagonal elements of the matrix A. Compared to other preconditioners, its efficiency is low. However, it can easily be applied in parallel and does not require any additional communication between processors.
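
Because K contains only the diagonal of A, applying the preconditioner reduces to an element-wise division, for example (illustrative sketch only):

  // Point Jacobi preconditioning step (illustration only): since K = diag(A),
  // "solve K p^ = p" reduces to a per-element division.
  #include <cstddef>
  #include <vector>

  void point_jacobi_apply(const std::vector<double>& diagA,
                          const std::vector<double>& p,
                          std::vector<double>& phat) {
    for (std::size_t i = 0; i < p.size(); ++i)
      phat[i] = p[i] / diagA[i];   // p^_i = p_i / a_ii
  }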


2.2 Zero Fill-in Incomplete LU Factorization Preconditioners (ILU(0)) [1]

LU factorization decomposes a square matrix A into a lower triangular matrix L and an upper triangular matrix U. For a typical sparse matrix, the LU factors contain many more non-zero elements than the original matrix; these additional entries are called fill-in. Fill-in increases the memory and computational requirements. In order to avoid this issue, PARCEL provides the zero fill-in incomplete LU factorization (ILU(0)).

2.3 Diagonal Incomplete LU Factorization Preconditioners (D-ILU) [1]

The diagonal incomplete LU factorization (D-ILU) constructs a diagonal matrix D from the splitting of the input matrix A into its diagonal part D_A, its strictly lower triangular part L_A and its strictly upper triangular part U_A, i.e. A = L_A + D_A + U_A. The preconditioner K is constructed from L_A, U_A and D (D ≠ D_A) as K = (D + L_A) D^{-1} (D + U_A). Two different conditions exist in order to construct the diagonal matrix D:

Given the conditions above, D can be computed as follows:
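
The exact formulas depend on which condition is chosen. As a representative example (not necessarily the exact variant implemented in PARCEL), the common condition that the diagonal of K equal the diagonal of A yields, with K = (D + L_A) D^{-1} (D + U_A), the recurrence

    d_{ii} = a_{ii} - Σ_{j<i} a_{ij} d_{jj}^{-1} a_{ji},   i = 1, ..., n,

evaluated in increasing order of i (see [1]).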


2.4 Block Jacobi Preconditioner [1]

The Block Jacobi preconditioner constructs K out of the diagonal blocks of the matrix A. For each block, an incomplete LU factorization is computed. When the preconditioner is applied, each block can be processed independently, resulting in a high level of parallelism. In addition, no communication is needed when each block is defined within the sub-matrix held by each processor.


2.5 Jacobi Preconditioning [1]

Jacobi preconditioning approximately solves the system A p̂ = p by the iterative method below, where D denotes the diagonal of A. Since there are no data dependencies within a single iteration, it is well suited to GPU-based computation.

    p̂_{k+1} = p̂_k + D^{-1} (p - A p̂_k)
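
A minimal C++ sketch of this iteration (dense matrix, fixed number of iterations; illustrative only, not the PARCEL code):

  // Jacobi preconditioning sketch: a few fixed-point steps
  // p^_{k+1} = p^_k + D^{-1} (p - A p^_k) approximate the solution of A p^ = p.
  #include <cstddef>
  #include <vector>

  using Vec = std::vector<double>;
  using Mat = std::vector<Vec>;

  Vec jacobi_precon(const Mat& A, const Vec& p, int iters) {
    const std::size_t n = p.size();
    Vec phat(n, 0.0), Ap(n);
    for (int k = 0; k < iters; ++k) {
      for (std::size_t i = 0; i < n; ++i) {           // Ap = A p^_k
        Ap[i] = 0.0;
        for (std::size_t j = 0; j < n; ++j) Ap[i] += A[i][j] * phat[j];
      }
      for (std::size_t i = 0; i < n; ++i)             // p^_{k+1} = p^_k + D^{-1}(p - A p^_k)
        phat[i] += (p[i] - Ap[i]) / A[i][i];
    }
    return phat;
  }

The number of iterations in this sketch corresponds to the set_jacobi(iter) setting of the solver classes described in Section 7.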


3 QR factorization

This section describes the QR decomposition algorithms that can be used in this library.


3.1 Classical Gram-Schmidt (CGS) Method[1]

QR factorization by classical Gram-Schmidt orthogonalization. This algorithm has high parallelism but poor orthogonality.

  1. for i = 1, ..., n do
  2.   for k = 1, i-1 do
  3.     R_{k,i} = ⟨Q_k, V_i⟩
  4.   enddo
  5.   for k = 1, i-1 do
  6.     V_i = V_i - R_{k,i} Q_k
  7.   enddo
  8.   R_{i,i} = sqrt(⟨V_i, V_i⟩)
  9.   Q_i = V_i / ‖V_i‖
  10. enddo


3.2 Modified Gram-Schmidt (MGS) Method[1]

QR factorization by a modified version of the classical Gram-Schmidt algorithm that reduces rounding error. It improves the orthogonality over the classical Gram-Schmidt method, but requires more collective communications.

  1. for i = 1, ..., n do
  2.   for k = 1, i-1 do
  3.     R_{k,i} = ⟨Q_k, V_i⟩
  4.     V_i = V_i - R_{k,i} Q_k
  5.   enddo
  6.   R_{i,i} = sqrt(⟨V_i, V_i⟩)
  7.   Q_i = V_i / ‖V_i‖
  8. enddo


3.3 Cholesky QR Method [7]

The Cholesky QR factorization consists of a matrix product and a Cholesky factorization. This algorithm has high computational intensity and requires only one collective communication. However, its orthogonality is poor.

  1. B = V^T V
  2. Cholesky decomposition B = R^T R
  3. Q = V R^{-1}
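
A minimal C++ sketch of the three steps above (V stored as a list of column vectors; illustrative only, not the PARCEL code):

  // Cholesky QR sketch: B = V^T V, B = R^T R (Cholesky), Q = V R^{-1}.
  #include <cstddef>
  #include <cmath>
  #include <vector>

  using Vec = std::vector<double>;
  using Cols = std::vector<Vec>;   // m column vectors of length n

  void cholesky_qr(const Cols& V, Cols& Q, std::vector<Vec>& R) {
    const std::size_t m = V.size(), n = V[0].size();
    // B = V^T V (the single global reduction in a distributed setting)
    std::vector<Vec> B(m, Vec(m, 0.0));
    for (std::size_t i = 0; i < m; ++i)
      for (std::size_t j = 0; j < m; ++j)
        for (std::size_t k = 0; k < n; ++k) B[i][j] += V[i][k] * V[j][k];
    // Cholesky factorization B = R^T R, with R upper triangular
    R.assign(m, Vec(m, 0.0));
    for (std::size_t i = 0; i < m; ++i) {
      for (std::size_t j = i; j < m; ++j) {
        double s = B[i][j];
        for (std::size_t k = 0; k < i; ++k) s -= R[k][i] * R[k][j];
        R[i][j] = (i == j) ? std::sqrt(s) : s / R[i][i];
      }
    }
    // Q = V R^{-1}: solve Q R = V column by column
    Q.assign(m, Vec(n, 0.0));
    for (std::size_t j = 0; j < m; ++j) {
      Q[j] = V[j];
      for (std::size_t k = 0; k < j; ++k)
        for (std::size_t i = 0; i < n; ++i) Q[j][i] -= R[k][j] * Q[k][i];
      for (std::size_t i = 0; i < n; ++i) Q[j][i] /= R[j][j];
    }
  }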


4 Sparse Matrix Data Formats

In order to save memory in sparse matrix computation, only the non-zero elements and their locations are stored. PARCEL supports different sparse matrix formats.


4.1 Compressed Row Storage (CRS) Format

The CRS format compresses the row indices of the non-zero elements of the sparse matrix. Together with the compressed row pointers, the column index and the value of each non-zero element are stored in one-dimensional arrays. The compressed row indices are stored in increasing order.

    A = ( a  b  c  0
          0  d  0  0
          e  0  f  g
          0  h  i  j )


One process case

Values of non-zero elements :: crsA = {a, b, c, d, e, f, g, h, i, j}

Compressed row indices :: crsRow_ptr = {1, 4, 5, 8, 11}

Column indices of non-zero elements :: crsCol = {1, 2, 3, 2, 1, 3, 4, 2, 3, 4}


Two process case

Process 0:

Values of non-zero elements :: crsA = {a, b, c, d}

Compressed row indices :: crsRow_ptr = {1, 4, 5}

Column indices of non-zero elements :: crsCol = {1, 2, 3, 2}

Process 1:

Values of non-zero elements :: crsA = {e, f, g, h, i, j}

Compressed row indices :: crsRow_ptr = {1, 4, 7}

Column indices of non-zero elements :: crsCol = {1, 3, 4, 2, 3, 4}
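
As an illustration of how these arrays are used, a CRS matrix-vector product y = A x with the 1-based indexing of the example above can be written as follows (sketch only, not the PARCEL code):

  // CRS sparse matrix-vector product y = A x (illustration only), using 1-based
  // crsRow_ptr / crsCol values as in the example above.
  #include <cstddef>
  #include <vector>

  void crs_spmv(const std::vector<double>& crsA,
                const std::vector<int>&    crsRow_ptr,   // size n+1, 1-based offsets
                const std::vector<int>&    crsCol,       // 1-based column indices
                const std::vector<double>& x,
                std::vector<double>&       y) {
    const std::size_t n = crsRow_ptr.size() - 1;
    for (std::size_t i = 0; i < n; ++i) {
      double s = 0.0;
      // non-zeros of row i are stored at positions crsRow_ptr[i]-1 .. crsRow_ptr[i+1]-2
      for (int k = crsRow_ptr[i] - 1; k < crsRow_ptr[i + 1] - 1; ++k)
        s += crsA[k] * x[crsCol[k] - 1];
      y[i] = s;
    }
  }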



5 Installation

This section explains how to install the SYCL version of PARCEL.
You can use it by including the header file ./Include/parcel_sycl.hpp and compiling your program with a SYCL-compatible compiler. Examples of compilation on SGI8600 (Intel Xeon Gold 6248R + NVIDIA V100) are shown below.

  module purge
  module load gnu/9.5.0
  module load openmpi/cur
  . ~/spack/share/spack/setup-env.sh
  spack load hipsycl
  export sycl_root=`spack location -i hipsycl`
  export llvm_root=`spack location -i llvm`
  export LD_LIBRARY_PATH=$sycl_root/lib/:$llvm_root/lib/:${LD_LIBRARY_PATH}
  compiler_base="acpp"
  export MPICXX_CXX=${compiler_base}
  export OMPI_CXX=${compiler_base}
  src_name=sample.cpp
  compiler_option=" -O3 --acpp-targets=omp -mlong-double-128  -march=native -mtune=native  -Wpass-failed=transform-warning"
  compiler=mpicxx
  ${compiler} ${compiler_option} -o a.out ${src_name}

  module purge
  module load gnu/9.5.0
  module load cuda/12.6
  module load openmpi/cur
  . ~/spack/share/spack/setup-env.sh
  spack load hipsycl
  export sycl_root=`spack location -i hipsycl`
  export llvm_root=`spack location -i llvm`
  export cuda_root=/home/app/cuda/12.6
  export ACPP_CUDA_PATH=$cuda_root
  export LD_LIBRARY_PATH=$sycl_root/lib/:$llvm_root/lib/:$cuda_root/lib64:${LD_LIBRARY_PATH}
  compiler_base="acpp"
  export MPICXX_CXX=${compiler_base}
  export OMPI_CXX=${compiler_base}
  src_name=sample.cpp
  compiler_option=" -O3 --acpp-targets=cuda:sm_70 -march=native -mtune=native -v"
  compiler=mpicxx
  ${compiler} ${compiler_option} -o a.out ${src_name}

6 How to Install a SYCL-Compatible Compiler

6.1 AdaptiveCpp

AdaptiveCpp can be installed from source code. For more details, please refer to the following page: https://github.com/AdaptiveCpp/AdaptiveCpp

One can also use spack to install AdaptiveCpp. Examples on SGI8600 (Intel Xeon Gold 6248R + NVIDIA V100) are shown below.

  module purge
  module load gnu/9.5.0
  module load openmpi/cur
  . ~/spack/share/spack/setup-env.sh
  spack compiler find
  spack external find
  spack install llvm@18
  spack install hipsycl % llvm@18

When the CPU architectures differ between login nodes and compute nodes, as in Fugaku or Wisteria Odyssey, the installation should be performed on the compute nodes.

  module purge
  module load gnu/9.5.0
  module load cuda/12.6
  module load openmpi/cur
  . ~/spack/share/spack/setup-env.sh
  spack compiler find
  spack external find
  spack external find cuda
  spack install llvm@18
  spack install hipsycl +cuda % llvm@18


7 SYCL Version of the PARCEL Class

This section describes the classes provided in the SYCL version of PARCEL.

7.1 PARCEL Data Types

The SYCL version of PARCEL uses the following data types (parceltype):


7.2 parcel::comm

An MPI communicator class for the SYCL version of PARCEL.

Definition
  parcel::comm(MPI_Comm comm, sycl::queue* q)
  class parcel_comm;

Arguments
  comm: MPI communicator
  q: Pointer to a sycl::queue


7.3 parcel::matrix

A class for sparse matrix manipulation in the SYCL version of PARCEL.

Definition
  template<class parceltype> matrix(sycl::queue* q)
  class parcel_matrix;

Arguments
  parceltype: Data precision of the CRS matrix
  q: Pointer to a sycl::queue

Function Name: set_crsMat<T>(comm, nstart, nend, n, nnz, rowptr, crsCol, mem)
Description: Define a CRS-format matrix for MPI parallelization
Arguments:
  T: Floating-point type
  parcel::comm* comm: MPI communicator for the SYCL version of PARCEL
  PARCEL_INT nstart: Starting row of the matrix handled by each process
  PARCEL_INT nend: Ending row of the matrix handled by each process
  size_t n: Size of the vector handled by each process
  size_t nnz: Number of non-zero elements in the coefficient matrix handled by each process
  PARCEL_INT* rowptr: Pointer table in CRS format
  PARCEL_INT* crsCol: Column indices of non-zero elements in the partitioned matrix stored in CRS format
  T* mem: Non-zero elements of the coefficient matrix


7.4 parcel::vector

A class for manipulating vectors in the SYCL version of PARCEL.

Definition
  template<class parceltype> vector(size_t n, sycl::queue* q)

Arguments
  parceltype: Data precision of the vector
  n: Number of array elements
  q: Pointer to a sycl::queue

Function Name: queue()
Description: Retrieves the sycl::queue pointer

Function Name: resize(n, q)
Description: Changes the number of array elements and the sycl::queue
Arguments:
  size_t n: Number of array elements
  sycl::queue* q: Pointer to a sycl::queue

Function Name: setValue(mem)
Description: Copies array elements
Arguments:
  T: Floating-point type
  T* mem: Pointer to array elements

Function Name: setValue_hi(mem)
Description: Copies array elements to the high part of a double-double array
Arguments:
  double* mem: Pointer to array elements

Function Name: setValue_lo(mem)
Description: Copies array elements to the low part of a double-double array
Arguments:
  double* mem: Pointer to array elements

Function Name: getValue(mem)
Description: Retrieves array elements
Arguments:
  T: Floating-point type
  T* mem: Pointer to array elements

Function Name: getValue(mem_hi, mem_lo)
Description: Retrieves double-double array elements
Arguments:
  T: Floating-point type
  T* mem_hi: Pointer to the high part
  T* mem_lo: Pointer to the low part

Function Name: getValue_hi(T* mem)
Description: Retrieves the high part of double-double array elements
Arguments:
  T: Floating-point type
  T* mem: Pointer to array elements

Function Name: getValue_lo(T* mem)
Description: Retrieves the low part of double-double array elements
Arguments:
  T: Floating-point type
  T* mem: Pointer to array elements

Function Name: size()
Description: Retrieves the number of array elements

Function Name: data()
Description: Retrieves the pointer to the beginning of the array

Function Name: data_hi()
Description: Retrieves the pointer to the beginning of the high part of a double-double array

Function Name: data_lo()
Description: Retrieves the pointer to the beginning of the low part of a double-double array


7.5 parcel::krylov::pcg

A class for the Preconditioned Conjugate Gradient (PCG) method.

Definition
  template<
    class Tspmv = parcel_double,
    class Tprecon = parcel_double,
    class Taxpy = parcel_double,
    class Tdot = parcel_double
  > class pcg;

Arguments
  parceltype Tspmv: Precision for SpMV computation
  parceltype Tprecon: Precision for preconditioning
  parceltype Taxpy: Precision for Axpy computation
  parceltype Tdot: Precision for dot product computation

Function Name: solver<Tmat, Tvecx, Tvecb, Tvec, Tprecon_mat, Tprecon_vec>(Mat, x, b)
Description: Executes the Conjugate Gradient method
Arguments:
  parceltype Tmat: Precision of the coefficient matrix
  parceltype Tvecx: Precision of the solution vector
  parceltype Tvecb: Precision of the right-hand side vector
  parceltype Tvec: Precision for internal Krylov subspace data
  parceltype Tprecon_mat: Precision of the preconditioner matrix
  parceltype Tprecon_vec: Precision of the preconditioner vector
  parcel::matrix Mat: CRS matrix
  parcel::vector x: Solution vector
  parcel::vector b: Right-hand side vector

Function Name: set_precon_algo(algo)
Description: Sets the preconditioner for the Krylov subspace method
  (0: No preconditioner, 1: Point Jacobi, 2: Block Jacobi, 4: Jacobi)
Arguments: PARCEL_INT algo

Function Name: set_jacobi(iter)
Description: Sets the number of iterations for the Jacobi method
Arguments: PARCEL_INT iter

Function Name: set_blockjacobi(nblock)
Description: Specifies the number of blocks for the Block Jacobi method
Arguments: PARCEL_INT nblock

Function Name: set_scaling_matrix(flag)
Description: Enables or disables scaling of the preconditioner matrix
  (false: disabled, true: enabled)
Arguments: bool flag

Function Name: set_scaling_vector(flag)
Description: Enables or disables scaling of the preconditioner vector
  (false: disabled, true: enabled)
Arguments: bool flag

Function Name: set_tolerance(tol)
Description: Sets the convergence threshold for the relative residual norm
Arguments: double tol

Function Name: set_max_iteration(itr)
Description: Sets the maximum number of iterations for the Krylov subspace method
Arguments: PARCEL_INT itr


7.6 parcel::krylov::pbicgstab

A class for the Preconditioned Bi-Conjugate Gradient Stabilized (BiCGStab) method.

Definition
  template<
    class Tspmv = parcel_double,
    class Tprecon = parcel_double,
    class Taxpy = parcel_double,
    class Tdot = parcel_double
  > class pbicgstab;

Arguments
  parceltype Tspmv: Precision for SpMV computation
  parceltype Tprecon: Precision for preconditioning
  parceltype Taxpy: Precision for Axpy computation
  parceltype Tdot: Precision for dot product computation

Function Name: solver<Tmat, Tvecx, Tvecb, Tvec, Tprecon_mat, Tprecon_vec>(Mat, x, b)
Description: Executes the Preconditioned BiCGStab method
Arguments:
  parceltype Tmat: Precision of the coefficient matrix
  parceltype Tvecx: Precision of the solution vector
  parceltype Tvecb: Precision of the right-hand side vector
  parceltype Tvec: Precision for internal Krylov subspace data
  parceltype Tprecon_mat: Precision of the preconditioner matrix
  parceltype Tprecon_vec: Precision of the preconditioner vector
  parcel::matrix Mat: CRS matrix
  parcel::vector x: Solution vector
  parcel::vector b: Right-hand side vector

Function Name: set_precon_algo(algo)
Description: Sets the preconditioner for the Krylov subspace method
  (0: No preconditioner, 1: Point Jacobi, 2: Block Jacobi, 4: Jacobi)
Arguments: PARCEL_INT algo

Function Name: set_jacobi(iter)
Description: Sets the number of iterations for the Jacobi method
Arguments: PARCEL_INT iter

Function Name: set_blockjacobi(nblock)
Description: Specifies the number of blocks for the Block Jacobi method
Arguments: PARCEL_INT nblock

Function Name: set_scaling_matrix(flag)
Description: Enables or disables scaling of the preconditioner matrix
  (false: disabled, true: enabled)
Arguments: bool flag

Function Name: set_scaling_vector(flag)
Description: Enables or disables scaling of the preconditioner vector
  (false: disabled, true: enabled)
Arguments: bool flag

Function Name: set_tolerance(tol)
Description: Sets the convergence threshold for the relative residual norm
Arguments: double tol

Function Name: set_max_iteration(itr)
Description: Sets the maximum number of iterations for the Krylov subspace method
Arguments: PARCEL_INT itr


7.7 parcel::krylov::pgmres

A class for the Preconditioned Generalized Minimal Residual (GMRES) method.

Definition
  template<
    class Tspmv = parcel_double,
    class Tprecon = parcel_double,
    class Taxpy = parcel_double,
    class Tdot = parcel_double
  > class pgmres;

Arguments
  parceltype Tspmv: Precision for SpMV computation
  parceltype Tprecon: Precision for preconditioning
  parceltype Taxpy: Precision for Axpy computation
  parceltype Tdot: Precision for dot product computation

Function Name: solver<Tmat, Tvecx, Tvecb, Tvec, Tprecon_mat, Tprecon_vec, Tgs, Thess_mat>(Mat, x, b)
Description: Executes the Preconditioned GMRES method
Arguments:
  parceltype Tmat: Precision of the coefficient matrix
  parceltype Tvecx: Precision of the solution vector
  parceltype Tvecb: Precision of the right-hand side vector
  parceltype Tvec: Precision for internal Krylov subspace data
  parceltype Tprecon_mat: Precision of the preconditioner matrix
  parceltype Tprecon_vec: Precision of the preconditioner vector
  parceltype Tgs: Precision for Gram-Schmidt orthogonalization
  parceltype Thess_mat: Precision of the Hessenberg matrix
  parcel::matrix Mat: CRS matrix
  parcel::vector x: Solution vector
  parcel::vector b: Right-hand side vector

Function Name: set_precon_algo(algo)
Description: Sets the preconditioner for the Krylov subspace method
  (0: No preconditioner, 1: Point Jacobi, 2: Block Jacobi, 4: Jacobi)
Arguments: PARCEL_INT algo

Function Name: set_jacobi(iter)
Description: Sets the number of iterations for the Jacobi method
Arguments: PARCEL_INT iter

Function Name: set_blockjacobi(nblock)
Description: Specifies the number of blocks for the Block Jacobi method
Arguments: PARCEL_INT nblock

Function Name: set_scaling_matrix(flag)
Description: Enables or disables scaling of the preconditioner matrix
  (false: disabled, true: enabled)
Arguments: bool flag

Function Name: set_scaling_vector(flag)
Description: Enables or disables scaling of the preconditioner vector
  (false: disabled, true: enabled)
Arguments: bool flag

Function Name: set_tolerance(tol)
Description: Sets the convergence threshold for the relative residual norm
Arguments: double tol

Function Name: set_max_iteration(itr)
Description: Sets the maximum number of iterations for the Krylov subspace method
Arguments: PARCEL_INT itr


7.8 parcel::krylov::pcagmres

A class for the Preconditioned Communication-Avoiding Generalized Minimal Residual (CA-GMRES) method.

Definition
  template<
    class Tspmv = parcel_double,
    class Tprecon = parcel_double,
    class Taxpy = parcel_double,
    class Tdot = parcel_double
  > class pcagmres;

Arguments
  parceltype Tspmv: Precision for SpMV computation
  parceltype Tprecon: Precision for preconditioning
  parceltype Taxpy: Precision for Axpy computation
  parceltype Tdot: Precision for dot product computation

Function Name: solver<Tmat, Tvecx, Tvecb, Tvec, Tprecon_mat, Tprecon_vec, Tbasis, TQR, Thess_mat>(Mat, x, b)
Description: Executes the Communication-Avoiding GMRES method
Arguments:
  parceltype Tmat: Precision of the coefficient matrix
  parceltype Tvecx: Precision of the solution vector
  parceltype Tvecb: Precision of the right-hand side vector
  parceltype Tvec: Precision for internal Krylov subspace data
  parceltype Tprecon_mat: Precision of the preconditioner matrix
  parceltype Tprecon_vec: Precision of the preconditioner vector
  parceltype Tbasis: Precision of the basis vectors
  parceltype TQR: Precision for QR decomposition
  parceltype Thess_mat: Precision of the Hessenberg matrix
  parcel::matrix Mat: CRS matrix
  parcel::vector x: Solution vector
  parcel::vector b: Right-hand side vector

Function Name: set_precon_algo(algo)
Description: Sets the preconditioner for the Krylov subspace method
  (0: No preconditioner, 1: Point Jacobi, 2: Block Jacobi, 4: Jacobi)
Arguments: PARCEL_INT algo

Function Name: set_jacobi(iter)
Description: Sets the number of iterations for the Jacobi method
Arguments: PARCEL_INT iter

Function Name: set_blockjacobi(nblock)
Description: Specifies the number of blocks for the Block Jacobi method
Arguments: PARCEL_INT nblock

Function Name: set_scaling_matrix(flag)
Description: Enables or disables scaling of the preconditioner matrix
  (false: disabled, true: enabled)
Arguments: bool flag

Function Name: set_scaling_vector(flag)
Description: Enables or disables scaling of the preconditioner vector
  (false: disabled, true: enabled)
Arguments: bool flag

Function Name: set_tolerance(tol)
Description: Sets the convergence threshold for the relative residual norm
Arguments: double tol

Function Name: set_max_iteration(itr)
Description: Sets the maximum number of iterations for the Krylov subspace method
Arguments: PARCEL_INT itr


8 Usage

This section shows a sample program using the SYCL version of PARCEL.

8.1 C++


  #include <mpi.h>
 
  #include "./Include/parcel_sycl.hpp"
  #include "./parcel_default.hpp"
 
  void run_parcel_sycl_main(
              MPI_Comm mpi_comm_c,
              int n,
              int nnz,
              double* crsA,
              int* crsCol,
              int* crsRow_ptr,
              double* vecx,
              double* vecb
               ) {
 
    sycl::device device;
    sycl::property::queue::in_order property;
    sycl::queue q{device,property};
    q = sycl::default_selector{};
 
    parcel::comm comm(mpi_comm_c,&q);
    int myrank   = comm.mpi_rank();
    int nprocess = comm.mpi_size();
 
    if(myrank == 0)std::cout << q.get_device().get_info<sycl::info::device::name>() << std::endl;
 
    parcel::matrix parcel_matrix(&q);
    PARCEL_INT nlocal = n;
    PARCEL_INT nstart = nlocal*myrank;
    PARCEL_INT nend   = nlocal*(myrank+1)-1;
 
    q.wait();
    parcel_matrix.set_crsMat(
                     &comm,nstart,nend,
                     n,nnz,
                     crsRow_ptr,
                     crsCol,
                     crsA
                     );
    q.wait();
 
    parcel::vector vx(n,&q);
    parcel::vector vb(n,&q);
    vx.setValue(vecx);
    vb.setValue(vecb);
 
 
    parcel::krylov::pcg<
      parcel_solver_calc_type,
      parcel_solver_calc_precon_type,
      parcel_solver_calc_type,
      parcel_solver_calc_type
      > solver;
    solver.set_tolerance( 1.0e-9 );
    solver.set_max_iteration( MAX_ITERATION );
 
    PARCEL_INT precon_algo = parcel_solver_precon_algo;
    switch (precon_algo) {
    case parcel::PRECON_NONE:
      // none
      solver.set_precon_algo(parcel::PRECON_NONE);
      break;
    case parcel::PRECON_POINTJACOBI:
      // point jacobi precon
      solver.set_precon_algo(parcel::PRECON_POINTJACOBI);
      break;
    case parcel::PRECON_BLOCKJACOBI:
      // block jacobi precon
      solver.set_precon_algo(parcel::PRECON_BLOCKJACOBI);
      solver.set_scaling_matrix(true);
      solver.set_scaling_vector(true);
      solver.set_blockjacobi(BLOCKJACOBI_BLOCK);
      break;
    case parcel::PRECON_JACOBI:
      // jacobi precon
      solver.set_precon_algo(parcel::PRECON_JACOBI);
      solver.set_scaling_matrix(true);
      solver.set_scaling_vector(true);
      solver.set_jacobi(JACOBI_ITER);
      break;
    }
    q.wait();
 
    solver.solver<
      parcel_solver_data_type,
      parcel_solver_data_type,
      parcel_solver_data_type,
      parcel_solver_data_type,
      parcel_solver_data_precon_mat_type,
      parcel_solver_data_precon_vec_type
      >(parcel_matrix,vx,vb);
    vx.getValue(vecx);  q.wait();
    q.wait();
 
    return ;
  }


8.2 Fortran



  extern "C" {
    void run_parcel_sycl_f(
             int* mpi_comm_f,
             int n, int nnz,
             double* crsA,
             int* crsCol,
             int* crsRow_ptr,
             double* vecx,
             double* vecb
              ) {
      MPI_Comm mpi_comm_c = MPI_Comm_f2c(mpi_comm_f[0]);
      run_parcel_sycl_main(
             mpi_comm_c,
             n, nnz,
             crsA,
             crsCol,
             crsRow_ptr,
             vecx,
             vecb );
      return ;
    }
  }
 


make_poisson3d_crs_get_nnz and make_poisson3d_crs are subroutines that generate CRS-format matrices with support for matrix partitioning.



  module parcel_sycl_wrapper
 
    interface
       subroutine run_parcel_sycl( &
            mpi_comm_f, &
            n,nnz, &
            crsA, &
            crsCol, &
            crsRow_ptr, &
            vecx, &
            vecb &
            ) &
            bind(c,name="run_parcel_sycl_f")
         implicit none
         integer mpi_comm_f
         integer(kind=4),value :: n
         integer(kind=4),value :: nnz
         real*8 crsA(*)
         integer(kind=4) crsCol(*)
         integer(kind=4) crsRow_ptr(*)
         real*8 vecx(*)
         real*8 vecb(*)
 
       end subroutine run_parcel_sycl
    end interface
  contains
 
  end module parcel_sycl_wrapper
 
  program main
    use mpi
    use parcel_sycl_wrapper
    implicit none
    integer(kind=4) nx,ny,nz
    integer(kind=4) nx0,ny0,nz0
 
    integer(kind=4),parameter :: gnx = 16
    integer(kind=4),parameter :: gny = 16
    integer(kind=4),parameter :: gnz = 16
 
    integer(kind=4) gn,n
    integer(kind=4) nnz
    integer(kind=4) istart
 
    real*8,allocatable :: crsA(:)
    integer(kind=4),allocatable :: crsCol(:)
    integer(kind=4),allocatable :: crsRow_ptr(:)
 
    real*8,allocatable :: vecx(:)
    real*8,allocatable :: vecb(:)
 
    integer ierr
    integer npes,myrank
 
    integer i
    integer mpi_comm_f
 
    mpi_comm_f = MPI_COMM_WORLD
 
    ! MPI Init
    call MPI_Init(ierr)
    call MPI_Comm_size(mpi_comm_f,npes,ierr)
    call MPI_Comm_rank(mpi_comm_f,myrank,ierr)
 
    nx     = gnx
    ny     = gny
    nz     = gnz/npes
 
    nx0    = 1
    ny0    = 1
    nz0    = 1 + nz*myrank
 
    gn     = gnx*gny*gnz
    n      = nx * ny * nz
    istart = 1+n*myrank
 
    call make_poisson3d_crs_get_nnz( &
         nx,ny,nz,   &
         nx0,ny0,nz0,&
         gnx,gny,gnz,&
         nnz, &
         istart)
 
    allocate(crsA(nnz))
    allocate(crsCol(nnz))
    allocate(crsRow_ptr(n+1))
    allocate(vecx(n))
    allocate(vecb(n))
 
    vecx = 0.0d0
    vecb = 1.0d0
    call make_poisson3d_crs( &
         nx,ny,nz,   &
         nx0,ny0,nz0,&
         gnx,gny,gnz,&
         crsA,crsCol,crsRow_ptr,n,nnz, &
         istart)
 
    call run_parcel_sycl( &
         mpi_comm_f, &
         n,nnz, &
         crsA, &
         crsCol, &
         crsRow_ptr, &
         vecx, &
         vecb &
         )
 
    deallocate(crsA)
    deallocate(crsCol)
    deallocate(crsRow_ptr)
    deallocate(vecx)
    deallocate(vecb)
 
    call MPI_Finalize(ierr);
 
  end program main


8.3 C



  extern "C" {
    void run_parcel_sycl_c(
             MPI_Comm mpi_comm_c,
             int n, int nnz,
             double* crsA,
             int* crsCol,
             int* crsRow_ptr,
             double* vecx,
             double* vecb
              ) {
      run_parcel_sycl_main(
             mpi_comm_c,
             n, nnz,
             crsA,
             crsCol,
             crsRow_ptr,
             vecx,
             vecb );
      return ;
    }
  }


make_poisson3d_crs_get_nnz_ and make_poisson3d_crs_ are functions that generate CRS-format matrices with support for matrix partitioning.



  #include <stdio.h>
  #include <stdlib.h>
  #include <mpi.h>
 
  extern void make_poisson3d_crs_get_nnz_(
                       int* nx,
                       int* ny,
                       int* nz,
                       int* nx0,
                       int* ny0,
                       int* nz0,
                       int* gnx,
                       int* gny,
                       int* gnz,
                       int* nnz,
                       int* istart);
  extern void make_poisson3d_crs_(
                   int* nx,
                   int* ny,
                   int* nz,
                   int* nx0,
                   int* ny0,
                   int* nz0,
                   int* gnx,
                   int* gny,
                   int* gnz,
                   double* crsA,
                   int* crsCol,
                   int* crsRow_ptr,
                   int* n,
                   int* nnz0,
                   int* istart);
  extern void run_parcel_sycl_c(
               MPI_Comm mpi_comm_c,
               int n, int nnz,
               double* crsA,
               int* crsCol,
               int* crsRow_ptr,
               double* vecx,
               double* vecb
            );
 
  int main(int argc, char* argv[]) {
 
    int gnx = 16;
    int gny = 16;
    int gnz = 16;
 
    MPI_Init(&argc, &argv);
    MPI_Comm mpi_comm_c = MPI_COMM_WORLD;
    int npes,myrank;
    MPI_Comm_size(MPI_COMM_WORLD,&npes);
    MPI_Comm_rank(MPI_COMM_WORLD,&myrank);
 
    int nx     = gnx;
    int ny     = gny;
    int nz     = gnz/npes;
 
    int nx0    = 1;
    int ny0    = 1 ;
    int nz0    = 1 + nz*myrank;
 
    int gn     = gnx*gny*gnz;
    int n      = nx * ny * nz;
    int istart = 1+n*myrank;
    int nnz;
 
    make_poisson3d_crs_get_nnz_(
         &nx,&ny,&nz,
         &nx0,&ny0,&nz0,
         &gnx,&gny,&gnz,
         &nnz,
         &istart);
 
    double* crsA = (double*)malloc(sizeof(double)*nnz);
    int* crsCol = (int*)malloc(sizeof(int)*nnz);
    int* crsRow_ptr = (int*)malloc(sizeof(int)*(n+1));
    double* vecx = (double*)malloc(sizeof(double)*n);
    double* vecb = (double*)malloc(sizeof(double)*n);
 
    make_poisson3d_crs_(
             &nx,&ny,&nz,
             &nx0,&ny0,&nz0,
             &gnx,&gny,&gnz,
             crsA,crsCol,crsRow_ptr,
             &n,&nnz,
             &istart);
    for(int i = 0;i<n;i++){
      vecx[i] = 0.0;
      vecb[i] = 1.0;
    }
 
    run_parcel_sycl_c(
            mpi_comm_c,
            n,nnz,
            crsA,
            crsCol,
            crsRow_ptr,
            vecx,
            vecb
             );
 
    free(crsA);
    free(crsCol);
    free(crsRow_ptr);
    free(vecx);
    free(vecb);
 
    MPI_Finalize();
 
    return 0;
  }



9 REFERENCES

[1] R. Barrett, M. Berry, T. F. Chan, et al., “Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods”, SIAM (1994)
[2] M. Hoemmen, “Communication-avoiding Krylov subspace methods”, Ph.D. thesis, University of California, Berkeley (2010)
[3] Y. Idomura, T. Ina, A. Mayumi, et al., “Application of a communication-avoiding generalized minimal residual method to a gyrokinetic five dimensional Eulerian code on many core platforms”, ScalA'17: 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, pp. 1-8 (2017)
[4] A. Stathopoulos, K. Wu, “A block orthogonalization procedure with constant synchronization requirements”, SIAM J. Sci. Comput. 23, 2165-2182 (2002)