Direct vs. Iterative Matrix Solvers. Direct solvers compute the final solution within a finite number of steps.  An example of a direct solver algorithm is to compute the inverse of the sparse matrix and then multiply it by the right-hand side to obtain the solution vector.  However, in practice the inverse is almost never computed.  Other, more commonly used direct solver approaches are Gaussian elimination and various types of decompositions or factorizations.

Direct solvers theoretically give the exact solution in a finite number of steps.  Direct solvers factor the coefficient matrix into two triangular matrices and then perform forward and backward triangular solves.  This makes estimating the computational time for direct solvers relatively predictable.  One drawback of direct solvers is that for large problems round-off errors can lead to erroneous results.  Round-off errors from one time step can propagate into subsequent time steps, leading to solution creep (divergence).  In addition, direct solvers cannot solve nearly singular matrices.  For large systems, direct methods can be very computationally demanding.  In order to overcome these issues, modern direct solvers use a combination of direct and iterative algorithms.  The only direct solver available is the PARDISO solver.
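
As an illustration of the factor-and-solve approach, the following minimal sketch uses SciPy's SuperLU interface; it is a stand-in for illustration only (not the PARDISO solver described below), and the matrix values are arbitrary.

```python
# Minimal sketch: sparse LU factorization followed by forward/backward
# triangular solves. SuperLU (via SciPy) is used only for illustration.
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

A = csc_matrix(np.array([[ 4.0, -1.0,  0.0],
                         [-1.0,  4.0, -1.0],
                         [ 0.0, -1.0,  4.0]]))
b = np.array([1.0, 2.0, 3.0])

lu = splu(A)      # factor A into triangular factors (with pivoting)
x = lu.solve(b)   # forward and backward triangular solves
```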

Iterative solvers require an initial guess of the solution. Iterative solvers generally require less memory because, unlike direct solvers, they do not change the structure of the matrix during the iteration process. In addition, iterative solvers rely on matrix-vector multiplications, which can be efficiently parallelized. The main drawback of iterative solvers is that the rate of convergence depends greatly on the condition number of the coefficient matrix. For poorly conditioned matrices, the iterative solver may not converge at all. Therefore, the efficiency of iterative solvers greatly depends on the size and condition number of the coefficient matrix.

Iterative solvers may be classified into stationary and projection methods. In stationary methods, the same operation is applied at every iteration, so the solution is obtained as the fixed (stationary) point of the iteration. The number of operations per iteration step for stationary methods is therefore always the same. Stationary methods work well for small problems but generally converge slowly for large problems. Projection methods extract an approximate solution from a subspace. Projection methods generally have better convergence properties than stationary methods, but each iteration is more computationally demanding; as a result, they tend to be more efficient for medium to large systems of equations.  The main disadvantage of iterative solvers is their lack of robustness.

PARDISO. The PARDISO solver is a high-performance, robust, memory-efficient, and easy-to-use solver for large sparse symmetric and non-symmetric linear systems of equations on shared-memory and distributed-memory architectures. For large sparse linear systems, this solver uses a combination of parallel left- and right-looking supernode pivoting techniques to improve its LDU factorization process (Schenk and Gartner 2004, 2011). Here the Intel Math Kernel Library (MKL) PARDISO solver is utilized.

SOR. The Successive Over-Relaxation (SOR) method is a stationary iterative method based on the Gauss-Seidel (GS) method.  The method utilizes a relaxation coefficient \omega. When \omega = 1, the SOR method reduces to the Gauss-Seidel method. In addition, for \omega < 1 the method technically applies under-relaxation and not over-relaxation. However, for simplicity and convenience the method is referred to as SOR for all values of \omega.  Kahan (1958) showed that the SOR method is unstable for relaxation values outside of 0 < \omega < 2.  The optimal value of the relaxation factor is problem specific.  The SOR method is guaranteed to converge if (1) 0 < \omega < 2, and (2) the matrix is symmetric positive-definite, or strictly or irreducibly diagonally dominant.  However, the method sometimes converges even if the second condition is not satisfied.  A simple parallel version of SOR, referred to as the Asynchronous SOR (ASOR), is utilized here; it uses the new values of the unknowns as soon as they are computed within the same iteration (see Chazan and Miranker 1969; Baudet 1978; Leone and Mangasarian 1988).  The ASOR is part of a class of iterative solvers known as chaotic relaxation methods.  Since the order of relaxation is unconstrained, synchronization is avoided at all stages of the solution.  However, the convergence behavior can be slightly different for different numbers of threads.  The ASOR solver has been parallelized with OpenMP.  For typical applications with 1 \times 10^4 to 1 \times 10^6 rows and 1 to 8 threads, it has been found that the method's convergence is not significantly affected by the number of threads.  The SOR method works best for small to medium-sized problems. The SOR method is utilized as both an iterative solver and a preconditioner.
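
To illustrate the basic SOR iteration described above, here is a minimal serial sketch on a small dense matrix; it is not the parallel ASOR implementation, and the matrix, relaxation factor, and tolerance are illustrative assumptions.

```python
# Minimal serial SOR sketch for A x = b (dense, illustrative values only).
import numpy as np

def sor(A, b, omega=1.2, tol=1e-8, max_iter=1000):
    x = np.zeros_like(b)                      # initial guess
    for it in range(max_iter):
        x_old = x.copy()
        for i in range(len(b)):
            # New values x[:i] are used as soon as they are computed
            # (with omega = 1 this reduces to Gauss-Seidel).
            sigma = A[i, :i] @ x[:i] + A[i, i + 1:] @ x_old[i + 1:]
            x[i] = (1.0 - omega) * x_old[i] + omega * (b[i] - sigma) / A[i, i]
        if np.linalg.norm(x - x_old) < tol:   # simple convergence check
            break
    return x

A = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])
b = np.array([1.0, 2.0, 3.0])
x = sor(A, b)
```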

FGMRES-SOR. The Flexible Generalized Minimal RESidual (FGMRES) solver is a projection method applicable to coefficient matrices which are non-symmetric and indefinite (Saad 1993; Saad 2003).  The FGMRES solver may fail if the matrix is degenerate.  The “flexible” variant of the Generalized Minimal RESidual (GMRES) method allows the preconditioner to vary from iteration to iteration. The flexible variant requires more memory than the standard version, but the extra memory is worth the cost since any iterative method can be used as a preconditioner.  For example, SOR could be used as a preconditioner with different relaxation parameter values each time it is applied.  The FGMRES (and GMRES) method becomes impractical for a large number of iterations because the memory and computational requirements increase linearly as the number of iterations increases.  To remedy this, the algorithm is restarted after m iterations with the last solution used as an initial guess for the new iterative solution (Saad and Schultz 1986).  This procedure is repeated until convergence is achieved.  The FGMRES solver with restart is referred to as FGMRES(m).  If m is too small, the solver may converge too slowly or even fail completely. A value of m that is larger than necessary involves excessive work and memory storage.  Typical restart values are between 5 and 20.  The FGMRES algorithm is described in the figure below. Here the Intel MKL FGMRES solver is implemented with the Reverse Communication Interface.
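
The restart strategy can be illustrated with SciPy's restarted GMRES; note that SciPy provides the standard (non-flexible) variant, and the test matrix and restart value below are arbitrary assumptions.

```python
# Minimal sketch of restarted GMRES(m): the Krylov basis is discarded and
# rebuilt every m iterations, capping the memory and work that otherwise
# grow linearly with the number of iterations.
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import gmres

n = 1000
A = diags([-1.0, 4.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

x, info = gmres(A, b, restart=20, maxiter=500)   # restart value m = 20
```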

The FGMRES-SOR solver utilizes SOR as a preconditioner. A preconditioner is a matrix that transforms the coefficient matrix in such a way that it is easier to solve by an iterative solver. This is done by reducing the condition number. In practice, when preconditioning is applied to Krylov subspace methods, the iterative solver is formulated in such a way that the preconditioner is applied in its entirety by solving an auxiliary sparse linear system of equations. It is this auxiliary system of equations that SOR is applied to. The SOR preconditioner is based on the SOR solver, except that no convergence checking is done during the iteration process (DeLong 1997).  This is done for simplicity and to avoid the additional computations associated with determining the convergence status.  The SOR preconditioner is utilized for non-symmetric matrices.  Testing and comparisons with other preconditioners such as ILU0 and ILUT within HEC-RAS have shown that the SOR preconditioner has very good performance and computational efficiency. The reasons are that the SOR preconditioner is inexpensive to initialize compared to the incomplete factorizations, and its parallelization is also very efficient.
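
A minimal sketch of this idea follows, with a few fixed SOR sweeps (no convergence check) used to apply the preconditioner inside SciPy's GMRES; the sweep count, relaxation factor, and test matrix are illustrative assumptions rather than the HEC-RAS implementation.

```python
# Minimal sketch: SOR sweeps used as a preconditioner for a Krylov solver.
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

n = 200
# Illustrative tridiagonal test matrix (dense here for simple row indexing).
A = (np.diag(np.full(n, 4.0))
     + np.diag(np.full(n - 1, -1.0), 1)
     + np.diag(np.full(n - 1, -1.0), -1))
b = np.ones(n)

def apply_sor_preconditioner(r, omega=1.2, n_sweeps=3):
    # Approximate the auxiliary solve A z = r with a few SOR sweeps
    # starting from z = 0, with no convergence checking.
    r = np.ravel(r)
    z = np.zeros_like(r)
    for _ in range(n_sweeps):
        for i in range(n):
            sigma = A[i, :i] @ z[:i] + A[i, i + 1:] @ z[i + 1:]
            z[i] = (1.0 - omega) * z[i] + omega * (r[i] - sigma) / A[i, i]
    return z

M = LinearOperator((n, n), matvec=apply_sor_preconditioner)
x, info = gmres(A, b, M=M, restart=20)
```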

Stopping Criteria. For the iterative solvers, the convergence is monitored using a normalized backward error estimate. The error estimate is defined as:

1) E^m = \frac{\Vert \boldsymbol{D}^{-1} \left ( \boldsymbol{Ax}^m - \boldsymbol{b} \right ) \Vert_2}{\sqrt N}

where m is the iteration number, E^m is the error estimate, \boldsymbol{D} is the diagonal matrix containing the diagonal elements of \boldsymbol{A}, \boldsymbol{x} is the solution vector, \Vert \: \Vert_2 is the L2 norm (Euclidean norm) operator, and N is the number of rows in the sparse matrix \boldsymbol{A}.  During the iteration process, the iterative solver computes various criteria in order to determine whether it should continue iterating or stop. The various iterative solver statuses are shown in the table below. The solver may stop because it has converged, stalled, diverged, or simply reached the maximum number of iterations. The user can check the convergence status of specific time steps for each 2D area in the *.bco* file.
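
For reference, a small sketch of how the error estimate in Equation 1 could be computed is given below; the function name is illustrative.

```python
# Minimal sketch of the normalized backward error estimate
# E^m = ||D^{-1}(A x^m - b)||_2 / sqrt(N).
import numpy as np

def error_estimate(A, x, b):
    d = A.diagonal()                     # diagonal elements of A
    r = A @ x - b                        # residual for the current iterate x
    return np.linalg.norm(r / d) / np.sqrt(len(b))
```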


Table 1.  Iterative Solver Stopping Criteria.

Status: Iterating
Criteria: N_{min} < m < N_{max} and E^m > T_{C} and \frac{ E^{m-1} - E^m }{E^1 - E^m} > T_S and E^m \leq E^1
Description: Iterative solver is converging and will continue to iterate.

Status: Converged
Criteria: E^m \leq T_{C}
Description: Convergence criteria met. Solution accepted.

Status: Stalled
Criteria: \frac{ E^{m-1} - E^m }{E^1 - E^m} \leq T_S
Description: Convergence rate has decreased to an insignificant level without satisfying the convergence criteria. The current solution is accepted, and the iteration loop is exited.

Status: Max Iterations
Criteria: m = N_{max}
Description: Maximum number of iterations reached without satisfying the convergence criteria. The current solution is accepted, and the iteration loop is exited.

Status: Divergent
Criteria: E^m > E^1
Description: Iterative solution is divergent. Either the normalized residuals are getting larger, or a Not a Number (NaN) has been detected.
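
The stopping logic in Table 1 can be summarized with the small sketch below; the argument names mirror the symbols above, and the helper itself is illustrative rather than HEC-RAS source code.

```python
# Minimal sketch of the stopping-criteria logic in Table 1.
import math

def solver_status(E_hist, N_min, N_max, T_C, T_S):
    # E_hist holds the error estimates E^1, ..., E^m for the current iteration m.
    m = len(E_hist)
    E1, Em = E_hist[0], E_hist[-1]
    if math.isnan(Em) or (m > 1 and Em > E1):
        return "Divergent"        # residuals growing or NaN detected
    if Em <= T_C:
        return "Converged"        # convergence tolerance satisfied
    if m >= N_max:
        return "Max Iterations"   # iteration limit reached
    if m > max(N_min, 1) and E1 > Em and (E_hist[-2] - Em) / (E1 - Em) <= T_S:
        return "Stalled"          # convergence rate insignificant
    return "Iterating"
```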