Chapter 16 Numerical Linear Algebra

In your algebra courses you would write \(Ax=b\) and solve \(x=A^{-1}b\). This is useful to understand the algebraic properties of \(x\), but a computer would never recover \(x\) that way. Even the computation of the sample variance, \(S^2(x)=(n-1)^{-1}\sum (x_i-\bar x )^2\) is not solved that way in a computer, because of numerical and speed considerations.

In this chapter, we discuss several ways a computer solves systems of linear equations, with their application to statistics, namely, to OLS problems.

16.1 LU Factorization

Definition 16.1 (LU Factorization) For some matrix \(A\), the LU factorization is defined as \[\begin{align} A = L U \end{align}\] where \(L\) is unit lower triangular and \(U\) is upper triangular.

The LU factorization is essentially the matrix notation for the Gaussian elimination you did in your introductory algebra courses.

For a square \(n \times n\) matrix, the LU factorization requires \(n^3/3\) operations, and stores \(n^2+n\) elements in memory.

16.2 Cholesky Factorization

Definition 16.2 (Non Negative Matrix) A matrix \(A\) is said to be non-negative if \(x'Ax \geq 0\) for all \(x\).

Seeing the matrix \(A\) as a function, non-negative matrices can be thought of as functions that generalize the squaring operation.

Definition 16.3 (Cholesky Factorization) For some non-negative matrix \(A\), the Cholesky factorization is defined as \[\begin{align} A = T' T \end{align}\] where \(T\) is upper triangular with positive diagonal elements.

For obvious reasons, the Cholesky factorization is known as the square root of a matrix.

Because Cholesky is less general than LU, it is also more efficient. It can be computed in \(n^3/6\) operations, and requires storing \(n(n+1)/2\) elements.

16.3 QR Factorization

Definition 16.4 (QR Factorization) For some matrix \(A\), the QR factorization is defined as \[\begin{align} A = Q R \end{align}\] where \(Q\) is orthogonal and \(R\) is upper triangular.

The QR factorization is very useful to solve the OLS problem as we will see in 16.6. The QR factorization takes \(2n^3/3\) operations to compute. Three major methods for computing the QR factorization exist. These rely on Householder transformations, Givens transformations, and a (modified) Gram-Schmidt procedure (Gentle 2012).

16.4 Singular Value Factorization

Definition 16.5 (SVD) For an arbitrary \(n\times m\) matrix \(A\), the singular valued decomposition (SVD), is defined as \[\begin{align} A = U \Sigma V' \end{align}\] where \(U\) is an orthonormal \(n \times n\) matrix, \(V\) is an \(m \times m\) orthonormal matrix, and \(\Sigma\) is diagonal.

The SVD factorization is very useful for algebraic analysis, but less so for computations. This is because it is (typically) solved via the QR factorization.

16.5 Iterative Methods

The various matrix factorizations above may be used to solve a system of linear equations, and in particular, the OLS problem. There is, however, a very different approach to solving systems of linear equations. This approach relies on the fact that solutions of linear systems of equations, can be cast as optimization problems: simply find \(x\) by minimizing \(\Vert Ax-b \Vert\).

Some methods for solving (convex) optimization problems are reviewed in the Convex Optimization Chapter @(convex). For our purposes we will just mention that historically (this means in the lm function, and in the LAPACK numerical libraries) the factorization approach was preferred, and now optimization approaches are preferred. This is because the optimization approach is more numerically stable, and easier to parallelize.

16.6 Solving the OLS Problem

Recalling the OLS problem in Eq.(5.5): we wish to find \(\beta\) such that
\[\begin{align} \hat \beta:= argmin_\beta \{ \Vert y-X\beta \Vert^2_2 \}. \end{align}\] The solution, \(\hat \beta\) that solves this problem has to satisfy \[\begin{align} X'X \beta = X'y. \tag{16.1} \end{align}\]

Eq.(16.1) are known as the normal equations. The normal equations are the link between the OLS problem, and the matrix factorization discussed above.

Using the QR decomposition in the normal equations we have that \[\begin{align*} \hat \beta = R_{(1:p,1:p)}^{-1} y, \end{align*}\]

where \((R_{n\times p})=(R_{(1:p,1:p)},0_{(p+1:n,1:p)})\) is the

16.7 Bibliographic Notes

For an excellent introduction to numerical algorithms in statistics, see Weihs, Mersmann, and Ligges (2013). For an emphasis on numerical linear algebra, see Gentle (2012), and Golub and Van Loan (2012).

16.8 Practice Yourself

References

Gentle, James E. 2012. Numerical Linear Algebra for Applications in Statistics. Springer Science & Business Media.

Weihs, Claus, Olaf Mersmann, and Uwe Ligges. 2013. Foundations of Statistical Algorithms: With References to R Packages. CRC Press.

Golub, Gene H, and Charles F Van Loan. 2012. Matrix Computations. Vol. 3. JHU Press.