Gram–Schmidt process

The first two steps of the Gram–Schmidt process

In mathematics, particularly linear algebra and numerical analysis, the Gram–Schmidt process is a method for orthonormalising a set of vectors in an inner product space, most commonly the Euclidean space Rⁿ. The Gram–Schmidt process takes a finite, linearly independent set S = {v₁, ..., v_k} for k ≤ n and generates an orthogonal set S′ = {u₁, ..., u_k} that spans the same k-dimensional subspace of Rⁿ as S.

The method is named after Jørgen Pedersen Gram and Erhard Schmidt but it appeared earlier in the work of Laplace and Cauchy. In the theory of Lie group decompositions it is generalized by the Iwasawa decomposition.^[1]

Applying the Gram–Schmidt process to the column vectors of a full column rank matrix yields its QR decomposition, in which the matrix is factored into an orthogonal matrix and a triangular matrix.

The Gram–Schmidt process

The modified Gram–Schmidt process being executed on three linearly independent, non-orthogonal vectors of a basis for R³. The modification is explained in the section on numerical stability.

We define the projection operator by

\mathrm{proj}_{\mathbf{u}}\,(\mathbf{v}) = {\langle \mathbf{v}, \mathbf{u}\rangle\over\langle \mathbf{u}, \mathbf{u}\rangle}\mathbf{u} ,

where $\langle \mathbf{v}, \mathbf{u}\rangle$ denotes the inner product of the vectors v and u. This operator projects the vector v orthogonally onto the line spanned by the vector u. If u = 0, we define $\mathrm{proj}_0\,(\mathbf{v}) := 0$, i.e., the projection map $\mathrm{proj}_0$ is the zero map, sending every vector to the zero vector.
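The projection operator can be sketched in a few lines of Python. This is only an illustration under stated assumptions: vectors are plain lists of real numbers, the inner product is the standard dot product, and the helper names `inner` and `proj` are chosen here and do not come from any library.

```python
def inner(v, u):
    """Standard real dot product, standing in for the inner product <v, u>."""
    return sum(a * b for a, b in zip(v, u))


def proj(u, v):
    """Orthogonal projection of v onto the line spanned by u.

    Follows the convention in the text: proj_0 is the zero map.
    """
    uu = inner(u, u)
    if uu == 0:
        return [0.0] * len(v)
    c = inner(v, u) / uu          # scalar coefficient <v, u> / <u, u>
    return [c * a for a in u]


# Project v = (2, 2) onto the line spanned by u = (3, 1).
p = proj([3.0, 1.0], [2.0, 2.0])
```

The residual v − proj_u(v) is orthogonal to u, which is the property the process below relies on.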

The Gram–Schmidt process then works as follows:

\begin{align} \mathbf{u}_1 & = \mathbf{v}_1, & \mathbf{e}_1 & = {\mathbf{u}_1 \over \|\mathbf{u}_1\|} \\ \mathbf{u}_2 & = \mathbf{v}_2-\mathrm{proj}_{\mathbf{u}_1}\,(\mathbf{v}_2), & \mathbf{e}_2 & = {\mathbf{u}_2 \over \|\mathbf{u}_2\|} \\ \mathbf{u}_3 & = \mathbf{v}_3-\mathrm{proj}_{\mathbf{u}_1}\,(\mathbf{v}_3)-\mathrm{proj}_{\mathbf{u}_2}\,(\mathbf{v}_3), & \mathbf{e}_3 & = {\mathbf{u}_3 \over \|\mathbf{u}_3\|} \\ \mathbf{u}_4 & = \mathbf{v}_4-\mathrm{proj}_{\mathbf{u}_1}\,(\mathbf{v}_4)-\mathrm{proj}_{\mathbf{u}_2}\,(\mathbf{v}_4)-\mathrm{proj}_{\mathbf{u}_3}\,(\mathbf{v}_4), & \mathbf{e}_4 & = {\mathbf{u}_4 \over \|\mathbf{u}_4\|} \\ & {}\ \ \vdots & & {}\ \ \vdots \\ \mathbf{u}_k & = \mathbf{v}_k-\sum_{j=1}^{k-1}\mathrm{proj}_{\mathbf{u}_j}\,(\mathbf{v}_k), & \mathbf{e}_k & = {\mathbf{u}_k\over \|\mathbf{u}_k \|}. \end{align}

The sequence u₁, ..., u_k is the required system of orthogonal vectors, and the normalized vectors e₁, ..., e_k form an orthonormal set. The calculation of the sequence u₁, ..., u_k is known as Gram–Schmidt orthogonalization, while the calculation of the sequence e₁, ..., e_k is known as Gram–Schmidt orthonormalization as the vectors are normalized.
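The formulas above translate almost line for line into code. The following is a minimal sketch of the classical process, assuming real vectors stored as Python lists, the standard dot product as inner product, and a linearly independent input (so no norm is zero). The function name `gram_schmidt` is illustrative.

```python
import math


def inner(v, u):
    """Standard real dot product, standing in for <v, u>."""
    return sum(a * b for a, b in zip(v, u))


def gram_schmidt(vectors):
    """Classical Gram-Schmidt: return (us, es), orthogonal and orthonormal lists.

    Assumes the input vectors are linearly independent.
    """
    us, es = [], []
    for v in vectors:
        u = list(v)
        for w in us:
            # Subtract proj_w(v); the coefficient uses the ORIGINAL v,
            # exactly as in the formula u_k = v_k - sum_j proj_{u_j}(v_k).
            c = inner(v, w) / inner(w, w)
            u = [a - c * b for a, b in zip(u, w)]
        norm = math.sqrt(inner(u, u))
        us.append(u)
        es.append([a / norm for a in u])
    return us, es


us, es = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
```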

To check that these formulas yield an orthogonal sequence, first compute ⟨u₁, u₂⟩ by substituting the above formula for u₂: we get zero. Then use this to compute ⟨u₁, u₃⟩, again by substituting the formula for u₃: we get zero. The general proof proceeds by mathematical induction.

Geometrically, this method proceeds as follows: to compute u_i, it projects v_i orthogonally onto the subspace U generated by u₁, ..., u_{i−1}, which is the same as the subspace generated by v₁, ..., v_{i−1}. The vector u_i is then defined to be the difference between v_i and this projection, which is guaranteed to be orthogonal to all of the vectors in the subspace U.

The Gram–Schmidt process also applies to a linearly independent countably infinite sequence {v_i}_i. The result is an orthogonal (or orthonormal) sequence {u_i}_i such that, for every natural number n, the algebraic span of v₁, ..., v_n is the same as that of u₁, ..., u_n.

If the Gram–Schmidt process is applied to a linearly dependent sequence, it outputs the 0 vector on the ith step whenever v_i is a linear combination of v₁, ..., v_{i−1}. If an orthonormal basis is to be produced, then the algorithm should test for zero vectors in the output and discard them, because no multiple of a zero vector can have a length of 1. The number of vectors output by the algorithm will then be the dimension of the space spanned by the original inputs.
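The zero-vector test described above might look as follows. This is a sketch, not a definitive implementation: in floating-point arithmetic an exact comparison with zero is fragile, so the code assumes a relative tolerance `tol` (a parameter introduced here for illustration) below which a residual is treated as zero. The function name `orthonormal_basis` is also illustrative.

```python
import math


def inner(v, u):
    """Standard real dot product, standing in for <v, u>."""
    return sum(a * b for a, b in zip(v, u))


def orthonormal_basis(vectors, tol=1e-12):
    """Orthonormalize possibly dependent vectors, discarding (near-)zero residuals."""
    basis = []
    for v in vectors:
        u = list(v)
        for e in basis:
            c = inner(u, e)                      # e has unit norm, so proj_e(u) = c * e
            u = [a - c * b for a, b in zip(u, e)]
        norm_u = math.sqrt(inner(u, u))
        norm_v = math.sqrt(inner(v, v))
        if norm_u <= tol * norm_v:
            continue                             # v depends on earlier vectors: discard
        basis.append([a / norm_u for a in u])
    return basis


# The second vector is a multiple of the first, so only two vectors survive.
basis = orthonormal_basis([[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
```

The length of the returned list is then the dimension of the span of the inputs, up to the chosen tolerance.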

A variant of the Gram–Schmidt process using transfinite recursion applied to a (possibly uncountably) infinite sequence of vectors $(v_\alpha)_{\alpha<\lambda}$ yields a set of orthonormal vectors $(u_\alpha)_{\alpha<\kappa}$ with $\kappa\leq\lambda$ such that for any $\alpha\leq\lambda$ , the completion of the span of $\lbrace u_\beta : \beta<\min(\alpha,\kappa)\rbrace$ is the same as that of $\lbrace v_\beta:\beta<\alpha\rbrace$ . In particular, when applied to a (algebraic) basis of a Hilbert space (or, more generally, a basis of any dense subspace), it yields a (functional-analytic) orthonormal basis. Note that in the general case often the strict inequality $\kappa<\lambda$ holds, even if the starting set was linearly independent, and the span of $(u_\alpha)_{\alpha<\kappa}$ need not be a subspace of the span of $(v_\alpha)_{\alpha<\lambda}$ (rather, it's a subspace of its completion).

Example

Consider the following set of vectors in R² (with the conventional inner product)

S = \left\lbrace\mathbf{v}_1=\begin{pmatrix} 3 \\ 1\end{pmatrix}, \mathbf{v}_2=\begin{pmatrix}2 \\2\end{pmatrix}\right\rbrace.

Now, perform Gram–Schmidt, to obtain an orthogonal set of vectors:

\mathbf{u}_1=\mathbf{v}_1=\begin{pmatrix}3\\1\end{pmatrix}

\mathbf{u}_2 = \mathbf{v}_2 - \mathrm{proj}_{\mathbf{u}_1} \, (\mathbf{v}_2) = \begin{pmatrix}2\\2\end{pmatrix} - \mathrm{proj}_{\left({3 \atop 1}\right)} \, \begin{pmatrix}2\\2\end{pmatrix} = \begin{pmatrix}2\\2\end{pmatrix} - \frac{4}{5} \begin{pmatrix} 3 \\1 \end{pmatrix} = \begin{pmatrix} -2/5 \\6/5 \end{pmatrix}.

We check that the vectors u₁ and u₂ are indeed orthogonal:

\langle\mathbf{u}_1,\mathbf{u}_2\rangle = \left\langle \begin{pmatrix}3\\1\end{pmatrix}, \begin{pmatrix}-2/5\\6/5\end{pmatrix} \right\rangle = -\frac65 + \frac65 = 0,

noting that if the dot product of two vectors is 0 then they are orthogonal.

For non-zero vectors, we can then normalize the vectors by dividing each by its norm, as shown above:

\mathbf{e}_1 = {1 \over \sqrt {10}}\begin{pmatrix}3\\1\end{pmatrix}

\mathbf{e}_2 = {1 \over \sqrt{40 \over 25}} \begin{pmatrix}-2/5\\6/5\end{pmatrix} = {1\over\sqrt{10}} \begin{pmatrix}-1\\3\end{pmatrix}.
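The arithmetic of this example can be re-checked exactly with rational numbers. The sketch below uses Python's standard `fractions` module; the variable names mirror the symbols in the text, and the squared norms are kept rather than the norms themselves so that every quantity stays rational.

```python
from fractions import Fraction


def inner(a, b):
    """Conventional dot product on R^2, computed exactly on Fractions."""
    return sum(x * y for x, y in zip(a, b))


v1 = [Fraction(3), Fraction(1)]
v2 = [Fraction(2), Fraction(2)]

u1 = v1
coeff = inner(v2, u1) / inner(u1, u1)            # <v2, u1> / <u1, u1> = 8/10
u2 = [x - coeff * y for x, y in zip(v2, u1)]     # v2 - proj_{u1}(v2)

norm_sq_u1 = inner(u1, u1)                       # squared length of u1
norm_sq_u2 = inner(u2, u2)                       # squared length of u2
```

Here `norm_sq_u1` and `norm_sq_u2` are the quantities under the square roots in the expressions for e₁ and e₂.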

Numerical stability

When this process is implemented on a computer, the vectors $\mathbf{u}_k$ are often not quite orthogonal, due to rounding errors. For the Gram–Schmidt process as described above (sometimes referred to as "classical Gram–Schmidt") this loss of orthogonality is particularly bad; therefore, it is said that the (classical) Gram–Schmidt process is numerically unstable.

The Gram–Schmidt process can be stabilized by a small modification; this version is sometimes referred to as modified Gram-Schmidt or MGS. This approach gives the same result as the original formula in exact arithmetic and introduces smaller errors in finite-precision arithmetic. Instead of computing the vector u_k as

\mathbf{u}_k = \mathbf{v}_k - \mathrm{proj}_{\mathbf{u}_1}\,(\mathbf{v}_k) - \mathrm{proj}_{\mathbf{u}_2}\,(\mathbf{v}_k) - \cdots - \mathrm{proj}_{\mathbf{u}_{k-1}}\,(\mathbf{v}_k),

it is computed as

\begin{align} \mathbf{u}_k^{(1)} &= \mathbf{v}_k - \mathrm{proj}_{\mathbf{u}_1}\,(\mathbf{v}_k), \\ \mathbf{u}_k^{(2)} &= \mathbf{u}_k^{(1)} - \mathrm{proj}_{\mathbf{u}_2} \, (\mathbf{u}_k^{(1)}), \\ & \,\,\, \vdots \\ \mathbf{u}_k^{(k-2)} &= \mathbf{u}_k^{(k-3)} - \mathrm{proj}_{\mathbf{u}_{k-2}} \, (\mathbf{u}_k^{(k-3)}), \\ \mathbf{u}_k^{(k-1)} &= \mathbf{u}_k^{(k-2)} - \mathrm{proj}_{\mathbf{u}_{k-1}} \, (\mathbf{u}_k^{(k-2)}). \end{align}

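Each step finds a vector $\mathbf{u}_k^{(i)}$ orthogonal to $\mathbf{u}_i$. Because the projection is applied to the running vector $\mathbf{u}_k^{(i-1)}$ rather than to $\mathbf{v}_k$, the vector $\mathbf{u}_k^{(i)}$ is also orthogonalized against any errors introduced in the computation of $\mathbf{u}_k^{(i-1)}$.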

This method is used in the animation above, where the intermediate vector v'₃ is used in orthogonalizing the blue vector v₃.
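The difference between the two variants can be observed on a small, nearly dependent input. The sketch below is an illustration under assumptions: it uses double-precision floats, projects onto the already normalized vectors (equivalent to the formulas above in exact arithmetic), and picks a test set with a small parameter `eps` whose square is lost when added to 1. The function name `orthonormalize` and the flag `modified` are chosen here for illustration.

```python
import math


def inner(v, u):
    """Standard real dot product, standing in for <v, u>."""
    return sum(a * b for a, b in zip(v, u))


def orthonormalize(vectors, modified):
    """Gram-Schmidt orthonormalization; classical if modified is False."""
    es = []
    for v in vectors:
        u = list(v)
        for e in es:
            # Classical: coefficient taken from the original v.
            # Modified: coefficient taken from the running, partly reduced u.
            c = inner(u if modified else v, e)
            u = [a - c * b for a, b in zip(u, e)]
        norm = math.sqrt(inner(u, u))
        es.append([a / norm for a in u])
    return es


eps = 1e-8  # eps**2 is below double-precision resolution relative to 1
vs = [[1.0, eps, 0.0, 0.0], [1.0, 0.0, eps, 0.0], [1.0, 0.0, 0.0, eps]]

cgs = orthonormalize(vs, modified=False)
mgs = orthonormalize(vs, modified=True)
loss_cgs = abs(inner(cgs[1], cgs[2]))   # orthogonality defect, classical
loss_mgs = abs(inner(mgs[1], mgs[2]))   # orthogonality defect, modified
```

On this input the classical variant should leave the second and third output vectors far from orthogonal, while the modified variant keeps their inner product close to zero.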

Algorithm

The following algorithm implements the stabilized Gram–Schmidt orthonormalization. The vectors v₁, ..., v_k are replaced by orthonormal vectors which span the same subspace.

for i from 1 to k do
    \mathbf{v}_i \leftarrow \frac{\mathbf{v}_i}{\|\mathbf{v}_i\|}    (normalize)
    for j from i+1 to k do
        \mathbf{v}_j \leftarrow \mathbf{v}_j - \mathrm{proj}_{\mathbf{v}_{i}} \, (\mathbf{v}_j)    (remove component in direction v_i)
    next j
next i

The cost of this algorithm is asymptotically 2nk² floating point operations, where n is the dimensionality of the vectors (Golub & Van Loan 1996, §5.2.8).
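A direct Python transcription of the pseudocode above might look like this. It is a sketch under assumptions: the vectors are lists of floats, the inner product is the standard dot product, and the input is linearly independent so that no norm vanishes. The name `mgs_in_place` is illustrative.

```python
import math


def mgs_in_place(vs):
    """Stabilized Gram-Schmidt: overwrite the list vs with orthonormal vectors."""
    k = len(vs)
    for i in range(k):
        # Normalize v_i.
        norm = math.sqrt(sum(a * a for a in vs[i]))
        vs[i] = [a / norm for a in vs[i]]
        # Remove the component in direction v_i from every later vector.
        for j in range(i + 1, k):
            c = sum(a * b for a, b in zip(vs[j], vs[i]))  # <v_j, v_i>, with |v_i| = 1
            vs[j] = [a - c * b for a, b in zip(vs[j], vs[i])]
    return vs


# The vectors from the example section.
result = mgs_in_place([[3.0, 1.0], [2.0, 2.0]])
```

Each inner step costs about 4n floating point operations (one dot product and one update), and there are about k²/2 such steps, which is consistent with the 2nk² estimate quoted above.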

Determinant formula

The result of the Gram–Schmidt process may be expressed in a non-recursive formula using determinants.

\mathbf{e}_j = \frac{1}{\sqrt{D_{j-1} D_j}} \begin{vmatrix} \langle \mathbf{v}_1, \mathbf{v}_1 \rangle & \langle \mathbf{v}_2, \mathbf{v}_1 \rangle & \dots & \langle \mathbf{v}_j, \mathbf{v}_1 \rangle \\ \langle \mathbf{v}_1, \mathbf{v}_2 \rangle & \langle \mathbf{v}_2, \mathbf{v}_2 \rangle & \dots & \langle \mathbf{v}_j, \mathbf{v}_2 \rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle \mathbf{v}_1, \mathbf{v}_{j-1} \rangle & \langle \mathbf{v}_2, \mathbf{v}_{j-1} \rangle & \dots & \langle \mathbf{v}_j, \mathbf{v}_{j-1} \rangle \\ \mathbf{v}_1 & \mathbf{v}_2 & \dots & \mathbf{v}_j \end{vmatrix}

\mathbf{u}_j = \frac{1}{D_{j-1} } \begin{vmatrix} \langle \mathbf{v}_1, \mathbf{v}_1 \rangle & \langle \mathbf{v}_2, \mathbf{v}_1 \rangle & \dots & \langle \mathbf{v}_j, \mathbf{v}_1 \rangle \\ \langle \mathbf{v}_1, \mathbf{v}_2 \rangle & \langle \mathbf{v}_2, \mathbf{v}_2 \rangle & \dots & \langle \mathbf{v}_j, \mathbf{v}_2 \rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle \mathbf{v}_1, \mathbf{v}_{j-1} \rangle & \langle \mathbf{v}_2, \mathbf{v}_{j-1} \rangle & \dots & \langle \mathbf{v}_j, \mathbf{v}_{j-1} \rangle \\ \mathbf{v}_1 & \mathbf{v}_2 & \dots & \mathbf{v}_j \end{vmatrix}

where D₀ = 1 and, for j ≥ 1, D_j is the Gram determinant

D_j = \begin{vmatrix} \langle \mathbf{v}_1, \mathbf{v}_1 \rangle & \langle \mathbf{v}_2, \mathbf{v}_1 \rangle & \dots & \langle \mathbf{v}_j, \mathbf{v}_1 \rangle \\ \langle \mathbf{v}_1, \mathbf{v}_2 \rangle & \langle \mathbf{v}_2, \mathbf{v}_2 \rangle & \dots & \langle \mathbf{v}_j, \mathbf{v}_2 \rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle \mathbf{v}_1, \mathbf{v}_j \rangle & \langle \mathbf{v}_2, \mathbf{v}_j\rangle & \dots & \langle \mathbf{v}_j, \mathbf{v}_j \rangle \end{vmatrix}.

Note that the expression for u_j is a "formal" determinant, i.e. the matrix contains both scalars and vectors; the meaning of this expression is defined to be the result of a cofactor expansion along the row of vectors.
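As a small check of this convention, the case j = 2 can be worked out by hand, using only the symbols defined above and D₁ = ⟨v₁, v₁⟩. Expanding along the bottom row of vectors recovers the recursive formula for u₂:

\mathbf{u}_2 = \frac{1}{D_1} \begin{vmatrix} \langle \mathbf{v}_1, \mathbf{v}_1 \rangle & \langle \mathbf{v}_2, \mathbf{v}_1 \rangle \\ \mathbf{v}_1 & \mathbf{v}_2 \end{vmatrix} = \frac{\langle \mathbf{v}_1, \mathbf{v}_1 \rangle \, \mathbf{v}_2 - \langle \mathbf{v}_2, \mathbf{v}_1 \rangle \, \mathbf{v}_1}{\langle \mathbf{v}_1, \mathbf{v}_1 \rangle} = \mathbf{v}_2 - \mathrm{proj}_{\mathbf{v}_1}\,(\mathbf{v}_2).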

The determinant formula for the Gram–Schmidt process is computationally (exponentially) slower than the recursive algorithms described above; it is mainly of theoretical interest.

Alternatives

Other orthogonalization algorithms use Householder transformations or Givens rotations. The algorithms using Householder transformations are more stable than the stabilized Gram–Schmidt process. On the other hand, the Gram–Schmidt process produces the $j$ th orthogonalized vector after the $j$ th iteration, while orthogonalization using Householder reflections produces all the vectors only at the end. This makes only the Gram–Schmidt process applicable for iterative methods like the Arnoldi iteration.

Yet another alternative is motivated by the use of Cholesky decomposition for inverting the matrix of the normal equations in linear least squares. Let $\mathbf{V}$ be a full column rank matrix whose columns need to be orthogonalized. The matrix $\mathbf{V}^{*} \mathbf{V}$ is Hermitian and positive definite, so it can be written as $\mathbf{V}^{*} \mathbf{V} = \mathbf{L} \mathbf{L}^{*}$ using the Cholesky decomposition. The lower triangular matrix $\mathbf{L}$ has strictly positive diagonal entries and is therefore invertible. The columns of the matrix $\mathbf{U}= \mathbf{V}(\mathbf{L}^{-1})^{*}$ are then orthonormal and span the same subspace as the columns of the original matrix $\mathbf{V}$. The explicit use of the product $\mathbf{V}^{*} \mathbf{V}$ makes the algorithm unstable, especially if the product's condition number is large. Nevertheless, this algorithm is used in practice and implemented in some software packages because of its high efficiency and simplicity.
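The Cholesky-based construction can be sketched in pure Python for the real case, where the conjugate transpose reduces to the ordinary transpose. The code is an illustration under assumptions: the matrix V is given as a list of its columns, the input has full column rank, and the helper names `cholesky` and `cholesky_orthonormalize` are chosen here. Instead of forming the inverse of L explicitly, each row u of U is obtained by solving L u = x, where x is the corresponding row of V; this is equivalent to U = V (L⁻¹)ᵀ.

```python
import math


def cholesky(G):
    """Lower triangular L with G = L L^T, for symmetric positive definite G."""
    k = len(G)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = G[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def cholesky_orthonormalize(columns):
    """Orthonormalize the columns of V via the Cholesky factor of V^T V."""
    k, n = len(columns), len(columns[0])
    # Gram matrix G = V^T V, with entries <v_i, v_j>.
    G = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
         for i in range(k)]
    L = cholesky(G)
    U = [[0.0] * n for _ in range(k)]   # U stored column by column, like the input
    for r in range(n):
        # Forward substitution: solve L u = (row r of V).
        for i in range(k):
            s = columns[i][r] - sum(L[i][m] * U[m][r] for m in range(i))
            U[i][r] = s / L[i][i]
    return U


# The vectors from the example section, as columns of V.
U = cholesky_orthonormalize([[3.0, 1.0], [2.0, 2.0]])
```

On a well-conditioned input such as this one, the result should agree with the Gram–Schmidt output up to rounding; the instability mentioned above shows up when the columns of V are nearly dependent.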

In quantum mechanics there are several orthogonalization schemes with characteristics better suited for applications than the Gram–Schmidt one. The most important among them are the symmetric and the canonical orthonormalization (see Solivérez & Gagliano).


External links

Harvey Mudd College Math Tutorial on the Gram-Schmidt algorithm
Earliest known uses of some of the words of mathematics: G The entry "Gram-Schmidt orthogonalization" has some information and references on the origins of the method.
Demos: Gram Schmidt process in plane and Gram Schmidt process in space
Gram-Schmidt orthogonalization applet
NAG Gram–Schmidt orthogonalization of n vectors of order m routine
Proof: Raymond Puzio, Keenan Kidwell. "proof of Gram-Schmidt orthogonalization algorithm" (version 8). PlanetMath.org.
