Low-rank approximation

Lua error in package.lua at line 80: module 'strict' not found. In mathematics, low-rank approximation is a minimization problem, in which the cost function measures the fit between a given matrix (the data) and an approximating matrix (the optimization variable), subject to a constraint that the approximating matrix has reduced rank. The problem is used for mathematical modeling and data compression. The rank constraint is related to a constraint on the complexity of a model that fits the data. In applications, often there are other constraints on the approximating matrix apart from the rank constraint, e.g., non-negativity and Hankel structure.

Low-rank approximation is closely related to:

1 Definition
2 Applications
3 Basic low-rank approximation problem
4 Proof of Eckart–Young–Mirsky theorem (for spectral norm)
5 Proof of Eckart–Young–Mirsky theorem (for Frobenius norm)
6 Weighted low-rank approximation problems
7 Image and kernel representations of the rank constraints
8 Alternating projections algorithm
9 Variable projections algorithm
10 A Variant: convex-restricted low rank approximation
11 See also
12 References
13 External links

Definition

Given

structure specification $\mathcal{S} : \mathbb{R}^{n_p} \to \mathbb{R}^{m\times n}$ ,
vector of structure parameters $p\in\mathbb{R}^{n_p}$ ,
norm $\| \cdot \|$ , and
desired rank $r$ ,

\text{minimize} \quad \text{over } \widehat p \quad \|p - \widehat p\| \quad\text{subject to}\quad \operatorname{rank}\big(\mathcal{S}(\widehat p)\big) \leq r.

Applications

Linear system identification, in which case the approximating matrix is Hankel structured.^[1]^[2]
Machine learning, in which case the approximating matrix is nonlinearly structured.^[3]
Recommender system, in which case the data matrix has missing values and the approximation is categorical.
Distance matrix completion, in which case there is a positive definiteness constraint.
Natural language processing, in which case the approximation is nonnegative.
Computer algebra, in which case the approximation is Sylvester structured.

Basic low-rank approximation problem

The unstructured problem with fit measured by the Frobenius norm, i.e.,

\text{minimize} \quad \text{over } \widehat D \quad \|D - \widehat D\|_{\text{F}} \quad\text{subject to}\quad \operatorname{rank}\big(\widehat D\big) \leq r

has analytic solution in terms of the singular value decomposition of the data matrix. The result is referred to as the matrix approximation lemma or Eckart–Young–Mirsky theorem.^[4] Let

D = U\Sigma V^{\top} \in \mathbb{R}^{m\times n}, \quad m \leq n

be the singular value decomposition of $D$ and partition $U$ , $\Sigma=:\operatorname{diag}(\sigma_1,\ldots,\sigma_m)$ , and $V$ as follows:

U =: \begin{bmatrix} U_1 & U_2\end{bmatrix}, \quad \Sigma =: \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix}, \quad\text{and}\quad V =: \begin{bmatrix} V_1 & V_2 \end{bmatrix},

where $\Sigma_1$ is $r\times r$ , $U_1$ is $m\times r$ , and $V_1$ is $n\times r$ . Then the rank- $r$ matrix, obtained from the truncated singular value decomposition

\widehat D^* = U_1 \Sigma_1 V_1^{\top},

is such that

\|D-\widehat D^*\|_{\text{F}} = \min_{\operatorname{rank}(\widehat D) \leq r} \|D-\widehat D\|_{\text{F}} = \sqrt{\sigma^2_{r+1} + \cdots + \sigma^2_m}.

The minimizer $\widehat D^*$ is unique if and only if $\sigma_{r+1}\neq\sigma_{r}$ .

Proof of Eckart–Young–Mirsky theorem (for spectral norm)

A = U_{n}\Sigma_{n} V^{\top}_{n}

where $U_n\quad$ and $\quad V^{\top}_{n}$ are orthogonal matrices, and $\Sigma_n$ is a diagonal matrix with entries $(\sigma_{1}, \sigma_{2}, \cdots ,\sigma_{n})$

s.t $(\sigma_{n} \leq \sigma_{n-1} \leq \cdots \leq \sigma_{1})$ .

We claim that the best approximation for $A$ be given by

A^k = \Sigma^k_{i=1} u_i \sigma_i v_i^t

To Prove: $A^{k}\quad$ is indeed the Best approximation, i.e., $\|A - A^k\|_2 \quad \text{is minimum.}$

Proof by Contradiction:

Let us suppose $\quad \exists B \quad \text{s.t.} \quad \|A - B\|^2_2 < \|A - A^k\|^2_2 \quad = \quad \sigma^2_{k+1} ,$ where $\operatorname{rank}(B) \leq k . \quad \text{(Assuming in Low Rank Approximation, we are approximating via a matrix whose rank} \leq k .)$

Then $\operatorname{dim}(\operatorname{null}(B)) + \operatorname{rank}(B) = n \rightarrow \operatorname{dim(\operatorname{null}(B))} \geq n-k .$

If $w\in\operatorname{null}(B)$ , then $\|(A - B)w\|_2 = \|Aw\|_2 < \sigma_{k+1} .$

We know that $\exists(k+1)\quad$ dimension space $(v_1,v_2, \cdots, v_n)\quad$
s.t. $V \in \operatorname{span}(v_1,v_2, \cdots, v_n) \quad\text{and} \quad \|AV\|_2 \geq \sigma_{k+1} .$

Since $n-k+k+1 > n ,\quad$ hence a contradiction. So we get that $A^k$ is the best approximation.

Proof of Eckart–Young–Mirsky theorem (for Frobenius norm)

For a given (real or complex valued) matrix $A$ , denote its SVD by

A = U \Sigma V^{\top} ,

where $U$ and $V$ are unitary matrices, and $\Sigma$ is a (often rectangular) diagonal matrix with entries $(\sigma_{1}, \sigma_{2}, \cdots ,\sigma_{n})$

s.t $(\sigma_{n} \leq \sigma_{n-1} \leq \cdots \leq \sigma_{1})$ . The ith column of $U$ is $u_i$ and the ith row of $V^{\top}$ is $v_i^{\top}$ .

Claim: the rank k approximation for $A$ that minimizes the Frobenius norm $\|A - A^k\|_F$ is given by

A^k = \Sigma^k_{i=1} u_i \sigma_i v_i^{\top} .

Proof:

For any matrix $B$ the same size as $A$ , define the (generally non-diagonal) matrix $D = U^{\top} B V$ , so $B = U D V^{\top}$ . Then

\|A - B\|_F^2 = \| U \Sigma V' - U D V' \|_F^2 = \| \Sigma - D \|_F^2 = \sum_i |\sigma_i - D_{ii}|^2 + \sum_{i \neq j} |D_{ij}|^2 \geq \sum_i |\sigma_i - D_{ii}|^2 \geq \sum_{i > k} |\sigma_i - D_{ii}|^2 .

Thus clearly the rank k choice of $D$ that minimizes $\|A - B\|_F^2$ is when $D$ is diagonal with nonzero diagonal elements $(\sigma_{1}, \sigma_{2}, \cdots ,\sigma_{k})$ . The corresponding $B$ matches $A^k$ above.

Weighted low-rank approximation problems

The Frobenius norm weights uniformly all elements of the approximation error $D - \widehat D$ . Prior knowledge about distribution of the errors can be taken into account by considering the weighted low-rank approximation problem

\text{minimize} \quad \text{over } \widehat D \quad \operatorname{vec}^{\top}(D - \widehat D) W \operatorname{vec}(D - \widehat D) \quad\text{subject to}\quad \operatorname{rank}(\widehat D) \leq r,

where $vec(A)$ vectorizes the matrix $A$ column wise and $W$ is a given positive (semi)definite weight matrix.

The general weighted low-rank approximation problem does not admit an analytic solution in terms of the singular value decomposition and is solved by local optimization methods, which provide no guarantee that a globally optimal solution is found.

Image and kernel representations of the rank constraints

Using the equivalences

\operatorname{rank}(\widehat D) \leq r \quad\iff\quad \text{there are } P\in\R^{m\times r} \text{ and } L\in\R^{r\times n} \text{ such that } \widehat D = PL

and

\operatorname{rank}(\widehat D) \leq r \quad\iff\quad \text{there is full row rank } R\in\R^{m - r\times m} \text{ such that } R \widehat D = 0

the weighted low-rank approximation problem becomes equivalent to the parameter optimization problems

\text{minimize} \quad \text{over } \widehat D, P \text{ and } L \quad \operatorname{vec}^{\top}(D - \widehat D) W \operatorname{vec}(D - \widehat D) \quad\text{subject to}\quad \widehat D = PL

and

\text{minimize} \quad \text{over } \widehat D \text{ and } R \quad \operatorname{vec}^{\top}(D - \widehat D) W \operatorname{vec}(D - \widehat D) \quad\text{subject to}\quad R \widehat D = 0 \quad\text{and}\quad RR^{\top} = I_r,

where $I_r$ is the identity matrix of size $r$ .

Alternating projections algorithm

The image representation of the rank constraint suggests a parameter optimization methods, in which the cost function is minimized alternatively over one of the variables ( $P$ or $L$ ) with the other one fixed. Although simultaneous minimization over both $P$ and $L$ is a difficult biconvex optimization problem, minimization over one of the variables alone is a linear least squares problem and can be solved globally and efficiently.

The resulting optimization algorithm (called alternating projections) is globally convergent with a linear convergence rate to a locally optimal solution of the weighted low-rank approximation problem. Starting value for the $P$ (or $L$ ) parameter should be given. The iteration is stopped when a user defined convergence condition is satisfied.

Matlab implementation of the alternating projections algorithm for weighted low-rank approximation:

function [dh, f] = wlra_ap(d, w, p, tol, maxiter)
[m, n] = size(d); r = size(p, 2); f = inf;
for i = 2:maxiter
    % minimization over L
    bp = kron(eye(n), p);
    vl = (bp' * w * bp) \ bp' * w * d(:);
    l  = reshape(vl, r, n);
    % minimization over P
    bl = kron(l', eye(m));
    vp = (bl' * w * bl) \ bl' * w * d(:);
    p  = reshape(vp, m, r);
    % check exit condition
    dh = p * l; dd = d - dh;
    f(i) = dd(:)' * w * dd(:);
    if abs(f(i - 1) - f(i)) < tol, break, end
end

Variable projections algorithm

The alternating projections algorithm exploits the fact that the low rank approximation problem, parameterized in the image form, is bilinear in the variables $P$ or $L$ . The bilinear nature of the problem is effectively used in an alternative approach, called variable projections.^[5]

Consider again the weighted low rank approximation problem, parameterized in the image form. Minimization with respect to the $L$ variable (a linear least squares problem) leads to the closed form expression of the approximation error as a function of $P$

f(P) = \sqrt{\operatorname{vec}^{\top}(D)\Big( W - W (I_n \otimes P) \big( (I_n \otimes P)^{\top} W (I_n \otimes P) \big)^{-1} (I_n \otimes P)^{\top} W \Big) \operatorname{vec}(D)}.

The original problem is therefore equivalent to the nonlinear least squares problem of minimizing $f(P)$ with respect to $P$ . For this purpose standard optimization methods, e.g. the Levenberg-Marquardt algorithm can be used.

Matlab implementation of the variable projections algorithm for weighted low-rank approximation:

function [dh, f] = wlra_varpro(d, w, p, tol, maxiter)
prob = optimset(); prob.solver = 'lsqnonlin';
prob.options = optimset('MaxIter', maxiter, 'TolFun', tol); 
prob.x0 = p; prob.objective = @(p) cost_fun(p, d, w);
[p, f ] = lsqnonlin(prob); 
[f, vl] = cost_fun(p, d, w); 
dh = p * reshape(vl, size(p, 2), size(d, 2));

function [f, vl] = cost_fun(p, d, w)
bp = kron(eye(size(d, 2)), p);
vl = (bp' * w * bp) \ bp' * w * d(:);
f = d(:)' * w * (d(:) - bp * vl);

The variable projections approach can be applied also to low rank approximation problems parameterized in the kernel form. The method is effective when the number of eliminated variables is much larger than the number of optimization variables left at the stage of the nonlinear least squares minimization. Such problems occur in system identification, parameterized in the kernel form, where the eliminated variables are the approximating trajectory and the remaining variables are the model parameters. In the context of linear time-invariant systems, the elimination step is equivalent to Kalman smoothing.

A Variant: convex-restricted low rank approximation

Usually, we want our new solution not only to be of low rank, but also satisfy other convex constraints due to application requirements. Our interested problem would be as follows,

$\text{minimize} \quad \text{over } \widehat p \quad \|p - \widehat p\| \quad\text{subject to}\quad \operatorname{rank}\big(\mathcal{S}(\widehat p)\big) \leq r \text{ and } g(\widehat p) \leq 0$

This problem can find tons of real applications, including to recover a good solution from a inexact (semidefinite programming) relaxation. If additional constraint $g(\widehat p) \leq 0$ is linear, like we require all elements to be nonnegative, the problem is called structured low rank approximation.^[6] And the more general form is named as convex-restricted low rank approximation.^[7]

This problem is helpful in solving many problems. However, it is challenging due to the combination of the convex and nonconvex (low-rank) constraints. Different techniques were developed based on different realizations of $g(\widehat p) \leq 0$ . However, the Alternating Direction Method of Multipliers (ADMM) can be applied to solve the nonconvex problem with convex objective function, rank constraints and other convex constraints,^[8] and is thus suitable to solve our above problem. Moreover, unlike the general nonconvex problems, ADMM will guarantee to converge a feasible solution as long as its dual variable converges in the iterations

References

↑ I. Markovsky, Structured low-rank approximation and its applications, Automatica, Volume 44, Issue 4, April 2008, Pages 891–909. doi:10.1016/j.automatica.2007.09.011
↑ I. Markovsky, J. C. Willems, S. Van Huffel, B. De Moor, and R. Pintelon, Application of structured total least squares for system identification and model reduction. IEEE Transactions on Automatic Control, Volume 50, Number 10, 2005, pages 1490–1500.
↑ I. Markovsky, Low-Rank Approximation: Algorithms, Implementation, Applications, Springer, 2012, ISBN 978-1-4471-2226-5
↑ C. Eckart, G. Young, The approximation of one matrix by another of lower rank. Psychometrika, Volume 1, 1936, Pages 211–8. doi:10.1007/BF02288367
↑ G. Golub and V. Pereyra, Separable nonlinear least squares: the variable projection method and its applications, Institute of Physics, Inverse Problems, Volume 19, 2003, Pages 1-26.
↑ Lua error in package.lua at line 80: module 'strict' not found.
↑ Lua error in package.lua at line 80: module 'strict' not found.
↑ Lua error in package.lua at line 80: module 'strict' not found.

M. T. Chu, R. E. Funderlic, R. J. Plemmons, Structured low-rank approximation, Linear Algebra and its Applications, Volume 366, 1 June 2003, Pages 157–172 doi:10.1016/S0024-3795(02)00505-0

External links

C++ package for structured-low rank approximation

[sysid-aut-1] I. Markovsky, Structured low-rank approximation and its applications, Automatica, Volume 44, Issue 4, April 2008, Pages 891–909. doi:10.1016/j.automatica.2007.09.011

[sysid-ac-2] I. Markovsky, J. C. Willems, S. Van Huffel, B. De Moor, and R. Pintelon, Application of structured total least squares for system identification and model reduction. IEEE Transactions on Automatic Control, Volume 50, Number 10, 2005, pages 1490–1500.

[book-springer-3] I. Markovsky, Low-Rank Approximation: Algorithms, Implementation, Applications, Springer, 2012, ISBN 978-1-4471-2226-5

[EYM-thm-4] C. Eckart, G. Young, The approximation of one matrix by another of lower rank. Psychometrika, Volume 1, 1936, Pages 211–8. doi:10.1007/BF02288367

[5] G. Golub and V. Pereyra, Separable nonlinear least squares: the variable projection method and its applications, Institute of Physics, Inverse Problems, Volume 19, 2003, Pages 1-26.

[6] Lua error in package.lua at line 80: module 'strict' not found.

[7] Lua error in package.lua at line 80: module 'strict' not found.

[8] Lua error in package.lua at line 80: module 'strict' not found.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

Low-rank approximation

Contents

Definition

Applications

Basic low-rank approximation problem

Proof of Eckart–Young–Mirsky theorem (for spectral norm)

Proof of Eckart–Young–Mirsky theorem (for Frobenius norm)

Weighted low-rank approximation problems

Image and kernel representations of the rank constraints

Alternating projections algorithm

Variable projections algorithm

A Variant: convex-restricted low rank approximation

See also

References

External links

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools