Stephen Tu, Ross Boczar, Max Simchowitz, Mahdi Soltanolkotabi*, and Benjamin Recht.
UC Berkeley, *USC
$\newcommand{\T}{\mathsf{T}}$ $\newcommand{\R}{\mathbb{R}}$ $\newcommand{\E}{\mathbb{E}}$ $\newcommand{\X}{\mathcal{X}}$ $\newcommand{\H}{\mathcal{H}}$ $\newcommand{\twonorm}[1]{ \left\| #1 \right\|_{\ell_2} }$ $\newcommand{\norm}[1]{ \left\| #1 \right\| }$ $\newcommand{\ip}[2]{ \left\langle #1, #2 \right\rangle}$ $\newcommand{\abs}[1]{ \left| #1 \right| }$ $\newcommand{\ind}{ \mathbf{1} }$ $\newcommand{\A}{\mathcal{A}}$ $\newcommand{\B}{\mathcal{B}}$ $\DeclareMathOperator*{\argmin}{arg\,min}$ $\newcommand{\dist}{\mathrm{dist}}$ $\newcommand{\Tr}{\mathbf{Tr}}$
Low-rank matrix recovery: Given linear operator $\A(\cdot)$ and measurements $b = \A(M) \in \R^{m}$, would like to recover $M \in \R^{n_1 \times n_2}$.
$$\min_{X \in \R^{n_1 \times n_2}} \mathrm{rank}(X) \;\;\mathrm{s.t.}\;\; \A(X) = b \:.$$
Framework is general: matrix completion, phase retrieval, metric embedding, etc.
Low-rank matrix recovery is NP-hard in general.
Procrustes flow: A procedure to estimate rank-$r$ $M$ to arbitrary accuracy from $m=\widetilde{\Omega}(nr)$ samples for special types of $\A(\cdot)$'s.
(Also known as gradient descent).
Restricted isometry: There exists $\delta_r$ such that, for all matrices $M \in \R^{n \times n}$ with rank at most $r$,
$$ (1-\delta_r) \norm{M}^2_F \leq \twonorm{\A(M)}^2 \leq (1+\delta_r) \norm{M}^2_F \:. $$
What operators are RIP (for constant $\delta$)?
Gaussian ensemble with $m = \Omega(nr)$.
Subsampled 2D-Fourier basis [1] with $m = \Omega(nr \log^4{n})$.
Subsampled Pauli basis [2] with $m = \Omega(nr \log^6{n})$.
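As a rough illustration (not part of the poster), the sketch below builds a Gaussian measurement ensemble and empirically checks the RIP bounds on random rank-$r$ matrices; all variable names and sizes are illustrative.

```python
# Minimal sketch, assuming a Gaussian ensemble: A(M)_i = <A_i, M> with A_i
# having i.i.d. N(0, 1/m) entries, then an empirical check that
# ||A(M)||^2 stays close to ||M||_F^2 on random rank-r matrices.
import numpy as np

rng = np.random.default_rng(0)
n, r = 40, 2
m = 10 * n * r                       # m = Omega(nr) measurements

A_mats = rng.normal(scale=1.0 / np.sqrt(m), size=(m, n, n))

def A(M):
    """Linear operator A: R^{n x n} -> R^m, with (A(M))_i = <A_i, M>."""
    return np.einsum('ijk,jk->i', A_mats, M)

ratios = []
for _ in range(20):
    G = rng.normal(size=(n, r))
    M = G @ G.T                      # random rank-r test matrix
    ratios.append(np.linalg.norm(A(M)) ** 2 / np.linalg.norm(M, 'fro') ** 2)

print(min(ratios), max(ratios))      # both should be close to 1 (small delta_r)
```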
Algorithm | Workhorse | Convergence | Sample complexity |
---|---|---|---|
Nuclear norm minimization | SDP | - | $nr$ |
Iterative hard thresholding | Gradient + rank-$r$ SVD | $\log(1/\varepsilon)$ | $nr$ |
Alternating minimization | Least squares | $\log(1/\varepsilon)$ | $nr^2\kappa^4$ |
Low-rank recovery problem: $$ \min_{X \in \R^{n \times n}} \frac{1}{2} \twonorm{\A(X) - b}^2 \;\;\mathrm{s.t.}\;\; \mathrm{rank}(X) \leq r \:. $$
Procrustes flow: run gradient descent on the following factorized objective:
$$ \begin{align*} \min_{U \in \R^{n \times r}} f(U) &:= \frac{1}{4} \twonorm{ \A(UU^\T) - b }^2 \:. \end{align*} $$
An instance of the popular Burer-Monteiro heuristic.
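As a concrete illustration (continuing the assumed Gaussian-ensemble sketch above; not the authors' code), the factorized objective and its gradient look like:

```python
# Sketch of the factorized objective f(U) = (1/4)||A(U U^T) - b||^2 and its
# gradient, reusing A and A_mats from the sketch above.
def A_adj(y):
    """Adjoint A*(y) = sum_i y_i A_i."""
    return np.einsum('i,ijk->jk', y, A_mats)

def f(U, b):
    return 0.25 * np.linalg.norm(A(U @ U.T) - b) ** 2

def grad_f(U, b):
    # The poster writes nabla f(U) = A*(A(U U^T) - b) U; the residual matrix is
    # symmetrized here because the Gaussian A_i above are not symmetric.
    G = A_adj(A(U @ U.T) - b)
    return 0.5 * (G + G.T) @ U
```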
The surface of $f(U)$ in a $\R^{2 \times 1}$ case.
A complication: $f(U) = f(U {\color{red}R})$ for any orthogonal matrix $\color{red}R$.
Define distance using the orthogonal Procrustes distance:
$$ \dist(U, X) := \min_{\substack{{\color{red}R} \in \R^{r \times r} \\ {\color{red}RR^\T} = {\color{red}R^\T R} = I}} \norm{U - X{\color{red}R}}_F \:. $$
$\dist(U, X) \longrightarrow 0$ implies $\norm{UU^\T - XX^\T}_F \longrightarrow 0$.
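A short illustrative sketch of computing this distance: the minimizing rotation comes from the SVD of $X^\T U$ (the classical orthogonal Procrustes solution).

```python
# Sketch: orthogonal Procrustes distance dist(U, X) = min_R ||U - X R||_F.
# The optimal rotation is R = W V^T, where X^T U = W S V^T is an SVD.
def procrustes_dist(U, X):
    W, _, Vt = np.linalg.svd(X.T @ U)
    R = W @ Vt
    return np.linalg.norm(U - X @ R, 'fro')
```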
Proof idea: Show strong convexity along trajectories given by optimal Procrustes rotation.
Main theorem: ($C_1, C_2$ below are absolute constants)
Let $U_0$ satisfy
$$ \dist(U_0, X) \leq \sigma_r^{1/2}/4 \:. $$(Radius)
Iterating
$$ U_{t+1} \gets U_t - {\color{blue} \frac{C_1}{\sigma_1(U_0)^2}} \underbrace{ \color{green} \A^*( \A(U_t U_t^\T) - b ) U_t}_{\nabla f(U_t)} \:, $$(Update)
achieves
$$ \definecolor{rate}{rgb}{0.59, 0.29, 0.0} \dist^2(U_\tau, X) \leq \left( 1 - \frac{C_2}{\color{rate} \kappa} \right)^{\tau} \dist^2(U_0, X) \:. $$
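Putting (Update) into code gives a short loop, reusing `grad_f` from the sketch above. The constant $C_1$ is not specified on the poster, so the value below is purely illustrative.

```python
# Sketch of the Procrustes flow iterations (Update).
def procrustes_flow(U0, b, num_iters=500, C1=0.5):   # C1 is an illustrative choice
    step = C1 / np.linalg.norm(U0, 2) ** 2           # C1 / sigma_1(U0)^2
    U = U0.copy()
    for _ in range(num_iters):
        U = U - step * grad_f(U, b)
    return U
```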
To obtain a $U_0$ satisfying (Radius), use $\log(r^{1/2} \kappa)$ iterations of hard thresholding (a sketch follows below).
An approximate SVD suffices (see Hegde, Indyk, and Schmidt (2016)).
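One way such an initialization could be sketched (the unit step size and projection details below are assumptions, following standard iterative hard thresholding; an approximate SVD would replace `eigh` in practice):

```python
# Sketch of initialization by hard thresholding:
# X_{t+1} = P_r( X_t - A*(A(X_t) - b) ), then factor the result as U0 U0^T.
def project_rank_r(X, r):
    """Project a symmetric matrix onto rank-r PSD matrices; also return a factor U."""
    w, V = np.linalg.eigh(X)
    idx = np.argsort(w)[::-1][:r]
    w_top = np.clip(w[idx], 0.0, None)
    return (V[:, idx] * w_top) @ V[:, idx].T, V[:, idx] * np.sqrt(w_top)

def initialize(b, r, num_iters=10):
    X = np.zeros((n, n))
    U0 = None
    for _ in range(num_iters):
        G = X - A_adj(A(X) - b)
        X, U0 = project_rank_r(0.5 * (G + G.T), r)   # symmetrize before projecting
    return U0
```

The two sketches would be composed as `U_hat = procrustes_flow(initialize(b, r), b)`.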
You don't even need to initialize carefully! All local minima are global minima, and all saddle points have negative curvature.
Working with the factorized $U$ requires converting from $\dist(U, X)$ to $\norm{UU^\T - XX^\T}_F$ and vice versa.
(1) $\dist(U, X)$ small $\Longrightarrow$ $\norm{UU^\T - XX^\T}_F$ small is easy.
(2) $\norm{UU^\T - XX^\T}_F$ small $\Longrightarrow$ $\dist(U, X)$ small requires more work.
(Standard matrix perturbation arguments are pessimistic by a factor of the condition number.)
We establish that
$$ \dist^2(U, X) \lesssim {\color{green} \frac{1}{\sigma_r(X)^2}} \norm{UU^\T - XX^\T}^2_F \:. $$
(This inequality has come up in many extensions of our work, e.g., Bhojanapalli et al.)
For $M_\ell = U_\ell \Sigma_\ell V_\ell^\T$, $\ell=1,2$,
$$ \dist^2\left(\begin{bmatrix} U_2\Sigma_2^{1/2} \\ V_2\Sigma_2^{1/2} \end{bmatrix}, \begin{bmatrix} U_1\Sigma_1^{1/2} \\ V_1\Sigma_1^{1/2} \end{bmatrix}\right) \lesssim {\color{green} \frac{1}{\sigma_r(M_1)}} \norm{M_2 - M_1}^2_F \:. $$
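As a quick numerical illustration of the symmetric-case inequality (reusing `procrustes_dist` and the dimensions from the sketches above; the hidden constant is absorbed into the printed ratio):

```python
# Check dist^2(U, X) <~ ||U U^T - X X^T||_F^2 / sigma_r(X)^2 on random factors.
U = rng.normal(size=(n, r))
X = rng.normal(size=(n, r))
lhs = procrustes_dist(U, X) ** 2
sigma_r = np.linalg.svd(X, compute_uv=False)[-1]
rhs = np.linalg.norm(U @ U.T - X @ X.T, 'fro') ** 2 / sigma_r ** 2
print(lhs, rhs)   # lhs should be at most a small constant times rhs
```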
Gradient descent for RIP matrix sensing converges linearly to the unknown matrix $M$.
Sample complexity of $\Omega(nr)$ for the Gaussian ensemble, and $\widetilde{\Omega}(nr)$ for other structured ensembles.
Poster session Wed 10am-1pm.