This post will focus on the dual certificate argument for matrix recovery problems via nuclear norm minimization. \newcommand{\abs}[1]{| #1 |} \newcommand{\ind}{\mathbf{1}} \newcommand{\norm}[1]{\lVert #1 \rVert} \newcommand{\R}{\mathbb{R}} \newcommand{\ip}[2]{\langle #1, #2 \rangle} \newcommand{\T}{\mathsf{T}} \newcommand{\Proj}{\mathcal{P}} \newcommand{\Tr}{\mathrm{Tr}} \newcommand{\A}{\mathcal{A}}

Setup

Fix an unknown M \in \R^{n_1 \times n_2} with rank r. Suppose that we have knowledge of both a linear operator \A : \R^{n_1 \times n_2} \longrightarrow \R^{m} and its measurements b := \A(M) of the unknown matrix. In what follows, for any matrix Z, let \sigma_1(Z) \geq \sigma_2(Z) \geq ... \geq \sigma_r(Z) > 0 denote the singular values of Z, \norm{Z}_* = \sum_{i=1}^{r} \sigma_i(Z) denote the nuclear norm, and \norm{Z} = \sigma_1(Z) denote the operator (spectral) norm.

When r \ll \min(n_1, n_2), a standard heuristic to recover M from the measurements is to solve the following optimization program \begin{equation} \min_{X \in \R^{n_1 \times n_2}} \;\; \norm{X}_* \text{ s.t. } \A(X) = b \:. \label{eq:opt_prog} \end{equation} A natural question to ask is: What properties of \A and M ensure that M is the unique minimizer of \eqref{eq:opt_prog}?

Many researchers have successfully answered this question for a variety of \A's and M's. In this post, I will outline the structural part of one type of proof strategy to answer such a question, known as coming up with a dual certificate. This should provide some context for reading the proofs.

Subdifferential of the nuclear norm

A starting point is to first characterize the subdifferential of the nuclear norm. This is important because the subdifferential encompasses the set of all descent directions of \norm{M}_*.

Let M = U\Sigma V^* denote the SVD of M, with \Sigma = \mathrm{diag}(\sigma_1(M), ..., \sigma_r(M)). We first argue that \begin{equation} \partial \norm{M}_* = \{ UV^* + W : P_U W = 0, W P_V = 0, \norm{W} \leq 1 \} := G_1 \:. \label{eq:nuc_norm_subdiff} \end{equation} Here, P_U is the orthogonal projector onto the columns of U and similarly for P_V. By the variational characterization of the nuclear norm, \norm{M}_* = \sup_{\norm{Z} \leq 1} \ip{M}{Z}, where \ip{M}{Z} := \Tr(Z^* M). Hence by the standard rules for subgradients of suprema, we know that \partial \norm{M}_* = \{ Z : \ip{M}{Z} = \norm{M}_*, \norm{Z} \leq 1 \} := G_2 \:. We now check that G_1 = G_2. If UV^* + W \in G_1, then \ip{M}{UV^* + W} = \ip{M}{UV^*} + \ip{M}{W} = \Tr(\Sigma) = \norm{M}_* \:. On the other hand, for any x \in \R^{n_2}, (UV^* + W)x = (UV^* + W)(P_V x + P_V^\perp x) = UV^* P_V x + WP_V^\perp x \:. Taking norms squared, \norm{UV^* P_V x + WP_V^\perp x}^2_2 = \norm{UV^* P_V x}^2_2 + \norm{W P_V^\perp x}^2_2 \leq \norm{P_V x}^2_2 + \norm{P_V^\perp x}^2_2 = \norm{x}^2_2 \:, which shows that \norm{UV^* + W} \leq 1, and hence G_1 \subseteq G_2.

The G_2 \subseteq G_1 direction is trickier. I do not know an elementary way to show it. Look here for a more complete proof.

The tangent space of M

We now give a more succient characterization of \partial \norm{M}_* in terms of the tangent space of M. Let u_1, ..., u_r denote the columns of U and u_{r+1}, ..., u_{n_1} denote an orthonormal basis for U^\perp. Also, let v_1, ..., v_r and v_{r+1}, ..., v_{n_2} denote the same thing for V and V^\perp, respectively. Define the subspace T^\perp \subseteq \R^{n_1 \times n_2} as T^\perp := \mathrm{span}\{ u_i v_j^* \}_{i,j=r+1}^{n_1,n_2} \:. Since \{u_i v_j\}_{i,j=1}^{n_1, n_2} is an orthonormal basis for \R^{n_1 \times n_2}, we have that \mathrm{dim}(T^\perp) = (n_1-r)(n_2-r) and the orthogonal projector \Proj_{T}^\perp is \Proj_{T}^\perp(Z) = P_{U^\perp} Z P_{V^\perp} = (I - P_U) Z (I - P_V) \:. The orthogonal complement of T^\perp, call it T, is the subspace spanned by T = \mathrm{span}\{ u_i y^* + x v_j^* : 1 \leq i,j \leq r, x \in \R^{n_1}, y \in \R^{n_2} \} \:. We have that \dim(T) = n_1n_2 - \dim(T^\perp) = (n_1+n_2)r - r^2, and also \Proj_{T}(Z) = Z - \Proj_{T}^\perp(Z) = P_U Z + Z P_V - P_U Z P_V \:. The tangent space T plays a key role in the analysis. To see this, we can rewrite \partial \norm{M}_* as \begin{equation} \partial \norm{M}_* = \{ Z : \Proj_{T}(Z) = UV^*, \norm{\Proj_{T}^\perp(Z)} \leq 1 \} \:. \label{eq:nuc_norm_subdiff_1} \end{equation} To see this, by \eqref{eq:nuc_norm_subdiff}, if Z = UV^* + W \in \partial \norm{M}_*, then \Proj_{T}(W) = 0 and hence \Proj_{T}(Z) = UV^* and \Proj_{T}^\perp(Z) = W. But we know \norm{W} \leq 1, so we get \norm{\Proj_{T}^\perp(Z)} \leq 1. That is, Z satisfies \eqref{eq:nuc_norm_subdiff_1}. On the other hand, showing that any Z satisfying \eqref{eq:nuc_norm_subdiff_1} satisfies \eqref{eq:nuc_norm_subdiff} as well goes along the same line of reasoning.

Sufficient conditions for unique recovery

The key structural argument is Lemma 3.1 of Candès and Recht.

Lemma: Suppose there exists a Y \in \mathrm{Im}(\A^*) such that \Proj_{T}(Y) = UV^*, \qquad \norm{\Proj_{T}^\perp(Y)} < 1 \:. Also suppose the restriction of \A on T, call it \A\big|_{T} : T \longrightarrow \R^{m} is injective. Then M is the unique minimizer of \eqref{eq:opt_prog}.

Proof: Write any feasible point as M + \Delta, where \A(\Delta) = 0. Let Z = UV^* + W be any element in \partial \norm{M}_*. Write Y = \Proj_{T}(Y) + \Proj_{T}^\perp(Y) = UV^* + \Proj_{T}^\perp(Y), and rearrange terms to conclude that UV^* = Y - \Proj_{T}^\perp(Y). By definition of subgradient, \begin{align} \norm{M + \Delta}_* &\geq \norm{M}_* + \ip{UV^* + W}{\Delta} \nonumber\\ &= \norm{M}_* + \ip{Y - \Proj_{T}^\perp(Y) + W}{\Delta} \nonumber\\ &= \norm{M}_* + \ip{Y}{\Delta} + \ip{W - \Proj_{T}^\perp(Y)}{\Delta} \nonumber\\ &\stackrel{(a)}{=} \norm{M}_* + \ip{W - \Proj_{T}^\perp(Y)}{\Delta} \nonumber\\ &\stackrel{(b)}{=} \norm{M}_* + \ip{\Proj_{T}^\perp(W) - \Proj_{T}^\perp(Y)}{\Delta} \nonumber\\ &= \norm{M}_* + \ip{W}{\Proj_{T}^\perp(\Delta)} + \ip{\Proj_{T}^\perp(Y)}{\Proj_{T}^\perp(\Delta)} \nonumber\\ &\geq \norm{M}_* + \ip{W}{\Proj_{T}^\perp(\Delta)} - \norm{\Proj_{T}^\perp(Y)} \norm{\Proj_{T}^\perp(\Delta)}_*, \label{eq:basic_inequality} \end{align} where (a) uses the fact that Y \in \mathrm{Im}(\A^*) and \Delta \in \mathrm{kern}(\A), and hence \ip{Y}{\Delta} = 0, and (b) uses the fact that W \in T^\perp.

By the duality of the nuclear norm with the operator norm, we can choose a matrix F which satisfies \norm{F} \leq 1 and \ip{F}{\Proj_{T}^\perp(\Delta)} = \norm{\Proj_{T}^\perp(\Delta)}_*. We can check that \norm{\Proj_{T}^\perp(F)} \leq 1, since \norm{\Proj_{T}^\perp(F)} = \norm{P_{U^\perp} F P_{V^\perp}} \leq \norm{P_{U^\perp}} \norm{F} \norm{P_{V^\perp}} \leq 1 \:. Hence, UV^* + \Proj_{T}^\perp(F) \in \partial \norm{M}_*. Using the inequality derived in \eqref{eq:basic_inequality}, we conclude \begin{align*} \norm{M + \Delta}_* - \norm{M_*} &\geq \ip{\Proj_{T}^\perp(F)}{\Proj_{T}^\perp(\Delta)} - \norm{\Proj_{T}^\perp(Y)} \norm{\Proj_{T}^\perp(\Delta)}_* \\ &= \ip{F}{\Proj_{T}^\perp(\Delta)} - \norm{\Proj_{T}^\perp(Y)} \norm{\Proj_{T}^\perp(\Delta)}_* \\ &= \norm{\Proj_{T}^\perp(\Delta)}_* - \norm{\Proj_{T}^\perp(Y)} \norm{\Proj_{T}^\perp(\Delta)}_* \\ &= (1 - \norm{\Proj_{T}^\perp(Y)} ) \norm{\Proj_{T}^\perp(\Delta)}_* \:. \end{align*} It remains to interpret this inequality. Since \norm{\Proj_{T}^\perp(Y)} < 1, whenever \Proj_{T}^\perp(\Delta) \neq 0, \norm{M + \Delta}_* > \norm{M}_* and hence M + \Delta is not a minimizer. On the other hand, whenever \Proj_{T}^\perp(\Delta) = 0, we have \Delta \in \mathrm{kern}(\A\big|_{T}). But we assumed that \A\big|_{T} was injective, and hence \Delta = 0. \square

Constructing the dual certificate

The lemma above gives a proof strategy. If one can come up with a \lambda \in \R^{m} such that \A^*(\lambda) satisfies the hypothesis of the lemma, and additionally prove the injectivity condition, then one has certified the optimality of M. This, of course, is much easier said than done. I will leave you with a few references where this is the proof strategy. This is nowhere near an exhaustive list.