Today's post will be about characterizing the subdifferential of a norm in an inner product space. Let \newcommand{\ip}[2]{\langle #1, #2 \rangle} \newcommand{\norm}[1]{\left\lVert #1 \right\rVert} \newcommand{\abs}[1]{\left| #1 \right|} \norm{\cdot} be an arbitrary vector norm and let \partial (\cdot) denote the subdifferential of a function. We will show that \partial \norm{x} = \{ v : \ip{v}{x} = \norm{x}, \norm{v}_* \leq 1 \} where \norm{x}_* := \sup_{z:\norm{z} \leq 1} \ip{x}{z} is the dual norm to \norm{\cdot}.
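Before diving into the proof, here is a quick numerical illustration of the dual-norm definition, using the \ell_1 norm (whose dual is the \ell_\infty norm). This is just a sketch with illustrative helper names, not part of the argument: we check that \sup_{\norm{z}_1 \leq 1} \ip{x}{z} equals \norm{x}_\infty by maximizing over the signed basis vectors (where the supremum is attained) plus random points on the \ell_1 unit sphere.

```python
import random

# Sanity check: the dual of the l1 norm is the l-infinity norm, i.e.
# sup over ||z||_1 <= 1 of <x, z> equals max_i |x_i|.
# Helper names (l1, ip) are illustrative, not from the post.

def l1(z):
    return sum(abs(zi) for zi in z)

def ip(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

x = [3.0, -1.0, 2.0]

# Candidate z's on the l1 unit ball: signed basis vectors (where the
# sup is attained) plus random points scaled onto the l1 unit sphere.
candidates = [[s * float(i == j) for j in range(3)]
              for i in range(3) for s in (1.0, -1.0)]
random.seed(0)
for _ in range(1000):
    z = [random.uniform(-1.0, 1.0) for _ in x]
    candidates.append([zi / l1(z) for zi in z])

best = max(ip(x, z) for z in candidates)
linf_x = max(abs(xi) for xi in x)

assert abs(best - linf_x) < 1e-12   # sup equals ||x||_inf = 3.0
```

By Hölder's inequality none of the random candidates can exceed \norm{x}_\infty, and a signed basis vector attains it exactly.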

To do this, we will prove two directions. First, define the set \mathcal{G}(x) := \{ v : \ip{v}{x} = \norm{x}, \norm{v}_* \leq 1 \} We first show that if v \in \mathcal{G}(x), then v \in \partial \norm{x}, i.e. that \mathcal{G}(x) \subset \partial\norm{x}. Let v \in \mathcal{G}(x), and let y be arbitrary. Then \begin{align*} \norm{x} + \ip{v}{y-x} = \norm{x} + \ip{v}{y} - \ip{v}{x} = \ip{v}{y} \stackrel{(a)}{\leq} \norm{v}_* \norm{y} \leq \norm{y} \end{align*} where in (a) we used Hölder's inequality, which states \abs{\ip{x}{y}} \leq \norm{x} \norm{y}_* for any dual pair of norms. Since y was arbitrary, the subgradient inequality \norm{y} \geq \norm{x} + \ip{v}{y-x} holds for all y, and therefore v \in \partial \norm{x}.
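This direction is easy to sanity-check numerically. The sketch below (illustrative names, assuming the \ell_1 norm with \ell_\infty dual) picks a v \in \mathcal{G}(x) — the sign pattern of x on its support, something with magnitude at most 1 off it — and confirms the subgradient inequality \norm{y} \geq \norm{x} + \ip{v}{y-x} on many random y:

```python
import random

# Check the first direction for the l1 norm: a v in G(x) satisfies the
# subgradient inequality ||y||_1 >= ||x||_1 + <v, y - x> for all y.
# Names here are illustrative, not from the post.

def l1(x):
    return sum(abs(xi) for xi in x)

x = [3.0, -1.0, 0.0]
v = [1.0, -1.0, 0.5]   # sign pattern on the support, |v_i| <= 1 off it

# v is in G(x): <v, x> = ||x||_1 and ||v||_inf <= 1.
assert sum(vi * xi for vi, xi in zip(v, x)) == l1(x)
assert max(abs(vi) for vi in v) <= 1.0

random.seed(0)
gaps = []
for _ in range(1000):
    y = [random.uniform(-5.0, 5.0) for _ in x]
    lhs = l1(y)
    rhs = l1(x) + sum(vi * (yi - xi) for vi, yi, xi in zip(v, y, x))
    gaps.append(lhs - rhs)

# The gap ||y||_1 - ||x||_1 - <v, y - x> should never be negative.
assert min(gaps) >= -1e-12
```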

The other direction is slightly trickier. To make it easier, we first introduce the Fenchel conjugate of a function. Given a real-valued function f(x) on an inner product space, define the conjugate f^{\star}(y) of f as f^{\star}(y) = \sup_{z} \ip{y}{z} - f(z) It turns out that the Fenchel conjugate of a norm is just the indicator function on the unit ball of the dual norm. That is, \norm{y}^{\star} = \begin{cases} 0 & \text{if } \norm{y}_* \leq 1 \\ + \infty & \text{o.w.} \end{cases} For a proof, turn to Section 1.4 of Bach et al.'s writeup on optimization with sparsity-inducing penalties, or a quick google search (or try it yourself!).
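To build intuition for this conjugate formula, here is a one-dimensional sketch using the absolute value on \mathbb{R} (which is its own dual norm). The function name is illustrative: we check that \sup_z (yz - \abs{z}) stays at 0 when \abs{y} \leq 1, and grows without bound when \abs{y} > 1.

```python
# 1-D illustration of the conjugate of a norm: for f(z) = |z| on R
# (self-dual), f*(y) = sup_z (y*z - |z|) is 0 if |y| <= 1, else +infinity.
# The name `objective` is illustrative, not from the post.

def objective(y, z):
    return y * z - abs(z)

zs = [t / 100.0 for t in range(-1000, 1001)]   # grid of z in [-10, 10]

# |y| <= 1: the objective never exceeds 0, and z = 0 attains it.
assert max(objective(0.8, z) for z in zs) == 0.0

# |y| > 1: the objective grows like (|y| - 1)*|z|, so the sup is +infinity.
assert objective(1.5, 1e6) == 0.5 * 1e6
assert objective(1.5, 1e8) > objective(1.5, 1e6)
```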

Equipped with this, we are ready to proceed. Let v \in \partial \norm{x}. Then for every y we have \begin{align*} \norm{y} \geq \norm{x} + \ip{v}{y-x} \Leftrightarrow \ip{v}{y} - \norm{y} \leq \ip{v}{x} - \norm{x} \end{align*} Since this holds for every y, we can take the supremum over all y's \sup_{y} \ip{v}{y} - \norm{y} \leq \ip{v}{x} - \norm{x} But notice the LHS is simply \norm{v}^{\star}, and therefore \begin{cases} 0 & \text{if } \norm{v}_* \leq 1 \\ + \infty & \text{o.w.} \end{cases} \leq \ip{v}{x} - \norm{x} If \norm{v}_* > 1, then this cannot possibly hold (since the RHS will always be finite for a fixed v), so we have \norm{v}_* \leq 1. But since \norm{v}_* \leq 1, we have 0 \leq \ip{v}{x} - \norm{x} \stackrel{(a)}{\leq} \norm{v}_*\norm{x} - \norm{x} \leq 0 where we used Hölder's inequality again in (a). Since the chain starts and ends at 0, every inequality must hold with equality, which yields \ip{v}{x} = \norm{x}. But this means v \in \mathcal{G}(x), which shows \mathcal{G}(x) \supset \partial \norm{x}. Therefore \mathcal{G}(x) = \partial \norm{x}, which yields the result.
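As a final sanity check, the two sets can be compared end to end in one dimension. The sketch below (illustrative names, absolute value on \mathbb{R} at x = 2, where the only subgradient should be v = 1) tests membership in \mathcal{G}(x) against the subgradient inequality on a grid of y's and confirms they agree for every candidate v:

```python
# End-to-end check of G(x) = subdifferential for f(x) = |x| on R at x = 2.
# Function names are illustrative, not from the post.

def in_G(v, x):
    # membership in G(x) = {v : v*x = |x|, |v| <= 1}
    return abs(v * x - abs(x)) < 1e-12 and abs(v) <= 1.0

def is_subgradient(v, x, ys):
    # subgradient inequality |y| >= |x| + v*(y - x), tested on a grid of y's
    return all(abs(y) >= abs(x) + v * (y - x) - 1e-12 for y in ys)

x = 2.0
ys = [t / 10.0 for t in range(-50, 51)]

# The two membership tests agree for every candidate v in [-1.5, 1.5].
for v in [t / 10.0 for t in range(-15, 16)]:
    assert in_G(v, x) == is_subgradient(v, x, ys)
```

As expected, v = 1 is the unique point passing both tests: any v with \abs{v} > 1 is ruled out by the dual-norm bound, and any v < 1 violates the subgradient inequality at some y.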