Today's post will be about characterizing the subdifferential of a norm in an inner product space. Let $\lVert \cdot \rVert$ be an arbitrary vector norm and let $\partial (\cdot)$ denote the subdifferential of a function. We will show that $$ \newcommand{\ip}[2]{\langle #1, #2 \rangle} \newcommand{\norm}[1]{\left\lVert #1 \right\rVert} \newcommand{\abs}[1]{\left| #1 \right|} \partial \norm{x} = \{ v : \ip{v}{x} = \norm{x}, \norm{v}_* \leq 1 \} $$ where $\norm{x}_* := \sup_{z:\norm{z} \leq z} \ip{x}{z}$ is the dual norm to $\norm{\cdot}$.

To do this, we will prove two directions. First, define the set $$ \mathcal{G}(x) := \{ v : \ip{v}{x} = \norm{x}, \norm{v}_* \leq 1 \} $$ We first show that if $v \in \mathcal{G}(x)$, then $v \in \partial \norm{x}$, showing that $\mathcal{G}(x) \subset \partial\norm{x}$. Let $v \in \mathcal{G}(x)$, and let $y$ be arbitrary. Then $$ \begin{align*} \norm{x} + \ip{v}{y-x} = \norm{x} + \ip{v}{y} - \ip{v}{x} = \ip{v}{y} \stackrel{(a)}{\leq} \norm{v}_* \norm{y} \leq \norm{y} \end{align*} $$ where in (a) we used Holder's inequality which states $\abs{\ip{x}{y}} \leq \norm{x} \norm{y}_*$ for any dual pair of norms. Since $y$ was arbitrary, this holds for all $y$ and therefore $v \in \mathcal{G}(x)$.

The other direction is slightly trickier. To make it easier, we first introduce the idea of a Fenchel conjugate of a function. Given a real valued function $f(x)$ on an inner product space, define the conjugate $f^{\star}(y)$ of $f$ as $$ f^{\star}(y) = \sup_{z} \ip{y}{z} - f(z) $$ It turns out that the Fenchel conjugate of a norm is just the indicator function on the unit ball of the dual norm. That is, $$ \norm{y}^{\star} = \begin{cases} 0 & \text{if } \norm{y}_* \leq 1 \\ + \infty & \text{o.w.} \end{cases} $$ For a proof, turn to Section 1.4 of Bach's writeup on optimization with sparse penalties functions, or a quick google search (or try it yourself!).

Equipped with this, we are ready to proceed. Let $v \in \partial \norm{x}$. Then for every $y$ we have $$ \begin{align*} \norm{y} \geq \norm{x} + \ip{v}{y-x} \Leftrightarrow \ip{v}{y} - \norm{y} \leq \ip{v}{x} - \norm{x} \end{align*} $$ Since this holds for every $y$, we can take the supremum over all $y$'s $$ \sup_{y} \ip{v}{y} - \norm{y} \leq \ip{v}{x} - \norm{x} $$ But notice the LHS is simply $\norm{v}^{\star}$, and therefore $$ \begin{cases} 0 & \text{if } \norm{v}_* \leq 1 \\ + \infty & \text{o.w.} \end{cases} \leq \ip{v}{x} - \norm{x} $$ If $\norm{v}_* > 1$, then this cannot possibly hold (since the RHS will always be finite for a fixed $v$), so we have $\norm{v}_* \leq 1$. But since $\norm{v}_* \leq 1$, we have $$ 0 \leq \ip{v}{x} - \norm{x} \stackrel{(a)}{\leq} \norm{v}_*\norm{x} - \norm{x} \leq 0 $$ Where we used Holder's inequality again in (a). Therefore, all inequalities are strict, which yields $\ip{v}{x} = \norm{x}$. But this means $v \in \mathcal{G}(x)$, which shows $\mathcal{G}(x) \supset \partial \norm{x}$. Therefore $\mathcal{G}(x) = \partial \norm{x}$, which yields the result.