Optional: Convergence of exp

Convergence

Weierstrass M-test

We will now prove that the power series $\exp(X) = \sum_{n \ge 0} \frac{1}{n!} X^n$ converges absolutely and uniformly on bounded sets. This will be relatively painless, but then I will show that the power series for the partial derivatives converge uniformly on bounded sets (which is needed to deduce that exp is continuously differentiable and that we can differentiate term by term). This will take a lot longer, and should only be watched by those desiring a course of analytic self-flagellation.

We will use the Weierstrass M-test:

Theorem:

If $f_k(X)$ is a sequence of functions such that $\|f_k(X)\| \le M_k$ for all $k$ and $X$ then, if $\sum_k M_k$ converges, $\sum_k f_k$ converges uniformly to a function $f(X)$.
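To make the M-test concrete, here is a quick scalar sanity check (my own toy example, not from the notes): on $|x| \le R$ the terms $f_k(x) = x^k/k!$ are dominated by $M_k := R^k/k!$, and $\sum_k M_k$ converges to $\exp(R)$.

```python
import math

# Toy illustration of the Weierstrass M-test hypothesis (scalar case):
# on |x| <= R the terms f_k(x) = x^k / k! satisfy |f_k(x)| <= M_k := R^k / k!,
# and sum(M_k) converges (to exp(R)), so sum(f_k) converges uniformly there.
R = 2.0

def term(k, x):
    return x**k / math.factorial(k)

M = [R**k / math.factorial(k) for k in range(50)]
# The sum of the bounds converges to e^R (tail beyond k=49 is negligible).
assert abs(sum(M) - math.exp(R)) < 1e-12

# Check the termwise bound |f_k(x)| <= M_k on a grid of points in [-R, R].
for k in range(50):
    for x in [-R + i * R / 10 for i in range(21)]:
        assert abs(term(k, x)) <= M[k] + 1e-15
print("M-test bounds verified")
```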

Uniform convergence on bounded sets

So our strategy will be to find an upper bound for the operator norm of the partial sums $f_k(X) = \sum_{n=0}^{k} \frac{1}{n!} X^n$ which is independent of $X$. This will fail because exp doesn't converge uniformly everywhere: we will need to assume $\|X\|_{op} \le R$ for some $R$. We will then deduce uniform convergence on this bounded subset of $\mathfrak{gl}(n, \mathbf{R})$. But if we're interested in a particular matrix then it will satisfy $\|X\|_{op} \le R$ for some $R$, so this is all we need.

Omitting the subscript $op$, we have
$$\|f_k(X)\| = \Big\|\sum_{n=0}^{k} \frac{1}{n!} X^n\Big\| \le \sum_{n=0}^{k} \Big\|\frac{1}{n!} X^n\Big\| \le \sum_{n=0}^{k} \frac{1}{n!} \|X\|^n$$
by the triangle inequality and the fact that $\|X^n\| \le \|X\|^n$. Now, assuming $\|X\| \le R$, we get
$$\|f_k(X)\| \le \sum_{n=0}^{k} \frac{R^n}{n!}.$$
Define $M_k := \sum_{n=0}^{k} \frac{R^n}{n!}$ and observe that $M_k$ converges to $\exp(R)$ as $k \to \infty$: in other words, the terms $\frac{1}{n!} X^n$ of our series are dominated by the terms $\frac{R^n}{n!}$ of a convergent series.

Note that $R$ is a number, so here we're just using convergence of the usual exponential function rather than matrix exp. We now apply the Weierstrass M-test and deduce uniform convergence of exp for $\|X\|_{op} \le R$.
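This uniform bound can be sanity-checked numerically. The sketch below (pure Python, my own adaptation) uses the $L^1$ norm instead of the operator norm, because the same triangle-inequality argument works for any submultiplicative norm and the $L^1$ norm is easy to compute; the truncation error of the partial sum is compared against the scalar tail $\sum_{n > K} R^n/n!$.

```python
import math, random

# Sanity check of the tail bound for matrix exp (adaptation: the notes use the
# operator norm, but the L1 norm -- sum of |entries| -- is also submultiplicative,
# so the same bound ||X^n|| <= ||X||^n holds and is easy to compute here).
N, R, K = 3, 1.5, 20

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def l1_norm(A):
    return sum(abs(x) for row in A for x in row)

def exp_partial_sum(X, K):
    """Partial sum sum_{n=0}^K X^n / n!  (starts from the identity, the n=0 term)."""
    S = [[float(i == j) for j in range(N)] for i in range(N)]
    P = [[float(i == j) for j in range(N)] for i in range(N)]
    for n in range(1, K + 1):
        P = mat_mul(P, X)
        for i in range(N):
            for j in range(N):
                S[i][j] += P[i][j] / math.factorial(n)
    return S

random.seed(0)
X = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(N)]
scale = R / l1_norm(X)
X = [[scale * x for x in row] for row in X]          # now ||X||_L1 == R

S_K = exp_partial_sum(X, K)
S_big = exp_partial_sum(X, 2 * K)                    # proxy for the limit
tail = sum(R**n / math.factorial(n) for n in range(K + 1, 100))
diff = l1_norm([[S_big[i][j] - S_K[i][j] for j in range(N)] for i in range(N)])
assert diff <= tail + 1e-12
print("truncation error", diff, "<= tail bound", tail)
```

Note the bound depends only on $R$, not on which $X$ with $\|X\| \le R$ we picked: that is the uniformity.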

Absolute convergence

Remark:

We've actually proved absolute convergence along the way. This means that if you take norms of every term in the power series then it still converges.

Remark:

Absolute convergence is the property that allows us to do rearrangements without changing the value of the sum.
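A quick illustration of why this matters (my own example, not part of the notes): the alternating harmonic series converges but not absolutely, and the classical one-positive-two-negatives rearrangement halves its sum; nothing like this can happen to the absolutely convergent exp series.

```python
import math

# The alternating harmonic series 1 - 1/2 + 1/3 - ... converges to ln 2,
# but not absolutely (the harmonic series diverges).
def alt_harmonic(n_terms):
    return sum((-1) ** (k + 1) / k for k in range(1, n_terms + 1))

# Classical rearrangement: one positive term, then two negative terms,
# i.e. blocks  1/(2b-1) - 1/(4b-2) - 1/(4b).  This rearrangement sums to ln(2)/2.
def rearranged(n_blocks):
    s = 0.0
    for b in range(1, n_blocks + 1):
        s += 1 / (2 * b - 1) - 1 / (4 * b - 2) - 1 / (4 * b)
    return s

print(alt_harmonic(10**6))   # ~ ln 2    = 0.693...
print(rearranged(10**6))     # ~ ln(2)/2 = 0.346...
```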

Derivatives

We also want to be able to differentiate exp ( X ) term-by-term. For that, we need to show that the sequence of partial derivatives of partial sums converges uniformly on bounded sets. This is where the nightmare begins. Watch on at your own risk.

What do we mean by partial derivative? $\exp(X)$ is a matrix whose entries are functions of the $n^2$ matrix entries $X_{11}, X_{12}, \ldots, X_{1n}, X_{21}, X_{22}, \ldots, X_{nn}$. I'm interested in taking the partial derivative of an entry of $\exp(X)$ with respect to a variable $X_{ij}$.

Example:

For example, $\frac{\partial}{\partial X_{12}}(X_{11} X_{12}) = X_{11}$ and $\frac{\partial}{\partial X_{11}}(X_{22}) = 0$.

We are therefore interested in applying the Weierstrass M-test to the sequence of partial sums:
$$f_K(X) = \frac{\partial}{\partial X_{ij}}\Big(\sum_{n=0}^{K} \frac{1}{n!} X^n\Big).$$
We will now prove that the $L^1$-norm of $f_K(X)$ is bounded by $M_K$ for some convergent sequence $M_K$. Since the $L^1$-norm of a matrix is the sum of absolute values of entries, this means we need to bound $\Big|\frac{\partial}{\partial X_{ij}}\Big(\sum_{n=0}^{K} \frac{1}{n!} X^n\Big)_{k\ell}\Big|$.

This is a finite sum, so we can take the derivative inside the sum and get $\Big|\sum_{n=0}^{K} \frac{1}{n!} \frac{\partial}{\partial X_{ij}} (X^n)_{k\ell}\Big|$. For a start, what is $(X^n)_{k\ell}$? If $X$ is an $N$-by-$N$ matrix (to avoid notation-clashes),
$$(X^n)_{k\ell} = \sum_{i_1=1}^{N} \cdots \sum_{i_{n-1}=1}^{N} X_{k i_1} X_{i_1 i_2} \cdots X_{i_{n-1} \ell}$$
so, using the product rule, and just writing one big sum instead of lots of sums, we get
$$\frac{\partial}{\partial X_{ij}} (X^n)_{k\ell} = \sum \Big( \frac{\partial X_{k i_1}}{\partial X_{ij}} X_{i_1 i_2} \cdots X_{i_{n-1} \ell} + X_{k i_1} \frac{\partial X_{i_1 i_2}}{\partial X_{ij}} \cdots X_{i_{n-1} \ell} + \cdots + X_{k i_1} \cdots X_{i_{n-2} i_{n-1}} \frac{\partial X_{i_{n-1} \ell}}{\partial X_{ij}} \Big).$$
Note that $\partial X_{k i_1} / \partial X_{ij}$ is either 1 or 0: it's 1 if $k = i$ and $i_1 = j$. In terms of the Kronecker delta
$$\delta_{ab} = \begin{cases} 0 & \text{if } a \ne b \\ 1 & \text{if } a = b, \end{cases}$$
this means we have
$$\sum \Big( \delta_{ki} \delta_{i_1 j} X_{i_1 i_2} \cdots X_{i_{n-1} \ell} + X_{k i_1} \delta_{i_1 i} \delta_{i_2 j} X_{i_2 i_3} \cdots X_{i_{n-1} \ell} + \cdots + X_{k i_1} X_{i_1 i_2} \cdots X_{i_{n-2} i_{n-1}} \delta_{i_{n-1} i} \delta_{\ell j} \Big).$$
In the first term, we can group $\delta_{i_1 j} X_{i_1 i_2} \cdots X_{i_{n-1} \ell}$ and when we sum over $i_1, i_2, \ldots, i_{n-1}$ this is just the $j\ell$ matrix entry of $I X \cdots X = X^{n-1}$ (because $\delta_{i_1 j}$ is the $i_1 j$ matrix entry of $I$).

Similarly, in the second term, we can group $X_{k i_1} \delta_{i_1 i}$ and $\delta_{i_2 j} X_{i_2 i_3} \cdots X_{i_{n-1} \ell}$ and, when we perform all the sums, these become $X_{ki}$ and $(X^{n-2})_{j\ell}$.

Proceeding in this manner, the sum goes away and we get:
$$\frac{\partial}{\partial X_{ij}} (X^n)_{k\ell} = \delta_{ki} (X^{n-1})_{j\ell} + X_{ki} (X^{n-2})_{j\ell} + \cdots + (X^{n-1})_{ki} \delta_{\ell j}.$$
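This formula can be sanity-checked with a finite difference. The sketch below (pure Python, my own check, not from the notes) compares the right-hand side, written as $\sum_{m=0}^{n-1} (X^m)_{ki} (X^{n-1-m})_{j\ell}$, against a central difference quotient for a random $3 \times 3$ matrix and $n = 4$.

```python
import random

# Numerical check of
#   d(X^n)_{kl} / dX_{ij} = sum_{m=0}^{n-1} (X^m)_{ki} (X^{n-1-m})_{jl}
# (the m=0 and m=n-1 powers X^0 = I supply the Kronecker deltas)
# via a central finite difference.
N, n = 3, 4

def mat_mul(A, B):
    return [[sum(A[a][c] * B[c][b] for c in range(N)) for b in range(N)]
            for a in range(N)]

def mat_pow(X, p):
    P = [[float(a == b) for b in range(N)] for a in range(N)]  # X^0 = I
    for _ in range(p):
        P = mat_mul(P, X)
    return P

random.seed(2)
X = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(N)]
i, j, k, l = 1, 2, 0, 1
h = 1e-6

# Central finite difference of (X^n)_{kl} in the X_{ij} direction.
Xp = [row[:] for row in X]; Xp[i][j] += h
Xm = [row[:] for row in X]; Xm[i][j] -= h
fd = (mat_pow(Xp, n)[k][l] - mat_pow(Xm, n)[k][l]) / (2 * h)

exact = sum(mat_pow(X, m)[k][i] * mat_pow(X, n - 1 - m)[j][l] for m in range(n))
assert abs(fd - exact) < 1e-6
print("finite difference", fd, "formula", exact)
```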

We're trying to bound $\Big|\frac{\partial}{\partial X_{ij}}\Big(\sum_{n=0}^{K} \frac{1}{n!} X^n\Big)_{k\ell}\Big|$ and we now know this is equal to
$$\Big|\sum_{n=0}^{K} \frac{1}{n!} \big( \delta_{ki} (X^{n-1})_{j\ell} + X_{ki} (X^{n-2})_{j\ell} + \cdots + (X^{n-1})_{ki} \delta_{\ell j} \big)\Big|.$$
Using the triangle inequality, this is bounded above by
$$\sum_{n=0}^{K} \frac{1}{n!} \big( |\delta_{ki}| \, |(X^{n-1})_{j\ell}| + |X_{ki}| \, |(X^{n-2})_{j\ell}| + \cdots + |(X^{n-1})_{ki}| \, |\delta_{\ell j}| \big).$$
Note that these are really absolute values because we are working with matrix entries rather than matrices.

Each term inside the bracket has the form $|(X^m)_{ki}| \, |(X^{n-m-1})_{j\ell}|$ and we want to bound such quantities. By definition of the $L^1$ norm, we have $|(X^m)_{ki}| \le \|X^m\|_{L^1}$ and, because the $L^1$ and operator norms are Lipschitz equivalent, we have $\|X^m\|_{L^1} \le C \|X^m\|_{op} \le C \|X\|_{op}^m$ for some Lipschitz constant $C$.

Again, using Lipschitz equivalence we get $C \|X\|_{op}^m \le C D^m \|X\|_{L^1}^m$ for some Lipschitz constant $D$. Therefore we get
$$|(X^m)_{ki}| \, |(X^{n-m-1})_{j\ell}| \le C D^m \|X\|_{L^1}^m \cdot C D^{n-m-1} \|X\|_{L^1}^{n-m-1} = C^2 D^{n-1} \|X\|_{L^1}^{n-1}.$$
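For concreteness (my addition, not in the notes), explicit values of the Lipschitz constants work for $N$-by-$N$ matrices: every entry satisfies $|A_{ab}| = |e_a^T A e_b| \le \|A\|_{op}$, and the operator norm is at most the Frobenius norm, which is at most the $L^1$ norm.

```latex
% Summing |A_{ab}| <= ||A||_op over all N^2 entries:
\|A\|_{L^1} = \sum_{a,b} |A_{ab}| \le N^2 \, \|A\|_{op} \qquad (\text{so } C = N^2 \text{ works}),
% and operator norm <= Frobenius norm <= L^1 norm:
\|A\|_{op} \le \Big( \sum_{a,b} A_{ab}^2 \Big)^{1/2} \le \sum_{a,b} |A_{ab}| = \|A\|_{L^1} \qquad (\text{so } D = 1 \text{ works}).
```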

All together (there are $n$ terms inside the bracket), we get
$$\Big|\frac{\partial}{\partial X_{ij}}\Big(\sum_{n=0}^{K} \frac{1}{n!} X^n\Big)_{k\ell}\Big| \le \sum_{n=0}^{K} \frac{1}{n!} \, n \, C^2 D^{n-1} \|X\|_{L^1}^{n-1} = C^2 \sum_{n=1}^{K} \frac{1}{(n-1)!} \big( D \|X\|_{L^1} \big)^{n-1}.$$
So if we assume $\|X\|_{L^1} \le R$ then this is bounded above by $C^2 \exp(DR)$ and Weierstrass's M-test applies, so the partial derivatives of the partial sums converge uniformly on bounded sets of matrices.
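As a final sanity check (my own simplification, not the notes' argument): since the $L^1$ norm is submultiplicative, one can bound $|(X^m)_{ki}| \le \|X\|_{L^1}^m$ directly, and the bound above becomes simply $\exp(R)$ with no constants. The sketch below verifies numerically that every finite-difference partial derivative of the partial sum stays below $\exp(R)$ when $\|X\|_{L^1} = R$.

```python
import math, random

# Spot-check of the uniform derivative bound, using the simplification that the
# L1 norm is submultiplicative: |(X^m)_{ki}| <= ||X||_{L1}^m, so for ||X||_{L1} <= R
# every partial derivative of every partial sum is bounded by sum n R^{n-1}/n! <= exp(R).
N, K, R = 3, 15, 1.2

def mat_mul(A, B):
    return [[sum(A[a][c] * B[c][b] for c in range(N)) for b in range(N)]
            for a in range(N)]

def exp_partial_sum(X):
    S = [[float(a == b) for b in range(N)] for a in range(N)]  # n=0 term: identity
    P = [[float(a == b) for b in range(N)] for a in range(N)]
    for n in range(1, K + 1):
        P = mat_mul(P, X)
        for a in range(N):
            for b in range(N):
                S[a][b] += P[a][b] / math.factorial(n)
    return S

random.seed(3)
X = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(N)]
scale = R / sum(abs(x) for row in X for x in row)
X = [[scale * x for x in row] for row in X]          # now ||X||_L1 == R

h = 1e-6
worst = 0.0
for i in range(N):
    for j in range(N):
        Xp = [row[:] for row in X]; Xp[i][j] += h
        Xm = [row[:] for row in X]; Xm[i][j] -= h
        Sp, Sm = exp_partial_sum(Xp), exp_partial_sum(Xm)
        for k in range(N):
            for l in range(N):
                worst = max(worst, abs(Sp[k][l] - Sm[k][l]) / (2 * h))
assert worst <= math.exp(R)
print("largest partial derivative", worst, "<= exp(R) =", math.exp(R))
```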

Don't say I didn't warn you.