Optional: Convergence of exp
Convergence
We will now prove that the power series $\exp(X) = \sum_{n=0}^{\infty} \frac{1}{n!}X^n$ converges absolutely and uniformly on bounded sets. This will be relatively painless, but then I will show that the power series for the partial derivatives converge uniformly on bounded sets (which is needed to deduce that $\exp$ is continuously differentiable and that we can differentiate term by term). This will take a lot longer, and should only be watched by those desiring a course of analytic self-flagellation.
We will use the Weierstrass M-test:
Weierstrass M-test
Suppose $g_n(X)$ is a sequence of functions with $\|g_n(X)\| \leq a_n$ for all $n, X$, and write $f_k(X) = \sum_{n=0}^{k} g_n(X)$ and $M_k = \sum_{n=0}^{k} a_n$ for the partial sums, so that $\|f_k(X)\| \leq M_k$ for all $k, X$. Then, if $M_k$ converges, $f_k$ converges absolutely and uniformly to a function $f(X)$.
Uniform convergence on bounded sets
So our strategy will be to find an upper bound for the operator norm of $f_k(X) = \sum_{n=0}^{k} \frac{1}{n!}X^n$ which is independent of $X$. This will fail because the series for $\exp$ doesn't converge uniformly everywhere: we will need to assume $\|X\|_{op} \leq R$ for some $R$. We will then deduce uniform convergence on this bounded subset of $\mathfrak{gl}(n,\mathbf{R})$. But if we're interested in a particular matrix then it will satisfy $\|X\|_{op} \leq R$ for some $R$, so this is all we need.
Omitting the subscript $op$, we have
$$\|f_k(X)\| = \left\|\sum_{n=0}^{k} \frac{1}{n!}X^n\right\| \leq \sum_{n=0}^{k} \frac{1}{n!}\|X^n\| \leq \sum_{n=0}^{k} \frac{1}{n!}\|X\|^n$$
by the triangle inequality and the fact that $\|X^n\| \leq \|X\|^n$. Now, assuming $\|X\| \leq R$, we get
$$\|f_k(X)\| \leq \sum_{n=0}^{k} \frac{R^n}{n!}.$$
The same argument bounds each individual term: $\left\|\frac{1}{n!}X^n\right\| \leq \frac{R^n}{n!}$, independently of $X$. Define $M_k := \sum_{n=0}^{k} \frac{R^n}{n!}$ and observe that $M_k \to \exp(R)$ as $k \to \infty$.
Note that $R$ is a number, so here we're just using convergence of the usual exponential function rather than matrix $\exp$. We now apply the Weierstrass M-test and deduce uniform convergence of $\exp$ for $\|X\| \leq R$.
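The estimate above can be sanity-checked numerically. The following is only a sketch under stated assumptions: it uses the maximum absolute row sum as the norm (this is the operator norm induced by the sup norm on vectors, chosen here because it needs no linear algebra library; the argument only uses the triangle inequality and the fact that the norm of a power is at most the power of the norm), and the helper names `mat_mul`, `row_sum_norm`, `exp_partial_sums` and `scalar_partial_sums` are made up for this illustration. It checks that the gap between two matrix partial sums is controlled by the gap between the corresponding scalar partial sums, which is exactly what makes the convergence uniform for matrices of norm at most R.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def row_sum_norm(A):
    """Maximum absolute row sum (assumed stand-in for the operator norm)."""
    return max(sum(abs(x) for x in row) for row in A)

def exp_partial_sums(X, K):
    """Return [f_0(X), ..., f_K(X)] where f_k(X) = sum_{n=0}^k X^n / n!."""
    n = len(X)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # X^0 / 0!
    total = [row[:] for row in term]
    sums = [[row[:] for row in total]]
    for m in range(1, K + 1):
        term = [[x / m for x in row] for row in mat_mul(term, X)]  # X^m / m!
        total = [[t + s for t, s in zip(trow, srow)]
                 for trow, srow in zip(total, term)]
        sums.append([row[:] for row in total])
    return sums

def scalar_partial_sums(R, K):
    """Return [M_0, ..., M_K] where M_k = sum_{n=0}^k R^n / n!."""
    out, term, total = [], 1.0, 0.0
    for m in range(K + 1):
        if m > 0:
            term *= R / m
        total += term
        out.append(total)
    return out

X = [[0.5, -1.0], [1.5, 0.25]]   # arbitrary test matrix
R = 2.0                          # any R with row_sum_norm(X) <= R will do
K = 25
f = exp_partial_sums(X, K)
M = scalar_partial_sums(R, K)

# Largest value of ||f_K(X) - f_k(X)|| - (M_K - M_k) over k < K.
# The estimate predicts this is <= 0, up to floating-point rounding.
worst_gap = max(
    row_sum_norm([[a - b for a, b in zip(ra, rb)] for ra, rb in zip(f[K], f[k])])
    - (M[K] - M[k])
    for k in range(K)
)
```

The point of comparing gaps rather than values is that the right-hand side depends only on R, not on the particular matrix, so the same K works for every matrix in the bounded set.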
Absolute convergence
We've actually proved absolute convergence along the way. This means that if you take norms of every term in the power series then it still converges.
Absolute convergence is the property that allows us to do rearrangements without changing the value of the sum.
Derivatives
We also want to be able to differentiate $\exp(X)$ term-by-term. For that, we need to show that the sequence of partial derivatives of partial sums converges uniformly on bounded sets. This is where the nightmare begins. Watch on at your own risk.
What do we mean by partial derivative? $\exp(X)$ is a matrix whose entries are functions of the $n^2$ matrix entries $X_{1,1}, X_{1,2}, \ldots, X_{1,n}, X_{2,1}, X_{2,2}, \ldots, X_{n,n}$. I'm interested in taking the partial derivative of an entry of $\exp(X)$ with respect to a variable $X_{i,j}$.
For example, $\frac{\partial}{\partial X_{1,2}}(X_{1,1}X_{1,2}) = X_{1,1}$ and $\frac{\partial X_{2,2}}{\partial X_{1,1}} = 0$.
We are therefore interested in applying the Weierstrass M-test to the sequence of partial sums:
$$f_K(X) = \frac{\partial}{\partial X_{i,j}} \sum_{n=0}^{K} \frac{1}{n!}X^n.$$
We will now prove that $\|f_K(X)\|_{L^1}$ is bounded by $M_K$ for some convergent sequence $M_K$. Since the $L^1$-norm of a matrix is the sum of absolute values of entries, this means we need to bound
$$\left|\frac{\partial}{\partial X_{i,j}} \sum_{n=0}^{K} \frac{1}{n!}(X^n)_{k,l}\right|.$$
This is a finite sum, so we can take the derivative inside the sum and get
$$\left|\sum_{n=0}^{K} \frac{1}{n!}\frac{\partial}{\partial X_{i,j}}(X^n)_{k,l}\right|.$$
For a start, what is $(X^n)_{k,l}$? If $X$ is an $N$-by-$N$ matrix (to avoid notation-clashes),
$$(X^n)_{k,l} = \sum_{i_1=1}^{N} \cdots \sum_{i_{n-1}=1}^{N} X_{k,i_1}X_{i_1,i_2}\cdots X_{i_{n-1},l}$$
so, using the product rule, and just writing one big sum instead of lots of sums, we get
$$\frac{\partial}{\partial X_{i,j}}(X^n)_{k,l} = \sum\left(\frac{\partial X_{k,i_1}}{\partial X_{i,j}}X_{i_1,i_2}\cdots X_{i_{n-1},l} + X_{k,i_1}\frac{\partial X_{i_1,i_2}}{\partial X_{i,j}}\cdots X_{i_{n-1},l} + \cdots + X_{k,i_1}X_{i_1,i_2}\cdots\frac{\partial X_{i_{n-1},l}}{\partial X_{i,j}}\right).$$
Note that $\frac{\partial X_{k,i_1}}{\partial X_{i,j}}$ is either 1 or 0. It's 1 if $k=i$ and $i_1=j$, and 0 otherwise. In terms of the Kronecker delta $\delta_{a,b}$, which equals 0 if $a$ and $b$ are different and 1 if $a=b$, this means we have
$$\sum\left(\delta_{k,i}\delta_{i_1,j}X_{i_1,i_2}\cdots X_{i_{n-1},l} + X_{k,i_1}\delta_{i_1,i}\delta_{i_2,j}\cdots X_{i_{n-1},l} + \cdots + X_{k,i_1}X_{i_1,i_2}\cdots\delta_{i_{n-1},i}\delta_{l,j}\right).$$
In the first term, we can group $\delta_{i_1,j}X_{i_1,i_2}\cdots X_{i_{n-1},l}$ and when we sum over $i_1, i_2, \ldots, i_{n-1}$ this is just the $j,l$ matrix entry of $IX^{n-1}$ (because $\delta_{i_1,j}$ is the $j,i_1$ matrix entry of the identity matrix $I$).
Similarly, in the second term, we can group $X_{k,i_1}\delta_{i_1,i}$ and $\delta_{i_2,j}\cdots X_{i_{n-1},l}$ and, when we perform all the sums, these become $X_{k,i}$ and $(X^{n-2})_{j,l}$.
Proceeding in this manner, the sum goes away and we get:
$$\frac{\partial}{\partial X_{i,j}}(X^n)_{k,l} = \delta_{k,i}(X^{n-1})_{j,l} + X_{k,i}(X^{n-2})_{j,l} + \cdots + (X^{n-1})_{k,i}\delta_{j,l}.$$
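If you don't trust the index gymnastics, the formula for the derivative of an entry of a matrix power can be compared against a finite-difference approximation. This is a rough sketch, not part of the proof: the function names are invented for the illustration, indices are 0-based as usual in Python, and the step size `h` and the test matrix are arbitrary choices. The formula is written as a single sum over m of the (k,i) entry of the m-th power times the (j,l) entry of the (n-1-m)-th power, which is the same as the expression above with the two delta terms at the ends.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(X, n):
    """X^n by repeated multiplication, with X^0 the identity."""
    N = len(X)
    P = [[1.0 if a == b else 0.0 for b in range(N)] for a in range(N)]
    for _ in range(n):
        P = mat_mul(P, X)
    return P

def d_power_entry(X, n, k, l, i, j):
    """Derivative of (X^n)[k][l] with respect to X[i][j], using the formula
    sum_{m=0}^{n-1} (X^m)[k][i] * (X^(n-1-m))[j][l]."""
    return sum(mat_pow(X, m)[k][i] * mat_pow(X, n - 1 - m)[j][l]
               for m in range(n))

def finite_difference(X, n, k, l, i, j, h=1e-6):
    """Central-difference approximation to the same partial derivative."""
    Xp = [row[:] for row in X]
    Xm = [row[:] for row in X]
    Xp[i][j] += h
    Xm[i][j] -= h
    return (mat_pow(Xp, n)[k][l] - mat_pow(Xm, n)[k][l]) / (2 * h)

X = [[0.3, -0.7, 0.2],
     [1.1, 0.4, -0.5],
     [0.6, 0.9, -0.8]]   # arbitrary 3-by-3 test matrix
n = 4
N = len(X)

# Largest discrepancy between the formula and the numerical derivative
# over all choices of entry (k, l) and variable (i, j).
max_error = max(
    abs(d_power_entry(X, n, k, l, i, j) - finite_difference(X, n, k, l, i, j))
    for k in range(N) for l in range(N) for i in range(N) for j in range(N)
)
```

Note that for n = 0 the sum is empty, matching the fact that the identity matrix does not depend on X.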
We're trying to bound
$$\left|\frac{\partial}{\partial X_{i,j}}\sum_{n=0}^{K}\frac{1}{n!}(X^n)_{k,l}\right|$$
and we now know this is equal to
$$\left|\sum_{n=0}^{K}\frac{1}{n!}\left(\delta_{k,i}(X^{n-1})_{j,l} + X_{k,i}(X^{n-2})_{j,l} + \cdots + (X^{n-1})_{k,i}\delta_{j,l}\right)\right|.$$
(The bracket has $n$ terms, so the $n=0$ summand is zero.) Using the triangle inequality, this is bounded above by
$$\sum_{n=0}^{K}\frac{1}{n!}\left(|\delta_{k,i}|\,|(X^{n-1})_{j,l}| + |X_{k,i}|\,|(X^{n-2})_{j,l}| + \cdots + |(X^{n-1})_{k,i}|\,|\delta_{j,l}|\right).$$
Note that these are really absolute values because we are working with matrix entries rather than matrices.
Each term inside the bracket has the form $|(X^m)_{k,i}|\,|(X^{n-m-1})_{j,l}|$ and we want to bound such quantities. By definition of the $L^1$ norm, we have $|(X^m)_{k,i}| \leq \|X^m\|_{L^1}$ and, because the $L^1$ and operator norms are Lipschitz equivalent, we have
$$\|X^m\|_{L^1} \leq C\|X^m\|_{op} \leq C\|X\|_{op}^m$$
for some Lipschitz constant $C$.
Again, using Lipschitz equivalence we get $C\|X\|_{op}^m \leq CD^m\|X\|_{L^1}^m$ for some Lipschitz constant $D$. Therefore we get
$$|(X^m)_{k,i}|\,|(X^{n-m-1})_{j,l}| \leq CD^m\|X\|_{L^1}^m \cdot CD^{n-m-1}\|X\|_{L^1}^{n-m-1} = C^2D^{n-1}\|X\|_{L^1}^{n-1}.$$
All together, since the bracket for each $n$ has $n$ terms, we get
$$\left|\frac{\partial}{\partial X_{i,j}}\sum_{n=0}^{K}\frac{1}{n!}(X^n)_{k,l}\right| \leq \sum_{n=1}^{K}\frac{1}{n!}\,nC^2D^{n-1}\|X\|_{L^1}^{n-1} = C^2\sum_{n=1}^{K}\frac{1}{(n-1)!}\left(D\|X\|_{L^1}\right)^{n-1}.$$
So if we assume $\|X\|_{L^1}$ is bounded above by $R$ then the $n$th summand is bounded by $C^2\frac{(DR)^{n-1}}{(n-1)!}$, independently of $X$, and the whole sum is bounded above by $C^2\exp(DR)$. Weierstrass's M-test therefore applies, so the partial derivatives of the partial sums converge uniformly on bounded sets of matrices.
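As a final sanity check, here is a small sketch that evaluates the derivative of a partial sum exactly (via the formula for the derivative of a matrix power) and compares it with the bound. One assumption to flag loudly: the code takes C = D = 1. That is legitimate if you bound the entries of a power directly using the fact that the entrywise L1 norm of a product is at most the product of the L1 norms, which is a slightly different route from the one above that passes through the operator norm. The names `powers` and `d_partial_sum_entry` are invented for this illustration, and indices are 0-based.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def powers(X, K):
    """Return the list [X^0, X^1, ..., X^K]."""
    N = len(X)
    P = [[[1.0 if a == b else 0.0 for b in range(N)] for a in range(N)]]
    for _ in range(K):
        P.append(mat_mul(P[-1], X))
    return P

def d_partial_sum_entry(P, K, k, l, i, j):
    """Derivative with respect to X[i][j] of the (k, l) entry of
    sum_{n=0}^K X^n / n!, where P is the list of powers of X.
    The n = 0 term is constant, so the sum starts at n = 1."""
    return sum(
        sum(P[m][k][i] * P[n - 1 - m][j][l] for m in range(n)) / math.factorial(n)
        for n in range(1, K + 1)
    )

X = [[0.4, -0.9], [0.7, 0.3]]                   # arbitrary test matrix
R = sum(abs(x) for row in X for x in row)       # entrywise L1 norm of X
K = 12
N = len(X)
P = powers(X, K)

# Largest absolute derivative over all entries (k, l) and variables (i, j).
largest = max(
    abs(d_partial_sum_entry(P, K, k, l, i, j))
    for k in range(N) for l in range(N) for i in range(N) for j in range(N)
)

# The bound with C = D = 1: sum_{n=1}^K R^(n-1) / (n-1)!, itself at most exp(R).
M_K = sum(R ** (n - 1) / math.factorial(n - 1) for n in range(1, K + 1))
```

In practice the derivatives are far smaller than the bound; the bound is crude, but all the M-test needs is that it is independent of the matrix and has a convergent sum.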
Don't say I didn't warn you.