Properties of the matrix exponential
In the last video, we introduced the exponential of a matrix, exp(X) = sum from n = 0 to infinity of 1 over n factorial times X to the n. In this video, we'll prove some nice properties of exp.
Lemma
(1) This power series converges for all X. Moreover, it converges (a) absolutely and (b) uniformly along with all its partial derivatives on any bounded set of matrices. This is a slightly technical result to state and prove, which we'll establish in some optional videos (video about matrix norms and video proving convergence). Absolute convergence means that we can reorder terms in the power series without worrying; uniform convergence along with partial derivatives means that exp is differentiable and we can differentiate inside the sum.
(2) Let t be a real variable. Then d by d t of exp t X equals X exp t X. Here, exp t X is a matrix whose entries are functions of t; d by d t of exp t X means the matrix you get by differentiating each entry of exp t X.
(3) If X Y = Y X then exp of X times exp of Y equals exp of (X plus Y).
(4) exp X is invertible, with inverse exp of minus X.
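To make the series definition of exp concrete, here is a minimal sketch in Python that simply truncates the sum after finitely many terms. The helper names (identity, mat_mul, mat_exp) and the cutoff of 30 terms are choices made for this illustration, not something from the video. We test it on a diagonal matrix, where exp just exponentiates each diagonal entry.

```python
import math

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_exp(X, terms=30):
    """Truncated series: sum of X^n / n! for n = 0 .. terms - 1."""
    n = len(X)
    total = identity(n)
    term = identity(n)  # X^0 / 0!
    for k in range(1, terms):
        # Turn X^(k-1) / (k-1)! into X^k / k!.
        term = [[entry / k for entry in row] for row in mat_mul(term, X)]
        total = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(total, term)]
    return total

# Illustrative test matrix (an assumption of this sketch): diag(1, 2).
D = [[1.0, 0.0], [0.0, 2.0]]
E = mat_exp(D)
print(E)                     # diagonal entries should be close to e and e^2
print(math.e, math.e ** 2)
```

A fixed cutoff is only reasonable for matrices with small entries; for serious numerical work one would use a library routine instead.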
Proof of the lemma
To prove (2), we write out the definition exp(t X) = sum from n = 0 to infinity of 1 over n factorial times t to the n times X to the n and then differentiate term-by-term (allowed by part (1) of the Lemma). This gives: d by d t of exp(t X) = sum from n = 0 to infinity of n over n factorial times t to the n minus 1 times X to the n. The n = 0 term vanishes and n over n factorial equals one over (n minus 1) factorial, so (pulling out a common factor of X) this equals X times sum from n = 1 to infinity of t to the n minus 1 over (n minus 1) factorial times X to the n minus 1. Now relabelling m = n minus 1, this becomes X times sum from m = 0 to infinity of t to the m over m factorial times X to the m, which equals X exp(t X) as required.
Let's illustrate this with an example. Take X to be the matrix 0, -1, 1, 0. By the calculation from the last video, exp t X is the matrix cos t, minus sin t, sin t, cos t, so d by d t of exp t X equals the matrix minus sin t, minus cos t, cos t, minus sin t. This should equal X exp t X, which is the matrix 0, -1, 1, 0 times the matrix cos t, minus sin t, sin t, cos t; multiplying out gives the matrix minus sin t, minus cos t, cos t, minus sin t, as expected.
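We can also check this example numerically. The sketch below is only an illustration: it approximates exp by a truncated series, approximates d by d t by a central finite difference, and compares the result with X exp t X. The value t = 0.7, the step h, the 30-term cutoff and all the helper names are assumptions made for this sketch.

```python
import math

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def scaled(c, A):
    return [[c * a for a in row] for row in A]

def mat_exp(X, terms=30):
    """Truncated series: sum of X^n / n! for n = 0 .. terms - 1."""
    n = len(X)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]  # X^0 / 0!
    for k in range(1, terms):
        term = [[entry / k for entry in row] for row in mat_mul(term, X)]
        total = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(total, term)]
    return total

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

X = [[0.0, -1.0], [1.0, 0.0]]
t, h = 0.7, 1e-5  # sample point and finite-difference step (arbitrary choices)

exp_tX = mat_exp(scaled(t, X))
rotation = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

# Central difference approximation to d/dt exp(tX), entry by entry.
ahead = mat_exp(scaled(t + h, X))
behind = mat_exp(scaled(t - h, X))
derivative = [[(a - b) / (2 * h) for a, b in zip(ra, rb)]
              for ra, rb in zip(ahead, behind)]

product = mat_mul(X, exp_tX)

print(max_abs_diff(exp_tX, rotation))     # exp(tX) versus the rotation matrix
print(max_abs_diff(derivative, product))  # d/dt exp(tX) versus X exp(tX)
```

Both printed numbers should be tiny; the second is limited by the accuracy of the finite difference rather than by the identity itself.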
(4) follows from (3) by taking Y equals minus X: since X commutes with -X, we get exp X times exp (minus X) equals exp of X minus X equals exp of 0, which equals the identity.
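Here is a quick numerical sanity check of (4), again as a sketch using a truncated series. The particular matrix X, the 30-term cutoff and the helper names are arbitrary choices for this illustration.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_exp(X, terms=30):
    """Truncated series: sum of X^n / n! for n = 0 .. terms - 1."""
    n = len(X)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]  # X^0 / 0!
    for k in range(1, terms):
        term = [[entry / k for entry in row] for row in mat_mul(term, X)]
        total = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(total, term)]
    return total

X = [[0.5, 1.0], [-0.3, 0.2]]                # arbitrary example matrix
minus_X = [[-a for a in row] for row in X]

# exp(X) exp(-X) should be (numerically) the identity matrix.
product = mat_mul(mat_exp(X), mat_exp(minus_X))
identity = [[1.0, 0.0], [0.0, 1.0]]
error = max(abs(a - b) for ra, rb in zip(product, identity) for a, b in zip(ra, rb))
print(product)
print(error)
```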
Finally, we will prove (3). Let's write out the product exp X exp Y equals sum from i = 0 to infinity of one over i factorial X to the i, times sum from j = 0 to infinity of one over j factorial Y to the j. We will do various manipulations which are justified by the absolute convergence claimed in part (1) of the Lemma. First, let's take all the summations outside: exp X exp Y equals sum from i = 0 to infinity sum from j = 0 to infinity of one over (i factorial times j factorial), times X to the i times Y to the j.
Next, we will write out the double sum as a grid:
i = 0, j = 0: I; i = 1, j = 0: X; i = 2, j = 0: a half X squared; etc
i = 0, j = 1: Y; i = 1, j = 1: X Y; i = 2, j = 1: etc
i = 0, j = 2: a half Y squared; etc
Let's group together the constant term I, the linear terms X + Y, the quadratic terms a half X squared plus X Y plus a half Y squared, etc. Let's call these groups the k = 0, k = 1, k = 2 terms etc (so k is the order of the terms in question).
With this regrouping of terms, our infinite sum becomes sum from k = 0 to infinity of (the sum from i = 0 to k of one over (i factorial times (k minus i) factorial), times X to the i times Y to the k minus i). Here, in each group, j is determined by i as j equals k minus i.
The inner sum is a finite sum. It looks a lot like a binomial expansion, but not quite. To make it look more like a binomial expansion, we multiply the sum by k factorial and simultaneously divide it by k factorial (leaving the answer unchanged): sum from k = 0 to infinity, one over k factorial, times sum from i = 0 to k, k factorial over (i factorial times (k minus i) factorial), times X to the i times Y to the k minus i. The term k factorial over (i factorial times (k minus i) factorial) is the binomial coefficient k choose i, so the inner sum is the binomial expansion of (X + Y) to the k, provided that X Y = Y X. The whole expression is therefore sum from k = 0 to infinity of one over k factorial times (X + Y) to the k.
To finish the proof, note that sum from k = 0 to infinity of one over k factorial times (X + Y) to the k equals exp of (X plus Y) by definition.
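The regrouping argument can be sketched numerically. The code below builds the regrouped sum (grouped by total order k, truncated at a finite order) for a commuting pair and compares it with both exp X exp Y and exp of (X plus Y). The choice Y = X squared (which certainly commutes with X), the cutoffs and the helper names are all assumptions made for this illustration.

```python
import math

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]

def mat_exp(X, terms=30):
    """Truncated series: sum of X^n / n! for n = 0 .. terms - 1."""
    total = identity(len(X))
    term = identity(len(X))
    for k in range(1, terms):
        term = [[entry / k for entry in row] for row in mat_mul(term, X)]
        total = mat_add(total, term)
    return total

def grouped_sum(X, Y, orders=25):
    """Sum over k < orders of sum over i = 0..k of X^i Y^(k-i) / (i! (k-i)!)."""
    n = len(X)
    x_pows, y_pows = [identity(n)], [identity(n)]
    for _ in range(orders - 1):
        x_pows.append(mat_mul(x_pows[-1], X))
        y_pows.append(mat_mul(y_pows[-1], Y))
    total = [[0.0] * n for _ in range(n)]
    for k in range(orders):
        for i in range(k + 1):
            coeff = 1.0 / (math.factorial(i) * math.factorial(k - i))
            piece = mat_mul(x_pows[i], y_pows[k - i])
            total = mat_add(total, [[coeff * p for p in row] for row in piece])
    return total

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

X = [[0.1, 0.4], [0.2, 0.3]]  # arbitrary example matrix
Y = mat_mul(X, X)             # Y = X^2 commutes with X

grouped = grouped_sum(X, Y)
print(max_abs_diff(mat_mul(X, Y), mat_mul(Y, X)))              # X Y versus Y X
print(max_abs_diff(grouped, mat_mul(mat_exp(X), mat_exp(Y))))  # versus exp X exp Y
print(max_abs_diff(grouped, mat_exp(mat_add(X, Y))))           # versus exp(X + Y)
```

All three printed differences should be at the level of floating-point rounding.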
We really need X Y = Y X. For example, (X plus Y) squared equals X squared plus X Y plus Y X plus Y squared, which is not equal to X squared plus two X Y plus Y squared, unless X Y = Y X.
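To see the failure concretely, here is a small sketch with a pair of matrices that do not commute. The matrices X = 0, 1, 0, 0 and Y = 0, 0, 1, 0 are chosen for this illustration (they are not from the video), and exp is again approximated by a truncated series with hypothetical helper names.

```python
import math

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_exp(X, terms=30):
    """Truncated series: sum of X^n / n! for n = 0 .. terms - 1."""
    n = len(X)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[entry / k for entry in row] for row in mat_mul(term, X)]
        total = mat_add(total, term)
    return total

X = [[0.0, 1.0], [0.0, 0.0]]
Y = [[0.0, 0.0], [1.0, 0.0]]
S = mat_add(X, Y)

# The two sides of the (false, in general) binomial identity for k = 2.
square_of_sum = mat_mul(S, S)
XY = mat_mul(X, Y)
naive = mat_add(mat_add(mat_mul(X, X), [[2 * a for a in row] for row in XY]),
                mat_mul(Y, Y))

# The two sides of property (3), which also fails here.
exp_product = mat_mul(mat_exp(X), mat_exp(Y))
exp_sum = mat_exp(S)

print(XY, mat_mul(Y, X))     # X Y and Y X differ
print(square_of_sum, naive)  # (X + Y)^2 versus X^2 + 2 X Y + Y^2
print(exp_product, exp_sum)  # exp X exp Y versus exp(X + Y)
```

Here X squared and Y squared are both zero, so exp X = I + X and exp Y = I + Y exactly, while exp of (X plus Y) has cosh 1 and sinh 1 as its entries.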
The double-sum rearrangement in the proof of (3) is called the Cauchy product formula, and it works for any pair of absolutely convergent power series.
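For ordinary numbers, the Cauchy product is easy to try out. The sketch below multiplies the series for exp x and exp y by grouping terms of the same total order and checks that the k-th group is (x + y) to the k over k factorial. The function name cauchy_product, the sample values of x and y, and the 30-term cutoff are assumptions made for this illustration.

```python
import math

def cauchy_product(a, b):
    """Grouped terms c_k = sum over i = 0..k of a_i * b_(k - i)."""
    n = min(len(a), len(b))
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(n)]

x, y, terms = 0.5, 1.25, 30  # arbitrary sample values and cutoff

a = [x ** n / math.factorial(n) for n in range(terms)]  # terms of exp(x)
b = [y ** n / math.factorial(n) for n in range(terms)]  # terms of exp(y)
c = cauchy_product(a, b)

# Each grouped term should match the k-th term of the series for exp(x + y).
expected = [(x + y) ** k / math.factorial(k) for k in range(terms)]
largest_gap = max(abs(u - v) for u, v in zip(c, expected))

print(largest_gap)
print(sum(c), math.exp(x + y))
```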