Optional: Matrix norms

Matrix norms

We defined $\exp(A) = \sum_{n=0}^{\infty} \frac{1}{n!} A^n$, where $A^0 = I$, and claimed that it converges. What does it mean for a sequence of matrices to converge?
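
For readers who like to experiment, here is a minimal numerical sketch (not part of the notes; it assumes NumPy and SciPy are available) comparing the partial sums of this series with SciPy's built-in matrix exponential. The largest entry of the difference shrinks rapidly as more terms are added.

    import numpy as np
    from scipy.linalg import expm

    # An arbitrary 2x2 matrix, chosen only for illustration.
    A = np.array([[0.0, 1.0],
                  [-2.0, 0.5]])

    S = np.eye(2)       # partial sum, starting from the n = 0 term A^0 = I
    term = np.eye(2)    # current term A^n / n!
    for n in range(1, 15):
        term = term @ A / n            # build A^n / n! from the previous term
        S = S + term
        # Largest entry (in absolute value) of the difference from SciPy's expm(A):
        print(n, np.abs(S - expm(A)).max())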

Definition:

A sequence $A_k$ of matrices converges to a matrix $A$ if, for all $\epsilon > 0$, there exists an $N$ such that $\|A_k - A\| < \epsilon$ whenever $k \geq N$.

Here, $\|\cdot\|$ denotes a matrix norm, i.e. a way of measuring how "big" a matrix is.

Definition:

A matrix norm is any function $\|\cdot\| \colon \mathfrak{gl}(n,\mathbf{R}) \to \mathbf{R}$ satisfying:

  1. (triangle inequality) $\|A + B\| \leq \|A\| + \|B\|$ for all $A, B \in \mathfrak{gl}(n,\mathbf{R})$;

  2. $\|\lambda A\| = |\lambda|\,\|A\|$ for all $A \in \mathfrak{gl}(n,\mathbf{R})$ and $\lambda \in \mathbf{R}$;

  3. $\|A\| = 0$ if and only if $A = 0$.

We will focus on two particular matrix norms.

$L^1$ norm

Definition:

The $L^1$-norm of $A$ is $\|A\|_{L^1} = \sum_{i,j} |A_{ij}|$. So a matrix is "big in the $L^1$ norm" if it has an entry with large absolute value.

If $\|A\|_{L^1} = 0$ then $A_{ij} = 0$ for all $i, j$, so $A = 0$. If we rescale $A$ by $\lambda$ then all entries are scaled by $\lambda$, so $\|\lambda A\|_{L^1} = |\lambda|\,\|A\|_{L^1}$. The triangle inequality can be deduced by applying the triangle inequality for absolute values to each matrix entry.
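
As a quick sanity check (a sketch, assuming NumPy; the helper l1_norm below is just the entrywise sum of absolute values described above), the three axioms can be tested numerically on random matrices:

    import numpy as np

    def l1_norm(A):
        # Sum of the absolute values of all entries of A.
        return np.abs(A).sum()

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3))
    B = rng.standard_normal((3, 3))
    lam = -2.5

    print(l1_norm(A + B) <= l1_norm(A) + l1_norm(B))            # triangle inequality
    print(np.isclose(l1_norm(lam * A), abs(lam) * l1_norm(A)))  # scaling property
    print(l1_norm(np.zeros((3, 3))) == 0.0)                     # the zero matrix has norm 0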

Operator norm

Definition:

The operator norm of $A$ is defined to be $\|A\|_{op} = \max\{|Au| : u \in \mathbf{R}^n,\ |u| = 1\}$, where $|v|$ denotes the Euclidean length of $v \in \mathbf{R}^n$, i.e. $|v| = \sqrt{v_1^2 + \cdots + v_n^2}$.

Another way to think of this is: you take the unit sphere in $\mathbf{R}^n$, you apply $A$ to obtain an ellipsoid, and you take the furthest distance of a point on this ellipsoid from the origin.

Example:

Take $A = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$. The $x$-axis is fixed and the $y$-axis is rescaled by a factor of 2. The unit circle therefore becomes an ellipse whose horizontal semi-axis has length 1 and whose vertical semi-axis has length 2. The furthest points from the origin are the north and south poles of this ellipse, at distance 2, so $\|A\|_{op} = 2$. (A numerical check is sketched after the figure below.)

(Figure: the unit circle is sent to an ellipse.)
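
Here is that numerical check (a sketch, assuming NumPy; it uses the standard fact, not proved in these notes, that np.linalg.norm with ord=2 returns the largest singular value of a matrix, which is exactly this operator norm), together with a crude approximation obtained by sampling unit vectors:

    import numpy as np

    A = np.array([[1.0, 0.0],
                  [0.0, 2.0]])

    # The operator norm equals the largest singular value, which NumPy computes directly.
    print(np.linalg.norm(A, 2))                        # 2.0

    # Alternatively, approximate the maximum of |Au| over unit vectors u
    # by sampling points on the unit circle.
    thetas = np.linspace(0.0, 2.0 * np.pi, 1000)
    us = np.stack([np.cos(thetas), np.sin(thetas)])    # columns are unit vectors in R^2
    print(np.max(np.linalg.norm(A @ us, axis=0)))      # approximately 2.0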

To see that the operator norm is a norm, note that:

  1. If $\|A\|_{op} = 0$ then all points on the ellipsoid image of the unit sphere are a distance 0 from the origin, so the image of the unit sphere is the origin. Since every vector is a scalar multiple of a unit vector, this gives $Av = 0$ for all $v$, so $A = 0$.

  2. If you rescale $A$ by $\lambda$ then the lengths of the vectors $Au$ over which we are taking the maximum are rescaled by $|\lambda|$, so $\|\lambda A\|_{op} = |\lambda|\,\|A\|_{op}$.

  3. To prove the triangle inequality, note that for any unit vector $u$ we have $|(A + B)u| = |Au + Bu| \leq |Au| + |Bu| \leq \|A\|_{op} + \|B\|_{op}$. Since $\|A + B\|_{op} = \max\{|(A + B)u| : |u| = 1\}$, this shows $\|A + B\|_{op} \leq \|A\|_{op} + \|B\|_{op}$.

Lipschitz equivalence

Lemma:

Any two matrix norms on $\mathfrak{gl}(n,\mathbf{R})$ are Lipschitz equivalent. For the two norms we've met so far, this means there exist constants $C_1, C_2, D_1, D_2$ (independent of $A$) such that $C_1 \|A\|_{L^1} \leq \|A\|_{op} \leq C_2 \|A\|_{L^1}$ and $D_1 \|A\|_{op} \leq \|A\|_{L^1} \leq D_2 \|A\|_{op}$.

Remark:

These inequalities will be useful in the proof of convergence: sometimes it's easier to bound one norm than the other, and the lemma tells us that a bound on one gives a bound on the other. It also tells us that the notion of convergence we defined (where we had implicitly picked a matrix norm) doesn't depend on which norm we picked.

We won't prove the lemma, but for those who are interested, it's true more generally that any two norms on a finite-dimensional vector space are Lipschitz equivalent. This fails for infinite-dimensional vector spaces, but we're working with $\mathfrak{gl}(n,\mathbf{R})$, which is $n^2$-dimensional.
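
One can see the equivalence claimed in the lemma experimentally (a sketch, assuming NumPy; it makes no attempt to find the optimal constants): for random $3 \times 3$ matrices, the ratio $\|A\|_{op} / \|A\|_{L^1}$ stays bounded away from $0$ and from above.

    import numpy as np

    rng = np.random.default_rng(1)
    ratios = []
    for _ in range(1000):
        A = rng.standard_normal((3, 3))
        op = np.linalg.norm(A, 2)    # operator norm (largest singular value)
        l1 = np.abs(A).sum()         # L^1 norm (sum of absolute values of entries)
        ratios.append(op / l1)

    # The ratio stays bounded away from 0 and from above,
    # as predicted by Lipschitz equivalence of the two norms.
    print(min(ratios), max(ratios))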

Properties of the operator norm

We will now prove some useful properties of the operator norm. Since we're only focusing on this norm, we will drop the subscript $op$ and write it simply as $\|A\|$.

Lemma:
  1. For any vector $v \in \mathbf{R}^n$, $|Av| \leq \|A\|\,|v|$.

  2. $\|AB\| \leq \|A\|\,\|B\|$.

  3. $\|A^m\| \leq \|A\|^m$.

Proof:

  1. Write $v$ as $|v|u$ for some unit vector $u$. Then $|Av| = |v|\,|Au| \leq \|A\|\,|v|$ because $|Au| \leq \|A\|$ by definition of the operator norm.

  2. Let $u$ be a unit vector. We have $|ABu| \leq \|A\|\,|Bu| \leq \|A\|\,\|B\|\,|u| = \|A\|\,\|B\|$, using the first part of the lemma twice. This shows that the quantities we are maximising over to get $\|AB\|$ are all at most $\|A\|\,\|B\|$, so $\|AB\| \leq \|A\|\,\|B\|$.

  3. Lastly, $\|A^m\| \leq \|A^{m-1}\|\,\|A\| \leq \cdots \leq \|A\|^m$, using the previous part of the lemma inductively.
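
Finally, a small numerical check of all three properties on random matrices (a sketch, assuming NumPy, with np.linalg.norm(..., 2) playing the role of the operator norm):

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((4, 4))
    B = rng.standard_normal((4, 4))
    v = rng.standard_normal(4)
    m = 5

    def op(M):
        # Operator norm of M: the largest singular value.
        return np.linalg.norm(M, 2)

    print(np.linalg.norm(A @ v) <= op(A) * np.linalg.norm(v))   # |Av| <= ||A|| |v|
    print(op(A @ B) <= op(A) * op(B))                           # ||AB|| <= ||A|| ||B||
    print(op(np.linalg.matrix_power(A, m)) <= op(A) ** m)       # ||A^m|| <= ||A||^m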