APPLIED MATRIX THEORY
Lecture Notes for Math 464/514
Presented by Dr. Monika Nitsche
Typeset and Edited by Eric M. Benner
Students Press, December 3, 2013. Copyright © 2013

Contents

1 Introduction to Linear Algebra
  1.1 Lecture 1 (August 19, 2013): About the class; Linear systems; Example: application to a boundary value problem; Analysis of error; Solution of the discretized equation.

2 Matrix Inversion
  2.1 Lecture 2 (August 21, 2013): Gaussian elimination; Inner-product based implementation; Office hours and other class notes; Example: Gauss elimination.
  2.2 Lecture 3 (August 23, 2013): Example: Gauss elimination, cont.; Operation cost of forward elimination; Cost of the order of an algorithm; Validation of lower/upper triangular form; Theoretical derivation of lower/upper form.
  2.3 HW 1: due August 30, 2013.

3 Factorization
  3.1 Lecture 4 (August 26, 2013): Elementary matrices; Solution using the lower/upper factorization; Sparse and banded matrices; Motivation for Gauss elimination with pivoting.
  3.2 Lecture 5 (August 28, 2013): Motivation for Gauss elimination with pivoting, cont.; Discussion of well-posedness; Gaussian elimination with pivoting.
  3.3 Lecture 6 (August 30, 2013): Discussion of HW problem 2; PLU factorization.
  3.4 Lecture 7 (September 4, 2013): PLU factorization; Triangular matrices; Multiplication of lower triangular matrices; Inverse of a lower triangular matrix; Uniqueness of the LU factorization; Existence of the LU factorization.
  3.5 Lecture 8 (September 6, 2013): About homeworks; Discussion of ill-conditioned systems; Inversion of lower triangular matrices; Example of LU decomposition of a lower triangular matrix; Banded matrix example.
  3.6 Lecture 9 (September 9, 2013): Existence of the LU factorization (cont.); Rectangular matrices.
  3.7 HW 2: due September 13, 2013.

4 Rectangular Matrices
  4.1 Lecture 10 (September 11, 2013): Rectangular matrices (cont.); Example of RREF of a rectangular matrix.
  4.2 Lecture 11 (September 13, 2013): Solving Ax = b; Example; Linear functions; Example: transpose operator; Example: trace operator; Matrix multiplication; Proof of transposition property.
  4.3 Lecture 12 (September 16, 2013): Inverses; Low rank perturbations of I; The Sherman–Morrison formula; Finite difference example with periodic boundary conditions; Examples of perturbation; Small perturbations of I.
  4.4 Lecture 13 (September 18, 2013): Small perturbations of I (cont.); Matrix norms; Condition number.
  4.5 HW 3: due September 27, 2013.

5 Vector Spaces
  5.1 Lecture 14 (September 20, 2013): Topics in vector spaces; Field; Vector space; Examples of function spaces.
  5.2 Lecture 15 (September 23, 2013): The four subspaces of an m × n matrix A.
  5.3 Lecture 16 (September 25, 2013): The four subspaces of A; Linear independence.
  5.4 Lecture 17 (September 27, 2013): Linear functions (review); Review for exam; Previous lecture continued.
  5.5 Lecture 18 (October 2, 2013): Exams and points; Continuation of last lecture.

6 Least Squares
  6.1 Lecture 19 (October 4, 2013): Least squares.
  6.2 Lecture 20 (October 7, 2013): Properties of transpose multiplication; The normal equations; Exam 1.
  6.3 Lecture 21 (October 9, 2013): Exam review; Least squares and minimization.
  6.4 HW 4: due October 21, 2013.

7 Linear Transformations
  7.1 Lecture 22 (October 14, 2013): Linear transformations; Examples of linear functions; Matrix representation of linear transformations.
  7.2 Lecture 23 (October 16, 2013): Basis of a linear transformation; Action of a linear transform; Change of basis.
  7.3 Lecture 24 (October 21, 2013): Change of basis (cont.).
  7.4 Lecture 25 (October 23, 2013): Properties of special bases; Invariant subspaces.
  7.5 HW 5: due November 4, 2013.

8 Norms
  8.1 Lecture 26 (October 25, 2013): Definition of norms; Vector norms; The two norm; Matrix norms; Induced norms.
  8.2 Lecture 27 (October 28, 2013): Matrix norms (review); Frobenius norm; Induced matrix norms.
  8.3 Lecture 28 (October 30, 2013): The 2-norm.

9 Orthogonalization with Projection and Rotation
  9.1 Lecture 28 (cont.): Inner product spaces.
  9.2 Lecture 29 (November 1, 2013): Inner product spaces; Fourier expansion; Orthogonalization process (Gram–Schmidt).
  9.3 Lecture 30 (November 4, 2013): Gram–Schmidt orthogonalization.
  9.4 Lecture 31 (November 6, 2013): Unitary (orthogonal) matrices; Rotation; Reflection.
  9.5 HW 6: due November 11, 2013.
  9.6 Lecture 32 (November 8, 2013): Elementary orthogonal projectors; Elementary reflection; Subspaces of V; Complementary projectors.
  9.7 Lecture 33 (November 11, 2013): Projectors; Representation of a projector.
  9.8 Lecture 34 (November 13, 2013): Projectors; Decompositions of R^n; Range-nullspace decomposition of an n × n matrix A.
  9.9 HW 7: due November 22, 2013.
  9.10 Lecture 35 (November 15, 2013): Range-nullspace decomposition of an n × n matrix A; Corresponding factorization of A.

10 Singular Value Decomposition
  10.1 Lecture 35 (cont.): Singular value decomposition.
  10.2 Lecture 36 (November 18, 2013): Singular value decomposition; Existence of the singular value decomposition.
  10.3 Lecture 37 (November 20, 2013): Review and correction from last time; Singular value decomposition; Geometric interpretation.
  10.4 Lecture 38 (November 22, 2013): Review for Exam 2; Norms; More major topics.
  10.5 HW 8: due December 10, 2013.
  10.6 Lecture 39 (November 27, 2013): Singular value decomposition; SVD in Matlab.

11 Additional Topics
  11.1 Lecture 39 (cont.): The determinant.
  11.2 Lecture 40 (December 2, 2013): Further details for class; Diagonalizable matrices; Eigenvalues and eigenvectors.

Index
Other Contents

UNIT 1
Introduction to Linear Algebra

1.1 Lecture 1: August 19, 2013

About the class

The textbook for the class will be Matrix Analysis and Applied Linear Algebra by Meyer. Another highly recommended text is Laub's Matrix Analysis for Scientists and Engineers.

Linear Systems

A linear system may be written in the general form

    Ax = b.    (1.1.1)

This may be represented in several equivalent ways. As a system of equations:

    2x1 + x2 − 3x3 = 18,    (1.1.2a)
    −4x1 + 5x3 = −28,    (1.1.2b)
    6x1 + 13x2 = 37.    (1.1.2c)

This also may be put in matrix form:

    [ 2   1  −3 ] [x1]   [ 18]
    [−4   0   5 ] [x2] = [−28]    (1.1.3)
    [ 6  13   0 ] [x3]   [ 37]

Finally, the third common form is vector form:

    [ 2]      [ 1]      [−3]      [ 18]
    [−4] x1 + [ 0] x2 + [ 5] x3 = [−28]    (1.1.4)
    [ 6]      [13]      [ 0]      [ 37]

Figure 1.1. Finite difference approximation of a 1D boundary value problem.

Example: Application to boundary value problem

We will use finite difference approximations on a uniform grid to solve the problem

    −y''(t) = f(t),  for t ∈ [0, 1],    (1.1.5)

with the boundary conditions y(0) = 0, y(1) = 0.
    (1.1.6)

This is a 1D version of the general Laplace equation,

    −∆u = f,    (1.1.7)

or, in more engineering/science notation,

    −∇²u = f.    (1.1.8)

The Laplace operator in Cartesian coordinates is

    ∇²u = ∇ · (∇u) = uxx + uyy + uzz.    (1.1.9)

Finite Difference Approximation

Let tj = j∆t, with j = 0, ..., N, and let yj ≈ y(tj) denote the approximate solution. Now we need to approximate the derivatives with discrete values of the variables. The forward difference approximation is

    y'(tj) ≈ (y_{j+1} − y_j)/(t_{j+1} − t_j) = (y_{j+1} − y_j)/∆t.    (1.1.10, 1.1.11)

The backward difference approximation is

    y'(tj) ≈ (y_j − y_{j−1})/∆t.    (1.1.12)

The centered difference approximation is

    y'(tj) ≈ (y_{j+1} − y_{j−1})/(2∆t).    (1.1.13)

Each of these is a useful approximation to the first derivative, with varying properties when applied to specific differential equations. The second derivative may be approximated by combining approximations of the first derivative at the half points:

    y''(tj) ≈ [y'(t_{j+1/2}) − y'(t_{j−1/2})]/∆t
            = [(y_{j+1} − y_j)/∆t − (y_j − y_{j−1})/∆t]/∆t
            = (y_{j+1} − 2y_j + y_{j−1})/∆t².    (1.1.14)

Analysis of error

To understand the error of this approximation we may utilize the Taylor series. A general Taylor series is

    f(x) = f(a) + f'(a)(x − a) + (1/2)f''(a)(x − a)² + (1/3!)f'''(a)(x − a)³ + ···    (1.1.15)

By the Taylor remainder theorem, we may capture the error with a special truncation of the series,

    f(x) = f(a) + f'(a)(x − a) + (1/2)f''(a)(x − a)² + (1/3!)f'''(ξ)(x − a)³,    (1.1.16)

or simply

    f(x) = f(a) + f'(a)(x − a) + (1/2)f''(a)(x − a)² + O((x − a)³).    (1.1.17)

The difference we are interested in, to find the error, is

    E = y''(tj) − [y(t_{j+1}) − 2y(tj) + y(t_{j−1})]/∆t².    (1.1.18)

The Taylor series

    y(t_{j+1}) = y(tj + ∆t) = y(tj) + y'(tj)∆t + (1/2)y''(tj)∆t² + O(∆t³),    (1.1.19a)
    y(t_{j−1}) = y(tj − ∆t) = y(tj) − y'(tj)∆t + (1/2)y''(tj)∆t² + O(∆t³)    (1.1.19b)

will need to be substituted.
A function g is said to be of order 2, written g = O(h²), if

    |g| ≤ Ch²    (1.1.20)

for some constant C.

Solution of the discretized equation

We now substitute the discrete difference,

    −(y_{j+1} − 2y_j + y_{j−1})/∆t² = f(tj),  for j = 1, ..., n − 1,    (1.1.21)

and the boundary conditions become

    y0 = 0,  yn = 0.    (1.1.22)

This gives the linear system, which will need to be solved for the unknowns yj:

    [ 2 −1            ] [y1     ]       [f(t1)     ]
    [−1  2 −1         ] [y2     ]       [f(t2)     ]
    [   −1  2  ⋱      ] [  ⋮    ] = ∆t² [   ⋮      ]    (1.1.23)
    [      ⋱  ⋱   −1  ] [y_{n−2}]       [f(t_{n−2})]
    [          −1   2 ] [y_{n−1}]       [f(t_{n−1})]

UNIT 2
Matrix Inversion

2.1 Lecture 2: August 21, 2013

Last time we derived a tridiagonal system for the finite difference solution.

Gaussian Elimination

We want to solve Ax = b. Claim: Gaussian elimination produces a factorization A = LU. Notation: A = [aij].    (2.1.1)

In class we use underlines to indicate vectors. In general these vectors are column vectors, and we will use x^T to indicate the row vector.

Lower triangular system Lx = b:

    [ℓ11                 ] [x1]   [b1]
    [ℓ21 ℓ22             ] [x2]   [b2]
    [ℓ31 ℓ32 ℓ33         ] [ ⋮] = [ ⋮]    (2.1.2)
    [ ⋮   ⋮        ⋱     ]
    [ℓn1 ℓn2 ℓn3 ··· ℓnn ] [xn]   [bn]

or

    ℓ11 x1 = b1,    (2.1.3a)
    ℓ21 x1 + ℓ22 x2 = b2,    (2.1.3b)
    ···
    ℓn1 x1 + ℓn2 x2 + ··· + ℓnn xn = bn.    (2.1.3c)

Rearranging to solve the equations in order,

    x1 = b1/ℓ11,    (2.1.4a)
    x2 = (b2 − ℓ21 x1)/ℓ22,    (2.1.4b)
    ···
    xi = [bi − (ℓi1 x1 + ··· + ℓ_{i,i−1} x_{i−1})]/ℓii.    (2.1.4c)

The basic algorithm for solution of the above system, in pseudocode:

1: x1 ← b1/ℓ11
2: for i ← 2, n do
3:     xi ← [bi − Σ_{k=1}^{i−1} ℓik xk]/ℓii
4: end for

The operation count, Nops, becomes

    Nops = 1 + Σ_{i=2}^{n} [1 (division) + 1 (subtraction) + (i − 1) (multiplications) + (i − 2) (additions)].    (2.1.5)

Each of the terms arises directly from the steps of the algorithm shown above.
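The discretized system (1.1.23) is straightforward to assemble and solve numerically. Below is a minimal sketch in Python/NumPy (the course uses Matlab; the choice f(t) = π² sin(πt) is only for illustration, since then the exact solution of −y'' = f with zero boundary values is y(t) = sin(πt)):

```python
import numpy as np

n = 50
dt = 1.0 / n
t = np.linspace(0.0, 1.0, n + 1)

# Assemble the (n-1) x (n-1) tridiagonal matrix with 2 on the diagonal
# and -1 on the off-diagonals.
v = np.ones(n - 1)
A = 2 * np.diag(v) - np.diag(v[:-1], 1) - np.diag(v[:-1], -1)

# Right-hand side: dt^2 * f(t_j) at the interior points t_1, ..., t_{n-1}.
f = lambda x: np.pi**2 * np.sin(np.pi * x)
rhs = dt**2 * f(t[1:-1])

# Solve for the interior values and attach the boundary values y(0) = y(1) = 0.
y = np.zeros(n + 1)
y[1:-1] = np.linalg.solve(A, rhs)

# Compare with the exact solution; the error should be O(dt^2).
err = np.max(np.abs(y - np.sin(np.pi * t)))
```

Halving ∆t should reduce the error by roughly a factor of four, which is one way to confirm the second-order accuracy derived above.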
ASIDE: Finite sums

We need the following sums for our derivations of the operation counts:

    Σ_{i=1}^{n} i = n(n + 1)/2,    (2.1.6)
    Σ_{i=1}^{n} i² = n(n + 1)(2n + 1)/6.    (2.1.7)

Evaluating the operation count,

    Nops = 1 + Σ_{i=2}^{n} (2i − 1)    (2.1.8a)
         = Σ_{i=1}^{n} (2i − 1)    (2.1.8b)
         = 2(Σ_{i=1}^{n} i) − n    (2.1.8c)
         = n(n + 1) − n    (2.1.8d)
         = n².    (2.1.8e)

Implementation of lower triangular solution in Matlab

We give a Matlab code for this solution (note that each step must finish by dividing by L(i,i)):

function x = Ltrisol(L, b)
% Solve Lx = b, assuming L(i,i) ~= 0
n = length(b);
x = zeros(n, 1);            % initialize x as a column vector
x(1) = b(1)/L(1,1);
for i = 2:n
    x(i) = b(i);
    for k = 1:i-1
        x(i) = x(i) - L(i,k)*x(k);
    end
    x(i) = x(i)/L(i,i);
end
end

This would be saved as the file Ltrisol.m and would be run as

>> L = ...; b = ...;
>> x = Ltrisol(L, b)

Warning: Matlab loops are very slow!

Inner-product based implementation

How do we rewrite the code using inner products? We can replace the inner for-loop with a single inner product:

function x = Ltrisol(L, b)
% Solve Lx = b, assuming L(i,i) ~= 0
n = length(b);
x = zeros(n, 1);            % initialize x as a column vector
x(1) = b(1)/L(1,1);
for i = 2:n
    x(i) = (b(i) - L(i,1:i-1)*x(1:i-1))/L(i,i);
end
end

Note that L(i,1:i-1) is a row vector and x(1:i-1) is a column vector, so their product is a scalar and this code works fine. Recall that this requires x to be initialized as a column vector. The loop body can also be written more cleanly as:

function x = Ltrisol(L, b)
% Solve Lx = b, assuming L(i,i) ~= 0
n = length(b);
x = zeros(n, 1);
x(1) = b(1)/L(1,1);
for i = 2:n
    k = 1:i-1;
    x(i) = (b(i) - L(i,k)*x(k))/L(i,i);
end
end
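For comparison with the Matlab version, the same inner-product based forward substitution can be sketched in Python/NumPy (the function name ltrisol mirrors the Matlab file and is only illustrative):

```python
import numpy as np

def ltrisol(L, b):
    """Solve Lx = b by forward substitution, assuming L[i, i] != 0."""
    n = len(b)
    x = np.zeros(n)
    x[0] = b[0] / L[0, 0]
    for i in range(1, n):
        # Inner product of the first i entries of row i with the known x's.
        x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x

L = np.array([[2.0, 0.0, 0.0],
              [1.0, 3.0, 0.0],
              [4.0, 5.0, 6.0]])
b = np.array([2.0, 5.0, 20.0])
x = ltrisol(L, b)
```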
Office hours and other class notes

Office hours will be from 12–1 on MWF. The course web address is www.math.unm.edu/~nitsche/math464.html.

Example: Gauss Elimination

Example:

    2x1 − x2 + 3x3 = 13,    (2.1.9a)
    −4x1 + 6x2 − 5x3 = −28,    (2.1.9b)
    6x1 + 13x2 + 16x3 = 37.    (2.1.9c)

Let's perform each step in full equation form. With multipliers ℓ21 = −2 and ℓ31 = 3, we execute the steps R2 → R2 − (−2)R1 and R3 → R3 − 3R1:

    2x1 − x2 + 3x3 = 13,    (2.1.10a)
    4x2 + x3 = −2,    (2.1.10b)
    16x2 + 7x3 = −2.    (2.1.10c)

The next step will be R3 → R3 − 4R2.

2.2 Lecture 3: August 23, 2013

Example: Gauss Elimination, cont.

Example (repeated from last lecture):

    2x1 − x2 + 3x3 = 13,    (2.2.1a)
    −4x1 + 6x2 − 5x3 = −28,    (2.2.1b)
    6x1 + 13x2 + 16x3 = 37.    (2.2.1c)

Performing each step in full equation form, we execute R2 → R2 − (−2)R1 and R3 → R3 − 3R1:

    2x1 − x2 + 3x3 = 13,    (2.2.2a)
    4x2 + x3 = −2,    (2.2.2b)
    16x2 + 7x3 = −2.    (2.2.2c)

The next step is R3 → R3 − 4R2:

    2x1 − x2 + 3x3 = 13,    (2.2.3a)
    4x2 + x3 = −2,    (2.2.3b)
    3x3 = 6.    (2.2.3c)

Now we begin the backward substitution:

    x3 = 2;    (2.2.4a)
    x2 = (−2 − x3)/4 = −1;    (2.2.4b)
    x1 = (13 + x2 − 3x3)/2 = 3.    (2.2.4c)

Gauss elimination is forward elimination followed by backward substitution. Now we will do the same problem in augmented matrix form:

    [ 2 −1  3 | 13]    [2 −1 3 | 13]    [2 −1 3 | 13]
    [−4  6 −5 |−28] → [0  4 1 | −2] → [0  4 1 | −2]    (2.2.5)
    [ 6 13 16 | 37]    [0 16 7 | −2]    [0  0 3 |  6]

Operation Cost of Forward Elimination

Now we want to know the operation count for the forward elimination step, taking A → U without pivoting, for a general n × n matrix A = [aij].
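The elimination and back-substitution steps above can be checked numerically. A small Python/NumPy sketch of unpivoted Gauss elimination applied to this 3 × 3 example:

```python
import numpy as np

A = np.array([[ 2.0, -1.0,  3.0],
              [-4.0,  6.0, -5.0],
              [ 6.0, 13.0, 16.0]])
b = np.array([13.0, -28.0, 37.0])

# Forward elimination (no pivoting) on the augmented matrix [A | b].
M = np.hstack([A, b[:, None]])
n = len(b)
for i in range(n - 1):
    for j in range(i + 1, n):
        lij = M[j, i] / M[i, i]        # multiplier l_ij
        M[j, i:] -= lij * M[i, i:]     # row_j <- row_j - l_ij * row_i

# Back substitution on the resulting upper triangular system.
x = np.zeros(n)
for i in range(n - 1, -1, -1):
    x[i] = (M[i, n] - M[i, i + 1:n] @ x[i + 1:n]) / M[i, i]
```

This reproduces the solution found by hand, x = (3, −1, 2).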
As an example of the first step (shown for n = 5):

    [a11 a12 a13 a14 a15]    [a11 a12  a13  a14  a15 ]
    [a21 a22 a23 a24 a25]    [ 0  a'22 a'23 a'24 a'25]
    [a31 a32 a33 a34 a35] → [ 0  a'32 a'33 a'34 a'35]    (2.2.6a)
    [a41 a42 a43 a44 a45]    [ 0  a'42 a'43 a'44 a'45]
    [a51 a52 a53 a54 a55]    [ 0  a'52 a'53 a'54 a'55]

These operations are rowj → rowj − ℓij rowi, where ℓij = aij/aii if aii ≠ 0 (aii should not be close to zero, or we will need to use pivoting). For example, the eliminated entry becomes aji → aji − (aji/aii)aii = 0. The next step:

    [a11 a12  a13   a14   a15  ]
    [ 0  a'22 a'23  a'24  a'25 ]
    [ 0   0   a''33 a''34 a''35]    (2.2.6b)
    [ 0   0   a''43 a''44 a''45]
    [ 0   0   a''53 a''54 a''55]

Figure 2.1. One-dimensional discrete grids: (a) n grid; (b) 4n grid.

At the ith step (i = 1 : n − 1), the trailing block is updated,

    B_{(n−i)×(n−i)} → B̃_{(n−i)×(n−i)},    (2.2.7)

at a cost of (n − i) operations to compute the multipliers ℓij plus 2(n − i)² operations to update the entries aij. The total cost is thus

    Nops = Σ_{i=1}^{n−1} [(n − i) + 2(n − i)²].    (2.2.8a)

Let k = n − i; then i = 1 → k = n − 1 and i = n − 1 → k = 1, so

    Nops = Σ_{k=1}^{n−1} (k + 2k²)    (2.2.8b)
         = (n − 1)n/2 + 2(n − 1)n(2n − 1)/6,    (2.2.8c)

where the first term is O(n²) and the second is O(n³), so

    Nops = O(n³).    (2.2.8d)

This means that the cost of the algorithm scales with order 3.

Cost of the Order of an Algorithm

For an order 3 algorithm, if you increase the size of your matrix by a factor of 2, the expense in computer time will increase by a factor of 8. Similarly, if it took one day to solve a boundary value problem in 1D with n = 1000, then it will take 64 days to do n = 4000 (see figure 2.1). Alternatively, if you are doing a 2D simulation, refining each dimension by a factor of 4 (as shown in figure 2.2) increases the number of unknowns by a factor of 16, and thus the work grows by a factor of 16³. This gets very expensive! This is one of the reasons that modeling phenomena such as the weather is so difficult.
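The closed-form count can be sanity-checked against a direct summation of the per-step costs; a short Python sketch:

```python
# Cost of forward elimination: at step i there are (n - i) multipliers
# and 2*(n - i)^2 operations to update the trailing block.
def nops(n):
    return sum((n - i) + 2 * (n - i)**2 for i in range(1, n))

# Closed form from the finite-sum identities (sums run to k = n - 1).
def closed_form(n):
    return (n - 1) * n // 2 + 2 * (n - 1) * n * (2 * n - 1) // 6

checks = [nops(n) == closed_form(n) for n in (2, 5, 50, 500)]

# Doubling n should multiply the cost by roughly 2^3 = 8.
ratio = nops(1000) / nops(500)
```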
Figure 2.2. Two-dimensional discrete grids: (a) n × n grid; (b) 4n × 4n grid.

Validation of Lower/Upper Triangular Form

Gaussian elimination gives A = LU, where L is unit lower triangular with the multipliers below the diagonal:

    L = [ 1     0 ]
        [ℓij    1 ].    (2.2.9)

Check with our previous system:

    [ 2 −1  3]   [ 1 0 0] [2 −1 3]
    [−4  6 −5] = [−2 1 0] [0  4 1]    (2.2.10)
    [ 6 13 16]   [ 3 4 1] [0  0 3]

This works!

Theoretical derivation of Lower/Upper Form

We want to show that Gauss elimination naturally leads to the LU form using elementary row operations. The three elementary operations are:

1. Multiply a row by α;
2. Switch rowi and rowj;
3. Add a multiple of rowi to rowj.

All are equivalent to pre-multiplying A by an elementary matrix. Let's illustrate the first:

1. Multiply row 3 by α:

    [1 0 0     ] [a11 a12 a13 ··· a1n]   [ a11  a12  a13 ···  a1n]
    [0 1 0     ] [a21 a22 a23 ··· a2n]   [ a21  a22  a23 ···  a2n]
    [0 0 α     ] [a31 a32 a33 ··· a3n] = [αa31 αa32 αa33 ··· αa3n]    (2.2.11)
    [      ⋱   ] [        ⋮          ]   [          ⋮            ]
    [0 0 ···  1] [an1 an2 an3 ··· ann]   [ an1  an2  an3 ···  ann]

2.3 Homework Assignment 1: Due Friday, August 30, 2013

1. Use Taylor series expansions of f(x ± h) about x to show that

    f''(x) = [f(x + h) − 2f(x) + f(x − h)]/h² − (h²/12) f⁽⁴⁾(x) + O(h⁴).    (2.3.1)

2. Consider the two-point boundary value problem

    y''(x) = e^x,  y(−1) = 1/e,  y(1) = e,    (2.3.2)

where x ∈ [−1, 1]. Divide the interval [−1, 1] into N equal subintervals and apply the finite difference method presented in class to find the approximate solution yj ≈ y(xj) at the N − 1 interior points j = 1, ..., N − 1, where xj = a + jh, h = (b − a)/N, and [a, b] = [−1, 1]. Compare the approximate values at the grid points with the exact solution at the grid points. Use N = 2, 4, 8, ..., 2⁹ and report the maximal absolute error for each N in a table. Your writeup should contain:

• the Matlab code;
• a table with two columns: the first contains h, the second contains the corresponding maximal errors.
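The check in (2.2.10) can be repeated numerically; a Python/NumPy sketch:

```python
import numpy as np

L = np.array([[ 1.0, 0.0, 0.0],
              [-2.0, 1.0, 0.0],
              [ 3.0, 4.0, 1.0]])
U = np.array([[2.0, -1.0, 3.0],
              [0.0,  4.0, 1.0],
              [0.0,  0.0, 3.0]])
A = np.array([[ 2.0, -1.0,  3.0],
              [-4.0,  6.0, -5.0],
              [ 6.0, 13.0, 16.0]])

# L times U should recover A exactly.
ok = np.allclose(L @ U, A)
```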
By how much is the error reduced every time N is doubled? Can you conclude whether the error is O(h), O(h²), or O(h^p) for some other integer p?

Regarding Matlab: If needed, go over the Matlab tutorial on the course website, items 1–6. This covers more than you need for this problem. In Matlab, type help diag or help ones to find what these commands do. The (N−1)×(N−1) matrix with 2s on the diagonal and −1 on the off-diagonals can be constructed by

v = ones(1,n-1);
A = 2*diag(v) - diag(v(1:n-2),1) - diag(v(1:n-2),-1);

The system Ax = b can be solved in Matlab by x = A\b. The maximal difference between two vectors x and y is error = max(abs(x-y)). Your code should have the following structure:

Listing 2.1. Code stub for tridiagonal solver

disp(sprintf('       h               error'))
a = ...; b = ...;     % Set values of endpoints
ya = ...; yb = ...;   % Set values of y at the endpoints
for n = ...
    h = 2/n;
    x = a:h:b;
    % Set matrix A of the linear system to be solved.
    v = ones(1,n-1);
    A = 2*diag(v) - diag(v(1:n-2),1) - diag(v(1:n-2),-1);
    % Set right hand side of linear system.
    rhs = ...
    % Solve linear system to find approximate solution.
    y(2:n) = A\rhs;  y(1) = ya;  y(n+1) = yb;
    % Compute exact solution and approximation error.
    yex = ...                        % set exact solution
    plot(x, y, 'b-', x, yex, 'r-')   % to compare visually
    error = max(abs(y - yex))
    disp(sprintf('%15.10f %20.15f', h, error))
end

Note that in Matlab the index of all vectors starts with 1. Thus x = -1:h:1 is a vector of length n + 1 and the interior points are x(2:n).

3. Let U be an upper triangular n × n matrix with nonzero entries uij, j ≥ i.

(a) Write an algorithm that solves Ux = b for a given right hand side b for the unknown x.
(b) Find the number of operations that it takes to solve for x using your algorithm above.

(c) Write a Matlab function, function x = utrisol(u,b), that implements your algorithm and returns the solution x.

4. Given A, b below,

(a) find the LU factorization of A (using the Gauss elimination algorithm);
(b) use it to solve Ax = b.

    A = [ 2 −1  0  0]        [0]
        [−1  2 −1  0]    b = [0]    (2.3.3)
        [ 0 −1  2 −1]        [0]
        [ 0  0 −1  2]        [5]

5. Sparsity of L and U, given the sparsity of A = LU. If A, B, C, D have non-zeros in the positions marked by x, which zeros (marked by 0) are still guaranteed to be zero in their factors L and U? (B, C, D are all band matrices with p = 3 bands, but differing sparsity within the bands. The question is how much of this sparsity is preserved.) In each case, highlight the new nonzero entries in L and U.

x 0 x 0 0 0 0 x 0 x 0 0 x x x x x x x 0 x 0 x 0 x 0 , A= , B= 0 x x x 0 x 0 x 0 0 0 0 x 0 x 0 0 0 x x 0 0 0 x 0 x x 0 x C= 0 0 0 x x 0 x 0 0 x 0 x 0 x 0 0 x 0 x 0 x 0 0 x 0 x 0 0 0 0 , 0 0 x x 0 x D= 0 0 0 0 x 0 x 0 0 0 0 x 0 x 0 x 0 0 x 0 x 0 x 0 0 x 0 0 0 x , 0 0 x

6. Consider solving a differential equation in a unit cube, using N points to discretize each dimension. That is, you have a total of N³ points at which you want to approximate the solution. Suppose that at each time step you need to solve a linear system Ax = b, where A is an N³ × N³ matrix, which you solve using Gauss elimination, and suppose there are no other computations involved. Assume your personal computer runs at 1 gigaFLOPS, that is, it executes 10⁹ floating point operations per second.

(a) How much time does it take to solve your problem for N = 500 for 1000 timesteps?
(b) When you double the number of points N, you typically also have to halve the timestep, that is, double the total number of timesteps taken. By what factor does the runtime increase each time you double N?
(c) How much time will it take to solve the problem if you use N = 2000?
UNIT 3
Factorization

3.1 Lecture 4: August 26, 2013

For the h in the homework, use n = 2.^(1:1:10). We want to deduce the order of the method from the table of h and the error.

Elementary Matrices

1. Multiply rowi by α: E1 is the identity matrix with the (i, i) entry replaced by α,

    E1 = diag(1, ..., 1, α, 1, ..., 1).    (3.1.1)

Its inverse replaces α by 1/α,

    E1⁻¹ = diag(1, ..., 1, 1/α, 1, ..., 1),    (3.1.2)

so that

    E1 E1⁻¹ = I.    (3.1.3)

2. Exchange rowi and rowj: E2 is the identity matrix with rows i and j swapped, e.g. for rows 3 and 4 with n = 6:

    E2 = [1 0 0 0 0 0]
         [0 1 0 0 0 0]
         [0 0 0 1 0 0]    (3.1.4)
         [0 0 1 0 0 0]
         [0 0 0 0 1 0]
         [0 0 0 0 0 1]

It is its own inverse:

    E2² = I.    (3.1.5)

3. Replace rowj by rowj + α rowi: E3 is the identity matrix with an extra entry α in position (j, i),    (3.1.6)
and its inverse has −α in that position instead:

    E3⁻¹ E3 = I.    (3.1.7)

What happens if instead we post-multiply by the elementary matrices? Then the matrices act on the columns instead of the rows: AE1 multiplies column i of A by α,    (3.1.8)
and AE2 exchanges columns i and j of A.    (3.1.9)

Gaussian Elimination without pivoting

Premultiply by elementary matrices of type 3 repeatedly, with multipliers

    ℓji = aji/aii,  for j > i.    (3.1.10)

Writing E−ji for the type-3 matrix that subtracts ℓji rowi from rowj, the first step gives (for n = 5)

    E−21 A = [x x x x x]
             [0 x x x x]
             [x x x x x]    (3.1.11)
             [x x x x x]
             [x x x x x]
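The three elementary matrices and their inverses can be constructed explicitly. A Python/NumPy sketch (the helper names are illustrative, not from the notes):

```python
import numpy as np

def e_scale(n, i, alpha):
    """Type 1: multiply row i by alpha (identity with E[i, i] = alpha)."""
    E = np.eye(n)
    E[i, i] = alpha
    return E

def e_swap(n, i, j):
    """Type 2: exchange rows i and j of the identity."""
    E = np.eye(n)
    E[[i, j]] = E[[j, i]]
    return E

def e_add(n, j, i, alpha):
    """Type 3: row_j <- row_j + alpha * row_i (identity with E[j, i] = alpha)."""
    E = np.eye(n)
    E[j, i] = alpha
    return E

n = 4
E1 = e_scale(n, 2, 5.0)
E2 = e_swap(n, 0, 3)
E3 = e_add(n, 3, 1, 2.5)
```

Each inverse has the predicted form: 1/α for type 1, the swap itself for type 2, and −α for type 3.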
    E−31 E−21 A = [x x x x x]
                  [0 x x x x]
                  [0 x x x x]    (3.1.12)
                  [x x x x x]
                  [x x x x x]

This sequence continues until we have introduced zeros everywhere below the diagonal:

    E−n,n−1 ··· E−31 E−21 A = [x x x x x]
                              [0 x x x x]
                              [0 0 x x x] = U.    (3.1.13)
                              [0 0 0 x x]
                              [0 0 0 0 x]

Thus

    A = E21 E31 ··· En−1,n−2 En,n−2 En,n−1 U,    (3.1.14)

and the product of the elementary matrices is L. For example (entries not shown are those of the identity),

    E21 E31 = [1       ] [1       ]   [1       ]
              [ℓ21 1   ] [0   1   ] = [ℓ21 1   ]    (3.1.15)
              [0   0 1 ] [ℓ31 0 1 ]   [ℓ31 0 1 ]

which extends to

    Ẽ1 = E21 E31 ··· En1 = [1            ]
                            [ℓ21 1        ]
                            [ℓ31 0 1      ]    (3.1.16)
                            [ ⋮       ⋱   ]
                            [ℓn1 0 ··· 0 1]

This further extends to

    Ẽ1 Ẽ2 = [1              ]
             [ℓ21 1          ]
             [ℓ31 ℓ32 1      ]    (3.1.17)
             [ ⋮   ⋮     ⋱   ]
             [ℓn1 ℓn2 0 ··· 1]

Finally we get

    Ẽ1 Ẽ2 ··· Ẽn−1 = [1                     ]
                      [ℓ21 1                 ]
                      [ℓ31 ℓ32 1             ]    (3.1.18)
                      [ ⋮   ⋮       ⋱        ]
                      [ℓn1 ℓn2 ··· ℓn,n−1 1 ] = L.

Solution of Matrix using the Lower/Upper factorization

To use A = LU to solve Ax = b:

1. Find L, U (number of operations: (2/3)n³).
2. Write L(Ux) = b: first solve Ly = b (number of operations: n²), then solve Ux = y (number of operations: n²).

Example: to solve Ax = b_k for k = 1, ..., 10⁴, find L and U once, at cost O((2/3)n³), then solve Ly = b_k and Ux = y 10⁴ times, at cost O(10⁴ · 2n²).

Sparse and Banded Matrices

A diagonal matrix,

    A = [x        ]
        [   ⋱     ]    (3.1.19)
        [        x]

has bandwidth 1. The matrix below,

    A = [x x      ]
        [x x x    ]
        [  x x x  ]
        [      ⋱  ]

has bandwidth 3: this is a tridiagonal matrix.
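The point of separating steps 1 and 2 is that the O(n³) factorization is paid once, after which each right-hand side costs only O(n²). A Python/NumPy sketch (Doolittle-style LU without pivoting; assumes nonzero pivots):

```python
import numpy as np

def lu_nopivot(A):
    """LU factorization without pivoting (assumes nonzero pivots)."""
    n = A.shape[0]
    L = np.eye(n)
    U = A.astype(float).copy()
    for i in range(n - 1):
        for j in range(i + 1, n):
            L[j, i] = U[j, i] / U[i, i]       # store the multiplier
            U[j, i:] -= L[j, i] * U[i, i:]    # eliminate below the pivot
    return L, U

def solve_lu(L, U, b):
    """Forward then backward substitution: O(n^2) per right-hand side."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = b[i] - L[i, :i] @ y[:i]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

A = np.array([[ 2.0, -1.0,  3.0],
              [-4.0,  6.0, -5.0],
              [ 6.0, 13.0, 16.0]])
L, U = lu_nopivot(A)                      # factor once
bs = [np.array([13.0, -28.0, 37.0]), np.array([1.0, 0.0, 0.0])]
xs = [solve_lu(L, U, b) for b in bs]      # reuse the factors
```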
A tridiagonal matrix maintains its sparsity when it undergoes LU decomposition:

    [x x        ]   [1        ] [x x      ]
    [x x x      ]   [x 1      ] [  x x    ]
    [  x x x    ] = [  x 1    ] [    x x  ]    (3.1.20, 3.1.21)
    [    x x x  ]   [    x 1  ] [      x x]
    [      x x  ]   [      x 1] [        x]

Motivation for Gauss Elimination with Pivoting

When does Gauss elimination give us a problem? For example:

1. A = [0 1; 1 1]: the first pivot is zero.
2. A = [δ 1; 1 1] with small δ > 0. Solve Ax = [1 + δ; 2]; the exact solution is [1; 1]. However, we run into numerical problems.

3.2 Lecture 5: August 28, 2013

Motivation for Gauss Elimination with Pivoting, cont.

When does Gauss elimination give us a problem? Returning to the example problem, A = [δ 1; 1 1]: solve Ax = [1 + δ; 2]; the exact solution is [1; 1], but we run into numerical problems. First, solve for x by finding L, U and using them numerically:

    A = [δ 1]  →  U = [δ    1    ]    (3.2.1)
        [1 1]         [0  1 − 1/δ]

and

    L = [1    0]    (3.2.2)
        [1/δ  1]

Now we want to solve L(Ux) = b:

for j = 1:16
    delta = 10^(-j);
    b = [1 + delta; 2];
    L = [1, 0; 1/delta, 1];
    U = [delta, 1; 0, 1 - 1/delta];
    % Solve Ly = b for y
    y(1) = b(1);
    y(2) = b(2) - L(2,1)*y(1);
    % Solve Ux = y for x
    x(2) = y(2)/U(2,2);
    x(1) = (y(1) - U(1,2)*x(2))/U(1,1);
    disp(sprintf('%5.0e %20.15f %20.15f %10.8e', delta, x(1), x(2), norm(x - [1,1])))
end

Note that the norm is the Euclidean norm, ‖x − [1,1]‖ = sqrt((x(1) − 1)² + (x(2) − 1)²). This gives us the table of results shown below.

Conclusion: Ax = b is a good problem (well-posed): introducing small perturbations (e.g., by roundoff) does not change the solution by much. Matlab's algorithm A\b is a good algorithm (stable); this LU decomposition without pivoting does not give a good algorithm (unstable).

Table 3.1.
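The experiment in the loop above can be reproduced directly. A Python sketch comparing the unpivoted solve with the row-swapped (pivoted) one for δ = 10⁻¹⁶ (the exact digits depend on floating-point rounding, but the qualitative conclusion does not):

```python
import numpy as np

delta = 1e-16
b1, b2 = 1.0 + delta, 2.0

# Unpivoted factors: L = [[1, 0], [1/delta, 1]], U = [[delta, 1], [0, 1 - 1/delta]].
y1 = b1
y2 = b2 - (1.0 / delta) * y1
x2 = y2 / (1.0 - 1.0 / delta)
x1 = (y1 - x2) / delta
err_unpivoted = np.hypot(x1 - 1.0, x2 - 1.0)

# Pivoted: swap the rows first, so all multipliers are at most 1 in magnitude.
# Then L = [[1, 0], [delta, 1]], U = [[1, 1], [0, 1 - delta]], rhs = [2, 1 + delta].
y1p = b2
y2p = b1 - delta * y1p
x2p = y2p / (1.0 - delta)
x1p = y1p - x2p
err_pivoted = np.hypot(x1p - 1.0, x2p - 1.0)
```

The unpivoted error is O(1) while the pivoted error stays near machine precision, matching the table.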
Variation of error with the perturbation variable δ

    δ        x(1)      x(2)     ‖x − [1,1]‖₂
    1e-01    1.000     1.000    8e-16
    1e-02    1.000     1.000    1e-13
    1e-03    0.999     1.000    6e-12
    ...      ...       ...      ...
    1e-16    0.888     1.000    O(1)

Discussion of well-posedness

Geometrically, Ax = b is

    δx1 + x2 = 1 + δ,    (3.2.3a)
    x1 + x2 = 2.    (3.2.3b)

This is a well-posed system. Rearranging, the two lines

    x2 ≈ 1 − δx1,    x2 = 2 − x1    (3.2.4)

intersect at (1, 1) at a healthy angle. Our other system, Ly = b, is

    y1 = 1 + δ,    (3.2.5a)
    (1/δ)y1 + y2 = 2.    (3.2.5b)

This is a very ill-posed system, because the slopes are so near each other: small wiggles in δ give much larger errors. Now we consider Ux = y:

    δx1 + x2 = y1,    (3.2.6a)
    (1 − 1/δ)x2 = y2.    (3.2.6b)

This is ill-posed as well. All of these linear problems are illustrated in figure 3.1.

Figure 3.1. Plot of the linear problems and their solutions: (a) Ax = b; (b) Ly = b; (c) Ux = y.

Gaussian elimination with pivoting

Pivoting means we exchange rows so that the current pivot satisfies |aii| = max_{j≥i} |aji|. Consequently, |ℓji| = |aji/aii| ≤ 1 for all j > i. Now,

    [δ 1 | 1 + δ]  →  [1 1 | 2    ]  →(R2 ← R2 − δR1)  [1    1    | 2          ] = [1    1    | 2    ]    (3.2.7)
    [1 1 | 2    ]     [δ 1 | 1 + δ]                     [0  1 − δ  | 1 + δ − 2δ ]   [0  1 − δ  | 1 − δ]

PLU always works. Theorem: Gaussian elimination with pivoting yields PA = LU, where the permutation matrix is P. Every matrix has a PLU factorization. To do the pivoting, at each step k first premultiply A by a permutation matrix Pk (the identity with two rows exchanged),    (3.2.8)
then premultiply by

    Lk = [1                    ]
         [   ⋱                 ]
         [      1              ]    (3.2.9)
         [      ℓ_{k+1,k} 1    ]
         [      ⋮          ⋱   ]
         [      ℓ_{n,k}      1 ]

We do this in succession,

    Ln−1 Pn−1 ··· L2 P2 L1 P1 A = U.    (3.2.10)

How do these commute into a useful P and L matrix?

3.3 Lecture 6: August 30, 2013

Discussion of HW problem 2

    −y_{j−1} + 2y_j − y_{j+1} = h² f(x_j),  for j = 1, ..., n − 1.    (3.3.1)

    [ 2 −1            ] [y1     ]   [h² f(x1) + y0     ]
    [−1  2 −1         ] [y2     ]   [h² f(x2)          ]
    [   −1  2  ⋱      ] [  ⋮    ] = [   ⋮              ]    (3.3.2)
    [      ⋱  ⋱   −1  ] [y_{n−2}]   [h² f(x_{n−2})     ]
    [          −1   2 ] [y_{n−1}]   [h² f(x_{n−1}) + yn]

So we set up the right hand side, a vector of length n − 1 to go with the (n−1)×(n−1) matrix A:

x = a:h:b;                 % same as linspace(a,b,n+1)
rhs = h^2*f(x(2:n));
rhs(1) = rhs(1) + ya;
rhs(n-1) = rhs(n-1) + yb;

Recall that for our problem −y'' = −e^x, so f(x) = −e^x.

PLU factorization

For PLU factorization, we are doing Gauss elimination with pivoting. At each kth step of Gaussian elimination, switch rows so that the pivot a_kk^(k) is the largest number by magnitude in the kth column (on or below the diagonal). For example,

    [ 1 −1  3] [x1]   [−3]
    [−1  0 −2] [x2] = [ 1]    (3.3.4)
    [ 2  2  4] [x3]   [ 0]

or, in augmented form,

    [ 1 −1  3 | −3]       [ 2  2  4 |  0]
    [−1  0 −2 |  1]  →   [−1  0 −2 |  1]    row1 ↔ row3    (3.3.5a)
    [ 2  2  4 |  0]       [ 1 −1  3 | −3]

                      →   [2  2  4 |  0]    row2 ← row2 − (−1/2)row1,
                          [0  1  0 |  1]    row3 ← row3 − (1/2)row1    (3.3.5b)
                          [0 −2  1 | −3]

                      →   [2  2  4 |  0]    row2 ↔ row3    (3.3.5c)
                          [0 −2  1 | −3]
                          [0  1  0 |  1]

                      →   [2  2  4   |  0  ]    row3 ← row3 − (−1/2)row2    (3.3.5d)
                          [0 −2  1   | −3  ]
                          [0  0  1/2 | −1/2]

We need to do the back substitution to solve this system. But more importantly, we want to know what the factorization of this system would be. Recall

    Lk = identity with the multipliers ℓ_{k+1,k}, ..., ℓ_{n,k} in column k below the diagonal,    (3.3.6)

and

    L−(n−1) Pn−1 ··· L−2 P2 L−1 P1 A = U.    (3.3.7)

Reordering,

    Pn−1 ··· L−2 P2 L−1 P1 A = L_{n−1} U.    (3.3.8)

We want to move each P to be right next to A, and collect all the L's, so that we can form a true L. Claim:

    Pj L−k = L̃−k Pj,  for j > k,    (3.3.9)

where the permutation Pj merely reorders the multipliers below the kth row. This allows us to move the L's out:

    Pj L−k Pj = L̃−k,    (3.3.10)

so that

    L̃−(n−1) ··· L̃−1 Pn−1 ··· P1 A = U.    (3.3.11)
Factorization Now we can return to our example but with keeping track of the 1 −1 3 −3 2 2 4 0 0 0 1 −1 0 −2 1 → −1 0 −2 1 , row1 ↔ row3 , P1 = 0 1 0 0 2 2 4 1 −1 3 −3 1 0 0 (3.3.12a) 2 − 21 → → 0 4 1 1 row2 ← row2 − − row1 , row3 ← row3 − ro 2 2 1 −0 1 , −2 1 −3 1 2 2 2 1 2 − 21 (3.3.12b) 0 0 1 row2 ↔ row3 , P2 = 1 0 0 0 1 0 0 4 1 −3 , 1 −0 1 2 −2 (3.3.12c) → 2 1 2 − 21 2 −2 − 12 0 −3 , 1/2 −1/2 4 1 1 row3 ← row3 − − 2 row2 (3.3.12d) Because P = P−1 , we should remember that, PA = LU A = PLU. 3.4 (3.3.13a) (3.3.13b) Lecture 7: September 4, 2013 PLU Factorization Recall PA = LU (3.4.1) always exists by construction. This is because we can make anything non-zero by the permutation. This is also equivalent to, A = PLU (3.4.2) because P = P−1 . To use this in an actual solution, PAx = Pb, (3.4.3) LUx = Pb, (3.4.4) or So this system is determined by: 24 3.4. Lecture 7: September 4, 2013 Applied Matrix Theory 1. Solving Ly = Pb, 2. Solving Ux = y. In Matlab, we would use the commands [L,U,P] = lu(A), to find these three matrices. This factorization is not unique. We want to show the uniqueness of the LU factorization, and are also interested in when it exists. Triangular Matrices We are interested in the determinants of lower or upper triangular matrices. Let’s discuss det(L). `11 0 0 0 0 .. . . . 0 0 0 . 0 L = `i1 · · · `jj 0 (3.4.5) . .. · · · ... . . . 0 `n1 · · · `nj . . . `nn Qn the determinant is det(L) = i=1 `ii . Thus L is invertible only if `ii 6= 0 for all `ii . We conjecture the product of two lower triangular matrices will give us lower a triangular matrix. e.g. L1 L2 = L12 (3.4.6) We want to prove this! Multiplication of lower triangular matrices Prove that L1 L2 is lower triangular. Assume AB are lower triangular. Show C = AB is lower triangular. We know that bij aij = 0 for j > i. In our proof, we first consider matrix multiplication. X eij = aik bkj . (3.4.7) We know that aik = 0 for k > i, and bkj = 0 for j > k. 
If j > i, then for k ≤ i we have k < j, so b_kj = 0; alternatively, if k > i then a_ik = 0. Thus, in either case one of the two factors is zero, so c_ij = 0, and we have proved our hypothesis.

Inverse of a lower triangular matrix

A lower triangular matrix's inverse is also a lower triangular matrix:

           [ ℓ11            ]^(−1)
    L^−1 = [  ⋮    ⋱        ]      = lower triangular.    (3.4.8)
           [ ℓn1   ⋯    ℓnn ]

So, this helps with inversion of the form

    L_{−(n−1)} ⋯ L_{−2} L_{−1} A = U.    (3.4.9)

For matrices of the form

             [ 1                         ]
             [    ⋱                      ]
    L_{−k} = [       1                   ]    (3.4.10)
             [     −ℓ_{k+1,k}  1         ]
             [        ⋮           ⋱      ]
             [     −ℓ_{n,k}           1  ]

the inverse matrix is

          [ 1                        ]
          [    ⋱                     ]
    L_k = [       1                  ]    (3.4.11)
          [     ℓ_{k+1,k}  1         ]
          [        ⋮          ⋱      ]
          [     ℓ_{n,k}           1  ]

For any lower triangular matrix

        [ ℓ11            ]
    L = [  ⋮    ⋱        ]    (3.4.12)
        [ ℓn1   ⋯    ℓnn ]

with nonzero diagonal, to find L^−1 we reduce [L I] −GE→ [I L^−1]: use Gaussian elimination on L, going through each column in turn.

Uniqueness of LU factorization

Theorem: If A is such that no zero pivots are encountered, then A = LU with ℓii = 1 and uii ≠ 0, which are the pivots, and the factorization is unique. Here ℓij = a_ij^{(j)}/a_jj^{(j)} for j < i by construction.

Proof: Assume A = L1 U1 = L2 U2. Then

    L2^{−1} L1 U1 = U2,          (3.4.13a)
    L2^{−1} L1 = U2 U1^{−1}.     (3.4.13b)

The left-hand side is lower triangular with unit diagonal, while the right-hand side is upper triangular; a matrix that is both must be diagonal, and since its diagonal entries are all ones,

    L2^{−1} L1 = U2 U1^{−1} = diagonal matrix = I.    (3.4.13c)

If this is the case, then L2 = L1, and similarly U2 = U1. Thus the factors are the same and the factorization is unique.

Existence of the LU factorization

Theorem: A = LU with no zero pivots if and only if all leading principal submatrices Ak are nonsingular. We define the leading principal submatrices Ak of An×n by Ak = A(1:k),(1:k); these are the upper-left square blocks of the full matrix.

Part 1: if A = LU, we show that each Ak is invertible. Part 2: if each Ak is invertible, we show that A = LU.
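The two-step solve from this lecture (Ly = Pb by forward substitution, then Ux = y by back substitution) can be sketched end to end. The course demonstrates Matlab's [L,U,P] = lu(A); below is a rough Python equivalent with numpy, run on the example system (3.3.4). The function names plu and solve_plu are made up for illustration.

```python
import numpy as np

def plu(A):
    """Gaussian elimination with partial pivoting: returns P, L, U with P @ A = L @ U."""
    n = A.shape[0]
    U = A.astype(float).copy()
    L = np.eye(n)
    P = np.eye(n)
    for k in range(n - 1):
        # pivot: bring the largest |entry| of column k (rows k..n-1) to the diagonal
        p = k + np.argmax(np.abs(U[k:, k]))
        if p != k:
            U[[k, p], k:] = U[[p, k], k:]
            P[[k, p], :] = P[[p, k], :]
            L[[k, p], :k] = L[[p, k], :k]   # carry already-stored multipliers along
        for j in range(k + 1, n):
            L[j, k] = U[j, k] / U[k, k]     # multiplier l_jk; |l_jk| <= 1 by pivoting
            U[j, k:] -= L[j, k] * U[k, k:]
    return P, L, U

def solve_plu(P, L, U, b):
    """Solve Ax = b via Ly = Pb (forward substitution), then Ux = y (back substitution)."""
    n = len(b)
    pb = P @ b
    y = np.zeros(n)
    for i in range(n):
        y[i] = pb[i] - L[i, :i] @ y[:i]
    x = np.zeros(n)
    for i in reversed(range(n)):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

# the example system (3.3.4)
A = np.array([[1., -1., 3.], [-1., 0., -2.], [2., 2., 4.]])
b = np.array([-3., 1., 0.])
P, L, U = plu(A)
x = solve_plu(P, L, U, b)
```

With this A the pivots come out as 2, −2, 1/2, matching the hand computation, and x = (1, 1, −1).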
3.5 Lecture 8: September 6, 2013

About Homeworks

The median score was 50 out of 60. A histogram was shown with the general grade distribution: 1 around 10, 3 around 25, 1 around 40, 4 from 45–50, 4 from 50–55, 6 from 55–60. Comments: write in working Matlab code. Also, L must have ones on the diagonal, while U has the pivots on the diagonal. "Computing efficiently" means using the LU decomposition, not inverting the matrix A. For homework 2, we will have applications of finding the inverse of A, i.e. solving

    AX = I,    (3.5.1)

or

    A [x1 x2 ⋯ xn] = [e1 e2 ⋯ en].    (3.5.2)

To find A^−1, solve

    A x_j = e_j,    (3.5.3)

for all j = 1, 2, ..., n. Use the LU decomposition.

Discussion of ill-conditioned systems

We call Ax = b an ill-conditioned system if small changes in A or b introduce large changes in the solution. Geometrically, we showed this interpretation previously on a 2 × 2 system, where we noted that the slopes of the two lines were very similar to each other. Numerically, we have trouble because of roundoff when we solve Ãx = b. We may also compute a condition number, which tells us the amplification factor of errors in the system. In Matlab, the command cond(A) gives the condition number. This should hopefully be under a thousand. The condition number essentially tells you how much accuracy you can expect in the final solution: if your condition number is 1 × 10^5, then you can only expect about 11 significant digits in the solution in double-precision floating-point arithmetic.

Inversion of lower triangular matrices

Show that if A is a lower triangular matrix then so is A^−1. So let's solve AX = I with A lower triangular.
x x x x x 0 x x x x 0 0 x x x 0 0 0 x x 0 0 0 0 x 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 → → → → x x x x x 0 x x x x 0 0 x x x 0 0 0 x x x x x x x 0 x x x x 0 0 x x x x x x x x 0 x x x x 0 0 x x x 0 0 0 x x x x x x x 0 x x x x 0 0 x x x 0 0 0 x x 1 y y y y 0 0 0 0 x 0 0 0 x x 0 1 0 0 0 1 y y y y 0 0 0 0 x 0 0 0 0 x 0 0 0 0 x 0 0 1 0 0 0 1 y y y 1 y y y y 0 0 0 1 0 0 0 1 0 0 0 1 y y y 1 y y y y 0 0 0 1 0 0 0 1 y y 0 1 y y y 0 0 0 0 1 , 0 0 0 0 1 0 0 0 1 0 0 0 1 y y , 0 0 0 0 1 0 0 0 1 y (3.5.4a) (3.5.4b) , 0 0 0 0 1 (3.5.4c) . (3.5.4d) We now have shown that we can get the lower triangular matrix A into the form LD. Now we do backward substitution to get our X. In this case this is simply deviding each row by the value of the pivot of that row. In this way with D = U, we have X = D−1 L−1 . Example of LU decomposition of a lower triangular matrix Given the matrix, 2 0 0 1 0 0 2 0 0 1 3 0 = 1 1 0 0 3 0 , 2 2 1 4 1 13 1 0 0 4 = LU. 28 (3.5.5a) (3.5.5b) 3.6. Lecture 9: September 9, 2013 Applied Matrix Theory Banded matrix example Exercise 3.10.7: Band matrix A with bandwidth w is a matrix with aij = 0 if |i − j| > w. If w = 0, we have a diagonal matrix. a11 0 0 0 0 0 a22 0 0 0 0 a33 0 0 Aw=0 = 0 (3.5.6) . 0 0 0 a44 0 0 0 0 0 a55 For bandwidth, w = 1, Aw=1 a11 a12 0 0 0 a21 a22 a23 0 0 . 0 a a a 0 = 32 33 34 0 0 a43 a44 a45 0 0 0 a54 a55 (3.5.7) For bandwidth, w = 2, Aw=2 a11 a21 = a31 0 0 a12 a22 a32 a42 0 a13 a23 a33 a43 a53 0 0 a24 0 a34 a35 . a44 a45 a54 a55 (3.5.8) In the LU decomposition these zeros are preserved. However there are other cases (as shown in the homework) where the zeros may not be preserved. We will return to our theorem on Monday. For the homework, a matrix has an LU decomposition if and only if all principle submatrices are invertible. 3.6 Lecture 9: September 9, 2013 Existence of the LU factorization (cont.) When does LU factorization exist? 
Theorem: If no zero pivots that appears in Gaussian elimination (including the nth one) then A = LU, `ii = 1 and uii 6= 0 are pivots. Then L, U are unique. Theorem: A = LU if and only if the leading principle submatrices Ak is invertible. Proof: Assume (for block matrices of length k × k, n − k × n − k and the difference) A = LU, L11 0 U11 U12 = , L21 L22 0 U22 L11 U11 L11 U12 = L21 U11 L22 U22 29 (3.6.1) (3.6.2) (3.6.3) Nitsche and Benner Unit 3. Factorization Q Now our question: is Ak = L11 U11 ? We know that det L11 = kj=1 `jj 6= 0 so L11 is Q invertible. Similarly, U11 = kj=1 ujj 6= 0 so it is also invertibles. Since we know that the product of two invertible matrices is also invertible, Ak must also be invertible. We will now do a proof by induction: If we assume that all Ak are invertible. Show that A = LU. ASIDE: Example of proof by induction. We want to show, n X j2 = j=1 n(n + 1)(2n + 1) . 6 (3.6.4) The steps of proof by induction are 1. First we show that this holds for n = 1, 2. next we assume it holds for n, 3. finally we show that it holds for n + 1. Let’s show the third step, n+1 X j2 = j=1 n X j 2 + (n + 1)2 , (3.6.5a) j=1 = = = = = n(n + 1)(2n + 1) + (n + 1)2 , 6 n(n + 1)(2n + 1) + 6(n + 1)2 , 6 (n + 1) [n(2n + 1) + 6(n + 1)] , 6 2 (n + 1) 2n + 7n + 1 , 6 (n + 1)(n + 2)(2n + 3) . 6 (3.6.5b) (3.6.5c) (3.6.5d) (3.6.5e) (3.6.5f) Which is what would be expected, and we have proved this relation by induction. So for our system, 1. First we show that this holds for n = 1, A = [a11 ] = [1] [a11 ] where a11 6= 0. 2. Assume true for n: If Ak , k = 1, . . . , n are invertible, then An×n = Ln×n Un×n . 3. Show it holds for n + 1. So let’s move onto the third step, assume A(n+1)×(n+1) with Ak , k = 1, . . . , n+1 are invertible. By induction assumption An = Ln Un , since A1 , . . . , An are invertible. Now we need to show that An+1 = Ln+1 Un+1 , An b An+1 = , (3.6.6a) c| α Ln Un b = , (3.6.6b) c| α Ln 0 Un x = . (3.6.6c) | y| 1 0 β 30 3.6. 
Lecture 9: September 9, 2013 Applied Matrix Theory −1 We want Ln x = b so we let x = L−1 n b which supposes that Ln exists. We also want | | y| Un = c| so we let y| = c| U−1 n . Finally, we want y x + β = α, so we let β = α − y x. We know, An+1 Ln Un = c| Ln = | −1 c Un b , α 0 Un L−1 n b . | −1 1 0 α − c| U−1 n Ln b (3.6.7a) (3.6.7b) Since A = An+1 is invertible, we must have β 6= 0 because if β = 0 then det(Ln+1 ) det(Un+1 ) = 0, in which case An+1 would not be invertible. So, An+1 has an LU decomposition and by principle of induction we have proven our theorem. Rectangular matrices For a rectangular matrix Am×n ∈ Rm×n . Our question: is Ax = b solvable? Is the solution unique? We are presented with there options: no solution, unique solution, or infinitely many solutions. We are going to do Gaussian elimination to reduce the form of the matrix to see how many solutions we will have. So we will do row echelon form (REF) reduction. Example of row echelon form 1 2 A= 1 2 1 0 → 0 0 1 0 → 0 0 1 0 → 0 0 2 4 2 4 1 0 3 0 3 4 5 4 3 4 , 5 7 2 1 3 3 0 −2 −2 −2 , 0 2 2 2 0 −2 −1 1 2 1 3 3 0 1 1 1 , 0 0 0 0 0 0 0 2 2 1 3 3 0 1 1 1 . 0 0 0 1 0 0 0 0 (3.6.8a) (3.6.8b) (3.6.8c) (3.6.8d) Where we made interchanges to have leading ones for the columns. What do we know about our matrix A from this information? First, we know what columns are linearly independent. We are trying to find the column space of our matrix. 31 Nitsche and Benner 3.7 Unit 3. Factorization Homework Assignment 2: Due Friday, September 13, 2013 1. Textbook 3.10.1 (a, c): LU and PLU factorizations 1 4 5 Let, A = 4 18 26. 3 16 30 (a) Determine the LU factors of A (c) Use the LU factors to determine A−1 2. Textbook 3.10.2 Let A and b be the matrices, 1 2 4 17 3 6 −12 3 A= 2 3 −3 2 0 2 −2 6 and 17 3 b= 3. 4 (a) Explain why A does not have an LU factorization. (b) Use partial pivoting and find the permutation matrix P as well as the LU factors such that PA = LU. 
(c) Use the information in P, L, and U to solve Ax = b. 3. Textbook 3.10.3 ξ 2 0 Determine all values of ξ for which A = 1 ξ 1 fails to have an LU factorization. 0 1 ξ 4. Textbook 3.10.5 If A is a matrix that contains only integer entries and all of its pivots are 1, explain why A−1 must also be an integer matrix. Note: This fact can be used to construct random integer matrices that posses integer inverses by randomly generating integer matrices L and U with unit diagonals and then constructing the product A = LU. 5. Lower triangular matrices Let A be a 3 × 3 matrix with real entries. We showed that GE is equivalent to finding lower triangular matrices L−1 and L−2 such that L−2 L−1 A = U where U is upper triangular and, 1 0 0 1 0 0 1 0 , L−1 = −`21 1 0 , L−2 = 0 (3.7.1) −`31 0 1 0 −`32 1 32 3.7. HW 2: Due September 13, 2013 Applied Matrix Theory with (L−1 )−1 1 0 0 = `21 1 0 = L1 , `31 0 1 (L−2 )−1 1 0 0 = 0 1 0 = L2 . 0 `32 1 (3.7.2) It follows that A = L2 L1 U. Show that 1 0 0 L2 L1 = `21 1 0 . `31 `32 1 (3.7.3) Show by example that generally, L2 L1 6= L1 L2 (3.7.4) That is, the order in which these lower triangular matrices are multiplied matters. 6. Textbook 1.6.4: Conditioning Using geometric considerations, rank the following three systems according to their condition. (a) 1.001x − y = 0.235, x + 0.0001y = 0.765. (b) 1.001x − y = 0.235, x + 0.9999y = 0.765. (c) 1.001x + y = 0.235, x + 0.9999y = 0.765. 7. Textbook 1.6.5 Determine the exact solution of the following system: 8x + 5y + 2z = 15, 21x + 19y + 16z = 56, 39x + 48y + 53z = 140. Now change 15 to 14 in the first equation and again solve the system with exact arithmetic. Is the system ill-conditioned? 33 Nitsche and Benner Unit 3. Factorization 8. Textbook 1.6.6 Show that the system v−w−x−y−z w−x−y−z x−y−z y−z z = 0, = 0, = 0, = 0, = 1, is ill-conditioned by considering the following perturbed system: v − w − x − y − z = 0, − 1 v+w−x−y−z 15 1 − v+x−y−z 15 1 − v+y−z 15 1 − v+z 15 34 = 0, = 0, = 0, = 1. 
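For problems 6–8, the geometric notion of conditioning can be checked numerically with the condition number discussed in Lecture 8 (cond(A) in Matlab). Here is a small numpy sketch comparing the coefficient matrices of systems (b) and (c) of problem 6; the variable names are illustrative only.

```python
import numpy as np

# coefficient matrices of systems (b) and (c) from Textbook 1.6.4
A_b = np.array([[1.001, -1.0], [1.0, 0.9999]])   # lines nearly perpendicular
A_c = np.array([[1.001,  1.0], [1.0, 0.9999]])   # lines nearly parallel

print(np.linalg.cond(A_b))   # modest: well-conditioned
print(np.linalg.cond(A_c))   # large: ill-conditioned
```

The nearly parallel pair produces a condition number several orders of magnitude larger, which is exactly the geometric ranking the problem asks for.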
UNIT 4 Rectangular Matrices 4.1 Lecture 10: September 11, 2013 Rectangular matrices (cont.) We are interested in a rectangular matrix, Am×n . We may apply REF, or RREF to find the column dependence, what the basic columns are, and what the rank of the matrix is. This way we can find for any system Ax = b, whether the system is consistent and find all the solutions; whether it is homogeneous, or what the free variables are; and what the particular solutions are. Last time’s example, we went from 1 2 A = 1 2 1 0 → 0 0 2 4 2 4 1 0 3 0 3 4 5 4 2 0 0 0 1 2 0 0 3 2 0 0 3 4 , 5 7 3 2 . 3 0 (4.1.1a) (4.1.1b) The first, third, and fifth columns have pivots and are the basic columns. They correspond to the linearly independent columns in A. How do we write the other two columns (c2 , c4 ) as functions of the other three columns? We can notice that, c2 = 2c1 , and similarly c4 = 2c1 + c3 . The reduced row echelon form (RREF) has pivots on 1, and zeros below and above 35 Nitsche and Benner Unit 4. Rectangular Matrices x2 x2 x2 x1 x1 (a) Intersecting system (one solution) x1 (b) Parallel system (no solution) (c) Equivalent system (infinite solutions) Figure 4.1. Geometric illustration of linear systems and their solutions. all pivots. So, 1 0 0 0 2 0 0 0 1 2 0 0 3 2 0 0 3 1 0 2 → 3 0 0 0 1 0 → 0 0 1 0 → 0 0 2 0 0 0 1 1 0 0 3 1 0 0 2 0 0 0 1 1 0 0 3 1 0 0 2 0 0 0 0 1 0 0 2 1 0 0 3 1 , 1 0 0 0 , 1 0 0 0 . 1 0 (4.1.2a) (4.1.2b) (4.1.2c) In this form, the basic columns are very clear, and the relations between the dependent columns and the basic columns is also obvious. So again we can see that, c2 = 2c1 and c4 = 2c1 + 1c3 . The rank of the matrix is the number of linearly independent columns, which is also the number of linearly independent rows, and also the number of pivots in row-echelon form of the matrix. A consistent system, Ax = b is a system that has at least one solution. It is inconsistent if it has no solutions. 
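The rank criterion for consistency, rank([A | b]) = rank(A), can be tried numerically. A sketch using numpy's matrix_rank on the 4 × 5 example from this lecture; the helper consistent and the vectors b_good, b_bad are hypothetical names for illustration.

```python
import numpy as np

def consistent(A, b):
    """Ax = b is consistent iff rank([A | b]) == rank(A)."""
    return np.linalg.matrix_rank(np.column_stack([A, b])) == np.linalg.matrix_rank(A)

# the 4 x 5 example from this lecture; its basic columns are the 1st, 3rd, and 5th
A = np.array([[1., 2., 1., 3., 3.],
              [2., 4., 0., 4., 4.],
              [1., 2., 3., 5., 5.],
              [2., 4., 0., 4., 7.]])

b_good = A[:, 0] + 2.0 * A[:, 2]      # in the column space by construction
b_bad = np.array([1., 0., 0., 0.])    # a right-hand side outside the column space
```

Any b built from the basic columns passes the test; a generic b in R^4 fails it, since the column space here is only 3-dimensional.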
To determine if Ax = b is consistent, in a 2 × 2 system, Ax = b, a11 x1 + a12 x2 = b1 , a21 x1 + a22 x2 = b2 . (4.1.3a) (4.1.3b) Since this system is a linear system we can see three cases: one intersection, parallel and separated, and parallel and the same. Each of these cases are illustrated in Figure 4.1. In general, for any size matrix, we find the row echelon form of the augmented system 36 4.1. Lecture 10: September 11, 2013 Applied Matrix Theory h i [A b] → E b̃ . x x x x x 0 x x x x 0 0 0 0 α (4.1.4) If α 6= 0, then the system is inconsistent. So Ax = b is consistent if rank([A b]) = rank(A). If α = 0 then b̃ is not a basic column of (A b). The we can write b̃ as a linear combination of the basic columns of E. We can write b as linear combinations of basic columns of A. In our example, we had c1 , c3 , and c5 where the basic columns and Ax = b was consistent. Here then if we were to preform a reduction, the b = x1 c1 + x3 c3 + x5 c5 , or in other words, x1 0 A (4.1.5) x3 = b. 0 x5 Example of RREF of a Rectangular Matrix Given the matrix, 1 2 2 3 1 2 2 5 2 4 4 8 2 4 4 6 1 3 2 5 1 1 1 0 0 1 → 2 0 0 3 0 2 1 1 0 2 → 0 0 0 0 2 0 0 2 2 0 0 0 2 2 0 0 2 0 0 0 1 1 1 −1 , 0 0 0 2 1 1 2 0 . 1 −1 0 0 (4.1.6a) (4.1.6b) Thus, our system is consistent. We have that rank([A b]) = rank(A). Similarly, we observe that we have 3 basic columns, r, and 2 linearly dependent columns, n − r. (If n > m, then n > r, so n − r 6= 0). Let’s continue on to perform the reduced row echelon form. 1 1 2 2 1 1 1 1 2 2 1 1 0 2 2 0 2 0 0 0 1 1 0 1 (4.1.7a) 0 0 0 0 1 −1 → 0 0 0 0 1 −1 , 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 2 0 2 0 1 1 0 0 1 → (4.1.7b) 0 0 0 0 1 −1 , 0 0 0 0 0 0 1 0 1 2 0 1 0 1 1 0 0 1 → (4.1.7c) 0 0 0 0 1 −1 . 0 0 0 0 0 0 37 Nitsche and Benner Unit 4. Rectangular Matrices Thus our b̃ = 1c̃1 + 1c̃2 − 1c̃5 . Therefore, b = 1c1 + 1c2 − 1c5 , and 1 1 x= 0 . 0 −1 (4.1.8) So in review, 1 2 2 3 1 2 2 5 2 4 4 8 2 4 4 6 1 3 2 5 1 1 1 0 → 0 2 3 0 0 1 0 0 1 1 0 0 2 0 0 0 1 0 0 1 . 
1 −1 0 0 (4.1.9) We found a particular solution, xp = (1 1 0 0 − 1)| of Ax = b. For any solution xh of Ax = 0, we have that A (xp + xH ) = b + 0. So (xp + xH ) also solves Ax = b. 4.2 Lecture 11: September 13, 2013 Solving Ax = b Ax = b is consistent if rank[A | b] = rank(A). We have that b is a nonbasic column of [A | b]. We can express b in terms of columns of A to get a solution Axp = b. The set of all solutions is xp + xH , where Axp = b has the particular solution to Ax = b. We also solve AxH = 0, and get all homogeneous solutions, xH . Since we can add these two solutions, we have A (xp + xH ) = b. Now to actually find the particular solution, xp , we write b in terms of basic columns. To find the homogeneous solutions, xH , we solve Ax = 0 by solving for basic variables xi in terms of the n − r free variables. Basic variables correspond to basic columns, while free variables correspond to nonbasic columns. Note that if n > r then the set of columns is linearly independent and we can find x 6= 0 such that Ax = 0. Example From our example 1 2 2 3 1 2 2 5 2 4 4 8 2 4 4 6 1 3 2 5 1 1 1 0 → 2 0 3 0 0 1 0 0 1 1 0 0 2 0 0 0 0 1 0 1 , 1 −1 0 0 (4.2.1a) we have that b = a:1 + a:2 − a:5 , = x1 a:1 + x2 a:2 − x5 a:5 , = Axp , | where xp = 1 1 0 0 −1 . 38 (4.2.2a) (4.2.2b) (4.2.2c) 4.2. Lecture 11: September 13, 2013 Applied Matrix Theory Solve, 1 0 [A | 0] = 0 0 0 1 0 0 1 1 0 0 2 0 0 0 0 0 1 0 0 0 . 0 0 (4.2.3a) This gives us the three equations for the homogeneous solutions, x1 = −x3 − 2x4 , x2 = −x3 , x5 = 0. (4.2.4a) (4.2.4b) (4.2.4c) This gives us the homogeneous solutions of the form, −x3 − 2x4 −x3 , x xH = 3 x4 0 −1 −2 −1 0 = x3 0 + x4 0. 0 1 0 0 (4.2.5a) (4.2.5b) Thus the set of all solutions are, x = xp + xH , 1 −1 −2 1 −1 0 = 0 + x3 0 + x4 0 . 0 0 1 −1 0 0 (4.2.6a) (4.2.6b) This solves Ax = b for any x3 and x4 . Therefore we have infinitely many solutions. Not we can only have unique solutions if n = r. 
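The particular-plus-homogeneous structure of the solution set can be verified symbolically. A sketch with sympy, using the reduced augmented system from this example (basic variables x1, x2, x5; free variables x3, x4; right-hand column b̃ = (1, 1, −1, 0)); the variable names are illustrative.

```python
import sympy as sp

# RREF of [A | b] from the lecture, split into coefficient part and right-hand side
R = sp.Matrix([[1, 0, 1, 2, 0],
               [0, 1, 1, 0, 0],
               [0, 0, 0, 0, 1],
               [0, 0, 0, 0, 0]])
b = sp.Matrix([1, 1, -1, 0])

xp = sp.Matrix([1, 1, 0, 0, -1])   # particular solution read off the lecture
homog = R.nullspace()              # basis of the homogeneous solutions
```

Here nullspace() returns the two vectors (−1, −1, 1, 0, 0) and (−2, 0, 0, 1, 0), i.e. exactly the relations x1 = −x3 − 2x4, x2 = −x3, x5 = 0, and every combination xp + x3 v1 + x4 v2 solves the system.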
Linear functions We have any function f : D → R is a linear function if 1. f (x + y) = f (x) + f (y), 2. f (αx) = αf (x). 39 Nitsche and Benner Unit 4. Rectangular Matrices For example, f (x) = ax + b, with b 6= 0. f (x + y) = (ax + b) + (ay + b) , = a(x + y) + 2b, 6= a(x + y) + b. (4.2.7a) (4.2.7b) (4.2.7c) Thus this is not a linear function. However when b = 0, the function f (x) = ax can be verified to be linear. Example: Transpose operator The transpose operator is f (A) = A| . Define that if A = [aij ], then A| = [aji ] and A∗ = A| = [āji ]. Is this linear? | f (A + B) = (A + B) , (4.2.8a) | = [aij + bij ] , = [aji + bji ] , | | =A +B . (4.2.8b) (4.2.8c) (4.2.8d) To check the second criterion, | f (αA) = [αA] , (4.2.9a) | = α [A] , = αf (A). (4.2.9b) (4.2.9c) So this operator is linear. Example: trace operator P aii . X f (A + B) = (aii + bii ) , The trace operator is f (A) = tr(A) = i (4.2.10a) i = X aii + X i bii , (4.2.10b) i = tr(A) + tr(B). (4.2.10c) The second cirterion, f (αA) = tr(αA), X = αaii , (4.2.11a) (4.2.11b) i =α X aii , (4.2.11c) = α tr(A), = αf (A). (4.2.11d) (4.2.11e) i We have therefore shown that this is a linear operator. 40 4.2. Lecture 11: September 13, 2013 Applied Matrix Theory Matrix multiplication Given, a b A= , c d B= ã b̃ . c̃ d˜ (4.2.12) Then consider ax1 + bx2 f (x) = Ax = , cx1 + dx2 g(x) = Bx = ãx1 + b̃x2 ˜ 2 . c̃x1 + dx (4.2.13) Take f (g(x)) = A (Bx) ≡ ABx. (4.2.14) But, ˜ 2) a(ãx1 + b̃x2 ) + b(c̃x1 + dx f (g(x)) = ˜ 2) , c(ãx1 + b̃x2 ) + d(c̃x1 + dx ˜ 2 (aã + bc̃)x1 + (ab̃ + bd)x = ˜ 2 , (cã + dc̃)x1 + (cb̃ + dd)x aã + bc̃ ab̃ + bd˜ x1 , = cã + dc̃ cb̃ + dd˜ x2 ≡ AB. Now if we define AB = [Ai: B:j ] or Ai: B:j = | {z } (4.2.15a) (4.2.15b) (4.2.15c) (4.2.15d) Pn k=1 Aik Bkj . We get that matrix multiplication (AB)ij is not generally commutative, or AB 6= BA. If AB = 0 then either A = 0 or B = 0 unless A or B are invertible. 
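A quick numerical illustration of these multiplication facts, assuming numpy; the matrices are small arbitrary examples, not from the lecture.

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[0., 1.], [1., 0.]])

print(A @ B)                               # [[2. 1.] [4. 3.]]
print(B @ A)                               # [[3. 4.] [1. 2.]]: AB != BA
print(np.trace(A @ B), np.trace(B @ A))    # both 5.0: tr(AB) = tr(BA)
print(np.allclose((A @ B).T, B.T @ A.T))   # True: (AB)^T = B^T A^T

# AB = 0 does not force A = 0 or B = 0 when neither factor is invertible:
C = np.array([[0., 1.], [0., 0.]])
D = np.array([[1., 0.], [0., 0.]])
print(C @ D)                               # the zero matrix, though C, D != 0
```

The last pair shows why invertibility matters in the AB = 0 discussion: here both factors are nonzero (and singular), yet the product vanishes.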
Further we know that we have the distributive properties, A (B + C) = AB + AC, (4.2.16) (A + B) D = AD + BD, (4.2.17) (AB) C = A (BC) . (4.2.18) or and the associative property A property of the transpose operator is, | | | (AB) = B A , (4.2.19) tr(AB) = tr(BA). (4.2.20) which also helps to understand that, Note, however, that tr(ABC) 6= tr(ACB) as we will demonstrate on the homework. 41 Nitsche and Benner Unit 4. Rectangular Matrices Proof of transposition property We want to prove the useful property, | | | (AB) = B A . (4.2.21) Dealing with our left hand side of the equation, LHS : | | (AB) = (AB)ij , (4.2.22a) = [(AB)ji ], = [Aj: B:i ]. (4.2.22b) (4.2.22c) Manipulating the right hand side of the property, h i | | | | RHS : B A = B A ij , | | = Bi: A:j , = [B:i Aj: ], = [Aj: B:i ], = LHS. (4.2.23a) (4.2.23b) (4.2.23c) (4.2.23d) (4.2.23e) Thus, we have proved the identity. 4.3 Lecture 12: September 16, 2013 We will be having an exam on September 30th . Inverses We define: A has an inverse if each A−1 exists such that, AA−1 = A−1 A = I. (4.3.1) We also have the properties: • (AB)−1 = B−1 A−1 , −1 • (A| ) −1 • (A−1 ) | = (A−1 ) , = A. What about the inverse of sums (A + B)−1 ? There are the special cases, −1 • low rank perturbations of In×n : (I + CD| ) , where C, D ∈ Rn×k or the matrices are of rank k. • small perturbation of I : (I + A)−1 , where ||A||. 42 4.3. Lecture 12: September 16, 2013 Applied Matrix Theory We have a rank-1 matrix uv| , with u, v ∈ Rn = Rn×1 . u1 u2 | uv = .. v1 v2 · · · vk , . uk u1 v1 u1 v2 · · · u1 vk u2 v1 u2 v2 · · · u2 vk = .. .. .. , ... . . . uk v1 uk v2 · · · uk vk u1 v| u2 v| = .. . . uk v| (4.3.2a) (4.3.2b) (4.3.2c) Now let’s say we have an example where all matrix entries are zero except for αij at some point (i, j). 0 0 ··· 0 .. . .. .. = α 0 · · · 1 · · · 0, (4.3.3a) . α . . .. 0 ··· 0 0 | = αei ej . 
(4.3.3b) Low rank perturbations of I We make the claim the if u, v are such that v| u + 1 6= 0 then I + uv | −1 =I− uv| 1 + v| u (4.3.4) Proof: I + uv | uv| I− 1 + v| u uv| u (v| u) v| | + uv − , 1 + v| u 1 + v| u uv| (v| u) | | =I− + uv − | | uv , 1+v u 1 + v u 1 (v| u) | = I − uv , | +1− | 1+v u | 1+v u 1 + v u | = I − uv 1− , 1 + v| u = I. =I− 43 (4.3.5a) (4.3.5b) (4.3.5c) (4.3.5d) (4.3.5e) Nitsche and Benner Unit 4. Rectangular Matrices | So if c, d ∈ Rn such that d (A−1 c) + 1 6= 0, we are interested in A−1 . | −1 | A + cd = A I + A−1 cd , | = I + A−1 c d A−1 , | A−1 cd = I− A−1 , | −1 1+d A c | A−1 cd A−1 −1 =A − . | 1 + d A−1 c (4.3.6a) (4.3.6b) (4.3.6c) (4.3.6d) The Sherman–Morrison Formula The Sherman–Morrison formula states that if A is invertible and C, D ∈ Rn×k such that I + D| A−1 C is invertible. Then, −1 | −1 | −1 | A + CD = A−1 − A−1 C I + D A−1 C D A (4.3.7) Finite difference example with periodic boundary conditions Previously, we had, −y 00 = f, y(a) = ya , y(b) = yb . on [a, b], (4.3.8a) (4.3.8b) (4.3.8c) We get the finite difference approximation of, f1 2 −1 0 0 y0 y1 y2 .. .. −1 2 −1 · · · 0 . . .. 2 y . 0 3 = h 2 fi + 0 . 0 −1 . . . .. . . . . .. .. .. . . .. . . yn−1 0 0 0 ··· 2 fn−1 yn (4.3.9) If we instead use periodic boundary conditions we have perturbed our solution, 2 −1 −1 2 0 −1 .. . −1 0 −y 00 = f, on [a, b], y(a) = y(b), y 0 (a) = y 0 (b). 0 −1 f1 y1 .. −1 · · · 0 y2 . ... 2 0 y3 = h2 fi . . .. .. ... ... ... . yn−1 0 ··· 2 fn−1 (4.3.10a) (4.3.10b) (4.3.10c) In this case the Shermann–Morrison formula would help greatly with our inversion. 44 (4.3.11) 4.3. Lecture 12: September 16, 2013 Applied Matrix Theory Examples of perturbation Given a matrix 1 2 A= , 1 3 3 −2 −1 A = . −1 1 1 2 B= , 2 3 0 0 =A+ , 1 0 | = A + e2 e1 , (4.3.12a) (4.3.12b) (4.3.12c) (4.3.12d) (4.3.12e) Applying the Shermann–Morrison formula 0 0 A A−1 1 0 = A−1 − , 1 + e|1 A−1 e2 3 −2 0 0 −1 1 3 −2 , = A−1 − 1 − 2 −6 4 −1 =A − , 3 −2 9 −2 = . 
−4 3 −1 B−1 (4.3.12f) (4.3.12g) (4.3.12h) (4.3.12i) Small perturbations of I We want to show what happens when we have small perturbations from the identity matrix I; ? (I − A)−1 = I + A + A2 + · · · , (4.3.13) when kAk < 1. We first consider the geometric series, 1 , 1−x ∞ X = xn , (1 − x)−1 = (4.3.14a) (4.3.14b) n=0 = 1 + x + x2 + x3 + · · · when |x| < 1. To be continued. . . 45 (4.3.14c) Nitsche and Benner 4.4 Unit 4. Rectangular Matrices Lecture 13: September 18, 2013 Small perturbations of I (cont.) We want to show what happens when we have small perturbations from the identity matrix I; ? (I − A)−1 = I + A + A2 + · · · , (4.4.1) when kAk < 1. We first consider the geometric series, 1 , 1−x ∞ X = xn , (1 − x)−1 = (4.4.2a) (4.4.2b) n=0 = 1 + x + x2 + x3 + · · · , (4.4.2c) when |x| < 1. This is proved as follows, S= n X xk , (4.4.3a) k=0 S − xS = 1 + x + x2 + · · · + xn − x − x2 − · · · − xn+1 , 2 2 n n = 1 + (x−x) + x − x + ··· + (x − x ) − xn+1 , = 1 − xn+1 , 1 − xn−1 , S= 1−x 1 − xn−1 , = lim n→∞ 1 − x 1 = . 1−x (4.4.3b) (4.4.3c) (4.4.3d) (4.4.3e) (4.4.3f) (4.4.3g) Returning to the full series for a matrix, (I − A) (I + A + · · · + An ) = I + A + A2 + · · · + An − A − A2 − · · · − An+1 , (4.4.4a) n 2 2 = I + (A− A) + A − A + ··· + (An−A ) − An+1 , (4.4.4b) = I − An+1 . (4.4.4c) If A is small, so that An → 0 as n → ∞, then (I − A) ∞ X Ak = I, (4.4.4d) k=0 (I − A) −1 = ∞ X k=0 46 Ak . (4.4.4e) 4.4. Lecture 13: September 18, 2013 Applied Matrix Theory Let’s consider the convergence of this series now. L= ∞ X ak , (4.4.5) k=1 Pn where ak → 0 as k → ∞. We define that L is finite if lim n→∞ Pn 1 k=1 ak exists and is finite. P 1 diverges since lim As an example we see that ∞ n→∞ k=1 k → ∞. So we also should n=1 n consider that the difference, (L−) − ∞ X ak → 0, as n → ∞. (4.4.6) k=1 Thus, we can consider that, L≈ n X ak , with error → 0 as n → ∞. (4.4.7) k=1 In particular if A is small then, (I − A)−1 ≈ I + A. 
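The partial sums of this matrix geometric series can be watched converging numerically. A sketch assuming numpy, with a randomly generated small A; the scaling 0.05 is an arbitrary choice made so that ||A|| < 1.

```python
import numpy as np

rng = np.random.default_rng(0)
A = 0.05 * rng.standard_normal((4, 4))   # scaled so that ||A|| < 1
exact = np.linalg.inv(np.eye(4) - A)

# partial sums S_n = I + A + ... + A^n of the matrix geometric series
S = np.eye(4)
term = np.eye(4)
errs = []
for _ in range(30):
    term = term @ A
    S = S + term
    errs.append(np.linalg.norm(exact - S))

# first-order truncation: (I - A)^{-1} is approximately I + A for small A
first_order_err = np.linalg.norm(exact - (np.eye(4) + A))
```

The error of the partial sums decays geometrically like ||A||^n, and the first-order truncation error is of size ||A||^2, which is why I + A is already a good approximation when A is small.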
(4.4.8) −1 (A + B)−1 = A I + A−1 B , (4.4.9a) −1 −1 = I + A−1 B A , ≈ I − A−1 B A−1 , = A−1 − A−1 BA−1 . (4.4.9b) For example, where A−1 exists, (4.4.9c) (4.4.9d) Matrix Norms The properties of norms of matrix A ∈ Rm×n has a norm, k · k, if the norm satisfies, 1. kAk ≥ 0, and if kAk = 0 then A = 0, 2. kA + Bk ≤ kAk + kBk, 3. kαAk = |α| kAk, and we must add the fourth property; 4. kABk ≤ kAk kBk. As an example of a norm, kAk = max j 47 X i |aij | (4.4.10) Nitsche and Benner Unit 4. Rectangular Matrices which is the maximum absolute value of the column sum. If kAk < 1, then 0 ≤ kAn k ≤ kAkn → 0 as n → ∞. So kAn k → 0 as n → ∞ and An → 0 as n → ∞. When is A−1 B small? −1 −1 A B ≤ A kBk, kBk = A−1 kAk, kAk kBk . = kAk (4.4.11a) (4.4.11b) (4.4.11c) Thus, −1 A 6≤ Note, we have shown kA−1 k 6≤ 1. If this is the case, 1 , kAk 1 . kAk (4.4.12) since kAA−1 k = kIk which we suppose to be equal to 1 = AA−1 , = kAkA−1 . (4.4.13a) (4.4.13b) So, 1 ≤ A−1 . kAk (4.4.13c) However, we would get, −1 kA−1 kkAk A = , kAk = A−1 κ (A) . (4.4.13d) (4.4.13e) Condition Number For example pertaining to the condition number , we suppose we have Ax = b, and we have the perturbation (A + B) x̃ = b, where we know that kA−1 Bk < 1, or in other words that B is sufficiently small. We can get the relative change in x introduced by the change in A, −1 A b − (A + B)−1 b kx − x̃k = , kxk kxk −1 A − (A + B)−1 b = , kxk 48 (4.4.14a) (4.4.14b) 4.5. HW 3: Due September 27, 2013 Applied Matrix Theory If we use (A + B)−1 ≈ A−1 − A−1 BA−1 kA−1 BA−1 bk , kxk kA−1 Bkkxk ≤ , kxk kA−1 kkBkkAk ≤ , kAk kBk κ(A). = kAk ≈ (4.4.14c) (4.4.14d) (4.4.14e) (4.4.14f) Thus, κ(A) measures the amplification of the errors. 4.5 Homework Assignment 3: Due Friday, September 27, 2013 For the first four problems, you may use the Matlab commands rref(a) and a\b to check your work. 1. Textbook 2.2.1: Row Echelon Form, Rank, Consistency, General solution of Ax = b. 
Determine the reduced row echelon form for each of the following matrices and then express each nonbasic column in terms of the basic columns: 1 2 3 3 (a) 2 4 6 9 2 6 7 6 2 1 1 3 0 4 1 4 2 4 4 1 5 5 2 1 3 1 0 4 3 (b) 6 3 4 8 1 9 5 0 0 3 −3 0 0 3 8 4 2 14 1 13 3 2. Textbook 2.3.3 If A is an m × n matrix with rank(A) = m, explain why the system [A|b] must be consistent for every right-hand side b. 3. Textbook 2.5.1 Determine the general solution for each of the following non homogeneous systems. (a) x1 + 2x2 + x3 + 2x4 = 3, 2x1 + 4x2 + x3 + 3x4 = 4, 23x1 + 6x2 + x3 + 4x4 = 5. 49 (4.5.1a) (4.5.1b) (4.5.1c) Nitsche and Benner Unit 4. Rectangular Matrices (b) 2x + y + z 4x + 2y + z 6x + 3y + z 8x + 4y + z = 4, = 6, = 8, = 10. (4.5.2a) (4.5.2b) (4.5.2c) (4.5.2d) (c) x1 + x2 + 2x3 3x1 + 3x3 + 3x4 2x1 + x2 + 3x3 + x4 x1 + 2x2 + 3x3 − x4 = 3, = 6, = 3, = 0. (4.5.3a) (4.5.3b) (4.5.3c) (4.5.3d) (d) 2x + y + z 4x + 2y + z 6x + 3y + z 8x + 5y + z = 2, = 5, = 8, = 8. (4.5.4a) (4.5.4b) (4.5.4c) (4.5.4d) 2x + 2y + 3z = 0, 4x + 8y + 12z = −4, 6x + 2y + αz = 4. (4.5.5a) (4.5.5b) (4.5.5c) 4. Textbook 2.5.4 Consider the following system: (a) Determine all values of α for which the system is consistent. (b) Determine all values of α for which there is a unique solution, and compute the solution for these cases. (c) Determine all values of α for which there are infinitely many different solutions, and give the general solution for these cases. 5. Textbook 3.3.1: Linear Functions Each of the following is a function from R2 into R2 . Determine which are linear functions. x x = . (a) f y 1+y x y (b) f = . y x 50 4.5. HW 3: Due September 27, 2013 Applied Matrix Theory Figure 4.2. Figures for Textbook problem 3.3.4. (c) (d) (e) (f) x 0 f = . y xy 2 x x f = . y y2 x x f = . y sin y x x+y f = . x−y y 6. Textbook 3.3.4 Determine which of the following three transformations in R2 are linear. 7. 
Textbook 3.5.4: Matrix Multiplication Let ej denote the j th unit column that contains a 1 in the j th position and zeros everywhere else. For a general matrix An×n , describe the following products. (a) Aej (b) e|j A (c) e|j Aej 8. Textbook 3.5.6 (please use induction) 1/2 α , determine limn→∞ An . Hint: Compute a few powers of A and try 0 1/2 to deduce the general form of An . For A = 9. Textbook 3.5.9 If A = [aij (t)] is a matrix whose entries are functions of a variable t, the derivative of A with respect to t is defined to be the matrix of derivatives. That is, daij dA = . dt dt 51 Nitsche and Benner Unit 4. Rectangular Matrices Derive the product rule for differentiation d(AB) dA dB = B+A . dt dt dt 10. Textbook 3.6.2 For all matrices An×k and Bk×n show that the block matrix I − BA B L= 2A − ABA AB − I has the property L2 = I. Matrices with this property are said to be involuntary, and they occur in the science of cryptography. 11. Textbook 3.6.3 For the matrix 1 0 0 A= 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 , 1/3 1/3 1/3 determine A300 . Hint: A square matrix C is said to be idempotent when it has the property that C2 = C. Make use of the idempotent submatrices in A. 12. Textbook 3.6.5 If A and B are symmetric matrices that commute, prove that the product AB is also symmetric. If AB 6= BA, is AB necessarily symmetric? 13. Textbook 3.6.7 For each matrix An×n , explain why it is impossible to find a solution for Xn×n in the matrix equation AX − AX = I. (4.5.6) Hint: Consider the trace function. 14. Textbook 3.6.11 Prove that each of the following statements is true for conformable matrices (a) tr (ABC) = tr(BCA) = tr(CAB). (b) tr (ABC) can be different from tr (BAC). (c) A| B = tr(AB| ) 15. Textbook 3.7.2: Inverses Find the matrix X such that X = AX + B, where 0 −1 0 1 2 0 −1 and B = 2 1 . A = 0 0 0 0 3 3 52 4.5. HW 3: Due September 27, 2013 Applied Matrix Theory 16. 
Textbook 3.7.6
If A is a square matrix such that I − A is nonsingular, prove that A(I − A)^{-1} = (I − A)^{-1}A.

17. Textbook 3.7.8
If A, B, and A + B are each nonsingular, prove that
A(A + B)^{-1}B = B(A + B)^{-1}A = (A^{-1} + B^{-1})^{-1}.

18. Textbook 3.7.9
Let S be a skew-symmetric matrix with real entries.
(a) Prove that I − S is nonsingular. Hint: x^T x = 0 means x = 0.
(b) If A = (I + S)(I − S)^{-1}, show that A^{-1} = A^T.

19. Textbook 3.9.9: Sherman–Morrison formula, rank 1 matrices
Prove that rank(A_{m×n}) = 1 if and only if there are nonzero columns u_{m×1} and v_{n×1} such that A = uv^T.

20. Textbook 3.9.10
Prove that if rank(A_{n×n}) = 1, then A^2 = τA, where τ = tr(A).

UNIT 5
Vector Spaces

5.1 Lecture 14: September 20, 2013

Topics in Vector Spaces

We will be discussing the following topics in this lecture (and possibly the next couple).
• Field
• Vector Space
• Subspace
• Spanning Set
• Basis
• Dimension
• The four subspaces of A_{m×n}

Field

We define a field as a set F with the following properties.
• F is closed under addition (+) and multiplication (·). Thus if α, β ∈ F, then α + β ∈ F and α · β ∈ F.
• Addition and multiplication are commutative.
• Addition and multiplication are associative. This means that (α + β) + γ = α + (β + γ) and (αβ)γ = α(βγ).
• Multiplication distributes over addition: α(β + γ) = αβ + αγ.
• There exist an additive and a multiplicative identity: α + 0 = α, α · 1 = α.
• There exist additive and multiplicative inverses: α + (−α) = 0, and α(α^{-1}) = 1 for α ≠ 0.

For example, the reals and the complex numbers are fields. The natural numbers are not; the rational numbers are. The two-element set Z_2 = {0, 1} is also a field, with addition 0 + 0 = 0, 0 + 1 = 1, 1 + 1 = 0.

Vector Space

A vector space V over a field F is a set V with operations + and · such that,
• v + w ∈ V for any v, w ∈ V.
• αv ∈ V for any v ∈ V, α ∈ F.
• v + w = w + v for any v, w ∈ V.
This is the commutative property of addition.
• (u + v) + w = u + (v + w) for any u, v, w ∈ V, which is the associative law of addition.
• There exists 0 ∈ V such that u + 0 = u for any u ∈ V.
• For each u ∈ V there exists −u ∈ V such that u + (−u) = 0.
• (αβ)u = α(βu) for any α, β ∈ F, u ∈ V.
• (α + β)u = αu + βu for any α, β ∈ F, u ∈ V. This is the first form of the distributive property.
• 1 · u = u, where 1 is the multiplicative identity in F.
• α(u + v) = αu + αv for any α ∈ F and u, v ∈ V.

Examples of vector spaces over R are R^n = R^{n×1}, R^{n×m}, C^{m×n}, all functions [0, 1] → R, and all polynomials mapping R → R.

Theorem 5.1. A subset S of a vector space V over F is itself a vector space over F (a subspace) if
• v + w ∈ S for any v, w ∈ S, and
• αv ∈ S for any α ∈ F, v ∈ S.

Several examples: all continuous functions [0, 1] → R, written C[0, 1]; all polynomials of degree at most n; and S = {0} contained in V.

Definition 5.2. Let {v_1, ..., v_n} ⊂ V. Then span{v_1, ..., v_n} = {α_1 v_1 + α_2 v_2 + ... + α_n v_n : α_k ∈ F}.

Theorem 5.3. This gives the theorem: the span of {v_1, ..., v_n} is a subspace.

Definition 5.4. The set {v_1, ..., v_n} is a spanning set of span{v_1, ..., v_n}.

Note that 0 ∈ span{v_1, ..., v_n}, and 0 belongs to every subspace.

For example, span{(1, 2)^T} contained in R^2 equals span{(1, 2)^T, (−2, −4)^T}. This gives rise to the basis vector (1, 2)^T, and thus the space is one-dimensional. The basis vector is illustrated along with the solution in Figure 5.1.

Definition 5.5. A basis for a vector space is a minimal spanning set.

Theorem 5.6. Any two bases for a vector space have the same number of elements.

Definition 5.7. The number of elements in a basis is equal to the dimension of the space.

Figure 5.1. Basis vector of example solution.

For example, P_2 = {a_1 + a_2 x + a_3 x^2} has the basis {1, x, x^2}, and we observe that it must have three dimensions.
Therefore, for polynomials of degree n, the dimension of the polynomial function space is dim(P_n) = n + 1. As another example, for S = {0} the basis is the empty set ∅, and we have a zero-dimensional space. Thus, zero cannot be an element of a basis.

Definition 5.8. A set {v_1, ..., v_n} is linearly independent if α_1 v_1 + α_2 v_2 + ... + α_n v_n = 0 implies α_1 = α_2 = ... = α_n = 0.

It follows that {0} is not a linearly independent set since

α0 = 0 for any α ≠ 0.      (5.1.1)

Similarly, any set containing 0 is not linearly independent.

Examples of function spaces

One example is the set of solutions to y'' = 0. This is the set {y = αx + β : α, β ∈ R}. The vector space has two dimensions and the basis is {1, x}. Another example is the set of solutions of y'' = y. The set of solutions is {y = c_1 e^x + c_2 e^{−x}}, which has the two-dimensional basis {e^x, e^{−x}}. A third example is the set of solutions of y'' = −y. This set is {y = c_1 sin(x) + c_2 cos(x)}, which also has a two-dimensional basis, {sin(x), cos(x)}. A final example of interest is y'' = 2. This gives the solution set {y = x^2 + αx + β}. This, however, is not a vector space, because we are restricted by the defined coefficient of x^2 being one! This results from the fact that this is a nonhomogeneous equation, unlike the other examples, which may be rearranged into homogeneous form.

In the general example of R^{2×2} = { [a b; c d] }, the basis of this space is

{ [1 0; 0 0], [0 1; 0 0], [0 0; 1 0], [0 0; 0 1] }.

5.2 Lecture 15: September 23, 2013

The four subspaces of A_{m×n}

We now define the four fundamental subspaces of A_{m×n} : R^n → R^m. These are:
1. R(A) = {y : y = Ax, x ∈ R^n} ⊂ R^m. This is the column space of A.
2. N(A) = {x ∈ R^n : Ax = 0} ⊂ R^n. This is the null space of A.
3. R(A^T) = {y : y = A^T x, x ∈ R^m} ⊂ R^n. Equivalently, R(A^T) = {y : y^T = x^T A, x ∈ R^m} ⊂ R^n, which is why this is called the row space of A.
4. N(A^T) = {x ∈ R^m : A^T x = 0, or x^T A = 0^T} ⊂ R^m.
This is called the left null space of A.

We want to show that R(A) is a vector space. So we let y_1, y_2 ∈ R(A). Then y_1 = Ax_1 and y_2 = Ax_2 for some x_1, x_2. This tells us that

y_1 + y_2 = Ax_1 + Ax_2 = A(x_1 + x_2) ∈ R(A).      (5.2.1)

Also,

αy_1 = αAx_1 = A(αx_1) ∈ R(A).      (5.2.2)

Thus R(A) is a subspace of R^m.

An example: find spanning sets for all 4 subspaces of

A = [ 1 2 1 3 3          [ 1 2 0 2 0
      2 4 0 4 4      →     0 0 1 1 0
      1 2 3 5 5            0 0 0 0 1
      2 4 0 4 2 ]          0 0 0 0 0 ].      (5.2.3)

The basic columns are columns 1, 3, and 5, so the column space is

R(A) = span{ (1, 2, 1, 2)^T, (1, 0, 3, 0)^T, (3, 4, 5, 2)^T } ⊂ R^4.      (5.2.4)

To find the null space, we need the solutions of the homogeneous equation Ax = 0:

x_1 = −2x_2 − 2x_4,  x_3 = −x_4,  x_5 = 0,      (5.2.5)

or

x = x_2 (−2, 1, 0, 0, 0)^T + x_4 (−2, 0, −1, 1, 0)^T.      (5.2.6)

Thus,

N(A) = span{ (−2, 1, 0, 0, 0)^T, (−2, 0, −1, 1, 0)^T } ⊂ R^5.      (5.2.7)

Now say A → E_A by row reduction:

P_{m×m} A_{m×n} = E_{A,m×n}.      (5.2.8)

We have that P_{m×m} is square and invertible (it is a product of elementary matrices). Since PA = E_A, the rows of E_A are linear combinations of the rows of A. Similarly, A = P^{-1} E_A shows that the rows of A are linear combinations of the rows of E_A, so the row space of A is equal to the row space of E_A. So,

R(A^T) = row space of A
       = span{ (1, 2, 0, 2, 0)^T, (0, 0, 1, 1, 0)^T, (0, 0, 0, 0, 1)^T }
       = { y : y = A^T x, or y^T = x^T A }.      (5.2.9)

To find the fourth space, N(A^T), we row reduce A^T:

A^T = [ 1 2 1 2          [ 1 0 3 0
        2 4 2 4            0 1 −1 0
        1 0 3 0      →     0 0 0 1
        3 4 5 4            0 0 0 0
        3 4 5 2 ]          0 0 0 0 ].      (5.2.10)

So the solution of A^T x = 0 is

x_1 = −3x_3,  x_2 = x_3,  x_4 = 0,      (5.2.11)

or

x = x_3 (−3, 1, 1, 0)^T.      (5.2.12)

This finally gives us that

N(A^T) = span{ (−3, 1, 1, 0)^T } ⊂ R^4.      (5.2.13)
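The four subspaces found in this example can be spot-checked numerically. Below is a small sketch using NumPy (not part of the original notes); the matrix A is reconstructed from the subspace bases computed in the lecture, and the null space bases are obtained from the SVD rather than from the RREF used in class.

```python
import numpy as np

# The example matrix (4x5), reconstructed from the lecture's subspace bases.
A = np.array([
    [1, 2, 1, 3, 3],
    [2, 4, 0, 4, 4],
    [1, 2, 3, 5, 5],
    [2, 4, 0, 4, 2],
], dtype=float)

r = np.linalg.matrix_rank(A)           # dim R(A) = dim R(A^T) = r

def null_basis(M, tol=1e-10):
    """Orthonormal basis for N(M): right singular vectors of M
    whose singular values are (numerically) zero."""
    _, s, Vt = np.linalg.svd(M)
    rank = int((s > tol).sum())
    return Vt[rank:].T                 # columns span N(M)

NA = null_basis(A)                     # basis for N(A),   dim n - r
NAt = null_basis(A.T)                  # basis for N(A^T), dim m - r

print(r, NA.shape[1], NAt.shape[1])    # 3 2 1

# N(A^T) is orthogonal to R(A): x^T A = 0 for every x in N(A^T).
print(np.allclose(NAt.T @ A, 0))       # True

# The left null space is the line spanned by (3, -1, -1, 0):
# compare the orthogonal projectors onto the two one-dimensional spaces.
v = NAt[:, 0]
u = np.array([3.0, -1.0, -1.0, 0.0])
print(np.allclose(np.outer(v, v) / (v @ v),
                  np.outer(u, u) / (u @ u)))   # True
```

The projector comparison at the end avoids the sign ambiguity of singular vectors: two unit vectors span the same line exactly when their rank-one projectors agree.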
So the dimension of the column space of A is

dim(R(A)) = r,      (5.2.14)

which is also known as the rank of A. The dimensions of the other spaces are

dim(N(A)) = n − r,      (5.2.15)
dim(R(A^T)) = r,      (5.2.16)
dim(N(A^T)) = m − r.      (5.2.17)

An alternative way to find the left null space of A, that is,

N(A^T) = { x : x^T A = 0^T },      (5.2.18)

is the following. We use

PA = [ b_1^T; ...; b_r^T; 0; ...; 0 ]      (5.2.19)

with r nonzero rows and m − r zero rows. From this we can use block matrices,

P = [ P_1; P_2 ],      (5.2.20)

so

PA = [ P_1; P_2 ] A = [ P_1 A; P_2 A ].      (5.2.21)

We know that P_2 A = 0. So we claim that the rows of P_2 span the left null space of A, i.e.,

R(P_2^T) = N(A^T).      (5.2.22)

5.3 Lecture 16: September 25, 2013

Dr. Nitsche is out of town October 18 and the Wednesday before Thanksgiving. We may have to have alternate times for class.

The Four Subspaces of A

To recall what we discussed last class,
• R(A) = {Ax} is the range of A, or the column space. This has dimension r.
• N(A) = {x : Ax = 0} is the null space of A. This has dimension n − r.
• R(A^T) = {A^T y} is the row space of A. This has dimension r.
• N(A^T) = {x : A^T x = 0} = {x : x^T A = 0^T} is the left null space of A. This has dimension m − r.

Returning to the manipulation A → E_A with PA = E_A, where P_{m×m} is invertible,

PA = [ P_1; P_2 ] A = [ P_1 A; P_2 A ] = [ B_1; 0 ],      (5.3.1)

where P_2 A = 0.

Theorem 5.9.

N(A^T) = R(P_2^T),      (5.3.2)

where the right-hand side is spanned by the rows of P_2.

Proof. (⊇) Assume y ∈ R(P_2^T). Then y = P_2^T x for some x. Reformulating, y^T = x^T P_2. So y^T A = x^T P_2 A = x^T 0 = 0^T, which gives y ∈ N(A^T).

(⊆) Assume y ∈ N(A^T), so y^T A = 0^T. Write A = P^{-1} E_A = Q E_A, where Q = P^{-1} is partitioned as Q = [Q_1 | Q_2] and E_A = [U; 0] with U_{r×n} of full row rank. Then 0^T = y^T [Q_1 | Q_2][U; 0] = (y^T Q_1) U, and since U has full row rank, y^T Q_1 = 0^T. We know that QP = I: [Q_1 | Q_2][P_1; P_2] = Q_1 P_1 + Q_2 P_2 = I, so Q_1 P_1 = I − Q_2 P_2. This gives 0^T = y^T Q_1 P_1 = y^T (I − Q_2 P_2).
So y^T = y^T Q_2 P_2, and so we have

y = P_2^T (Q_2^T y) ∈ R(P_2^T).      (5.3.3)

As an example, for the matrix A of the previous lecture we row reduce the augmented matrix [A | I] to [E_A | P]; the rows of P that multiply A to give the zero rows of E_A (the rows of the block P_2) span the left null space:

[A | I] → [E_A | P].      (5.3.4)

Note that N(A^T) is orthogonal to R(A). We also find from this manipulation that

R(A) = span{ (1, 2, 1, 2)^T, (1, 0, 3, 0)^T, (3, 4, 5, 2)^T }      (5.3.5)

and

N(A^T) = R(P_2^T) = span{ (3, −1, −1, 0)^T }.      (5.3.6)

Linear Independence

Definition 5.10. A set {v_1, ..., v_n} is linearly independent if α_1 v_1 + ... + α_n v_n = 0 implies α_1 = ... = α_n = 0.

From this we get the equivalent statements:
• {v_1, ..., v_n} is linearly independent,
• A = [v_1 · · · v_n] has full column rank n,
• N(A) = {α : Aα = 0} = {0}.

For example, we have the polynomial set up to order n, {1, x, x^2, ..., x^n}, which is linearly independent because c_0 + c_1 x + c_2 x^2 + ... + c_n x^n = 0 (as a function) implies that c_0 = ... = c_n = 0. As another example, the zero set {0} is linearly dependent, since α0 = 0 for any α ≠ 0. Any set containing 0, e.g. {v_1, ..., v_n, 0}, is linearly dependent. Another example is any set of distinct unit vectors {e_{i_1}, e_{i_2}, ..., e_{i_n}}, where e_i ∈ R^m and n ≤ m. This is also linearly independent, since

A = [ e_{i_1} · · · e_{i_n} ]      (5.3.7)

has a single 1 in each column, each in a distinct row, and therefore full column rank n.

We take as another example the Vandermonde matrix, which has applications in polynomial interpolation. Let x_1, ..., x_m be distinct real numbers, and

A = [ 1  x_1  x_1^2  · · ·  x_1^{n−1}
      1  x_2  x_2^2  · · ·  x_2^{n−1}
      ⋮
      1  x_m  x_m^2  · · ·  x_m^{n−1} ],      (5.3.8)

where n ≤ m. Then we have Ac = y, where c = [c_0 · · · c_{n−1}]^T, which encodes p(x_1) = y_1, ..., p(x_m) = y_m for the polynomial p(x) = c_0 + c_1 x + ... + c_{n−1} x^{n−1}. A solution to Ac = y gives a polynomial that interpolates the points (x_k, y_k). If Ac = 0, then we have a polynomial p with m roots x_1, ..., x_m; but a nonzero polynomial of degree at most n − 1 can have at most n − 1 distinct roots, and m > n − 1. So p ≡ 0, and therefore c = 0: the columns of A are linearly independent.
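The Vandermonde argument above can be illustrated numerically. A short NumPy sketch (an addition to the notes; the node values are arbitrary choices for illustration): distinct nodes give full column rank, and with as many nodes as coefficients the interpolation problem has a unique solution.

```python
import numpy as np

# Distinct nodes guarantee full column rank of the Vandermonde matrix.
x = np.array([0.0, 1.0, 2.0, 3.5, 5.0])   # m = 5 distinct points
n = 3                                     # columns 1, x, x^2
A = np.vander(x, n, increasing=True)      # m x n Vandermonde matrix

print(np.linalg.matrix_rank(A))           # 3: full column rank

# With m = n nodes, the interpolation problem V c = y has a unique solution.
xs = x[:n]
V = np.vander(xs, n, increasing=True)
y = xs**2 - 1.0                           # samples of p(x) = x^2 - 1
c = np.linalg.solve(V, y)
print(np.allclose(c, [-1.0, 0.0, 1.0]))   # True: recovers p exactly
```

Here `np.vander(..., increasing=True)` orders the columns as 1, x, x², matching the coefficient vector c = (c_0, ..., c_{n−1})ᵀ used in the notes.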
Figure 5.2. Interpolating system: a polynomial through the points (x_1, y_1), ..., (x_k, y_k), ..., (x_n, y_n).

5.4 Lecture 17: September 27, 2013

Linear functions (rev)

Is f linear? Here it was good to find the formula. Some could be done by inspection. Here we also should check f(p_1 + p_2) = f(p_1) + f(p_2) and f(αp) = αf(p). So let's talk about finding the functions; say the flipping function (reflection across the x-axis):

f(x, y) = (x, −y) = [ 1 0; 0 −1 ] [ x; y ].      (5.4.1)

For the projection onto the line y = x,

f(x, y) = ( (x + y)/2, (x + y)/2 ) = [ 1/2 1/2; 1/2 1/2 ] [ x; y ].      (5.4.2)

For the rotation, x = r cos(ψ) and y = r sin(ψ). If we denote the rotated point with primes, x' = r cos(ψ + θ) and y' = r sin(ψ + θ). We can use the angle-sum identities to get x' = r(cos ψ cos θ − sin ψ sin θ) = x cos θ − y sin θ and y' = r(sin ψ cos θ + cos ψ sin θ) = x sin θ + y cos θ. This gives us the function

f(x, y) = [ x'; y' ] = [ cos(θ) −sin(θ); sin(θ) cos(θ) ] [ x; y ].      (5.4.3)

Note this is an orthogonal matrix with determinant equal to 1.

Review for exam

Anything on the first three homeworks is fair game. We have been doing computations of the LU, PLU, REF, RREF. We have solved Ax = b and written systems of linear equations in matrix form. We have talked about the elementary matrices and the process of premultiplication, as well as their invertibility.

We have also discussed some proofs, especially this last one. We showed these major ones: tr(AB) = tr(BA), (AB)^T = B^T A^T, (AB)^{-1} = B^{-1} A^{-1}, (A^{-1})^T = (A^T)^{-1} = A^{-T}. Similarly, we have shown that the LU decomposition exists if all principal submatrices are nonsingular, and the relation (I − A)^{-1} = Σ_{k=0}^∞ A^k if A^k → 0. We also discussed (A + B)^{-1} with perturbation matrices. Finally, we discussed rank one matrices, so we need to know the Sherman–Morrison formula for (I + uv^T)^{-1}.

Previous lecture continued

Comment on the previous lecture: consider again the Vandermonde matrix

A = [ 1  x_1  x_1^2  · · ·  x_1^{n−1}
      1  x_2  x_2^2  · · ·  x_2^{n−1}
      ⋮
      1  x_m  x_m^2  · · ·  x_m^{n−1} ]_{m×n}.      (5.4.4)

When we consider Ac = y, it is equivalent to p(x_i) = y_i, where i = 1, ..., m. Thus we have the equations c_0 + c_1 x_i + c_2 x_i^2 + ... + c_{n−1} x_i^{n−1} = y_i, i = 1, ..., m, and we have a linear system in the coefficients c_k, with m ≥ n. In terms of vectors, the columns

{ (1, ..., 1)^T, (x_1, ..., x_m)^T, (x_1^2, ..., x_m^2)^T, ..., (x_1^{n−1}, ..., x_m^{n−1})^T }      (5.4.5)

are linearly independent, so rank(A) = n. To show that this set is linearly independent, we set up the system

c_0 (1, ..., 1)^T + c_1 (x_1, ..., x_m)^T + c_2 (x_1^2, ..., x_m^2)^T + ... + c_{n−1} (x_1^{n−1}, ..., x_m^{n−1})^T = 0.      (5.4.6)

This says the polynomial p(x) = c_0 + c_1 x + ... + c_{n−1} x^{n−1} has at least m distinct roots, but a nonzero p ∈ P_{n−1} has at most n − 1 roots; we know this by the fundamental theorem of algebra. Since m > n − 1, the polynomial must be identically equal to the zero polynomial, p ≡ 0, and c_k = 0 for all k.

So we want to interpolate with a polynomial p(x) ∈ P_{n−1}. We set up p(x_i) = y_i for i = 1, ..., m. If m = n, then we have a unique solution to the interpolation problem. If instead m > n, then since A has full column rank we have either no solution or a unique one; generically the overdetermined system is inconsistent.

We defined the span of a set as the set of all linear combinations over the field of reals: span{v_1, ..., v_n} = { Σ c_k v_k : c_k ∈ R }. A basis for a vector space V is a set {v_1, ..., v_k} that spans V and is linearly independent. We also know that the basis for {0} is the empty set ∅. Thus, for convenience, we define span{∅} = {0}.

Theorem 5.11. If {v_1, ..., v_n} is a basis of V, then any set {u_1, ..., u_m} ⊂ V with m > n is linearly dependent.

5.5 Lecture 18: October 2, 2013

Exams and Points

We decided that we will have three exams total, but only the best two will each count for 20% of our semester grade. Homework will be worth 60%. Lecture notes will be posted online.

Continuation of last lecture

Theorem 5.12. If {u_1, ..., u_n} spans V and S = {v_1,
..., v_m} ⊂ V with m > n, then S is linearly dependent.

Proof. Consider Σ_{i=1}^m α_i v_i = 0. Using v_i = Σ_{j=1}^n c_{ij} u_j,

Σ_{i=1}^m α_i Σ_{j=1}^n c_{ij} u_j = 0,      (5.5.1a)
Σ_{j=1}^n ( Σ_{i=1}^m α_i c_{ij} ) u_j = 0,      (5.5.1b)

where the inner sum is the jth entry of C^T α. Since C^T_{n×m} α = 0 has m − n > 0 free variables, there exists α ≠ 0 such that C^T α = 0. Then (C^T α)_j = 0 for every j, so Σ_i α_i v_i = 0 with not all α_i zero, and S is linearly dependent.

Definition 5.13. A basis of V is a linearly independent spanning set of V.

Theorem 5.14. Any two bases have the same number of elements.

Equivalent characterizations of a basis:
• a linearly independent spanning set,
• a minimal spanning set,
• a maximal linearly independent subset of V.

Definition 5.15. dim(V) is equal to the number of elements in a basis.

Recalling the four subspaces for a matrix

A_{m×n} = [ a_1  a_2  · · ·  a_n ]:      (5.5.2)

• R(A) ⊂ R^m, dim = r;
• N(A) ⊂ R^n, dim = n − r;
• R(A^T) ⊂ R^n, dim = r;
• N(A^T) ⊂ R^m, dim = m − r.

Definition 5.16. If X and Y are two subspaces of V, then

X + Y = { x + y : x ∈ X, y ∈ Y }.      (5.5.3)

Is X + Y a subspace? We shall check this in two parts.
1. Given z ∈ X + Y, is αz ∈ X + Y? Writing z = x + y, we have αz = αx + αy ∈ X + Y, where we recall that the vectors αx and αy are within their respective subspaces.
2. Given z_1, z_2 ∈ X + Y, is z_1 + z_2 ∈ X + Y? Here we substitute for the summed vectors of each of the z vectors: (x_1 + y_1) + (x_2 + y_2) = (x_1 + x_2) + (y_1 + y_2) ∈ X + Y.

Theorem 5.17. dim(X + Y) = dim(X) + dim(Y) − dim(X ∩ Y).

Proof. Let B_{X∩Y} = {z_1, ..., z_k} be a basis for X ∩ Y. Then we can extend the set to bases for X and Y:

B_X = {z_1, ..., z_k, x_1, ..., x_n},      (5.5.4a)
B_Y = {z_1, ..., z_k, y_1, ..., y_m}.      (5.5.4b)

We now claim that the set S = {z_1, ..., z_k, x_1, ..., x_n, y_1, ..., y_m} = B_{X+Y}. We first consider: does S span X + Y? We let z ∈ X + Y. Then we know z = x + y for some x ∈ X and y ∈ Y. So,
z = ( Σ_i α_i z_i + Σ_i β_i x_i ) + ( Σ_i α'_i z_i + Σ_i γ_i y_i )      (5.5.5a)
  = Σ_i (α_i + α'_i) z_i + Σ_i β_i x_i + Σ_i γ_i y_i ∈ span(S).      (5.5.5b)

Is S linearly independent? Consider

Σ α_i z_i + Σ β_i x_i + Σ γ_i y_i = 0.      (5.5.6a)

Then

Σ γ_i y_i = − ( Σ α_i z_i + Σ β_i x_i ),      (5.5.6b)

where the left side is in Y and the right side is in X, so both lie in X ∩ Y. Hence Σ γ_i y_i = Σ δ_i z_i for some δ_i, i.e.,

Σ γ_i y_i − Σ δ_i z_i = 0.      (5.5.6c)

Since B_Y is linearly independent, this indicates γ_i = δ_i = 0. Then

Σ α_i z_i + Σ β_i x_i = 0,      (5.5.6d)

which, since B_X is linearly independent, also indicates α_i = β_i = 0.

From our example, the four subspaces were spanned by the vectors

R(A) = span{ (1, 2, 1, 2)^T, (1, 0, 3, 0)^T, (3, 4, 5, 2)^T } ⊂ R^4,      (5.5.7a)
N(A) = span{ (−2, 1, 0, 0, 0)^T, (−2, 0, −1, 1, 0)^T } ⊂ R^5,      (5.5.7b)
R(A^T) = span{ (1, 2, 0, 2, 0)^T, (0, 0, 1, 1, 0)^T, (0, 0, 0, 0, 1)^T } ⊂ R^5,      (5.5.7c)
N(A^T) = span{ (3, −1, −1, 0)^T } ⊂ R^4.      (5.5.7d)

Theorem 5.18. (a) R(A) is orthogonal to N(A^T), and (b) R(A) ∩ N(A^T) = {0}. It follows that R(A) + N(A^T) = R^m and R(A^T) + N(A) = R^n: any A_{m×n} gives an orthogonal decomposition of R^n and of R^m.

Proof. (a) Let y ∈ R(A), so y = Az for some z. Let x ∈ N(A^T), which means that A^T x = 0 and additionally x^T A = 0^T. Considering x^T y = x^T Az = 0, x must be orthogonal to y; therefore R(A) ⊥ N(A^T). (b) If x ∈ R(A) and x ∈ N(A^T), then by (a) x^T x = 0, which implies x_i = 0 for each i and x = 0.

UNIT 6
Least Squares

6.1 Lecture 19: October 4, 2013

Least Squares

We will now be covering the concept of least squares. If we are given an equation Ax = b, we may multiply by the transpose of the matrix to find the least squares solution from A^T Ax = A^T b. We will show that this system is consistent even if Ax = b is inconsistent.

Previously we showed:

Theorem 6.1. dim(X + Y) = dim(X) + dim(Y) − dim(X ∩ Y), where X, Y are subspaces of V.

We now consider:

Theorem 6.2. Given conformable matrices A and B,

rank(A + B) ≤ rank(A) + rank(B),      (6.1.1)

where rank(A + B) = dim(R(A + B)), rank(A) = dim(R(A)), and rank(B) = dim(R(B)).

Proof. R(A + B) ⊂ R(A) + R(B), since if y ∈ R(A + B) then

y = (A + B)x = Ax + Bx ∈ R(A) + R(B).      (6.1.2)
Further,

dim(R(A + B)) ≤ dim(R(A) + R(B))      (6.1.3a)
             = dim(R(A)) + dim(R(B)) − dim(R(A) ∩ R(B))      (6.1.3b)
             ≤ dim(R(A)) + dim(R(B))      (6.1.3c)
             = rank(A) + rank(B).      (6.1.3d)

Theorem 6.3. rank(AB) = rank(B) − dim(N(A) ∩ R(B)).

Proof. Let S = {x_1, ..., x_s} be a basis of N(A) ∩ R(B). Since N(A) ∩ R(B) ⊂ R(B), we can extend S to a basis for R(B),

B_{R(B)} = {x_1, ..., x_s, z_1, ..., z_t}.      (6.1.4)

To prove dim(R(AB)) = t, we claim S_2 = {Az_1, ..., Az_t} is a basis for R(AB). First we show that it spans. We let b ∈ R(AB). So b = ABy for some y, where By ∈ R(B). So

b = A ( Σ_i α_i x_i + Σ_i β_i z_i )      (6.1.5a)
  = Σ_i α_i Ax_i + Σ_i β_i Az_i = Σ_i β_i Az_i, since Ax_i = 0 for x_i ∈ N(A),      (6.1.5b)
  ∈ span(S_2).      (6.1.5c)

Next we show that S_2 is linearly independent. Suppose Σ_i α_i Az_i = 0. Rearranging, A( Σ_i α_i z_i ) = 0, and Σ_i α_i z_i ∈ N(A) ∩ R(B) since each z_i ∈ R(B). Thus Σ_i α_i z_i = Σ_i β_i x_i, so Σ_i α_i z_i − Σ_i β_i x_i = 0. Therefore α_i = β_i = 0, since {z_i, x_i} are linearly independent.

Theorem 6.4. Given matrices A_{m×n} and B_{n×p}, then

rank(A) + rank(B) − n ≤ rank(AB) ≤ min(rank(A), rank(B)).      (6.1.6)

Proof. We will consider the right inequality first and the left inequality second. First, rank(AB) ≤ rank(B) by Theorem 6.3. Also, rank(AB) = rank((AB)^T) = rank(B^T A^T) ≤ rank(A^T) = rank(A). For the left inequality, N(A) ∩ R(B) ⊂ N(A). Thus dim(N(A) ∩ R(B)) ≤ dim(N(A)) = n − rank(A). So rank(AB) = rank(B) − dim(N(A) ∩ R(B)) ≥ rank(B) − (n − rank(A)).

Theorem 6.5. (1) rank(A^T A) = rank(A) and rank(AA^T) = rank(A^T) = rank(A). (2) R(A^T A) = R(A^T) and R(AA^T) = R(A). (3) N(A^T A) = N(A) and N(AA^T) = N(A^T).

Proof. For part (1), rank(A^T A) = rank(A) − dim(N(A^T) ∩ R(A)) by Theorem 6.3, but N(A^T) ⊥ R(A), and so N(A^T) ∩ R(A) = {0}: if we let x ∈ N(A^T) and x ∈ R(A), then A^T x = 0 and x = Ay, which gives x^T x = y^T A^T x = 0, which implies that x = 0. So dim(N(A^T) ∩ R(A)) = 0.

to be continued...
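The rank identities of Theorems 6.4 and 6.5 can be spot-checked numerically. A sketch using NumPy (an addition to the notes; the random matrices and the chosen sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random 6x4 matrix of rank 2 via an outer-product factorization.
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))

r = np.linalg.matrix_rank(A)
print(r)                                              # 2

# Theorem 6.5(1): rank(A^T A) = rank(A A^T) = rank(A).
print(np.linalg.matrix_rank(A.T @ A) == r)            # True
print(np.linalg.matrix_rank(A @ A.T) == r)            # True

# Theorem 6.4: rank(A) + rank(B) - n <= rank(AB) <= min(rank(A), rank(B)),
# here with n = 4 inner dimension.
B = rng.standard_normal((4, 5))
rb = np.linalg.matrix_rank(B)                         # generically 4
rab = np.linalg.matrix_rank(A @ B)
print(r + rb - 4 <= rab <= min(r, rb))                # True
```

With a generic full-rank B, both sides of the Theorem 6.4 inequality collapse to rank(A), so the product's rank is pinned exactly; a rank-deficient B would leave slack between the bounds.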
6.2 Lecture 20: October 7, 2013 We will have two weeks for the next homework. 70 6.2. Lecture 20: October 7, 2013 Applied Matrix Theory Properties of Transpose Multiplication In review we covered the following theorems last time: Theorem 6.6. dim(X + Y) = dim(X ) + dim(Y) − dim(X ∩ Y), where X , Y are subspaces of V. We also had the theorem, Theorem 6.7. rank(AB) = rank(B) − dim(N(A) ∩ R(B)) And finally we showed the relation Theorem 6.8. rank(AB) = rank(B) − dim(N(A) ∩ R(B)) We left off at the theorem covering multiplication relations and the rank and dimensions of the matrix, Theorem 6.9. (1) rank(A| A) = rank(A) and rank(AA| ) = rank(A| ) = rank(A). (2) R(A| A) = R(A| ) and R(AA| ) = R(A). (3) N(A| A) = N(A) and N(AA| ) = N(A| ). We proved the first one using the third of the theorems above. We now prove the second and third parts of this theorem. Proof. For part 2, Let y ∈ R(A| A) then A| Ax = y for some x. So y = A| z for some z. Thus y ∈ R(A| ) So R(A| A) ⊂ R(A| ), since dim(R(A| A)) = dim(R(A| )) and R(A| A) = R(A| ). This is because BR(A| A) ⊂ BR(A| ) but since these have the same number of elements so BR(A| A) = BR(A| ) . For the third part we want to show that the basis are contained in the other and then we compare the domimensions. So we let x ∈ N(A) the Ax = 0 or A| Ax = 0 and x ∈ N(A| A) so N(A) ⊂ N(A| A). But also dim(N(A)) = n + r or dim(N(A| A)) = n − r therefore the two sets must be the same; N(A) = N(A| A). The Normal Equations Definition 6.10. The normal equations for a system Ax = b is | | A Ax = A b. (6.2.1) Theorem 6.11. For any A, A| Ax = A| b is consistent. Proof. RHS in R(A| ); by the previous theorem RHS ∈ R(A| A) for every x such that A| Ax = RHS. Note: the solution to the normal equation is unique when rank(A) = n. 71 Nitsche and Benner Unit 6. Least Squares Example 6.12. Fit (xi , yi ), i = 1, . . . , m by a polynomial of degree 2. So p(x) = c0 + c1 x + c2 x2 , where m > 3. Our problem it solve is p(xi ) = yi , i = 1, . . . 
, m or c0 + c1 xi + ci x2i = yi , i = 1, . . . , m. The system is therefore linear in the system of the unknowns c0 , c1 , and c2 . We can write this in matrix form, 1 x1 x21 y1 1 x 2 x 2 c0 y2 2 (6.2.2) c1 = .. .. . . c 2 1 xm x2m ym or alternatively we have the system Ac = that 1 1 1 y. What is the rank of the matrix A? We know x1 x21 x2 x22 .. . 2 xm xm is invertible, since Ac = 0 implies that c = 0. So, p(x1 ) p(x2 ) Ac = .. . . p(xm ) (6.2.3) Now, 3 ≤ m − 1 and we know that 1 x1 x21 1 x2 x2 2 rank =3 .. . 2 1 xm xm (6.2.4) and A| Ax = A| b has a unique solution. To solve the normal equations, 1 x1 x21 y1 c0 1 1 1 y 1 1 1 1 x x22 2 2 x1 x2 · · · xm c1 = x1 x2 · · · xm .. , .. . . x21 x22 x2m c2 x21 x22 x2m 2 1 xm xm ym P P 2 P c0 Pm Pxi Pxi 3 P yi xi x2i x c = 1 P 2 P 3 P i4 P x2i yi xi xi xi c2 xi y i (6.2.5a) (6.2.5b) Suggestion: have an outline of the major proofs we have shown in class in your mind. Go back and give them a study over. | Theorem 6.13. A| Ax = A| b gives x which minimizes kAx − bk22 = (Ax − b) (Ax − b). 72 6.2. Lecture 20: October 7, 2013 Applied Matrix Theory b N (A) Ax Figure 6.1. Minimization of distance between point and a plane. y x Figure 6.2. Parabolic fitting by least squares By corollary this is an if and only if statement. Here every solution the normal Pof m equations minimizes the sum of the squares of the entries of the vector i=1 (Ax − b)2i . P Note here kxk22 = x| x = x2i . We illustrate this in Figure 6.1 where the minimal line connecting a point to a plane is shown. Example 6.14. What does the solution the normal equations minimize from Pto P P our example? m 2 2 The solution c0 , c1 , and c2 minimizes i=1 (Ac − y)i = ((Ac)i − yi ) = (p(xi ) − yi )2 . We can visualize our parabolic least squares method as shown in Figure 6.2. Exam 1 We had a range from 36–98, with a median of 66. For this exam: 70–100 is an A-range score, 50–70 is about a B, and below is a C (as long as as you are showing involvement in the class). 
First two problems went fine; four was covered in class, five was on the homework, we will confer the solution of the sixth problem in class next time. 73 Nitsche and Benner 6.3 Unit 6. Least Squares Lecture 21: October 9, 2013 Need to have a couple classes early because of missing next Friday. So, next Monday and Wednesday we will start at 8:35. We will review problem 6 from the exam, then finish up least squares; cover linear dependence and finally linear transformations. Exam Review We review exam problems 6. Given u, v ∈ Rn . (a) Show A = I + uv| is A−1 = I + αuv| . Find α. So we check that AA−1 = A−1 A = I. Now | | AA−1 = I + αuv I + αuv , (6.3.1a) | | | | (6.3.1b) = I + αuv + uv + uv αuv , | | | | = I + αuv + uv + αu v u v , (6.3.1c) | | | | = I + αuv + uv + α v u uv , (6.3.1d) | | = I + uv 1 + α(1 + v u) . (6.3.1e) This is equal to I if 1 + α(1 + v| u) = 0 or when α = 1+v1 | u . Thus, the Sherman–Morrison formula is, 1 | | −1 (6.3.2) I + uv =I− | uv . 1+v u | | For part (b) B = A + αêi êj = A I + αA−1 êi êj where A is invertible. For the inverse of B: | −1 −1 B−1 = I + αA−1 êi êj A , (6.3.3a) # " 1 | αA−1 êi êj A−1 . (6.3.3b) = I+ | −1 1 + êj αA êi | This exists if 1 + αêj A−1 êi 6= 0 and can make α sufficiently small. | {z } A−1 ji Least squares and minimization | Theorem 6.15. x solves A| Ax = is equivalent to x minimizes (Ax − b)| (Ax − b) = PA b | 2 2 kAx − bk2 , where kxk2 = xx = i x2i . Note: f (x) = f (x1 , x2 , . . . , xn ), | = (Ax − b) (Ax − b), | | (6.3.4a) (6.3.4b) | = (x A − b )(Ax − b), | | | | | (6.3.4c) | = x A Ax − x A b − b Ax + b b. 74 (6.3.4d) 6.3. Lecture 21: October 9, 2013 Applied Matrix Theory | | For scalars x, we have that x| = x. So, v| Ax = b Ax = x| A| b. This manipulates our previous result to, | | | | | = x A Ax − 2x A b + b b. (6.3.5) This is a quadratic form and the minimum occurs when ∂f ∂xi = 0. Proof. 
To prove from the right hand side to the left; suppose x minimizes f (x), then ∂f , ∂xi ∂x ∂x| | ∂x| | | | A Ax + x A A −2 A b, = ∂xi ∂xi ∂xi | | | | = 2êi A Ax − 2êi A b. (6.3.6a) 0= (6.3.6b) (6.3.6c) This gives us | | | | êi A Ax = êi A b (6.3.7) and | | (A Ax)i = (A b)i , any i. (6.3.8) This finally means that we have formulated equivalently to A| Ax = A| b. ASIDE: ∂ ∂u ∂v (uv) = v+u . ∂xi ∂xi ∂xi (6.3.9) To prove going the other direction, suppose that x solves A| Ax = A| b then show that f (x) < f (y) for any y 6= x. First, we consider | | | | | | | | f (y) − f (x) = y A Ay − 2y |{z} A b +b b − x A Ax − 2x |{z} A b +b b, | | A Ax | | (6.3.10a) | A Ax | | | = y A Ay − 2yA Ax − x A Ax − 2xA Ax, | (6.3.10b) = (Ay − Ax) (Ay − Ax) , (6.3.10c) x)k22 , (6.3.10d) (6.3.10e) = kA (y − ≥ 0. If A has full rank (no nontrivial null space), then this must be greater than zero. So any solution to the normal equations minimizes this norm, or and solution A| Ax = A| b minimizes kAx − bk22 . Further, if A has full rank then we are guaranteed a unique least squares solution x. Finally, if A has a nontrivial null space (r < n) then we have infinitely many least squares solutions. In Matlab we can do help \ to find out what solution it gives for underdetermined solutions. What does it minimize? 75 Nitsche and Benner 6.4 Unit 6. Least Squares Homework Assignment 4: Due Monday, October 21, 2013 1. Textbook 4.1.1: Vector spaces, subspaces, fundamental subspaces of a matrix. Determine which of the following subsets of Rn are in fact subspaces of Rn (n > 2). (a) {x | xi ≥ 0}, (b) {x | x1 = 0}, (c) {x | x1 x2 = 0}, n P o n (d) x x = 0 , j=1 j n P o n (e) x x = 1 , j j=1 (f) {x | Ax = b, where Am×n 6= 0 and bm×1 6= 0}. 2. Textbook 4.1.2 Determine which of the following subsets of Rn×n are in fact subspaces of Rn×n . (a) (b) (c) (d) (e) (f) (g) (h) (i) The symmetric matrices. The diagonal matrices. The nonsingular matrices. The singular matrices. The triangular matrices. 
The upper-triangular matrices. All matrices that commute with a given matrix A. All matrices such that A2 = A. All matrices such that tr(A) = 0. 3. Textbook 4.1.6 Which of the following are spanning sets for R3 ? (a) (b) (c) (d) (e) 1 1 1 1 1 1 0 0 2 2 1 , 0 , 0 0 , 0 1 , 2 1 , 2 0 1 0 0 1 , 0 , 0 0 1 , 1 1 1 , −1 , 4 4 1 , −1 , 4 4 0 . 4. Textbook 4.1.7 For a vector space V, and for M, N ⊆ V, explain why span(M ∪ N ) = span(M) + span(N ). 76 6.4. HW 4: Due October 21, 2013 Applied Matrix Theory 5. Textbook 4.2.1 Determine spanning sets for each of the four 1 2 A = −2 −4 1 2 fundamental subspaces associated with 1 1 5 0 4 −2 . 2 4 9 6. Textbook 4.2.3 Suppose that A is a 3 × 3 matrix such that 1 1 −2 R = 2, −1 and N = 1 3 2 0 spanR(A) and N(A), respectively, and consider a linear system Ax = b, where 1 b = −7. 0 (a) Explain why Ax = b must be consistent. (b) Explain why Ax = b cannot have a unique solution. 7. Textbook 4.2.7 A1 If A = is a square matrix such that N(A1 ) = R(A|2 ), prove that A must be A2 nonsingular. 8. Textbook 4.2.8 Consider a linear system of equations Ax = b for which y| b = 0 for every y ∈ N(A| ). Explain why this means the system must be consistent. 9. Textbook 4.3.1(abc): Linear independence, basis. Determine which of the following sets are linearly independent. For those sets that are linearly dependent, write one of the vectors as a linear combination of the others. 2 1 1 (a) 2, 1, 5 3 0 9 1 2 3 , 0 4 5 , 0 0 6 , 1 1 1 (b) 1 2 3 2 , 0 , 1 (c) 1 0 0 10. Textbook 4.3.4 Consider a particular species of wild flower in which each plant has several stems, leaves, and flowers, and for each plant let the following hold. 77 Nitsche and Benner Unit 6. Least Squares S = the average stem length (in inches). L = the average leaf width (in inches). F = the number of flowers. 
Four particular plants are examined, and the information is tabulated in the following matrix:

          S   L   F
    #1    1   1   10
    #2    2   1   12
    #3    2   2   15
    #4    3   2   17

For these four plants, determine whether or not there exists a linear relationship between S, L, and F. In other words, do there exist constants α₀, α₁, α₂, and α₃ such that α₀ + α₁S + α₂L + α₃F = 0?

11. Textbook 4.3.13
    Which of the following sets of functions are linearly independent?
    (a) {sin(x), cos(x), x sin(x)}.
    (b) {eˣ, xeˣ, x²eˣ}.
    (c) {sin²(x), cos²(x), cos(2x)}.

12. Textbook 4.4.2
    Find a basis for each of the four fundamental subspaces associated with

        A = [ 1  2  0  2  1
              3  6  1  9  6
              2  4  1  7  5 ].   (6.4.1)

13. Textbook 4.4.8
    Let B = {b₁, b₂, ..., b_n} be a basis for a vector space V. Prove that each v ∈ V can be expressed as a linear combination of the bᵢ's, v = α₁b₁ + α₂b₂ + ··· + α_n b_n, in only one way; i.e., the coordinates αᵢ are unique.

14. Textbook 4.5.5
    For A ∈ R^{m×n}, explain why AᵀA = 0 implies A = 0.

15. Textbook 4.5.8
    Is rank(AB) = rank(BA) when both products are defined? Why?

16. Textbook 4.5.14
    Prove that if the entries of F_{r×r} satisfy Σ_{j=1}^r |f_ij| < 1 for each i (i.e., each absolute row sum is less than 1), then I + F is nonsingular. Hint: use the triangle inequality for scalars, |α + β| ≤ |α| + |β|, to show N(I + F) = {0}.

17. Textbook 4.5.18
    If A is n × n, prove that the following statements are equivalent:
    (a) N(A) = N(A²),
    (b) R(A) = R(A²),
    (c) R(A) ∩ N(A) = {0}.

18. Textbook 4.6.1: Least Squares.
    Hooke's law says that the displacement y of an ideal spring is proportional to the force x that is applied; i.e., y = kx for some constant k. Consider a spring in which k is unknown. Various masses are attached, and the resulting displacements shown in the figure are observed. Using these observations, determine the least squares estimate for k.

19.
Textbook 4.6.2 Show that the slope of the line that passes through the origin in R2 and comes closest in the least squares to passing through the points {(x1 , y1 ), (x2 , y2 ), . . . , (xn , yn )} P sense P is given by m = i xi yi / i x2i . 20. Textbook 4.6.6 After studying a certain type of cancer, a researcher hypothesizes that in the short run the number (y) of malignant cells in a particular tissue grows exponentially with time (t). That is, y = α0 eα1 t . Determine least squares estimates for the parameters α0 and α1 from the researchers observed data given below. t (days) y (cells) 1 16 2 27 3 45 4 74 5 122 Hint: What common transformation converts an exponential function into a linear function? 79 Nitsche and Benner Unit 6. Least Squares 80 UNIT 7 Linear Transformations 7.1 Lecture 22: October 14, 2013 Theorem 7.1. Given a vector space V. If {u1 , . . . , un } spans V and {vi }m i=1 ⊂ V, then m {vi }i=1 is linearly dependent if m > n (because there is more vectors in the set). Pn Pm Pn Pm Proof. Consider j=1 cij uj = 0 and i=1 αi j=1 cij uj then i=1 αi vi = 0. Use vi = n X | Pm α · · · α So c| α = 0 has nonzero α c u = 0 If we consider α = 1 n i ij j i=1 j=1 | {z } (α| C)j =(C| α)j solutions α, since m−n > 0 free variables. So for every αi 6= 0, c| α = 0 and P αi vi = 0. Any two bases for V have the same number of elements. Definition 7.2.Let V be a vector space with basis B = {b1 , . . . , bn } The coordinates of c1 P .. x ∈ V are cj = . such that x = nj=1 cj bj . cn c1 .. Theorem 7.3. Coordinates of x ∈ V with respect to the basis B are unique. [x]B = . . cn Example 7.4. We take as an example a vector x ∈ R3 , 1 x = 2 , 3 (7.1.1a) = 1ê1 + 2ê2 + 3ê3 , (7.1.1b) = ı̂ + 2̂ + 3k̂. (7.1.1c) 81 Nitsche and Benner Unit 7. Linear Transformations with the standard bidis in Rn = {ê1 , . . . 
, ên } = S or 1 2 = [x]S 3 (7.1.2) We can have another basis for R3 ; 1 2 1 1 , 1 , 0 = B 0 1 0 (7.1.3) This is linearly independent because the matrix 2 1 1 0 1 1 0 0 1 is nonsingular. So −1 [x]B = 3 − 12 (7.1.4) c1 Now find c2 such that c3 2 1 1 c1 1 + c2 1 + c3 0 = 1ê1 + 2ê2 + 3ê3 . 0 1 0 (7.1.5) In matrix form, 1 Bc = 2 , 3 1 1 2 c1 1 1 1 0 c2 = 2 . 0 1 0 c3 3 (7.1.6a) (7.1.6b) Solving for the individual variables, c2 = 3; c1 = 2 − c1 , = −1; 2c3 = 1 − 3 + 1, 1 c3 = − . 2 (7.1.7a) (7.1.7b) (7.1.7c) (7.1.7d) (7.1.7e) Summary • For any vector space, V, there exists a basis B. • Any x ∈ V is represented uniquely by a tuple of numbers, the coordinates [x]B . 82 7.1. Lecture 22: October 14, 2013 Applied Matrix Theory Linear Transformations Definition 7.5. Given the vector spaces U, V, a map T : U → V such that, • T(x + y) = T(x) + T(y) • T(αx) = αT(x) is a linear transformation of U → V. We also recognize that a linear transformation is a linear function on vector spaces. Definition 7.6. A linear transformation U → U is a linear operator on U. Our goal now is two fold: • Show that the set of all linear transformations U → V is a vector space L(U, V). • Find the basis and coordinate unit basis of any T ∈ L(U, V). Examples of Linear Functions Example 7.7. T(x) = Am×n xn×1 so T : Rn → Rm . • Rotation A = R(θ) • projection • reflection Example 7.8. f (x) = ax, f : R → R df , D : Pn → dx ´b = a f (x) dx, I Example 7.9. D(f ) = Pn−1 or D : C 1 → set of all functions. Example 7.10. I(f ) : C0 → R Example 7.11. One final example regarding matrices, T(Bn×k ) = Am×n Bn×k , T : Rn×k → Rm×k . Matrix representation of linear transformations Every linear transformation on finite dimensional spaces has a matrix representation. Suppose T : U → V and B = {u1 , . . . , un } forms the basis for U and B 0 = {v1 , . . . , vn } forms the basis for V. Then the action of T on U is ! 
n X T(u) = T ξi ui , (7.1.8a) i=1 = = = n X i=1 n X ξi T (ui ) , ξi n X αij vj , (7.1.8c) αij ξi vj , (7.1.8d) i=1 j=1 n n XX i=1 j=1 where αij describes the action of T. 83 (7.1.8b) Nitsche and Benner Unit 7. Linear Transformations Theorem 7.12. The set of all linear transformations T : U, V = L(U, V) is a vector space. Proof. Given T1 , T2 ∈ L(U, V), then (T1 + T2 ) x = T1 x + T2 x and T1 + T2 ∈ L(U, V). Further (αT1 )x = αT1 (x) which gives αT1 ∈ L(U, V). Some other properties of note: 0x = 0 and 0 ∈ L(U, V); (T1 − T1 ) = 0, etc. Theorem 7.13. Given U with basis B = {u1 , . . . , un } and V with basis B 0 = {v1 , . . . , vn } then a basis for L(U, V) is {Bij }i=1,...,n; j=1,...,m , where Bij : U → V by Bij (u) = ξi vj where P u = nk=1 ξk uk . It follows that dim(L(U, V)) = dim(U) dim(V) = nm. P Proof. Let’s prove linear independence: Consider ηij Bij = 0, then ! X 0= ηij Bij (uk ), (7.1.9a) ij = X ηij (Bij uk ), (7.1.9b) ηkj vj (7.1.9c) ij = X j ASIDE: Note that Bij uk = ξi vj = 0, i 6= k; vj , i = k With [uk ]B = 0 kth position. ··· 1 ··· | 0 , with the 1 at the Since {vj } are linearly independent it follows that ηkj ≡ 0 for all j and each k. Therefore Bij are linearly independent. 7.2 Lecture 23: October 16, 2013 The next major things we are going to try to cover are: • Basis for L(U, V) coordinates for T ∈ L(U, V) • Action of T • Change of coordinates of u ∈ U under change of basis • Change of coordinates of T ∈ L(U, V) under change of basis Basis of a linear transformation The linear set, L(U, V) = {T : U → V | T linear transformation} (7.2.1) Theorem 7.14. Bji : U → V by Bji u = ξiP vj where B = {u1 , . . . , un } is a basis for U and 0 B = {v1 , . . . , vn } is a basis for V and u = nk=1 ξk uk . Also, {Bij } are basis for L(U, V). 84 7.2. Lecture 23: October 16, 2013 Applied Matrix Theory Proof. First, we observe that we have linear independence. Second we check the span. 
If we let T ∈ L(U, V), then X T(u) = T( ξj uj ), (7.2.2a) j = X ξj T(uj ), (7.2.2b) j = X ξj j Here we recognize that T(uj ) = Pm i=1 m XX j = i=1 j i (7.2.2c) αij ξj vi , |{z} (7.2.2d) m XX j P P αij vi . i=1 αij vi . = for any u. Thus, T = m X Bij (u) ! αij Bij (u). (7.2.2e) i=1 αij Bij ; so {Bij } spans L(U, V). It follows that [T]BB0 = {αij } , α11 α12 α1n α21 α22 · · · α2n = .. .. , . . . . . αm1 αm2 · · · αmn = [T(u1 )]B0 [T(u2 )]B0 · · · [T(un )]B0 . (7.2.3a) (7.2.3b) (7.2.3c) If T : U → U is a linear operator that goes to the same space then [T]BB = [T]B for convenience. Example 7.15. Let D : Pn → Pn−1 by D(p) = 85 dp . dx Our basis is B = {1, x, . . . , xn } and we Nitsche and Benner Unit 7. Linear Transformations also have the operated basis B 0 = {1, x, . . . , xn−1 }. So, [D(1)]B0 = [0]B0 , 0 .. = . ; (7.2.4a) (7.2.4b) 0 [D(x)]B0 = [1]B0 , 1 0 = .. ; . 0 D(x2 ) B0 = [2x]B0 , 0 2 = 0 ; .. . 0 n−1 n [D(x )]B0 = nx 0 , B 0 .. = .. 0 n This allows us to represent the differentiation operator by the matrix, 0 1 0 0 0 0 0 2 0 · · · 0 .. . 0 [D]BB0 = 0 0 0 3 . . . . . . . . . .. .. 0 0 0 0 · · · n n×(n+1) (7.2.4c) (7.2.4d) (7.2.4e) (7.2.4f) (7.2.4g) (7.2.4h) (7.2.5) dp . This will be the same as the previous Example 7.16. Let D : Pn → Pn by D(p) = dx example except we will add a row of zeros at the bottom and give us a square matrix. 0 1 0 0 0 0 0 2 0 · · · 0 .. . 0 0 0 0 3 [D]B = . . (7.2.6) . . . . .. .. . . . 0 0 0 0 · · · n 0 0 0 0 0 (n+1)×(n+1) 86 7.2. Lecture 23: October 16, 2013 Applied Matrix Theory We may do this for any operator. For example we could do this for projection. What we want is to find a basis that gives us a nice representation of the operator. Highly sparse basis are nice. Action of linear transform The action of T : U → V. Recall, T(u) = T n X ! ξj uj , (7.2.7a) j=1 = = = n X j=1 n X ξj T (uj ) , ξj m X j=1 i=1 n X m X j=1 i=1 | (7.2.7b) αij vi , (7.2.7c) ! vi αij ξj {z [Aξ]i (7.2.7d) } This gives us the coordinates of the V basis. 
[T(u)]B0 = Aξ, = [T]BB0 [u]B . (7.2.8a) (7.2.8b) Thus the action is represented by matrix multiplication. Now return to our example, dp . Our basis is B = {1, x, . . . , xn } and we Example 7.17. Let D : Pn → Pn−1 by D(p) = dx also have the operated basis B 0 = {1, x, . . . , xn−1 }. If we consider p(x) = α0 +α1 x+· · ·+αn xn and D(p(x)) = α1 + 2α2 x + · · · + nαn xn−1 . This gives our vector representation of α1 2α2 3α3 [D(p)]B0 = (7.2.9a) , .. . nαn α0 0 1 0 0 0 0 0 2 0 · · · 0 α1 α2 .. . 0 = 0 0 0 3 (7.2.9b) α3 . . . . . . . . . .. .. .. . 0 0 0 0 ··· n αn It follows that [L + T]BB0 = [L]BB0 + [T]BB0 and [αL]BB0 = α [L]BB0 . We may also consider the composition of linear operators. Say L(T(x)) = (LT)(x), also [LT]BB00 = [L]BB0 [T]B0 B00 . 87 Nitsche and Benner Unit 7. Linear Transformations Change of Basis If we change the coordinates of our system when given vector space U. Let B = {u1 , . . . , un } is a basis for U and B 0 = {v1 , . . . , vn } be two bases for U. The relation between [u]B and [u]B0 is given by [u]B0 = P [u]B . (7.2.10) P is called the change of basis matrix from B to B 0 . Recall, the coordinates of [T(u)]B0 = [T(u)]BB0 [u]B . Clearly P is [T(u)]BB0 when T = I or P = [I(u)]BB0 . We will use our differentiation operator as an example once more. Example 7.18. Given U = P2 we have the bases B = {1, x, x2 } and B 0 = {1, 1 + x, 1 + x + x2 }, then (7.2.11a) [I(u)]BB0 = [I(u1 )]B0 [I(u2 )]B0 [I(u3 )]B0 , = [u1 ]B0 [u2 ]B0 [u3 ]B0 , (7.2.11b) 1 −1 0 1 −1, = 0 (7.2.11c) 0 0 1 = P. (7.2.11d) We know this is true for any u. We can find the representation of the polynomial p(x) = 3 + 2t + 4t2 in the [p]B0 . So, 1 −1 0 3 1 −1 2, [p]B0 = 0 (7.2.12a) 0 0 1 4 1 = −2. (7.2.12b) 4 Finally, let U be a vector space with basis B = {u1 , . . . , un } and B 0 = {v1 , . . . , vn }. Then if we have T : U → U. We know the relation between [T]B and [T]B0 and we may let P = [I]BB0 . We have, [T(u)]B0 = [T]BB0 [u]B , = A [u]B . 
(7.2.13a) (7.2.13b) [u]B0 = P [u]B , [T(u)]B0 = P [T(u)]B . (7.2.14a) (7.2.14b) P [T(u)]B = A . . . (7.2.15) Further we have So to be continued. . . Note: No class Friday. 88 7.3. Lecture 24: October 21, 2013 7.3 Applied Matrix Theory Lecture 24: October 21, 2013 Change of Basis (cont.) If we have T : U → U, let U be a vector space with basis B = {u1 , . . . , un } and B 0 = {v1 , . . . , vn }. 1. Basis for L(U, V) = {Bij : Bij u= ξi vj , where u = [Tu1 ]B0 [Tu2 ]B0 · · · [Tun ]B0 . P k ξk uk } coordinates of T, [T] = 2. Achar of T [T(u)]B0 = [T]BB0 [u]B . 3. Given x ∈ U with B, B 0 are two bases for U, then [x]B0 = P [x]B . and P = [I]BB0 . 4. T : U → U with B, B 0 are two bases for U, then we want to relate [T]B and [T]B0 . To show property 4, [Tu]B0 = [T]B0 B0 [u]B0 , [Tu]B = [T]BB [u]B . (7.3.1a) (7.3.1b) [Tu]B0 = P [Tu]B , (7.3.2a) [u]B0 = P [u]B . (7.3.2b) P [Tu]B = · · · (7.3.3a) [T]BB = P−1 [T]B0 B0 P, [T]B = P−1 [T]B0 P. (7.3.4a) (7.3.4b) But also, Considering P = [I]BB0 So And we get, The matrix representation of T under different basis are self similar. Definition 7.19. If A = C−1 BC for some C, then A and B are self-similar (A, B, C ∈ Rn×n ). Theorem 7.20. Given any two self-similar matrices A, B, they represent the same linear transformation under two different bases. 89 Nitsche and Benner Unit 7. Linear Transformations Example 7.21. Example illustrating the self-similarity: [T]B = P−1 [T]B0 P. Let T ∈ L(U, U) be defined by 0 1 x Tu = (7.3.5) −2 3 y where u = xu1 + yu2 . Tu = y −2x + 3y = yu1 + (−2x + 3y)u2 . (7.3.6) In basis notation we may consider this, [Tu]B = M [u]B . (7.3.7) 1 1 Now let’s consider a different basis. Let S = {ê1 , ê2 } and S = , . Now 1 2 0 [T]S = [Tê1 ]S [Tê2 ]S , 0 1 = , −2 S 3 S 0 1 = , −2 3 = M. (7.3.8a) (7.3.8b) (7.3.8c) (7.3.8d) Now in our different basis, [T]S 0 1 1 T T = , 1 S 2 S 1 2 = , 1 S0 4 S0 1 0 = . 0 2 (7.3.9a) (7.3.9b) (7.3.9c) This helps us by diagonalizing the operator. 
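The diagonalization in Example 7.21 is easy to check numerically. The sketch below uses Python/NumPy rather than the Matlab mentioned in class; Q holds the new basis vectors (1, 1) and (1, 2) as columns, so Q⁻¹MQ is the matrix of T in that basis.

```python
import numpy as np

# [T]_S in the standard basis (Example 7.21)
M = np.array([[0.0, 1.0],
              [-2.0, 3.0]])

# columns of Q are the new basis vectors (1,1) and (1,2)
Q = np.array([[1.0, 1.0],
              [1.0, 2.0]])

# representing T in the new basis diagonalizes the operator
D = np.linalg.inv(Q) @ M @ Q
print(D)  # ≈ [[1, 0], [0, 2]]
```

The basis vectors are eigenvectors of M, which is why the similarity transform produces a diagonal matrix.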
Now we want to find P, P = [I]BB0 , (7.3.10a) = [Tu1 ]B0 [Tu2 ]B0 , 1 0 = , 0 B0 1 B0 2 −1 = . −1 1 (7.3.10b) (7.3.10c) (7.3.10d) Similarly, −1 P 1 1 = . 1 2 90 (7.3.11) 7.4. Lecture 25: October 23, 2013 Applied Matrix Theory We can verify this, P −1 1 1 1 0 2 −1 [T]S 0 P = , 1 2 0 2 −1 1 1 1 2 −1 = , 1 2 −2 2 0 1 = . −2 3 (7.3.12a) (7.3.12b) (7.3.12c) So this checks out. Example 7.22. Let M ∈ L(U, V) defined by [M(u)]S = M [u]S where S is the standard basis. Then [M]S = M, (7.3.13a) = [Mê1 ]S [Mê2 ]S · · · [Mên ]S (7.3.13b) and we define S 0 = {q1 , . . . , qn }. When we have Q = [I]S 0 S , [M]S 0 = Q−1 MQ, (7.3.14a) = [q1 ]S [q2 ]S · · · [qn ]S , = q1 q2 · · · qn . (7.3.14b) (7.3.14c) . Now let A = Q−1 BQ with S = {ê1 , . . . , ên } and S 0 = {q1 , . . . , qn } and Let L(u) = Bu. [L]S = B and [I]S 0 S = Q so [L]S 0 = Q−1 BQ. If T ∈ L(U, U) and X ⊂ U such that T(X ) ⊂ X where T(X ) = {T(x) such that x ∈ X } then X is an invariant subspace of U under T. Example 7.23. If (λ, v) are an eigen-pair of A then (λI − A) v = 0, λv = Av. (7.3.15a) (7.3.15b) and span{v} is an invariant subspace under A. 7.4 Lecture 25: October 23, 2013 Properties of Special Bases If we consider B and B 0 as bases for U with operation T : U → U. Then we have, [T]BB0 = [T(u1 )]B0 · · · [T(un )]B0 , (7.4.1a) [T]B = [T(u1 )]B · · · [T(un )]B , (7.4.1b) −1 = P [T]B0 P (7.4.1c) 91 Nitsche and Benner Unit 7. Linear Transformations And we also have P = (I)BB0 . We consider T on Rn , T(x) = Ax and [T]S = A. So A = P−1 BP for appropriate B and P, with B = [T]B0 Note: A tuple is an ordered set of numbers. Now we have two goals: 1. Find a basis such that [T]B is simple 2. FInd invariant quantities Example 7.24. tr(P−1 BP) = tr(BPP−1 ) = tr(B) Example 7.25. For T : Pn → Pn by T(p) = Dp , 0 1 0 ··· 0 0 2 [T]B = . .. 0 0 ··· 0 0 .. . n 0 (7.4.2) tr(T) = 0 Example 7.26. rank(P−1 BP) = rank(B) Example 7.27. Nilpotent operator of index k N Nk = 0, but Nk−1 6= 0. 
: U → U 2such thatk−1 On the homework we will have to show that x, Nx, N x, . . . , N x a basis for Rk and x is defined such that Nk−1 (x) 6= 0. So, 0 0 ··· 0 . . . .. . 1 0 (7.4.3) [N]B = . . =J .. . . 0 0 0 ··· 1 0 Example 7.28. An idempotent operator E : U → U has the property E2 = E. This is because which can only return the same answer if done twice. these are projection operators B = x1 , . . . , xr , y1 , . . . , yn−r . {z } | {z } | BR(E) BN(E) Ir×r 0 [E] = (7.4.4) 0 0 Example 7.29. If A has a full set of ê-vectors qj , j = 1, . . . , n. Then, Aqj = λj qj with bases S, P. So [I]PS = q1 , . . . , qn , (7.4.5a) = Q, (7.4.5b) −1 [T]P = Q [T]S Q, (7.4.5c) −1 Λ = Q AQ (7.4.5d) 92 7.4. Lecture 25: October 23, 2013 Applied Matrix Theory So λ1 0 [T]P = . .. 0 ··· ... 0 .. λ2 . , ... ... 0 · · · 0 λn 0 T(x) = Ax. (7.4.6a) (7.4.6b) Invariant Subspaces Let T be a linear operator T : U → U. Definition 7.30. A subset X ⊂ U is invariant under T if Tx ∈ X for any x ∈ X (or T(X ) ⊂ X ). Also T1x : X → X . Example 7.31. Given T(x) = Ax, −1 −1 −1 −1 0 −5 −16 −22 . = 0 3 10 14 4 8 12 14 (7.4.7a) (7.4.7b) 2 −1 −1 2 X = span {q1 , q2 } where q1 = 0 and q2 = −1. Show that X is invariant under T. 0 0 So, −1 −15 T(q1 ) = (7.4.8a) −3, 0 = q1 + 3q2 ∈ span(X ); 0 6 T(q2 ) = −4, 0 = 2q1 + 4q2 ∈ span(X ). (7.4.8b) (7.4.8c) (7.4.8d) So for any T(α1 q1 +α2 q2 ) = α1 T(q1 )+α2 T(q2 ) ⊂ X . So for T : R4 → R4 with T1x : X → X , 1 2 [T1x ]q1 ,q2 = . (7.4.9) 3 4 93 Nitsche and Benner Unit 7. Linear Transformations 1 0 0 0 Now say we have [T]P , P = q1 , q2 , 0 , 0. Then, 0 1 1 3 [T]P = 0 0 2 x x 4 x x 0 −1 x 0 4 x (7.4.10) So we have gained some zero elements. This is since, 1 −1 0 0 T 0 = 0 , 0 4 0 −1 0 −12 T 0 = 14 1 14 (7.4.11a) (7.4.11b) Now if X , Y are subspaces of U and are invariant under T; T(X ) ⊂ X and T; T(Y) ⊂ Y. and X + Y = U. Then B = x1 , . . . , xr , y1 , . . . , yn−r . [T]B = [T(x1 )]B · · · [T(xr )]B [T(y1 )]B · · · [T1x ]Bx 0 = , 0 [T1y ]By = Q−1 AQ. 
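The block structure just derived can be illustrated by construction. The matrices below are hypothetical (a small made-up example, not the one from class, written in Python/NumPy rather than Matlab): if A = QBQ⁻¹ and B has zeros below its leading 2×2 block in its first two columns, then X = span{q₁, q₂} is invariant under A, and the leading 2×2 block of B is the matrix of the restriction of A to X.

```python
import numpy as np

# columns q1, q2 of Q span X; Q is unit lower triangular, hence invertible
Q = np.array([[1., 0., 0., 0.],
              [1., 1., 0., 0.],
              [0., 1., 1., 0.],
              [0., 0., 1., 1.]])

# B has zeros below its leading 2x2 block in the first two columns,
# so A = Q B Q^{-1} maps span{q1, q2} into itself
B = np.array([[1., 2., 5., 0.],
              [3., 4., 0., 6.],
              [0., 0., 7., 8.],
              [0., 0., 0., 9.]])
A = Q @ B @ np.linalg.inv(Q)

# A q_j is a combination of q1 and q2 only, with coefficients from B[:2, :2]
print(np.allclose(A @ Q[:, :2], Q[:, :2] @ B[:2, :2]))  # True
```

Here AQ = QB, and since the first two columns of B have zeros in their lower entries, Aq₁ and Aq₂ involve only q₁ and q₂.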
7.5 T(yn−r ) B , (7.4.12a) (7.4.12b) (7.4.12c) Homework Assignment 5: Due Monday, November 4, 2013 1. Explain how we proved in class that, for any A ∈ Rm×n , the linear A| Ax = Ab is consistent. Do not reproduce all proofs, but outline the train of thought, starting from basic linear algebra facts. 2. For the overdetermined linear system 1 2 1 1 2 x = 1 1 2 2 (a) Is the matrix A rank-deficient or of full rank? What is the rank of A| A? (b) Find all least squares solutions. 94 7.5. HW 5: Due November 4, 2013 Applied Matrix Theory (c) Find the solution that Matlab returns, using A\b. Also find the least squares solution of minimum norm. Do they agree? (d) What criterion does Matlabs use to choose a solution? (use help mldivide to find out) 3. Textbook 4.7.2: Linear transformations For A ∈ Rn×n , determine which of the following functions are linear transformations. (a) (b) (c) (d) T(Xn×n ) = AX − XA, T(xn×1 ) = Ax + b for b 6= 0, T(A) = A| , T(Xn×n ) = (X + X| ) /2. 4. Textbook 4.7.6 For the operator T : R2 → R2 defined by T(x, y) = (x + y, −2x + 4y), determine [T]B , 1 1 where B is the basis B = , . 1 2 5. Textbook 4.7.11 Let P be the projector that maps each point v ∈ R2 to its orthogonal projection on the line y = x as depicted in Figure 4.7.4. Figure 7.1. Figure 4.7.4 (a) Determine the coordinate matrix of P with respect to the standard basis. α (b) Determine the orthogonal projection of v = onto the line y = x. β 6. Textbook 4.7.13 For P2 and P3 (the spaces of polynomials of degrees less than or equal to two ´and three, t respectively), let S : P2 → P3 be the linear transformation defined by S(p) = 0 p(x) dx. Determine [S]BB0 , where B = {1, t, t2 } and B 0 = {1, t, t2 , t3 }. 95 Nitsche and Benner Unit 7. Linear Transformations 7. Textbook 4.8.1: Change of basis Explain why rank is a similarity invariant. 8. Textbook 4.8.2 Explain why similarity is transitive in the sense that A ' B and B ' C implies A ' C. 9. 
Textbook 4.8.3 A(x, y, z) = (x + 2y − z, −y, x + 7z) is a linear operator on R3 . (a) Determine [A]S , where S is the standard basis. −1 (b) Determine [A] as the S 0 as well nonsingular matrix Q such that [A]S 0 = Q [A]S Q 1 1 1 for S 0 = 0 , 1 , 1 . 0 0 1 10. Textbook 4.8.11 (a) N is nilpotent of index k when Nk = 0 but Nk−1 6= 0. If N is a nilpotent operator of index n on Rn , and if Nn−1 (y) 6= 0, show B = {y, N(y), N2 (y), . . . , Nn−1 (y)} is a basis for Rn , and then demonstrate that? 0 0 0 .. . 0 0 ··· 1 0 0 1 [N]B = J = 0 .. . 0 0 1 .. . ··· ··· ··· .. . 0 0 0 .. . (b) If A and B are any two n × n nilpotent matrices of index n, explain why A ' B. (c) Explain why all n × n nilpotent matrices of index n must have a zero trace and be of rank n − 1. 11. Textbook 4.8.12 E is idempotent when E2 = E. For an idempotent operator E on Rn , let X = {xi }ri=1 and Y = {xi }n−r i=1 be bases for R(E) and N(E), respectively. (a) Prove that B = X ∪ Y is a basis for Rn . Hint: Show Exi = xi and use this to deduce that B is linearly independent. Ir 0 . (b) Show that [E]B = 0 0 (c) Explain why two n × n idempotent matrices of the same rank must be similar. (d) If F is an idempotent matrix, prove that rank(F) = tr(F). 96 7.5. HW 5: Due November 4, 2013 Applied Matrix Theory 12. Textbook 4.9.3: Invariant subspaces Let T be the linear operator on R4 defined by T(x1 , x2 , x3 , x4 ) = (x1 + x2 + 2x3 − x4 , x2 + x4 , 2x3 − x4 , x3 + x4 ), and let X = span {ê1 , ê2 } be the subspace that is spanned by the first two unit vectors in R4 . (a) Explain why X is invariant under T. (b) Determine T/X {ê1 ,ê2 } . (c) Describe the structure of [T]B , where B is any basis obtained from an extension of {ê1 , ê2 }. 13. Textbook 4.9.4 Let T and Q be the matrices −2 −1 −5 −2 −9 0 −8 −2 T= 2 3 11 5 3 −5 −13 −7 1 0 0 −1 1 1 3 −4 and Q = −2 0 1 0 3 −1 −4 3 (a) Explain why the columns of Q are a basis for R4 . 
(b) Verify that X = span {Q:1 , Q:2 } and Y = span {Q:3 , Q:4 } are each invariant subspaces under T. (c) Describe the structure of Q−1 TQ without doing any computation. (d) Now compute the product Q−1 TQ to determine T/Y {Q:3 ,Q:4 } . T/X {Q:1 ,Q:2 } and 14. Textbook 4.9.7 If A is an n × n matrix and λ is a scalar such that (A − λI) is singular (i.e., λ is an eigenvalue), explain why the associated space of eigenvectors N(A − λI) is an invariant subspace under A. 15. Textbook 4.9.8 Consider the matrix A = −9 4 . −24 11 (a) Determine the eigenvalues of A. (b) Identify all subspaces of R2 that are invariant under A. (c) Find a nonsingular matrix Q such that Q−1 AQ is a diagonal matrix. 97 Nitsche and Benner Unit 7. Linear Transformations 98 UNIT 8 Norms 8.1 Lecture 26: October 25, 2013 Homework 5 due Friday Difinition of norms Norm acts on a vector space V over R or C. Definition 8.1. A norm is a function k · k : V → R by : x → kxk such that 1. kxk ≥ 0 for any x ∈ V, and kxk = 0 if and only if x = 0 2. kαxk = |α|kxk 3. kx + yk ≤ kxk + kyk Vector Norms Some norms: • kxk2 = pPn • kxk1 = Pn i=1 i=1 x2i which is the 2-norm or the Euclidean norm |xi | P 1/p • kxkp = ( ni=1 xpi ) • kxk∞ = maxi |xi | = limp→∞ kxkp The two norm x A unit vector is kxk and the unit ball in R2 {x ∈ R2 : kxk = 1} We illustrate the unit balls for the three primary norms: kxk2 = 1 which gives a circle, kxk1 = 1 or |x1 | + |x2 | = 1 which gives a rhombus, kxk∞ = 1 or (x1 , x2 ) such that max(|x1 |, |x2 |) = 1 which gives a square. 99 Nitsche and Benner Unit 8. Norms Theorem 8.2. kxk∞ ≤ kxk2 ≤ kxk1 Proof. kxk∞ = max |xi |, i q = max x2i , i q = x2k , for some k, v u n uX ≤t x2i , (8.1.1a) (8.1.1b) (8.1.1c) (8.1.1d) i=1 = kxk2 ; qX |xi |2 , = r 2 X |xi | , ≤ = kxk1 . (8.1.1e) (8.1.1f) (8.1.1g) (8.1.1h) Our goal is now to prove the triangle inequality for the 2-norm. Note that kxk22 = x| x, where x| y is the standard inner product. P x2i = Theorem 8.3. 
The Cauchy–Schwarz inequality (or CBS): |x| y| ≤ kxkkyk Proof. Let α = x| y ; x| x note x| y = y| x. Also, x| y x (αx − y) = x x−y , x| x | |x y | = x | x − x y, x x x| y | | = | x x − x y, x x | | = x y − x y, = 0. | | (8.1.2a) (8.1.2b) (8.1.2c) (8.1.2d) (8.1.2e) Further, | 0 ≤ kαx − yk22 = (αx − y) (αx − y) , = αx (αx − y) − y (αx − y) , | | = −αy x + y y, x| y | | = − | y x + y y, x x |x| y| 2 =− 2 + kyk2 . kxk2 100 (8.1.3a) (8.1.3b) (8.1.3c) (8.1.3d) (8.1.3e) 8.1. Lecture 26: October 25, 2013 this gives kyk2 ≥ |x| y| kxk22 Applied Matrix Theory 2 and therefore kxk2 kyk2 ≥ |x| y| . Theorem 8.4. kx + yk2 ≤ kxk2 + kyk2 Proof. | kx + yk22 = (x + y) (x + y) , | | = x + y (x + y) , | | | = x x + 2x y + y y, | ≤ kxk2 + 2x y + kyk2 , ≤ kxk2 + 2kxk2 kyk2 + kyk2 , = (kxk2 + kyk2 )2 , q kx + yk2 ≤ (kxk2 + kyk2 )2 , ≤ kxk2 + kyk2 . (8.1.4a) (8.1.4b) (8.1.4c) (8.1.4d) (8.1.4e) (8.1.4f) (8.1.4g) (8.1.4h) Matrix Norms Definition 8.5. A matrix norm is a function k · k : Rn×m → R such that, 1. kAk ≥ 0 for any A ∈ Rn×m , and kAk = 0 if and only if A = 0 2. kαAk = |α|kAk 3. kA + Bk ≤ kAk + kBk The Frobenius Norm The Frobeius norm is defined kAkF = sX a2ij . (8.1.5) kAi,: k22 , (8.1.6a) kA:,j k22 , (8.1.6b) i,j or kAk2F = X i = X j = X | aj aj , (8.1.6c) j | = tr(A A). which gives us a convenient way of expressing this norm. 101 (8.1.6d) Nitsche and Benner Unit 8. Norms Induced Norms Given a vector norm on Rn we may define (where sup is the smallest upper bound ) kAk = sup x∈Rn kAxk = sup kAxk. kxk kxk=1 (8.1.7) we may also replace the smallest upper bound (sup) with the maximum (max). We can now take kAk2 , kAk1 , and kAk∞ 8.2 Lecture 27: October 28, 2013 Matrix norms (review) Definition 8.6. A norm on V 1. kAk ≥ 0 for any A ∈ Rn×m , and kAk = 0 if and only if A = 0 2. kαAk = |α|kAk 3. kA + Bk ≤ kAk + kBk Frobenius Norm The Frobenius norm is defined kAk2F = X |aij |2 , (8.2.1a) kAi,: k22 , (8.2.1b) kA:,j k22 , (8.2.1c) i = X i = X j | = tr(A A), = tr(A? A) for A ∈ Cn×m . 
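The equivalent expressions for the Frobenius norm above (entrywise sum of squares, sums of squared row or column norms, and tr(AᵀA)) are easy to check numerically. This quick sketch uses Python/NumPy rather than the Matlab mentioned in the notes, with the 2×2 matrix that appears in the course example.

```python
import numpy as np

A = np.array([[1., 2.],
              [0., 2.]])

fro_entries = np.sqrt((A**2).sum())        # sqrt of the sum of squared entries
fro_trace   = np.sqrt(np.trace(A.T @ A))   # sqrt(tr(A^T A))
fro_numpy   = np.linalg.norm(A, 'fro')     # NumPy's built-in Frobenius norm

print(fro_entries, fro_trace, fro_numpy)   # all equal 3.0
```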
In the real case, A* = Aᵀ.

Properties of the Frobenius norm:
1. ‖Ax‖₂ ≤ ‖x‖₂ ‖A‖_F
2. ‖AB‖_F ≤ ‖A‖_F ‖B‖_F

Proof. Property (1):

    ‖Ax‖₂² = Σᵢ (Ax)ᵢ² = Σᵢ (A_{i,:} x)² ≤ Σᵢ ‖A_{i,:}‖₂² ‖x‖₂² = ‖x‖₂² Σᵢ ‖A_{i,:}‖₂² = ‖x‖₂² ‖A‖_F²,   (8.2.2)

where the inequality is Cauchy–Schwarz applied to each row. Property (2): applying Property (1) columnwise,

    ‖AB‖_F² = Σⱼ ‖(AB)_{:,j}‖₂² = Σⱼ ‖A B_{:,j}‖₂² ≤ Σⱼ ‖A‖_F² ‖B_{:,j}‖₂² = ‖A‖_F² Σⱼ ‖B_{:,j}‖₂² = ‖A‖_F² ‖B‖_F².   (8.2.3)

Example 8.7.

    A = [ 1  2
          0  2 ],       AᵀA = [ 1  2
                                2  8 ].   (8.2.4, 8.2.5)

So

    ‖A‖_F = √tr(AᵀA) = √9 = 3,   (8.2.6)

which may be computed by norm(A, 'fro') in Matlab.

Induced Matrix Norms

Definition 8.8. For A ∈ R^{m×n} the induced norm of the matrix is

    ‖A‖ = max_{x ≠ 0} ‖Ax‖ / ‖x‖ = max_{‖x‖=1} ‖Ax‖.   (8.2.7)

Example 8.9. With the same A,

    ‖A‖₁ = max_{‖x‖₁=1} ‖Ax‖₁ = max_{‖x‖₁=1} Σᵢ |(Ax)ᵢ|.   (8.2.9)

The matrix remaps the unit ball; for example, we may find the images of the corners of the unit rhombus ‖x‖₁ = 1. This gives a way to picture the 1-norm, though it is not the most convenient computationally. Returning to the ∞-norm,

    ‖A‖∞ = max_{‖x‖∞=1} ‖Ax‖∞ = max_{‖x‖∞=1} maxᵢ |(Ax)ᵢ|.   (8.2.10)

Here A remaps the corners of the unit square to a stretched parallelogram. What is the maximum ∞-norm? From the figure, we can see it is 3. Finally, the 2-norm is found from the image of the unit circle:

    ‖A‖₂ = max_{‖x‖₂=1} ‖Ax‖₂ ≈ 2.92.   (8.2.11)

ASIDE: points satisfying

    (Ax)₁² + (Ax)₂² = (a₁₁x₁ + a₁₂x₂)² + (a₂₁x₁ + a₂₂x₂)² = constant   (8.2.12)

trace out an ellipse in the (x₁, x₂) plane, so the image of the unit circle is an ellipse.

Theorem 8.10. ‖A‖₁ = maxⱼ Σᵢ |a_ij|, the maximum absolute column sum, and ‖A‖∞ = maxᵢ Σⱼ |a_ij|, the maximum absolute row sum.
Properties

The induced norms of a matrix have properties similar to those of the Frobenius norm:
1. ‖Ax‖ ≤ ‖A‖ ‖x‖, since ‖Ax‖/‖x‖ ≤ ‖A‖,
2. ‖AB‖ ≤ ‖A‖ ‖B‖ (will be shown in the homework).

Example 8.11. The induced norm of the identity matrix is 1: ‖I‖ = 1.

Proof (of Theorem 8.10 for the 1-norm).

    ‖A‖₁ = max_{‖x‖₁=1} Σᵢ |(Ax)ᵢ|,   (8.2.13a)
         = max_{‖x‖₁=1} Σᵢ |Σⱼ a_ij xⱼ|,   (8.2.13b)
         ≤ max_{‖x‖₁=1} Σᵢ Σⱼ |a_ij| |xⱼ|,   (8.2.13c)
         = max_{‖x‖₁=1} Σⱼ |xⱼ| Σᵢ |a_ij|,   (8.2.13d)
         ≤ max_{‖x‖₁=1} ( maxⱼ Σᵢ |a_ij| ) Σⱼ |xⱼ|,   (8.2.13e)
         = maxⱼ Σᵢ |a_ij|,   since Σⱼ |xⱼ| = ‖x‖₁ = 1.   (8.2.13g)

Now find an x for which the upper bound is attained. Let k be a column index with Σᵢ |a_ik| = maxⱼ Σᵢ |a_ij|, and let x = ê_k. Then

    ‖Ax‖₁ = ‖Aê_k‖₁ = ‖A_{:,k}‖₁ = Σᵢ |a_ik| = maxⱼ Σᵢ |a_ij|,   (8.2.14)

so the upper bound is attained.

Further, ‖A‖₂² = max ‖Ax‖₂² subject to ‖x‖₂² = 1; that is, ‖A‖₂² = max (xᵀAᵀAx) subject to xᵀx = 1. This requires Lagrange multipliers: ∇f = λ∇g.

8.3 Lecture 28: October 30, 2013

The 2-norm

Given the 2-norm ‖A‖₂ = max_{‖x‖₂=1} ‖Ax‖₂, we maximize f(x) = xᵀAᵀAx subject to g(x) = xᵀx = 1, where f : Rⁿ → R. This constrained optimization problem is solved with Lagrange multipliers, ∇f = λ∇g. Recall the product rule

    ∂(UV)/∂xⱼ = (∂U/∂xⱼ) V + U (∂V/∂xⱼ).   (8.3.1)

Lemma 8.12. If B is symmetric, ∇(xᵀBx) = 2Bx. Note: ∇(xᵀx) = 2x.

Proof.

    ∂(xᵀBx)/∂xⱼ = (∂xᵀ/∂xⱼ) Bx + xᵀB (∂x/∂xⱼ),
                = êⱼᵀBx + xᵀBêⱼ,
                = êⱼᵀBx + êⱼᵀBᵀx,     (xᵀBêⱼ is a scalar, so it equals its transpose)
                = 2êⱼᵀBx,              (using Bᵀ = B)
                = 2(Bx)ⱼ.   (8.3.2)

Proof (alternative). Writing out the components,

    ∂/∂xⱼ Σᵢ Σ_k xᵢ B_ik x_k = Σ_k B_jk x_k + Σᵢ xᵢ B_ij,
                             = Σ_k B_jk x_k + Σ_k B_kj x_k,
                             = Σ_k B_jk x_k + Σ_k B_jk x_k,     (B symmetric)
                             = 2(Bx)ⱼ.   (8.3.3)
So, from ∇f = λ∇g,

    2AᵀAx = 2λx,   i.e.,   AᵀAx = λx,   (8.3.4)

and the solution (λ, x) is an eigenpair of AᵀA. Note, for such x, f(x) = xᵀAᵀAx = xᵀλx = λxᵀx = λ. Thus,

    max f = λ_max = max_k λ_k,   (8.3.5)

where the λ_k are the eigenvalues of AᵀA. Note further that AᵀA is symmetric, so the eigenvalues are real; and since f(x) ≥ 0, each λ_k ≥ 0.

Example 8.13. Given

    A = [ 1  2
          0  2 ]       and       AᵀA = [ 1  2
                                         2  8 ],   (8.3.6, 8.3.7)

we compute

    det(AᵀA − λI) = (1 − λ)(8 − λ) − 4 = λ² − 9λ + 4.   (8.3.8)

So

    λ₁,₂ = (9 ± √(81 − 16))/2 = (9 ± √65)/2,   (8.3.9)

and λ_max = (9 + √65)/2. Therefore

    ‖A‖₂ = √λ_max = √((9 + √65)/2) ≈ 2.9208.   (8.3.11)

Now, ‖x‖∞ ≤ ‖x‖₂ ≤ ‖x‖₁ for vectors; this chain of inequalities does not hold for the corresponding matrix norms. Some properties (where UᵀU = I and VᵀV = I):

• ‖A‖₂ = ‖Aᵀ‖₂
• ‖AᵀA‖₂ = ‖A‖₂²
• ‖[A 0; 0 B]‖₂ = max(‖A‖₂, ‖B‖₂)
• ‖UᵀAU‖₂ = ‖A‖₂
• ‖A⁻¹‖₂ = 1/√(λ_min(AᵀA))

UNIT 9 Orthogonalization with Projection and Rotation

9.1 Lecture 28 (cont.)

Inner Product Spaces

An inner product space is a vector space V together with an inner product.

Definition 9.1. Given a vector space V, an inner product is a function f : V × V → R or C, f(x, y) = ⟨x, y⟩, such that
• ⟨x, y⟩ = conj(⟨y, x⟩) (complex conjugate; over R this is plain symmetry)
• ⟨x, αy⟩ = α⟨x, y⟩; note that then ⟨αx, y⟩ = conj(⟨y, αx⟩) = conj(α⟨y, x⟩) = ᾱ⟨x, y⟩
• ⟨x + z, y⟩ = ⟨x, y⟩ + ⟨z, y⟩
• ⟨x, x⟩ ≥ 0 for any x ∈ V
• ⟨x, x⟩ = 0 implies x = 0

Example 9.2.
1. ⟨x, y⟩ = xᵀy with V = Rⁿ, and ⟨x, y⟩ = x*y with V = Cⁿ, where x* = x̄ᵀ.
2. ⟨x, y⟩_A = xᵀAᵀAy with V = Rⁿ, and ⟨x, y⟩_A = x*A*Ay with V = Cⁿ, for nonsingular A. This gives us a new norm ‖x‖_A = √(xᵀAᵀAx) = ‖Ax‖₂.
3. ⟨f, g⟩ = ∫_a^b conj(f(x)) g(x) dx with V = C⁰[a, b], and ‖f‖ = √(∫_a^b |f(x)|² dx).
4. ⟨f, g⟩ = ∫_a^b ω(x) conj(f(x)) g(x) dx, where the weight satisfies ω(x) ≥ 0.
5. ⟨A, B⟩ = tr(AᵀB), with ‖A‖ = √tr(AᵀA) = ‖A‖_F.

9.2
Orthogonalization with Projection and Rotation Lecture 29: November 1, 2013 Inner Product Spaces Reviewing properties of inner product spaces, • hx, yi = hy, xi • hx, αyi = α hx, yi • hx + z, yi = hx, yi + hz, yi • hx, xi ≥ 0 for any x ∈ V • hx, xi = 0 implies x = 0 p Now we may define norms kxk = hx, xi . Let’s say we want to define angles between vectors and ky − xk2 = kxk2 + kyk2 − 2kxkkyk cos(θ). Rearranged, cos(θ) = = = = −ky − xk2 + kxk2 + kyk2 , 2kxkkyk hx, xi + hy, yi − hy − x, y − xi , 2kxkkyk hy, xi + hx, yi , 2kxkkyk hx, yi , kxkkyk (9.2.1a) (9.2.1b) (9.2.1c) (9.2.1d) only if hx, yi ∈ R. For a more general definition hy, xi + hx, yi = hy, xi + hy, xi = 2 Re(hy, xi). So we would have the problem of the conjugate in finding the angle, but have reduced this issue. Definition 9.3. The angle between x, y is given by cos(θ) = hx, yi . kxkkyk (9.2.2) So, for x ⊥ y means hx, yi = 0. Note: If the inner product is not a real number, then hx, yi = 0 means kxk2 + kyk2 = ky − xk2 , but not vice-versa. Example 9.4. 1 −2 x= 3 −1 4 1 and y = −2 . −4 110 9.2. Lecture 29: November 1, 2013 Applied Matrix Theory So x ⊥ y in hx, yi = x| y, but x 6 ⊥y in hx, yiA = x| A| Ay where, 1 0 A= 0 0 2 1 0 0 0 0 1 0 0 0 . 0 1 Definition 9.5. A set {u1 , . . . , un } is orthonormal if kuk k = 1 for any k and huj , uk i = 0 for any j 6= k. Fourier Expansion Given an orthonormal basis for V we can write x ∈ V as x = c1 u1 + c2 u2 + · · · cn un (9.2.3) with hx, uj i = cj huj , uj i = cj . Example 9.6. Given a series ´π product −π f (x)g(x) dx. n √1 π on sin(kx) is orthonormal with respect to the inner k−1 ´ ´ 1−cos(2kx) How do we compute theP following integrals? sin(kx) dx = dx So if f ∈ 2 ´π n 1 1 span {sin(kx)} then f = √π k=1 ck sin(kx). Thus, ck = √π −π f (x) sin(kx) dx. In homework we will approximate a line on [−π, π] with the sine and cosine Fourier series. This is essentially the 2-norm approximation of the span of the Fourier series. 
The Gibbs phenomenon will be observed, with overshoot of the sines and cosines above the function. Orthonormal bases are thus useful in partial differential equations applications.

Orthogonalization Process (Gram–Schmidt)

Goal: Given a basis {a₁, . . . , aₙ}, find an orthonormal basis {u₁, . . . , uₙ} for V. This is the orthogonalization process. Method: find u_k such that span{u₁, . . . , u_k} = span{a₁, . . . , a_k} for k = 1, . . . , n. Now let's show the process.

k = 1:

    u₁ = a₁ / ‖a₁‖.

k = 2:

    u₂ = (a₂ − ⟨u₁, a₂⟩u₁) / ‖a₂ − ⟨u₁, a₂⟩u₁‖.

To verify the orthogonality of u₁ and u₂, write ℓ₂ = ‖a₂ − ⟨u₁, a₂⟩u₁‖. Then

    ⟨u₁, u₂⟩ = ⟨u₁, (a₂ − ⟨u₁, a₂⟩u₁)/ℓ₂⟩,              (9.2.4a)
             = (1/ℓ₂) ⟨u₁, a₂ − ⟨u₁, a₂⟩u₁⟩,            (9.2.4b)
             = (1/ℓ₂) [⟨u₁, a₂⟩ − ⟨u₁, a₂⟩⟨u₁, u₁⟩],    (9.2.4c)
             = 0,                                        (9.2.4d)

since ⟨u₁, u₁⟩ = 1.

k = 3: . . .

General k:

    u_k = (a_k − ⟨u₁, a_k⟩u₁ − ⟨u₂, a_k⟩u₂ − · · · − ⟨u_{k−1}, a_k⟩u_{k−1})
          / ‖a_k − ⟨u₁, a_k⟩u₁ − ⟨u₂, a_k⟩u₂ − · · · − ⟨u_{k−1}, a_k⟩u_{k−1}‖.    (9.2.5)

This is the Gram–Schmidt orthogonalization process. If we want, we can write it as

    u_k = (I − U_{k−1}U*_{k−1}) a_k / ‖(I − U_{k−1}U*_{k−1}) a_k‖,

where U_{k−1} = [u₁ | · · · | u_{k−1}].

9.3 Lecture 30: November 4, 2013

Gram–Schmidt Orthogonalization

Given a basis {a₁, . . . , aₙ}, find an orthonormal basis {u₁, . . . , uₙ} that spans the same space. Algorithm:

    u₁ = a₁ / ‖a₁‖,                    (9.3.1a)
    u₂ = (a₂ − (u₁ᵀa₂) u₁) / ℓ₂,       (9.3.1b)

and so on, using projections:

    ⟨u₁, a₂⟩u₁ = (u₁ᵀa₂) u₁,           (9.3.2a)
               = u₁u₁ᵀ a₂,             (9.3.2b)

where u₁u₁ᵀ is the projector onto span{u₁}. From this,

    u₂ = (a₂ − (u₁ᵀa₂) u₁) / ‖a₂ − (u₁ᵀa₂) u₁‖,    (9.3.3a)
       = (I − u₁u₁ᵀ) a₂ / ‖(I − u₁u₁ᵀ) a₂‖,        (9.3.3b)

i.e., the normalized projection P⊥a₂ of a₂ onto span{u₁}⊥.

Example 9.7. Given the vectors

    a₁ = (0, 3, 4)ᵀ,  a₂ = (−20, 27, 11)ᵀ,  and  a₃ = (−14, −4, −2)ᵀ,

we can find the orthonormal vectors:
    u₁ = a₁/‖a₁‖ = (1/5)(0, 3, 4)ᵀ.        (9.3.4)

Then,

    v₂ = a₂ − ⟨u₁, a₂⟩u₁,                               (9.3.5a)
       = (−20, 27, 11)ᵀ − (125/5) · (1/5)(0, 3, 4)ᵀ,    (9.3.5b)
       = (−20, 27, 11)ᵀ − (0, 15, 20)ᵀ
       = (−20, 12, −9)ᵀ,                                (9.3.5c)

with ‖v₂‖ = 25. Repeating the step for a₃ gives, altogether,

    u₁ = (1/5)(0, 3, 4)ᵀ,          (9.3.6a)
    u₂ = (1/25)(−20, 12, −9)ᵀ,     (9.3.6b)
    u₃ = (1/25)(−15, −16, 12)ᵀ.    (9.3.6c)

Now rewriting our system,

    u₁ = a₁ / ℓ₁,                                                  (9.3.7a)
    u₂ = (a₂ − r₁₂u₁) / ℓ₂,                                        (9.3.7b)
    u₃ = (a₃ − r₁₃u₁ − r₂₃u₂) / ℓ₃,                                (9.3.7c)
    · · ·
    uₙ = (aₙ − r₁ₙu₁ − r₂ₙu₂ − · · · − r_{n−1,n}u_{n−1}) / ℓₙ,     (9.3.7d)

where r_ij = ⟨u_i, a_j⟩. Solving for the a's,

    a₁ = ℓ₁u₁,                                                (9.3.8a)
    a₂ = r₁₂u₁ + ℓ₂u₂,                                        (9.3.8b)
    a₃ = r₁₃u₁ + r₂₃u₂ + ℓ₃u₃,                                (9.3.8c)
    · · ·
    aₙ = r₁ₙu₁ + r₂ₙu₂ + · · · + r_{n−1,n}u_{n−1} + ℓₙuₙ.     (9.3.8d)

We can put this in matrix form if A is full rank (which requires m ≥ n, since there can be at most m linearly independent vectors a_i in Rᵐ). With A = QR,

    [a₁ a₂ · · · aₙ]_{m×n} = [u₁ u₂ · · · uₙ]_{m×n}  [ ℓ₁  r₁₂  r₁₃  · · ·  r₁ₙ ]
                                                     [ 0   ℓ₂   r₂₃  · · ·  r₂ₙ ]
                                                     [ 0   0    ℓ₃   · · ·  r₃ₙ ]        (9.3.9)
                                                     [              . . .       ]
                                                     [ 0   0    0    · · ·  ℓₙ  ]_{n×n},

where r_ii = ℓ_i > 0, so R is invertible. The entries of R are exactly the (uniquely determined) Fourier coefficients of the Fourier expansion of this system. Thus, every matrix A of full rank has a unique decomposition, known as a QR factorization, A_{m×n} = Q_{m×n}R_{n×n}, where R is invertible.

What do we know about QᵀQ? (QᵀQ)_ij = u_iᵀu_j, which is zero for i ≠ j and one for i = j. So QᵀQ = I_{n×n}: the columns of Q are orthonormal.

Decompositions of A so far:

• A_{m×n} = Q_{m×n}R_{n×n}, where QᵀQ = I and R is invertible.
• A = LU if every leading principal minor satisfies |A_k| ≠ 0.
• PA = LU always exists.

Now what about QQᵀ? It will be an m × m matrix, but otherwise we know little about it.

Example 9.8. Returning to our example,

    [ 0  −20  −14 ]   [ 0    −20/25  −15/25 ] [ 5  25   r₁₃ ]
    [ 3   27   −4 ] = [ 3/5   12/25  −16/25 ] [ 0  ℓ₂   r₂₃ ]        (9.3.10)
    [ 4   11   −2 ]   [ 4/5   −9/25   12/25 ] [ 0  0    ℓ₃  ]

with r₁₃ = −4, ℓ₂ = 25, r₂₃ = 10, ℓ₃ = 10. In this case Q has three linearly independent columns and three linearly independent rows.
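The computations in Examples 9.7–9.8 can be reproduced by a direct implementation of the algorithm. A sketch of classical Gram–Schmidt, assuming NumPy (as the notes remark later, this is not the numerically stable way to compute a QR in practice):

```python
import numpy as np

def gram_schmidt(A):
    """Classical Gram-Schmidt on the columns of A: returns Q (orthonormal
    columns) and upper-triangular R with A = Q R and r_kk = l_k > 0."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        v = A[:, k].copy()
        for i in range(k):
            R[i, k] = Q[:, i] @ A[:, k]   # r_ik = <u_i, a_k>
            v -= R[i, k] * Q[:, i]
        R[k, k] = np.linalg.norm(v)       # the length l_k
        Q[:, k] = v / R[k, k]
    return Q, R

# Example 9.7 / 9.8
A = np.array([[0.0, -20.0, -14.0],
              [3.0,  27.0,  -4.0],
              [4.0,  11.0,  -2.0]])
Q, R = gram_schmidt(A)
print(25 * Q)   # columns: 25*u1, 25*u2, 25*u3
print(R)        # [[5, 25, -4], [0, 25, 10], [0, 0, 10]]
```

The printed columns match (9.3.6), and R supplies the entries left symbolic in (9.3.10).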
| So Q| has linearly independent columns. And interestingly (Q| ) Q| = QQ| = I. This is an orthogonal matrix: it is both invertible and has orthogonal columns. In general this is not the case because it is not n × n and QQ| is not necessarily the identity if m > n. Use A = QR: Example 9.9. Assume An×n invertible; solve Ax = b. Rewrite QRx = b, | Q QRx = Q b, | Rx = Q b. | (9.3.11a) (9.3.11b) (9.3.11c) This system is quick to solve (once Q and R are known). Example 9.10. Assume Am×n full rank m > n then Ax = b is an overdetermined system and least squares solution satisfies, | | A Ax = A b, | | | | R Q QRx = R Q b, | | | R Rx = R Q b, | −1 | | Rx = R R Q b, | Rx = Q b. (9.3.12a) (9.3.12b) (9.3.12c) (9.3.12d) (9.3.12e) Go through this proof and the solutions manual. Then we will see how well SVD can improve things later. 9.4 Lecture 31: November 6, 2013 In homework the reduced QR factorization reffered to is where we can always write Am×n = Qm×n Rn×n where Q| Q = I and Rn×n is triangular. This factorization is unique, but we may also x x ··· x | | .. . x 0 x QR = q1 · · · qn . . (9.4.1) .. . . . . . ... | | 0 0 ··· x 115 Nitsche and Benner Unit 9. Orthogonalization with Projection and Rotation since {q1 , . . . , qn } is an orthogonal basis for R(A) ⊂ Rm . Now, | | | QR = q1 · · · qn qn+1 | | | x x 0 x . . ... | . · · · qm 0 0 | m×m 0 0 .. . 0 0 ··· ··· .. . x x .. . · · · x · · · 0 ··· 0 (9.4.2) m×n In this case, the reduced QR is not unique. Unitary (orthogonal) matrices The unitary refers to the complex case and the orthogonal refers to the real. Definition 9.11. A unitary matrix is Q ∈ Cn×n such that Q∗ Q = I. This means we have Q has n orthogonal columns. Additionally, since Q is square we have n orthogonal rows. So, Q ∈ Rn×n , with Q| Q = QQ| = In×n . Properties Some properties for a unitary Q: • Q∗ Q = QQ∗ = In×n • Q−1 = Q∗ • columns are orthonormal • rows are orthonormal • (Qx)∗ Qy = x∗ Q∗ Qy = x∗ y for any x, y. Note: kQxk = kxk, so Q is an isometry. 
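The distinction between QᵀQ and QQᵀ for a reduced QR factor, and the isometry property, can be checked numerically. A minimal sketch in Python/NumPy standing in for the MATLAB commands mentioned in the notes (`np.linalg.qr` returns the reduced factorization by default):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))    # tall matrix, m = 5 > n = 3

Q, R = np.linalg.qr(A)             # reduced QR: Q is 5x3, R is 3x3

QtQ = Q.T @ Q                      # = I_3: the columns of Q are orthonormal
QQt = Q @ Q.T                      # 5x5, NOT the identity when m > n

x = rng.standard_normal(3)
print(np.allclose(QtQ, np.eye(3)))                           # True
print(np.allclose(QQt, np.eye(5)))                           # False
print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))  # isometry: True
```

In fact QQᵀ is the orthogonal projector onto R(A), a point taken up again in the lectures on projectors.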
Also, If u, v unitary, then uv is unitary since, (uv)∗ (uv) = v∗ u∗ uv, = v∗ v, = I, = (uv) (uv)∗ . (9.4.3a) (9.4.3b) (9.4.3c) (9.4.3d) Example 9.12. Q in full QR factorization of any A. In Matlab, [q, r] = qr(a) (is this full QR?) and [q, r] = qr(a,0) (is this reduced QR?). Now to compute the QR factorization, the Gramm–Schmidt algorithm is not numerically stable. Thus, small changes in the input matrix values can cause large changes in the result. The alternative is the modified Gramm–Schmidt which improves the stability properties. We 116 9.4. Lecture 31: November 6, 2013 Applied Matrix Theory will not cover this here, but it is discussed in future courses. A better algorithm is to obtain the QR by premuliplying by orthogonal matrices until it is triangular, or Qn · · · Q1 A = R, | {z } (9.4.4) Q∗ then A = QR. This is better because it does not use projections, which are not orthogonal. Rotations of orthogonal matrices as well as reflections are useful to introduce zeros. As an example, Rotation cos(θ) − sin(θ) . sin(θ) cos(θ) Example 9.13. Rotation in the xy plane about the origin. So the matrix P = cos(θ) sin(θ) −1 Now, P = P−θ = = P| . This again shows that it is orthogonal, par− sin(θ) cos(θ) ticularly the columns are orthogonal. These are rotations in the plane. Example 9.14. 3D Rotation Rotation in three dimensions about the z-axis: This is very similar, cos(θ) − sin(θ) 0 P = sin(θ) cos(θ) 0 . 0 0 1 (9.4.5) this rotates in the xy plane. We can further rotate in any plane ij for some vector in Rn ; i j 1 i P= j − sin(θ) cos(θ) , 1 sin(θ) cos(θ) (9.4.6) 1 x1 .. . xi−1 cos(θ)xi − sin(θ)xj xi+1 . . .. Px = x j−1 sin(θ)x − cos(θ)x i j xj+1 . . . xn 117 (9.4.7) Nitsche and Benner Unit 9. Orthogonalization with Projection and Rotation This is called a Givens rotation. We can choose our θ x x x x x x x x x x Pθ = x x x x = x x x x x x x x x x 0 such that (Qi x)j = 0. So if we have x x x x x x x x x (9.4.8) . 
x x x x x x So the QR factorization by Givens rotations, Pθn · · · Pθ2 Pθ1 A = R. {z } | (9.4.9) Q∗ Note, projections are not orthogonal. We can check this with PP∗ = I. However P(u1 ) = 0 and this means we have a non-trivial null-space so projections is not invertible. Therefore, this is not invertible. Reflection Example 9.15. Suppose we have the vectors u and x, where kuk = 1. We want to reflect x across the plane orthogonal to u. We will consider this operation Rx This operation is also orthogonal. Now we will generalize a vector u⊥ = {v : v| u = 0}. So the orthogonal projection onto u⊥ ; first we know hu, xi, Px = x − hu, xi u, = (I − uu∗ ) x, Rx = (I − 2uu∗ ) x. (9.4.10a) (9.4.10b) (9.4.10c) where P is the projection onto the subspace and R is the reflection across the subspace. Now R∗ = (I − 2uu∗ ) = R and R2 = I. This implies that R−1 = R∗ and R is orthogonal. 9.5 Homework Assignment 6: Due Monday, November 11, 2013 1 2 1. Let A = . Find kAkp for p = 1, 2, ∞, F. 3 4 P 2. Show that kAk∞ = max j |aij | (Hint: make sure you understand how the analogous i formula for kAk1 was derived in class.) defines a matrix 3. (a) Given a vector norm kxk, prove that the formula kAk = sup kAxk kxk x6=0 norm. (This is called the induced matrix norm.) (b) Show that for any induced matrix norm, kAxk ≤ kAkkxk. (c) Prove that any induced matrix norm also satisfies kABk ≤ kAkkBk. 118 9.5. HW 6: Due November 11, 2013 Applied Matrix Theory 4. Consider the formula kAk = max |aij | i,j (a) Show that it defines a matrix norm. (b) Show that it is not induced by a vector norm. 5. Meyer, Exercise 5.2.6 Establish the following properties of the matrix 2-norm. (a) kAk2 = (b) (c) max |y∗ Ax|, kxk2 =1, kyk2 =1 kAk2 = kA∗ k2 , kA∗ Ak = kAk22 , A 0 (d) 0 B = max {kAk2 , kBk2 } (take A, B to be real), 2 ∗ (e) kU AVk2 = kAk2 when UU∗ = I and V∗ V = I. q 1 −1 where λmin is the smallest eigenvalue of A| A. 6. Show that kA k = λmin 7. Show that hA, Bi = tr(A∗ B) defines an inner product. 8. 
Meyer, Exercise 5.3.4 For a real inner-product space with k ? k2 = h?, ?i, derive the inequality kxk2 + kyk2 hx, yi ≤ . 2 Hint: Consider x − y. 9. Meyer, Exercise 5.3.5 For n × n matrices A and B, explain why each of the following inequalities is valid. (a) |tr(B)|2 ≤ n[tr(B∗ B)]. (b) tr(B2 ) ≤ tr(B| B) for real matrices. (c) tr(A| B) ≤ 10. Given tr(A| A)+tr(B| B) 2 for real matrices. 1 1 A= 1 0 (a) (b) (c) (d) (e) 0 −1 2 1 , 1 −3 1 1 1 1 and b = 1. 1 Find an orthonormal basis for R(A), using the standard inner product. Find the (reduced) QR decomposition of A. For the matrix Q in (b), compute Q| Q and QQ| . Find the least squares solution of Ax = b, using your results above. Determine the Fourier expansion of b with respect to the basis you found in (a). 11. Explain why the (reduced) QR factorization of a matrix A of full rank is unique. 119 Nitsche and Benner Unit 9. Orthogonalization with Projection and Rotation 12. Meyer, Exercise 5.5.11 Let V be the inner-product space of real-valued continuous functions defined on the interval [−1, 1], where the inner product is defined by ˆ 1 hf, gi = f (x)g(x) dx , −1 and let S be the subspace of V that is spanned by the three linearly independent polynomials q0 = 1, q1 = x, q2 = x2 . (a) Use the Gram–Schmidt process to determine an orthonormal set of polynomials {p0 , p1 , p2 } that spans S. These polynomials are the first three normalized Legendre polynomials. (b) Verify that pn satisfies Legendres differential equation (1 − x2 )y 00 − 2xy 0 + n(n + 1)y = 0 for n = 0, 1, 2. This equation and its solutions are of considerable importance in applied mathematics. 9.6 Lecture 32: November 8, 2013 From last time: Elementary orthogonal projectors Let u, where kuk = 1, then the projection of a vector x onto the sub plane orthogonal to u is P⊥ x = x − hu, xi u. And P|| = uu∗ and P = I − uu∗ . Now this projector, P, is not orthogonal. This is because an orthogonal matrix has the form Q∗ = Q−1 or Q∗ Q = QQ∗ = I. 
Now, P∗⊥ = I − (u∗ )∗ u∗ , = P⊥ . (9.6.1a) (9.6.1b) P∗ P = P2 , = P, 6= I. (9.6.2a) (9.6.2b) (9.6.2c) This further gives This property shows that once we project, projection a second time does not change the result. Also, N(P) 6= 0, so the projectors are not invertible. Now the null space of P|| is equal to u⊥ , or N(P|| ) = u⊥ . Similarly N(P⊥ ) = span(u). 120 9.6. Lecture 32: November 8, 2013 Applied Matrix Theory Elementary reflection Now Rx = x − hu, xi u, and in this case R is orthogonal. So R∗ = R and R∗ R = RR∗ = I. Also, (I − 2uu∗ ) (I − 2uu∗ ) = I − 2uu∗ − 2uu∗ + 4u (u∗ u) u∗ , = I − 4uu∗ + 4uu∗ , = I. Now use reflectors to compute A = QR. So say we have x x x x x x Ru = x x x . x x x (9.6.3a) (9.6.3b) (9.6.3c) (9.6.4) So Rx = (kuk, 0) = kukêi . Thus, u = x − kxkêi . Doing successive reflections, Ru · · · Ru Ru A = R. | N {z 2 }1 (9.6.5) Q This gives us the Householder method . Complimentary Subspaces of V Definition 9.16. If V = X + Y, where X , Y are subspaces such that X ∩ Y = {0}, which are called complimentary subspaces and V = X ⊕ Y is the direct sum of X , Y. Given the general picture, how do we define the angle between two subspaces? Note: If V = X ⊕ Y then any z ∈ V can be written uniquely as z = x + y, for x ∈ X and y ∈ Y. Further dim(V) = dim(X ) + dim(Y) and BV = BX ∪ BY . Proof. If z = x1 + y1 = x2 + y2 then x1 − x2 = y1 − y2 ∈ X ∩ Y. So x1 − x2 = y1 − y2 = 0 and X ∩ Y = {0}. Example 9.17. Say we have Rn = R(A) ⊕ N(A| ) for Am×n . Projectors Definition 9.18. We define general projectors: The projector P onto X along Y is the linear operator such that P(z) = P(x + y) = x. Note: If P projects onto X along Y then P2 = P because P2 (x+y) = P(x) = P(x+0) = x = P(z). Now the null space, N(P) = y because P(z) = P(x + y) = x = 0. Further, R(P) = x. Also, R(P) ⊕ N(P) = Rn as we showed in Homework 5. Ultimately, we want to find the Jordan canonical form of our matrices. In general R(A)+ N(A) 6= Rn . 
This is obvious if Am×n because they have different dimensions, so this only makes sense if An×n . But even if A is square, let y ∈ N(A) ∩ R(A) then Ay = 0 and y = Az for some z. Then A (Az) = A2 z = 0, and we have a non-trivial intersection. So if A2 has a nontrivial null space, then N(A) and R(A) have nontrivial intersection. 121 Nitsche and Benner Unit 9. Orthogonalization with Projection and Rotation Example 9.19. Obviously this cannot be an invertible matrix, so say we have A= 0 1 0 0 0 0 and A = 0 0 2 This is an example of a null-potent matrix. But this is only true for projectors. Theorem 9.20. P is a projector if and only if P2 = P. These are also known as idempotent matrices. 9.7 Lecture 33: November 11, 2013 From last time: Definition 9.21. P : V → V is a projector if for each X , Y such that V = X ⊕ Y and V(x + y) for any z = x + y ∈ V. Note: R(V) = X and N(V) = Y. Projectors Theorem 9.22. P is a projector if and only if P2 = P. These are also known as idempotent matrices. Proof. Given the vector space V and the operator P = P2 , R(P) ⊕ N(P) = V, | {z } | {z } X (9.7.1a) Y P(x + y) = Px + Py, = P Px0 , |{z} (9.7.1b) (9.7.1c) x, some x0 = Px0 , = x. (9.7.1d) (9.7.1e) Going the other way, z = |{z} Pz + (z − Pz), | {z } (9.7.2a) V = R(P) ⊕ N(P). (9.7.2b) ∈R(P) ∈N(P) 122 9.7. Lecture 33: November 11, 2013 Applied Matrix Theory Representation of a projector We discuss the representation of P. Given {m1 , . . . , mr } as a basis for R(P) = X and {n1 , . . . , nn−r } as a basis for N(P) = Y. Then Pmi = mi and Pni = 0. Let B = [M | N]. Then PB = P[M | N], = [M | 0]. (9.7.3a) (9.7.3b) [P]s = P, = [M | 0]B−1 , Ir×r 0 = [M | N] B−1 , 0 0 Ir×r 0 =B B−1 , 0 0 (9.7.4a) (9.7.4b) (9.7.4c) (9.7.4d) = [I]BS [P]B [I]−1 BS . (9.7.4e) Definition 9.23. For any subspace M ⊂ V, M⊥ = v ∈ V such that v⊥ u = 0, u ∈ M . Theorem 9.24. For any subspace M ⊂ V, V = M ⊕ M⊥ Proof. Given basis {b1 , . . . , bm } of M, choose {bi } orthonormal complement by orthogonal set {bm+1 , . . . 
, bn } such that {b1 , . . . , bm , bm+1 , . . . , bn } is a basis for V. | {z } | {z } basis for M basis for M⊥ Example 9.25. Rn = R(A) ⊕ N(A| ) where R(A) ⊥ N(A| ). An orthogonal projector onto M is PM is I 0 PM = [M | N] [M | N]−1 , (9.7.5a) 0 0 M∗ M = 0, N∗ N = 0, (9.7.5b) (9.7.5c) (9.7.5d) Where | | M = m1 · · · mm | | n×m Note: | and N = nm+1 | | · · · nn . | n×(n−m) I 0 (M∗ M)−1 M∗ M N = 0 I (N∗ N)−1 N∗ {z } | (9.7.6) [M | N]−1 123 (9.7.7) Nitsche and Benner Unit 9. Orthogonalization with Projection and Rotation and PM ∗ −1 ∗ (M M) M = [M | 0] , (N∗ N)−1 N∗ (9.7.8a) = M (M∗ M)−1 M∗ . (9.7.8b) But if the basis were orthonormal, how does this change the formula? Given any basis {m1 , . . . , mm } for subspace M, orthogonal projector. PM = M (M∗ M)−1 M∗ . (9.7.9) If {m1 , . . . , mm } are orthogonal then M∗ M = I and PM = MM∗ . (9.7.10) Example 9.26. Elementary orthogonal projectors, P|| = uu∗ . (9.7.11) P⊥ = I − uu∗ (9.7.12) kx − PM xk22 = min kx − yk22 . (9.7.13) and Theorem 9.27. y∈M (we will prove this as an exercise) −1 Note: A (A| A) A| is the projector onto the range of A, or PR(A) where we assume that A has full rank. The normal equations to solve Ax = b is | | A Ax = A b. and (9.7.14) | −1 | A b. x= AA {z } | (9.7.15) pseudoinverse So, | Ax = A A A = PR(A) b. 9.8 −1 | A b, (9.7.16a) (9.7.16b) Lecture 34: November 13, 2013 Projectors We discussed a projector P onto X along Y and also that the projector is idempotent, P2 = P. Further, R(P) = X and N(P) = Y. Ir×r 0 [P]S = [M | N] [M | N]−1 . (9.8.1) 0 0 124 9.8. Lecture 34: November 13, 2013 Applied Matrix Theory The orthogonal projector onto M = R(M), where M = [m1 · · · mm ] is the basis of M, P = M (M∗ M)−1 M∗ (9.8.2) The normal equations for Ax = b, with A being a full rank matrix, are Ax = PR(A) b. (9.8.3) Projector P is orthogonal, then P∗ = P. Proof. P is an orthogonal projector, P = M (M∗ M)−1 M∗ , ∗ ∗ −1 P = M (M M) = P. (9.8.4a) ∗ M, (9.8.4b) (9.8.4c) further suppose that P = P2 and P = P∗ . 
Now we want to show that N(P) ⊥ R(P), where it is normal in the standard inner product. Let x ∈ R(P) and y ∈ N(P). Then consider the inner product, y∗ x = y∗ Px, = (|{z} P∗ y)∗ x, (9.8.5a) (9.8.5b) P = (Py)∗ x, | {z } (9.8.5c) = 0∗ x, = 0. (9.8.5d) (9.8.5e) 0∗ if {mi } are orthogonal, PM = MM∗ . Example 9.28. P|| = uu∗ , P⊥ = I − uu∗ . (9.8.6a) (9.8.6b) V = X ⊕ Y. Decompositions of Rn Given An×n , we know R(A) ⊕ N(A| ) = Rn and R(A| ) ⊕ N(A) = Rn , but R(A)⊥ = N(A| ). Let B = { u1 , . . . , ur , ur+1 , . . . , un } orthonormal. Further B = { v1 , . . . , vr , vr+1 , . . . , vn } | {z } | | {z } | {z } {z } basis for R(A| ) basis for R(A) basis for N(A| ) 125 basis for N(A) Nitsche and Benner Unit 9. Orthogonalization with Projection and Rotation also orthonormal. So, | | U AV = UR(A) UN(A| ) A VR(A| ) VN(A) , | UR(A) AVR(A| ) AVN(A) , = | UN(A| ) | UR(A) AVR(A| ) 0 , = U|N(A| ) AVN(A) 0 Cr×r 0 = . 0 0 UN(A| ) A | | = A UN(A| ) , = 0. (9.8.7a) (9.8.7b) (9.8.7c) (9.8.7d) (9.8.8a) (9.8.8b) Range Nullspace decomposition of An×n Theorem 9.29. Rn = R(Ak ) ⊕ N(Ak ) for some k. This is not necessarily an orthogonal decomposition. The smallest such k is called the index of A. Proof. First, note that R(Ak+1 ) ⊂ R(Ak ) for any k. This is because if y ∈ R(Ak+1 ), then y = Ak+1 z for some z, then y = Ak (Az). Second, R(A) ⊂ R(A2 ) ⊂ R(A3 ) ⊂ · · · ⊂ R(Ak ) = R(Ak+1 ) = R(Ak+2 ) = · · · contains equality for some k. to be continued. . . 9.9 Homework Assignment 7: Due Friday, November 22, 2013 You may use Matlab to compute matrix products, or to reduce a matrix to Row Echelon Form. 1. (a) Let A ∈ Rm×n . Prove R(A) 1 (b) Verify this fact for A = 2 1 and N(A| ) are orthogonal complements of Rm . 2 0 4 1 . 2 0 2. Prove: If X , Y are subspaces of V such that V = X ⊕ Y, then for any x ∈ V there exists a unique x ∈ X and y ∈ Y such that z = x + y. 3. Prove: If X , Y are subspaces of V such that V = X +Y and dim(X )+dim(Y) = dim(V) then X ∩ Y = {0}. 4. 
Textbook 5.11.3: 1 2 2 4 Find a basis for the orthogonal complement of M = span , . 0 1 3 6 126 9.9. HW 7: Due November 22, 2013 Applied Matrix Theory 5. Let P be a projector. Let P0 = I − P. (a) Show that P0 = I − P is also a projector. It is called the complementary projector of P. (b) Any projector projects a point z ∈ V onto X along Y, where X ⊕ Y = V, by P(z) = P(x + y) = x. What are the X and Y for P and I − P, respectively? 6. Textbook 5.9.1: Let X and Y be subspaces of R3 whose respective bases are 1 1 1 1 , 2 BX = and BY = 2 1 2 3 (a) Explain why X and Y are complementary subspaces of R3 . (b) Determine the projector P onto X along Y as well as the complementary projector Q onto Y along X . 2 (c) Determine the projection of v = −1 onto Y along X . 1 (d) Verify that P and Q are both idempotent. (e) Verify that R(P) = X = N(Q) and N(P) = Y = R(Q). 7. (a) Find the orthogonal projection of b = (4, 8)| onto M = span {u}, where u = (3, 1)| . (b) Find the orthogonal projection of b onto u⊥ , for b, u given in (a). (c) Find the orthogonal projection of b = (5, 2, 5, 3)| onto | | | M = span (3/5, 0, 4/5, 0) , (0, 0, 0, 1) , (4/5, 0, 3/5, 0) . (Note: the given columns are orthonormal.) (d) Find the orthogonal projection of b = (1, 1, 1)| onto the range of 1 0 A = 2 1 1 0 8. (a) Show that kPk2 ≥ 1 for every projector P 6= 0. When is kPk2 = 1? (b) Show that kI − Pk2 = kPk2 for all projectors P 6= 0, I. 9. (a) Show that the eigenvalues of a unitary matrix satisfy |λ| = 1. Show by a counterexample that reverse not true. (b) Show that the eigenvalues of a projector are either 0 or 1. Show by a counterexample that the reverse not true. 10. Let u be a unit vector. The elementary reflector about u⊥ is defined to be R = I−2uu∗ . 127 Nitsche and Benner Unit 9. Orthogonalization with Projection and Rotation (a) Prove that all elementary reflectors are involutory (R2 = I), hermitian, and unitary. (b) Prove that if Rx = µêi , then µ = ±kxk2 , and that R:i = Rêi = ±x. 
(c) Find the elementary reflector that maps x = 13 (1, −2, −2)| onto the x-axis. (d) Verify by direct computation that your reflector in (c) is symmetric, orthogonal, involutory. (e) Extend the vector x in (c), to an orthonormal basis for R3 . (Hint: what do you know about the columns of R from parts (a,b) above?) 11. Textbook 5.6.17: Perform the following sequence of rotations in R3 beginning with 1 1 v0 = −1 1. Rotate v0 counterclockwise 45° around the x-axis to produce v1 . 2. Rotate v1 clockwise 90° around the y-axis to produce v2 . 3. Rotate v2 counterclockwise 30° around the z-axis to produce v3 . Determine the coordinates of v3 as well as an orthogonal matrix Q such that Qv0 = v3 . −2 0 −4 4. Find its core-nilpotent decomposition. 12. (a) Find the index of A = 4 2 3 2 2 (b) A matrix is said to be nilpotent if Ak = 0 for some k. Show that the index of a nilpotent matrix is the smallest k for which Ak = 0. Find its core-nilpotent decomposition. (c) Find the index of a projector that is not the identity. Find its core-nilpotent decomposition. (d) What is the index of the identity? 9.10 Lecture 35: November 15, 2013 Range Nullspace decomposition of An×n Theorem 9.30. For any An×n and some k; Rn = R(Ak ) ⊕ N(Ak ). The smallest such k is called the index of A. Example 9.31. Nilpotent matrices have some k such that Nk = 0, R(Nk ) = {0}, and N(Nk ) = Rn Proof. First, note that R(Ak+1 ) ⊆ R(Ak ) for any k. This is because if y ∈ R(Ak+1 ), then y = Ak+1 z for some z, then y = Ak (Az). Second, R(A) ⊂ R(A2 ) ⊂ R(A3 ) ⊂ · · · ⊂ R(Ak ) = R(Ak+1 ) = R(Ak+2 ) = · · · contains equality for some k. The dimensions decrease 128 9.10. Lecture 35: November 15, 2013 Applied Matrix Theory if proper. Third, once equality achieved, it is maintained through the rest of the chain. The proof: R(Ak+2 ) = R(Ak+1 A), (9.10.1a) = AR(Ak+1 ), (9.10.1b) = AR(Ak ), (9.10.1c) k = R(A ). (9.10.1d) Fourth, N(A0 ) ⊂ N(A) ⊂ N(A2 ) ⊂ · · · ⊂ N(Ak ) = N(Ak+1 ) = N(Ak+2 ) = · · ·. 
Why does the nullspace change at the same spot as the columnspace? Because dim(N(Ak )) = n−dim(R(Ak )), so once the dimensions are constant in the columnspace, then the dimensions will be constant for the nullspace. Fifth, R(Ak ) ∩ N(Ak ) = {0}: Let y ∈ R(Ak ) and y ∈ N(Ak ), then y = Ak x for some x, and Ak y = 0. So A2k x = 0 and x ∈ N(A2k ) = N(Ak ) k so A x = 0. Sixth, R(Ak ) + N(Ak ) = Rn since the dimensions add up and there is no |{z} y intersection of the two spaces (except for {0}). Now, how can we factor the matrix? Corresponding factorization of A k k Let {x1 , . . . , xr } be a basis for R(A ) and y1 , . . . , yn−r be a basis for N(A ). Then S = x1 , . . . , xr , y1 , . . . , yn−r , and we note that X = span {x1 , . . . , xr } and Y = span y1 , . . . , yn−r which are both invariant subspaces. So Cr×r 0 −1 S AS = . (9.10.2) 0 N(n−r),(n−r) k Note S−1 Ak S = (S−1 AS) because the inverse and normal S terms cancel out in the exponentiation. Thus, C̃ 0 −1 k S A S= , (9.10.3a) 0 Nk = S−1 Ak X Y , (9.10.3b) = S−1 Ak X Ak Y , (9.10.3c) −1 Ak X 0 , =S (9.10.3d) −1 k = S A X 0 . (9.10.3e) Thus Nk = 0 and N is nilpotent and C is invertible. So we have a core-nilpotent factorization of A. So we have a similarity factorization which always exists. We recall the decomposition for any A ∈ Rn×n = R(A) ⊕ N(A| ) = R(A| ) ⊕ N(A), corresponding factorization C 0 | U AV = . (9.10.4) 0 0 129 Nitsche and Benner Unit 9. Orthogonalization with Projection and Rotation 130 UNIT 10 Singular Value Decomposition 10.1 Lecture 35 (cont.) Singular Value Decomposition The singular value decomposition is a way to find the orthogonal matrices Un and Vn may be found such that we may diagonalize A. Or σ1 0 · · · 0 0 · · · 0 .. . . . . 0 σ2 . . .. .. . . .. .. ... 0 0 · · · 0 | | | Um · · · U2 U1 AV1 V2 · · · Vm = (10.1.1) 0 · · · 0 σr 0 · · · 0 0 ··· 0 0 0 ··· 0 . . . . . . . . . . . . . . . . . . 0 ··· 0 0 0 ··· 0 Theorem 10.1. 
For any Am×n there exists orthogonal U and V such that | Am×n = UDV , (10.1.2a) σ1 0 · · · 0 . . 0 σ2 . . .. . . .. .. ... 0 = [U]m×m 0 · · · 0 σr 0 ··· 0 0 . .. .. .. . . 0 ··· 0 0 0 ··· .. . 0 0 0 .. . ··· ··· ··· .. . 0 ··· 0 .. . 0 | 0 V n×n 0 .. . 0 (10.1.2b) where σi are real and greater than 0. Further σ1 ≥ σ2 ≥ · · · ≥ σr , where r = rank(A). Definition 10.2. σi are the singular values of A. Note: 131 Nitsche and Benner Unit 10. Singular Value Decomposition 1. σi are uniquely determined, but U, V are not unique 2. rank(A) = rank(D) 3. kAk2 = kDk2 A 0 0 B = max (kAk2 , kBk2 ) 2 4. If A is invertible, σ1 0 A = U . .. 0 1 σ A−1 1 0 = V .. . 0 1 σn 0 = Ṽ .. . 0 Now K(A) = kAk·kA−1 k = σ1 σn ··· 0 . . σ2 . . .. | V , .. .. . . 0 · · · 0 σr 0 ··· 0 . .. 1 . .. σ2 U| , .. .. . 0 . · · · 0 σ1n 0 ··· 0 . .. 1 . .. | σn−1 Ũ . .. .. . 0 . · · · 0 σ11 0 (10.1.3a) (10.1.3b) (10.1.3c) which means that we can have issues with singularities. Example 10.3. Prove kI − Pk2 = kPk2 . What is the norm of P and of I − P? From illustration we can use tangents to the unit ball. Then kPωk = k(I − P) ωk needs to be shown. 10.2 Lecture 36: November 18, 2013 We will do review for exam on Friday. Singular Value Decomposition SVD: Theorem 10.4. For any Am×n there exists orthogonal U, V such that | Am×n = Um×m Dm×n Vn×n , 132 (10.2.1) 10.2. Lecture 36: November 18, 2013 Applied Matrix Theory where σ1 0 ··· ... 0 .. . 0 ··· .. . 0 . .. D= 0 0 . .. σ2 ... ... ··· ··· 0 0 .. . 0 σr 0 .. . 0 0 0 .. . 0 ··· 0 0 0 ··· ··· ··· ··· .. . 0 .. . 0 0 0 .. . 0 m×n (10.2.2) and σi > 0, σ1 ≥ σ2 ≥ · · · ≥ σr > 0. Notes: 1. kAk2 = σ1 , kA−1 k2 = 1 , σn where A would have to be invertible. The condition number is κ(A) = σ1 . σn 2. r = rank(A). 3. |det(A)| = Qn i=1 σi . 4. A−1 = VD−1 U| . Existence of the Singular Value Decomposition Proof. We know that there exists U and V such that, | UA V = C 0 , 0 0 (10.2.3) where Cr×r is invertible. 
Let x be such that kxk = 1, and kCk2 = max kCyk2 , (10.2.4a) = kCxk2 , = σ1 , = kAk2 . (10.2.4b) (10.2.4c) (10.2.4d) kyk2 =1 Let y = Cx kCxk2 and further the two orthogonal matrices, [x | X] and [y | Y]. Now, | C 0 y Cx CX , [y | Y] [x | X] = | 0 0 Y | y Cx y| CX = Y| Cx Y| CX | 133 (10.2.5a) (10.2.5b) Nitsche and Benner Unit 10. Singular Value Decomposition Further, x| C| Cx y Cx = , kCxk2 | kCxk22 , = kCxk2 = kCxk2 , = σ1 . (10.2.6a) (10.2.6b) (10.2.6c) (10.2.6d) Similarly, YCx = 0, | x| C| CX , kCxk2 x| C| Cxx| X = , kCxk2 x| C| Cx | = x X, kCxk2 | = σ1 x X , |{z} y CX = (10.2.7a) (10.2.7b) (10.2.7c) (10.2.7d) orthogonal = 0. (10.2.7e) So we have reduced to, σ1 0 . | 0 C̃ We may then repeat this by maximizing the two-norm to get the full singular value decomposition. Notes: Am×n ··· 0 0 . . . 0 σ2 . . .. .. . . .. .. ... 0 0 = [U]m×m 0 · · · 0 σr 0 0 ··· 0 0 0 . .. .. .. .. . . . 0 ··· 0 0 0 σ1 0 | | 0 σ2 = u1 · · · ur . . .. . | | m×r . 0 ··· σ1 0 ··· 0 .. . · · · 0 | [V ]n×n , · · · 0 · · · 0 . . .. . . · · · 0 m×n ··· 0 − v|1 − .. .. . . .. , . .. | . 0 − vr − r×n 0 σr r×r = ÛD̂V̂. (10.2.8a) (10.2.8b) (10.2.8c) 134 10.2. Lecture 36: November 18, 2013 Applied Matrix Theory from trimming out the zeros. Here σ1 , . . . , σr are unique, and u1 , . . . , ur and v1 , . . . , vr are unique up to sign. From the existence of A = UDV| , what can we deduce? We know that U| U = UU| = I and V| V = VV| = I. So, [AV]:j = [UD]:j , 0 .. . 0 = U σ j , 0 . .. 0 (10.2.9a) (10.2.9b) = σj uj . (10.2.9c) σj uj , 1 ≤ j ≤ r , 0, j>r (10.2.10a) where (AB):j = AB:j . Now, Avj = | | | A = VD U , | | A U = VD, A uj = σj vj , 1 ≤ j ≤ r . 0, j>r (10.2.10b) (10.2.10c) So, the four fundamental subspaces are • R(A) = span {u1 , . . . , ur } • N(A) = span {vr+1 , . . . , vn } • R(A| ) = span {v1 , . . . , vr } • N(A| ) = span {ur+1 , . . . , um } | AA | n×n | | = VD U UDV , | (10.2.11a) | = VD DV , 2 σ1 0 · · · 0 . . 0 σ22 . . .. . . .. .. ... 0 2 = V 0 · · · 0 σr 0 ··· 0 0 . .. .. .. . . 
0 ··· 0 0 | | A AV :j = VD D :j , 1 σj vj , j ≤ r | A Avj = . 0, j>r 135 (10.2.11b) 0 ··· .. . 0 0 0 .. . ··· ··· ··· ... 0 ··· 0 .. . 0 | V , 0 0 .. . 0 n×n (10.2.11c) (10.2.11d) (10.2.11e) Nitsche and Benner Unit 10. Singular Value Decomposition p Thus, σj = λs (A| A) , for j = 1, . . . , r. Similarly, vj are the eigenvectors of A| A for j = 1, . . . , r and vj are orthogonal because eigenvectors of symmetric matrices are orthogonal. To construct the SVD, we will 1. find λj , which are the eigenvalues of A| A and the eigenvectors of A| A, vj . 2. Find u1 , . . . , ur for σj uj = Avj 3. Find complementary orthogonal set ur+1 , . . . , um and vr+1 , . . . , vn . 10.3 Lecture 37: November 20, 2013 Review and correction from last time From last time: C 0 U AV = 0 0 | (10.3.1) Then we said there exists an x such that kCxk = kCk2 = σ1 . Then we let y = Cx Consider, σ1 [x | X] and [y | Y]. In our system (we must correct this from last lecture), since we know that the x is the eigenvector corresponding to the λ and C| Cx = λx. Then x| C| C = x| λ = λx| x| C| CX , σ1 λx| X , = σ1 = 0. | y CX = (10.3.2a) (10.3.2b) (10.3.2c) SVD will not be on the exam, but will be on the final. Singular Value Decomposition We know, | (10.3.3a) | (10.3.3b) VA = UD. (10.3.4) A = UDV , = ÛD̂V̂ Similarly, This means that Avj = σj uj , j ≤ r 0, j>r (10.3.5) σj v j , j ≤ r 0, j>r (10.3.6) Then, | A uj = Thus,pvj are called the right singular vectors, the uj are the left singular vectors, and σj = λ(A| A) are the singular values. Also we may define the four subspaces, 136 10.3. Lecture 37: November 20, 2013 Applied Matrix Theory • R(A) = span {u1 , . . . , ur } • N(A) = span {vr+1 , . . . , vn } • R(A| ) = span {v1 , . . . , vr } • N(A| ) = span {ur+1 , . . . , um } So if we have the SVD, it is easy to describe these subspaces. So we will construct the SVD using these facts. Now, | | | | A A = VD U UDV , | | = VD DV , | | A AV = VD D. (10.3.7a) (10.3.7b) (10.3.7c) 1 1 A= . 
2 2 (10.3.8) Example 10.5. Given Then r = 1 and 1 A A= 1 5 = 5 | 2 2 5 5 1 1 , 2 2 (10.3.9a) (10.3.9b) | A Av = λv, | det A A − λI = 0, 5 − λ 5 = 25 − 10λ + λ2 − 25, 5 5 − λ (10.3.10a) (10.3.10b) (10.3.10c) = λ2 − 10λ, = λ (λ − 10) . (10.3.10d) (10.3.10e) So to find v1 for (A| A − λI)v = 0. 5 − 10 5 v = 0, 5 5 − 10 −5 5 v1 0 = , 5 −5 v2 0 (10.3.11a) −5v1 + 5v2 = 0, 1 1 v1 = √ 2 1 v2 = v1 , (10.3.11b) (10.3.11c) (10.3.11d) So 1 1 1 2 1 2 √ Av1 = =√ 1 2 1 2 2 4 137 (10.3.12a) Nitsche and Benner Unit 10. Singular Value Decomposition Thus, 1 2 , u1 = √ 20 4 1 1 =√ 5 2 (10.3.13a) (10.3.13b) 1 1 √ 1 A= √ 10 √ 1 1 , 5 2 2 | = ÛD̂V̂ , √ 1 1 1 2 1 1 10 0 √ =√ , 0 0 5 2 −1 2 −1 1 | = UDV (10.3.14a) (10.3.14b) (10.3.14c) (10.3.14d) This is great to do by hand, but is not a very numerically stable way to find the SVD. Geometric interpretation The image of the unit sphere S2 = {x ∈ Rn , kxk2 = 1} y = Ax, | = UDV x, | | U y = DV x. (10.3.15a) (10.3.15b) (10.3.15c) y0 = Dx0 , yj0 = σj x0j . (10.3.16a) (10.3.16b) Let y0 = U| x and x0 = V| x. So 2 Now kxk22 = 1 and kx0 k22 = kV| xk2 = 1. Thus, 2 2 2 (x01 ) + (x02 ) + · · · + (x0n ) = 1, 0 2 0 2 0 2 y1 y2 yn + + ··· + = 1, σ1 σ2 σn (10.3.17a) (10.3.17b) which is a hyperellipse! Viewing the transformation of Axj = σj uj . This shows that the σj give the major and minor axes of the multi-dimensional ellipsoid. There is a nice fact about the SVD. For low rank approximations (the second step may be rationalized easily from the matrix form) | A = UDV , r X | = σj uj vj . (10.3.18a) (10.3.18b) j=1 This is a way to write any matrix as a sum of rank 1 matrices. Now the σj decrease, so we may P truncate the series when σj gets close to zero. Let Ak = kj=1 σj uj v|j with rank(Ak ) = k. 138 10.4. Lecture 38: November 22, 2013 Applied Matrix Theory Theorem 10.6. kA − Ak k2 = σk+1 and is the best approximation, or kA − Ak k2 = From .. . σk σk+1 .. . σr 0 .. = U kA − Bk2 . (10.3.19) σ1 Ak = U min rank(B)=k . 0 σk+1 .. . σr σ1 .. . 
We will explore the proof and implications of this theorem later.

10.4 Lecture 38: November 22, 2013

Review for Exam 2

From the homework, we need to be able to go through proofs like these:

• the formulas for ||A||_∞ and ||A||_1
• matrix norms
• QR is unique
• ||A^{-1}||_2 = 1/√(λ_min(A^T A))
• ||A||_2 = √(λ_max(A^T A))

Norms

To show that something is a norm (whether for matrices or vectors), we must show the following properties:

1. ||x|| ≥ 0 for any x, and ||x|| = 0 implies x = 0;
2. ||αx|| = |α| ||x||;
3. ||x + y|| ≤ ||x|| + ||y||.

Several matrix norms, the induced norms and the Frobenius norm, have the fourth property

||AB|| ≤ ||A|| ||B||. (10.4.1)

More major topics

The exam covers chapters 4 and 5 (minus the SVD). These are things to know:

• Subspace (closed under addition and scalar multiplication).

• Linear transformations (definition: preserve addition and scalar multiplication).

• Coordinates and change of basis:

[x]_S = x, x = Σ_i x_i ê_i = I x = Σ_i c_i u_i = U c, (10.4.2)

where c = [x]_B, and finding c is clearly a problem of inverting a matrix. The formula is

c = [x]_B = [ [ê_1]_B | [ê_2]_B | ··· | [ê_n]_B ] [x]_S, (10.4.3)

where the matrix in brackets is U^{-1}. So we really care about the representation of a linear operator in some basis:

[T]_B = [ [T(u_1)]_B | [T(u_2)]_B | ··· | [T(u_n)]_B ], [T(x)]_B = [T]_B [x]_B. (10.4.4)

• Change of coordinates: [T]_B ∼ [T]_{B'}, with T = S T' S^{-1}. (10.4.5)

• Least squares for Ax = b. The normal equations are

A^T A x̂ = A^T b. (10.4.6)

This connects with projections because

A x̂ = A (A^T A)^{-1} A^T b = P b, (10.4.7)

where P is the orthogonal projector onto R(A). The solution is unique if the matrix is full rank, because then A^T A is invertible.

• Projectors. Defined by P = P^2; similarly we have the properties of the complementary projector I − P. These project onto complementary subspaces, which are orthogonal to each other when P* = P. (A unitary matrix is an orthonormal matrix (when real): Q*Q = I and Q* = Q^{-1}.)
The projector always projects onto its range; recall the proofs for P and I − P. Typical examples are P = uu* (u a unit vector), P = UU* (U with orthonormal columns), and the complement I − uu*.

• Gram–Schmidt, needed to orthogonalize a set of vectors.

• QR factorization (using Gram–Schmidt). Show A = QR is unique with r_jj > 0, Q orthonormal, R upper triangular. Existence and uniqueness? Existence we know from the Gram–Schmidt construction process, because we can always carry it out. GS was

a_1 = r_11 q_1,
a_2 = r_12 q_1 + r_22 q_2,
···
a_n = r_1n q_1 + r_2n q_2 + ··· + r_nn q_n. (10.4.8)

Uniqueness: these equations also show uniqueness directly because you may invert them. From a_1 = r_11 q_1 we get ||a_1|| = ||r_11 q_1|| = |r_11| ||q_1|| = r_11, so r_11 is determined and q_1 = a_1/r_11. Then induction proves this for all the other values of n: first show it is true for n = 1, then show that if q_1, ..., q_k are uniquely determined, so are r_{1,k+1}, ..., r_{k+1,k+1} and q_{k+1}. From

a_{k+1} = r_{1,k+1} q_1 + ··· + r_{k,k+1} q_k + r_{k+1,k+1} q_{k+1}, (10.4.9)

this is a Fourier expansion, and we may take inner products: ⟨a_{k+1}, q_j⟩ = r_{j,k+1} ⟨q_j, q_j⟩ = r_{j,k+1} for j < k+1. All that is left is the vector

b = r_{k+1,k+1} q_{k+1} = a_{k+1} − r_{1,k+1} q_1 − ··· − r_{k,k+1} q_k,

and the same argument as before finishes: ||b|| = |r_{k+1,k+1}| ||q_{k+1}|| = r_{k+1,k+1}, so r_{k+1,k+1} = ||b|| and q_{k+1} = b/||b||, for positive r_jj. (10.4.10)

So we have several decompositions now to work with.

• Invariant subspaces will give a block diagonal form of the matrix.

We will have class on Wednesday.

10.5 Homework Assignment 8: Due Tuesday, December 10, 2013

You may use Matlab to compute matrix products, or to reduce a matrix to Row Echelon Form.

1. Determine the SVDs of the following matrices (by hand calculation).
(a) [3 0; 0 −2]
(b) [0 2; 0 0; 0 0]
(c) [1 1; 1 1]

2. Let A = [1 2; 0 2].
(a) Use Matlab to find the SVD of A. State U, Σ, V (4-decimal-digit format is fine).
(b) In one plot draw the unit circle C and indicate the vectors v_1, v_2; in another plot draw the ellipse AC (i.e., the image of the circle under the transformation x → Ax) and indicate the vectors Av_1 = σ_1 u_1, Av_2 = σ_2 u_2. Use the axis('square') command in Matlab to ensure that the horizontal and vertical axes have the same scale.
(c) Find A_1, the best rank-1 approximation to A in the 2-norm. Find ||A − A_1||_2.

3. Let A ∈ R^{m×n}, with rank r. Use the singular value decomposition of A to prove the following.
(a) N(A) and R(A^T) are orthogonal complementary subspaces of R^n.
(b) Properties in 5.2.6 (b, c, d, e): establish the following properties of the matrix 2-norm.
  (b) ||A||_2 = ||A*||_2,
  (c) ||A*A||_2 = ||A||_2^2,
  (d) ||[A 0; 0 B]||_2 = max{||A||_2, ||B||_2} (take A, B to be real),
  (e) ||U*AV||_2 = ||A||_2 when UU* = I and V*V = I.
(c) ||A||_F = sqrt(σ_1^2 + σ_2^2 + ··· + σ_r^2).

4. Show that if A ∈ R^{n×n} is symmetric then σ_j = |λ_j|.

5. Compute the determinants of the matrices given in 6.1.3 (a), 6.1.3 (c), 6.2.1 (b).
(a) A = [1 2 3; 2 4 1; 1 4 4]
(b) A = [1 2 −3 4; 4 8 12 −8; 2 3 2 1; −3 −1 1 −4]
(c) [0 0 −2 3; 1 0 1 2; 2 1 −1 1; 0 2 −3 0]

6. (a) Show that if A is invertible, then det(A^{-1}) = 1/det(A).
(b) Show that for any invertible matrix S, det(SAS^{-1}) = det(A).
(c) If A is n × n, show that det(αA) = α^n det(A).
(d) If A is skew-symmetric, show that A is singular whenever n is odd.
(e) Show by example that in general, det(A + B) ≠ det(A) + det(B).

7. (a) Let A_{n×n} = diag{d_1, d_2, ..., d_n}. What are the eigenvalues and eigenvectors of A?
(b) Let A be a nonsingular matrix and let λ be an eigenvalue of A. Show that 1/λ is an eigenvalue of A^{-1}.
(c) Let A be an n × n matrix and let B = A − αI for some scalar α. How do the eigenvalues of A and B compare? Explain.
(d) Show that all eigenvalues of a nilpotent matrix are 0.

8. For each of the two matrices

A = A_1 = [1 2 0; 0 3 2; −2 −3 0], A = A_2 = [−4 −3 −3; 0 −1 0; 6 6 5],

determine if they are diagonalizable. If they are, find
(a) a nonsingular P such that P^{-1}AP is diagonal;
(b) A^{100};
(c) e^A.

9. Use diagonalization to solve the system

dx/dt = x + y, dy/dt = −x + y, x(0) = 100, y(0) = 100.

10. (7.4.1) Suppose that A_{n×n} is diagonalizable, and let P = [x_1|x_2|···|x_n] be a matrix whose columns are a complete set of linearly independent eigenvectors corresponding to eigenvalues λ_i. Show that the solution to u' = Au, u(0) = c, can be written as

u(t) = ξ_1 e^{λ_1 t} x_1 + ξ_2 e^{λ_2 t} x_2 + ··· + ξ_n e^{λ_n t} x_n,

in which the coefficients ξ_i satisfy the algebraic system Pξ = c.

11. (7.5.3) Show that A ∈ R^{n×n} is normal and has real eigenvalues if and only if A is symmetric.

12. (7.5.4) Prove that the eigenvalues of a real skew-symmetric or skew-hermitian matrix must be pure imaginary numbers (i.e., multiples of i).

13. (7.6.1) Which of the following matrices are positive definite?

A = [1 −1 −1; −1 5 1; −1 1 5], B = [20 6 8; 6 3 0; 8 0 8], C = [2 0 2; 0 6 2; 2 2 4].

14. (7.6.4) By diagonalizing the quadratic form 13x^2 + 10xy + 13y^2, show that the rotated graph of 13x^2 + 10xy + 13y^2 = 72 is an ellipse in standard form as shown in Figure 7.2.1 on p. 505.

10.6 Lecture 39: November 27, 2013

We will have one more homework before the end: a homework on the SVD and on eigenvalues with diagonalization. We will also cover the Jordan Canonical Form but may not put it on the homework. It will be due next Friday so we have time for the solutions before the final. The final is cumulative and will be held on Wednesday.

Singular Value Decomposition

We know that A = UΣV^T for any matrix A. Here Σ is a diagonal matrix. We may rearrange:

AV = UΣ, i.e., A v_j = σ_j u_j (j ≤ r), A v_j = 0 (j > r). (10.6.1)
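The relation A v_j = σ_j u_j, together with σ_j = sqrt(λ_j(A^T A)), lets us build the SVD from the eigendecomposition of A^T A. A minimal sketch in Python/NumPy (a stand-in for the course's Matlab), using the matrix of Example 10.5:

```python
import numpy as np

# Matrix from Example 10.5 of the notes.
A = np.array([[1.0, 1.0],
              [2.0, 2.0]])

# Step 1: eigenvalues/eigenvectors of the symmetric matrix A^T A.
lam, V = np.linalg.eigh(A.T @ A)      # eigh returns ascending eigenvalues
order = np.argsort(lam)[::-1]         # reorder to descending
lam, V = lam[order], V[:, order]

# Step 2: singular values sigma_j = sqrt(lambda_j), and u_j = A v_j / sigma_j.
sigma = np.sqrt(np.clip(lam, 0.0, None))
r = int(np.sum(sigma > 1e-12))        # numerical rank (here r = 1)
U = A @ V[:, :r] / sigma[:r]

# sigma_1 should be sqrt(10), matching the hand computation.
print(sigma[:r])                                             # [3.16227766]
print(np.allclose(A, U @ np.diag(sigma[:r]) @ V[:, :r].T))   # True
```

As the notes warn, this route through A^T A is not numerically stable in general; library SVD routines work on A directly.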
The SVD gives A = Σ_{j=1}^r σ_j u_j v_j^T for a matrix of rank r. We may define A_k = Σ_{j=1}^k σ_j u_j v_j^T and obtain an approximation of rank k.

Theorem 10.7.

||A − A_k||_2 = σ_{k+1} = min_{rank(B)=k} ||A − B||_2. (10.6.2)

In words, A_k is a best approximation of rank k to A in the 2-norm.

Proof. The first part is easily shown from the matrix form: A − A_k = U diag(0, ..., 0, σ_{k+1}, ..., σ_r) V^T, whose largest singular value is σ_{k+1}. For the second part, assume there is a matrix B of rank k with ||A − B||_2 < σ_{k+1}. Then there exists a subspace W with dim(W) = n − k such that Bw = 0 for every w ∈ W. For such a w,

||Aw||_2 = ||(A − B)w||_2 ≤ ||A − B||_2 ||w||_2 < σ_{k+1} ||w||_2. (10.6.3)

But there is also a subspace V with dim(V) = k + 1 such that ||Aw||_2 ≥ σ_{k+1} ||w||_2 for all w ∈ V, namely V = span{v_1, ..., v_{k+1}}. Since dim(V) + dim(W) > n, there exists w ≠ 0 in V ∩ W. For this w we must have both ||Aw||_2 < σ_{k+1} ||w||_2 and ||Aw||_2 ≥ σ_{k+1} ||w||_2, which is an impossible contradiction.

This proof is a little more elementary than the proof in the book. Thus, we can approximate a matrix by some lower-rank matrices. This is good because then we have fewer non-zero entries in our system and reduce our cost.

SVD in Matlab

Example handed out in class: in Matlab, say load clown.mat, then type whos and you will see a matrix X. This may be displayed with image(X). Then we do [U,S,V] = svd(X). The first figure (Figure 10.1) plots the diagonal entries of S, so we see we can truncate the small values. As we increase k = 3, 10, 30 we see a significantly improving image in Figure 10.2. So A_k = ŨΣ̃Ṽ^T, and this is done with Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)'. Now we see that for k = 30 we have a good approximation which is significantly less expensive than the original matrix. Further, in Table 10.1 we observe that the relative error decreases significantly.
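Theorem 10.7 is easy to check numerically: dropping all but the first k terms of the SVD leaves a 2-norm error of exactly σ_{k+1}. A sketch in Python/NumPy, on an arbitrary random matrix rather than the clown image:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))       # arbitrary test matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 3
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # truncated SVD: best rank-k approximation

# ||A - A_k||_2 = sigma_{k+1}  (that is s[k] in 0-based indexing).
err = np.linalg.norm(A - Ak, 2)
print(np.isclose(err, s[k]))          # True
print(np.linalg.matrix_rank(Ak))      # 3
```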
Listing 10.1. svdimag.m

% application of the SVD to image compression
% from "Applied Numerical Linear Algebra", by J. Demmel, page 114 (SIAM)
load clown.mat
% X is a matrix of pixels of dimension 200 by 320
[U,S,V] = svd(X);
%%
figure(1)
plot(diag(S));
set(gca,'FontSize',15)
xlabel('k')
ylabel('\sigma_k')
title('Singular values of X')
%%
figure(2)
ifont = 12;
colormap('gray')
subplot('position',[.07,.54,.40,.40])
k = 3;
image(U(:,1:k)*S(1:k,1:k)*V(:,1:k)'); title('k=3')
set(gca,'FontSize',ifont)
set(gca,'XTickLabel','')
%
subplot('position',[.5,.54,.40,.40])
k = 10;
image(U(:,1:k)*S(1:k,1:k)*V(:,1:k)'); title('k=10')
set(gca,'FontSize',ifont)
set(gca,'YTickLabel','')
set(gca,'XTickLabel','')
%
subplot('position',[.07,.06,.40,.40])
k = 30;
image(U(:,1:k)*S(1:k,1:k)*V(:,1:k)'); title('k=30','FontSize',ifont)
set(gca,'FontSize',ifont)
%
subplot('position',[.5,.06,.40,.40])
image(X); title('original')
set(gca,'FontSize',ifont)
set(gca,'YTickLabel','')

Table 10.1. Relative error of SVD approximation matrix A_k

k  | relative error σ_{k+1}/σ_1 | compression ratio 520k/(200·320)
3  | 0.155                      | 0.024
10 | 0.077                      | 0.081
30 | 0.027                      | 0.244

[Figure 10.1. Singular values σ_k of matrix X versus k.]
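The two columns of Table 10.1 — relative error σ_{k+1}/σ_1 and storage ratio 520k/(200·320) = k(m+n)/(mn) — can be recomputed for any m×n matrix. A sketch in Python/NumPy with a synthetic stand-in for the image (the clown data itself is not reproduced here):

```python
import numpy as np

m, n = 200, 320
rng = np.random.default_rng(1)
# Synthetic stand-in for the image: a smooth rank-1 field plus noise.
X = np.outer(np.sin(np.linspace(0.0, 3.0, m)),
             np.cos(np.linspace(0.0, 4.0, n))) + 0.05 * rng.standard_normal((m, n))

s = np.linalg.svd(X, compute_uv=False)   # singular values, descending

for k in (3, 10, 30):
    rel_err = s[k] / s[0]                # = ||X - X_k||_2 / ||X||_2
    storage = k * (m + n) / (m * n)      # store k columns of U and V vs. all of X
    print(k, round(rel_err, 3), round(storage, 3))
```

For k = 3 the storage ratio is 3·520/64000 ≈ 0.024, matching the table's compression-ratio column.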
[Figure 10.2. Rank k approximations of original image (k = 3, 10, 30, and the original).]

UNIT 11 Additional Topics

11.1 Lecture 39 (cont.)

The Determinant

We will quickly cover the essentials of chapter 6. The determinant is defined:

Definition 11.1.

det(A) = Σ_p σ(p) a_{1p_1} a_{2p_2} ··· a_{np_n}, (11.1.1)

where the sum runs over the permutations p = (p_1, p_2, ..., p_n) of (1, ..., n), and σ(p) is the sign of the permutation:

σ(p) = +1 if an even number of exchanges is needed to obtain p from (1, ..., n), and −1 if an odd number is needed. (11.1.2)

If we have a non-zero determinant, then Ax = b has a unique solution.

Theorem 11.2. We have several interesting properties of determinants.

1. Triangular matrices:

det [a_11 a_12 ··· a_1n; 0 a_22 ··· a_2n; ...; 0 0 ··· a_nn] = Π_{i=1}^n a_ii. (11.1.3)

2. det(A^T) = det(A).
3. det(AB) = det(A) det(B).
4. If B is obtained from A by
   • exchanging row i with row j, then det(B) = −det(A);
   • multiplying row i by α, then det(B) = α det(A);
   • adding a multiple of row i to row j, then det(B) = det(A).
5. det(A) is a multilinear function of the rows (and of the columns) of A.

11.2 Lecture 40: December 2, 2013

Further details for class

Homework is due Friday, with the latest it can possibly be turned in on Tuesday before 4:30 (to get solutions). The final is on Wednesday at 7:30–9:30 (?). Today we will cover eigenvalues and eigenvectors; on Wednesday we will cover positive-definite matrices; on Friday we will review for the final. Some homework problems may be ignored because they were too involved.

Diagonalizable Matrices

We know that for any matrix,

A ∼ B (11.2.1)

means

A = SBS^{-1}. (11.2.2)

Now we want to know when A ∼ D for a diagonal matrix D.
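The determinant properties above, and the fact that similar matrices A and SBS^{-1} share determinant, trace, and eigenvalues, are easy to check numerically. A sketch in Python/NumPy (small matrices chosen arbitrarily for illustration):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 1.0]])

# det(AB) = det(A) det(B) and det(A^T) = det(A).
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
assert np.isclose(np.linalg.det(A.T), np.linalg.det(A))

# Exchanging the two rows flips the sign of the determinant.
assert np.isclose(np.linalg.det(A[[1, 0], :]), -np.linalg.det(A))

# Similar matrices C = S A S^{-1} share det, trace, and eigenvalues.
S = np.array([[2.0, 1.0],
              [1.0, 1.0]])          # invertible, det(S) = 1
C = S @ A @ np.linalg.inv(S)
assert np.isclose(np.linalg.det(C), np.linalg.det(A))
assert np.isclose(np.trace(C), np.trace(A))
assert np.allclose(np.sort(np.linalg.eigvals(C).real), np.sort(np.linalg.eigvals(A).real))
print("all determinant/similarity identities check out")
```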
Eigenvalues and eigenvectors

Say we have the eigenpair (λ, v), where

Av = λv, i.e., (A − λI)v = 0, (11.2.3)

which has a nonzero solution only for v ∈ N(A − λI). Thus we care about det(A − λI) = 0. So,

det(A − λI) = det [a_11−λ a_12 ··· a_1n; a_21 a_22−λ ··· a_2n; ...; a_n1 a_n2 ··· a_nn−λ]
= (a_11 − λ)(a_22 − λ) ··· (a_nn − λ) + powers of λ of degree ≤ n − 2
= p(λ)
= (−1)^n λ^n + (−1)^{n−1} λ^{n−1} (a_11 + a_22 + ··· + a_nn) + lower-order terms in λ^k, k ≤ n − 2,

where the coefficient in parentheses is tr(A). On the other hand, by the fundamental theorem of algebra,

p(λ) = (λ − λ_1)(λ − λ_2) ··· (λ − λ_n)(−1)^n = (−1)^n λ^n + (−1)^{n−1} λ^{n−1} (λ_1 + λ_2 + ··· + λ_n) + l.o.t. (11.2.4)

From this we get the following:

• Every matrix A has n eigenvalues.
• The sum Σ λ_k = tr(A).
• The product Π λ_k = p(0) = det(A).
• If A is triangular, det(A − λI) = Π (a_ii − λ) = 0, so the roots are simply the a_ii, i.e., λ_i = a_ii.

Example 11.3. For a little reviewing, find the eigenvalues and the eigenvectors of

A = [1 −1; 1 1].

So,

det(A − λI) = det [1−λ −1; 1 1−λ] = (1 − λ)^2 + 1 = λ^2 − 2λ + 2,

λ_{1,2} = (2 ± √(4 − 8))/2 = 1 ± i. (11.2.5)

Then for λ_1 = 1 + i:

(A − λ_1 I)v = [−i −1; 1 −i] v = 0 → [1 −i; 0 0] v = 0, (11.2.6)

so

v_1 − i v_2 = 0, v_1 = i v_2, v_1 = [i; 1]. (11.2.7)

Then for

λ_2 = 1 − i, v_2 = [−i; 1]. (11.2.8)

Note that the eigenvectors v_1, v_2 are linearly independent.

Note: If A has a linearly independent set of eigenvectors, then

V = [v_1 | v_2 | ··· | v_n]

is invertible and A v_j = λ_j v_j. Then, for the diagonal matrix D with the eigenvalues along the diagonal,

(AV)_{:j} = (VD)_{:j}, AV = VD, A = VDV^{-1}. (11.2.9)

So not all matrices are diagonalizable.

Example 11.4.
The matrix

A = [1 1; 0 1]

has the double eigenvalue 1: λ_1 = λ_2 = 1. So,

A − λI = [0 1; 0 0], dim(N(A − λI)) = 1, (11.2.10)

and thus there is only one eigenvector.

Example 11.5. The matrix

A = [1 0; 0 1]

has the double eigenvalue 1: λ_1 = λ_2 = 1. But here,

A − λI = [0 0; 0 0], dim(N(A − λI)) = 2, (11.2.11)

and there are two linearly independent eigenvectors,

v_1 = [1; 0] and v_2 = [0; 1].

Example 11.6. Any nilpotent matrix with N^k = 0 does not have a full set of eigenvectors. This is because

A ∼ [0 0 ··· 0; 1 0 ··· 0; ...; 0 ··· 1 0], (11.2.12)

a single block with zeros on the diagonal and ones on the subdiagonal. So λ_1 = λ_2 = ··· = λ_n = 0 and dim(N(A − λI)) = dim(N(A)) = 1.

Theorem 11.7. If A has n distinct eigenvalues, then the corresponding eigenvectors are linearly independent.

Proof. Assume that {v_k} are linearly dependent. Then we can write one of them as a linear combination over a linearly independent subset of the other eigenvectors: v_k = Σ_{j≠k} c_j v_j, where the {v_j} are linearly independent. Then

(A − λ_k I) v_k = λ_k v_k − λ_k v_k = 0,

while also

(A − λ_k I) Σ_{j≠k} c_j v_j = Σ_{j≠k} c_j (A v_j − λ_k v_j) = Σ_{j≠k} c_j (λ_j − λ_k) v_j = Σ_{j≠k} α_j v_j, (11.2.13)

where α_j = c_j (λ_j − λ_k) ≠ 0 whenever c_j ≠ 0 (the eigenvalues are distinct). So 0 = Σ α_j v_j with not all α_j zero, which means the set {v_j} is linearly dependent. But this is a contradiction, so the assumption is not possible, and the eigenvectors are linearly independent.

Now, if A = VDV^{-1}, then

A^k = (VDV^{-1})(VDV^{-1}) ··· (VDV^{-1}) = V D^k V^{-1}. (11.2.14)

Similarly we can do a power series. This will be useful in solving systems of differential equations.
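Example 11.3 and the power formula A^k = V D^k V^{-1} can be checked numerically; a sketch in Python/NumPy (a stand-in for the course's Matlab):

```python
import numpy as np

# Matrix from Example 11.3.
A = np.array([[1.0, -1.0],
              [1.0,  1.0]])

# Eigenvalues are 1 + i and 1 - i, with eigenvectors (i, 1) and (-i, 1).
lam, V = np.linalg.eig(A)
print(np.allclose(np.sort_complex(lam), [1 - 1j, 1 + 1j]))   # True

# Diagonalization A = V D V^{-1} gives A^5 = V D^5 V^{-1}.
A5 = (V @ np.diag(lam ** 5) @ np.linalg.inv(V)).real
print(np.allclose(A5, np.linalg.matrix_power(A, 5)))         # True
```

The same idea (apply a scalar function to D entrywise) underlies A^{100} and e^A in Homework 8, and the solution of u' = Au.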
Additional Topics 154 Index backward substitution, 9 basic columns, 38 basis, 56, 66, 84 bilinear operator, 149 Cauchy–Schwarz inequality, 100 change of basis, 88 column space, 58 complementary projector, 127 complimentary subspaces, 121 condition number, 27, 48 consistent system, 36 determinant, 149 diagonal matrix, 150 differentiation, 86 direct sum, 121 eigenvalues, 150 eigenvectors, vi, 150 elementary operations, 15 Euclidian norm, 19 exams, 73, 74 field, 55 finite difference, 2, 44 four fundamental subspaces, 58 Frobeius norm, 101 fundamental theorem of algebra, 65, 150 geometric series, 46 Givens rotation, 118 Gramm–Schmidt orthogonalization, 112 homogeneous solutions, 39 Householder method, 121 idempotent matrices, 122 idempotent operator, 92 ill-posed, 20 induced norm, 104 inner product, 109 interpolation, 63 invariant subspace, 91 isometry, 116 Laplace equation, 2 least squares, 69 left null space, 58 linear function, 39 linear system, 1 linear transformation, 83 action, 83, 87 linearly dependent, 66 linearly independent, 57, 63 lower triangular, 25 lower triangular system, 5 matrix form, 1 matrix norm, 101 minimization, 74 modified Gramm–Schmidt, 116 nilpotent matrix, 128 nilpotent operator, 92 nonbasic columns, 38 norm, 47, 99 normal equations, 71 null space, 58 operation count, 9 order, 3 orthogonal projector, 123 orthogonalization, 111 orthonormal, 111 155 Nitsche and Benner Index orthonormal basis, 111 partial differential equations, 111 particular solution, 38 periodic boundary conditions, 44 perturbations, 42 pivoting, 19, 22 PLU factorization, 22 projection, 118 QR factorization, 114 rank, 61 reduced row echelon form, 35 reflection, 118 review, 140 rotation, 117 row echelon form, 31 row space, 58 self-similar, 89 Sherman–Morrison formula, 44 singular value decomposition, 131 singular values, 131 smallest upper bound, 102 spanning set, 56 sparsity, 18 submatrices, 26 subspaces, 67 Taylor series, 3 trace, 40 tridiagonal matrix, 18 tuple, 92 Van der 
Monde matrix, 63 vector form, 1 vector space, 56 well-posed, 20 156 Figures 1.1 Finite difference approximation of a 1D boundary value problem. . . . . . . 2 2.1 2.2 One-dimensional discrete grids. . . . . . . . . . . . . . . . . . . . . . . . . . Two-dimensional discrete grids. . . . . . . . . . . . . . . . . . . . . . . . . . 10 11 3.1 Plot of linear problems and their solutions. . . . . . . . . . . . . . . . . . . . 21 4.1 4.2 Geometric illustration of linear systems and their solutions. . . . . . . . . . . Figures for Textbook problem 3.3.4. . . . . . . . . . . . . . . . . . . . . . . . 36 51 5.1 5.2 Basis vector of example solution. . . . . . . . . . . . . . . . . . . . . . . . . Interpolating system. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 64 6.1 6.2 Minimization of distance between point and a plane. . . . . . . . . . . . . . Parabolic fitting by least squares . . . . . . . . . . . . . . . . . . . . . . . . 73 73 7.1 Figure 4.7.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 10.1 Singular values σk of matrix X versus k. . . . . . . . . . . . . . . . . . . . . 147 10.2 Rank k approximations of original image. . . . . . . . . . . . . . . . . . . . . 147 157 Nitsche and Benner Figures 158 Tables 3.1 Variation of error with the perturbation variable . . . . . . . . . . . . . . . . 20 10.1 Relative error of SVD approximation matrix Ak . . . . . . . . . . . . . . . . 146 159 Nitsche and Benner Tables 160 Listings 2.1 code stub for tridiagonal solver . . . . . . . . . . . . . . . . . . . . . . . . . 13 10.1 svdimag.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145 161
