Moore–Penrose inverse

In mathematics, and in particular linear algebra, the Moore–Penrose inverse $A^+$ of a matrix $A$, often called the pseudoinverse, is the most widely known generalization of the inverse matrix.[1] It was independently described by E. H. Moore in 1920,[2] Arne Bjerhammar in 1951,[3] and Roger Penrose in 1955.[4] Earlier, Erik Ivar Fredholm had introduced the concept of a pseudoinverse of integral operators in 1903. The terms pseudoinverse and generalized inverse are sometimes used as synonyms for the Moore–Penrose inverse of a matrix, but are sometimes applied to other elements of algebraic structures which share some but not all properties expected of an inverse element.

A common use of the pseudoinverse is to compute a "best fit" (least squares) solution to a system of linear equations that lacks a solution (see below under § Applications). Another use is to find the minimum (Euclidean) norm solution to a system of linear equations with multiple solutions. The pseudoinverse facilitates the statement and proof of results in linear algebra.

The pseudoinverse is defined and unique for all matrices whose entries are real or complex numbers. It can be computed using the singular value decomposition. In the special case where $A$ is a normal matrix (for example, a Hermitian matrix), the pseudoinverse $A^+$ annihilates the kernel of $A$ and acts as a traditional inverse of $A$ on the subspace orthogonal to the kernel.

Notation

In the following discussion, the following conventions are adopted.

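  • $\mathbb{K}$ will denote one of the fields of real or complex numbers, denoted $\mathbb{R}$, $\mathbb{C}$, respectively. The vector space of $m \times n$ matrices over $\mathbb{K}$ is denoted by $\mathbb{K}^{m \times n}$.
  • For $A \in \mathbb{K}^{m \times n}$, the transpose is denoted $A^{\mathsf{T}}$ and the Hermitian transpose (also called conjugate transpose) is denoted $A^*$. If $\mathbb{K} = \mathbb{R}$, then $A^* = A^{\mathsf{T}}$.
  • For $A \in \mathbb{K}^{m \times n}$, $\operatorname{ran}(A)$ (standing for "range") denotes the column space (image) of $A$ (the space spanned by the column vectors of $A$) and $\ker(A)$ denotes the kernel (null space) of $A$.
  • For any positive integer $n$, the $n \times n$ identity matrix is denoted $I_n \in \mathbb{K}^{n \times n}$.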

Definition

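For $A \in \mathbb{K}^{m \times n}$, a pseudoinverse of $A$ is defined as a matrix $A^+ \in \mathbb{K}^{n \times m}$ satisfying all of the following four criteria, known as the Moore–Penrose conditions:[4][5]

  1. $AA^+$ need not be the general identity matrix, but it maps all column vectors of $A$ to themselves:
     $$AA^+A = A.$$
  2. $A^+$ acts like a weak inverse:
     $$A^+AA^+ = A^+.$$
  3. $AA^+$ is Hermitian:
     $$(AA^+)^* = AA^+.$$
  4. $A^+A$ is also Hermitian:
     $$(A^+A)^* = A^+A.$$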

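Note that $A^+A$ and $AA^+$ are orthogonal projection operators: they are idempotent, since $(AA^+)^2 = AA^+$ and $(A^+A)^2 = A^+A$ follow from the first two conditions, and they are Hermitian by the last two. More specifically, $A^+A$ projects onto the image of $A^*$ (for real $A$, the span of the rows of $A$), and $AA^+$ projects onto the image of $A$ (equivalently, the span of the columns of $A$). In fact, the above four conditions are fully equivalent to $A^+A$ and $AA^+$ being such orthogonal projections: $AA^+$ projecting onto the image of $A$ implies $(AA^+)A = A$, and $A^+A$ projecting onto the image of $A^*$ implies $(A^+A)A^+ = A^+$.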
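
The four conditions can be checked numerically. The following is a minimal sketch, assuming NumPy is available; `satisfies_moore_penrose` is a hypothetical helper name and the test matrix is an arbitrary rank-deficient example, not one taken from the text.

```python
import numpy as np

def satisfies_moore_penrose(A, X, tol=1e-10):
    """Return True if X meets the four Moore-Penrose conditions for A."""
    AX, XA = A @ X, X @ A
    return bool(
        np.allclose(AX @ A, A, atol=tol)            # 1. A X A = A
        and np.allclose(XA @ X, X, atol=tol)        # 2. X A X = X
        and np.allclose(AX.conj().T, AX, atol=tol)  # 3. A X is Hermitian
        and np.allclose(XA.conj().T, XA, atol=tol)  # 4. X A is Hermitian
    )

# Rank-deficient 3x2 example: the second column is twice the first.
A = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
A_pinv = np.linalg.pinv(A, rcond=1e-10)

print(satisfies_moore_penrose(A, A_pinv))  # the pseudoinverse passes
print(satisfies_moore_penrose(A, A.T))     # the plain transpose does not
```

The explicit `rcond` cutoff is a design choice of this sketch: it makes sure the numerically zero singular value of the rank-deficient matrix is discarded rather than inverted.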

The pseudoinverse $A^+$ exists for any matrix $A \in \mathbb{K}^{m \times n}$. If furthermore $A$ is full rank, that is, its rank is $\min\{m, n\}$, then $A^+$ can be given a particularly simple algebraic expression. In particular:

  • When $A$ has linearly independent columns (equivalently, $A$ is injective, and thus $A^*A$ is invertible), $A^+$ can be computed as
     $$A^+ = (A^*A)^{-1}A^*.$$
    This particular pseudoinverse is a left inverse, that is, $A^+A = I$.
  • If, on the other hand, $A$ has linearly independent rows (equivalently, $A$ is surjective, and thus $AA^*$ is invertible), $A^+$ can be computed as
     $$A^+ = A^*(AA^*)^{-1}.$$
    This is a right inverse, as $AA^+ = I$.
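
As an illustration of the two full-rank formulas, the following sketch (assuming NumPy; the random test matrices are arbitrary) evaluates them with linear solves rather than explicit inverses and compares the results with `numpy.linalg.pinv`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tall matrix: a random 5x3 Gaussian matrix has independent columns almost surely.
A = rng.standard_normal((5, 3))
AH = A.conj().T
left = np.linalg.solve(AH @ A, AH)          # (A* A)^{-1} A*

# Wide matrix: a random 3x5 Gaussian matrix has independent rows almost surely.
B = rng.standard_normal((3, 5))
# (B B*)^{-1} B, then conjugate-transposed, equals B* (B B*)^{-1}
# because B B* is Hermitian.
right = np.linalg.solve(B @ B.conj().T, B).conj().T

print(np.allclose(left @ A, np.eye(3)))     # left inverse
print(np.allclose(B @ right, np.eye(3)))    # right inverse
```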

In the more general case, the pseudoinverse can be expressed using the singular value decomposition. Any matrix can be decomposed as $A = UDV^*$ for some isometries $U, V$ and a diagonal nonnegative real matrix $D$. The pseudoinverse can then be written as $A^+ = VD^+U^*$, where $D^+$ is the pseudoinverse of $D$ and can be obtained by transposing the matrix and replacing the nonzero values with their multiplicative inverses.[6] That this matrix satisfies the above requirements is verified directly by observing that $AA^+ = UU^*$ and $A^+A = VV^*$, which are the projections onto the image and support of $A$, respectively.

Properties

Existence and uniqueness

As discussed above, for any matrix $A$ there is one and only one pseudoinverse $A^+$.[5]

A matrix satisfying only the first of the conditions given above, namely $AA^+A = A$, is known as a generalized inverse. If the matrix also satisfies the second condition, namely $A^+AA^+ = A^+$, it is called a generalized reflexive inverse. Generalized inverses always exist but are not in general unique. Uniqueness is a consequence of the last two conditions.

Basic properties

Proofs for the properties below can be found in the Wikibooks text Topics in Abstract Algebra/Linear algebra.

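  • If $A$ has real entries, then so does $A^+$.
  • If $A$ is invertible, its pseudoinverse is its inverse. That is, $A^+ = A^{-1}$.[7]: 243 
  • The pseudoinverse of the pseudoinverse is the original matrix: $(A^+)^+ = A$.[7]: 245 
  • Pseudoinversion commutes with transposition, complex conjugation, and taking the conjugate transpose:[7]: 245 
     $$(A^{\mathsf{T}})^+ = (A^+)^{\mathsf{T}}, \quad (\overline{A})^+ = \overline{A^+}, \quad (A^*)^+ = (A^+)^*.$$
  • The pseudoinverse of a scalar multiple of $A$ is the reciprocal multiple of $A^+$:
     $$(\alpha A)^+ = \alpha^{-1}A^+$$
    for $\alpha \neq 0$.
  • The kernel and image of the pseudoinverse coincide with those of the conjugate transpose: $\ker(A^+) = \ker(A^*)$ and $\operatorname{ran}(A^+) = \operatorname{ran}(A^*)$.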

Identities

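The following identity formula can be used to cancel or expand certain subexpressions involving pseudoinverses:

$$A = AA^*(A^+)^* = (A^+)^*A^*A.$$
Equivalently, substituting $A^+$ for $A$ gives
$$A^+ = A^+(A^+)^*A^* = A^*(A^+)^*A^+,$$
while substituting $A^*$ for $A$ gives
$$A^* = A^*AA^+ = A^+AA^*.$$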

Reduction to Hermitian case

The computation of the pseudoinverse is reducible to its construction in the Hermitian case. This is possible through the equivalences:

$$A^+ = (A^*A)^+A^*,$$
$$A^+ = A^*(AA^*)^+,$$

as $A^*A$ and $AA^*$ are Hermitian.
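
The two equivalences can be checked numerically. The sketch below assumes NumPy, whose `numpy.linalg.pinv` accepts a `hermitian=True` flag for Hermitian input; the rank-2 complex test matrix and the explicit `rcond` cutoff are choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def crandn(shape):
    """Random complex Gaussian matrix (illustrative helper)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# A 4x3 complex matrix of rank 2, so neither A*A nor AA* is invertible.
A = crandn((4, 2)) @ crandn((2, 3))
AH = A.conj().T

# Pseudoinverses of the Hermitian matrices A*A and AA*.
via_gram = np.linalg.pinv(AH @ A, rcond=1e-10, hermitian=True) @ AH
via_outer = AH @ np.linalg.pinv(A @ AH, rcond=1e-10, hermitian=True)

direct = np.linalg.pinv(A, rcond=1e-10)
print(np.allclose(via_gram, direct), np.allclose(via_outer, direct))
```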

Pseudoinverse of products

The equality $(AB)^+ = B^+A^+$ does not hold in general. Rather, suppose $A \in \mathbb{K}^{m \times n}$ and $B \in \mathbb{K}^{n \times p}$. Then the following are equivalent:[8]

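  1. $(AB)^+ = B^+A^+$
  2. $A^+ABB^*A^* = BB^*A^*$ and $BB^+A^*AB = A^*AB$
  3. $(A^+ABB^*)^* = A^+ABB^*$ and $(A^*ABB^+)^* = A^*ABB^+$
  4. $A^+ABB^*A^*ABB^+ = BB^*A^*A$
  5. $A^+AB = B(AB)^+AB$ and $BB^+A^* = A^*AB(AB)^+$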

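The following are sufficient conditions for $(AB)^+ = B^+A^+$:

  1. $A$ has orthonormal columns (then $A^*A = A^+A = I_n$), or
  2. $B$ has orthonormal rows (then $BB^* = BB^+ = I_n$), or
  3. $A$ has linearly independent columns (then $A^+A = I$) and $B$ has linearly independent rows (then $BB^+ = I$), or
  4. $B = A^*$, or
  5. $B = A^+$.

The following is a necessary condition for $(AB)^+ = B^+A^+$:

  1. $(A^+A)(BB^+) = (BB^+)(A^+A)$

The fourth sufficient condition yields the equalities

$$(AA^*)^+ = (A^+)^*A^+, \qquad (A^*A)^+ = A^+(A^+)^*.$$

Here is a counterexample where $(AB)^+ \neq B^+A^+$:

$$\left(\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix}\right)^+ = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}^+ = \begin{pmatrix} \tfrac12 & 0 \\ \tfrac12 & 0 \end{pmatrix}, \qquad \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix}^+\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}^+ = \begin{pmatrix} 0 & \tfrac12 \\ 0 & \tfrac12 \end{pmatrix}\begin{pmatrix} \tfrac12 & 0 \\ \tfrac12 & 0 \end{pmatrix} = \begin{pmatrix} \tfrac14 & 0 \\ \tfrac14 & 0 \end{pmatrix}.$$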

Projectors

$P = AA^+$ and $Q = A^+A$ are orthogonal projection operators, that is, they are Hermitian ($P = P^*$, $Q = Q^*$) and idempotent ($P^2 = P$ and $Q^2 = Q$). The following hold:

  • $PA = AQ = A$ and $A^+P = QA^+ = A^+$
  • $P$ is the orthogonal projector onto the range of $A$ (which equals the orthogonal complement of the kernel of $A^*$).
  • $Q$ is the orthogonal projector onto the range of $A^*$ (which equals the orthogonal complement of the kernel of $A$).
  • $I - Q = I - A^+A$ is the orthogonal projector onto the kernel of $A$.
  • $I - P = I - AA^+$ is the orthogonal projector onto the kernel of $A^*$.[5]

The last two properties imply the following identities:

  • $A(I - A^+A) = (I - AA^+)A = 0$
  • $A^*(I - AA^+) = (I - A^+A)A^* = 0$

Another property is the following: if $A \in \mathbb{K}^{n \times n}$ is Hermitian and idempotent (true if and only if it represents an orthogonal projection), then, for any matrix $B \in \mathbb{K}^{m \times n}$ the following equation holds:[9]

$$A(BA)^+ = (BA)^+.$$

This can be proven by defining matrices $C = BA$, $D = A(BA)^+$, and checking that $D$ is indeed a pseudoinverse for $C$ by verifying that the defining properties of the pseudoinverse hold when $A$ is Hermitian and idempotent.

From the last property it follows that, if $A \in \mathbb{K}^{n \times n}$ is Hermitian and idempotent, for any matrix $B \in \mathbb{K}^{n \times m}$

$$(AB)^+A = (AB)^+.$$

Finally, if $A$ is an orthogonal projection matrix, then its pseudoinverse trivially coincides with the matrix itself, that is, $A^+ = A$.
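
The projector properties can be illustrated on a small singular matrix. This is a sketch assuming NumPy; the 2x2 matrix is an arbitrary example, and `P` and `Q` denote the products of the matrix with its pseudoinverse in the two possible orders.

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 0.0]])   # rank 1, neither injective nor surjective
A_pinv = np.linalg.pinv(A)
I = np.eye(2)

P = A @ A_pinv    # orthogonal projector onto the range of A
Q = A_pinv @ A    # orthogonal projector onto the range of A*

print(P)                                  # [[0.5 0.5], [0.5 0.5]] up to rounding
print(Q)                                  # [[1 0], [0 0]] up to rounding
print(np.allclose(A @ (I - Q), 0))        # I - Q maps into the kernel of A
print(np.allclose((I - P) @ A, 0))        # I - P annihilates the range of A
```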

Geometric construction

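If we view the matrix as a linear map $A : \mathbb{K}^n \to \mathbb{K}^m$ over the field $\mathbb{K}$ then $A^+ : \mathbb{K}^m \to \mathbb{K}^n$ can be decomposed as follows. We write $\oplus$ for the direct sum, $\perp$ for the orthogonal complement, $\ker$ for the kernel of a map, and $\operatorname{ran}$ for the image of a map. Notice that $\mathbb{K}^n = (\ker A)^\perp \oplus \ker A$ and $\mathbb{K}^m = \operatorname{ran} A \oplus (\operatorname{ran} A)^\perp$. The restriction $A : (\ker A)^\perp \to \operatorname{ran} A$ is then an isomorphism. This implies that $A^+$ on $\operatorname{ran} A$ is the inverse of this isomorphism, and is zero on $(\operatorname{ran} A)^\perp$.

In other words: To find $A^+b$ for given $b$ in $\mathbb{K}^m$, first project $b$ orthogonally onto the range of $A$, finding a point $p(b)$ in the range. Then form $A^{-1}(\{p(b)\})$, that is, find those vectors in $\mathbb{K}^n$ that $A$ sends to $p(b)$. This will be an affine subspace of $\mathbb{K}^n$ parallel to the kernel of $A$. The element of this subspace that has the smallest length (that is, is closest to the origin) is the answer $A^+b$ we are looking for. It can be found by taking an arbitrary member of $A^{-1}(\{p(b)\})$ and projecting it orthogonally onto the orthogonal complement of the kernel of $A$.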

This description is closely related to the minimum-norm solution to a linear system.

Limit relations

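The pseudoinverse can be expressed as a limit:

$$A^+ = \lim_{\delta \searrow 0} (A^*A + \delta I)^{-1}A^* = \lim_{\delta \searrow 0} A^*(AA^* + \delta I)^{-1}$$
(see Tikhonov regularization). These limits exist even if $(AA^*)^{-1}$ or $(A^*A)^{-1}$ do not exist.[5]: 263 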
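
The limit can be observed numerically. The sketch below assumes NumPy; `tikhonov_pinv` is a hypothetical helper name, and the singular 2x2 all-ones matrix is chosen because its pseudoinverse is known in closed form (one quarter of the matrix itself).

```python
import numpy as np

def tikhonov_pinv(A, delta):
    """Regularized approximation (A* A + delta I)^{-1} A* of the pseudoinverse."""
    AH = A.conj().T
    return np.linalg.solve(AH @ A + delta * np.eye(A.shape[1]), AH)

A = np.array([[1.0, 1.0], [1.0, 1.0]])   # singular: A* A is not invertible
exact = A / 4.0                          # pseudoinverse of the all-ones 2x2 matrix

deltas = (1e-1, 1e-3, 1e-5)
errors = [np.linalg.norm(tikhonov_pinv(A, d) - exact) for d in deltas]
for d, e in zip(deltas, errors):
    print(f"delta={d:g}  error={e:.3e}")  # error shrinks roughly like delta / 8
```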

Continuity

In contrast to ordinary matrix inversion, the process of taking pseudoinverses is not continuous: if the sequence $(A_n)$ converges to the matrix $A$ (in the maximum norm or Frobenius norm, say), then $(A_n)^+$ need not converge to $A^+$. However, if all the matrices $A_n$ have the same rank as $A$, $(A_n)^+$ will converge to $A^+$.[10]
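
A small numerical illustration of the discontinuity, assuming NumPy: the diagonal matrices below (an arbitrary example) have rank 2 and converge to a rank-1 limit, yet their pseudoinverses move away from the pseudoinverse of the limit.

```python
import numpy as np

A = np.diag([1.0, 0.0])                  # limit matrix, rank 1
A_pinv = np.linalg.pinv(A)               # equals A itself

gaps = []
for eps in (1e-2, 1e-4, 1e-6):
    A_eps = np.diag([1.0, eps])          # rank 2, converges to A as eps -> 0
    gap = np.linalg.norm(np.linalg.pinv(A_eps) - A_pinv)
    gaps.append(gap)
    print(f"||A_eps - A|| = {eps:g}   ||A_eps^+ - A^+|| = {gap:g}")
```

The gap grows like the reciprocal of the perturbation size, because the pseudoinverse of the perturbed matrix inverts the small diagonal entry while the pseudoinverse of the limit maps it to zero.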

Derivative

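Let $x \mapsto A(x)$ be a real-valued differentiable matrix function with constant rank at a point $x_0$. The derivative of $x \mapsto A^+(x)$ at $x_0$ may be calculated in terms of the derivative of $A$ at $x_0$:[11]

$$\left.\frac{\mathrm{d}}{\mathrm{d}x}\right|_{x = x_0} A^+ = -A^+\left(\frac{\mathrm{d}A}{\mathrm{d}x}\right)A^+ + A^+A^{+\mathsf{T}}\left(\frac{\mathrm{d}A^{\mathsf{T}}}{\mathrm{d}x}\right)\left(I - AA^+\right) + \left(I - A^+A\right)\left(\frac{\mathrm{d}A^{\mathsf{T}}}{\mathrm{d}x}\right)A^{+\mathsf{T}}A^+,$$
where $A$, $A^+$, and the derivatives on the right-hand side are evaluated at $x_0$. For a complex matrix, the transpose is replaced with conjugate transpose.[12] For a real-valued symmetric matrix, the Magnus-Neudecker derivative is established.[13]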
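
The derivative formula for the real constant-rank case can be compared against a central finite difference. This is a sketch assuming NumPy; the rank-2 matrix family, the helper names `A_of`, `pinv`, and `pinv_derivative`, and the step size are all choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical rank-2 family A(x) = (B0 + x*B1) @ C of real 4x3 matrices.
B0 = rng.standard_normal((4, 2))
B1 = rng.standard_normal((4, 2))
C = rng.standard_normal((2, 3))

def A_of(x):
    return (B0 + x * B1) @ C

def pinv(A):
    # Explicit cutoff so the numerically zero third singular value is discarded.
    return np.linalg.pinv(A, rcond=1e-10)

def pinv_derivative(A, dA):
    """Derivative of the pseudoinverse for a real matrix of locally constant rank."""
    Ap = pinv(A)
    m, n = A.shape
    return (-Ap @ dA @ Ap
            + Ap @ Ap.T @ dA.T @ (np.eye(m) - A @ Ap)
            + (np.eye(n) - Ap @ A) @ dA.T @ Ap.T @ Ap)

x0, h = 0.3, 1e-6
analytic = pinv_derivative(A_of(x0), B1 @ C)      # dA/dx = B1 @ C for this family
numeric = (pinv(A_of(x0 + h)) - pinv(A_of(x0 - h))) / (2 * h)
print(np.max(np.abs(analytic - numeric)))         # small compared with the entries
```

A rank-deficient family is used so that all three terms of the formula contribute; for a full-column-rank matrix the last term vanishes.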

Examples

Since for invertible matrices the pseudoinverse equals the usual inverse, only examples of non-invertible matrices are considered below.

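  • For $A = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$, the pseudoinverse is $A^+ = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$. The uniqueness of this pseudoinverse can be seen from the requirement $A^+ = A^+AA^+$, since multiplication by a zero matrix would always produce a zero matrix.
  • For $A = \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix}$, the pseudoinverse is $A^+ = \begin{pmatrix} \tfrac12 & \tfrac12 \\ 0 & 0 \end{pmatrix}$.
    Indeed, $AA^+ = \begin{pmatrix} \tfrac12 & \tfrac12 \\ \tfrac12 & \tfrac12 \end{pmatrix}$, and thus $AA^+A = \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix} = A$. Similarly, $A^+A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$, and thus $A^+AA^+ = \begin{pmatrix} \tfrac12 & \tfrac12 \\ 0 & 0 \end{pmatrix} = A^+$.
    Note that $A$ is neither injective nor surjective, and thus the pseudoinverse cannot be computed via $A^+ = (A^*A)^{-1}A^*$ nor $A^+ = A^*(AA^*)^{-1}$, as $A^*A$ and $AA^*$ are both singular, and furthermore $A^+$ is neither a left nor a right inverse.
    Nonetheless, the pseudoinverse can be computed via SVD by observing that $A = \sqrt{2}\left(\frac{e_1 + e_2}{\sqrt{2}}\right)e_1^*$, and thus $A^+ = \frac{1}{\sqrt{2}}\,e_1\left(\frac{e_1 + e_2}{\sqrt{2}}\right)^*$.
  • For $A = \begin{pmatrix} 1 & 0 \\ -1 & 0 \end{pmatrix}$, $A^+ = \begin{pmatrix} \tfrac12 & -\tfrac12 \\ 0 & 0 \end{pmatrix}$.
  • For $A = \begin{pmatrix} 1 & 0 \\ 2 & 0 \end{pmatrix}$, $A^+ = \begin{pmatrix} \tfrac15 & \tfrac25 \\ 0 & 0 \end{pmatrix}$. The denominators are here $5 = 1^2 + 2^2$.
  • For $A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$, $A^+ = \begin{pmatrix} \tfrac14 & \tfrac14 \\ \tfrac14 & \tfrac14 \end{pmatrix}$.
  • For $A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}$, the pseudoinverse is $A^+ = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \tfrac12 & \tfrac12 \end{pmatrix}$.
    For this matrix, the left inverse exists and thus equals $A^+$; indeed, $A^+A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$.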


Special cases

Scalars

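It is also possible to define a pseudoinverse for scalars and vectors. This amounts to treating these as matrices. The pseudoinverse of a scalar $x$ is zero if $x$ is zero and the reciprocal of $x$ otherwise:

$$x^+ = \begin{cases} 0, & \text{if } x = 0; \\ x^{-1}, & \text{otherwise}. \end{cases}$$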

Vectors

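The pseudoinverse of the null (all zero) vector is the transposed null vector. The pseudoinverse of a non-null vector is the conjugate transposed vector divided by its squared magnitude:

$$x^+ = \begin{cases} 0^{\mathsf{T}}, & \text{if } x = 0; \\ \dfrac{x^*}{x^*x}, & \text{otherwise}. \end{cases}$$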

Diagonal matrices

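The pseudoinverse of a square diagonal matrix is obtained by taking the reciprocal of the nonzero diagonal elements. Formally, if $D$ is a square diagonal matrix with $D = \tilde{D} \oplus \mathbf{0}_{k \times k}$ and $\tilde{D} > 0$, then $D^+ = \tilde{D}^{-1} \oplus \mathbf{0}_{k \times k}$. More generally, if $A$ is any $m \times n$ rectangular matrix whose only nonzero elements are on the diagonal, meaning $A_{ij} = \delta_{ij} a_i$, $a_i \in \mathbb{K}$, then $A^+$ is an $n \times m$ rectangular matrix whose diagonal elements are the reciprocals of the original nonzero ones, that is, $A_{ii} \neq 0 \implies A^+_{ii} = 1 / A_{ii}$.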
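
The rectangular diagonal case can be sketched as follows, assuming NumPy; `pinv_rect_diag` is a hypothetical helper name, and it assumes its input has nonzero entries only on the main diagonal.

```python
import numpy as np

def pinv_rect_diag(D):
    """Pseudoinverse of an m x n matrix whose only nonzero entries are on the diagonal."""
    D = np.asarray(D)
    out = np.zeros(D.T.shape, dtype=np.result_type(D.dtype, np.float64))
    d = np.diagonal(D)             # diagonal entries, length min(m, n)
    idx = np.flatnonzero(d)        # positions of the nonzero diagonal entries
    out[idx, idx] = 1.0 / d[idx]   # invert them; zeros stay zero
    return out

D = np.array([[2.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
print(pinv_rect_diag(D))           # 3x2 matrix with 0.5 in the top-left corner
```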

Linearly independent columns

If the rank of $A$ equals its number of columns, $n$ (which requires $m \geq n$), then $A$ has $n$ linearly independent columns, and $A^*A$ is invertible. In this case, an explicit formula is:[14]

$$A^+ = (A^*A)^{-1}A^*.$$

It follows that $A^+$ is then a left inverse of $A$: $A^+A = I_n$.

Linearly independent rows

If the rank of $A$ equals its number of rows, $m$ (which requires $m \leq n$), then $A$ has $m$ linearly independent rows, and $AA^*$ is invertible. In this case, an explicit formula is:

$$A^+ = A^*(AA^*)^{-1}.$$

It follows that $A^+$ is a right inverse of $A$: $AA^+ = I_m$.

Orthonormal columns or rows

This is a special case of either full column rank or full row rank (treated above). If $A$ has orthonormal columns ($A^*A = I_n$) or orthonormal rows ($AA^* = I_m$), then:

$$A^+ = A^*.$$

Normal matrices

If $A$ is normal, that is, it commutes with its conjugate transpose, then its pseudoinverse can be computed by diagonalizing it, mapping all nonzero eigenvalues to their inverses, and mapping zero eigenvalues to zero. A corollary is that $A$ commuting with its conjugate transpose implies that it commutes with its pseudoinverse.
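
The eigenvalue recipe can be sketched for the Hermitian case, which is a special case of a normal matrix. The code assumes NumPy; `pinv_hermitian` is a hypothetical helper name, the relative cutoff `tol` is an assumption of this sketch, and the symmetric test matrix was chosen to have eigenvalues 0, 3, 3.

```python
import numpy as np

def pinv_hermitian(A, tol=1e-12):
    """Pseudoinverse of a Hermitian matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(A)               # A = V diag(w) V*
    w_inv = np.zeros_like(w)
    keep = np.abs(w) > tol * np.abs(w).max()
    w_inv[keep] = 1.0 / w[keep]            # invert nonzero eigenvalues, keep zeros
    return (V * w_inv) @ V.conj().T        # V diag(w_inv) V*

# Singular symmetric matrix with eigenvalues 0, 3, 3, so its pseudoinverse is A / 9.
A = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, -1.0],
              [1.0, -1.0, 2.0]])
A_pinv = pinv_hermitian(A)
print(np.allclose(A_pinv, A / 9.0))
print(np.allclose(A @ A_pinv, A_pinv @ A))  # A commutes with its pseudoinverse
```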

EP matrices

A (square) matrix $A$ is said to be an EP matrix if it commutes with its pseudoinverse. In such cases (and only in such cases), it is possible to obtain the pseudoinverse as a polynomial in $A$. A polynomial $p(t)$ such that $A^+ = p(A)$ can be easily obtained from the characteristic polynomial of $A$ or, more generally, from any annihilating polynomial of $A$.[15]

Orthogonal projection matrices

This is a special case of a normal matrix with eigenvalues 0 and 1. If $A$ is an orthogonal projection matrix, that is, $A = A^*$ and $A^2 = A$, then the pseudoinverse trivially coincides with the matrix itself:

$$A^+ = A.$$

Circulant matrices

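For a circulant matrix $C$, the singular value decomposition is given by the Fourier transform, that is, the singular values are the Fourier coefficients. Let $\mathcal{F}$ be the Discrete Fourier Transform (DFT) matrix; then[16]

$$C = \mathcal{F} \cdot \Sigma \cdot \mathcal{F}^*, \qquad C^+ = \mathcal{F} \cdot \Sigma^+ \cdot \mathcal{F}^*.$$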

Construction

Rank decomposition

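Let $r \leq \min(m, n)$ denote the rank of $A \in \mathbb{K}^{m \times n}$. Then $A$ can be (rank) decomposed as $A = BC$ where $B \in \mathbb{K}^{m \times r}$ and $C \in \mathbb{K}^{r \times n}$ are of rank $r$. Then $A^+ = C^+B^+ = C^*(CC^*)^{-1}(B^*B)^{-1}B^*$.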
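
A rank decomposition can be turned into a pseudoinverse as follows. This is a sketch assuming NumPy; the factors below are arbitrary full-rank examples chosen for this illustration, and explicit inverses are used only because the matrices are tiny.

```python
import numpy as np

B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])            # 3x2, rank 2
C = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])       # 2x3, rank 2
A = B @ C                             # 3x3, rank 2 (singular)

BH, CH = B.conj().T, C.conj().T
# C* (C C*)^{-1} (B* B)^{-1} B*
A_pinv = CH @ np.linalg.inv(C @ CH) @ np.linalg.inv(BH @ B) @ BH

print(np.allclose(A @ A_pinv @ A, A))
print(np.allclose(A_pinv, np.linalg.pinv(A, rcond=1e-10)))
```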

The QR method

For $\mathbb{K} \in \{\mathbb{R}, \mathbb{C}\}$, computing the product $AA^*$ or $A^*A$ and their inverses explicitly is often a source of numerical rounding errors and computational cost in practice. An alternative approach using the QR decomposition of $A$ may be used instead.

Consider the case when $A$ is of full column rank, so that $A^+ = (A^*A)^{-1}A^*$. Then the Cholesky decomposition $A^*A = R^*R$, where $R$ is an upper triangular matrix, may be used. Multiplication by the inverse is then done easily by solving a system with multiple right-hand sides,

$$A^+ = (A^*A)^{-1}A^* \quad \Leftrightarrow \quad (A^*A)A^+ = A^* \quad \Leftrightarrow \quad R^*RA^+ = A^*,$$

which may be solved by forward substitution followed by back substitution.

The Cholesky decomposition may be computed without forming $A^*A$ explicitly, by alternatively using the QR decomposition of $A = QR$, where $Q$ has orthonormal columns, $Q^*Q = I$, and $R$ is upper triangular. Then

$$A^*A = (QR)^*(QR) = R^*Q^*QR = R^*R,$$

so $R$ is the Cholesky factor of $A^*A$.

The case of full row rank is treated similarly by using the formula $A^+ = A^*(AA^*)^{-1}$ and using a similar argument, swapping the roles of $A$ and $A^*$.
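
The full-column-rank case can be sketched with NumPy as follows. The random test matrix is arbitrary, and `numpy.linalg.solve` is used here as a stand-in for dedicated triangular solvers: it performs a general solve and does not exploit the triangular structure.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 3))       # full column rank almost surely

Q, R = np.linalg.qr(A)                # reduced QR: Q is 6x3, R is 3x3 upper triangular

# Solve R* R X = A* in two triangular stages.
Y = np.linalg.solve(R.conj().T, A.conj().T)   # R* Y = A*  (forward substitution step)
A_pinv = np.linalg.solve(R, Y)                # R X = Y    (back substitution step)

print(np.allclose(R.conj().T @ R, A.conj().T @ A))   # R acts as a Cholesky factor
print(np.allclose(A_pinv, np.linalg.pinv(A)))
```

Note that the diagonal of the computed `R` may contain negative entries, so it can differ from the conventional Cholesky factor by row signs while still satisfying the factorization identity.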

Using polynomials in matrices

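For an arbitrary $A \in \mathbb{K}^{m \times n}$, one has that $A^*A$ is normal and, as a consequence, an EP matrix. One can then find a polynomial $p(t)$ such that $(A^*A)^+ = p(A^*A)$. In this case one has that the pseudoinverse of $A$ is given by[15]

$$A^+ = p(A^*A)A^* = A^*p(AA^*).$$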

Singular value decomposition (SVD)

A computationally simple and accurate way to compute the pseudoinverse is by using the singular value decomposition.[14][5][17] If $A = U\Sigma V^*$ is the singular value decomposition of $A$, then $A^+ = V\Sigma^+U^*$. For a rectangular diagonal matrix such as $\Sigma$, we get the pseudoinverse by taking the reciprocal of each non-zero element on the diagonal, leaving the zeros in place, and then transposing the matrix. In numerical computation, only elements larger than some small tolerance are taken to be nonzero, and the others are replaced by zeros. For example, in the MATLAB or GNU Octave function pinv, the tolerance is taken to be t = ε⋅max(m, n)⋅max(Σ), where ε is the machine epsilon.

The computational cost of this method is dominated by the cost of computing the SVD, which is several times higher than matrix–matrix multiplication, even if a state-of-the-art implementation (such as that of LAPACK) is used.

The above procedure shows why taking the pseudoinverse is not a continuous operation: if the original matrix $A$ has a singular value 0 (a diagonal entry of the matrix $\Sigma$ above), then modifying $A$ slightly may turn this zero into a tiny positive number, thereby affecting the pseudoinverse dramatically as we now have to take the reciprocal of a tiny number.
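
The SVD procedure with a tolerance can be sketched as follows, assuming NumPy. `pinv_svd` is a hypothetical helper name, and its default tolerance follows the machine-epsilon rule quoted above; it is not a reproduction of any particular library's implementation.

```python
import numpy as np

def pinv_svd(A, tol=None):
    """Pseudoinverse via the thin SVD, zeroing singular values at or below tol."""
    A = np.asarray(A)
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    if tol is None:
        # eps * max(m, n) * largest singular value
        tol = np.finfo(s.dtype).eps * max(A.shape) * (s.max() if s.size else 0.0)
    s_inv = np.zeros_like(s)
    keep = s > tol
    s_inv[keep] = 1.0 / s[keep]               # reciprocals of the retained values
    return (Vh.conj().T * s_inv) @ U.conj().T  # V diag(s_inv) U*

# Rank-1 example: A = u v^T with u = (1, 2, 3), v = (1, 2), so A^+ = A^T / 70.
A = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
print(np.allclose(pinv_svd(A), A.T / 70.0))
```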

Block matrices

Optimized approaches exist for calculating the pseudoinverse of block-structured matrices.

The iterative method of Ben-Israel and Cohen

Another method for computing the pseudoinverse (cf. Drazin inverse) uses the recursion

$$A_{i+1} = 2A_i - A_iAA_i,$$

which is sometimes referred to as the hyper-power sequence. This recursion produces a sequence converging quadratically to the pseudoinverse of $A$ if it is started with an appropriate $A_0$ satisfying $A_0A = (A_0A)^*$. The choice $A_0 = \alpha A^*$ (where $0 < \alpha < 2 / \sigma_1^2(A)$, with $\sigma_1(A)$ denoting the largest singular value of $A$)[18] has been argued not to be competitive with the method using the SVD mentioned above, because even for moderately ill-conditioned matrices it takes a long time before $A_i$ enters the region of quadratic convergence.[19] However, if started with $A_0$ already close to the Moore–Penrose inverse and $A_0A = (A_0A)^*$, for example $A_0 := (A^*A + \delta I)^{-1}A^*$, convergence is fast (quadratic).
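
The recursion can be sketched as follows, assuming NumPy. `pinv_hyperpower` is a hypothetical helper name; the starting matrix is the conjugate transpose scaled by the reciprocal of the squared spectral norm, the stopping rule is an assumption of this sketch, and the input is assumed to be nonzero and well conditioned.

```python
import numpy as np

def pinv_hyperpower(A, max_iter=100, tol=1e-12):
    """Iterate X <- 2X - X A X, starting from a scaled conjugate transpose of A."""
    A = np.asarray(A)
    alpha = 1.0 / np.linalg.norm(A, 2) ** 2      # lies in (0, 2 / sigma_1^2)
    X = alpha * A.conj().T
    for _ in range(max_iter):
        X_next = 2 * X - X @ A @ X
        if np.linalg.norm(X_next - X) <= tol * np.linalg.norm(X_next):
            return X_next                        # successive iterates agree
        X = X_next
    return X

A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
print(pinv_hyperpower(A))                        # approx [[1, 0, 0], [0, 0.5, 0.5]]
```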

Updating the pseudoinverse

For the cases where $A$ has full row or column rank, and the inverse of the correlation matrix ($AA^*$ for $A$ with full row rank or $A^*A$ for full column rank) is already known, the pseudoinverse for matrices related to $A$ can be computed by applying the Sherman–Morrison–Woodbury formula to update the inverse of the correlation matrix, which may need less work. In particular, if the related matrix differs from the original one by only a changed, added or deleted row or column, incremental algorithms exist that exploit the relationship.[20][21]

Similarly, it is possible to update the Cholesky factor when a row or column is added, without creating the inverse of the correlation matrix explicitly. However, updating the pseudoinverse in the general rank-deficient case is much more complicated.[22][23]

Software libraries

High-quality implementations of SVD, QR, and back substitution are available in standard libraries, such as LAPACK. Writing one's own implementation of SVD is a major programming project that requires significant numerical expertise. In special circumstances, such as parallel computing or embedded computing, however, alternative implementations by QR or even the use of an explicit inverse might be preferable, and custom implementations may be unavoidable.

The Python package NumPy provides a pseudoinverse calculation through its functions matrix.I and linalg.pinv; its pinv uses the SVD-based algorithm. SciPy adds a function scipy.linalg.pinv; older SciPy releases used a least-squares solver for it, while more recent releases use the SVD.

The MASS package for R provides a calculation of the Moore–Penrose inverse through the ginv function.[24] The ginv function calculates a pseudoinverse using the singular value decomposition provided by the svd function in the base R package. An alternative is to employ the pinv function available in the pracma package.

The Octave programming language provides a pseudoinverse through the standard package function pinv and the pseudo_inverse() method.

In Julia, the LinearAlgebra package of the standard library provides an implementation of the Moore–Penrose inverse, pinv(), implemented via singular-value decomposition.[25]

Applications

Linear least-squares

The pseudoinverse provides a least squares solution to a system of linear equations.[26] For $A \in \mathbb{K}^{m \times n}$, given a system of linear equations

$$Ax = b,$$

in general, a vector $x$ that solves the system may not exist, or if one does exist, it may not be unique. More specifically, a solution exists if and only if $b$ is in the image of $A$, and is unique if and only if $A$ is injective. The pseudoinverse solves the "least-squares" problem as follows:

  • For all $x \in \mathbb{K}^n$, we have $\|Ax - b\|_2 \geq \|Az - b\|_2$ where $z = A^+b$ and $\|\cdot\|_2$ denotes the Euclidean norm. This weak inequality holds with equality if and only if $x = A^+b + (I - A^+A)w$ for some vector $w$; this provides an infinitude of minimizing solutions unless $A$ has full column rank, in which case $I - A^+A$ is a zero matrix.[27] The solution with minimum Euclidean norm is $z$.[27]

This result is easily extended to systems with multiple right-hand sides, when the Euclidean norm is replaced by the Frobenius norm. Let $B \in \mathbb{K}^{m \times p}$.

  • For all $X \in \mathbb{K}^{n \times p}$, we have $\|AX - B\|_{\mathrm{F}} \geq \|AZ - B\|_{\mathrm{F}}$ where $Z = A^+B$ and $\|\cdot\|_{\mathrm{F}}$ denotes the Frobenius norm.
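
As an illustration of the least-squares property, the following sketch (assuming NumPy; the overdetermined system is an arbitrary example) fits a line to three points and compares the pseudoinverse solution with `numpy.linalg.lstsq`.

```python
import numpy as np

# Overdetermined system: fit y = c0 + c1 * t to the points (0, 1), (1, 2), (2, 2).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

z = np.linalg.pinv(A) @ b                         # least-squares solution
z_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]    # reference solver

def residual(x):
    return np.linalg.norm(A @ x - b)

print(z)                                          # approx [1.1667, 0.5]
print(residual(z) <= residual(z + np.array([0.1, -0.1])))
```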

Obtaining all solutions of a linear system

If the linear system

$$Ax = b$$

has any solutions, they are all given by[28]

$$x = A^+b + \left[I - A^+A\right]w$$

for arbitrary vector $w$. Solution(s) exist if and only if $AA^+b = b$.[28] If the latter holds, then the solution is unique if and only if $A$ has full column rank, in which case $I - A^+A$ is a zero matrix. If solutions exist but $A$ does not have full column rank, then we have an indeterminate system, all of whose infinitude of solutions are given by this last equation.
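
The solvability test and the solution family can be sketched as follows, assuming NumPy; the underdetermined system and the helper name `is_consistent` are illustrative assumptions.

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([2.0, 3.0])
A_pinv = np.linalg.pinv(A)

def is_consistent(A, A_pinv, b):
    """Solutions exist exactly when projecting b onto the range of A leaves it unchanged."""
    return bool(np.allclose(A @ A_pinv @ b, b))

N = np.eye(3) - A_pinv @ A              # projector onto the kernel of A
rng = np.random.default_rng(4)
w = rng.standard_normal(3)              # arbitrary vector
x = A_pinv @ b + N @ w                  # one member of the solution family

print(is_consistent(A, A_pinv, b))      # this system is solvable
print(np.allclose(A @ x, b))            # every such x solves it
```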

Minimum norm solution to a linear system

For linear systems $Ax = b$ with non-unique solutions (such as under-determined systems), the pseudoinverse may be used to construct the solution of minimum Euclidean norm $\|x\|_2$ among all solutions.

  • If $Ax = b$ is satisfiable, the vector $z = A^+b$ is a solution, and satisfies $\|z\|_2 \leq \|x\|_2$ for all solutions.

This result is easily extended to systems with multiple right-hand sides, when the Euclidean norm is replaced by the Frobenius norm. Let $B \in \mathbb{K}^{m \times p}$.

  • If $AX = B$ is satisfiable, the matrix $Z = A^+B$ is a solution, and satisfies $\|Z\|_{\mathrm{F}} \leq \|X\|_{\mathrm{F}}$ for all solutions.
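
A short numerical check of the minimum-norm property, assuming NumPy; the system is an arbitrary under-determined example, and the kernel vector used to build a second solution was worked out by hand for this matrix.

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([2.0, 3.0])

z = np.linalg.pinv(A) @ b                  # minimum-norm solution
k = np.array([1.0, -1.0, 1.0])             # spans the kernel of A
x_other = z + 2.0 * k                      # another exact solution

print(np.allclose(A @ z, b), np.allclose(A @ x_other, b))
print(np.linalg.norm(z), np.linalg.norm(x_other))
print(np.isclose(z @ k, 0.0))              # z is orthogonal to the kernel
```

For under-determined systems, `numpy.linalg.lstsq` also returns the minimum-norm solution, so it can serve as a cross-check.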

Condition number

Using the pseudoinverse and a matrix norm, one can define a condition number for any matrix:

$$\operatorname{cond}(A) = \|A\| \, \|A^+\|.$$

A large condition number implies that the problem of finding least-squares solutions to the corresponding system of linear equations is ill-conditioned in the sense that small errors in the entries of $A$ can lead to huge errors in the entries of the solution.[29]
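
With the spectral norm, this definition reduces to the ratio of the largest to the smallest nonzero singular value. The sketch below assumes NumPy and uses an arbitrary rectangular example of full column rank, for which `numpy.linalg.cond` returns the same ratio.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1e-3],
              [0.0, 0.0]])                # singular values 1 and 1e-3

cond = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.pinv(A), 2)
print(cond)                               # approx 1000
print(np.linalg.cond(A))                  # same ratio for this full-rank matrix
```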

Generalizations

In order to solve more general least-squares problems, one can define Moore–Penrose inverses for all continuous linear operators $A : H_1 \to H_2$ between two Hilbert spaces $H_1$ and $H_2$, using the same four conditions as in our definition above. It turns out that not every continuous linear operator has a continuous linear pseudoinverse in this sense.[29] Those that do are precisely the ones whose range is closed in $H_2$.

A notion of pseudoinverse exists for matrices over an arbitrary field equipped with an arbitrary involutive automorphism. In this more general setting, a given matrix doesn't always have a pseudoinverse. The necessary and sufficient condition for a pseudoinverse to exist is that $\operatorname{rank}(A) = \operatorname{rank}(A^*A) = \operatorname{rank}(AA^*)$, where $A^*$ denotes the result of applying the involution operation to the transpose of $A$. When it does exist, it is unique.[30] Example: Consider the field of complex numbers equipped with the identity involution (as opposed to the involution considered elsewhere in the article); do there exist matrices that fail to have pseudoinverses in this sense? Consider the matrix $A = \begin{pmatrix} 1 & i \end{pmatrix}^{\mathsf{T}}$. Observe that $\operatorname{rank}(AA^{\mathsf{T}}) = 1$ while $\operatorname{rank}(A^{\mathsf{T}}A) = 0$. So this matrix doesn't have a pseudoinverse in this sense.
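
The rank condition can be explored numerically for the identity involution on the complex numbers, where the starred matrix is the plain transpose without conjugation. This is a sketch assuming NumPy; the column vector with entries 1 and the imaginary unit is used as the test case, and the explicit `tol` is a choice made for this illustration.

```python
import numpy as np

A = np.array([[1.0 + 0.0j],
              [0.0 + 1.0j]])              # column vector (1, i)

# Identity involution: use the plain transpose, with no complex conjugation.
rank_A = np.linalg.matrix_rank(A, tol=1e-12)
rank_AtA = np.linalg.matrix_rank(A.T @ A, tol=1e-12)   # 1 + i*i = 0
rank_AAt = np.linalg.matrix_rank(A @ A.T, tol=1e-12)

print(rank_A, rank_AtA, rank_AAt)         # the three ranks do not all agree

# With the usual conjugate transpose the ranks agree, as expected.
rank_AhA = np.linalg.matrix_rank(A.conj().T @ A, tol=1e-12)
print(rank_AhA)
```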

In abstract algebra, a Moore–Penrose inverse may be defined on a *-regular semigroup. This abstract definition coincides with the one in linear algebra.

Notes

  1. ^
  2. Moore, E. H. (1920). "On the reciprocal of the general algebraic matrix". Bulletin of the American Mathematical Society. 26 (9): 394–95. doi:10.1090/S0002-9904-1920-03322-7.
  3. Bjerhammar, Arne (1951). "Application of calculus of matrices to method of least squares; with special references to geodetic calculations". Trans. Roy. Inst. Tech. Stockholm. 49.
  4. Penrose, Roger (1955). "A generalized inverse for matrices". Proceedings of the Cambridge Philosophical Society. 51 (3): 406–13. Bibcode:1955PCPS...51..406P. doi:10.1017/S0305004100030401.
  5. Golub, Gene H.; Charles F. Van Loan (1996). Matrix computations (3rd ed.). Baltimore: Johns Hopkins. pp. 257–258. ISBN 978-0-8018-5414-9.
  6. Campbell & Meyer 1991.
  7. Stoer, Josef; Bulirsch, Roland (2002). Introduction to Numerical Analysis (3rd ed.). Berlin, New York: Springer-Verlag. ISBN 978-0-387-95452-3.
  8. Greville, T. N. E. (1966-10-01). "Note on the Generalized Inverse of a Matrix Product". SIAM Review. 8 (4): 518–521. Bibcode:1966SIAMR...8..518G. doi:10.1137/1008107. ISSN 0036-1445.
  9. Maciejewski, Anthony A.; Klein, Charles A. (1985). "Obstacle Avoidance for Kinematically Redundant Manipulators in Dynamically Varying Environments". International Journal of Robotics Research. 4 (3): 109–117. doi:10.1177/027836498500400308. hdl:10217/536. S2CID 17660144.
  10. Rakočević, Vladimir (1997). "On continuity of the Moore–Penrose and Drazin inverses" (PDF). Matematički Vesnik. 49: 163–72.
  11. Golub, G. H.; Pereyra, V. (April 1973). "The Differentiation of Pseudo-Inverses and Nonlinear Least Squares Problems Whose Variables Separate". SIAM Journal on Numerical Analysis. 10 (2): 413–32. Bibcode:1973SJNA...10..413G. doi:10.1137/0710036. JSTOR 2156365.
  12. Hjørungnes, Are (2011). Complex-valued matrix derivatives: with applications in signal processing and communications. New York: Cambridge University Press. p. 52. ISBN 9780521192644.
  13. Liu, Shuangzhe; Trenkler, Götz; Kollo, Tõnu; von Rosen, Dietrich; Baksalary, Oskar Maria (2023). "Professor Heinz Neudecker and matrix differential calculus". Statistical Papers. doi:10.1007/s00362-023-01499-w.
  14. Ben-Israel & Greville 2003.
  15. Bajo, I. (2021). "Computing Moore–Penrose Inverses with Polynomials in Matrices". American Mathematical Monthly. 128 (5): 446–456. doi:10.1080/00029890.2021.1886840. hdl:11093/6146.
  16. Stallings, W. T.; Boullion, T. L. (1972). "The Pseudoinverse of an r-Circulant Matrix". Proceedings of the American Mathematical Society. 34 (2): 385–88. doi:10.2307/2038377. JSTOR 2038377.
  17. Linear Systems & Pseudo-Inverse
  18. Ben-Israel, Adi; Cohen, Dan (1966). "On Iterative Computation of Generalized Inverses and Associated Projections". SIAM Journal on Numerical Analysis. 3 (3): 410–19. Bibcode:1966SJNA....3..410B. doi:10.1137/0703035. JSTOR 2949637.
  19. Söderström, Torsten; Stewart, G. W. (1974). "On the Numerical Properties of an Iterative Method for Computing the Moore–Penrose Generalized Inverse". SIAM Journal on Numerical Analysis. 11 (1): 61–74. Bibcode:1974SJNA...11...61S. doi:10.1137/0711008. JSTOR 2156431.
  20. Gramß, Tino (1992). Worterkennung mit einem künstlichen neuronalen Netzwerk (PhD dissertation). Georg-August-Universität zu Göttingen. OCLC 841706164.
  21. Emtiyaz, Mohammad (February 27, 2008). "Updating Inverse of a Matrix When a Column is Added/Removed" (PDF).
  22. Meyer, Carl D. Jr. (1973). "Generalized inverses and ranks of block matrices". SIAM J. Appl. Math. 25 (4): 597–602. doi:10.1137/0125057.
  23. Meyer, Carl D. Jr. (1973). "Generalized inversion of modified matrices". SIAM J. Appl. Math. 24 (3): 315–23. doi:10.1137/0124033.
  24. "R: Generalized Inverse of a Matrix".
  25. "LinearAlgebra.pinv".
  26. Penrose, Roger (1956). "On best approximate solution of linear matrix equations". Proceedings of the Cambridge Philosophical Society. 52 (1): 17–19. Bibcode:1956PCPS...52...17P. doi:10.1017/S0305004100030929. S2CID 122260851.
  27. Planitz, M. (October 1979). "Inconsistent systems of linear equations". Mathematical Gazette. 63 (425): 181–85. doi:10.2307/3617890. JSTOR 3617890. S2CID 125601192.
  28. James, M. (June 1978). "The generalised inverse". Mathematical Gazette. 62 (420): 109–14. doi:10.1017/S0025557200086460. S2CID 126385532.
  29. Hagen, Roland; Roch, Steffen; Silbermann, Bernd (2001). "Section 2.1.2". C*-algebras and Numerical Analysis. CRC Press.
  30. Pearl, Martin H. (1968-10-01). "Generalized inverses of matrices with entries taken from an arbitrary field". Linear Algebra and Its Applications. 1 (4): 571–587. doi:10.1016/0024-3795(68)90028-1. ISSN 0024-3795.
