This post provides proofs for some properties of the pseudoinverse. It is an appendix to my essay: Least Squares Comparisons of 4 Ways.
These proofs assume you understand the four fundamental subspaces of a matrix.
For any m by n matrix $A$ with rank $r$ ($r \le m$ and $r \le n$), its Singular Value Decomposition (SVD) is:
$$ A=U\Sigma V^T $$
These matrices' shapes are:
- $U$ is m by m; its columns span the column space of $A$ and the nullspace of $A^T$;
- $\Sigma$ is m by n, with the singular values $\sigma _1, ..., \sigma _r$ on the diagonal;
- $V$ is n by n; its columns span the row space and the nullspace of $A$.
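As a quick sanity check of these shapes (a minimal NumPy sketch; the 5 by 3 matrix is just an arbitrary example), we can compute the SVD and rebuild $A$ from it:

```python
import numpy as np

# An arbitrary 5-by-3 example matrix
m, n = 5, 3
A = np.random.rand(m, n)

# Full SVD: U is m-by-m, s holds the singular values, Vt is V^T (n-by-n)
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, Vt.shape)        # (5, 5) (3, 3)

# Rebuild the m-by-n Sigma with the singular values on its diagonal
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

# A = U Sigma V^T (up to floating-point error)
print(np.allclose(A, U @ Sigma @ Vt))   # True
```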
1. The key point of SVD is that we need two sets of singular vectors: the n right singular vectors $v_1, ..., v_n$ are orthogonal in $R^n$, and the m left singular vectors $u_1, ..., u_m$ are perpendicular to each other in $R^m$. The connection between them is through $A$:
$$ \begin{align*} Av_1&=\sigma _1u_1\\ &...\\ Av_r&=\sigma _ru_r\\ Av_{r+1} &= 0\\ &...\\ Av_n&=0 \end{align*} $$
2. We can write the SVD in its full form:
$$ \begin{align*} A&=U\Sigma V^T\\ &=\begin{bmatrix} \vdots & & \vdots \\ u_1 & \cdots & u_m \\ \vdots & & \vdots \end{bmatrix} \begin{bmatrix} \sigma _1 \\ & \ddots \\ & & \sigma _r \\ & & & \ddots \\ & & & & 0 \end{bmatrix} \begin{bmatrix} \cdots & v_1^T & \cdots \\ & \vdots \\ \cdots & v_n^T & \cdots \end{bmatrix} \end{align*} $$
3. Due to the zeros in $\Sigma$, only the first $r$ columns of $U$ and $V$ contribute to the product, so we can use a reduced SVD:
$$ \begin{align*} A&=U_r\Sigma _rV_r^T \\ &=\begin{bmatrix} \vdots & & \vdots \\ u_1 & \cdots & u_r \\ \vdots & & \vdots \end{bmatrix} \begin{bmatrix} \sigma _1 \\ & \ddots \\ & & \sigma _r \end{bmatrix} \begin{bmatrix} \cdots & v_1^T & \cdots \\ & \vdots \\ \cdots & v_r^T & \cdots \end{bmatrix} \end{align*} $$
4. Or we can write it as a sum of $r$ rank-1 pieces:
$$ A=U_r\Sigma _rV_r^T =\sigma _1u_1v_1^T + ... + \sigma _ru_rv_r^T $$
The pseudoinverse of $A$ is $A^+=V\Sigma ^+U^T$. We want to prove that $A^+Ax=x$ when $x$ is in the row space.
Proof:
$$ A^+Ax=V\Sigma ^+U^TU\Sigma V^Tx=V\Sigma ^+\Sigma V^Tx\\ \begin{align*} \Sigma ^+\Sigma &= \begin{bmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \\ & & & \ddots \\ & & & & 0 \end{bmatrix} \\ &= \begin{bmatrix} \mathbf{I_r} & 0 \\ 0 & 0 \end{bmatrix} \end{align*} $$
If $x$ is in the row space, it's a linear combination of the first $r$ columns of $V$:
$$ \begin{align*} x&=\alpha _1v_1 + ... + \alpha _rv_r \\ &=\begin{bmatrix} v_1 & \cdots & v_r \end{bmatrix} \begin{bmatrix} \alpha _1 \\ \vdots \\ \alpha _r \end{bmatrix} \\ &= V \begin{bmatrix} \alpha _1 \\ \vdots \\ \alpha _r \\ \vdots \\ 0 \end{bmatrix} \end{align*} $$
Applying these to the original formula:
$$ \begin{align*} V\Sigma ^+\Sigma V^Tx&=V\Sigma ^+\Sigma V^TV \begin{bmatrix} \alpha _1 \\ \vdots \\ \alpha _r \\ \vdots \\ 0 \end{bmatrix} =V\Sigma ^+\Sigma \begin{bmatrix} \alpha _1 \\ \vdots \\ \alpha _r \\ \vdots \\ 0 \end{bmatrix}\\ &=\begin{bmatrix}v_1 & \cdots & v_r & \cdots & v_n \end{bmatrix} \begin{bmatrix} \mathbf{I_r} & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \alpha _1 \\ \vdots \\ \alpha _r \\ \vdots \\ 0 \end{bmatrix}\\ &=\begin{bmatrix}v_1 & \cdots & v_r \end{bmatrix} \begin{bmatrix} \alpha _1 \\ \vdots \\ \alpha _r \end{bmatrix}\\ &=x \end{align*} $$
So we have proved that $A^+Ax=x$ when $x$ is in the row space; in that case, the pseudoinverse acts like a left inverse of $A$.
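As a numerical illustration of this property (a sketch; the rank-deficient matrix and the vectors below are arbitrary examples, using NumPy's `pinv` for $A^+$): any $x=A^Tw$ lies in the row space and is recovered exactly, while a vector with a nullspace component is not.

```python
import numpy as np

# A 4-by-3 example of rank 2: the third column is the sum of the first two
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.],
              [2., 1., 3.]])
A_plus = np.linalg.pinv(A)

# Any x = A^T w lies in the row space, so A+ A x gives x back
w = np.array([1., -2., 0.5, 3.])
x_row = A.T @ w
print(np.allclose(A_plus @ A @ x_row, x_row))   # True

# A vector with a nullspace component is only projected onto the row space
x_any = np.array([1., 0., 0.])
print(np.allclose(A_plus @ A @ x_any, x_any))   # False
```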
The pseudoinverse also satisfies $A^+z=0$ when $z$ is in the nullspace of $A^T$.
Proof:
$z$ is in the nullspace of $A^T$, so it is perpendicular to the first $r$ columns of $U$ and is a combination of the last $m-r$ columns of $U$:
$$ \begin{align*} z&=U \begin{bmatrix} 0 \\ \vdots \\ \alpha _{r+1} \\ \vdots \\ \alpha _m \end{bmatrix} \\ A^+z&=V\Sigma ^+U^TU \begin{bmatrix} 0 \\ \vdots \\ \alpha _{r+1} \\ \vdots \\ \alpha _m \end{bmatrix} =V\Sigma ^+ \begin{bmatrix} 0 \\ \vdots \\ \alpha _{r+1} \\ \vdots \\ \alpha _m \end{bmatrix} =0 \end{align*} $$
The last step holds because $\Sigma ^+$ has non-zero entries only in its first $r$ columns.
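A matching numerical check (a sketch with the same kind of arbitrary rank-2 example): the last $m-r$ columns of $U$ span the nullspace of $A^T$, and $A^+$ sends any combination of them to zero.

```python
import numpy as np

# A 4-by-3 example of rank 2
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.],
              [2., 1., 3.]])
r = np.linalg.matrix_rank(A)                    # 2
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# An arbitrary combination of the last m - r columns of U
z = U[:, r:] @ np.array([1., -0.5])
print(np.allclose(A.T @ z, 0))                  # True: z is in the nullspace of A^T
print(np.allclose(np.linalg.pinv(A) @ z, 0))    # True: A+ z = 0
```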
For an unsolvable equation $Ax=b$, least squares tries to minimize the error $||b-Ax||$. We want to prove that $x^+=A^+b$ minimizes this error.
$U^T$ is an orthogonal matrix, so multiplying by it doesn't change length:
$$ ||b-Ax||^2 = ||b-U\Sigma V^Tx||^2 = ||U^Tb-\Sigma V^Tx||^2 $$
Set $\omega = V^Tx$ and $z=U^Tb$:
$$ ||b-Ax||^2=||z-\Sigma \omega||^2 =||\begin{bmatrix} z_1 \\ \vdots \\ z_r \\z_{r+1} \\ \vdots \\ z_m \end{bmatrix}- \begin{bmatrix} \sigma _1\omega_1 \\ \vdots \\ \sigma _r\omega_r \\ 0 \\ \vdots \\ 0 \end{bmatrix}||^2 $$
So the minimum is:
$$ \text{Minimum } ||b-Ax||^2 = ||\begin{bmatrix} 0 \\ \vdots \\ z_{r+1} \\ \vdots \\ z_m \end{bmatrix} ||^2 \\ \text{when} \quad \omega = \Sigma ^+U^Tb, \\ \text{so} \quad x = V\Sigma ^+U^Tb = A^+b $$
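We can compare this with NumPy's built-in least-squares solver (a sketch with an arbitrary inconsistent system): $x^+=A^+b$ matches the `lstsq` solution and achieves the minimum error.

```python
import numpy as np

# An overdetermined, inconsistent system Ax = b (arbitrary example)
A = np.array([[1., 1.],
              [1., 2.],
              [1., 3.],
              [1., 4.]])
b = np.array([1., 2., 2., 5.])

x_plus = np.linalg.pinv(A) @ b                    # x+ = A+ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # NumPy's least-squares solution

print(np.allclose(x_plus, x_lstsq))               # True
print(np.linalg.norm(b - A @ x_plus))             # the minimum error ||b - Ax||
```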
If there is a non-zero vector $z$ in the nullspace of $A$, it can be added to $x^+$: the error $||b-A(x^+ + z)||$ stays unchanged because $Az=0$, but the length grows, since $||x^+ + z||^2 = ||x^+||^2 + ||z||^2$. That's because $x^+$ is perpendicular to every vector in the nullspace.
Proof:
$$ z^Tx^+=z^TA^+b=z^TV\Sigma ^+U^Tb $$
Since $z$ is in the nullspace of $A$, it's a linear combination of the last $n-r$ columns of $V$:
$$ \begin{align*} z&=\begin{bmatrix}v_1 & \cdots & v_n \end{bmatrix} \begin{bmatrix} 0 \\ \vdots \\ \alpha _{r+1} \\ \vdots \\ \alpha _n \end{bmatrix} =V \begin{bmatrix} 0 \\ \vdots \\ \alpha _{r+1} \\ \vdots \\ \alpha _n \end{bmatrix} \\ z^T&=\begin{bmatrix} 0 & \cdots & \alpha _{r+1} & \cdots & \alpha _n \end{bmatrix} V^T \\ z^Tx^+&= \begin{bmatrix} 0 & \cdots & \alpha _{r+1} & \cdots & \alpha _n \end{bmatrix} V^T V\Sigma ^+U^Tb \\ &= \begin{bmatrix} 0 & \cdots & \alpha _{r+1} & \cdots & \alpha _n \end{bmatrix} \Sigma ^+U^Tb \\ &=\mathbf{0}^TU^Tb \\ &=0 \end{align*} $$
The product with $\Sigma ^+$ vanishes because rows $r+1$ through $n$ of $\Sigma ^+$ are all zero. $z^Tx^+=0$ shows that $z$ and $x^+$ are perpendicular to each other.
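Finally, a numerical check of the minimum-length claim (a sketch with an arbitrary rank-deficient system): adding a nullspace vector $z$ to $x^+$ leaves the error unchanged but increases the length, and $z^Tx^+=0$.

```python
import numpy as np

# A rank-deficient 3-by-3 system, so the nullspace of A is non-trivial
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.]])
b = np.array([1., 2., 0.])

x_plus = np.linalg.pinv(A) @ b      # the pseudoinverse solution x+

# z spans the nullspace of A (third column = first + second)
z = np.array([1., 1., -1.])
print(np.allclose(A @ z, 0))                      # True
print(np.isclose(z @ x_plus, 0))                  # True: z is perpendicular to x+

# Same error, larger length, once z is added to x+
print(np.linalg.norm(b - A @ x_plus), np.linalg.norm(b - A @ (x_plus + z)))
print(np.linalg.norm(x_plus), np.linalg.norm(x_plus + z))
```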
Written by Songziyu @China Sep. 2023