This post provides proofs for some properties of the pseudoinverse. It is an appendix to my essay: Least Squares Comparisons of 4 Ways.
These proofs assume you understand the four fundamental subspaces of a matrix.
For any m by n matrix $A$ with rank $r$ ($r \le m$ and $r \le n$), its Singular Value Decomposition (SVD) is:
$$ A=U\Sigma V^T $$
These matrices' shapes are:
- $U$ is m by m; its columns span the column space of $A$ and the nullspace of $A^T$;
- $\Sigma$ is m by n, with the singular values $\sigma _1, ..., \sigma _r$ on the diagonal;
- $V$ is n by n; its columns span the row space and the nullspace of $A$.
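As a quick sanity check of these shapes (a minimal NumPy sketch; the 5 by 3 matrix is just an arbitrary example), we can compute the SVD and rebuild $A$ from it:

```python
import numpy as np

# An arbitrary 5-by-3 example matrix
m, n = 5, 3
A = np.random.rand(m, n)

# Full SVD: U is m-by-m, s holds the singular values, Vt is V^T (n-by-n)
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, Vt.shape)        # (5, 5) (3, 3)

# Rebuild the m-by-n Sigma with the singular values on its diagonal
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

# A = U Sigma V^T (up to floating-point error)
print(np.allclose(A, U @ Sigma @ Vt))   # True
```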
1. The key point of SVD is that we need two sets of singular vectors: the n right singular vectors $v_1, ..., v_n$ are orthogonal in $R^n$, and the m left singular vectors $u_1, ..., u_m$ are perpendicular to each other in $R^m$. The connection between them is through $A$:
$$ \begin{align*} Av_1&=\sigma _1u_1\\ &...\\ Av_r&=\sigma _ru_r\\ Av_{r+1} &= 0\\ &...\\ Av_n&=0 \end{align*} $$
2. We can write the SVD in its full form:
$$ \begin{align*} A&=U\Sigma V^T\\ &=\begin{bmatrix} \vdots & & \vdots \\ u_1 & \cdots & u_m \\ \vdots & & \vdots \end{bmatrix} \begin{bmatrix} \sigma _1 \\ & \ddots \\ & & \sigma _r \\ & & & \ddots \\ & & & & 0 \end{bmatrix} \begin{bmatrix} \cdots & v_1^T & \cdots \\ & \vdots \\ \cdots & v_n^T & \cdots \end{bmatrix} \end{align*} $$
3. Due to the zeros in $\Sigma$, only the first $r$ columns of $U$ and $V$ contribute to the product, so we can use a reduced SVD:
$$ \begin{align*} A&=U_r\Sigma _rV_r^T \\ &=\begin{bmatrix} \vdots & & \vdots \\ u_1 & \cdots & u_r \\ \vdots & & \vdots \end{bmatrix} \begin{bmatrix} \sigma _1 \\ & \ddots \\ & & \sigma _r \end{bmatrix} \begin{bmatrix} \cdots & v_1^T & \cdots \\ & \vdots \\ \cdots & v_r^T & \cdots \end{bmatrix} \end{align*} $$
4. Or we can write it as a sum of $r$ rank-1 pieces:
$$ A=U_r\Sigma _rV_r^T =\sigma _1u_1v_1^T + ... + \sigma _ru_rv_r^T $$
The pseudoinverse of $A$ is $A^+=V\Sigma ^+U^T$. We want to prove that $A^+Ax=x$ when $x$ is in the row space.
Proof:
$$ A^+Ax=V\Sigma ^+U^TU\Sigma V^Tx=V\Sigma ^+\Sigma V^Tx\\ \begin{align*} \Sigma ^+\Sigma &= \begin{bmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \\ & & & \ddots \\ & & & & 0 \end{bmatrix} \\ &= \begin{bmatrix} \mathbf{I_r} & 0 \\ 0 & 0 \end{bmatrix} \end{align*} $$
If $x$ is in the row space, it's a linear combination of the first $r$ columns of $V$:
$$ \begin{align*} x&=\alpha _1v_1 + ... + \alpha _rv_r \\ &=\begin{bmatrix} v_1 & \cdots & v_r \end{bmatrix} \begin{bmatrix} \alpha _1 \\ \vdots \\ \alpha _r \end{bmatrix} \\ &= V \begin{bmatrix} \alpha _1 \\ \vdots \\ \alpha _r \\ \vdots \\ 0 \end{bmatrix} \end{align*} $$
Applying these to the original formula:
$$ \begin{align*} V\Sigma ^+\Sigma V^Tx&=V\Sigma ^+\Sigma V^TV \begin{bmatrix} \alpha _1 \\ \vdots \\ \alpha _r \\ \vdots \\ 0 \end{bmatrix} =V\Sigma ^+\Sigma \begin{bmatrix} \alpha _1 \\ \vdots \\ \alpha _r \\ \vdots \\ 0 \end{bmatrix}\\ &=\begin{bmatrix}v_1 & \cdots & v_r & \cdots & v_n \end{bmatrix} \begin{bmatrix} \mathbf{I_r} & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \alpha _1 \\ \vdots \\ \alpha _r \\ \vdots \\ 0 \end{bmatrix}\\ &=\begin{bmatrix}v_1 & \cdots & v_r \end{bmatrix} \begin{bmatrix} \alpha _1 \\ \vdots \\ \alpha _r \end{bmatrix}\\ &=x \end{align*} $$
So we have proved that $A^+Ax=x$ when $x$ is in the row space; in that case, the pseudoinverse acts like a left inverse of $A$.
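As a numerical illustration of this property (a sketch; the rank-deficient matrix and the vectors below are arbitrary examples, using NumPy's `pinv` for $A^+$): any $x=A^Tw$ lies in the row space and is recovered exactly, while a vector with a nullspace component is not.

```python
import numpy as np

# A 4-by-3 example of rank 2: the third column is the sum of the first two
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.],
              [2., 1., 3.]])
A_plus = np.linalg.pinv(A)

# Any x = A^T w lies in the row space, so A+ A x gives x back
w = np.array([1., -2., 0.5, 3.])
x_row = A.T @ w
print(np.allclose(A_plus @ A @ x_row, x_row))   # True

# A vector with a nullspace component is only projected onto the row space
x_any = np.array([1., 0., 0.])
print(np.allclose(A_plus @ A @ x_any, x_any))   # False
```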
The pseudoinverse also satisfies $A^+z=0$ when $z$ is in the nullspace of $A^T$.
Proof:
$z$ is in the nullspace of $A^T$, so it is perpendicular to the first $r$ columns of $U$ and is a combination of the last $m-r$ columns of $U$:
$$ \begin{align*} z&=U \begin{bmatrix} 0 \\ \vdots \\ \alpha _{r+1} \\ \vdots \\ \alpha _m \end{bmatrix} \\ A^+z&=V\Sigma ^+U^TU \begin{bmatrix} 0 \\ \vdots \\ \alpha _{r+1} \\ \vdots \\ \alpha _m \end{bmatrix} =V\Sigma ^+ \begin{bmatrix} 0 \\ \vdots \\ \alpha _{r+1} \\ \vdots \\ \alpha _m \end{bmatrix} =0 \end{align*} $$
The last step holds because $\Sigma ^+$ has non-zero entries only in its first $r$ columns.
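A matching numerical check (a sketch with the same kind of arbitrary rank-2 example): the last $m-r$ columns of $U$ span the nullspace of $A^T$, and $A^+$ sends any combination of them to zero.

```python
import numpy as np

# A 4-by-3 example of rank 2
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.],
              [2., 1., 3.]])
r = np.linalg.matrix_rank(A)                    # 2
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# An arbitrary combination of the last m - r columns of U
z = U[:, r:] @ np.array([1., -0.5])
print(np.allclose(A.T @ z, 0))                  # True: z is in the nullspace of A^T
print(np.allclose(np.linalg.pinv(A) @ z, 0))    # True: A+ z = 0
```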
For an unsolvable equation $Ax=b$, least squares tries to minimize the error $||b-Ax||$. We want to prove that $x^+=A^+b$ minimizes this error.
$U^T$ is an orthogonal matrix, so multiplying by it doesn't change length:
$$ ||b-Ax||^2 = ||b-U\Sigma V^Tx||^2 = ||U^Tb-\Sigma V^Tx||^2 $$
Set $\omega = V^Tx$ and $z=U^Tb$:
$$ ||b-Ax||^2=||z-\Sigma \omega||^2 =||\begin{bmatrix} z_1 \\ \vdots \\ z_r \\z_{r+1} \\ \vdots \\ z_m \end{bmatrix}- \begin{bmatrix} \sigma _1\omega_1 \\ \vdots \\ \sigma _r\omega_r \\ 0 \\ \vdots \\ 0 \end{bmatrix}||^2 $$
So the minimum is:
$$ \text{Minimum } ||b-Ax||^2 = ||\begin{bmatrix} 0 \\ \vdots \\ z_{r+1} \\ \vdots \\ z_m \end{bmatrix} ||^2 \\ \text{when} \quad \omega = \Sigma ^+U^Tb, \\ \text{so} \quad x = V\Sigma ^+U^Tb = A^+b $$
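We can compare this with NumPy's built-in least-squares solver (a sketch with an arbitrary inconsistent system): $x^+=A^+b$ matches the `lstsq` solution and achieves the minimum error.

```python
import numpy as np

# An overdetermined, inconsistent system Ax = b (arbitrary example)
A = np.array([[1., 1.],
              [1., 2.],
              [1., 3.],
              [1., 4.]])
b = np.array([1., 2., 2., 5.])

x_plus = np.linalg.pinv(A) @ b                    # x+ = A+ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # NumPy's least-squares solution

print(np.allclose(x_plus, x_lstsq))               # True
print(np.linalg.norm(b - A @ x_plus))             # the minimum error ||b - Ax||
```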
If there is a non-zero vector $z$ in the nullspace of $A$, it can be added to $x^+$: the error $||b-A(x^+ + z)||$ stays unchanged because $Az=0$, but the length grows, since $||x^+ + z||^2 = ||x^+||^2 + ||z||^2$. That's because $x^+$ is perpendicular to every vector in the nullspace.
Proof:
$$ z^Tx^+=z^TA^+b=z^TV\Sigma ^+U^Tb $$
Since $z$ is in the nullspace of $A$, it's a linear combination of the last $n-r$ columns of $V$:
$$ \begin{align*} z&=\begin{bmatrix}v_1 & \cdots & v_n \end{bmatrix} \begin{bmatrix} 0 \\ \vdots \\ \alpha _{r+1} \\ \vdots \\ \alpha _n \end{bmatrix} =V \begin{bmatrix} 0 \\ \vdots \\ \alpha _{r+1} \\ \vdots \\ \alpha _n \end{bmatrix} \\ z^T&=\begin{bmatrix} 0 & \cdots & \alpha _{r+1} & \cdots & \alpha _n \end{bmatrix} V^T \\ z^Tx^+&= \begin{bmatrix} 0 & \cdots & \alpha _{r+1} & \cdots & \alpha _n \end{bmatrix} V^T V\Sigma ^+U^Tb \\ &= \begin{bmatrix} 0 & \cdots & \alpha _{r+1} & \cdots & \alpha _n \end{bmatrix} \Sigma ^+U^Tb \\ &=\mathbf{0}^TU^Tb \\ &=0 \end{align*} $$
The product with $\Sigma ^+$ vanishes because rows $r+1$ through $n$ of $\Sigma ^+$ are all zero. $z^Tx^+=0$ shows that $z$ and $x^+$ are perpendicular to each other.
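Finally, a numerical check of the minimum-length claim (a sketch with an arbitrary rank-deficient system): adding a nullspace vector $z$ to $x^+$ leaves the error unchanged but increases the length, and $z^Tx^+=0$.

```python
import numpy as np

# A rank-deficient 3-by-3 system, so the nullspace of A is non-trivial
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.]])
b = np.array([1., 2., 0.])

x_plus = np.linalg.pinv(A) @ b      # the pseudoinverse solution x+

# z spans the nullspace of A (third column = first + second)
z = np.array([1., 1., -1.])
print(np.allclose(A @ z, 0))                      # True
print(np.isclose(z @ x_plus, 0))                  # True: z is perpendicular to x+

# Same error, larger length, once z is added to x+
print(np.linalg.norm(b - A @ x_plus), np.linalg.norm(b - A @ (x_plus + z)))
print(np.linalg.norm(x_plus), np.linalg.norm(x_plus + z))
```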
Written by Songziyu @China Sep. 2023