This post derives the Normal Equations from very basic matrix calculus. It is an appendix to my essay: Least Squares Comparisons of 4 Ways
Let $x$ and $y$ be column vectors, and let the real number $z$ be their inner product, namely $z=y^Tx$.
$$ x=\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\x_n \end{bmatrix} \quad y=\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\y_n \end{bmatrix} \\ z = y^Tx = y_1x_1 + y_2x_2 + \cdots + y_nx_n $$
The derivative of $z$ with respect to the vector $x$ is the vector of partial derivatives with respect to the elements of $x$:
$$ \frac{dz}{dx} = \begin{bmatrix} \frac{\partial z}{\partial x_1}\\ \frac{\partial z}{\partial x_2} \\ \vdots \\ \frac{\partial z}{\partial x_n} \\ \end{bmatrix} $$It's easy to get
$$ \frac{\partial z}{\partial x_1} = y_1 \\ \frac{\partial z}{\partial x_2} = y_2 \\ \vdots $$
This gives the first derivative formula for the inner product of vectors:
$$ \mathbf{ \frac{d}{dx}(y^Tx) = y \text{......(1-1)} } $$
Since $x^Ty=y^Tx$, we have the other formula:
$$ \mathbf{ \frac{d}{dx}(x^Ty) = y \text{......(1-2)} } $$
If $x$ is an $n \times 1$ column vector, $A$ is a symmetric $n \times n$ matrix, and $y=x^TAx$, what is $\frac{dy}{dx}$?
The answer is not obvious, so let's work through the case $n=3$ as an example.
$$ \begin{align*} y&=\begin{bmatrix}x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix}a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} &a_{23} \\ a_{31} &a_{32} &a_{33} \end{bmatrix} \begin{bmatrix}x_1 \\ x_2 \\ x_3 \end{bmatrix} \\ &=\begin{bmatrix}x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix}a_{11}x_1 + a_{12}x_2 + a_{13}x_3 \\ a_{21}x_1 + a_{22}x_2 +a_{23}x_3 \\ a_{31}x_1 +a_{32}x_2 +a_{33}x_3 \end{bmatrix} \\ &= a_{11}x_1^2 + a_{12}x_1x_2 + a_{13}x_1x_3 \\ & + a_{21}x_2x_1 + a_{22}x_2^2 +a_{23}x_2x_3 \\ & + a_{31}x_3x_1 +a_{32}x_3x_2 +a_{33}x_3^2 \\ \end{align*} $$
Then we can get the partial derivative with respect to $x_1$:
$$ \frac{\partial y}{\partial x_1} = 2a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + a_{21}x_2 + a_{31}x_3 \\ \text{Since } A \text{ is symmetric, } a_{12} = a_{21} \text{ and } a_{13}=a_{31} \text{, so} \\ \frac{\partial y}{\partial x_1} = 2(a_{11}x_1 + a_{12}x_2 + a_{13}x_3) \\ =2\begin{bmatrix}a_{11} & a_{12} & a_{13} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} $$
Similarly, we can get:
$$ \frac{\partial y}{\partial x_2} =2\begin{bmatrix}a_{21} & a_{22} & a_{23} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \\ \frac{\partial y}{\partial x_3} =2\begin{bmatrix}a_{31} & a_{32} & a_{33} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} $$
Each partial derivative is twice one row of $A$ times $x$. Stacking the three rows, we get the final answer:
$$ \mathbf{ \frac{d}{dx}(x^TAx) = 2Ax \text{......(2)} } $$
For an unsolvable equation $Ax=b$, least squares tries to minimize the error $||b-Ax||$ to get the best possible $\hat{x}$. Minimizing $||b-Ax||$ is the same as minimizing $||b-Ax||^2$, which is easier to differentiate, so we can solve the minimization problem with derivatives:
$$ \begin{align*} ||b-Ax||^2 &= (b-Ax)^T(b-Ax)\\ &= b^Tb-b^TAx-x^TA^Tb+x^TA^TAx \end{align*} $$
$b^Tb$ is a constant, so its derivative with respect to $x$ is zero.
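Before applying formulas (1-1), (1-2) and (2) to the remaining terms, it can be reassuring to check them numerically. The following is a minimal sketch using only the Python standard library. It compares each formula against a central-difference approximation of the gradient. The helper names (`dot`, `matvec`, `numerical_gradient`) and the sample values of `x`, `y` and `A` are illustrative assumptions, not part of the derivation.

```python
# Finite-difference sanity check of formulas (1-1) and (2).
# All helper names and sample values below are illustrative assumptions.

def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(A, x):
    """Matrix-vector product, with A given as a list of rows."""
    return [dot(row, x) for row in A]

def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad

x = [1.0, -2.0, 0.5]
y = [3.0, 4.0, -1.0]
A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, -1.0],
     [0.0, -1.0, 4.0]]  # symmetric, as formula (2) requires

# Formula (1-1): d/dx (y^T x) should equal y.
grad_inner = numerical_gradient(lambda v: dot(y, v), x)

# Formula (2): d/dx (x^T A x) should equal 2 A x.
grad_quad = numerical_gradient(lambda v: dot(v, matvec(A, v)), x)
expected_quad = [2 * t for t in matvec(A, x)]

print("numerical d/dx (y^T x):  ", grad_inner)
print("formula (1-1), y:        ", y)
print("numerical d/dx (x^T A x):", grad_quad)
print("formula (2), 2Ax:        ", expected_quad)
```

The numerical and analytical gradients should agree up to floating-point rounding. Replacing `A` with a non-symmetric matrix makes the second pair disagree, which shows where the symmetry assumption is used.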
Applying formula (1-1) to $b^TAx$ (with $y^T=b^TA$) and formula (1-2) to $x^TA^Tb$ (with $y=A^Tb$), we get
$$ \frac{d}{dx}(b^TAx) = (b^TA)^T = A^Tb \\ \frac{d}{dx}(x^TA^Tb) = A^Tb $$
$A^TA$ is a symmetric matrix, so applying formula (2) to $x^TA^TAx$ gives
$$ \frac{d}{dx}(x^TA^TAx) = 2A^TAx $$
Putting these derivatives together, we have:
$$ \frac{d}{dx}(||b-Ax||^2) = -2A^Tb + 2A^TAx $$
Setting this derivative to zero, we end up with the so-called Normal Equations:
$$ \mathbf{ A^TA\hat{x}=A^Tb \text{......(3)} } $$

Written by Songziyu @China Sep. 2023
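
As a small worked example of formula (3), the sketch below fits a line $c + dt$ through three points by forming $A^TA$ and $A^Tb$ and solving the resulting $2 \times 2$ system. The data points, the helper names and the use of Cramer's rule are illustrative assumptions. For larger or ill-conditioned problems a dedicated linear algebra routine would normally be preferred.

```python
# Solve the Normal Equations A^T A x_hat = A^T b for a tiny example.
# The data and helper names are illustrative assumptions.

def transpose(M):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    """Matrix-matrix product of P and Q (lists of rows)."""
    Qt = transpose(Q)
    return [[sum(p * q for p, q in zip(row, col)) for col in Qt] for row in P]

def matvec(M, v):
    """Matrix-vector product."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

# Fit b ~ c + d*t at t = 1, 2, 3; each row of A is [1, t].
A = [[1.0, 1.0],
     [1.0, 2.0],
     [1.0, 3.0]]
b = [1.0, 2.0, 2.0]

At = transpose(A)
AtA = matmul(At, A)   # 2x2 symmetric matrix
Atb = matvec(At, b)   # right-hand side of formula (3)

# Solve the 2x2 system with Cramer's rule (fine for this size only).
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
x_hat = [
    (Atb[0] * AtA[1][1] - AtA[0][1] * Atb[1]) / det,
    (AtA[0][0] * Atb[1] - AtA[1][0] * Atb[0]) / det,
]

# The gradient -2 A^T b + 2 A^T A x should vanish at x_hat.
gradient = [2 * g - 2 * c for g, c in zip(matvec(AtA, x_hat), Atb)]

print("x_hat:", x_hat)
print("gradient at x_hat:", gradient)
```

The printed gradient should be zero up to rounding, confirming that the solution of the Normal Equations is a stationary point of $||b-Ax||^2$.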