Department of Mathematics
The Chinese University of Hong Kong
HSMMC Math Modelling Advanced Training Workshop
May 16, 2026
The least-squares method is the most intuitive way to obtain the best fit curve representing the data patterns.
Let’s find out how we can get the best possible solution using the method of least squares (curve fitting).
Consider 4 points in the \(xu\)-plane (2-dimensional data): \[ (0,1),\ (1,3),\ (2,4),\ (3,4) \]
We can see the left graph with \(x\) and \(u\) in the below figure. We can make the relationship between \(x\) and \(u\) by drawing a straight line. It is quite intuitive.
The least-squares method can be used to find out a straight line that best represents the given data. The straight line obtained is called the least-squares line or the best fit line.
import numpy as np
import matplotlib.pyplot as plt
# Define the original data points (exact matches LaTeX coordinates)
x = [0, 1, 2, 3]
u = [1, 3, 4, 4]
# Generate x values for plotting the least-squares line
x_fit = np.linspace(-0.5, 3.5, 100)
# Define the least-squares line equation: u = x + 1.5
u_fit = x_fit + 1.5
# Create a figure with size matching LaTeX plot (8cm x 6cm)
plt.figure(figsize=(8, 6))
# Plot red solid data points
plt.scatter(x, u, color='red', s=60, zorder=3)
# Plot the blue thick least-squares fitting line
plt.plot(x_fit, u_fit, color='blue', linewidth=2)
# Draw centered x and y axes (matching TikZ axis style)
plt.axhline(0, color='black', linewidth=1)
plt.axvline(0, color='black', linewidth=1)
# Set axis labels with math notation
plt.xlabel(r'$x$', fontsize=12)
plt.ylabel(r'$u$', fontsize=12)
# Set axis limits exactly as in LaTeX
plt.xlim(-0.5, 3.5)## (-0.5, 3.5)
## (0.0, 5.0)
## (array([-0.5, 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5]), [Text(-0.5, 0, '−0.5'), Text(0.0, 0, '0.0'), Text(0.5, 0, '0.5'), Text(1.0, 0, '1.0'), Text(1.5, 0, '1.5'), Text(2.0, 0, '2.0'), Text(2.5, 0, '2.5'), Text(3.0, 0, '3.0'), Text(3.5, 0, '3.5')])
## (array([0., 1., 2., 3., 4., 5.]), [Text(0, 0.0, '0'), Text(0, 1.0, '1'), Text(0, 2.0, '2'), Text(0, 3.0, '3'), Text(0, 4.0, '4'), Text(0, 5.0, '5')])
# Set figure title (matches LaTeX caption)
plt.title("Figure 1: Data points and least-squares line", fontsize=12)
# Adjust layout for clean display
plt.tight_layout()
# Show the final plot
plt.show()Let’s consider the simplest case of solving the best possible straight line \(u = a + bx\) to describe the given data \((x_i, u_i)\). It is finding the linear function that best fits \((x_i, u_i)\). The ideal situation is to find the \(u\)-intercept \(a\) and slope \(b\) that satisfies \(u_i = a + bx_i\) for all data \((x_i, u_i)\).
We can find out the solution using a system of linear equations in matrix form with two unknowns \(a\) and \(b\) as following.
Formulating the least squares problem
\[ \begin{array}{cccccccccccc} \hline \textrm{Data} ~(x, u) & \textrm{Linear function} u = a + bx & \textrm{System of linear equations} & \textrm{Matrix Form} \\ \hline (0, 1) & 1 = a + b \cdot 0 & & \\ (1, 3) & 3 = a + b \cdot 1 & \begin{cases} a = 1 \\ a + b = 3 \\ a + 2b = 4 \\ a + 3b = 4 \end{cases} & A\mathbf{u} = \mathbf{y} \\ (2, 4) & 4 = a + b \cdot 2 & & \textrm{where} ~A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}, ~\mathbf{u} = \begin{bmatrix} a \\ b \end{bmatrix}, ~\mathbf{y} = \begin{bmatrix} 1 \\ 3 \\ 4 \\ 4 \end{bmatrix} \\ (3, 4) & 4 = a + b \cdot 3 & & \\ \hline \end{array} \]
When we have only two data points, then it will be easy to find \(a\) and \(b\) for \(u = a + bx\). If we have more than two data points, this modeling requires us to use a linear system of equations. We usually use a larger number of data to find the best possible curve fitting.
As a result, we end up with a larger number of equations than the number of unknowns. In this case, we do not expect \(A\mathbf{u} = \mathbf{y}\) to have a (unique) solution \(\mathbf{u}\).
So, we try to find an approximate \(\hat{\mathbf{u}}\) that minimizes the distance between \(A\mathbf{u}\) and \(\mathbf{y}\), \[ \min_{\mathbf{u}} \| A\mathbf{u} - \mathbf{y} \| \]
We will try to obtain a straight line with the least error even though it does not pass all four points. Mathematically, it means \(\| A\mathbf{u} - \mathbf{y} \| = 0\).
This problem is called least squares problem and \(\hat{\mathbf{u}}\) is called the optimal solution (or the least square solution). Although \(\hat{\mathbf{u}} \approx \mathbf{y}\), even though \(\hat{\mathbf{u}}\) might not satisfy \(A\mathbf{u} = \mathbf{y}\).
Let \(\hat{y}_i\) be the value obtained by inputting \(x_i\) into \(y = a + bx\) from each data point \((x_i, y_i)\). There exists an error for some \(i\) when \(y_i\) and \(\hat{y}_i\) are not the same.
If \(y_i\) and \(\hat{y}_i\) are the same for all \(i\), then the line \(y = a + bx\) has a unique solution. Since there are cases where \(y_i\) and \(\hat{y}_i\) are not the same, in other words, \((y_i - \hat{y}_i)^2\) is not all zero, we will try to find \(a\) and \(b\) that minimize the error. Adding all the squared errors for all the given data gives the following error function \(E(\mathbf{u})\).
\[ E(\mathbf{u}) = E(a,b) = (a-1)^2 + (a+b-3)^2 + (a+2b-4)^2 + (a+3b-4)^2 = \|A\mathbf{u} - \mathbf{y}\|^2 \] \[ \min E(\mathbf{u}) \]
The error \(E(\mathbf{u})\) is eventually equal to the square of the distance between \(A\mathbf{u}\) and \(\mathbf{y}\). It is easy to see how a norm and an inner product are related to errors.
Solving the least-squares problem is solving a problem of finding \(\hat{\mathbf{u}}\) that minimizes the error function \(\min E(\mathbf{u})\). The optimal solution to this problem is the least-squares solution \(\hat{\mathbf{u}}\). To find the least-squares solution, we need the concept of projection.
\[ \begin{array}{cccccccccccc} \hline \textbf{Component} & \textbf{Definition} \\ \hline \textrm{Data} & (0,1),\ (1,3),\ (2,4),\ (3,4) \\ \textrm{Model} & y = a + bx \\ \textrm{Predicted value} & \hat{y}_i = a + bx_i \\ \textrm{Error term} & (y_i - \hat{y}_i)^2 \\ \textrm{Error function} & E(a,b) = \sum_{i=1}^4 (y_i - (a+bx_i))^2 \\ \textrm{Matrix form} & E(\mathbf{u}) = \|A\mathbf{u} - \mathbf{y}\|^2 , \textrm{where}~ A = \begin{bmatrix}1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3\end{bmatrix} , \mathbf{u} = \begin{bmatrix}a \\ b\end{bmatrix} , \mathbf{y} = \begin{bmatrix}1 \\ 3 \\ 4 \\ 4\end{bmatrix} \\ \textrm{Least-squares problem} & \displaystyle \min_{\mathbf{u}} E(\mathbf{u}) \\ \textrm{Optimal solution} & \textrm{Least-squares solution} ~\hat{\mathbf{u}} \\ \hline \end{array} \]
To understand the least-squares problem, we need to know about a projection. Let’s consider the problem of finding \(t\) satisfying the following: \[ \min \| t\mathbf{a} - \mathbf{x} \| \] where \(t\) is a real number.
In other words, this is the problem of finding \(t\) that minimizes the distance between the vector \(\mathbf{x}\) and the straight line containing \(\mathbf{a}\), where \(t\) is the scalar.
import matplotlib.pyplot as plt
import numpy as np
# ==============================================
# Set up the figure and axis
# ==============================================
fig, ax = plt.subplots(figsize=(8, 5))
# Hide all axis lines, ticks, and labels to match the clean diagram style
ax.set_xlim(-0.2, 3.2)## (-0.2, 3.2)
## (-0.2, 2.2)
## []
## []
for spine in ax.spines.values():
spine.set_visible(False)
# ==============================================
# Define key coordinates matching the diagram
# ==============================================
start_ta = (0, 0) # Start point of the ta segment
proj_point = (1.5, 0) # Foot of the perpendicular (end of ta)
top_point = (1.5, 2) # Top vertex of the triangle
end_a = (3, 0) # End point of the full vector a line
# ==============================================
# Draw the horizontal line (vector a) segments
# ==============================================
# 1. Black segment: ta (with arrow)
# FIX: Replaced incompatible quiver() with annotate() for cross-version compatibility
ax.annotate(
'', # Empty text (we only want the arrow)
xy=proj_point, # Arrow end point
xytext=start_ta, # Arrow start point
arrowprops=dict(
arrowstyle='->', # Simple arrow head
color='black',
linewidth=2
)
)
# 2. Blue segment: remaining part of vector a
ax.plot([proj_point[0], end_a[0]], [proj_point[1], end_a[1]],
color='blue', linewidth=2)
# ==============================================
# Draw vector x (purple line)
# ==============================================
ax.plot([start_ta[0], top_point[0]], [start_ta[1], top_point[1]],
color='purple', linewidth=2)
# ==============================================
# Draw the perpendicular dashed line (minimum distance)
# ==============================================
ax.plot([proj_point[0], top_point[0]], [proj_point[1], top_point[1]],
color='gray', linestyle='--', linewidth=1.5)
# Add right-angle symbol at the foot of the perpendicular
right_angle_size = 0.1
ax.plot([proj_point[0], proj_point[0]+right_angle_size],
[proj_point[1], proj_point[1]], color='gray', linewidth=1)
ax.plot([proj_point[0]+right_angle_size, proj_point[0]+right_angle_size],
[proj_point[1], proj_point[1]+right_angle_size], color='gray', linewidth=1)
# ==============================================
# Draw the faint gray lines for other t values
# (matches the multiple lines from the top point in your diagram)
# ==============================================
other_t_points = [0.5, 1.0, 2.0, 2.5] # Other points on the a line
for t_x in other_t_points:
ax.plot([t_x, top_point[0]], [0, top_point[1]],
color='gray', alpha=0.4, linewidth=1)
# ==============================================
# Add labels matching the diagram
# ==============================================
# Label for vector x (purple)
ax.text(0.4, 1.0, r'$x$', fontsize=18, color='purple')
# Label for ta (below the black segment)
ax.text(0.6, -0.15, r'$ta$', fontsize=16, color='black')
# Label for vector a (at the end of the blue segment)
ax.text(2.6, -0.15, r'$a$', fontsize=16, color='blue')
# Label for the distance ||ta - x||
ax.text(1.8, 1.0, r'$\| ta - x \|$', fontsize=16, color='black')
# --------------------------
# FINAL FIGURE SETTINGS
# --------------------------
# Add overall caption (matches LaTeX figure title)
fig.suptitle('Figure 2a: Projection of a vector onto a line', fontsize=12)
plt.tight_layout() # Adjust layout for clean display
plt.show() # Display the plotimport matplotlib.pyplot as plt
import numpy as np
# ==============================================
# Initialize figure and clean up axis
# ==============================================
fig, ax = plt.subplots(figsize=(8, 5))
# Hide all axis lines/ticks to match the clean diagram style
ax.set_xlim(-0.2, 3.2)## (-0.2, 3.2)
## (-0.2, 2.2)
## []
## []
for spine in ax.spines.values():
spine.set_visible(False)
# ==============================================
# Define key coordinates matching your diagram
# ==============================================
left_p = (0, 0) # Left end of the p segment
proj_point = (1.5, 0) # Right end of p / foot of the perpendicular
top_point = (1.5, 2) # Top vertex of the triangle
end_a = (3, 0) # Right end of the full a vector
# ==============================================
# Draw horizontal line segments
# ==============================================
# 1. Red segment: p (base of the projection)
ax.plot([left_p[0], proj_point[0]], [left_p[1], proj_point[1]],
color='red', linewidth=3)
# 2. Blue segment: a (remaining part of the vector line, with arrow)
ax.annotate(
'',
xy=end_a, xycoords='data',
xytext=proj_point, textcoords='data',
arrowprops=dict(arrowstyle='->', color='blue', linewidth=2)
)
# 3. Black arrow for ta (on top of the p segment)
ax.annotate(
'',
xy=proj_point, xycoords='data',
xytext=left_p, textcoords='data',
arrowprops=dict(arrowstyle='->', color='black', linewidth=2)
)
# ==============================================
# Draw vector x (purple line)
# ==============================================
ax.plot([left_p[0], top_point[0]], [left_p[1], top_point[1]],
color='purple', linewidth=2)
# ==============================================
# Draw perpendicular dashed line + right-angle symbol
# ==============================================
# Dashed perpendicular line (minimum distance)
ax.plot([proj_point[0], top_point[0]], [proj_point[1], top_point[1]],
color='gray', linestyle='--', linewidth=1.5)
# Small right-angle marker at the foot of the perpendicular
right_angle_size = 0.1
ax.plot([proj_point[0], proj_point[0]+right_angle_size],
[proj_point[1], proj_point[1]], color='gray', linewidth=1)
ax.plot([proj_point[0]+right_angle_size, proj_point[0]+right_angle_size],
[proj_point[1], proj_point[1]+right_angle_size], color='gray', linewidth=1)
# ==============================================
# Draw faint gray lines for other t values
# (matches the multiple lines from the top point in your diagram)
# ==============================================
other_t_points = [0.8, 1.2, 2.2, 2.6] # Other points on the horizontal line
for t_x in other_t_points:
ax.plot([t_x, top_point[0]], [0, top_point[1]],
color='gray', alpha=0.4, linewidth=1)
# ==============================================
# Add labels matching your diagram (with matching colors)
# ==============================================
# Label for vector x (purple)
ax.text(0.3, 1.0, r'$x$', fontsize=18, color='purple')
# Label for ta (black, above the red segment)
ax.text(0.6, 0.1, r'$ta$', fontsize=16, color='black')
# Label for p (red, below the red segment)
ax.text(0.6, -0.15, r'$p$', fontsize=16, color='red')
# Label for vector a (blue, below the blue segment)
ax.text(2.1, -0.15, r'$a$', fontsize=16, color='blue')
# Label for the distance ||ta - x||
ax.text(1.8, 1.0, r'$\| ta - x \|$', fontsize=16, color='black')
# --------------------------
# FINAL FIGURE SETTINGS
# --------------------------
# Add overall caption (matches LaTeX figure title)
fig.suptitle('Figure 2b: Projection of a vector onto a line', fontsize=12)
plt.tight_layout() # Adjust layout for clean display
plt.show() # Display the plotWe can see that \(\| t\mathbf{a} - \mathbf{x} \|\) represents the distance between \(t\mathbf{a}\) and \(\mathbf{x}\) as shown in Figure 2. Intuitively, the shortest distance can be obtained when \(\mathbf{p} = t\mathbf{a}\) and \((\mathbf{x} - t\mathbf{a}) \perp \mathbf{a}\). Such \(t\) is a solution for \(\min \| t\mathbf{a} - \mathbf{x} \|\), and the vector \(\mathbf{p} = t\mathbf{a}\) is called the projection of \(\mathbf{x}\) onto \(\mathbf{a}\).
Since \((\mathbf{x} - t\mathbf{a}) \perp \mathbf{a}\), this \(t\) can be obtained as follows: \[ \mathbf{a} \cdot (\mathbf{x} - t\mathbf{a}) = 0 \implies t(\mathbf{a} \cdot \mathbf{a}) = \mathbf{a} \cdot \mathbf{x} \implies t = \frac{\mathbf{a} \cdot \mathbf{x}}{\mathbf{a} \cdot \mathbf{a}} = (\mathbf{a}^T\mathbf{a})^{-1}\mathbf{a}^T\mathbf{x} \]
Find the projection of \(\mathbf{y}\) onto \(\mathbf{x}\), for \(\mathbf{x} = (2, -1, 3)\), \(\mathbf{y} = (4, -1, 2)\).
Solution
# Import NumPy library for numerical computations and vector operations
import numpy as np
# Define the original vectors x and y (given in the example)
x = np.array([2, -1, 3]) # Vector x = (2, -1, 3)
y = np.array([4, -1, 2]) # Vector y = (4, -1, 2)
# Calculate the inner (dot) product of y and x: y · x
yx = np.inner(y, x)
# Calculate the inner (dot) product of x with itself: x · x
xx = np.inner(x, x)
# Compute the projection vector p of y onto x
# Formula: p = ( (y·x)/(x·x) ) * x
p = (yx / xx) * x
# Print the final projection result
print("Projection vector p of y onto x =", p)## Projection vector p of y onto x = [ 2.14285714 -1.07142857 3.21428571]
We can solve the least-squares problem \(\min \| A\mathbf{u} - \mathbf{y} \|\) similar to the problem of determining \(t\) that minimizes the difference between the projection (of \(\mathbf{x}\) onto \(\mathbf{a}\)) and the vector \(\mathbf{x}\). For this purpose, let \(A_1, A_2\) be the first and second column vector of \(A\), respectively. Then, we have \(A\mathbf{u} = aA_1 + bA_2\). \[ \min \| A\mathbf{u} - \mathbf{y} \| = \min \| (aA_1 + bA_2) - \mathbf{y} \| \]
Hence, the least-squares problem is related to the column space of column vectors of \(A\). The least-square problem can be interpreted as a problem of finding a projection.
The plane \(A\mathbf{u} = aA_1 + bA_2\) is a set of images given by \(A\mathbf{x}\). As shown in the figure, it can be understood as the problem of obtaining \(a\) and \(b\) that gives the minimal distance between the plane containing \(A_1\) and \(A_2\) and the vector \(\mathbf{y}\).
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
# --------------------------
# Set up the 3D plot
# --------------------------
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
# Set viewing angle to match the diagram perspective
ax.view_init(elev=25, azim=45)
# Hide axis ticks and labels (clean diagram look)
ax.set_xticks([])## []
## []
## []
ax.set_axis_off()
# --------------------------
# Draw the plane (column space)
# --------------------------
# Define plane corner coordinates (z=0 plane)
plane_corners = np.array([
[-1, -1, 0],
[ 2, -1, 0],
[ 2, 1, 0],
[-1, 1, 0]
])
# Plot semi-transparent gray plane
ax.plot_trisurf(plane_corners[:,0], plane_corners[:,1], plane_corners[:,2],
color='gray', alpha=0.3)
# --------------------------
# Define and draw vectors
# --------------------------
# Vectors start at the origin (0,0,0)
origin = np.array([0, 0, 0])
# Vector A1 (blue, basis vector 1)
A1 = np.array([1, -0.5, 0])
ax.quiver(*origin, *A1, color='blue', linewidth=2, arrow_length_ratio=0.1)
ax.text(A1[0], A1[1], A1[2], r'$A_1$', fontsize=16, color='blue')
# Vector A2 (green, basis vector 2)
A2 = np.array([0.5, 1, 0])
ax.quiver(*origin, *A2, color='green', linewidth=2, arrow_length_ratio=0.1)
ax.text(A2[0], A2[1], A2[2], r'$A_2$', fontsize=16, color='green')
# Projection vector p (red, orthogonal projection of y onto the plane)
p = np.array([0.75, 0.25, 0])
ax.quiver(*origin, *p, color='red', linewidth=2, arrow_length_ratio=0.1)
ax.text(p[0]+0.1, p[1], p[2], r'$p$', fontsize=16, color='red')
# Linear combination vector aA1 + bA2 (black, arbitrary point in plane)
comb = 0.5*A1 + 0.5*A2
ax.quiver(*origin, *comb, color='black', linewidth=2, arrow_length_ratio=0.1)
ax.text(comb[0]+0.1, comb[1], comb[2], r'$aA_1 + bA_2$', fontsize=14, color='black')
# Target vector y (purple, off-plane vector being projected)
y = np.array([0.75, 0.25, 1.5])
ax.quiver(*origin, *y, color='purple', linewidth=2, arrow_length_ratio=0.1)
ax.text(y[0], y[1], y[2]+0.1, r'$y$', fontsize=16, color='purple')
# --------------------------
# Draw the perpendicular dashed line (error vector y-p)
# --------------------------
error_vector = y - p
ax.plot([y[0], p[0]], [y[1], p[1]], [y[2], p[2]],
color='black', linestyle='--', linewidth=1)
# --------------------------
# Adjust plot limits and show
# --------------------------
ax.set_xlim(-1, 2)## (-1.0, 2.0)
## (-1.0, 1.0)
## (-0.1, 1.6)
# ==============================================
# Add title and show plot
# ==============================================
plt.title("Figure 3: Projection of vector $\mathbf{y}$ onto the column space of $A$", fontsize=12)
plt.tight_layout()
plt.show()As shown in Figure 3, \(\| (aA_1 + bA_2) - \mathbf{y} \|\) is the length of the marked real line for the values of \(a\) and \(b\). It is easy to see that the vector with the shortest distance from \(\mathbf{y}\) to \(aA_1 + bA_2\) can be obtained by \(\mathbf{p} = aA_1 + bA_2\) when \((A\mathbf{u} - \mathbf{y}) \perp A_1\) and \((A\mathbf{u} - \mathbf{y}) \perp A_2\).
Such \(a\) and \(b\) give the solution for \(\min \| A\mathbf{u} - \mathbf{y} \| = \min \| (aA_1 + bA_2) - \mathbf{y} \|\). The conditions \((A\mathbf{u} - \mathbf{y}) \perp A_1\) and \((A\mathbf{u} - \mathbf{y}) \perp A_2\) imply that \(A^T(A\mathbf{u} - \mathbf{y}) = \mathbf{0}\) and give \(\hat{\mathbf{u}} = (A^TA)^{-1}A^T\mathbf{y}\). \[ \begin{cases} A_1 \cdot (A\mathbf{u} - \mathbf{y}) = 0 \\ A_2 \cdot (A\mathbf{u} - \mathbf{y}) = 0 \end{cases} \iff \begin{bmatrix} A_1 \cdot (A\mathbf{u} - \mathbf{y}) \\ A_2 \cdot (A\mathbf{u} - \mathbf{y}) \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \] \[ \iff \begin{bmatrix} A_1^T \\ A_2^T \end{bmatrix}(A\mathbf{u} - \mathbf{y}) = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \iff A^T(A\mathbf{u} - \mathbf{y}) = \mathbf{0} \] \[ \iff A^TA\mathbf{u} = A^T\mathbf{y} \iff \hat{\mathbf{u}} = (A^TA)^{-1}A^T\mathbf{y} \]
Note
If \(x_i \neq x_j\) for a data set \(\{(x_i, y_i)\}\), then \(A^TA\) is always invertible (Vandermonde determinant) and \(\hat{\mathbf{u}} = (A^TA)^{-1}A^T\mathbf{y}\).
Summary of Projection and Least-Squares Concepts
\[ \begin{array}{cccccccccccc} \hline \textbf{Concept} & \textbf{Definition/Formula} \\ \hline \textrm{Line projection problem} & \min \| t\mathbf{a} - \mathbf{x} \| \\ \textrm{Optimal scalar}~ t & t = \frac{\mathbf{a} \cdot \mathbf{x}}{\mathbf{a} \cdot \mathbf{a}} = (\mathbf{a}^T\mathbf{a})^{-1}\mathbf{a}^T\mathbf{x} \\ \textrm{Least-squares problem} & \min \| A\mathbf{u} - \mathbf{y} \| = \min \| aA_1 + bA_2 - \mathbf{y} \| \\ \textrm{Orthogonality conditions} & (A\mathbf{u} - \mathbf{y}) \perp A_1,\ (A\mathbf{u} - \mathbf{y}) \perp A_2 \\ \textrm{Normal equations} & A^T(A\mathbf{u} - \mathbf{y}) = \mathbf{0} \implies A^TA\mathbf{u} = A^T\mathbf{y} \\ \textrm{Least-squares solution} & \hat{\mathbf{u}} = (A^TA)^{-1}A^T\mathbf{y} \\ \hline \end{array} \]
Find a least-squares solution of \(A\mathbf{u} = \mathbf{y}\) from the given data. \[ A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}, \quad \mathbf{u} = \begin{bmatrix} a \\ b \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} 1 \\ 3 \\ 4 \\ 4 \end{bmatrix} \]
Solution
The least-squares solution of \(A\mathbf{u} = \mathbf{y}\) is given by \[ \hat{\mathbf{u}} = (A^T A)^{-1} A^T \mathbf{y} \] which minimizes \(\|A\mathbf{u} - \mathbf{y}\|\).
Calculating: \[ \hat{\mathbf{u}} = \begin{bmatrix} a \\ b \end{bmatrix} = (A^T A)^{-1} A^T \mathbf{y} = \begin{bmatrix} \frac{7}{10} & -\frac{3}{10} \\ -\frac{3}{10} & \frac{2}{10} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ 3 \\ 4 \\ 4 \end{bmatrix} = \begin{bmatrix} \frac{3}{2} \\ 1 \end{bmatrix} = \begin{bmatrix} 1.5 \\ 1 \end{bmatrix} \]
# Import necessary libraries: NumPy for calculations, Matplotlib for plotting
import numpy as np
import matplotlib.pyplot as plt
# --------------------------
# Step 1: Define given matrix A and vector y
# --------------------------
# Design matrix A from the linear regression problem
A = np.array([[1, 0], [1, 1], [1, 2], [1, 3]])
# Target output vector y
y = np.array([1, 3, 4, 4])
# --------------------------
# Step 2: Compute the least-squares solution
# Formula: û = (AᵀA)⁻¹ Aᵀ y
# --------------------------
# Compute A transpose multiplied by A
ATA = A.T @ A
# Compute inverse of ATA
ATA_inv = np.linalg.inv(ATA)
# Compute A transpose multiplied by y
ATy = A.T @ y
# Calculate the least-squares solution vector û (contains a and b)
u_hat = ATA_inv @ ATy
# Extract intercept (a) and slope (b) from the solution
intercept = u_hat[0]
slope = u_hat[1]
# --------------------------
# Step 3: Print the result
# --------------------------
print("Least-squares solution u* =", u_hat)## Least-squares solution u* = [1.5 1. ]
## Fitted line equation: y = 1.5 + 1.0x
# --------------------------
# Step 4: Plot the data and the fitted line (matches TikZ figure)
# --------------------------
# Extract x values from the first column of data
x_data = np.array([0, 1, 2, 3])
y_data = y
# Create x values for plotting the fitted line
x_line = np.linspace(-0.5, 4, 100)
y_line = intercept + slope * x_line
# Create figure with matching size
plt.figure(figsize=(8, 6))
# Plot red data points (exact coordinates)
plt.scatter(x_data, y_data, color='red', s=60, zorder=5)
# Plot blue thick fitted line
plt.plot(x_line, y_line, color='blue', linewidth=2)
# Draw centered axes
plt.axhline(0, color='black', linewidth=1)
plt.axvline(0, color='black', linewidth=1)
# Set labels and axis limits (exactly like LaTeX)
plt.xlabel(r'$x$', fontsize=12)
plt.ylabel(r'$y$', fontsize=12)
plt.xlim(-0.5, 4)## (-0.5, 4.0)
## (0.0, 5.0)
# Add title (figure caption)
plt.title("Fig. 4: Least-squares line fitting the data", fontsize=12)
# Show grid and ticks
plt.xticks()## (array([-0.5, 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. ]), [Text(-0.5, 0, '−0.5'), Text(0.0, 0, '0.0'), Text(0.5, 0, '0.5'), Text(1.0, 0, '1.0'), Text(1.5, 0, '1.5'), Text(2.0, 0, '2.0'), Text(2.5, 0, '2.5'), Text(3.0, 0, '3.0'), Text(3.5, 0, '3.5'), Text(4.0, 0, '4.0')])
## (array([0., 1., 2., 3., 4., 5.]), [Text(0, 0.0, '0'), Text(0, 1.0, '1'), Text(0, 2.0, '2'), Text(0, 3.0, '3'), Text(0, 4.0, '4'), Text(0, 5.0, '5')])
The least-squares line is \[ y = a + bx = \frac{3}{2} + x \] where \[ \hat{\mathbf{u}} = \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 1.5 \\ 1 \end{bmatrix}. \]
We will study the process of finding a least-squares curve (line), called linear regression, in the statistics section later.
Summary of Example 2 least-squares solution
\[ \begin{array}{cccccccccccc} \hline \textbf{Component} & \textbf{Value} \\ \hline \textrm{Matrix}~ A & \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix} \\ \textrm{Vector}~ \mathbf{y} & \begin{bmatrix} 1 \\ 3 \\ 4 \\ 4 \end{bmatrix} \\ \textrm{Normal equations} & A^T A \mathbf{u} = A^T \mathbf{y} \\ \textrm{Least-squares solution} & \hat{\mathbf{u}} = (A^T A)^{-1} A^T \mathbf{y} = \begin{bmatrix} 1.5 \\ 1 \end{bmatrix} \\ \textrm{Fitted line} & y = 1.5 + x \\ \hline \end{array} \]
Consider 4 points in the \(xu\)-plane (2-dimensional data): \[ (0,1),\ (1,3),\ (2,4),\ (3,4) \]
Let us find the best fit curve (a quadratic approximation of \(u = a + bx + cx^2\)) to describe \((x_i, u_i)\). Since all 4 points of data should satisfy the quadratic equation \(u = a + bx + cx^2\), they can be expressed in matrix form as follows.
# Import numerical and plotting libraries
import numpy as np
import matplotlib.pyplot as plt
# =============================================================================
# Step 1: Define the given data points (x, u)
# Data: (0,1), (1,3), (2,4), (3,4)
# =============================================================================
x_data = np.array([0, 1, 2, 3])
u_data = np.array([1, 3, 4, 4])
# =============================================================================
# Step 2: Build the design matrix A for QUADRATIC fitting (u = a + bx + cx²)
# Each row: [1, x, x²]
# =============================================================================
A = np.array([
[1, 0, 0], # x=0 → 1, 0, 0²
[1, 1, 1], # x=1 → 1, 1, 1²
[1, 2, 4], # x=2 → 1, 2, 2²
[1, 3, 9] # x=3 → 1, 3, 3²
])
# Target vector y (observed u values)
y = np.array([1, 3, 4, 4])
# =============================================================================
# Step 3: Compute the least-squares solution
# Formula: û = (Aᵀ A)⁻¹ Aᵀ y
# =============================================================================
# Compute A transpose multiplied by A
ATA = A.T @ A
# Compute inverse of ATA
ATA_inv = np.linalg.inv(ATA)
# Compute A transpose multiplied by y
ATy = A.T @ y
# Solve for coefficients [a, b, c]
u_hat = ATA_inv @ ATy
# Extract the quadratic coefficients
a, b, c = u_hat[0], u_hat[1], u_hat[2]
# =============================================================================
# Step 4: Print the results
# =============================================================================
print("Least-squares solution u* =", u_hat)## Least-squares solution u* = [ 1. 2.5 -0.5]
## Quadratic curve: u = 1.0 + 2.5x + -0.5x²
# =============================================================================
# Step 5: Plot the data points and the fitted quadratic curve
# (Matches the LaTeX TikZ figure exactly)
# =============================================================================
# Create smooth x values for plotting the curve
x_curve = np.linspace(-0.5, 4, 100)
# Compute u values using the quadratic formula
u_curve = a + b * x_curve + c * x_curve**2
# Create figure with matching size
plt.figure(figsize=(8, 6))
# Plot red data points (solid circles)
plt.scatter(x_data, u_data, color='red', s=60, zorder=5)
# Plot blue thick quadratic fitting curve
plt.plot(x_curve, u_curve, color='blue', linewidth=2)
# Draw centered x and y axes
plt.axhline(0, color='black', linewidth=1)
plt.axvline(0, color='black', linewidth=1)
# Set axis labels (math notation)
plt.xlabel(r'$x$', fontsize=12)
plt.ylabel(r'$u$', fontsize=12)
# Set axis limits (exactly like LaTeX)
plt.xlim(-0.5, 4)## (-0.5, 4.0)
## (0.0, 5.0)
# Add figure title (matches LaTeX caption)
plt.title("Fig. 5: Quadratic least-squares curve fitting the data", fontsize=12)
# Clean layout and show plot
plt.tight_layout()
plt.show()Formulating the quadratic least-squares problem
\[ \begin{array}{cccccccccccc} \hline \textrm{Data}~(x, u) & \textrm{Quadratic function}~u = a + bx + cx^2 & \textrm{System of linear equations & Matrix representation} \\ \hline (0, 1) & 1 = a + b\cdot 0 + c\cdot 0^2 & & \\ (1, 3) & 3 = a + b\cdot 1 + c\cdot 1^2 & \begin{cases} a = 1 \\ a + b + c = 3 \\ a + 2b + 4c = 4 \\ a + 3b + 9c = 4 \end{cases} & A\mathbf{u} = \mathbf{y} \\ (2, 4) & 4 = a + b\cdot 2 + c\cdot 2^2 & & \textrm{where}~A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \end{bmatrix}, ~\mathbf{u} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}, ~\mathbf{y} = \begin{bmatrix} 1 \\ 3 \\ 4 \\ 4 \end{bmatrix} \\ (3, 4) & 4 = a + b\cdot 3 + c\cdot 3^2 & & \\ \hline \end{array} \]
Simply, we only need to find a vector \(\hat{\mathbf{u}} = (a, b, c)\) for \(u = a + bx + cx^2\) that minimizes \(\|A\mathbf{u} - \mathbf{y}\|\). To obtain this quadratic function for a data set, we will have a system of equations with no unique solution. So we will have a least-squares problem \(\min \|A\mathbf{u} - \mathbf{y}\|\) to solve.
The following function \(E(\mathbf{u})\) represents an error for this problem: \[ E(\mathbf{u}) = E(a,b,c) = (a-1)^2 + (a+b+c-3)^2 + (a+2b+4c-4)^2 + (a+3b+9c-4)^2 = \|A\mathbf{u} - \mathbf{y}\|^2 \]
We can find \[ \hat{\mathbf{u}} = \begin{bmatrix} a \\ b \\ c \end{bmatrix} = (A^T A)^{-1} A^T \mathbf{y} = \begin{bmatrix} 1 \\ \frac{3}{2} \\ -\frac{1}{2} \end{bmatrix} \]
# Import NumPy library
import numpy as np
# Design matrix for quadratic fitting
A = np.array([[1, 0, 0],
[1, 1, 1],
[1, 2, 4],
[1, 3, 9]])
# Target values
y = np.array([1, 3, 4, 4])
# Compute least-squares solution using standard matrix operations
u_hat = np.linalg.inv(A.T @ A) @ A.T @ y
# Print result
print("u* =", u_hat)## u* = [ 1. 2.5 -0.5]
So the least-squares curve is \[ u = 1 + \frac{3}{2}x - \frac{1}{2}x^2. \]