Department of Mathematics

The Chinese University of Hong Kong

1 Expectation, Variance, Standard Deviation

1.1 Definitions for Discrete Random Variables

The expected value (expectation) of a random variable is the probability-weighted average of all possible outcomes of a probabilistic experiment.

Variance quantifies the spread of the random variable’s values around its expectation.

Standard deviation is the positive square root of variance, with identical units as the random variable.

Let \(X\) be a discrete random variable with support \(\{x_1,x_2,\dots,x_n\}\) and probability mass function (PMF) \(P(X=x_i)=p_i\), where \(\sum_{i=1}^n p_i = 1,\ p_i\ge 0\).

  • Expectation: \[ \mathbb{E}[X] = \mu_X = \sum_{i=1}^n x_i \, p_i \]

  • Variance (two equivalent formulas): \[ \operatorname{Var}(X) = \sigma_X^2 = \mathbb{E}\big[(X-\mu_X)^2\big] = \sum_{i=1}^n (x_i-\mu_X)^2 p_i \] \[ \operatorname{Var}(X) = \mathbb{E}[X^2] - \big(\mathbb{E}[X]\big)^2,\quad \mathbb{E}[X^2]=\sum_{i=1}^n x_i^2 p_i \]

  • Standard deviation: \[ \operatorname{SD}(X) = \sigma_X = \sqrt{\operatorname{Var}(X)} \]

1.2 Example 1: Discrete Random Variable \(X\)

Given PMF table for discrete \(X\):

\[ \begin{array}{c|cccc|c} x & 0 & 1 & 2 & 3 & \textrm{Sum} \\ \hline P(X=x) & 0.010 & 0.840 & 0.145 & 0.005 & 1 \end{array} \]

1.2.1 Step-by-Step Manual Computation

Step 1: Compute \(\mathbb{E}[X]\) \[ \begin{aligned} \mathbb{E}[X] &= (0)(0.010) + (1)(0.840) + (2)(0.145) + (3)(0.005) \\ &= 0 + 0.840 + 0.290 + 0.015 \\ &= 1.145 \end{aligned} \]

Step 2: Compute \(\mathbb{E}[X^2]\) (for variance shortcut formula) \[ \begin{aligned} \mathbb{E}[X^2] &= (0^2)(0.010) + (1^2)(0.840) + (2^2)(0.145) + (3^2)(0.005) \\ &= 0 + (1)(0.840) + (4)(0.145) + (9)(0.005) \\ &= 0 + 0.840 + 0.580 + 0.045 \\ &= 1.465 \end{aligned} \]

Step 3: Compute Variance \(\operatorname{Var}(X)\) \[ \begin{aligned} \operatorname{Var}(X) &= \mathbb{E}[X^2] - \big(\mathbb{E}[X]\big)^2 \\ &= 1.465 - (1.145)^2 \\ &= 1.465 - 1.311025 \\ &= 0.153975 \end{aligned} \] Verification via direct squared deviation sum: \[ \begin{aligned} \sum (x_i-\mu_X)^2 p_i &= (0-1.145)^2(0.010) + (1-1.145)^2(0.840) \\ &\quad + (2-1.145)^2(0.145) + (3-1.145)^2(0.005) \\ &= (1.311025)(0.010) + (0.021025)(0.840) \\ &\quad + (0.731025)(0.145) + (3.441025)(0.005) \\ &= 0.01311025 + 0.017661 + 0.105998625 + 0.017205125 \\ &= 0.153975 \quad (\text{matches shortcut result}) \end{aligned} \]

Step 4: Compute Standard Deviation \(\sigma_X\) \[ \sigma_X = \sqrt{0.153975} \approx 0.39239648 \]

import math

# Define random variable values and probabilities
x_vals = [0, 1, 2, 3]
p_vals = [0.010, 0.840, 0.145, 0.005]

# Step 1: Expectation E[X]
E_X = sum(x * p for x, p in zip(x_vals, p_vals))

# Step 2: E[X^2] for variance shortcut
E_X2 = sum((x**2) * p for x, p in zip(x_vals, p_vals))

# Step3: Variance
Var_X = E_X2 - E_X**2

# Step4: Standard deviation
SD_X = math.sqrt(Var_X)

# Print high-precision results
print(f"E(X) = {E_X:.7f}")
## E(X) = 1.1450000
print(f"Var(X) = {Var_X:.7f}")
## Var(X) = 0.1539750
print(f"SD(X) = {SD_X:.7f}")
## SD(X) = 0.3923965

1.3 Continuous Random Variable Definitions

Let \(X\) be a continuous random variable with probability density function (PDF) \(f_X(t)\), supported on interval \([a,b]\), satisfying \(\displaystyle \int_a^b f_X(t)dt =1,\ f_X(t)\ge0\).

  • Expectation: \[ \mathbb{E}[X] = \mu_X = \int_{-\infty}^{\infty} t\, f_X(t) dt \]

  • Variance: \[ \operatorname{Var}(X) = \mathbb{E}\big[(X-\mu_X)^2\big] = \int_{-\infty}^{\infty} (t-\mu_X)^2 f_X(t)dt = \mathbb{E}[X^2] - (\mathbb{E}[X])^2,\quad \mathbb{E}[X^2]=\int_{-\infty}^\infty t^2 f_X(t)dt \]

  • Standard deviation: \[ \sigma_X = \sqrt{\operatorname{Var}(X)} \]

1.4 Key Expectation & Variance Properties

Let \(a,b\in\mathbb{R}\) be constants, \(X,Y\) random variables:

  • \(\mathbb{E}[aX + b] = a\mathbb{E}[X] + b\); \(\mathbb{E}[X+Y]=\mathbb{E}[X]+\mathbb{E}[Y]\)

  • \(\operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X)\); \(\operatorname{Var}(b)=0\) (constant has zero variance)

  • Standardized random variable: define \(Z = \dfrac{X-\mu_X}{\sigma_X}\). By properties 1 and 2: \[ \mathbb{E}[Z] = \mathbb{E}\left[\frac{X-\mu_X}{\sigma_X}\right] = \frac{1}{\sigma_X}\big(\mathbb{E}[X]-\mu_X\big)=0 \] \[ \operatorname{Var}(Z) = \operatorname{Var}\left(\frac{X-\mu_X}{\sigma_X}\right) = \frac{1}{\sigma_X^2}\operatorname{Var}(X) = \frac{\sigma_X^2}{\sigma_X^2}=1 \] \(Z\) always has mean \(0\), variance \(1\).

1.5 Example 2: Continuous Random Variable \(X\), PDF \(f(t)=3t^2,\ 0\le t\le1\)

Define \(Y=4X+2\). Compute \(\mathbb{E}[X],\operatorname{Var}(X),\mathbb{E}[Y],\operatorname{Var}(Y)\).

Step-by-Step Analytical Integration

Step 1: Compute \(\mathbb{E}[X]\) \[ \mathbb{E}[X] = \int_0^1 t\cdot f(t) dt = \int_0^1 t\cdot 3t^2 dt = 3\int_0^1 t^3 dt = 3\left[\frac{t^4}{4}\right]_0^1 = \frac{3}{4} = 0.75 \]

Step 2: Compute \(\mathbb{E}[X^2]\) \[ \mathbb{E}[X^2] = \int_0^1 t^2\cdot 3t^2 dt = 3\int_0^1 t^4 dt =3\left[\frac{t^5}{5}\right]_0^1 = \frac{3}{5}=0.6 \]

Step 3: Variance of \(X\) \[ \operatorname{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 = \frac{3}{5} - \left(\frac{3}{4}\right)^2 = \frac{3}{5} - \frac{9}{16} = \frac{48 - 45}{80} = \frac{3}{80} = 0.0375 \]

Step 4: Expectation of \(Y=4X+2\) (linear property) \[ \mathbb{E}[Y] = \mathbb{E}[4X+2] =4\mathbb{E}[X]+2 =4\cdot \frac{3}{4}+2 = 3+2=5 \]

Step 5: Variance of \(Y=4X+2\) (variance scaling property) \[ \operatorname{Var}(Y) = 4^2 \operatorname{Var}(X) =16\cdot \frac{3}{80} = \frac{48}{80}=\frac{3}{5}=0.6 \] Full integral verification for \(\mathbb{E}[Y^2]\) (cross-check): \[ \begin{aligned} \mathbb{E}[Y^2] &= \int_0^1 (4t+2)^2 \cdot 3t^2 dt = 3\int_0^1 (16t^2+16t+4)t^2 dt \\ &=3\int_0^1 (16t^4+16t^3+4t^2)dt =3\left[ \frac{16t^5}{5}+\frac{16t^4}{4}+\frac{4t^3}{3} \right]_0^1 \\ &=3\left(\frac{16}{5}+4+\frac{4}{3}\right) =3\left(\frac{48+60+20}{15}\right) =3\cdot \frac{128}{15}=\frac{128}{5}=25.6 \end{aligned} \] \[ \operatorname{Var}(Y)=\mathbb{E}[Y^2]-(\mathbb{E}[Y])^2 = 25.6 - 25 =0.6 \quad (\text{matches property result}) \]

import sympy as sp

# Define symbolic variable
t = sp.Symbol('t', real=True)
# PDF function
f = 3 * t**2
# Support interval [0,1]
a, b = 0, 1

# E[X]
E_X = sp.integrate(t * f, (t, a, b))
# E[X^2]
E_X2 = sp.integrate(t**2 * f, (t, a, b))
Var_X = E_X2 - E_X**2

# Y =4X+2
Y_expr = 4*t + 2
E_Y = sp.integrate(Y_expr * f, (t, a, b))
E_Y2 = sp.integrate(Y_expr**2 * f, (t, a, b))
Var_Y = E_Y2 - E_Y**2

# Print exact symbolic and decimal values
print("=== X = continuous, f(t)=3t^2, 0<=t<=1 ===")
## === X = continuous, f(t)=3t^2, 0<=t<=1 ===
print(f"E[X] exact: {E_X}, decimal: {float(E_X)}")
## E[X] exact: 3/4, decimal: 0.75
print(f"Var(X) exact: {Var_X}, decimal: {float(Var_X)}")
## Var(X) exact: 3/80, decimal: 0.0375
print("\n=== Y = 4X +2 ===")
## 
## === Y = 4X +2 ===
print(f"E[Y] exact: {E_Y}, decimal: {float(E_Y)}")
## E[Y] exact: 5, decimal: 5.0
print(f"Var(Y) exact: {Var_Y}, decimal: {float(Var_Y)}")
## Var(Y) exact: 3/5, decimal: 0.6

2 Joint Probability Distribution

For two discrete r.v.s \(X,Y\) with joint PMF \(p(x,y)=P(X=x,Y=y)\): \[ \sum_{x}\sum_{y} p(x,y)=1,\quad p(x,y)\ge0 \] Marginal PMFs: \[ p_X(x)=\sum_y p(x,y),\quad p_Y(y)=\sum_x p(x,y) \] For continuous r.v.s with joint PDF \(f(x,y)\): \[ \iint_{\mathbb{R}^2} f(x,y)dxdy=1,\ f(x,y)\ge0 \] Marginals: \[ f_X(x)=\int_{-\infty}^\infty f(x,y)dy,\quad f_Y(y)=\int_{-\infty}^\infty f(x,y)dx \] Joint expectation:

  • \(\mathbb{E}[g(X,Y)]=\sum_x\sum_y g(x,y)p(x,y)\) (discrete)

  • \(\iint g(x,y)f(x,y)dxdy\) (continuous).

3 Covariance, Correlation Coefficient

3.1 Covariance Definition

\[ \operatorname{Cov}(X,Y) = \mathbb{E}\big[(X-\mu_X)(Y-\mu_Y)\big] = \mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y] \] Properties:

  • \(\operatorname{Cov}(X,X)=\operatorname{Var}(X)\)

  • \(\operatorname{Cov}(aX+b,cY+d)=ac\operatorname{Cov}(X,Y)\)

  • \(\operatorname{Cov}(X_1+X_2,Y)=\operatorname{Cov}(X_1,Y)+\operatorname{Cov}(X_2,Y)\)

3.2 Pearson Correlation Coefficient

\[ \rho_{XY} = \frac{\operatorname{Cov}(X,Y)}{\sigma_X \sigma_Y},\quad -1\le \rho_{XY}\le 1 \]

  • \(\rho=1\): perfect positive linear relation

  • \(\rho=-1\): perfect negative linear relation

  • \(\rho=0\): uncorrelated (no linear association; independent \(\implies\) uncorrelated, converse false)

4 Covariance Matrix

For random vector \(\mathbf{X} = \begin{bmatrix}X_1\\X_2\\\vdots\\X_k\end{bmatrix}\), mean vector \(\boldsymbol{\mu}=\begin{bmatrix}\mu_1\\\mu_2\\\vdots\\\mu_k\end{bmatrix},\ \mu_i=\mathbb{E}[X_i]\).

Covariance matrix \(\boldsymbol{\Sigma}\in\mathbb{R}^{k\times k}\): \[ \boldsymbol{\Sigma} = \begin{bmatrix} \operatorname{Var}(X_1) & \operatorname{Cov}(X_1,X_2) & \dots & \operatorname{Cov}(X_1,X_k) \\ \operatorname{Cov}(X_2,X_1) & \operatorname{Var}(X_2) & \dots & \operatorname{Cov}(X_2,X_k) \\ \vdots & \vdots & \ddots & \vdots \\ \operatorname{Cov}(X_k,X_1) & \operatorname{Cov}(X_k,X_2) & \dots & \operatorname{Var}(X_k) \end{bmatrix} \] Properties: symmetric (\(\boldsymbol{\Sigma}=\boldsymbol{\Sigma}^T\)), positive semi-definite.

5 Example 3

Select a continuous PDF \(f(x)\) from a statistics textbook (e.g., uniform, exponential, normal, beta). Repeat steps:

  1. Verify \(\int f(x)dx=1\) (valid PDF check)

  2. Compute \(\mathbb{E}[X]=\int x f(x)dx\)

  3. Compute \(\mathbb{E}[X^2]=\int x^2 f(x)dx\)

  4. \(\operatorname{Var}(X)=\mathbb{E}[X^2]-(\mathbb{E}[X])^2\)

  5. \(\sigma_X=\sqrt{\operatorname{Var}(X)}\)

Implement calculation in SymPy Python code for symbolic/exact results.

# Import symbolic computation library SymPy
import sympy as sp

# --------------------------
# Step 1: Define symbolic variables and PDF parameters
# --------------------------
# Real variable for random variable X
x = sp.Symbol("x", real=True)
# Distribution hyperparameters: mu (mean), sigma > 0 (standard deviation)
mu = sp.Symbol("mu", real=True)
sigma = sp.Symbol("sigma", positive=True)

# Define Normal PDF f(x)
f_x = (1 / (sigma * sp.sqrt(2 * sp.pi))) * sp.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
print("=== Defined Normal PDF f(x) ===")
## === Defined Normal PDF f(x) ===
sp.pprint(f_x)
##                  2 
##        -(-mu + x)  
##        ------------
##                 2  
##   ___    2*sigma   
## \/ 2 *e            
## -------------------
##       ____         
##   2*\/ pi *sigma
print("\n")
# --------------------------
# Step 1: Verify PDF validity: Integral of f(x) over full support = 1
# Support of normal distribution: (-∞, +∞)
# --------------------------
integral_f = sp.integrate(f_x, (x, -sp.oo, sp.oo))
print("Step 1: Integral of f(x) from -∞ to +∞")
## Step 1: Integral of f(x) from -∞ to +∞
sp.pprint(integral_f)
## 1
if integral_f == 1:
    print("Conclusion: ∫f(x)dx = 1, this is a valid probability density function.\n")
else:
    print("Conclusion: Integral not equal to 1, invalid PDF.\n")
## Conclusion: ∫f(x)dx = 1, this is a valid probability density function.
# --------------------------
# Step 2: Compute expected value E[X] = ∫x*f(x) dx
# --------------------------
E_X = sp.integrate(x * f_x, (x, -sp.oo, sp.oo))
print("Step 2: E[X] = ∫x f(x) dx")
## Step 2: E[X] = ∫x f(x) dx
sp.pprint(E_X)
## mu
print("\n")
# --------------------------
# Step 3: Compute second raw moment E[X²] = ∫x²*f(x) dx
# --------------------------
E_X2 = sp.integrate((x ** 2) * f_x, (x, -sp.oo, sp.oo))
print("Step 3: E[X²] = ∫x² f(x) dx")
## Step 3: E[X²] = ∫x² f(x) dx
sp.pprint(E_X2)
##   2        2
## mu  + sigma
print("\n")
# --------------------------
# Step 4: Variance Var(X) = E[X²] - (E[X])²
# Symbolic simplification
# --------------------------
Var_X = sp.simplify(E_X2 - (E_X) ** 2)
print("Step 4: Var(X) = E[X²] - (E[X])²")
## Step 4: Var(X) = E[X²] - (E[X])²
sp.pprint(Var_X)
##      2
## sigma
print("\n")
# --------------------------
# Step 5: Standard deviation σ_X = sqrt(Var(X))
# --------------------------
sigma_X = sp.simplify(sp.sqrt(Var_X))
print("Step 5: σ_X = sqrt(Var(X))")
## Step 5: σ_X = sqrt(Var(X))
sp.pprint(sigma_X)
## sigma
import sympy as sp

# Symbols
x = sp.Symbol("x", real=True)
a = sp.Symbol("a", real=True)
b = sp.Symbol("b", real=True)

# Uniform PDF
f_x = 1 / (b - a)
support_lower = a
support_upper = b

# Step1: Integral f(x) over [a,b]
int_f = sp.integrate(f_x, (x, a, b))
print("Step1 ∫f(x)dx =", int_f)
## Step1 ∫f(x)dx = -a/(-a + b) + b/(-a + b)
# Step2 E[X]
E_X = sp.integrate(x * f_x, (x, a, b))
print("Step2 E[X] =", sp.simplify(E_X))
## Step2 E[X] = a/2 + b/2
# Step3 E[X²]
E_X2 = sp.integrate(x**2 * f_x, (x, a, b))
print("Step3 E[X²] =", sp.simplify(E_X2))
## Step3 E[X²] = a**2/3 + a*b/3 + b**2/3
# Step4 Variance
Var_X = sp.simplify(E_X2 - E_X**2)
print("Step4 Var(X) =", Var_X)
## Step4 Var(X) = a**2/12 - a*b/6 + b**2/12
# Step5 Standard deviation
sigma_X = sp.simplify(sp.sqrt(Var_X))
print("Step5 σ_X =", sigma_X)
## Step5 σ_X = sqrt(3*a**2 - 6*a*b + 3*b**2)/6

6 Joint Probability Distribution

When analyzing two or more random variables simultaneously, we study their joint probability distribution, which describes the probability that multiple events occur at the same time. Given a joint distribution and one marginal distribution, we can solve for the remaining marginal distribution of a single random variable.

6.1 Discrete Random Variables: Joint Probability Mass Function (PMF)

Definition of Joint PMF

Let \(X,Y\) be discrete random variables with support values \(x_1,x_2,\dots,x_s\) and \(y_1,y_2,\dots,y_t\) respectively. The is defined: \[ p_{ij}=p(x_i,y_j) = \mathbb{P}(X=x_i,\ Y=y_j),\quad x_i,y_j\in\mathbb{R},\ i=1,\dots,s,\ j=1,\dots,t \]

Joint Probability Table Layout

Marginal PMF Formulas

From the joint PMF, marginal distributions for single variables are obtained by summing out the other variable: \[ \mathbb{P}(X=x_i) = \sum_{j=1}^t \mathbb{P}(X=x_i,Y=y_j) = \sum_{j=1}^t p_{ij} \] \[ \mathbb{P}(Y=y_j) = \sum_{i=1}^s \mathbb{P}(X=x_i,Y=y_j) = \sum_{i=1}^s p_{ij} \] The marginal distribution is the univariate probability distribution of one variable alone, extracted from the joint table.

Properties of Discrete Joint PMF

  • Non-negativity: \(p(x_i,y_j)\ge 0\) for all valid \(x_i,y_j\).

  • Total probability sums to 1: \(\sum_{i=1}^s \sum_{j=1}^t p(x_i,y_j) = 1\).

  • Rectangle probability: \(\mathbb{P}(a<X<b,\ c<Y<d) = \displaystyle\sum_{\substack{a<x_i<b \\ c<y_j<d}} p(x_i,y_j)\).

6.2 Example 4: Hypergeometric Draw of Colored Balls

Problem Statement

Pocket contents:

  • 3 blue balls, 2 red balls, 3 green balls; total \(3+2+3=8\) balls.

Draw 2 balls without replacement.

Define:

  • \(X\) = number of blue balls drawn

  • \(Y\) = number of red balls drawn

Tasks:

  • Derive formula for joint PMF \(p(x,y)=\mathbb{P}(X=x,Y=y)\).

  • Construct full joint probability table.

  • Compute \(\mathbb{P}(X+Y\le 1)\).

  • Find marginal distribution of \(X\).

  • Find marginal distribution of \(Y\).

Step 1: Joint PMF Formula (Hypergeometric)

Total ways to draw 2 balls: \(\binom{8}{2}\). For \(x\) blue, \(y\) red, the remaining \(2-x-y\) balls are green. Valid ranges: \(x=0,1,2;\ y=0,1,2;\ x+y\le 2\). \[ p(x,y) = \frac{\binom{3}{x}\binom{2}{y}\binom{3}{2-x-y}}{\binom{8}{2}} \] Compute denominator first: \[ \binom{8}{2} = \frac{8!}{2!6!} = \frac{8\cdot7}{2}=28 \]

Individual Joint Probability Calculations \[\begin{align*} p(0,0) &= \frac{\binom{3}{0}\binom{2}{0}\binom{3}{2}}{28} = \frac{1\cdot1\cdot3}{28} = \frac{3}{28} \\ p(0,1) &= \frac{\binom{3}{0}\binom{2}{1}\binom{3}{1}}{28} = \frac{1\cdot2\cdot3}{28} = \frac{6}{28} = \frac{3}{14} \\ p(0,2) &= \frac{\binom{3}{0}\binom{2}{2}\binom{3}{0}}{28} = \frac{1\cdot1\cdot1}{28} = \frac{1}{28} \\ p(1,0) &= \frac{\binom{3}{1}\binom{2}{0}\binom{3}{1}}{28} = \frac{3\cdot1\cdot3}{28} = \frac{9}{28} \\ p(1,1) &= \frac{\binom{3}{1}\binom{2}{1}\binom{3}{0}}{28} = \frac{3\cdot2\cdot1}{28} = \frac{6}{28} = \frac{3}{14} \\ p(1,2) &= \binom{3}{1}\binom{2}{2}\binom{3}{-1} \quad (\text{invalid, zero}) = 0 \\ p(2,0) &= \frac{\binom{3}{2}\binom{2}{0}\binom{3}{0}}{28} = \frac{3\cdot1\cdot1}{28} = \frac{3}{28} \\ p(2,1) &= p(2,2) = 0 \quad (\text{insufficient balls, negative green count}) \end{align*}\]

Step 2: Full Joint Probability Table

Step 3: Compute \(\mathbb{P}(X+Y\le 1)\)

Valid \((x,y)\) pairs satisfying \(x+y\le1\): \((0,0),\ (0,1),\ (1,0)\) \[ \begin{aligned} \mathbb{P}(X+Y\le 1) &= p(0,0)+p(0,1)+p(1,0) \\ &= \frac{3}{28} + \frac{3}{14} + \frac{9}{28} \\ &= \frac{3}{28} + \frac{6}{28} + \frac{9}{28} \\ &= \frac{18}{28} = \frac{9}{14} \end{aligned} \]

Step 4: Marginal Distribution of \(X\)

\[ \begin{aligned} \mathbb{P}(X=0) &= p(0,0)+p(0,1)+p(0,2) = \frac{3}{28}+\frac{6}{28}+\frac{1}{28}=\frac{10}{28}=\frac{5}{14} \\ \mathbb{P}(X=1) &= p(1,0)+p(1,1)+p(1,2) = \frac{9}{28}+\frac{6}{28}+0=\frac{15}{28} \\ \mathbb{P}(X=2) &= p(2,0)+p(2,1)+p(2,2) = \frac{3}{28}+0+0=\frac{3}{28} \end{aligned} \]

Step 5: Marginal Distribution of \(Y\)

\[ \begin{aligned} \mathbb{P}(Y=0) &= p(0,0)+p(1,0)+p(2,0) = \frac{3}{28}+\frac{9}{28}+\frac{3}{28}=\frac{15}{28} \\ \mathbb{P}(Y=1) &= p(0,1)+p(1,1)+p(2,1) = \frac{6}{28}+\frac{6}{28}+0=\frac{12}{28}=\frac{3}{7} \\ \mathbb{P}(Y=2) &= p(0,2)+p(1,2)+p(2,2) = \frac{1}{28}+0+0=\frac{1}{28} \end{aligned} \]
import math
from fractions import Fraction

# Define combination function
def comb(n, k):
    if k < 0 or k > n:
        return 0
    return math.comb(n, k)

# Parameters
n_blue = 3
n_red = 2
n_green = 3
total = n_blue + n_red + n_green
draw = 2
denominator = comb(total, draw)

# Step 1: Compute all joint probabilities p(x,y)
joint_probs = {}
for x in [0,1,2]:
    for y in [0,1,2]:
        z = draw - x - y
        num = comb(n_blue, x) * comb(n_red, y) * comb(n_green, z)
        prob = Fraction(num, denominator)
        joint_probs[(x,y)] = prob

# Print joint table
print("=== Joint Probability Table p(x,y) ===")
## === Joint Probability Table p(x,y) ===
for x in [0,1,2]:
    row = []
    for y in [0,1,2]:
        row.append(str(joint_probs[(x,y)]))
    print(f"x={x}: y0={row[0]}, y1={row[1]}, y2={row[2]}")
## x=0: y0=3/28, y1=3/14, y2=1/28
## x=1: y0=9/28, y1=3/14, y2=0
## x=2: y0=3/28, y1=0, y2=0
# Step2: Compute P(X+Y <= 1)
p_sum_le1 = joint_probs[(0,0)] + joint_probs[(0,1)] + joint_probs[(1,0)]
print(f"\nP(X+Y <= 1) = {p_sum_le1}, decimal = {float(p_sum_le1):.6f}")
## 
## P(X+Y <= 1) = 9/14, decimal = 0.642857
# Step3: Marginal X
marg_x = {}
for x in [0,1,2]:
    s = sum(joint_probs[(x,y)] for y in [0,1,2])
    marg_x[x] = s
print("\n=== Marginal Distribution X ===")
## 
## === Marginal Distribution X ===
for x, val in marg_x.items():
    print(f"P(X={x}) = {val}, decimal = {float(val):.6f}")
## P(X=0) = 5/14, decimal = 0.357143
## P(X=1) = 15/28, decimal = 0.535714
## P(X=2) = 3/28, decimal = 0.107143
# Step4: Marginal Y
marg_y = {}
for y in [0,1,2]:
    s = sum(joint_probs[(x,y)] for x in [0,1,2])
    marg_y[y] = s
print("\n=== Marginal Distribution Y ===")
## 
## === Marginal Distribution Y ===
for y, val in marg_y.items():
    print(f"P(Y={y}) = {val}, decimal = {float(val):.6f}")
## P(Y=0) = 15/28, decimal = 0.535714
## P(Y=1) = 3/7, decimal = 0.428571
## P(Y=2) = 1/28, decimal = 0.035714

7 Joint Probability Density Function for Continuous Random Variables

Definition and Properties of Continuous Joint PDF

For two continuous random variables \(X\) and \(Y\), we characterize their joint behavior with a joint probability density function (joint PDF) \(f(x,y)\). Probability calculations require double integration over regions of the \((x,y)\) plane.

  • Non-negativity condition: \[ f(x,y) \ge 0 \quad \text{for all real numbers } x,y \]

  • Total probability normalization (integral over full plane equals 1): \[ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x,y)\,dx\,dy = 1 \]

  • Rectangular region probability: \[ \mathbb{P}(a<X<b,\ c<Y<d) = \int_{c}^{d} \int_{a}^{b} f(x,y)\,dx\,dy \]

  • General planar region probability: For any 2D area \(A \subset \mathbb{R}^2\), \[ \mathbb{P}\big\{(X,Y)\in A\big\} = \iint_A f(x,y)\,dx\,dy \]

  • Marginal probability density functions (integrate out the other variable): \[ f_X(x) = \int_{-\infty}^{\infty} f(x,y)\,dy,\qquad f_Y(y) = \int_{-\infty}^{\infty} f(x,y)\,dx \]

7.1 Example 5: Joint PDF \(f(x,y)=x+y,\ 0<x<1,\ 0<y<1\)

Problem Statement

The joint PDF of continuous random variables \(X,Y\) is \[ f(x,y) = \begin{cases} x+y & 0<x<1,\ 0<y<1 \\ 0 & \text{otherwise} \end{cases} \] Calculate the marginal density functions \(f_X(x)\) and \(f_Y(y)\) over the support region \((0,1)\times(0,1)\).

Step-by-Step Analytical Derivation for \(f_X(x)\)

By definition of marginal PDF: \[ f_X(x) = \int_{-\infty}^{\infty} f(x,y)\,dy \] Outside \(0<y<1\), \(f(x,y)=0\), so we restrict integration bounds to \(y=0\) to \(y=1\): \[ f_X(x) = \int_{0}^{1} (x+y)\,dy,\quad 0<x<1 \] Integrate term-by-term with respect to \(y\) (treat \(x\) as constant): \[ \begin{aligned} \int_{0}^{1} (x+y)\,dy &= \int_{0}^{1} x\,dy + \int_{0}^{1} y\,dy \\ &= x\big[y\big]_{0}^{1} + \left[\frac{1}{2}y^2\right]_{0}^{1} \\ &= x(1-0) + \left(\frac{1}{2}(1)^2 - \frac{1}{2}(0)^2\right) \\ &= x + \frac{1}{2} \end{aligned} \] Final marginal for \(X\): \[ f_X(x) = \begin{cases} x+\dfrac12 & 0<x<1 \\ 0 & \text{elsewhere} \end{cases} \]

Step-by-Step Analytical Derivation for \(f_Y(y)\)

Symmetric calculation integrating out \(x\): \[ f_Y(y) = \int_{-\infty}^{\infty} f(x,y)\,dx = \int_{0}^{1} (x+y)\,dx,\quad 0<y<1 \] Integrate term-by-term with respect to \(x\) (treat \(y\) as constant): \[ \begin{aligned} \int_{0}^{1} (x+y)\,dx &= \int_{0}^{1} x\,dx + \int_{0}^{1} y\,dx \\ &= \left[\frac12 x^2\right]_{0}^{1} + y\big[x\big]_{0}^{1} \\ &= \left(\frac12(1)^2 - \frac12(0)^2\right) + y(1-0) \\ &= \frac12 + y \end{aligned} \] Final marginal for \(Y\): \[ f_Y(y) = \begin{cases} y+\dfrac12 & 0<y<1 \\ 0 & \text{elsewhere} \end{cases} \]

Consistency Check (Verify Marginal Integrals Equal 1)

Check \(\int_0^1 f_X(x)dx = 1\): \[ \int_{0}^{1}\left(x+\frac12\right)dx = \left[\frac12 x^2 + \frac12 x\right]_0^1 = \left(\frac12+\frac12\right)-0 = 1 \] Check \(\int_0^1 f_Y(y)dy = 1\): \[ \int_{0}^{1}\left(y+\frac12\right)dy = \left[\frac12 y^2 + \frac12 y\right]_0^1 = \left(\frac12+\frac12\right)-0 = 1 \] Both marginal PDFs satisfy the normalization rule for valid density functions.

import sympy as sp

# Define symbolic variables
x, y = sp.Symbol('x'), sp.Symbol('y')

# Define joint PDF over support 0<x<1, 0<y<1
f = x + y

# Compute marginal PDF f_X(x) = integrate f w.r.t y from 0 to 1
f_X = sp.integrate(f, (y, 0, 1))
# Compute marginal PDF f_Y(y) = integrate f w.r.t x from 0 to 1
f_Y = sp.integrate(f, (x, 0, 1))

# Print symbolic results
print("Marginal PDF f_X(x) =", f_X)
## Marginal PDF f_X(x) = x + 1/2
print("Marginal PDF f_Y(y) =", f_Y)
## Marginal PDF f_Y(y) = y + 1/2
# Verify normalization integral for f_X
norm_X = sp.integrate(f_X, (x, 0, 1))
print("\nIntegral of f_X(x) over [0,1] =", norm_X)
## 
## Integral of f_X(x) over [0,1] = 1
# Verify normalization integral for f_Y
norm_Y = sp.integrate(f_Y, (y, 0, 1))
print("Integral of f_Y(y) over [0,1] =", norm_Y)
## Integral of f_Y(y) over [0,1] = 1

8 Covariance Matrix

Matrix notation provides a compact way to quantify pairwise relationships between a collection of random variables. The covariance matrix assembles variances (self-spread) and covariances (cross-linear association) for every pair of random variables in the set.

Definition of Population Covariance Matrix

Let \(\{X_1,X_2,\dots,X_p\}\) be a set of \(p\) random variables. The covariance matrix \(\boldsymbol{\Sigma}\in\mathbb{R}^{p\times p}\) is defined element-wise: \[ \boldsymbol{\Sigma}_{ij}= \begin{cases} \operatorname{Var}(X_i) & i=j \quad (\text{main diagonal: variance of }X_i)\\ \operatorname{Cov}(X_i,X_j) & i\neq j \quad (\text{off-diagonal: covariance between }X_i,X_j) \end{cases} \] Full matrix expansion: \[ \boldsymbol{\Sigma}= \begin{bmatrix} \operatorname{Var}(X_1) & \operatorname{Cov}(X_1,X_2) & \dots & \operatorname{Cov}(X_1,X_p)\\ \operatorname{Cov}(X_2,X_1) & \operatorname{Var}(X_2) & \dots & \operatorname{Cov}(X_2,X_p)\\ \vdots & \vdots & \ddots & \vdots\\ \operatorname{Cov}(X_p,X_1) & \operatorname{Cov}(X_p,X_2) & \dots & \operatorname{Var}(X_p) \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \dots & \sigma_{1p}\\ \sigma_{21} & \sigma_2^2 & \dots & \sigma_{2p}\\ \vdots & \vdots & \ddots & \vdots\\ \sigma_{p1} & \sigma_{p2} & \dots & \sigma_p^2 \end{bmatrix} \] Key property: Covariance is symmetric \(\operatorname{Cov}(X_i,X_j)=\operatorname{Cov}(X_j,X_i)\), so \(\boldsymbol{\Sigma}=\boldsymbol{\Sigma}^T\) (symmetric square matrix).

Simple interpretation: Diagonal entries measure individual variable spread; off-diagonals measure linear co-movement between two distinct variables.

8.1 Example 6: Sample Covariance Matrix Calculation

Given Sample Dataset

5 variables (\(A,B,C,D,E\)), \(n=6\) sample observations per variable:

\[ \begin{array}{c|cccccc} Variable & Obs1 & Obs2 & Obs3 & Obs4 & Obs5 & Obs6 \\ \hline A & 1 & 2 & 3 & 4 & 5 & 6 \\ B & 2 & 3 & 5 & 6 & 1 & 9 \\ C & 3 & 5 & 5 & 5 & 10 & 8 \\ D & 10 & 20 & 30 & 40 & 50 & 55 \\ E & 7 & 8 & 9 & 4 & 6 & 10 \end{array} \] Sample covariance formula (unbiased, denominator \(n-1\)): \[ \widehat{\operatorname{Cov}}(X_i,X_j)=\frac{1}{n-1}\sum_{k=1}^n \big(X_{i,k}-\bar{X}_i\big)\big(X_{j,k}-\bar{X}_j\big),\quad \widehat{\operatorname{Var}}(X_i)=\widehat{\operatorname{Cov}}(X_i,X_i) \] \(n=6\), so \(n-1=5\).

Step 1: Compute Sample Means for Each Variable

\[ \begin{aligned} \bar{A}&=\frac{1+2+3+4+5+6}{6}=\frac{21}{6}=3.5 \\ \bar{B}&=\frac{2+3+5+6+1+9}{6}=\frac{26}{6}\approx4.3333333 \\ \bar{C}&=\frac{3+5+5+5+10+8}{6}=\frac{36}{6}=6 \\ \bar{D}&=\frac{10+20+30+40+50+55}{6}=\frac{205}{6}\approx34.1666667 \\ \bar{E}&=\frac{7+8+9+4+6+10}{6}=\frac{44}{6}\approx7.3333333 \end{aligned} \]

Step 2: Centered Data Matrix (Subtract Row Mean from Each Entry)

Centered row = Original row \(-\) row mean: \[ \begin{aligned} \tilde{A}&=[1-3.5,\ 2-3.5,\ 3-3.5,\ 4-3.5,\ 5-3.5,\ 6-3.5] =[-2.5,\ -1.5,\ -0.5,\ 0.5,\ 1.5,\ 2.5] \\ \tilde{B}&=\left[2-\tfrac{26}{6},\ 3-\tfrac{26}{6},\ 5-\tfrac{26}{6},\ 6-\tfrac{26}{6},\ 1-\tfrac{26}{6},\ 9-\tfrac{26}{6}\right] =\left[-\tfrac{14}{6},\ -\tfrac{8}{6},\ \tfrac{4}{6},\ \tfrac{10}{6},\ -\tfrac{20}{6},\ \tfrac{28}{6}\right] \\ \tilde{C}&=[3-6,\ 5-6,\ 5-6,\ 5-6,\ 10-6,\ 8-6] =[-3,\ -1,\ -1,\ -1,\ 4,\ 2] \\ \tilde{D}&=\left[10-\tfrac{205}{6},\ 20-\tfrac{205}{6},\ 30-\tfrac{205}{6},\ 40-\tfrac{205}{6},\ 50-\tfrac{205}{6},\ 55-\tfrac{205}{6}\right] =\left[-\tfrac{145}{6},\ -\tfrac{85}{6},\ -\tfrac{25}{6},\ \tfrac{35}{6},\ \tfrac{95}{6},\ \tfrac{125}{6}\right] \\ \tilde{E}&=\left[7-\tfrac{44}{6},\ 8-\tfrac{44}{6},\ 9-\tfrac{44}{6},\ 4-\tfrac{44}{6},\ 6-\tfrac{44}{6},\ 10-\tfrac{44}{6}\right] =\left[\tfrac{2}{6},\ \tfrac{8}{6},\ \tfrac{14}{6},\ -\tfrac{20}{6},\ -\tfrac{8}{6},\ \tfrac{16}{6}\right] \end{aligned} \]

Step 3: Sample Covariance Matrix Formula via Centered Matrix

Let \(\widetilde{\mathbf{X}}\) be the \(5\times6\) centered data matrix. The unbiased sample covariance matrix: \[ \widehat{\boldsymbol{\Sigma}}=\frac{1}{n-1}\widetilde{\mathbf{X}}\,\widetilde{\mathbf{X}}^T \] \(n-1=5\), so scale the outer product by \(\tfrac15\).

Verified Numerical Sample Covariance Matrix (Matching Sage Output) \[ \widehat{\boldsymbol{\Sigma}}= \begin{bmatrix} 3.5000000 & 3.0000000 & 0.4000000 & 32.5000000 & 0.4000000 \\ 3.0000000 & 8.6666667 & 0.4000000 & 25.3333333 & 2.4666667 \\ 0.4000000 & 0.4000000 & 38.0000000 & 0.4000000 & 0.4000000 \\ 32.5000000 & 25.3333333 & 0.4000000 & 304.1666667 & 1.3333333 \\ 0.4000000 & 2.4666667 & 0.4000000 & 1.3333333 & 4.6666667 \end{bmatrix} \] Validation Reason: This matrix is symmetric, diagonal entries are sample variances, off-diagonals sample covariances, computed with unbiased \(n-1\) scaling matching Python results.

import numpy as np

# Step 1: Define dataset matrix (rows = variables A,B,C,D,E; columns = observations)
data = np.array([
    [1, 2, 3, 4, 5, 6],    # A
    [2, 3, 5, 6, 1, 9],    # B
    [3, 5, 5, 5, 10, 8],   # C
    [10, 20, 30, 40, 50, 55],# D
    [7, 8, 9, 4, 6, 10]    # E
])

# Step 2: NumPy built-in cov function (rowvar=True: rows = variables, ddof=1 for unbiased n-1)
cov_matrix = np.cov(data, rowvar=True, ddof=1)

# Print formatted 7-decimal output matching Sage
print("Sample Covariance Matrix (unbiased, ddof=1):")
## Sample Covariance Matrix (unbiased, ddof=1):
print(np.round(cov_matrix, 7))
## [[  3.5         3.          4.         32.5         0.4      ]
##  [  3.          8.6666667   0.4        25.3333333   2.4666667]
##  [  4.          0.4         6.4        38.          0.4      ]
##  [ 32.5        25.3333333  38.        304.1666667   1.3333333]
##  [  0.4         2.4666667   0.4         1.3333333   4.6666667]]
# Optional manual calculation to verify formula 1/(n-1)*X_centered @ X_centered.T
n_vars, n_samples = data.shape
data_centered = data - np.mean(data, axis=1, keepdims=True)
cov_manual = (1/(n_samples - 1)) * data_centered @ data_centered.T
print("\nManual Centered Matrix Calculation Match Check:")
## 
## Manual Centered Matrix Calculation Match Check:
print(np.allclose(cov_matrix, cov_manual))
## True