
Machine Learning Notes [Week 1]

article 2025/8/18 20:56:22

I. Introduction to Machine Learning

What is machine learning?

Definition (Tom Mitchell):

“A computer program is said to learn from experience E with respect to some task T and performance measure P, if its performance on T, as measured by P, improves with experience E.”

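| Element | Examples |
|---|---|
| Task T | Detecting spam, recognizing faces, predicting house prices |
| Experience E | Historical email data, historical house-price data |
| Performance P | Classification accuracy, mean squared error (MSE) |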
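To make the performance measure P concrete, here is a minimal sketch of the two measures named in the table: classification accuracy and MSE. The function names `accuracy` and `mean_squared_error` and the sample labels and prices are hypothetical, chosen only for illustration.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true labels (a classification P)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def mean_squared_error(y_true, y_pred):
    """Average squared difference between predictions and targets (a regression P)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_pred - y_true) ** 2))

# Hypothetical spam labels: 1 = spam, 0 = not spam
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
# Hypothetical house prices (10k CNY) versus model predictions
print(mean_squared_error([15, 20, 30], [16, 19, 30]))
```

Three of the four spam predictions match, so accuracy is 0.75; the price errors are 1, -1 and 0, so the MSE is 2/3.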

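How it differs from traditional programming:

| Approach | Input | Model | Output |
|---|---|---|---|
| Traditional programming | Rules + data | Program designed by hand | Computed results |
| Machine learning | Data + known results (labels) | Model produced by a training algorithm | Predictions made by the model |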
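A small sketch of the contrast. On one side is a hand-written rule of thumb, `price_by_rule` (hypothetical, not from the source); on the other, a line whose slope and intercept are estimated from the sample house data used later in these notes. The estimation step here uses NumPy's `np.polyfit` least-squares fit as a stand-in for a learning algorithm; gradient descent, covered below, is another way to search for such a line.

```python
import numpy as np

# Traditional programming: a person writes the rule by hand
# (hypothetical rule of thumb: 0.3 x 10k CNY per square meter)
def price_by_rule(area):
    return 0.3 * area

# Machine learning: data plus known answers go in, the rule comes out
areas = np.array([50.0, 70.0, 100.0])   # inputs
prices = np.array([15.0, 20.0, 30.0])   # known results (labels)
slope, intercept = np.polyfit(areas, prices, 1)  # least-squares straight line

def price_by_model(area):
    return intercept + slope * area

print(price_by_rule(80.0), price_by_model(80.0))
```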

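II. Types of Machine Learning

1. Supervised Learning

- Each input x comes with a known output y
- Goal: learn a function f(x) ≈ y

Task types:

- Regression: the output is a continuous value (e.g., a house price)
- Classification: the output is a discrete category (e.g., spam or not spam)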

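2. Unsupervised Learning

- Only inputs x are available, with no output labels y
- Goal: discover structure within the data, e.g., through clustering or dimensionality reduction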
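As one illustration of clustering, here is a minimal sketch of k-means (a standard clustering algorithm that these notes do not name explicitly) on one-dimensional data. Only the inputs are used; there are no labels. The data, the function name `kmeans_1d`, and the naive initialization (the first k points as starting centers) are assumptions made for the example.

```python
import numpy as np

def kmeans_1d(x, k, iterations=10):
    """Minimal k-means on 1-D data: groups points using only the inputs x."""
    x = np.asarray(x, dtype=float)
    centers = x[:k].copy()  # naive initialization: the first k points
    for _ in range(iterations):
        # Assign each point to its nearest center
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # Move each center to the mean of the points assigned to it
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean()
    return labels, centers

areas = [50, 52, 55, 120, 125, 130]  # hypothetical house areas, no prices given
labels, centers = kmeans_1d(areas, k=2)
print(labels, centers)
```

On these six areas the algorithm separates the three small houses from the three large ones without ever seeing a price.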

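III. Linear Regression Model

What is linear regression?

Linear regression is a regression algorithm used to predict a continuous numeric output (e.g., a house price or a salary). It assumes that the output $y$ depends linearly on the input variable $x$.

Problem definition

Predict a continuous output y from an input variable x, for example predicting house prices.

Hypothesis function

The hypothesis function for univariate linear regression is:

$h_\theta(x) = \theta_0 + \theta_1 x$

where:

- $x$: input feature (e.g., house area)
- $y$: true label (e.g., house price)
- $h_\theta(x)$: the model's prediction
- $\theta_0$: bias term (intercept)
- $\theta_1$: weight (slope)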

In Python:

```python
def hypothesis(theta, x):
    # theta[0] is the intercept theta_0; theta[1] is the slope theta_1
    return theta[0] + theta[1] * x
```
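The same hypothesis can be evaluated for many examples at once by prepending a column of ones (the constant feature $x_0 = 1$) and taking a matrix product; this is the `X @ theta` form the later NumPy code relies on. A sketch, with hypothetical parameter values:

```python
import numpy as np

def hypothesis(theta, x):
    # Scalar form theta_0 + theta_1 * x; also works elementwise on arrays
    return theta[0] + theta[1] * x

def hypothesis_vectorized(theta, x):
    # Prepend x0 = 1 to every example so that h = X @ theta
    x = np.asarray(x, dtype=float)
    X = np.c_[np.ones(len(x)), x]   # design matrix, shape (m, 2)
    return X @ theta

theta = np.array([1.0, 0.3])        # hypothetical theta_0, theta_1
areas = np.array([50.0, 70.0, 100.0])
print(hypothesis(theta, areas))
print(hypothesis_vectorized(theta, areas))
```

Both calls return the same three predictions; the matrix form is what generalizes to many features later on.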

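IV. Cost Function

Why minimize a cost function?

Predictions will always differ somewhat from the true values, so we need a way to measure how good the predictions are.

The cost function measures how far the predicted values deviate from the actual values:

$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$

where $m$ is the number of training examples.

As a function of $\theta_0$ and $\theta_1$, $J$ is convex: its surface is bowl-shaped (a paraboloid), so any local minimum is also the global minimum.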

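Python implementation:

```python
import numpy as np

def compute_cost(X, y, theta):
    """Squared-error cost J(theta). X: (m, n); theta and y must match X @ theta."""
    m = len(y)
    predictions = X @ theta
    errors = predictions - y
    # Return a plain float (not a 1x1 array) so costs can be printed and plotted directly
    return float(np.sum(errors ** 2)) / (2 * m)
```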
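As a check on the formula: with $\theta = 0$ every prediction is 0, so on the sample house data used later (areas 50, 70, 100; prices 15, 20, 30) the cost is $J = (15^2 + 20^2 + 30^2)/(2 \cdot 3) = 1525/6 \approx 254.17$. The sketch below reproduces that number and spot-checks convexity along one segment: the cost at the midpoint should be at most the average of the costs at the two ends. It uses 1-D arrays for `y` and `theta`, and the second endpoint `b` is an arbitrary choice.

```python
import numpy as np

def compute_cost(X, y, theta):
    # J(theta) = 1/(2m) * sum of squared errors
    m = len(y)
    errors = X @ theta - y
    return float(errors @ errors) / (2 * m)

X = np.c_[np.ones(3), np.array([50.0, 70.0, 100.0])]  # sample areas with x0 = 1
y = np.array([15.0, 20.0, 30.0])                      # sample prices

# theta = 0 predicts 0 everywhere, so J = (15^2 + 20^2 + 30^2) / 6
print(compute_cost(X, y, np.zeros(2)))

# Convexity spot check: J at the midpoint of a segment <= mean of J at its ends
a = np.array([0.0, 0.0])
b = np.array([5.0, 0.5])   # arbitrary second point
j_mid = compute_cost(X, y, (a + b) / 2)
j_avg = (compute_cost(X, y, a) + compute_cost(X, y, b)) / 2
print(j_mid <= j_avg)
```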

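V. Gradient Descent

Why use gradient descent?

The cost function $J(\theta)$ is a quadratic function of the parameters $\theta$, and its graph is bowl-shaped. We want to find its lowest point (the optimal parameters).

The idea behind gradient descent:

- Start from an arbitrary initial value (e.g., $\theta_0=0$, $\theta_1=0$)
- Compute the slope at the current point (the partial derivatives)
- Update the parameters in the direction of steepest descent (the negative gradient)
- Repeat until convergence at the minimum (the cost barely changes between iterations)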
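The four steps can be sketched on a one-parameter toy function, $f(\theta) = (\theta - 3)^2$ with derivative $2(\theta - 3)$, chosen only because its minimum ($\theta = 3$) is known in advance. The helper name `gradient_descent_1d` and the tolerance `tol` used as the "cost barely changes" test are assumptions.

```python
def gradient_descent_1d(f, grad, theta, alpha, tol=1e-12, max_iters=10_000):
    """Minimize a one-parameter function f, given its derivative grad."""
    prev_cost = f(theta)
    for step in range(1, max_iters + 1):
        theta = theta - alpha * grad(theta)  # move against the derivative
        cost = f(theta)
        if abs(prev_cost - cost) < tol:      # cost barely changes: stop
            break
        prev_cost = cost
    return theta, step

def f(t):
    return (t - 3.0) ** 2

def grad(t):
    return 2.0 * (t - 3.0)

theta, steps = gradient_descent_1d(f, grad, theta=0.0, alpha=0.1)  # start at 0
print(theta, steps)
```

Each step here shrinks the distance to 3 by a factor of $1 - 2\alpha = 0.8$, so the result approaches 3 after a few dozen iterations.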

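How does gradient descent converge?

- If the learning rate $\alpha$ is too large, each step can overshoot the valley, causing oscillation or even divergence.
- If it is too small, gradient descent still converges but needs a very large number of iterations.
- In practice, try several values of $\alpha$ and plot $J(\theta)$ against the iteration number to see how quickly it converges.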
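To see both behaviors, here is a sketch that runs gradient descent on the sample house data with two learning rates and records $J(\theta)$ after each step (the list you would plot). The two values, `1e-4` and `1e-3`, are assumptions picked for this small, unscaled dataset: with these areas, the first keeps the cost decreasing while the second makes every step overshoot and the cost blows up.

```python
import numpy as np

X = np.array([[1.0, 50.0], [1.0, 70.0], [1.0, 100.0]])  # x0 = 1 plus area
y = np.array([15.0, 20.0, 30.0])
m = len(y)

def cost_per_iteration(alpha, iterations=50):
    """Run batch gradient descent from theta = 0 and return J after each step."""
    theta = np.zeros(2)
    costs = []
    for _ in range(iterations):
        theta = theta - alpha * (X.T @ (X @ theta - y)) / m
        errors = X @ theta - y
        costs.append(float(errors @ errors) / (2 * m))
    return costs

small = cost_per_iteration(alpha=1e-4)  # expected: steadily decreasing
large = cost_per_iteration(alpha=1e-3)  # expected: growing without bound
print(small[0], small[-1])
print(large[0], large[-1])
```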

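Why include the $\theta_0$ (bias) term?

Without the bias term $\theta_0$, the model is forced to pass through the origin, which does not hold in general.

For example:

A house-price line does not have to predict a price of 0 for an area of 0 m²; the trend in real data need not pass through the origin.
So the model needs the ability to shift the whole line up or down, and that is exactly what the bias term provides.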
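A quick sketch of the effect, using hypothetical prices generated as $y = 10 + 0.2x$ (a fixed base price plus a per-m² cost). To get the best fit in each case directly, it uses NumPy's least-squares solver `np.linalg.lstsq` rather than gradient descent. The fit forced through the origin leaves a sizable error, while the fit with a bias recovers $\theta_0 \approx 10$, $\theta_1 \approx 0.2$.

```python
import numpy as np

x = np.array([50.0, 70.0, 100.0])
y = 10.0 + 0.2 * x   # hypothetical prices with a fixed base of 10

# Without bias: h(x) = theta_1 * x, forced through the origin
X_no_bias = x[:, None]
theta_nb, *_ = np.linalg.lstsq(X_no_bias, y, rcond=None)

# With bias: h(x) = theta_0 + theta_1 * x
X_bias = np.c_[np.ones_like(x), x]
theta_b, *_ = np.linalg.lstsq(X_bias, y, rcond=None)

sse_nb = float(np.sum((X_no_bias @ theta_nb - y) ** 2))  # squared error, no bias
sse_b = float(np.sum((X_bias @ theta_b - y) ** 2))       # squared error, with bias
print(theta_b)
print(sse_nb, sse_b)
```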

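Goal: minimize the cost function to find the optimal parameters $\theta$.

Gradient update rule:

$\theta_j := \theta_j - \alpha \cdot \frac{\partial}{\partial \theta_j} J(\theta)$

- $\alpha$: the learning rate, which controls the size of each update step
- Too small → slow convergence; too large → divergence

The partial derivative gives the direction in which the cost increases with respect to that parameter; subtracting it moves $\theta_j$ in the direction that makes the cost smaller.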
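Where the specific update rules for linear regression come from: apply the chain rule to $J(\theta)$, writing $x_0^{(i)} = 1$ and $x_1^{(i)} = x^{(i)}$ so that $\partial h_\theta(x^{(i)}) / \partial \theta_j = x_j^{(i)}$:

$$
\frac{\partial}{\partial \theta_j} J(\theta)
= \frac{1}{2m} \sum_{i=1}^{m} 2\,\big(h_\theta(x^{(i)}) - y^{(i)}\big)\,\frac{\partial h_\theta(x^{(i)})}{\partial \theta_j}
= \frac{1}{m} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_j^{(i)}
$$

Setting $j = 0$ (where $x_0^{(i)} = 1$) and $j = 1$ gives the two rules for $\theta_0$ and $\theta_1$; the factor $\frac{1}{2}$ in $J$ is there so that the 2 from differentiating the square cancels.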

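For univariate linear regression, the updates are:

$\theta_0 := \theta_0 - \alpha \cdot \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})$

$\theta_1 := \theta_1 - \alpha \cdot \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x^{(i)}$

Both parameters are updated simultaneously: compute both sums with the current $\theta$ before assigning either new value.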
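To check that these formulas really are the partial derivatives of $J$, a sketch can compare them with central finite differences at an arbitrary point and then perform one simultaneous update. The helper names (`J`, `analytic_grad`, `numeric_grad`), the test point, and `eps` are assumptions.

```python
import numpy as np

x = np.array([50.0, 70.0, 100.0])
y = np.array([15.0, 20.0, 30.0])
m = len(y)

def J(t0, t1):
    # Squared-error cost for h(x) = t0 + t1 * x
    return np.sum((t0 + t1 * x - y) ** 2) / (2 * m)

def analytic_grad(t0, t1):
    # The two sums from the update rules above
    err = t0 + t1 * x - y
    return np.sum(err) / m, np.sum(err * x) / m

def numeric_grad(t0, t1, eps=1e-6):
    # Central differences: (J(t + eps) - J(t - eps)) / (2 * eps)
    g0 = (J(t0 + eps, t1) - J(t0 - eps, t1)) / (2 * eps)
    g1 = (J(t0, t1 + eps) - J(t0, t1 - eps)) / (2 * eps)
    return g0, g1

t0, t1 = 1.0, 0.1   # arbitrary test point
print(analytic_grad(t0, t1))
print(numeric_grad(t0, t1))

# One simultaneous update: both gradients use the old (t0, t1)
alpha = 1e-4
g0, g1 = analytic_grad(t0, t1)
t0, t1 = t0 - alpha * g0, t1 - alpha * g1
print(J(t0, t1))
```

The tuple assignment on the update line is what makes it simultaneous: Python evaluates the whole right-hand side before rebinding either name.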

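Python implementation (vectorized; `X` includes a leading column of ones):

```python
def gradient_descent(X, y, theta, alpha, iterations):
    """Batch gradient descent; uses compute_cost from above to record J per iteration."""
    m = len(y)
    cost_history = []
    for _ in range(iterations):
        predictions = X @ theta
        errors = predictions - y
        # Vectorized form of both update rules: (1/m) * X^T (X theta - y)
        gradients = (1 / m) * (X.T @ errors)
        # Update every theta_j at once (simultaneous update);
        # rebinding avoids modifying the caller's array in place
        theta = theta - alpha * gradients
        cost_history.append(compute_cost(X, y, theta))
    return theta, cost_history
```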

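Example: house-price prediction (data, training, visualization)

Sample data:

| Area x (m²) | Price y (10,000 CNY) |
|---|---|
| 50 | 15 |
| 70 | 20 |
| 100 | 30 |

Full Python workflow: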

```python
import numpy as np
import matplotlib.pyplot as plt

# Relies on compute_cost and gradient_descent as defined above

# Step 1: define the data
X_raw = np.array([50, 70, 100])
y = np.array([15, 20, 30])
m = len(y)

# Step 2: prepend x0 = 1 (the constant feature multiplied by theta_0)
X = np.c_[np.ones(m), X_raw]  # shape = (m, 2)
y = y.reshape(m, 1)           # column vector, shape = (m, 1)
theta = np.zeros((2, 1))

# Step 3: set the learning rate and number of iterations
alpha = 0.0001
iterations = 1000

# Step 4: train with gradient descent
theta, cost_history = gradient_descent(X, y, theta, alpha, iterations)

# Step 5: print the results
print("Learned theta:", theta.ravel())
print("Final cost:", compute_cost(X, y, theta))

# Step 6: plot the fitted line over the training data
plt.scatter(X_raw, y, color='red', label='Training data')
plt.plot(X_raw, X @ theta, label='Linear regression')
plt.xlabel("Area (sqm)")
plt.ylabel("Price (10k)")
plt.legend()
plt.grid(True)
plt.title("Linear Regression Fit")
plt.show()

# Step 7: plot how the cost decreases during gradient descent
# (np.ravel works whether each stored cost is a float or a 1x1 array)
plt.plot(range(iterations), np.ravel(cost_history))
plt.xlabel("Iterations")
plt.ylabel("Cost J(θ)")
plt.title("Gradient Descent Cost Convergence")
plt.grid(True)
plt.show()
```
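As a sanity check on the walkthrough, a sketch can compare the gradient-descent result with the exact least-squares line. Here `np.polyfit` supplies the reference (an addition, not part of the original workflow), and the training loop repeats the same settings using 1-D arrays. Because the area values are large, a step size small enough to keep $\theta_1$ stable is too small to move $\theta_0$ much in 1000 iterations. Expect $\theta_0$ to stay near 0 and the final cost to sit slightly above the least-squares minimum, whose intercept is about -0.53.

```python
import numpy as np

X_raw = np.array([50.0, 70.0, 100.0])
y = np.array([15.0, 20.0, 30.0])
X = np.c_[np.ones(3), X_raw]

def cost(theta):
    # Squared-error cost J(theta) on the sample data
    e = X @ theta - y
    return float(e @ e) / (2 * len(y))

# Same setup as the walkthrough: theta = 0, alpha = 0.0001, 1000 iterations
theta = np.zeros(2)
for _ in range(1000):
    theta = theta - 0.0001 * (X.T @ (X @ theta - y)) / len(y)

# Least-squares reference: for deg=1, np.polyfit returns [slope, intercept]
slope, intercept = np.polyfit(X_raw, y, 1)
theta_ref = np.array([intercept, slope])

print("gradient descent:", theta, cost(theta))
print("least squares:   ", theta_ref, cost(theta_ref))
```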

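VI. Linear Algebra Review (preparation for multivariate linear regression)

| Concept | Description |
|---|---|
| Vector | A one-dimensional array, e.g., $x = [x_1, x_2, \dots, x_n]^T$ |
| Matrix | A two-dimensional array, e.g., $X = [x^{(1)}; x^{(2)}; \dots]$, with one training example per row |
| Transpose | Swaps rows and columns: `X.T` |
| Matrix multiplication | `A @ B` is the matrix product; the shapes must be compatible: if A is m × n, B must have n rows (e.g., n × 1) |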
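A short NumPy sketch of the shape rules in the table; the specific vector and matrix values are arbitrary examples.

```python
import numpy as np

x = np.array([[1.0], [2.0], [3.0]])   # column vector, shape (3, 1)
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])       # matrix, shape (2, 3)

print(x.T.shape)      # transpose swaps rows and columns: (1, 3)
print((A @ x).shape)  # (2 x 3) @ (3 x 1) -> (2 x 1)
print(A @ x)          # [[1*1 + 2*2 + 3*3], [4*1 + 5*2 + 6*3]]

try:
    A @ A             # (2 x 3) @ (2 x 3): inner dimensions 3 and 2 differ
except ValueError as err:
    print("shape mismatch:", err)
```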