
Python Training Camp Check-in, Day 55

news 2025/6/20 20:13:47

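Introduction to Sequence Prediction Tasks

Key Points Review

Introduction to sequence prediction
1. Single-step prediction
2. Two approaches to multi-step prediction
Processing sequence data: sliding windows
Approach to multi-input, multi-output tasks
Weaknesses of classical machine learning on sequence tasks, using random forest as an example

Homework: hand-build a similar dataset (for example, cos x data) and observe how different machine learning models differ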

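Introduction to sequence prediction
- Single-step prediction: predict the value at the next time step from historical data, like forecasting tomorrow's temperature from the weather of the preceding days.
- Two approaches to multi-step prediction:
  - Recursive prediction: apply a one-step model repeatedly, feeding each prediction back in as input for the next step. This is like a multi-day forecast in which each day's prediction builds on the previous day's prediction, so errors can accumulate over the horizon.
  - Direct prediction: the model outputs the values for several future time steps at once, like forecasting the whole coming week in a single pass.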
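
The recursive strategy above can be sketched as follows. This is a minimal illustration, not the course's reference code: the noise-free cosine series, the `LinearRegression` model, and the names `window`, `horizon`, and `buffer` are all assumptions chosen for the example. A window of 2 is enough here only because a cosine sampled on a uniform grid obeys an exact two-term linear recurrence; real data usually needs a longer window.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data (assumption): a noise-free cosine sampled on a uniform grid.
series = np.cos(np.linspace(0, 20, 200))
window, horizon = 2, 5
history, future = series[:-horizon], series[-horizon:]

# One-step training set: `window` past values -> the next value.
X = np.array([history[i:i + window] for i in range(len(history) - window)])
y = history[window:]
one_step_model = LinearRegression().fit(X, y)

# Recursive forecasting: predict one step, append it, slide the window, repeat.
buffer = list(history[-window:])
recursive_pred = []
for _ in range(horizon):
    model_input = np.array(buffer[-window:]).reshape(1, -1)
    next_value = one_step_model.predict(model_input)[0]
    recursive_pred.append(next_value)
    buffer.append(next_value)  # the prediction becomes an input for the next step
recursive_pred = np.array(recursive_pred)

print("max abs error over the horizon:", np.max(np.abs(recursive_pred - future)))
```

On noisy data the same loop still runs, but each step inherits the error of the previous one, which is the main trade-off of the recursive strategy.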
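Processing sequence data: sliding windows
- Sliding window: slice the series into overlapping windows, each holding a fixed number of past data points, and use every window to predict the next one or several points. Picture a fixed-size frame sliding along the data; each prediction only sees what is inside the frame.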
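
As a small illustration of the idea, the sketch below builds the windows with NumPy's `sliding_window_view` (available since NumPy 1.20). The integer toy series and the window size of 3 are assumptions made so the result is easy to read by eye.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

series = np.arange(10, dtype=float)  # 0, 1, ..., 9 stands in for a real series
window = 3

# Take every slice of length window + 1: the first `window` values are the
# model input, the last value is the target to predict.
slices = sliding_window_view(series, window + 1)
X, y = slices[:, :-1], slices[:, -1]

print("inputs shape:", X.shape, "targets shape:", y.shape)
print("first window:", X[0], "-> target:", y[0])
```

Each row of `X` overlaps the previous row by `window - 1` points, which is why a series of length n yields n - window samples rather than n / window.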
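Approach to multi-input, multi-output tasks
- Multi-input, multi-output: the model takes several past data points as input and outputs several future data points, like using several days of weather data to predict the weather for the next few days.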
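
One way to set this up, sketched under assumptions: build a 2-D target matrix with one column per future step and pass it to a regressor that accepts multi-output targets. scikit-learn's `RandomForestRegressor` does; the toy cosine series and the values of `window` and `horizon` are illustrative choices. Holding out only the last sample is a shape demonstration, not a proper evaluation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

series = np.cos(np.linspace(0, 20, 200))  # toy series (assumption)
window, horizon = 10, 5

# Each sample maps `window` inputs to `horizon` outputs.
n_samples = len(series) - window - horizon + 1
X = np.array([series[i:i + window] for i in range(n_samples)])
Y = np.array([series[i + window:i + window + horizon] for i in range(n_samples)])

# Hold out the last sample so the forecast can be compared with known values.
X_train, Y_train = X[:-1], Y[:-1]
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, Y_train)

forecast = model.predict(X[-1:])[0]  # a single call returns all `horizon` steps
print("input shape :", X_train.shape)
print("target shape:", Y_train.shape)
print("forecast    :", forecast)
print("actual      :", Y[-1])
```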
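Weaknesses of classical machine learning on sequence tasks, using random forest as an example
- Weakness: classical models such as random forest treat each window as a flat feature vector with no built-in notion of order, so they struggle to capture long-range dependencies and the temporal dynamics of a series. It is like predicting the next frame of a video from still images while ignoring the continuity between frames.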
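
One concrete symptom, which the notes above do not spell out, is extrapolation: a random forest predicts by averaging training targets, so it cannot output a value above the largest target it saw during training. The sketch below shows this on a pure linear trend; the series, the window size, and the linear-regression comparison are assumptions chosen to isolate the effect.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

t = np.linspace(0, 100, 500)
series = 0.1 * t  # pure upward trend, no noise (assumption)
window = 10

X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]

split = int(len(X) * 0.8)  # chronological split: the test set lies in the future
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)
lin = LinearRegression().fit(X_train, y_train)
rf_pred, lin_pred = rf.predict(X_test), lin.predict(X_test)

def rmse(actual, predicted):
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

print("largest training target         :", y_train.max())
print("largest random forest prediction:", rf_pred.max())
print("random forest test RMSE         :", rmse(y_test, rf_pred))
print("linear regression test RMSE     :", rmse(y_test, lin_pred))
```

The forest's predictions flatten out at the top of the training range while the true series keeps rising, so a trend in the data needs to be removed (for example by differencing) before a tree-based model can follow it.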

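Homework

Hand-build a similar dataset (for example, cos x data) and observe how different machine learning models differ.

Build the dataset: create a time series based on the cos x function to mimic real-world sequence data.
Compare models: run sequence prediction with different models (for example linear regression, random forest, and LSTM) and compare their prediction quality.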
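
A possible starting point for the homework is sketched below. It is one reading of the task, not an official solution: the noise level, window size, and the choice of k-nearest neighbors as a third model are assumptions, and the LSTM is left out to keep the example within scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
x = np.linspace(0, 50, 1000)
series = np.cos(x) + rng.normal(0, 0.1, size=x.size)  # cos x plus mild noise

window = 20
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]

split = int(len(X) * 0.8)  # chronological split, no shuffling
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

models = {
    "linear regression": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "k-nearest neighbors": KNeighborsRegressor(n_neighbors=5),
}

# Fit each model on the same windows and record its one-step test RMSE.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = float(np.sqrt(np.mean((y_test - pred) ** 2)))
    print(f"{name}: test RMSE = {results[name]:.4f}")
```

Adding a trend term to `series` or increasing the noise is a quick way to see where the models start to diverge.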

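Explained by analogy

Sequence prediction: like predicting future weather from the weather of the past few days. Single-step prediction forecasts tomorrow; multi-step prediction forecasts the coming week.
Sliding window: like looking at the past few days of weather through a fixed observation window and predicting from what is visible.
Multi-input, multi-output: like using the past week of weather data to predict the next three days directly.
Weakness of classical machine learning: when predicting the weather, classical models such as random forest cannot account for how the weather evolves over time as well as dedicated time-series models can.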

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
import lightgbm as lgb

# Set the random seed for reproducibility
np.random.seed(42)

# Generate a synthetic time series: cosine + linear trend + Gaussian noise
x = np.linspace(0, 100, 1000)
y = np.cos(x) + 0.1 * x + np.random.normal(0, 0.5, 1000)

# Preprocessing: scale the series to [0, 1].
# Note: fitting the scaler on the full series lets test-set statistics leak
# into training; in a real pipeline, fit it on the training portion only.
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_y = scaler.fit_transform(y.reshape(-1, 1)).flatten()

# Build sliding-window samples: seq_length past values -> the next value
def create_sequences(data, seq_length):
    X, targets = [], []
    for i in range(len(data) - seq_length):
        X.append(data[i:i + seq_length])
        targets.append(data[i + seq_length])
    return np.array(X), np.array(targets)

seq_length = 30
# Store the targets under a separate name so the original series y is not
# overwritten; the plotting code below needs y at its full length.
X, y_seq = create_sequences(scaled_y, seq_length)

# Chronological train/test split (no shuffling)
train_size = int(len(X) * 0.8)
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y_seq[:train_size], y_seq[train_size:]

# Shape the inputs as (n_samples, n_features) for the tabular models
n_samples_train = X_train.shape[0]
n_samples_test = X_test.shape[0]
X_train_rf = X_train.reshape(n_samples_train, -1)
X_test_rf = X_test.reshape(n_samples_test, -1)

# Train the random forest model
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model.fit(X_train_rf, y_train)
train_predict_rf = rf_model.predict(X_train_rf)
test_predict_rf = rf_model.predict(X_test_rf)

# Train the LightGBM model
lgb_model = lgb.LGBMRegressor(random_state=42)
lgb_model.fit(X_train_rf, y_train)
train_predict_lgb = lgb_model.predict(X_train_rf)
test_predict_lgb = lgb_model.predict(X_test_rf)

# Undo the min-max scaling on predictions and targets
train_predict_rf = scaler.inverse_transform(train_predict_rf.reshape(-1, 1))
test_predict_rf = scaler.inverse_transform(test_predict_rf.reshape(-1, 1))
train_predict_lgb = scaler.inverse_transform(train_predict_lgb.reshape(-1, 1))
test_predict_lgb = scaler.inverse_transform(test_predict_lgb.reshape(-1, 1))

y_train_orig = scaler.inverse_transform(y_train.reshape(-1, 1))
y_test_orig = scaler.inverse_transform(y_test.reshape(-1, 1))

# Compute RMSE on the original scale
rf_train_rmse = np.sqrt(mean_squared_error(y_train_orig, train_predict_rf))
rf_test_rmse = np.sqrt(mean_squared_error(y_test_orig, test_predict_rf))
lgb_train_rmse = np.sqrt(mean_squared_error(y_train_orig, train_predict_lgb))
lgb_test_rmse = np.sqrt(mean_squared_error(y_test_orig, test_predict_lgb))

# Visualize the results
plt.figure(figsize=(15, 7))
plt.plot(y, label='Original data', color='gray', alpha=0.5)

# Random forest: place predictions at the time steps they refer to
train_predict_plot_rf = np.empty_like(y)
train_predict_plot_rf[:] = np.nan
train_predict_plot_rf[seq_length:len(train_predict_rf) + seq_length] = train_predict_rf.flatten()
test_predict_plot_rf = np.empty_like(y)
test_predict_plot_rf[:] = np.nan
test_predict_plot_rf[len(train_predict_rf) + seq_length:] = test_predict_rf.flatten()

# LightGBM: same alignment
train_predict_plot_lgb = np.empty_like(y)
train_predict_plot_lgb[:] = np.nan
train_predict_plot_lgb[seq_length:len(train_predict_lgb) + seq_length] = train_predict_lgb.flatten()
test_predict_plot_lgb = np.empty_like(y)
test_predict_plot_lgb[:] = np.nan
test_predict_plot_lgb[len(train_predict_lgb) + seq_length:] = test_predict_lgb.flatten()

plt.plot(train_predict_plot_rf, label='Random forest, training predictions', color='blue', linestyle='--')
plt.plot(test_predict_plot_rf, label='Random forest, test predictions', color='red', linestyle='--')
plt.plot(train_predict_plot_lgb, label='LightGBM, training predictions', color='green', linestyle=':')
plt.plot(test_predict_plot_lgb, label='LightGBM, test predictions', color='orange', linestyle=':')
plt.title('Time series prediction comparison')
plt.xlabel('Time step')
plt.ylabel('Value')
plt.legend()
plt.grid(True)
plt.show()

print(f"Random forest training RMSE: {rf_train_rmse:.4f}")
print(f"Random forest test RMSE: {rf_test_rmse:.4f}")
print(f"LightGBM training RMSE: {lgb_train_rmse:.4f}")
print(f"LightGBM test RMSE: {lgb_test_rmse:.4f}")

@浙大疏锦行


http://www.lryc.cn/news/572548.html