
Python Training Camp - Day 22 - Titanic - Machine Learning from Disaster

Description


👋🛳️ Ahoy, welcome to Kaggle! You’re in the right place.

This is the legendary Titanic ML competition – the best, first challenge for you to dive into ML competitions and familiarize yourself with how the Kaggle platform works.

If you want to talk with other users about this competition, come join our Discord! We've got channels for competitions, job postings and career discussions, resources, and socializing with your fellow data scientists. Follow the link here: https://discord.gg/kaggle

The competition is simple: use machine learning to create a model that predicts which passengers survived the Titanic shipwreck.

Read on or watch the video below to explore more details. Once you’re ready to start competing, click the "Join Competition" button to create an account and gain access to the competition data. Then check out Alexis Cook’s Titanic Tutorial, which walks you step by step through making your first submission!

1. Train the model

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# 1. Load the training data
train_df = pd.read_csv('titanic/train.csv')  # adjust the path if train.csv lives elsewhere

# 2. Preprocessing
# Map sex to numbers: male -> 0, female -> 1
train_df['Sex'] = train_df['Sex'].map({'male': 0, 'female': 1})
# Fill missing Age and Fare values with the column median
train_df['Age'] = train_df['Age'].fillna(train_df['Age'].median())
train_df['Fare'] = train_df['Fare'].fillna(train_df['Fare'].median())
# Fill missing Embarked with 'S' (Southampton, the most common port), then one-hot encode it
train_df['Embarked'] = train_df['Embarked'].fillna('S')
embarked_dummies = pd.get_dummies(train_df['Embarked'], prefix='Embarked')
train_df = pd.concat([train_df, embarked_dummies], axis=1)

# 3. Select the feature columns (extend as needed)
feature_cols = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare',
                'Embarked_C', 'Embarked_Q', 'Embarked_S']
X = train_df[feature_cols]
y = train_df['Survived']

# 4. Train the model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)
print("Model training complete!")
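To see what each preprocessing step actually does, the same transformations can be run on a handful of rows. This is a minimal sketch: the `toy` DataFrame is made-up data that only borrows the train.csv column names (these are not real passengers), and `dtype=int` is passed to `pd.get_dummies` only so the one-hot columns show as 0/1 rather than True/False in recent pandas versions.

```python
import pandas as pd

# Hypothetical toy rows (NOT real Titanic passengers), same column names as train.csv
toy = pd.DataFrame({
    'Sex': ['male', 'female', 'female', 'male'],
    'Age': [22.0, None, 38.0, 35.0],
    'Fare': [7.25, 71.28, None, 8.05],
    'Embarked': ['S', 'C', None, 'Q'],
})

# Encode sex as 0/1
toy['Sex'] = toy['Sex'].map({'male': 0, 'female': 1})
# median() skips NaN, so Age is filled with the median of 22, 38, 35
toy['Age'] = toy['Age'].fillna(toy['Age'].median())
toy['Fare'] = toy['Fare'].fillna(toy['Fare'].median())
# Missing port -> 'S', then one column per port (sorted: C, Q, S)
toy['Embarked'] = toy['Embarked'].fillna('S')
dummies = pd.get_dummies(toy['Embarked'], prefix='Embarked', dtype=int)
toy = pd.concat([toy, dummies], axis=1)
print(toy)
```

The missing age becomes 35.0, the missing fare becomes 8.05 (the median of 7.25, 71.28 and 8.05), and the row with no port ends up with `Embarked_S = 1`.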

2. Load the test set and predict

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# ==========================
# 1. Load the training data and train the model
# ==========================
train_df = pd.read_csv('titanic/train.csv')
# Map sex to numbers
train_df['Sex'] = train_df['Sex'].map({'male': 0, 'female': 1})
# Handle missing values
train_df['Age'] = train_df['Age'].fillna(train_df['Age'].median())
train_df['Fare'] = train_df['Fare'].fillna(train_df['Fare'].median())
train_df['Embarked'] = train_df['Embarked'].fillna('S')
# One-hot encode Embarked
embarked_dummies = pd.get_dummies(train_df['Embarked'], prefix='Embarked')
train_df = pd.concat([train_df, embarked_dummies], axis=1)
# Select features
feature_cols = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare',
                'Embarked_C', 'Embarked_Q', 'Embarked_S']
X = train_df[feature_cols]
y = train_df['Survived']
# Train the model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)
print("✅ Model training complete")

# ==========================
# 2. Load the test data and predict
# ==========================
test_df = pd.read_csv('titanic/test.csv')
# Apply the same preprocessing
test_df['Sex'] = test_df['Sex'].map({'male': 0, 'female': 1})
# Fill with the training-set medians so both sets are treated consistently
test_df['Age'] = test_df['Age'].fillna(train_df['Age'].median())
test_df['Fare'] = test_df['Fare'].fillna(train_df['Fare'].median())
test_df['Embarked'] = test_df['Embarked'].fillna('S')
# One-hot encode Embarked
embarked_dummies_test = pd.get_dummies(test_df['Embarked'], prefix='Embarked')
# Make sure the test set also has all three columns (a category may be absent)
for col in ['Embarked_C', 'Embarked_Q', 'Embarked_S']:
    if col not in embarked_dummies_test:
        embarked_dummies_test[col] = 0
test_df = pd.concat([test_df, embarked_dummies_test], axis=1)
# Selecting feature_cols also puts the columns in the same order as in training
X_test = test_df[feature_cols]
# Predict
predictions = model.predict(X_test)

# ==========================
# 3. Write the submission file
# ==========================
submission = pd.DataFrame({
    'PassengerId': test_df['PassengerId'],
    'Survived': predictions,
})
submission.to_csv('submission.csv', index=False)
print("✅ Prediction complete, submission saved as submission.csv")
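Because the training and test sets go through identical steps, one option is to keep those steps in a single function so they cannot drift apart. The sketch below assumes a hypothetical helper named `preprocess` and swaps the manual `for` loop for `DataFrame.reindex(columns=..., fill_value=0)`, which adds any missing one-hot column as zeros and fixes the column order in one call. The two small DataFrames are made-up examples, not rows from the Kaggle files.

```python
import pandas as pd

FEATURE_COLS = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare',
                'Embarked_C', 'Embarked_Q', 'Embarked_S']


def preprocess(df, fill_values):
    """Hypothetical helper: apply the article's preprocessing to a copy of df.

    fill_values holds medians computed on the TRAINING data, so train and
    test rows with missing values are filled with the same numbers.
    """
    out = df.copy()
    out['Sex'] = out['Sex'].map({'male': 0, 'female': 1})
    out['Age'] = out['Age'].fillna(fill_values['Age'])
    out['Fare'] = out['Fare'].fillna(fill_values['Fare'])
    out['Embarked'] = out['Embarked'].fillna('S')
    dummies = pd.get_dummies(out['Embarked'], prefix='Embarked', dtype=int)
    out = pd.concat([out, dummies], axis=1)
    # reindex keeps only FEATURE_COLS, in that order; absent one-hot columns become 0
    return out.reindex(columns=FEATURE_COLS, fill_value=0)


# Made-up example rows (not from train.csv / test.csv)
train = pd.DataFrame({
    'Pclass': [3, 1, 3, 1], 'Sex': ['male', 'female', 'female', 'female'],
    'Age': [22.0, 38.0, 26.0, None], 'SibSp': [1, 1, 0, 1], 'Parch': [0, 0, 0, 0],
    'Fare': [7.25, 71.28, 7.93, 53.1], 'Embarked': ['S', 'C', 'S', None],
})
test = pd.DataFrame({
    'Pclass': [3, 2], 'Sex': ['male', 'female'], 'Age': [None, 47.0],
    'SibSp': [0, 1], 'Parch': [0, 0], 'Fare': [7.83, None], 'Embarked': ['Q', 'S'],
})

fill_values = {'Age': train['Age'].median(), 'Fare': train['Fare'].median()}
X_train = preprocess(train, fill_values)
X_test = preprocess(test, fill_values)
print(X_test)
```

Here the test rows contain no 'C' port, yet `X_test` still has an all-zero `Embarked_C` column in the right position. Since `reindex` keeps only the feature columns, `PassengerId` would still be read from the original test DataFrame when building the submission.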

3. Submit the code
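Before uploading submission.csv, a quick local check can catch common format mistakes. This is a hedged sketch with a hypothetical `check_submission` helper; it assumes the usual format for this competition (exactly the columns `PassengerId` and `Survived`, one row per passenger in test.csv, which has 418 rows, and predictions of 0 or 1). The competition's own instructions remain the authority on the exact rules.

```python
import pandas as pd


def check_submission(sub, expected_rows=418):
    """Hypothetical pre-upload check for a Titanic submission DataFrame.

    expected_rows defaults to 418, the number of passengers in test.csv.
    Returns a list of problems; an empty list means the basic checks passed.
    """
    problems = []
    if list(sub.columns) != ['PassengerId', 'Survived']:
        problems.append(f"unexpected columns: {list(sub.columns)}")
        return problems  # the remaining checks need these two columns
    if len(sub) != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {len(sub)}")
    if sub['PassengerId'].duplicated().any():
        problems.append("duplicate PassengerId values")
    if not sub['Survived'].isin([0, 1]).all():
        problems.append("Survived must contain only 0 or 1")
    return problems


# Made-up three-row example; a real file would use the default expected_rows
sub = pd.DataFrame({'PassengerId': [892, 893, 894], 'Survived': [0, 1, 0]})
print(check_submission(sub, expected_rows=3))  # → []
```

On the real file this would be `check_submission(pd.read_csv('submission.csv'))`.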

@浙大疏锦行 

http://www.lryc.cn/news/2404726.html
