
【Spark】What is the difference between Input and Shuffle Read

news 2025/7/12 17:41:10

When tuning Spark, a reasonable target is to keep each task's Input + Shuffle Read at roughly 300-500 MB.
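The rule of thumb above can be turned into a quick calculation. The sketch below is only an illustration: the function names and the 400 MB default (the midpoint of the 300-500 MB range) are assumptions, not part of Spark. It takes a stage's total Input and Shuffle Read bytes, as reported in the Spark UI, and estimates how many tasks would bring each task near the target.

```python
import math

MB = 1024 * 1024  # bytes per mebibyte


def suggest_task_count(input_bytes, shuffle_read_bytes, target_mb=400):
    """Estimate a task count so each task reads about target_mb.

    target_mb=400 is an assumed midpoint of the 300-500 MB guideline.
    """
    total = input_bytes + shuffle_read_bytes
    if total <= 0:
        return 1
    # Round up so no task exceeds the target on average.
    return max(1, math.ceil(total / (target_mb * MB)))


def per_task_mb(input_bytes, shuffle_read_bytes, tasks):
    """Average Input + Shuffle Read per task, in MB."""
    return (input_bytes + shuffle_read_bytes) / tasks / MB


if __name__ == "__main__":
    # Hypothetical stage: no storage input, 200 GB of shuffle read.
    shuffle_read = 200 * 1024 * MB
    tasks = suggest_task_count(0, shuffle_read)
    print(tasks, per_task_mb(0, shuffle_read, tasks))
```

For a shuffle stage, a number like this might be applied through `spark.sql.shuffle.partitions`; for a stage that reads from storage, the split size (for example `spark.sql.files.maxPartitionBytes`) is the more likely knob. Either way, it is worth treating the result as a starting point and checking the per-task figures in the UI afterwards.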

The Spark UI is documented here: https://spark.apache.org/docs/3.0.1/web-ui.html

The relevant paragraph reads:

Input: Bytes read from storage in this stage
Output: Bytes written in storage in this stage
Shuffle read: Total shuffle bytes and records read, includes both data read locally and data read from remote executors
Shuffle write: Bytes and records written to disk in order to be read by a shuffle in a future stage

