当前位置: 首页 > news >正文

24Hibench

1. Hibench

官网

​ HiBench is a big data benchmark suite that helps evaluate different big data frameworks in terms of speed, throughput and system resource utilizations. It contains a set of Hadoop, Spark and streaming workloads, including Sort, WordCount, TeraSort, Repartition, Sleep, SQL, PageRank, Nutch indexing, Bayes, Kmeans, NWeight and enhanced DFSIO, etc. It also contains several streaming workloads for Spark Streaming, Flink, Storm and Gearpump.

1.1 workloads

There are totally 29 workloads in HiBench. The workloads are divided into 6 categories which are micro, ml(machine learning), sql, graph, websearch and streaming.

1.2 install maven

首先需要安装maven,并配好环境安装教程

mkdir repo
cd conf
vim settings.xml

修改仓库地址

<localRepository>/opt/module/maven/apache-maven-3.8.6/repo</localRepository>

阿里云镜像文件中已经有了,注释掉其他mirror

  <mirror><id>alimaven</id><name>aliyun maven</name><url>http://maven.aliyun.com/nexus/content/groups/public/</url><mirrorOf>central</mirrorOf>
</mirror>

1.3 bulid

下载zip文件,上传解压后的文件 不要使用7.11 会出现版本问题

参照文档中的,构建HiBench项目,我使用的是全部安装:

#ALL
mvn -Dspark=3.1 -Dscala=2.12 clean package
#SPARK
mvn -Psparkbench -Dspark=3.1 -Dscala=2.12 clean package

image-20221113191059611

如果出现以下错误:

image-20221113171241273

原因是maven没有安装好,没有设置好镜像以及安装仓库,详情见安装教程

build成功

image-20221114134337141

第二次

image-20221214105422318

1.4 configure

hadoop.conf

image-20221122101805886

spark.conf

image-20221122101910798

Input data size

image-20230408154837846

if you chose a real large data size ,you may find the errors:

image-20230608165007024

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

you need to modify the mapred-site.xml, and add the context:

<property><name>mapred.task.timeout</name><value>800000</value><final>true</final>
</property>
cluster mode
vim /opt/module/hibench/HiBench-master/HiBench-master/bin/functions/workload_functions.sh# 修改run_spark_job 方法

image-20230702113310572

1.5 hadoop example

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/wordcount/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/wordcount/hadoop/run.sh

官网地址

运行成功

image-20221122121433468

image-20221122121637126

更详细的介绍

/opt/module/Hibench/HiBench-master/report/wordcount/hadoop

1.6 spark example

准备输入数据

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/wordcount/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/terasort/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/sort/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/kmeans/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/bayes/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/lr/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/websearch/pagerank/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/graph/nweight/prepare/prepare.sh

控制台输出

image-20221122102053983

yarn

image-20221122102505411

进入HDFS查看准备的输入数据

image-20221122102227590

准备命令

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/wordcount/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/terasort/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/sort/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/kmeans/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/bayes/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/lr/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/websearch/pagerank/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/graph/nweight/prepare/prepare.sh

注意jar文件夹中不要包含其他备用的jar包

运行命令

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/wordcount/spark/run.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/terasort/spark/run.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/sort/spark/run.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/kmeans/spark/run.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/bayes/spark/run.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/lr/spark/run.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/websearch/pagerank/spark/run.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/graph/nweight/spark/run.sh

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

成功!

image-20221214112321741

结果

image-20230610145858128

trouble

网络配置问题

所有任务在yarn上都用的是内网IP

image-20230608180507018

image-20230609113252954

Permission denied

# 进入bin目录
chmod -R +x ./bin/

multi-job

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

Parsing conf: /opt/module/hibench/HiBench-master/HiBench-master/conf/hibench.conf
Parsing conf: /opt/module/hibench/HiBench-master/HiBench-master/conf/spark.conf
Parsing conf: /opt/module/hibench/HiBench-master/HiBench-master/conf/workloads/websearch/pagerank.conf
probe sleep jar: /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar
ERROR, execute cmd: '( /opt/module/hadoop-3.1.3/bin/yarn node -list 2> /dev/null | grep RUNNING )' timedout.STDOUT:STDERR:Please check!
Traceback (most recent call last):File "/opt/module/hibench/HiBench-master/HiBench-master/bin/functions/load_config.py", line 685, in <module>load_config(conf_root, workload_configFile, workload_folder, patching_config)File "/opt/module/hibench/HiBench-master/HiBench-master/bin/functions/load_config.py", line 217, in load_configgenerate_optional_value()File "/opt/module/hibench/HiBench-master/HiBench-master/bin/functions/load_config.py", line 613, in generate_optional_valueprobe_masters_slaves_hostnames()File "/opt/module/hibench/HiBench-master/HiBench-master/bin/functions/load_config.py", line 549, in probe_masters_slaves_hostnamesprobe_masters_slaves_by_Yarn()File "/opt/module/hibench/HiBench-master/HiBench-master/bin/functions/load_config.py", line 500, in probe_masters_slaves_by_Yarnassert 0, "Get workers from yarn-site.xml page failed, reason:%s\nplease set `hibench.masters.hostnames` and `hibench.slaves.hostnames` manually" % e
AssertionError: Get workers from yarn-site.xml page failed, reason:( /opt/module/hadoop-3.1.3/bin/yarn node -list 2> /dev/null | grep RUNNING ) executed timedout for 5 seconds
please set `hibench.masters.hostnames` and `hibench.slaves.hostnames` manually
start ScalaSparkPagerank bench
/opt/module/hibench/HiBench-master/HiBench-master/bin/functions/workload_functions.sh: line 38: .: filename argument required
.: usage: . filename [arguments]
/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/websearch/pagerank/spark/run.sh: line 26: OUTPUT_HDFS: unbound variable

原因可能是系统负载过高导致的响应迟钝

image-20230710195619538

2.records

parallelism = 18

有input的stage的任务数是由数据数据的大小决定的,spark.default.parallelism决定的是shuffle后的stage的任务数

LR

内存不够

http://192.168.10.102:3000/d/e9e40733-bb3a-42c8-8704-38ec27cbee3f/spark-perf-dashboard-v04-custom?from=1692258304054&to=1692264023063&orgId=1&var-UserName=jaken&var-ApplicationId=app-20230817154625-0003&var-groupbyInterval=1s

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

image-20230710222144197

http://192.168.10.102:18080/history/app-20230817154625-0003/stages/

image-20230817182124188

image-20230823160549361

Bayes

http://192.168.10.102:3000/d/e9e40733-bb3a-42c8-8704-38ec27cbee3f/spark-perf-dashboard-v04-custom?from=1692265076383&to=1692265140060&orgId=1&var-UserName=jaken&var-ApplicationId=app-20230817173746-0004&var-groupbyInterval=1s

image-20230817182326359

http://192.168.10.102:18080/history/app-20230817173746-0004/jobs/

image-20230817182448207

image-20230817182410877

image-20230823160616907

NWeightGraphX

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

image-20230710222031686

ScalaPageRank

http://192.168.10.102:3000/d/e9e40733-bb3a-42c8-8704-38ec27cbee3f/spark-perf-dashboard-v04-custom?from=1692277885573&to=1692278062264&orgId=1&var-UserName=jaken&var-ApplicationId=app-20230817211121-0012&var-groupbyInterval=1s

image-20230817212034137

http://192.168.10.102:18080/history/app-20230817211121-0012/stages/

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

distinct

image-20230817211940302

flatMap

image-20230817211756998

DenseKMeans

http://192.168.10.102:3000/d/e9e40733-bb3a-42c8-8704-38ec27cbee3f/spark-perf-dashboard-v04-custom?from=1692273890319&to=1692274051643&orgId=1&var-UserName=jaken&var-ApplicationId=app-20230817200442-0009&var-groupbyInterval=1s

image-20230817203320695

http://192.168.10.102:18080/history/app-20230817200442-0009/stages/

image-20230817203354123

image-20230823160320490

map

image-20230817210720121

collect

image-20230817210901877

image-20230817210942909

sort

http://192.168.10.102:3000/d/e9e40733-bb3a-42c8-8704-38ec27cbee3f/spark-perf-dashboard-v04-custom?from=1692276453862&to=1692276644236&orgId=1&var-UserName=jaken&var-ApplicationId=app-20230817204743-0011&var-groupbyInterval=1s

image-20230814105408629

http://192.168.10.102:18080/history/app-20230817204743-0011/stages/

image-20230817205354207

map

image-20230817205420719

reduce

image-20230817205507056

TeraSort

http://192.168.10.102:3000/d/e9e40733-bb3a-42c8-8704-38ec27cbee3f/spark-perf-dashboard-v04-custom?from=1692275803257&to=1692276079162&orgId=1&var-UserName=jaken&var-ApplicationId=app-20230817203719-0010&var-groupbyInterval=1s

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

http://192.168.10.102:18080/history/app-20230817203719-0010/stages/

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

map

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

reduce

image-20230817205732605

WordCount

http://192.168.10.102:3000/d/e9e40733-bb3a-42c8-8704-38ec27cbee3f/spark-perf-dashboard-v04-custom?from=1691983226006&to=1691983545077&orgId=1&var-UserName=jaken&var-ApplicationId=app-20230814112024-0002&var-groupbyInterval=1s

image-20230814112936597

parallelism=20

image-20230817210134565

http://192.168.10.102:18080/history/app-20230814112024-0002/jobs/

map

image-20230814113241553

reduce

image-20230814113216632

hdfs维护

hadoop fs -rm -r -skipTrash /hibench_test/HiBench/

hadoop dfsadmin -safemode leave

var-ApplicationId=app-20230814112024-0002&var-groupbyInterval=1s

[外链图片转存中…(img-eGnFeoml-1696143711685)]

parallelism=20

[外链图片转存中…(img-tqMq87MT-1696143711685)]

http://192.168.10.102:18080/history/app-20230814112024-0002/jobs/

map

[外链图片转存中…(img-ckoZlP4I-1696143711686)]

reduce

[外链图片转存中…(img-yA8CbUjp-1696143711686)]

hdfs维护

hadoop fs -rm -r -skipTrash /hibench_test/HiBench/

hadoop dfsadmin -safemode leave

http://www.lryc.cn/news/180167.html

相关文章:

  • VC++父进程交互式操作子进程标准输入输出
  • 一步一招,教你如何制作出成功的优惠促销微传单
  • 27、Flink 的SQL之SELECT (Pattern Recognition 模式检测)介绍及详细示例(7)
  • Git使用【上】
  • flink的序列化基准测试
  • Error: node: unknown or unsupported macOS version: :dunno 错误解决
  • 嵌入式Linux应用开发-基础知识-第十八章系统对中断的处理②
  • Kolmogorov-Smirnov正态性检验
  • BI神器Power Query(25)-- 使用PQ实现表格多列转换(1/3)
  • windows系统一键开启和关闭虚拟化
  • NSSCTF做题(5)
  • java基础题——二维数组的基本应用
  • Leetcode 2119.反转两次的数字
  • BI神器Power Query(27)-- 使用PQ实现表格多列转换(3/3)
  • VUE3照本宣科——认识VUE3
  • 《计算机视觉中的多视图几何》笔记(12)
  • TFT LCD刷新原理及LCD时序参数总结(LCD时序,写的挺好)
  • 基于Java的电影院购票系统设计与实现(源码+lw+部署文档+讲解等)
  • Linux基础指令(六)
  • Anderson-Darling正态性检验【重要统计工具】
  • Ubuntu基于Docker快速配置GDAL的Python、C++环境
  • <C++> 哈希表模拟实现STL_unordered_set/map
  • 【数据结构与算法】通过双向链表和HashMap实现LRU缓存 详解
  • MySQL的内置函数
  • 数据结构与算法-(7)---栈的应用-(3)表达式转换
  • Lilliefors正态性检验(一种非参数统计方法)
  • 【云原生】配置Kubernetes CronJob自动备份MySQL数据库(单机版)
  • 基于PSO算法的功率角摆动曲线优化研究(Matlab代码实现)
  • 数论知识点总结(一)
  • 知识分享 钡铼网关功能介绍:使用SSLTLS 加密,保证MQTT通信安全