
Spark Compatibility Verification

Preface

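Apache Spark is a fast, general-purpose compute engine designed for large-scale data processing. It retains the strengths of Hadoop MapReduce, but unlike MapReduce it can keep a job's intermediate results in memory instead of writing them to HDFS and reading them back. This makes Spark a better fit for iterative workloads such as data mining and machine learning.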

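Spark is an open-source cluster computing environment similar to Hadoop, with some differences: Spark supports in-memory distributed datasets, which lets it optimize iterative workloads in addition to serving interactive queries.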

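Spark features:
1. Speed: with in-memory computation, Spark is claimed to run up to 100 times faster than Hadoop MapReduce.
2. Ease of use: Spark applications can be written in Java, Scala, Python, R, and SQL, and Spark provides more than 80 high-level operators.
3. Generality: Spark ships a set of libraries, including Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX, which developers can combine seamlessly in a single application.
4. Multiple runtime environments: Spark runs on Hadoop, Apache Mesos, Kubernetes, in standalone mode, or in other cloud environments.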

Reference: https://blog.csdn.net/cuiyaonan2000/article/details/116048663

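Spark use cases:
1. Spark performs iterative computation in memory, so it suits workloads that operate on the same dataset many times.
2. Workloads where the data volume is not especially large but near-real-time statistical analysis is required.
3. Spark is not suited to applications that make asynchronous, fine-grained state updates, such as the storage layer of a web server or an incremental web crawler and indexer.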

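Spark run modes:
1. Local mode: Spark runs on a single machine.
2. Standalone mode: Spark uses its own built-in resource scheduler.
3. Spark on YARN, Spark on Mesos, and Spark on K8s: Spark uses YARN, Mesos, or Kubernetes as the underlying resource scheduler.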
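
Each run mode corresponds to a different value of the `--master` option of `spark-submit`. The following is a minimal sketch of that mapping, using the master URL forms described in the Spark documentation. The helper name `master_url` is hypothetical, and the default ports for Mesos (5050) and the Kubernetes API server (6443) are assumptions about a typical setup; only port 7077 appears in this article.

```python
def master_url(mode, host="localhost", port=None, threads="*"):
    """Build a --master value for spark-submit.

    Illustrative helper only; it is not part of Spark.
    """
    if mode == "local":
        # local[N] runs Spark in one JVM with N worker threads ("*" = all cores)
        return f"local[{threads}]"
    if mode == "standalone":
        # 7077 is the standalone master port seen in the logs of this article
        return f"spark://{host}:{port or 7077}"
    if mode == "yarn":
        # The cluster location is read from the Hadoop configuration
        return "yarn"
    if mode == "mesos":
        # 5050 is assumed here as the Mesos master port
        return f"mesos://{host}:{port or 5050}"
    if mode == "k8s":
        # 6443 is assumed here as the Kubernetes API server port
        return f"k8s://https://{host}:{port or 6443}"
    raise ValueError(f"unknown run mode: {mode}")


if __name__ == "__main__":
    for mode in ("local", "standalone", "yarn", "mesos", "k8s"):
        print(mode, "->", master_url(mode, host="bogon"))
```

For the standalone deployment used in the rest of this article, `master_url("standalone", host="bogon")` yields the `spark://bogon:7077` address passed to the worker and to `spark-submit`.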

参考链接:https://blog.csdn.net/jiayi_yao/article/details/125545826#t8

I. Installation and Startup

Install Spark and its dependencies
yum install java-1.8.0-openjdk curl tar python3 wget
mkdir -p /usr/local/spark
cd /usr/local/spark
wget https://mirrors.aliyun.com/apache/spark/spark-3.3.2/spark-3.3.2-bin-hadoop3.tgz
tar -xvf spark-3.3.2-bin-hadoop3.tgz
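
Before starting the services, it can help to confirm that the tools the steps above rely on are actually on `PATH`. The following is a minimal sketch using only the Python standard library; the function names and the tool list are illustrative choices (`wget` is included because the download step uses it).

```python
import shutil
import subprocess

# Tools used by the installation steps above (illustrative list)
REQUIRED_TOOLS = ["java", "python3", "tar", "wget"]


def find_tools(names):
    """Return a dict mapping each tool name to its path, or None if missing."""
    return {name: shutil.which(name) for name in names}


def java_version_banner():
    """Return the first line printed by `java -version`, or None without Java."""
    java = shutil.which("java")
    if java is None:
        return None
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    # `java -version` writes its banner to stderr rather than stdout
    lines = (result.stderr or result.stdout).splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    for name, path in find_tools(REQUIRED_TOOLS).items():
        print(f"{name:8s} {path or 'MISSING'}")
    print("java version:", java_version_banner())
```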

  • Start spark-master
    cd /usr/local/spark/spark-3.3.2-bin-hadoop3/
    [root@bogon spark-3.3.2-bin-hadoop3]# ./sbin/start-master.sh
    Output similar to the following appears:
    starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark/spark-3.3.2-bin-hadoop3/logs/spark-root-org.apache.spark.deploy.master.Master-1-bogon.out
    Use tail to inspect the log:
[root@bogon spark-3.3.2-bin-hadoop3]# tail logs/spark-root-org.apache.spark.deploy.master.Master-1-bogon.out 
23/03/06 14:35:44 INFO SecurityManager: Changing modify acls to: root
23/03/06 14:35:44 INFO SecurityManager: Changing view acls groups to: 
23/03/06 14:35:44 INFO SecurityManager: Changing modify acls groups to: 
23/03/06 14:35:44 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
23/03/06 14:35:45 INFO Utils: Successfully started service 'sparkMaster' on port 7077.
23/03/06 14:35:45 INFO Master: Starting Spark master at spark://bogon:7077
23/03/06 14:35:45 INFO Master: Running Spark version 3.3.2
23/03/06 14:35:45 INFO Utils: Successfully started service 'MasterUI' on port 8080.
23/03/06 14:35:45 INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://bogon:8080
23/03/06 14:35:46 INFO Master: I have been elected leader! New state: ALIVE
  • Start spark-worker
    cd /usr/local/spark/spark-3.3.2-bin-hadoop3/
    [root@bogon spark-3.3.2-bin-hadoop3]# ./sbin/start-worker.sh spark://bogon:7077
    Output similar to the following appears:
    starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/spark-3.3.2-bin-hadoop3/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-bogon.out
    Use tail to inspect the log:
[root@bogon spark-3.3.2-bin-hadoop3]# tail logs/spark-root-org.apache.spark.deploy.worker.Worker-1-bogon.out 
23/03/06 14:52:17 INFO Worker: Spark home: /usr/local/spark/spark-3.3.2-bin-hadoop3
23/03/06 14:52:17 INFO ResourceUtils: ==============================================================
23/03/06 14:52:17 INFO ResourceUtils: No custom resources configured for spark.worker.
23/03/06 14:52:17 INFO ResourceUtils: ==============================================================
23/03/06 14:52:17 WARN Utils: Service 'WorkerUI' could not bind on port 8081. Attempting port 8082.
23/03/06 14:52:17 INFO Utils: Successfully started service 'WorkerUI' on port 8082.
23/03/06 14:52:17 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://bogon:8082
23/03/06 14:52:17 INFO Worker: Connecting to master bogon:7077...
23/03/06 14:52:17 INFO TransportClientFactory: Successfully created connection to bogon/10.130.0.73:7077 after 85 ms (0 ms spent in bootstraps)
23/03/06 14:52:18 INFO Worker: Successfully registered with master spark://bogon:7077
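
The lines that matter in the two logs above are the master URL, the web UI ports (note that the worker fell back from 8081 to 8082), and the registration message. The following is a small sketch that pulls these out with regular expressions. The patterns are written against the log lines shown in this article and are an assumption about their format, which may differ in other Spark versions.

```python
import re

# Patterns modeled on the log lines shown above (assumed format)
MASTER_URL_RE = re.compile(r"Starting Spark master at (spark://\S+)")
UI_PORT_RE = re.compile(
    r"Successfully started service '(MasterUI|WorkerUI)' on port (\d+)"
)
REGISTERED_RE = re.compile(r"Successfully registered with master (spark://\S+)")


def summarize_log(lines):
    """Collect the master URL, UI ports, and registration target from log lines."""
    summary = {"master_url": None, "ui_ports": {}, "registered_with": None}
    for line in lines:
        match = MASTER_URL_RE.search(line)
        if match:
            summary["master_url"] = match.group(1)
        match = UI_PORT_RE.search(line)
        if match:
            summary["ui_ports"][match.group(1)] = int(match.group(2))
        match = REGISTERED_RE.search(line)
        if match:
            summary["registered_with"] = match.group(1)
    return summary


if __name__ == "__main__":
    sample = [
        "23/03/06 14:35:45 INFO Master: Starting Spark master at spark://bogon:7077",
        "23/03/06 14:35:45 INFO Utils: Successfully started service 'MasterUI' on port 8080.",
        "23/03/06 14:52:17 WARN Utils: Service 'WorkerUI' could not bind on port 8081. Attempting port 8082.",
        "23/03/06 14:52:17 INFO Utils: Successfully started service 'WorkerUI' on port 8082.",
        "23/03/06 14:52:18 INFO Worker: Successfully registered with master spark://bogon:7077",
    ]
    print(summarize_log(sample))
```

To run it against real files, pass `open(path)` for each `.out` file in the `logs` directory instead of the sample list.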

II. Testing

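  • Open http://$IP:8080 to reach the master page, which summarizes resource usage and related information.
    If the deployment succeeded, a page similar to the following is shown:
    [Screenshot: spark-master]
  • The worker's status can likewise be viewed in a web browser on port 8082 of the server (the default is 8081; check the log for the port actually in use).
    [Screenshot: spark-worker]
  • Submit a test job that computes pi with the following command:
    ./bin/spark-submit --master spark://bogon:7077 examples/src/main/python/pi.py 1000
    Output similar to the following appears: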
23/03/06 16:32:55 INFO TaskSetManager: Starting task 999.0 in stage 0.0 (TID 999) (10.130.0.73, executor 0, partition 999, PROCESS_LOCAL, 4437 bytes) taskResourceAssignments Map()
23/03/06 16:32:55 INFO TaskSetManager: Finished task 995.0 in stage 0.0 (TID 995) in 217 ms on 10.130.0.73 (executor 0) (996/1000)
23/03/06 16:32:55 INFO TaskSetManager: Finished task 996.0 in stage 0.0 (TID 996) in 220 ms on 10.130.0.73 (executor 0) (997/1000)
23/03/06 16:32:55 INFO TaskSetManager: Finished task 997.0 in stage 0.0 (TID 997) in 198 ms on 10.130.0.73 (executor 0) (998/1000)
23/03/06 16:32:55 INFO TaskSetManager: Finished task 998.0 in stage 0.0 (TID 998) in 189 ms on 10.130.0.73 (executor 0) (999/1000)
23/03/06 16:32:55 INFO TaskSetManager: Finished task 999.0 in stage 0.0 (TID 999) in 238 ms on 10.130.0.73 (executor 0) (1000/1000)
23/03/06 16:32:55 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
23/03/06 16:32:55 INFO DAGScheduler: ResultStage 0 (reduce at /usr/local/spark/spark-3.3.2-bin-hadoop3/examples/src/main/python/pi.py:42) finished in 64.352 s
23/03/06 16:32:55 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
23/03/06 16:32:55 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
23/03/06 16:32:55 INFO DAGScheduler: Job 0 finished: reduce at /usr/local/spark/spark-3.3.2-bin-hadoop3/examples/src/main/python/pi.py:42, took 64.938378 s
Pi is roughly 3.133640
23/03/06 16:32:55 INFO SparkUI: Stopped Spark web UI at http://bogon:4040
23/03/06 16:32:55 INFO StandaloneSchedulerBackend: Shutting down all executors
23/03/06 16:32:55 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
23/03/06 16:32:55 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
23/03/06 16:32:55 INFO MemoryStore: MemoryStore cleared
23/03/06 16:32:55 INFO BlockManager: BlockManager stopped
23/03/06 16:32:55 INFO BlockManagerMaster: BlockManagerMaster stopped
23/03/06 16:32:55 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
23/03/06 16:32:55 INFO SparkContext: Successfully stopped SparkContext
23/03/06 16:32:56 INFO ShutdownHookManager: Shutdown hook called
23/03/06 16:32:56 INFO ShutdownHookManager: Deleting directory /tmp/spark-087c8a21-8641-4b20-8b65-be47b77f26c5
23/03/06 16:32:56 INFO ShutdownHookManager: Deleting directory /tmp/spark-24d1bc1a-841a-435e-8263-ad891e2aaa97/pyspark-f67ad2cb-ec86-41eb-ae7c-8fcb46e66827
23/03/06 16:32:56 INFO ShutdownHookManager: Deleting directory /tmp/spark-24d1bc1a-841a-435e-8263-ad891e2aaa97
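
The bundled pi.py example estimates pi with a Monte Carlo method: it draws random points in the square [-1, 1] x [-1, 1] and counts how many fall inside the unit circle. The argument 1000 is the number of partitions, which is why the log shows 1000 tasks. The following single-process sketch shows the same idea without Spark. The function name, sample count, and seed are illustrative choices, not values taken from the Spark example.

```python
import random


def estimate_pi(samples, seed=None):
    """Estimate pi from the fraction of random points that land in the unit circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    # Area of circle / area of square = pi / 4
    return 4.0 * inside / samples


if __name__ == "__main__":
    print("Pi is roughly %f" % estimate_pi(200_000, seed=42))
```

Because the estimate is random, a result such as 3.133640 is a normal outcome; for a compatibility check the point is that the job completes, not the precision of the value.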

The result can also be seen in the web UI.

The worker log can be checked as well:

[root@bogon spark-3.3.2-bin-hadoop3]# tail logs/spark-root-org.apache.spark.deploy.worker.Worker-1-bogon.out 
23/03/06 16:31:48 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
23/03/06 16:31:48 INFO ExecutorRunner: Launch command: "/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.312.b07-8.1.10.lns8.loongarch64/jre/bin/java" "-cp" "/usr/local/spark/spark-3.3.2-bin-hadoop3/conf/:/usr/local/spark/spark-3.3.2-bin-hadoop3/jars/*" "-Xmx1024M" "-Dspark.driver.port=46821" "-XX:+IgnoreUnrecognizedVMOptions" "--add-opens=java.base/java.lang=ALL-UNNAMED" "--add-opens=java.base/java.lang.invoke=ALL-UNNAMED" "--add-opens=java.base/java.lang.reflect=ALL-UNNAMED" "--add-opens=java.base/java.io=ALL-UNNAMED" "--add-opens=java.base/java.net=ALL-UNNAMED" "--add-opens=java.base/java.nio=ALL-UNNAMED" "--add-opens=java.base/java.util=ALL-UNNAMED" "--add-opens=java.base/java.util.concurrent=ALL-UNNAMED" "--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED" "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED" "--add-opens=java.base/sun.nio.cs=ALL-UNNAMED" "--add-opens=java.base/sun.security.action=ALL-UNNAMED" "--add-opens=java.base/sun.util.calendar=ALL-UNNAMED" "--add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@bogon:46821" "--executor-id" "0" "--hostname" "10.130.0.73" "--cores" "4" "--app-id" "app-20230306163148-0000" "--worker-url" "spark://Worker@10.130.0.73:35787"
23/03/06 16:32:55 INFO Worker: Asked to kill executor app-20230306163148-0000/0
23/03/06 16:32:55 INFO ExecutorRunner: Runner thread for executor app-20230306163148-0000/0 interrupted
23/03/06 16:32:55 INFO ExecutorRunner: Killing process!
23/03/06 16:32:55 INFO Worker: Executor app-20230306163148-0000/0 finished with state KILLED exitStatus 143
23/03/06 16:32:55 INFO ExternalShuffleBlockResolver: Clean up non-shuffle and non-RDD files associated with the finished executor 0
23/03/06 16:32:55 INFO ExternalShuffleBlockResolver: Executor is not registered (appId=app-20230306163148-0000, execId=0)
23/03/06 16:32:55 INFO Worker: Cleaning up local directories for application app-20230306163148-0000
23/03/06 16:32:55 INFO ExternalShuffleBlockResolver: Application app-20230306163148-0000 removed, cleanupLocalDirs = true
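
The line `finished with state KILLED exitStatus 143` can look like a failure, but the log shows the worker was asked to kill the executor right after the job finished. By the common Unix convention, an exit status above 128 means the process was terminated by signal number (status - 128), and 143 - 128 = 15 is SIGTERM. The following sketch decodes such a status; the function name is hypothetical, and the signal name lookup assumes a Linux host.

```python
import signal


def describe_exit_status(status):
    """Translate a process exit status into a short human-readable description."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform
            name = f"signal {signum}"
        return f"terminated by {name}"
    if status == 0:
        return "exited normally"
    return f"exited with error code {status}"


if __name__ == "__main__":
    print(describe_exit_status(143))
```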

Clean up the environment

./sbin/stop-worker.sh
./sbin/stop-master.sh
rm -rf /usr/local/spark

