当前位置: 首页 > news >正文

windows本地开发Spark[不开虚拟机]

1. windows本地安装hadoop

hadoop 官网下载 hadoop2.9.1版本

1.1 解压缩至C:\XX\XX\hadoop-2.9.1

1.2 下载动态链接库和工具库

在这里插入图片描述

1.3 将文件winutils.exe放在目录C:\XX\XX\hadoop-2.9.1\bin下

1.4 将文件hadoop.dll放在目录C:\XX\XX\hadoop-2.9.1\bin下

1.5 将文件hadoop.dll放在目录C:\Windows\System32下

1.6 配置环境变量

添加方式变量名变量值
新建HADOOP_HOMEC:\XX\XX\hadoop-2.9.1
新建HADOOP_HOMEC:\XX\XX\hadoop-2.9.1
编辑(在Path中添加)Path%HADOOP_HOME%\bin

1.7 重启计算机

2. windows本地安装scala

scala 官网下载 scala 2.12.17 版本
在这里插入图片描述

  • 在上图中选择如下图所示下载
    在这里插入图片描述
  • 解压文件至 D:\Server\,并将文件夹名称由D:\Server\scala-2.12.17改为D:\Server\scala

2.1 配置环境变量

操作方式变量名变量值
新建SCALA_HOMED:\Server\scala
添加Path%SCALA_HOME%\bin
添加CLASSPATH.;%SCALA_HOME%\bin;%SCALA_HOME%\lib\dt.jar;%SCALA_HOME%\lib\tools.jar.;

在这里插入图片描述

在这里插入图片描述
在这里插入图片描述

2.2 CMD初体验

在这里插入图片描述

3. Spark应用开发

3.1 IDEA 创建maven项目

maven安装教程

3.2 创建scala文件夹

在这里插入图片描述

  • 文件夹名为scala
    在这里插入图片描述
  • 将scala文件夹标记为源文件目录
    在这里插入图片描述

3.3 创建scala文件

3.3.1 安装scala插件

在这里插入图片描述
在这里插入图片描述

3.3.2 添加框架支持

在这里插入图片描述

在这里插入图片描述
在这里插入图片描述

3.3.3 修改pom文件

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"><modelVersion>4.0.0</modelVersion><groupId>cn.cmcst</groupId><artifactId>spark</artifactId><version>1.0</version><properties><project.build.sourceEncoding>UTF-8</project.build.sourceEncoding><maven.compiler.source>1.7</maven.compiler.source><maven.compiler.target>1.7</maven.compiler.target><hadoop.version>2.9.1</hadoop.version><scala.version>2.12.17</scala.version></properties><dependencies><dependency><groupId>org.scala-lang</groupId><artifactId>scala-library</artifactId><version>${scala.version}</version></dependency><dependency><groupId>org.scala-lang</groupId><artifactId>scala-compiler</artifactId><version>${scala.version}</version></dependency><dependency><groupId>org.scala-lang</groupId><artifactId>scala-reflect</artifactId><version>${scala.version}</version></dependency><dependency><groupId>org.apache.spark</groupId><artifactId>spark-core_2.12</artifactId><version>2.4.8</version></dependency></dependencies><build><plugins><plugin><groupId>org.scala-tools</groupId><artifactId>maven-scala-plugin</artifactId><version>2.12.2</version><executions><execution><goals><goal>compile</goal><goal>testCompile</goal></goals></execution></executions></plugin><plugin><groupId>org.apache.maven.plugins</groupId><artifactId>maven-compiler-plugin</artifactId><configuration><source>8</source><target>8</target></configuration></plugin><plugin><groupId>org.apache.maven.plugins</groupId><artifactId>maven-assembly-plugin</artifactId><version>2.4</version><configuration><archive><manifest><mainClass>cn.cmcst.spark.WordCount</mainClass></manifest></archive><descriptorRefs><descriptorRef>jar-with-dependencies</descriptorRef></descriptorRefs></configuration><executions><execution><id>make-assembly</id><phase>package</phase><goals><goal>single</goal></goals></execution></executions></plugin></plugins></build>
</project>

3.3.4 创建WordCount文件

package cn.cmcst.sparkimport org.apache.spark.{SparkConf, SparkContext}object WordCount {def main(args: Array[String]): Unit = {if(args.length < 2){System.err.println("Usage: WordCount <input> <output>")System.exit(0)}val input = args(0)val output = args(1)val conf = new SparkConf().setAppName("WordCount").setMaster("local[3]")val sc = new SparkContext(conf)val lines = sc.textFile(input)val result = lines.flatMap(_.split("\\s+")).map((_,1)).reduceByKey(_+_)result.collect().foreach(println)
//    result.saveAsTextFile(output)sc.stop()}
}

3.3.5 IDEA中输入参数

在这里插入图片描述

在这里插入图片描述

  • 红框中输入两个参数:程序中要读取文件输入目录 文件输出目录
    在这里插入图片描述
  • words.log文件内容为
flink flink flink
hadoop hadoop hadoop
spark spark spark
davinci davinci davinci
hive zookeeper sqoop mysql
hive zookeeper sqoop mysql
IDEA IDEA IDEA

2.3.6 输出结果

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
23/02/14 13:59:34 INFO SparkContext: Running Spark version 2.4.8
23/02/14 13:59:35 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/02/14 13:59:35 INFO SparkContext: Submitted application: WordCount
23/02/14 13:59:35 INFO SecurityManager: Changing view acls to: CMCST,root
23/02/14 13:59:35 INFO SecurityManager: Changing modify acls to: CMCST,root
23/02/14 13:59:35 INFO SecurityManager: Changing view acls groups to: 
23/02/14 13:59:35 INFO SecurityManager: Changing modify acls groups to: 
23/02/14 13:59:35 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(CMCST, root); groups with view permissions: Set(); users  with modify permissions: Set(CMCST, root); groups with modify permissions: Set()
23/02/14 13:59:36 INFO Utils: Successfully started service 'sparkDriver' on port 9229.
23/02/14 13:59:36 INFO SparkEnv: Registering MapOutputTracker
23/02/14 13:59:36 INFO SparkEnv: Registering BlockManagerMaster
23/02/14 13:59:36 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/02/14 13:59:36 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/02/14 13:59:36 INFO DiskBlockManager: Created local directory at C:\Users\CMCST\AppData\Local\Temp\blockmgr-87f53eb9-52f3-4323-83b0-ebc53bfc2073
23/02/14 13:59:36 INFO MemoryStore: MemoryStore started with capacity 897.6 MB
23/02/14 13:59:36 INFO SparkEnv: Registering OutputCommitCoordinator
23/02/14 13:59:36 INFO Utils: Successfully started service 'SparkUI' on port 4040.
23/02/14 13:59:36 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://DESKTOP-0DK2AAM:4040
23/02/14 13:59:36 INFO Executor: Starting executor ID driver on host localhost
23/02/14 13:59:37 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 9252.
23/02/14 13:59:37 INFO NettyBlockTransferService: Server created on DESKTOP-0DK2AAM:9252
23/02/14 13:59:37 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
23/02/14 13:59:37 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, DESKTOP-0DK2AAM, 9252, None)
23/02/14 13:59:37 INFO BlockManagerMasterEndpoint: Registering block manager DESKTOP-0DK2AAM:9252 with 897.6 MB RAM, BlockManagerId(driver, DESKTOP-0DK2AAM, 9252, None)
23/02/14 13:59:37 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, DESKTOP-0DK2AAM, 9252, None)
23/02/14 13:59:37 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, DESKTOP-0DK2AAM, 9252, None)
23/02/14 13:59:37 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 214.6 KB, free 897.4 MB)
23/02/14 13:59:37 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 20.4 KB, free 897.4 MB)
23/02/14 13:59:37 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on DESKTOP-0DK2AAM:9252 (size: 20.4 KB, free: 897.6 MB)
23/02/14 13:59:37 INFO SparkContext: Created broadcast 0 from textFile at WordCount.scala:18
23/02/14 13:59:37 INFO FileInputFormat: Total input paths to process : 1
23/02/14 13:59:38 INFO SparkContext: Starting job: collect at WordCount.scala:21
23/02/14 13:59:38 INFO DAGScheduler: Registering RDD 3 (map at WordCount.scala:19) as input to shuffle 0
23/02/14 13:59:38 INFO DAGScheduler: Got job 0 (collect at WordCount.scala:21) with 2 output partitions
23/02/14 13:59:38 INFO DAGScheduler: Final stage: ResultStage 1 (collect at WordCount.scala:21)
23/02/14 13:59:38 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
23/02/14 13:59:38 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0)
23/02/14 13:59:38 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCount.scala:19), which has no missing parents
23/02/14 13:59:38 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 5.8 KB, free 897.4 MB)
23/02/14 13:59:38 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.4 KB, free 897.4 MB)
23/02/14 13:59:38 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on DESKTOP-0DK2AAM:9252 (size: 3.4 KB, free: 897.6 MB)
23/02/14 13:59:38 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1184
23/02/14 13:59:38 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCount.scala:19) (first 15 tasks are for partitions Vector(0, 1))
23/02/14 13:59:38 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
23/02/14 13:59:38 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7381 bytes)
23/02/14 13:59:38 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7381 bytes)
23/02/14 13:59:38 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
23/02/14 13:59:38 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
23/02/14 13:59:38 INFO HadoopRDD: Input split: file:/C:/WorkSpace/IDEA/BigData/spark/input/words.log:0+77
23/02/14 13:59:38 INFO HadoopRDD: Input split: file:/C:/WorkSpace/IDEA/BigData/spark/input/words.log:77+78
23/02/14 13:59:38 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1163 bytes result sent to driver
23/02/14 13:59:38 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 1163 bytes result sent to driver
23/02/14 13:59:38 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 523 ms on localhost (executor driver) (1/2)
23/02/14 13:59:38 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 508 ms on localhost (executor driver) (2/2)
23/02/14 13:59:39 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
23/02/14 13:59:39 INFO DAGScheduler: ShuffleMapStage 0 (map at WordCount.scala:19) finished in 0.627 s
23/02/14 13:59:39 INFO DAGScheduler: looking for newly runnable stages
23/02/14 13:59:39 INFO DAGScheduler: running: Set()
23/02/14 13:59:39 INFO DAGScheduler: waiting: Set(ResultStage 1)
23/02/14 13:59:39 INFO DAGScheduler: failed: Set()
23/02/14 13:59:39 INFO DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCount.scala:19), which has no missing parents
23/02/14 13:59:39 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 4.1 KB, free 897.4 MB)
23/02/14 13:59:39 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.5 KB, free 897.4 MB)
23/02/14 13:59:39 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on DESKTOP-0DK2AAM:9252 (size: 2.5 KB, free: 897.6 MB)
23/02/14 13:59:39 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1184
23/02/14 13:59:39 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCount.scala:19) (first 15 tasks are for partitions Vector(0, 1))
23/02/14 13:59:39 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
23/02/14 13:59:39 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, localhost, executor driver, partition 0, ANY, 7141 bytes)
23/02/14 13:59:39 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, localhost, executor driver, partition 1, ANY, 7141 bytes)
23/02/14 13:59:39 INFO Executor: Running task 0.0 in stage 1.0 (TID 2)
23/02/14 13:59:39 INFO Executor: Running task 1.0 in stage 1.0 (TID 3)
23/02/14 13:59:39 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks including 2 local blocks and 0 remote blocks
23/02/14 13:59:39 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks including 2 local blocks and 0 remote blocks
23/02/14 13:59:39 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 9 ms
23/02/14 13:59:39 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 9 ms
23/02/14 13:59:39 INFO Executor: Finished task 1.0 in stage 1.0 (TID 3). 1285 bytes result sent to driver
23/02/14 13:59:39 INFO Executor: Finished task 0.0 in stage 1.0 (TID 2). 1355 bytes result sent to driver
23/02/14 13:59:39 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 195 ms on localhost (executor driver) (1/2)
23/02/14 13:59:39 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 199 ms on localhost (executor driver) (2/2)
23/02/14 13:59:39 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
23/02/14 13:59:39 INFO DAGScheduler: ResultStage 1 (collect at WordCount.scala:21) finished in 0.215 s
23/02/14 13:59:39 INFO DAGScheduler: Job 0 finished: collect at WordCount.scala:21, took 1.181680 s
23/02/14 13:59:39 INFO BlockManagerInfo: Removed broadcast_1_piece0 on DESKTOP-0DK2AAM:9252 in memory (size: 3.4 KB, free: 897.6 MB)
23/02/14 13:59:39 INFO SparkUI: Stopped Spark web UI at http://DESKTOP-0DK2AAM:4040
(hive,2)
(flink,3)
(davinci,3)
(zookeeper,2)
(mysql,2)
(sqoop,2)
(spark,3)
(hadoop,3)
(IDEA,3)
23/02/14 13:59:39 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
23/02/14 13:59:39 INFO MemoryStore: MemoryStore cleared
23/02/14 13:59:39 INFO BlockManager: BlockManager stopped
23/02/14 13:59:39 INFO BlockManagerMaster: BlockManagerMaster stopped
23/02/14 13:59:39 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
23/02/14 13:59:39 INFO SparkContext: Successfully stopped SparkContext
23/02/14 13:59:39 INFO ShutdownHookManager: Shutdown hook called
23/02/14 13:59:39 INFO ShutdownHookManager: Deleting directory C:\Users\CMCST\AppData\Local\Temp\spark-3a48779f-a780-43f5-9898-d8a95cbdbcec
http://www.lryc.cn/news/6285.html

相关文章:

  • 一文教你快速估计个股交易成本
  • Leetcode—移除元素、删除有序数组中的重复项、合并两个有序数组
  • 面试(十)大疆 安全开发 C++1面
  • 短信链接跳转微信小程序
  • 吉林电视台启用乾元通多卡聚合系统广电视频传输解决方案
  • Linux常用命令1
  • 【C++进阶】一、继承(总)
  • AttributeError: module ‘lib‘ has no attribute ‘OpenSSL_add_all_algorithms
  • Python实现视频自动打码功能,避免看到羞羞的画面
  • 说说Knife4j
  • Java学习笔记-03(API阶段-2)集合
  • 「3」线性代数(期末复习)
  • 【CSDN竞赛】27期题解(Javascript)
  • 高压放大器在骨的逆力电研究中的应用
  • 思科网络部署,(0基础)入门实验,超详细
  • private static final Long serialVersionUID= 1L详解
  • 若依前后端分离版集成nacos
  • JAVA面试八股文一(mysql)
  • 动静态库概念及创建
  • 【H.264】码流解析 annexb vs avcc
  • 【最优化方法】1-最优化方法介绍
  • 数据结构 | 树 | 二叉树
  • 笔记:使用 unbuild 搭建 JavaScript 构建系统笔记
  • 【SpringBoot3.0源码】启动流程源码解析 •下
  • QT(56)-动态链接库-windows-导出变量-导出类
  • TCP传输文件
  • vue3:加载本地图片等静态资源
  • 工作记录------数据库group_concat函数长度问题
  • Python基础语法
  • windows环境下安装Nginx及常用操作命令