

I have a Python project whose folder has the structure

main_directory - lib - lib.py
               - run - .py

.py is

from pyspark.sql import SparkSession
from lib.lib import add_two

spark = SparkSession \
    .builder \
    .master('yarn') \
    .appName('') \
    .getOrCreate()

print(add_two(1, 2))

and lib.py is

def add_two(x, y):
    return x + y

I want to launch it as a Dataproc job in GCP. I have checked online, but I have not understood well how to do it. I am trying to launch the script with

gcloud dataproc jobs submit pyspark --cluster=$CLUSTER_NAME --region=$REGION \
    run/.py

But I receive the following error message:

from lib.lib import add_two
ModuleNotFoundError: No module named 'lib.lib'

Could you help me with how to launch the job on Dataproc? The only way I have found so far is to remove the package prefix from the import, making this change to .py:

from lib import add_two

and then launch the job as

gcloud dataproc jobs submit pyspark --cluster=$CLUSTER_NAME --region=$REGION \
    --files /lib/lib.py \
    /run/.py

However, I would like to avoid the tedious process of listing the files manually every time.

Following @Igor's suggestion to pack the dependencies in a zip file, I have found that

zip -j --update -r libpack.zip /projectfolder/* && spark-submit --py-files libpack.zip /projectfolder/run/.py

works. However, the -j flag puts all files in the root of libpack.zip, so if there were files with the same name in different subfolders this would not work.
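A structure-preserving alternative I have been looking at is to build the archive with Python's zipfile module, which keeps the paths relative to the project root (a rough sketch, assuming the same /projectfolder layout as above):

import os
import zipfile

# Zip every .py file under the project, keeping paths relative to the
# project root so that lib/lib.py stays importable as lib.lib.
project_root = '/projectfolder'

with zipfile.ZipFile('libpack.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
    for dirpath, dirnames, filenames in os.walk(project_root):
        for name in filenames:
            if name.endswith('.py'):
                full_path = os.path.join(dirpath, name)
                # arcname is the path stored inside the zip, relative to project_root
                arcname = os.path.relpath(full_path, project_root)
                zf.write(full_path, arcname=arcname)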

Any suggestions?

Solution

To zip the dependencies -

cd base-path-to-python-modules
zip -qr deps.zip ./* -x .py

Copy deps.zip to HDFS or GCS. Use the URI when submitting the job, as shown below.
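For the GCS copy, one option is the google-cloud-storage client (a minimal sketch; the bucket and object names here are placeholders, not values from the question):

from google.cloud import storage

# Upload the local deps.zip to a GCS bucket so Dataproc can read it.
client = storage.Client()
bucket = client.bucket('my-bucket')            # placeholder bucket name
blob = bucket.blob('path/to/deps.zip')         # placeholder object path
blob.upload_from_filename('deps.zip')
print('Uploaded to gs://my-bucket/path/to/deps.zip')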

Submit a Python project (PySpark) using Dataproc's Python client:

from google.cloud import dataproc_v1
from google.cloud.dataproc_v1.gapic.transports import (
    job_controller_grpc_transport)

# fill in the cluster location and project details
region =
cluster_name =
project_id =

job_transport = (
    job_controller_grpc_transport.JobControllerGrpcTransport(
        address='{}-dataproc.googleapis.com:443'.format(region)))
dataproc_job_client = dataproc_v1.JobControllerClient(job_transport)

# URI of the main job file (gs:// or hdfs://)
job_file =

# command line arguments for the main job file
args = ['args1', 'arg2']

# required only if the main python job file has imports from other modules;
# each entry can be one of .py, .zip, or .egg
additional_python_files = ['hdfs://path/to/deps.zip', 'gs://path/to/moredeps.zip']

job_details = {
    'placement': {
        'cluster_name': cluster_name
    },
    'pyspark_job': {
        'main_python_file_uri': job_file,
        'args': args,
        'python_file_uris': additional_python_files
    }
}

res = dataproc_job_client.submit_job(project_id=project_id,
                                     region=region,
                                     job=job_details)
job_id = res.reference.job_id
print(f'Submitted dataproc job id: {job_id}')
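If you also want to block until the job finishes, a rough polling sketch on top of the same client could look like this (the terminal state names are an assumption about this client version's JobStatus.State enum):

import time

# Poll the submitted job until it reaches a terminal state.
while True:
    job = dataproc_job_client.get_job(
        project_id=project_id, region=region, job_id=job_id)
    state = job.status.State.Name(job.status.state)
    print('Job state:', state)
    if state in ('DONE', 'ERROR', 'CANCELLED'):
        break
    time.sleep(10)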
