Sqoop won't run in DolphinScheduler
Because the HDFS and Hive versions that Sqoop is compatible with are fairly old, importing data from PostgreSQL into Hive with Sqoop turned out to be hard: I tried many approaches and none of them worked.
The first error was:
/tmp/dolphinscheduler/exec/process/admin/18331744396416/18363661030912_1/18/18/18_18.command: line 4: /opt/datasophon/dolphinscheduler-3.1.8/worker-server/conf/dolphinscheduler_env.sh: Permission denied
I found one problem here: the script /opt/datasophon/dolphinscheduler-3.1.8/worker-server/conf/dolphinscheduler_env.sh and the directory that contains it have their permissions reset to 700 every time DolphinScheduler restarts, and I have to change them back to 755 by hand, the directory included. This happens on every restart and I am still looking for a proper fix; the temporary workaround is a scheduled script that keeps resetting them to 755.
Then Sqoop ran into all sorts of problems when writing to Hive.
On top of that, the Sqoop task type in DolphinScheduler does not support a PostgreSQL data source. I tried a MySQL source instead, and that also failed with assorted errors.
Later I tried DataX, which had plenty of problems of its own, for example:
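As a stopgap, an /etc/cron.d entry along these lines keeps the permissions open; running it every minute as root is an assumption, adjust the interval and user to taste:
# temporary workaround: keep restoring 755 on the env script and its directory
* * * * * root chmod 755 /opt/datasophon/dolphinscheduler-3.1.8/worker-server/conf /opt/datasophon/dolphinscheduler-3.1.8/worker-server/conf/dolphinscheduler_env.sh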
com.alibaba.datax.common.exception.DataXException: Code:[DBUtilErrorCode-05], Description:[往您配置的写入表中写入数据时失败.]. - java.sql.SQLFeatureNotSupportedException: Method not supported
    at org.apache.hive.jdbc.HiveConnection.rollback(HiveConnection.java:1327)
    at com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter$Task.doBatchInsert(CommonRdbmsWriter.java:365)
    at com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter$Task.startWriteWithConnection(CommonRdbmsWriter.java:295)
    at com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter$Task.startWrite(CommonRdbmsWriter.java:323)
    at com.alibaba.datax.plugin.reader.rdbmswriter.RdbmsWriter$Task.startWrite(RdbmsWriter.java:86)
    at com.alibaba.datax.core.taskgroup.runner.WriterRunner.run(WriterRunner.java:56)
    at java.lang.Thread.run(Thread.java:750)
In the end I figured: if the Hive and HDFS versions bundled with these tools are the problem, I can simply call the command-line clients directly. So I switched to a shell script:
/opt/sqoop/bin/sqoop import -D mapred.job.name=pg2hadoop -m 1 --connect "jdbc:postgresql://10.0.0.22:5432/postgres" --username user1 --password "123456" --null-non-string 'NULL' --null-string 'NULL' \
--query "SELECT username, display_name, user_sn, native_name, email, "password", valid_from, valid_to, status, mobile, user_type, sso_uid, gender, birth_date, id_number, id_type, src_sys, created_at, created_by, updated_at, updated_by FROM ri_base.ri_base_auth_user where \$CONDITIONS " \
--target-dir /dmp/tmp/riskctr/ri_base_auth_user/${bizDate} --delete-target-dir
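For reference, this is roughly how the command sits inside the DolphinScheduler shell task. The bizDate assignment and the exit-code check are additions of my own, shown here as a sketch rather than something the original job requires:
# assumption: bizDate is the partition date for this run; here it is simply
# derived as yesterday's date, but it could equally be passed in as a task parameter
bizDate=$(date -d "-1 day" +%Y%m%d)

# ... the sqoop import shown above runs here ...

# stop the task if the import failed, so the Hive steps further down
# do not run against an empty or missing HDFS directory
if [ $? -ne 0 ]; then
    echo "sqoop import of ri_base_auth_user failed" >&2
    exit 1
fi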
That lands the data in a temporary HDFS directory; a Hive script then creates an external table on top of that directory:
drop table if exists testdb.tmp_riskctr_ri_base_auth_user;
CREATE EXTERNAL TABLE testdb.tmp_riskctr_ri_base_auth_user (
    username      string    COMMENT 'login account',
    display_name  string    COMMENT 'display name',
    user_sn       string    COMMENT 'user serial number',
    native_name   string    COMMENT 'native-language name',
    email         string    COMMENT 'email',
    passwd        string    COMMENT 'password',
    valid_from    date      COMMENT 'valid-from date',
    valid_to      date      COMMENT 'valid-to date',
    status        string    COMMENT 'status',
    mobile        string    COMMENT 'mobile phone',
    user_type     string    COMMENT 'user type',
    sso_uid       string    COMMENT 'SSO user ID',
    gender        string    COMMENT 'gender',
    birth_date    date      COMMENT 'birth date',
    id_number     string    COMMENT 'ID number',
    id_type       string    COMMENT 'ID type',
    src_sys       string    COMMENT 'source system',
    created_at    timestamp,
    created_by    string,
    updated_at    timestamp,
    updated_by    string
)
COMMENT 'employee table'
STORED AS TEXTFILE
LOCATION '/dmp/tmp/riskctr/ri_base_auth_user/${bizDate}';
Finally, load the data from the temporary table into the final (ODS) table:
insert overwrite table testdb.ods_riskctr_ri_base_auth_user
select username, display_name, user_sn, native_name, email, passwd as passwd, valid_from,
valid_to, status, mobile, user_type, sso_uid, gender, birth_date, id_number, id_type, src_sys,
created_at, created_by, updated_at, updated_by
from testdb.tmp_riskctr_ri_base_auth_user;
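As a quick sanity check after the load, comparing row counts between PostgreSQL and the ODS table catches truncated imports early; the one-liner below is just such a check and not part of the original workflow (the Hive client path is again an assumption):
# count the rows just loaded; compare with SELECT count(*) FROM ri_base.ri_base_auth_user on the PostgreSQL side
/opt/hive/bin/hive -e "select count(*) from testdb.ods_riskctr_ri_base_auth_user;"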