
[hudi] Hudi-Cli in Practice: A Client-Side Operations Tool for the Data Lake

help

hudi:student_mysql_cdc_hudi_fl->help
AVAILABLE COMMANDS

Archived Commits Command
        trigger archival: trigger archival
        show archived commits: Read commits from archived files and show details
        show archived commit stats: Read commits from archived files and show details

Bootstrap Command
        bootstrap run: Run a bootstrap action for current Hudi table
        bootstrap index showmapping: Show bootstrap index mapping
        bootstrap index showpartitions: Show bootstrap indexed partitions

Built-In Commands
        help: Display help about available commands
        stacktrace: Display the full stacktrace of the last error.
        clear: Clear the shell screen.
        quit, exit: Exit the shell.
        history: Display or save the history of previously run commands
        version: Show version info
        script: Read and execute commands from a file.

Cleans Command
        cleans show: Show the cleans
        clean showpartitions: Show partition level details of a clean
        cleans run: run clean

Clustering Command
        clustering run: Run Clustering
        clustering scheduleAndExecute: Run Clustering. Make a cluster plan first and execute that plan immediately
        clustering schedule: Schedule Clustering

Commits Command
        commits compare: Compare commits with another Hoodie table
        commits sync: Sync commits with another Hoodie table
        commit showpartitions: Show partition level details of a commit
        commits show: Show the commits
        commits showarchived: Show the archived commits
        commit showfiles: Show file level details of a commit
        commit show_write_stats: Show write stats of a commit

Compaction Command
        compaction run: Run Compaction for given instant time
        compaction scheduleAndExecute: Schedule compaction plan and execute this plan
        compaction showarchived: Shows compaction details for a specific compaction instant
        compaction repair: Renames the files to make them consistent with the timeline as dictated by Hoodie metadata. Use when compaction unschedule fails partially.
        compaction schedule: Schedule Compaction
        compaction show: Shows compaction details for a specific compaction instant
        compaction unscheduleFileId: UnSchedule Compaction for a fileId
        compaction validate: Validate Compaction
        compaction unschedule: Unschedule Compaction
        compactions show all: Shows all compactions that are in active timeline
        compactions showarchived: Shows compaction details for specified time window

Diff Command
        diff partition: Check how file differs across range of commits. It is meant to be used only for partitioned tables.
        diff file: Check how file differs across range of commits

Export Command
        export instants: Export Instants and their metadata from the Timeline

File System View Command
        show fsview all: Show entire file-system view
        show fsview latest: Show latest file-system view

HDFS Parquet Import Command
        hdfsparquetimport: Imports Parquet table to a hoodie table

Hoodie Log File Command
        show logfile records: Read records from log files
        show logfile metadata: Read commit metadata from log files

Hoodie Sync Validate Command
        sync validate: Validate the sync by counting the number of records

Kerberos Authentication Command
        kerberos kdestroy: Destroy Kerberos authentication
        kerberos kinit: Perform Kerberos authentication

Markers Command
        marker delete: Delete the marker

Metadata Command
        metadata stats: Print stats about the metadata
        metadata list-files: Print a list of all files in a partition from the metadata
        metadata list-partitions: List all partitions from metadata
        metadata validate-files: Validate all files in all partitions from the metadata
        metadata delete: Remove the Metadata Table
        metadata create: Create the Metadata Table if it does not exist
        metadata init: Update the metadata table from commits since the creation
        metadata set: Set options for Metadata Table

Repairs Command
        repair deduplicate: De-duplicate a partition path contains duplicates & produce repaired files to replace with
        rename partition: Rename partition. Usage: rename partition --oldPartition <oldPartition> --newPartition <newPartition>
        repair overwrite-hoodie-props: Overwrite hoodie.properties with provided file. Risky operation. Proceed with caution!
        repair migrate-partition-meta: Migrate all partition meta file currently stored in text format to be stored in base file format. See HoodieTableConfig#PARTITION_METAFILE_USE_DATA_FORMAT.
        repair addpartitionmeta: Add partition metadata to a table, if not present
        repair deprecated partition: Repair deprecated partition ("default"). Re-writes data from the deprecated partition into __HIVE_DEFAULT_PARTITION__
        repair show empty commit metadata: show failed commits
        repair corrupted clean files: repair corrupted clean files

Rollbacks Command
        show rollback: Show details of a rollback instant
        commit rollback: Rollback a commit
        show rollbacks: List all rollback instants

Savepoints Command
        savepoint rollback: Savepoint a commit
        savepoints show: Show the savepoints
        savepoint create: Savepoint a commit
        savepoint delete: Delete the savepoint

Spark Env Command
        set: Set spark launcher env to cli
        show env: Show spark launcher env by key
        show envs all: Show spark launcher envs

Stats Command
        stats filesizes: File Sizes. Display summary stats on sizes of files
        stats wa: Write Amplification. Ratio of how many records were upserted to how many records were actually written

Table Command
        table update-configs: Update the table configs with configs with provided file.
        table recover-configs: Recover table configs, from update/delete that failed midway.
        refresh, metadata refresh, commits refresh, cleans refresh, savepoints refresh: Refresh table metadata
        create: Create a hoodie table if not present
        table delete-configs: Delete the supplied table configs from the table.
        fetch table schema: Fetches latest table schema
        connect: Connect to a hoodie table
        desc: Describe Hoodie Table properties

Temp View Command
        temp_query, temp query: query against created temp view
        temps_show, temps show: Show all views name
        temp_delete, temp delete: Delete view name

Timeline Command
        metadata timeline show incomplete: List all incomplete instants in active timeline of metadata table
        metadata timeline show active: List all instants in active timeline of metadata table
        timeline show incomplete: List all incomplete instants in active timeline
        timeline show active: List all instants in active timeline

Upgrade Or Downgrade Command
        downgrade table: Downgrades a table
        upgrade table: Upgrades a table

Utils Command
        utils loadClass: Load a class

kerberos

kerberos kinit --principal xxx@XXXXX.COM --keytab /xxx/kerberos/xxx.keytab

First, take a look at the structure of the sample table.
Note that it is a partitioned table!

-- Flink SQL CREATE TABLE statement
create table student_mysql_cdc_hudi_fl(
  `_hoodie_commit_time` string comment 'hoodie commit time',
  `_hoodie_commit_seqno` string comment 'hoodie commit seqno',
  `_hoodie_record_key` string comment 'hoodie record key',
  `_hoodie_partition_path` string comment 'hoodie partition path',
  `_hoodie_file_name` string comment 'hoodie file name',
  `s_id` bigint not null comment 'primary key',
  `s_name` string not null comment 'name',
  `s_age` int comment 'age',
  `s_sex` string comment 'sex',
  `s_part` string not null comment 'partition field',
  `create_time` timestamp(6) not null comment 'creation time',
  `dl_ts` timestamp(6) not null,
  `dl_s_sex` string not null,
  PRIMARY KEY(s_id) NOT ENFORCED
) PARTITIONED BY (`dl_s_sex`) with (
  'connector' = 'hudi'
  ,'hive_sync.table' = 'student_mysql_cdc_hudi'
  ,'hoodie.datasource.write.drop.partition.columns' = 'true'
  ,'hoodie.datasource.write.hive_style_partitioning' = 'true'
  ,'hoodie.datasource.write.partitionpath.field' = 'dl_s_sex'
  ,'hoodie.datasource.write.precombine.field' = 'dl_ts'
  ,'path' = 'hdfs://xxx/hudi_db.db/student_mysql_cdc_hudi'
  ,'precombine.field' = 'dl_ts'
  ,'primaryKey' = 's_id'
)
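
With 'hoodie.datasource.write.hive_style_partitioning' set to 'true' and dl_s_sex as the partition field, partition directories under the table path are named field=value, which is why later commands refer to dl_s_sex=female. A minimal Python sketch of that naming rule follows; the helper name hive_style_partition_path is made up for illustration and is not part of Hudi.

```python
def hive_style_partition_path(fields, row):
    """Build a Hive-style partition path: "field=value" segments joined by "/".

    Hypothetical helper for illustration only; it is not a Hudi API.
    """
    return "/".join(f"{name}={row[name]}" for name in fields)


# One row of the sample table, reduced to the columns that matter here.
row = {"s_id": 88, "dl_s_sex": "female"}
print(hive_style_partition_path(["dl_s_sex"], row))
```

The resulting string is the value to pass to options such as --partitionPath.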

table

connect

connect --path /xxx/hudi_db.db/student_mysql_cdc_hudi


desc

desc


refresh

refresh


fetch table schema

fetch table schema

{
  "type" : "record",
  "name" : "student_mysql_cdc_hudi_fl_record",
  "namespace" : "hoodie.student_mysql_cdc_hudi_fl",
  "fields" : [
    {"name" : "_hoodie_commit_time", "type" : [ "null", "string" ], "doc" : "", "default" : null},
    {"name" : "_hoodie_commit_seqno", "type" : [ "null", "string" ], "doc" : "", "default" : null},
    {"name" : "_hoodie_record_key", "type" : [ "null", "string" ], "doc" : "", "default" : null},
    {"name" : "_hoodie_partition_path", "type" : [ "null", "string" ], "doc" : "", "default" : null},
    {"name" : "_hoodie_file_name", "type" : [ "null", "string" ], "doc" : "", "default" : null},
    {"name" : "_hoodie_operation", "type" : [ "null", "string" ], "doc" : "", "default" : null},
    {"name" : "s_id", "type" : "long"},
    {"name" : "s_name", "type" : "string"},
    {"name" : "s_age", "type" : [ "null", "int" ], "default" : null},
    {"name" : "s_sex", "type" : [ "null", "string" ], "default" : null},
    {"name" : "s_part", "type" : "string"},
    {"name" : "create_time", "type" : {"type" : "long", "logicalType" : "timestamp-micros"}},
    {"name" : "dl_ts", "type" : {"type" : "long", "logicalType" : "timestamp-micros"}},
    {"name" : "dl_s_sex", "type" : "string"}
  ]
}
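
In this Avro schema, the columns declared without "not null" in the DDL show up as a union with "null", while the "not null" columns have a plain type. A small Python sketch of how to pick out the nullable fields, using only a three-field excerpt of the schema above; the function name nullable_fields is an illustrative choice.

```python
import json

# Excerpt of the schema printed by "fetch table schema" (three of its fields).
schema_json = """
{
  "type": "record",
  "name": "student_mysql_cdc_hudi_fl_record",
  "fields": [
    {"name": "s_id", "type": "long"},
    {"name": "s_age", "type": ["null", "int"], "default": null},
    {"name": "create_time", "type": {"type": "long", "logicalType": "timestamp-micros"}}
  ]
}
"""


def nullable_fields(schema):
    """Return the names of fields whose Avro type is a union containing "null"."""
    return [
        field["name"]
        for field in schema["fields"]
        if isinstance(field["type"], list) and "null" in field["type"]
    ]


schema = json.loads(schema_json)
print(nullable_fields(schema))
```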

commit

commits show

commits show --sortBy "Total Bytes Written" --desc true --limit 10


commits showarchived

commits showarchived


commit showfiles

commit showfiles --commit 20230915164442583
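
The value passed to --commit is an instant time. Judging from its 17-digit shape it reads as yyyyMMddHHmmssSSS (year through milliseconds); treat that layout, and the absence of any time zone information, as assumptions of this sketch. The function name parse_instant is illustrative.

```python
from datetime import datetime


def parse_instant(instant):
    """Parse a 17-digit instant (assumed yyyyMMddHHmmssSSS) into a naive datetime."""
    if len(instant) != 17 or not instant.isdigit():
        raise ValueError(f"unexpected instant format: {instant!r}")
    # First 14 digits are date and time down to seconds; last 3 are milliseconds.
    base = datetime.strptime(instant[:14], "%Y%m%d%H%M%S")
    return base.replace(microsecond=int(instant[14:]) * 1000)


print(parse_instant("20230915164442583").isoformat())
```

The result carries no time zone; the instant is presumably in whatever zone the writer job used.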


commit showfiles --commit 20230915164442583 --sortBy "Partition Path"


commit showpartitions

commit showpartitions --commit 20230915164442583


commit showpartitions --commit 20230915164442583 --sortBy "Total Bytes Written" --desc true --limit 10


commit show_write_stats

commit show_write_stats --commit 20230915164442583


File System View

show fsview all

show fsview all


show fsview latest

show fsview latest --partitionPath dl_s_sex=female


Log File

show logfile records

# Note: 10 is the number of records to fetch
show logfile records 10 /xxx/hudi_db.db/student_mysql_cdc_hudi/dl_s_sex=female/.bf4b06b4-e897-42df-8a3c-a3a2f737d367_20230915163856302.log.1_0-1-0

The records are in JSON format:

{
  "_hoodie_commit_time": "20230915163856302",
  "_hoodie_commit_seqno": "20230915163856302_0_83",
  "_hoodie_record_key": "88",
  "_hoodie_partition_path": "dl_s_sex=female",
  "_hoodie_file_name": "bf4b06b4-e897-42df-8a3c-a3a2f737d367",
  "_hoodie_operation": "I",
  "s_id": 88,
  "s_name": "傅亮",
  "s_age": 4,
  "s_sex": "female",
  "s_part": "2017/11/20",
  "create_time": 790128367000000,
  "dl_ts": -28800000000,
  "dl_s_sex": "female"
}
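
The create_time and dl_ts values are stored as timestamp-micros (see the schema), i.e. microseconds since 1970-01-01 00:00:00 UTC, so they are hard to read by eye. A small Python sketch for converting them; the name micros_to_utc is illustrative, and adding a timedelta to the epoch is used so that negative values also work on every platform.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_utc(micros):
    """Convert an Avro timestamp-micros value to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(microseconds=micros)


print(micros_to_utc(790128367000000).isoformat())  # create_time of the record
print(micros_to_utc(-28800000000).isoformat())     # dl_ts of the record
```

The negative dl_ts is exactly eight hours before the epoch, which would be midnight on 1970-01-01 in a UTC+8 zone.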

show logfile metadata

show logfile metadata /xxx/xxx/hive/hudi_db.db/student_mysql_cdc_hudi/dl_s_sex=female/dl_create_time_yyyy=1971/dl_create_time_mm=03/.dadac2dd-7e5e-46c3-9b27-f1f03e04a90c_20230915151426134.log.1_0

The output also includes a FooterMetadata column that was not fully visible in the screenshot.

{
  "SCHEMA": "{\"type\":\"record\",\"name\":\"student_mysql_cdc_hudi_fl_record\",\"namespace\":\"hoodie.student_mysql_cdc_hudi_fl\",\"fields\":[{\"name\":\"_hoodie_commit_time\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_commit_seqno\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_record_key\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_partition_path\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_file_name\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_operation\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"s_id\",\"type\":\"long\"},{\"name\":\"s_name\",\"type\":\"string\"},{\"name\":\"s_age\",\"type\":[\"null\",\"int\"],\"default\":null},{\"name\":\"s_sex\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"s_part\",\"type\":\"string\"},{\"name\":\"create_time\",\"type\":{\"type\":\"long\",\"logicalType\":\"timestamp-micros\"}},{\"name\":\"dl_ts\",\"type\":{\"type\":\"long\",\"logicalType\":\"timestamp-micros\"}},{\"name\":\"dl_s_sex\",\"type\":\"string\"}]}",
  "INSTANT_TIME": "20230915164442583"
}
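
Note that SCHEMA here is a JSON string nested inside the JSON object, so it has to be decoded twice before the field list is usable. A minimal Python sketch, using a shortened stand-in for the metadata above (the real SCHEMA string is much longer):

```python
import json

# Shortened stand-in for the log block metadata; only one field is kept.
raw = (
    '{"SCHEMA": "{\\"type\\":\\"record\\",\\"fields\\":'
    '[{\\"name\\":\\"s_id\\",\\"type\\":\\"long\\"}]}", '
    '"INSTANT_TIME": "20230915164442583"}'
)

header = json.loads(raw)               # first pass: SCHEMA is still a plain string
schema = json.loads(header["SCHEMA"])  # second pass: decode the embedded schema

print(header["INSTANT_TIME"])
print([field["name"] for field in schema["fields"]])
```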

diff

diff partition

diff partition dl_s_sex=female


diff file

# A FileID is required; it is one part of the log file name.
# For example, for the log file: .bf4b06b4-e897-42df-8a3c-a3a2f737d367_20230915163856302.log.1_0-1-0
diff file bf4b06b4-e897-42df-8a3c-a3a2f737d367
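
To pull the FileID out of a log file name without copying it by hand, the name can be split with a regular expression. The sketch below assumes the layout .<fileId>_<instant>.log.<version>_<writeToken>, which is inferred from the example name above; the group names are my own labels, not official Hudi terms.

```python
import re

# Assumed layout: .<fileId>_<instant>.log.<version>[_<writeToken>]
LOG_NAME = re.compile(
    r"^\.(?P<file_id>.+)_(?P<instant>\d+)\.log\.(?P<version>\d+)"
    r"(?:_(?P<write_token>\S+))?$"
)


def parse_log_file_name(name):
    """Split a log file name into its parts; raise ValueError if it does not match."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised log file name: {name!r}")
    return match.groupdict()


parts = parse_log_file_name(
    ".bf4b06b4-e897-42df-8a3c-a3a2f737d367_20230915163856302.log.1_0-1-0"
)
print(parts["file_id"])
```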


rollbacks

show rollbacks

show rollbacks


stats

stats filesizes

stats filesizes --partitionPath dl_s_sex=female --sortBy "95th" --desc true --limit 3


stats wa

stats wa
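
The help text describes write amplification as a ratio between records upserted and records actually written. The sketch below takes it as records written divided by records upserted, so that rewriting untouched records pushes the value above 1; that direction is an assumption, and the numbers are made up rather than taken from the sample table.

```python
def write_amplification(records_upserted, records_written):
    """Return records written per record upserted, or None when nothing was upserted."""
    if records_upserted == 0:
        return None
    return records_written / records_upserted


# Hypothetical commit: 100 incoming upserts caused 250 records to be (re)written.
print(write_amplification(100, 250))
```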


compaction

compactions show all

compactions show all

To be continued!

compactions showarchived

compactions showarchived


compaction showarchived

compaction showarchived 20230915200042501


compaction show

compaction show 20230915174042680


Reference article:
Using the hudi-cli Client for the Apache Hudi Data Lake
