
Source Code Walkthrough: How FlinkKafkaConsumer Emits Punctuated Watermarks

Background

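FlinkKafkaConsumer can emit a watermark when it receives a particular record from a Kafka partition, for example a special record that marks the end of a complete unit of data. This article walks through the source code that emits these punctuated watermarks.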
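As a minimal sketch of what "punctuated" means here, the following Flink-free example imitates the role of `checkAndGetNextWatermark`: a watermark is produced only when a marker record is seen. The `MarkerEvent` type, its `endOfBatch` flag, and the `PunctuatedAssignerSketch` class are assumptions made up for illustration; they are not part of Flink.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical record type: endOfBatch marks the "special" record from the text.
final class MarkerEvent {
    final long timestamp;
    final boolean endOfBatch;

    MarkerEvent(long timestamp, boolean endOfBatch) {
        this.timestamp = timestamp;
        this.endOfBatch = endOfBatch;
    }
}

// Stand-in for a punctuated assigner: a watermark timestamp is returned for
// marker records and nothing is returned for ordinary records.
final class PunctuatedAssignerSketch {
    static OptionalLong checkAndGetNextWatermark(MarkerEvent event, long extractedTimestamp) {
        return event.endOfBatch ? OptionalLong.of(extractedTimestamp) : OptionalLong.empty();
    }

    // Feeds events through the assigner and collects the watermarks it produces.
    static List<Long> watermarksFor(List<MarkerEvent> events) {
        List<Long> watermarks = new ArrayList<>();
        for (MarkerEvent e : events) {
            OptionalLong next = checkAndGetNextWatermark(e, e.timestamp);
            if (next.isPresent()) {
                watermarks.add(next.getAsLong());
            }
        }
        return watermarks;
    }
}
```

Feeding in four events of which the second and fourth are markers yields exactly two watermarks, one per marker. Ordinary records never produce one.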

Source code walkthrough of punctuated watermark emission

1. Start with the runFetchLoop method in KafkaFetcher

```java
public void runFetchLoop() throws Exception {
    try {
        // kick off the actual Kafka consumer
        consumerThread.start();

        while (running) {
            // this blocks until we get the next records
            // it automatically re-throws exceptions encountered in the consumer thread
            final ConsumerRecords<byte[], byte[]> records = handover.pollNext();

            // get the records for each topic partition
            for (KafkaTopicPartitionState<T, TopicPartition> partition :
                    subscribedPartitionStates()) {
                List<ConsumerRecord<byte[], byte[]>> partitionRecords =
                        records.records(partition.getKafkaPartitionHandle());

                // this method is called for every partition consumed by the operator subtask
                partitionConsumerRecordsHandler(partitionRecords, partition);
            }
        }
    } finally {
        // this signals the consumer thread that no more work is to be done
        consumerThread.shutdown();
    }
    // (rest of the method omitted in this excerpt)
}
```

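2. Next, look at partitionConsumerRecordsHandler, which handles the watermark for each partition assigned to the current operator subtask. It deserializes each record and passes the result to emitRecordsWithTimestamps: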

```java
protected void emitRecordsWithTimestamps(
        Queue<T> records,
        KafkaTopicPartitionState<T, KPH> partitionState,
        long offset,
        long kafkaEventTimestamp) {
    // emit the records, using the checkpoint lock to guarantee
    // atomicity of record emission and offset state update
    synchronized (checkpointLock) {
        T record;
        while ((record = records.poll()) != null) {
            long timestamp = partitionState.extractTimestamp(record, kafkaEventTimestamp);
            // emit the Kafka record to the downstream operator
            sourceContext.collectWithTimestamp(record, timestamp);

            // this might emit a watermark, so do it after emitting the record
            // handle the partition's watermark: record it for this partition and,
            // when the condition is met, update the watermark of the whole operator subtask
            partitionState.onEvent(record, timestamp);
        }
        partitionState.setOffset(offset);
    }
}
```

3. Handle the watermark of each partition

```java
public void onEvent(T event, long timestamp) {
    watermarkGenerator.onEvent(event, timestamp, immediateOutput);
}

public void onEvent(T event, long eventTimestamp, WatermarkOutput output) {
    final org.apache.flink.streaming.api.watermark.Watermark next =
            wms.checkAndGetNextWatermark(event, eventTimestamp);
    if (next != null) {
        output.emitWatermark(new Watermark(next.getTimestamp()));
    }
}
```

The call output.emitWatermark(new Watermark(next.getTimestamp())); corresponds to the following method:

```java
public void emitWatermark(Watermark watermark) {
    long timestamp = watermark.getTimestamp();
    // update the watermark of this partition; wasUpdated tells whether it advanced
    boolean wasUpdated = state.setWatermark(timestamp);

    // if it's higher than the max watermark so far we might have to update the
    // combined watermark. The combined watermark is the minimum watermark of this
    // operator subtask, i.e. the subtask-level watermark rather than a per-partition one
    if (wasUpdated && timestamp > combinedWatermark) {
        updateCombinedWatermark();
    }
}

// the per-partition watermark is updated as follows
public boolean setWatermark(long watermark) {
    this.idle = false;
    final boolean updated = watermark > this.watermark;
    this.watermark = Math.max(watermark, this.watermark);
    return updated;
}
```
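To make the effect of setWatermark concrete, here is a small stand-alone sketch that copies its logic into a hypothetical `PartitionWatermarkSketch` class (the class name and the getter are assumptions for illustration, not Flink types). It shows that a partition's watermark only moves forward, so an out-of-order or repeated punctuation is ignored.

```java
// Simplified stand-in for the per-partition output state; the body of
// setWatermark follows the logic quoted above.
final class PartitionWatermarkSketch {
    private long watermark = Long.MIN_VALUE;
    private boolean idle = false;

    // Returns true only when the candidate strictly advances the watermark.
    boolean setWatermark(long candidate) {
        this.idle = false; // any watermark activity marks the partition as active
        boolean updated = candidate > this.watermark;
        this.watermark = Math.max(candidate, this.watermark);
        return updated;
    }

    long getWatermark() {
        return watermark;
    }

    boolean isIdle() {
        return idle;
    }
}
```

Calling `setWatermark(10)` and then `setWatermark(7)` leaves the watermark at 10, and the second call returns false, so `emitWatermark` would not trigger a recomputation of the combined watermark for it.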

4. Finally, the method that emits the subtask-level watermark

```java
private void updateCombinedWatermark() {
    long minimumOverAllOutputs = Long.MAX_VALUE;

    boolean hasOutputs = false;
    boolean allIdle = true;
    for (OutputState outputState : watermarkOutputs) {
        if (!outputState.isIdle()) {
            minimumOverAllOutputs = Math.min(minimumOverAllOutputs, outputState.getWatermark());
            allIdle = false;
        }
        hasOutputs = true;
    }

    // if we don't have any outputs minimumOverAllOutputs is not valid, it's still
    // at its initial Long.MAX_VALUE state and we must not emit that
    if (!hasOutputs) {
        return;
    }

    if (allIdle) {
        underlyingOutput.markIdle();
    } else if (minimumOverAllOutputs > combinedWatermark) {
        combinedWatermark = minimumOverAllOutputs;
        underlyingOutput.emitWatermark(new Watermark(minimumOverAllOutputs));
    }
}
```

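Looking at this flow, does it mean that idleness is not supported when the punctuated approach is used? The answer is yes. In the code shown above, a partition's watermark only moves when a record arrives and onEvent runs, setWatermark resets idle to false, and nothing on this path marks a silent partition as idle. A partition that receives no records therefore keeps holding back the minimum computed in updateCombinedWatermark.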
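This conclusion can be checked with a small simulation. The sketch below is a simplified, Flink-free model of the emitWatermark / updateCombinedWatermark logic quoted above. `CombinedWatermarkSketch`, `Output`, `registerPartition`, and especially the `markIdle` hook are hypothetical names introduced for illustration. The point is that nothing on the quoted record path calls such a hook.

```java
import java.util.ArrayList;
import java.util.List;

final class CombinedWatermarkSketch {
    // Per-partition state, following the setWatermark logic quoted in step 3.
    static final class Output {
        long watermark = Long.MIN_VALUE;
        boolean idle = false;

        boolean setWatermark(long candidate) {
            idle = false;
            boolean updated = candidate > watermark;
            watermark = Math.max(candidate, watermark);
            return updated;
        }
    }

    private final List<Output> outputs = new ArrayList<>();
    private long combinedWatermark = Long.MIN_VALUE;

    Output registerPartition() {
        Output output = new Output();
        outputs.add(output);
        return output;
    }

    // Called when a punctuated watermark is produced for one partition.
    void emit(Output output, long timestamp) {
        boolean wasUpdated = output.setWatermark(timestamp);
        if (wasUpdated && timestamp > combinedWatermark) {
            updateCombinedWatermark();
        }
    }

    // Hypothetical hook (an assumption of this sketch): explicitly excludes a
    // partition from the minimum. The quoted punctuated path never does this.
    void markIdle(Output output) {
        output.idle = true;
        updateCombinedWatermark();
    }

    // Combined watermark = minimum over all non-idle partitions, never decreasing.
    private void updateCombinedWatermark() {
        long minimum = Long.MAX_VALUE;
        boolean anyActive = false;
        for (Output output : outputs) {
            if (!output.idle) {
                minimum = Math.min(minimum, output.watermark);
                anyActive = true;
            }
        }
        if (anyActive && minimum > combinedWatermark) {
            combinedWatermark = minimum;
        }
    }

    long getCombinedWatermark() {
        return combinedWatermark;
    }
}
```

With two registered partitions where only the first ever emits, the combined watermark stays at `Long.MIN_VALUE` however far the first partition advances. It only catches up once the silent partition is explicitly marked idle, which is the step the punctuated path lacks.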

