
Kafka Basics

Deployment architecture

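Kafka has traditionally used ZooKeeper for coordination and metadata synchronization. ZooKeeper mode was officially deprecated starting with Kafka 3.5, which recommends the built-in KRaft mode instead, and Kafka 4.0 no longer supports ZooKeeper at all.
So Kafka has both a control plane and a data plane.
The data plane is the brokers. The control plane used to be ZooKeeper and is now the KRaft controllers.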

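The nodes (brokers) of a Kafka cluster are not themselves divided into leaders and followers.
Only the topic partitions that store the messages have a leader and followers.
Clients write to, and by default read from, the leader of a partition.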

Partition Connection

Kafka terminology

Kafka brokers: Brokers are the individual nodes in a Kafka cluster.

The broker.id property is the unique and permanent name of each node in the cluster.

Record: Records are also called messages or events. A Kafka record consists of headers, a key, a value, and a timestamp.

Headers contain metadata consisting of string-value pairs, which can be read by consumers in order to make decisions based on the metadata. Headers are optional.
The key and value of a Kafka record contain the data relevant to your business. The key may have some structure, and can be a string, an integer, or some compound value. The value is structured and is likely an object to be serialized.
Every Kafka record has a timestamp. If you don’t provide one, one is provided by default.
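To make the four parts of a record concrete, here is a minimal sketch in plain Python. The `Record` class and its field names are hypothetical and only mirror the description above; they are not taken from any Kafka client library.

```python
import time
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True)
class Record:
    """Illustrative stand-in for a Kafka record (not a client-library class)."""
    value: bytes                                   # the business payload
    key: Optional[bytes] = None                    # optional business key
    headers: Tuple[Tuple[str, bytes], ...] = ()    # optional metadata pairs
    # If no timestamp is supplied, one is filled in by default.
    timestamp_ms: int = field(default_factory=lambda: int(time.time() * 1000))


order = Record(key=b"order-42", value=b'{"total": 19.9}',
               headers=(("source", b"web"),))
ping = Record(value=b"ping")  # no key, no headers, default timestamp
```

Only the value is required in this sketch; the key and headers are optional, and the timestamp is assigned automatically when the caller omits it.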

Message streams are persistent in Kafka. This means that messages do not disappear once received. This is in contrast with classic “pub-sub” systems such as JMS, where a message is removed from the system as soon as the subscriber receives it. In Kafka, message retention periods are configurable, usually based on a length of time or the size of the underlying storage.
In other words, a Kafka message can be consumed many times. It is not deleted once it has been consumed; its lifetime is controlled by the retention policy.
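The difference between "deleted on consumption" and "deleted by retention" can be sketched with a small in-memory model. `RetainedLog` and its methods are hypothetical names for illustration. Real Kafka applies retention to whole log segments rather than to individual messages, so treat this as a simplification.

```python
from collections import deque


class RetainedLog:
    """Toy log: reads never delete; only the retention check does."""

    def __init__(self, retention_ms):
        self.retention_ms = retention_ms
        self.entries = deque()  # (timestamp_ms, value), oldest first

    def append(self, timestamp_ms, value):
        self.entries.append((timestamp_ms, value))

    def read_all(self):
        # Reading leaves the entries in place, so it can be repeated.
        return [value for _, value in self.entries]

    def enforce_retention(self, now_ms):
        # Drop entries older than the retention period, oldest first.
        while self.entries and now_ms - self.entries[0][0] > self.retention_ms:
            self.entries.popleft()


log = RetainedLog(retention_ms=1000)
log.append(0, "a")
log.append(600, "b")
first_read = log.read_all()
second_read = log.read_all()       # same data again, nothing was consumed away
log.enforce_retention(now_ms=1200)
after_cleanup = log.read_all()     # "a" is now older than 1000 ms and is gone
```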


Topic: Kafka topics are the categories used to organize messages. Messages are sent to and read from specific topics.

Each topic has a name that is unique across the entire Kafka cluster.
Producers write data to topics, and consumers read data from topics.
Kafka topics are multi-subscriber. This means that a topic can have zero, one, or multiple consumers subscribing to that topic and the data written to it. There is a many-to-many relation between producers/consumers and topics.


Partition: Kafka topics are divided into one or more partitions, each of which is a logical segment of the topic’s data.

Each partition holds a subset of the topic’s data, and the producer decides which partition a message is written to, based on factors like a key or round-robin distribution.
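As a rough illustration of this choice, the sketch below hashes the key when there is one and falls back to round-robin otherwise. The helper name `make_partitioner` is hypothetical, and `zlib.crc32` is only a stand-in hash chosen so the example runs with the standard library; Kafka's Java client uses a murmur2 hash, so real partition numbers will differ.

```python
import itertools
import zlib


def make_partitioner(num_partitions):
    """Return a function that picks a partition for a record key."""
    round_robin = itertools.cycle(range(num_partitions))

    def choose(key):
        if key is None:
            # No key: spread records evenly over the partitions.
            return next(round_robin)
        # Same key -> same hash -> same partition, which preserves per-key order.
        return zlib.crc32(key) % num_partitions

    return choose


choose = make_partitioner(3)
keyed = [choose(b"user-1") for _ in range(3)]   # always the same partition
unkeyed = [choose(None) for _ in range(4)]      # cycles 0, 1, 2, 0
```

The useful property is that all records with the same key land in the same partition, so their relative order is kept.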

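The producer client picks the topic partition from the hash of the record's key. If the record has no key, partitions are assigned round-robin.
On the consumer side, within a consumer group each partition is assigned to only one client, but a single client may handle several partitions.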
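The one-owner-per-partition rule can be sketched as follows. The function `assign` and the consumer names are hypothetical, and the simple round-robin split shown here is just one possible strategy, not necessarily the one a given Kafka client uses.

```python
def assign(partitions, consumers):
    """Give every partition exactly one owner; a consumer may own several."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Five partitions, two clients: each client handles more than one partition.
two_clients = assign([0, 1, 2, 3, 4], ["c1", "c2"])

# Two partitions, three clients: one client gets nothing to do.
three_clients = assign([0, 1], ["c1", "c2", "c3"])
```

Having more clients than partitions therefore adds no parallelism, since the extra clients stay idle.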


A topic can be divided into multiple partitions to enable parallel processing and improve scalability. Each partition is an ordered sequence of immutable records, with messages within a partition guaranteed to maintain their order. Partitions allow Kafka to distribute the workload of writing and reading messages across multiple brokers in the cluster. This enables consumers to read data in parallel, and producers to write data to different partitions simultaneously, increasing throughput and reducing latency.

Within each partition, messages have a unique offset, which is an integer representing their position in the sequence. This helps consumers track their position and retrieve messages in the correct order.
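A minimal sketch of offsets, assuming a partition can be modeled as an append-only Python list. The `Partition` class and its methods are hypothetical and only illustrate how a consumer tracks its position.

```python
class Partition:
    """Toy append-only partition; the offset is the index in the sequence."""

    def __init__(self):
        self._records = []

    def append(self, value):
        self._records.append(value)
        return len(self._records) - 1  # offset assigned to the new record

    def read_from(self, offset, max_records=10):
        # Return (offset, value) pairs in order, starting at the given offset.
        chunk = self._records[offset:offset + max_records]
        return list(enumerate(chunk, start=offset))


partition = Partition()
offsets = [partition.append(v) for v in ("a", "b", "c")]

position = 0                                   # the consumer's current position
batch = partition.read_from(position, max_records=2)
position = batch[-1][0] + 1                    # next offset to read
rest = partition.read_from(position)
```

The consumer only has to remember one integer per partition to resume where it left off.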


Kafka Transaction: A Kafka transaction is a group of operations (usually message writes or reads + writes) that are treated as a single atomic unit. Either all of the operations succeed, or none of them take effect.
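The all-or-nothing behavior can be modeled with a small in-memory sketch. `TransactionalLog` and its methods are hypothetical names. Real Kafka works differently internally (records are written to the log and later marked as committed or aborted), so this only models the outcome a reader of committed data would see.

```python
class TransactionalLog:
    """Toy model: writes become visible only when the transaction commits."""

    def __init__(self):
        self.committed = []    # what readers of committed data can see
        self._pending = None   # writes of the open transaction, if any

    def begin(self):
        self._pending = []

    def send(self, value):
        if self._pending is None:
            raise RuntimeError("no open transaction")
        self._pending.append(value)

    def commit(self):
        # All buffered writes take effect together.
        self.committed.extend(self._pending)
        self._pending = None

    def abort(self):
        # None of the buffered writes take effect.
        self._pending = None


log = TransactionalLog()

log.begin()
log.send("debit A")
log.send("credit B")
log.commit()

log.begin()
log.send("debit C")
log.abort()            # "debit C" never becomes visible

visible = list(log.committed)
```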

Transactions: Balance Overhead with Latency

One thing to consider, specifically in Kafka Streams applications, is how to set the commit.interval.ms configuration. This determines how frequently to commit, and hence the size of the transactions. There is a bit of overhead for each transaction, so many small transactions could cause performance issues. However, long-running transactions delay the availability of output, resulting in increased latency. Different applications have different needs, so this setting should be considered and adjusted accordingly.
