Kafka Hardware and Operating System

news 2025/7/31 2:49:34

1. Preface

2. Kafka Hardware and OS

2.1. Operating System (OS)

2.2. Disks and Filesystem

1. Preface

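Kafka is an I/O-intensive rather than compute-intensive framework, so CPU is the least demanding of its hardware requirements; CPU time goes mainly to message compression and decompression. A single Kafka broker typically hosts many topic partitions and talks to many producers and consumers, so parallelism (core/thread count) matters more than single-core performance (clock speed).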

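As a rule of thumb, 8C/16T per node at 2GHz or higher (by Broadwell-generation standards) is enough for a small production environment; more heavily loaded clusters can go to 12C/24T or even 16C/32T. Note that the broker's num.network.threads and num.io.threads parameters should be adjusted to match the CPU specification.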

2. Kafka Hardware and OS

Original quote: We are using dual quad-core Intel Xeon machines with 24GB of memory.

You need sufficient memory to buffer active readers and writers. You can do a back-of-the-envelope estimate of memory needs by assuming you want to be able to buffer for 30 seconds and compute your memory need as write_throughput*30.

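That is, the reference machines are dual quad-core Intel Xeon servers with 24GB of memory.

Memory must be large enough to buffer active readers and writers. For a rough estimate, assume you want to be able to buffer 30 seconds of writes and compute the memory need as write_throughput*30.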
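The write_throughput*30 estimate can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name, the byte units, and the 200 MiB/s workload are assumptions made for the example, not values from the Kafka documentation.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def estimate_buffer_memory_bytes(write_throughput_bytes_per_sec: float,
                                 buffer_seconds: float = 30.0) -> float:
    """Back-of-the-envelope memory need: write throughput times buffer window.

    The 30-second default mirrors the quoted rule of thumb; the function
    itself is a hypothetical helper, not part of any Kafka API.
    """
    if write_throughput_bytes_per_sec < 0 or buffer_seconds < 0:
        raise ValueError("throughput and buffer window must be non-negative")
    return write_throughput_bytes_per_sec * buffer_seconds


if __name__ == "__main__":
    # Assumed workload: 200 MiB/s of aggregate writes hitting one broker.
    need = estimate_buffer_memory_bytes(200 * MIB)
    print(f"{need / GIB:.2f} GiB")  # prints "5.86 GiB"
```

Under that assumed workload the estimate comes to roughly 6 GB of memory for buffering, which fits comfortably in a 24GB machine like the one quoted above.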

Original quote: The disk throughput is important. We have 8x7200 rpm SATA drives. In general disk throughput is the performance bottleneck, and more disks is better. Depending on how you configure flush behavior you may or may not benefit from more expensive disks (if you force flush often then higher RPM SAS drives may be better).

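Disk throughput matters. The reference setup has 8x7200 rpm SATA drives. Disk throughput is generally the performance bottleneck, so more disks are better. Whether more expensive disks pay off depends on how flushing is configured: if you force flushes often, higher-RPM SAS drives may do better.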

2.1. Operating System (OS)

Original quote: Kafka should run well on any unix system and has been tested on Linux and Solaris.

We have seen a few issues running on Windows and Windows is not currently a well supported platform though we would be happy to change that.

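Kafka should run well on any Unix system and has been tested on Linux and Solaris.

A few issues have been seen when running on Windows, which is not currently a well-supported platform, although the Kafka team would be happy to change that.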

Original quote: It is unlikely to require much OS-level tuning, but there are three potentially important OS-level configurations:

File descriptor limits: Kafka uses file descriptors for log segments and open connections. If a broker hosts many partitions, consider that the broker needs at least (number_of_partitions)*(partition_size/segment_size) to track all log segments in addition to the number of connections the broker makes. We recommend at least 100000 allowed file descriptors for the broker processes as a starting point. Note: The mmap() function adds an extra reference to the file associated with the file descriptor fildes which is not removed by a subsequent close() on that file descriptor. This reference is removed when there are no more mappings to the file.
Max socket buffer size: can be increased to enable high-performance data transfer between data centers as described here.
Maximum number of memory map areas a process may have (aka vm.max_map_count). See the Linux kernel documentation. You should keep an eye at this OS-level property when considering the maximum number of partitions a broker may have. By default, on a number of Linux systems, the value of vm.max_map_count is somewhere around 65535. Each log segment, allocated per partition, requires a pair of index/timeindex files, and each of these files consumes 1 map area. In other words, each log segment uses 2 map areas. Thus, each partition requires minimum 2 map areas, as long as it hosts a single log segment. That is to say, creating 50000 partitions on a broker will result allocation of 100000 map areas and likely cause broker crash with OutOfMemoryError (Map failed) on a system with default vm.max_map_count. Keep in mind that the number of log segments per partition varies depending on the segment size, load intensity, retention policy and, generally, tends to be more than one.

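Kafka is unlikely to need much OS-level tuning, but three OS-level settings are potentially important:

File descriptor limits: Kafka uses file descriptors for log segments and open connections. If a broker hosts many partitions, it needs at least (number_of_partitions)*(partition_size/segment_size) descriptors to track all log segments, in addition to the connections it makes. At least 100000 allowed file descriptors for the broker processes is the recommended starting point. Note: mmap() adds an extra reference to the file associated with the file descriptor fildes, and a subsequent close() on that descriptor does not remove it; the reference is removed only when no mappings to the file remain.
Max socket buffer size: can be increased to enable high-performance data transfer between data centers, as described in the reference linked from the original documentation.
Maximum number of memory map areas a process may have (vm.max_map_count): see the Linux kernel documentation. Keep an eye on this setting when deciding the maximum number of partitions a broker may host. On many Linux systems vm.max_map_count defaults to around 65535. Each log segment of a partition needs a pair of index/timeindex files, and each file consumes 1 map area, so each log segment uses 2 map areas and each partition needs at least 2 map areas even when it holds a single log segment. Creating 50000 partitions on one broker therefore allocates 100000 map areas and will likely crash the broker with OutOfMemoryError (Map failed) on a system with the default vm.max_map_count. Keep in mind that the number of log segments per partition varies with segment size, load intensity, and retention policy, and generally tends to be more than one.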
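The two sizing rules above (file descriptors and map areas) are simple enough to check with a short Python sketch. The function names and the sample figures are hypothetical, and rounding the segment count up to a whole number is an assumption added here; the quoted formula only gives partition_size/segment_size.

```python
APPROX_DEFAULT_MAX_MAP_COUNT = 65535  # "somewhere around 65535" per the quote


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding."""
    return -(-a // b)


def min_file_descriptors(num_partitions: int,
                         partition_size_bytes: int,
                         segment_size_bytes: int,
                         num_connections: int = 0) -> int:
    """Lower bound: partitions * segments-per-partition, plus connections."""
    # Assumption: a partition always has at least one (active) segment.
    segments = max(1, ceil_div(partition_size_bytes, segment_size_bytes))
    return num_partitions * segments + num_connections


def min_map_areas(num_partitions: int, segments_per_partition: int = 1) -> int:
    """Each log segment maps an index file and a timeindex file: 2 map areas."""
    return num_partitions * segments_per_partition * 2


if __name__ == "__main__":
    gib = 1024 ** 3
    # Hypothetical broker: 1000 partitions of 10 GiB, 1 GiB segments, 500 connections.
    print(min_file_descriptors(1000, 10 * gib, 1 * gib, 500))  # prints 10500
    # The example from the quote: 50000 single-segment partitions.
    areas = min_map_areas(50000)
    print(areas, areas > APPROX_DEFAULT_MAX_MAP_COUNT)  # prints 100000 True
```

The second result reproduces the quoted case: 100000 map areas is well above a default limit of about 65535, and real partitions usually hold more than one segment, so the true requirement tends to be higher still.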

2.2. Disks and Filesystem

Original quote: We recommend using multiple drives to get good throughput and not sharing the same drives used for Kafka data with application logs or other OS filesystem activity to ensure good latency. You can either RAID these drives together into a single volume or format and mount each drive as its own directory. Since Kafka has replication the redundancy provided by RAID can also be provided at the application level. This choice has several tradeoffs.

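Use multiple drives to get good throughput, and do not share the drives that hold Kafka data with application logs or other OS filesystem activity, so that latency stays good. The drives can either be combined with RAID into a single volume, or each formatted and mounted as its own directory. Because Kafka has replication, the redundancy RAID provides can also be provided at the application level. The choice involves several tradeoffs.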

Original quote: If you configure multiple data directories partitions will be assigned round-robin to data directories. Each partition will be entirely in one of the data directories. If data is not well balanced among partitions this can lead to load imbalance between disks.

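If multiple data directories are configured, partitions are assigned to them round-robin. Each partition lives entirely in one data directory. If data is not well balanced among partitions, load can become imbalanced between disks.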
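A toy Python model makes the imbalance concrete. It is a simplified illustration of the round-robin behavior described in the quote, not Kafka's actual placement code; the partition names, sizes, and directory paths are all made up for the example.

```python
from collections import defaultdict


def assign_round_robin(partitions, data_dirs):
    """Map each partition name to a data directory, cycling through the dirs.

    partitions: list of (name, size) tuples in creation order.
    """
    return {name: data_dirs[i % len(data_dirs)]
            for i, (name, _size) in enumerate(partitions)}


def size_per_dir(partitions, assignment):
    """Total data held by each directory under the given assignment."""
    totals = defaultdict(int)
    for name, size in partitions:
        totals[assignment[name]] += size
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical partitions with sizes in GB: two large, two small.
    partitions = [("orders-0", 900), ("clicks-0", 100),
                  ("orders-1", 800), ("clicks-1", 50)]
    dirs = ["/data/d1", "/data/d2"]  # hypothetical mount points
    assignment = assign_round_robin(partitions, dirs)
    print(size_per_dir(partitions, assignment))
    # prints {'/data/d1': 1700, '/data/d2': 150}
```

Both directories receive two partitions, yet one ends up with more than ten times the data of the other, because the assignment counts partitions and ignores their size.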

Original quote: RAID can potentially do better at balancing load between disks (although it doesn't always seem to) because it balances load at a lower level. The primary downside of RAID is that it is usually a big performance hit for write throughput and reduces the available disk space.

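RAID can potentially balance load between disks better (although it does not always seem to), because it balances at a lower level. Its main downsides are that it usually costs a lot of write throughput and reduces the available disk space.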

Original quote: Another potential benefit of RAID is the ability to tolerate disk failures. However our experience has been that rebuilding the RAID array is so I/O intensive that it effectively disables the server, so this does not provide much real availability improvement.

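Another potential benefit of RAID is tolerance of disk failures. The experience reported in the quote, however, is that rebuilding a RAID array is so I/O intensive that it effectively disables the server, so it does not bring much real availability improvement.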
