当前位置: 首页 > news >正文

PostgreSQL16中pg_dump的LZ4和ZSTD压缩

PostgreSQL16中pg_dump的LZ4和ZSTD压缩

pg_dump压缩lz4和zstd

LZ4和ZSTD压缩算法合入了PG16。LZ4补丁的作者是Georgios Kokolatos。由Tomas Vondra提交。由Michael Paquier、Rachel Heaton、Justin Pryzby、Shi Yu 和 Tomas Vondra 审阅。提交消息是:

Expand pg_dump's compression streaming and file APIs to 
support the lz4 algorithm. The newly added compress_lz4.{c,h} 
files cover all the functionality of the aforementioned APIs. 
Minor changes were necessary in various pg_backup_* files, 
where code for the 'lz4' file suffix has been added, 
as well as pg_dump's compression option parsing.
Author: Georgios Kokolatos
Reviewed-by: Michael Paquier, Rachel Heaton, Justin Pryzby, 
Shi Yu, Tomas Vondra
Discussion: 
https://postgr.es/m/faUNEOpts9vunEaLnmxmG-DldLSg_ql137OC3JYDmgrOMHm1RvvWY2IdBkv_CRxm5spCCb_OmKNk2T03TMm0fBEWveFF9wA1WizPuAgB7Ss%3D%40protonmail.com

ZSTD补丁的作者是Justin Pryzby。由Tomas Vondra提交。由Tomas Vondra、Jacob Champion 和 Andreas Karlsson 审阅。提交消息是:

Allow pg_dump to use the zstd compression, 
in addition to gzip/lz4. Bulk of the new compression method 
is implemented in compress_zstd.{c,h},covering the pg_dump 
compression APIs. The rest of the patch adds test and makesvarious places aware of the new compression method.
The zstd library (which this patch relies on) supports 
multithreaded compression since version 1.5. We however 
disallow that feature for now, as it might interfere with 
parallel backups on platforms that rely on threads 
(e.g. Windows). This can be improved / relaxed in the future.
This also fixes a minor issue in 
InitDiscoverCompressFileHandle(), which was not updated to 
check if the file already has the .lz4 extension.
Adding zstd compression was originally proposed in 2020 
(see the second thread), but then was reworked to use the 
new compression API introduced in e9960732a9. I've considered 
both threads when compiling the list of reviewers.
Author: Justin Pryzby
Reviewed-by: Tomas Vondra, Jacob Champion, Andreas Karlsson
Discussion: https://postgr.es/m/20230224191840.GD1653@telsasoft.com
Discussion: https://postgr.es/m/20201221194924.GI30237@telsasoft.com

尝试下

~$ pg_dump --version
pg_dump (PostgreSQL) 16devel
~$ pgbench --initialize --scale=100
dropping old tables...
NOTICE: table "pgbench_accounts" does not exist, skipping
NOTICE: table "pgbench_branches" does not exist, skipping
NOTICE: table "pgbench_history" does not exist, skipping
NOTICE: table "pgbench_tellers" does not exist, skipping
creating tables...
generating data (client-side)...
10000000 of 10000000 tuples (100%) done (elapsed 39.52 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 49.65 s (drop tables 0.00 s, create tables 0.08 s, client-side generate 39.96 s, vacuum 0.29 s, primary keys 9.32 s).
~$ psql --command="select pg_size_pretty(pg_database_size('postgres'))"
pg_size_pretty 
----------------
1503 MB
(1 row)
~$ time pg_dump --format=custom --compress=lz4:9 > dump.lz4
real 0m10.507s
user 0m9.901s
sys 0m0.436s
~$ time pg_dump --format=custom --compress=zstd:9 > dump.zstd
real 0m8.794s
user 0m8.393s
sys 0m0.364s
~$ time pg_dump --format=custom --compress=gzip:9 > dump.gz
real 0m14.245s
user 0m13.064s
sys 0m0.978s
~$ time pg_dump --format=custom --compress=lz4 > dump_default.lz4
real 0m6.809s
user 0m1.666s
sys 0m1.125s
~$ time pg_dump --format=custom --compress=zstd > dump_default.zstd
real 0m7.534s
user 0m2.428s
sys 0m0.892s
~$ time pg_dump --format=custom --compress=gzip > dump_default.gz
real 0m11.564s
user 0m10.661s
sys 0m0.525s
~$ time pg_dump --format=custom --compress=lz4:3 > dump_3.lz4
real 0m8.497s
user 0m7.856s
sys 0m0.507s
~$ time pg_dump --format=custom --compress=zstd:3 > dump_3.zstd
real 0m5.129s
user 0m2.228s
sys 0m0.726s
~$ time pg_dump --format=custom --compress=gzip:3 > dump_3.gz
real 0m4.468s
user 0m3.654s
sys 0m0.504s
~$ ls -l --block-size=M
total 250M
-rw-rw-r-- 1 postgres postgres 28M Apr 18 13:58 dump_3.gz
-rw-rw-r-- 1 postgres postgres 48M Apr 18 13:57 dump_3.lz4
-rw-rw-r-- 1 postgres postgres 8M Apr 18 13:58 dump_3.zstd
-rw-rw-r-- 1 postgres postgres 27M Apr 18 13:57 dump_default.gz
-rw-rw-r-- 1 postgres postgres 50M Apr 18 13:56 dump_default.lz4
-rw-rw-r-- 1 postgres postgres 8M Apr 18 13:57 dump_default.zstd
-rw-rw-r-- 1 postgres postgres 27M Apr 18 13:56 dump.gz
-rw-rw-r-- 1 postgres postgres 48M Apr 18 13:55 dump.lz4
-rw-rw-r-- 1 postgres postgres 8M Apr 18 13:56 dump.zstd

根据命令的输出,得出以下关于三种压缩方法的结论:

gzip:这是一种众所周知且广泛使用的压缩方法,可以在压缩率和压缩速度之间提供两行的平衡。

lz4:这是一种非常快的压缩算法,以较低的压缩比为代价提供较高的压缩和解压速度。Lz4压缩转出的文件在48-50MB范围,明显大于gzip压缩转储。

Zstd:这是一种比较新的压缩算法,压缩比高,压缩速度也不错。Zstd压缩转储的文件大小在8-8.5MB范围内,是三种压缩方法中最小的。

令人吃惊的是zstd压缩时间最少,其次是lz4和gzip。该数据可能不是测量和比较的最佳数据。默认压缩级别,zstd生成最小的转储文件大小,其次是lz4和gzip。在最大压缩级别,zstd仍然生成最小的转储文件大小,其次是gzip和lz4。

基于这些观察,如果首要任务是减少磁盘使用空间,zstd是推荐的压缩方法。但如果首要任务是减少压缩时间,则zstd和lz4都表现不错。如果担心与其他实用程序的兼容性,gzip仍然是一个可行的选择。

最后

PostgreSQL16中的pg_dump -Z/--compress将不仅仅支持整数。它可用于指定使用的压缩方法和级别。默认仍然是级别为 6 的gzip。但是块上的新方法lz4和zstd已经在这里了!

ff21325ea09b8f640828574b5386a2fe.png

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=5e73a6048849bd7bda4947e39570b9011734114d

原文

https://www.cybertec-postgresql.com/en/lz4-zstd-pg_dump-compression-postgresql-16/

http://www.lryc.cn/news/63966.html

相关文章:

  • 网络安全基础入门学习路线
  • 错误检测技术:奇偶校验
  • 语义版本控制规范(SemVer)
  • 基于Flask的留言板的设计与实现
  • vmware 详细安装教程
  • Python 爬虫工具
  • 再也不去字节跳动面试了,6年测开经验的真实面试经历.....
  • 第十五章 角色移动旋转实例
  • 数据湖Data Lakehouse支持行级更改的策略:COW、MOR、Delete+Insert
  • 双亲委派机制的原理和作用
  • mac免费杀毒软件哪个好用?如何清理mac系统需要垃圾
  • css 实现太极效果
  • 【前端基础知识】Vue中的变量不是响应式的吗?属性赋值后视图不变化的原因是什么?
  • 如何完全卸载linux下通过rpm安装的mysql
  • [渗透教程]-004-长城防火墙GFW的原理
  • LaTeX基础文本排版命令
  • PLC模糊控制模糊PID(梯形图实现+算法分析)
  • 线程池在Java多线程中的应用
  • 1997-2021年全国30省技术市场成交额(亿元)
  • 【C++】面向对象之多态
  • 卡尔曼滤波器简介——多维卡尔曼滤波
  • 如何用 GPT-4 帮你写游戏?
  • R语言的贝叶斯时空数据模型实践技术应用
  • Lazysysadmin靶机渗透过程
  • 为什么网络安全缺口很大,招聘却很少?
  • SpringBoot手册
  • 【Linux】如何实现单机版QQ,来看进程间通信之管道
  • 从一到无穷大 #6 盘满排查过程
  • ChatGPT技术原理 第九章:数据集和训练技巧
  • NCR被攻击后服务中断!原是BlackCat勒索软件作祟