当前位置：首页 > news >正文

PostgreSQL16中pg_dump的LZ4和ZSTD压缩

news 2025/8/18 18:10:31

pg_dump压缩lz4和zstd

LZ4和ZSTD压缩算法合入了PG16。LZ4补丁的作者是Georgios Kokolatos。由Tomas Vondra提交。由Michael Paquier、Rachel Heaton、Justin Pryzby、Shi Yu 和 Tomas Vondra 审阅。提交消息是：

Expand pg_dump's compression streaming and file APIs to 
support the lz4 algorithm. The newly added compress_lz4.{c,h} 
files cover all the functionality of the aforementioned APIs. 
Minor changes were necessary in various pg_backup_* files, 
where code for the 'lz4' file suffix has been added, 
as well as pg_dump's compression option parsing.
Author: Georgios Kokolatos
Reviewed-by: Michael Paquier, Rachel Heaton, Justin Pryzby, 
Shi Yu, Tomas Vondra
Discussion: 
https://postgr.es/m/faUNEOpts9vunEaLnmxmG-DldLSg_ql137OC3JYDmgrOMHm1RvvWY2IdBkv_CRxm5spCCb_OmKNk2T03TMm0fBEWveFF9wA1WizPuAgB7Ss%3D%40protonmail.com

ZSTD补丁的作者是Justin Pryzby。由Tomas Vondra提交。由Tomas Vondra、Jacob Champion 和 Andreas Karlsson 审阅。提交消息是：

Allow pg_dump to use the zstd compression, 
in addition to gzip/lz4. Bulk of the new compression method 
is implemented in compress_zstd.{c,h},covering the pg_dump 
compression APIs. The rest of the patch adds test and makesvarious places aware of the new compression method.
The zstd library (which this patch relies on) supports 
multithreaded compression since version 1.5. We however 
disallow that feature for now, as it might interfere with 
parallel backups on platforms that rely on threads 
(e.g. Windows). This can be improved / relaxed in the future.
This also fixes a minor issue in 
InitDiscoverCompressFileHandle(), which was not updated to 
check if the file already has the .lz4 extension.
Adding zstd compression was originally proposed in 2020 
(see the second thread), but then was reworked to use the 
new compression API introduced in e9960732a9. I've considered 
both threads when compiling the list of reviewers.
Author: Justin Pryzby
Reviewed-by: Tomas Vondra, Jacob Champion, Andreas Karlsson
Discussion: https://postgr.es/m/20230224191840.GD1653@telsasoft.com
Discussion: https://postgr.es/m/20201221194924.GI30237@telsasoft.com

尝试下

~$ pg_dump --version
pg_dump (PostgreSQL) 16devel
~$ pgbench --initialize --scale=100
dropping old tables...
NOTICE: table "pgbench_accounts" does not exist, skipping
NOTICE: table "pgbench_branches" does not exist, skipping
NOTICE: table "pgbench_history" does not exist, skipping
NOTICE: table "pgbench_tellers" does not exist, skipping
creating tables...
generating data (client-side)...
10000000 of 10000000 tuples (100%) done (elapsed 39.52 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 49.65 s (drop tables 0.00 s, create tables 0.08 s, client-side generate 39.96 s, vacuum 0.29 s, primary keys 9.32 s).
~$ psql --command="select pg_size_pretty(pg_database_size('postgres'))"
pg_size_pretty 
----------------
1503 MB
(1 row)
~$ time pg_dump --format=custom --compress=lz4:9 > dump.lz4
real 0m10.507s
user 0m9.901s
sys 0m0.436s
~$ time pg_dump --format=custom --compress=zstd:9 > dump.zstd
real 0m8.794s
user 0m8.393s
sys 0m0.364s
~$ time pg_dump --format=custom --compress=gzip:9 > dump.gz
real 0m14.245s
user 0m13.064s
sys 0m0.978s
~$ time pg_dump --format=custom --compress=lz4 > dump_default.lz4
real 0m6.809s
user 0m1.666s
sys 0m1.125s
~$ time pg_dump --format=custom --compress=zstd > dump_default.zstd
real 0m7.534s
user 0m2.428s
sys 0m0.892s
~$ time pg_dump --format=custom --compress=gzip > dump_default.gz
real 0m11.564s
user 0m10.661s
sys 0m0.525s
~$ time pg_dump --format=custom --compress=lz4:3 > dump_3.lz4
real 0m8.497s
user 0m7.856s
sys 0m0.507s
~$ time pg_dump --format=custom --compress=zstd:3 > dump_3.zstd
real 0m5.129s
user 0m2.228s
sys 0m0.726s
~$ time pg_dump --format=custom --compress=gzip:3 > dump_3.gz
real 0m4.468s
user 0m3.654s
sys 0m0.504s
~$ ls -l --block-size=M
total 250M
-rw-rw-r-- 1 postgres postgres 28M Apr 18 13:58 dump_3.gz
-rw-rw-r-- 1 postgres postgres 48M Apr 18 13:57 dump_3.lz4
-rw-rw-r-- 1 postgres postgres 8M Apr 18 13:58 dump_3.zstd
-rw-rw-r-- 1 postgres postgres 27M Apr 18 13:57 dump_default.gz
-rw-rw-r-- 1 postgres postgres 50M Apr 18 13:56 dump_default.lz4
-rw-rw-r-- 1 postgres postgres 8M Apr 18 13:57 dump_default.zstd
-rw-rw-r-- 1 postgres postgres 27M Apr 18 13:56 dump.gz
-rw-rw-r-- 1 postgres postgres 48M Apr 18 13:55 dump.lz4
-rw-rw-r-- 1 postgres postgres 8M Apr 18 13:56 dump.zstd

根据命令的输出，得出以下关于三种压缩方法的结论：

gzip:这是一种众所周知且广泛使用的压缩方法，可以在压缩率和压缩速度之间提供两行的平衡。

lz4:这是一种非常快的压缩算法，以较低的压缩比为代价提供较高的压缩和解压速度。Lz4压缩转出的文件在48-50MB范围，明显大于gzip压缩转储。

Zstd：这是一种比较新的压缩算法，压缩比高，压缩速度也不错。Zstd压缩转储的文件大小在8-8.5MB范围内，是三种压缩方法中最小的。

令人吃惊的是zstd压缩时间最少，其次是lz4和gzip。该数据可能不是测量和比较的最佳数据。默认压缩级别，zstd生成最小的转储文件大小，其次是lz4和gzip。在最大压缩级别，zstd仍然生成最小的转储文件大小，其次是gzip和lz4。

基于这些观察，如果首要任务是减少磁盘使用空间，zstd是推荐的压缩方法。但如果首要任务是减少压缩时间，则zstd和lz4都表现不错。如果担心与其他实用程序的兼容性，gzip仍然是一个可行的选择。