
ClickHouse Deployment and Python 3 Load Testing in Practice

news 2025/8/23 10:55:29


I. ClickHouse deployment

Version: yandex/clickhouse-server:latest
Deployment method: Docker

Compose file contents

version: "3"
services:
  clickhouse:
    image: yandex/clickhouse-server:latest
    container_name: clickhouse
    ports:
      - "8123:8123"
      - "9000:9000"
      - "9009:9009"
      - "9004:9004"
    volumes:
      - ./data/config:/var/lib/clickhouse
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "localhost:8123/ping"]
      interval: 30s
      timeout: 5s
      retries: 3
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
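Once the container is running, the `/ping` endpoint that the healthcheck polls can also be probed from Python. The following is a minimal sketch, assuming the 8123 port mapping above and a server reachable on `localhost`. The function name `clickhouse_is_up` and its `fetch` parameter are made up for this example and are not part of any ClickHouse tooling.

```python
import urllib.request


def clickhouse_is_up(base_url="http://localhost:8123", timeout=5.0, fetch=None):
    """Return True if GET <base_url>/ping answers with the body 'Ok.'.

    `fetch` is a hypothetical hook (url -> bytes) so the check can be
    exercised without a running server; by default urllib is used.
    """
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
    try:
        body = fetch(base_url.rstrip("/") + "/ping")
    except OSError:
        # URLError, timeouts and refused connections all derive from OSError.
        return False
    return body.strip() == b"Ok."
```

Calling `clickhouse_is_up()` with no arguments checks the local container; pass another `base_url` for a remote host.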

Table creation statement

-- MergeTree requires a sorting key; ordering by id is an assumption.
CREATE TABLE ck_table (
    id int,
    feild1 String, feild2 String, feild3 String, feild4 String, feild5 String,
    feild6 String, feild7 String, feild8 String, feild9 String, feild10 String,
    feild11 String, feild12 String, feild13 String, feild14 String, feild15 String,
    feild16 String, feild17 String, feild18 String, feild19 String, feild20 String
) ENGINE = MergeTree
ORDER BY id;

II. Insert load test with Python 3

Key libraries: clickhouse_driver, concurrent.futures

Code:

import random
import time
from clickhouse_driver import Client
from concurrent.futures import ThreadPoolExecutor

# Use several connections so that no single connection gets overwhelmed.
# 'ip' stands for the ClickHouse server address.
clients = [Client(host='ip'), Client(host='ip'), Client(host='ip'), Client(host='ip')]

BATCH_SIZE = 1000  # rows per INSERT
COLUMNS = ", ".join(["id"] + ["feild%d" % k for k in range(1, 21)])

# Insert in batches: in testing, concurrent single-row inserts were poorly
# supported, managing only 2-5 INSERTs per second.
def task(i):
    sql = "INSERT INTO ck_table (" + COLUMNS + ") VALUES"
    values = []
    for n in range(BATCH_SIZE):
        row = [random.randint(1, 10000000), "feild1-" + str(random.randint(1, 10000000))]
        row += ["feild%d-%d" % (k, n) for k in range(2, 21)]
        values.append(tuple(row))
    # randrange covers every client, index 0 included. Two threads can still
    # draw the same client at the same time.
    clid = random.randrange(len(clients))
    clients[clid].execute(sql, values)
    return "task", i, "client", clid, "inserted", BATCH_SIZE, "rows"

if __name__ == '__main__':
    print("Program started")
    start_time = time.perf_counter()
    # Leaving the with-block waits for every submitted task, so the elapsed
    # time covers the inserts themselves and not just the submit() calls.
    with ThreadPoolExecutor(max_workers=2) as executor:
        for j in range(4000000):  # total number of batches to run
            executor.submit(task, j)
    print("Elapsed time", time.perf_counter() - start_time, "s")
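Because the script picks a client at random, two worker threads can end up issuing queries on the same connection at the same moment. One common alternative is to give each worker thread its own client. The sketch below illustrates that pattern under assumptions and is not the author's method: `make_client` is a hypothetical factory (in the real script it would be something like `lambda: Client(host='ip')`), and `FakeClient` exists only so the pattern can run without a server.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()


def get_client(make_client):
    """Return the calling thread's client, creating it on first use."""
    client = getattr(_local, "client", None)
    if client is None:
        client = make_client()
        _local.client = client
    return client


def insert_batch(make_client, sql, rows):
    """Send one batch through the calling thread's own client."""
    get_client(make_client).execute(sql, rows)
    return len(rows)


class FakeClient:
    """Stand-in with an execute(sql, rows) method; counts rows instead of sending them."""
    created = 0
    lock = threading.Lock()

    def __init__(self):
        with FakeClient.lock:
            FakeClient.created += 1
        self.rows_sent = 0

    def execute(self, sql, rows):
        self.rows_sent += len(rows)


def run_demo(batches=20, batch_size=1000, workers=2):
    """Submit `batches` inserts and return the total number of rows handed over."""
    sql = "INSERT INTO ck_table (id, feild1) VALUES"
    rows = [(n, "feild1-%d" % n) for n in range(batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(insert_batch, FakeClient, sql, rows) for _ in range(batches)]
        return sum(f.result() for f in futures)
```

With this layout the number of connections never exceeds the number of worker threads, and no connection is shared between threads.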

III. Query test with Python 3

Key libraries: clickhouse_driver, concurrent.futures

Code:

import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from clickhouse_driver import Client

client = Client(host='10.10.16.110')

# Filtered query; use it in new_task to time a WHERE lookup instead of the count.
query_sql = "select * from ck_table where feild2='feild2-1009'"
count_sql = "select count(*) from ck_table"

def new_task(i):
    time.sleep(1)  # each task waits 1 s before querying
    return "task", i, "result", client.execute(count_sql)

if __name__ == '__main__':
    print("Program started")
    start_time = time.perf_counter()
    # A single worker means the shared client is used by one thread only.
    with ThreadPoolExecutor(max_workers=1) as executor:
        futures = [executor.submit(new_task, j) for j in range(1000)]
        for future in as_completed(futures):
            print("Status", future.result())
    print("Elapsed time", time.perf_counter() - start_time, "s")
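The query script reports only the total elapsed time, which also includes the one-second sleeps. To obtain per-query latency and an effective QPS figure, each call can be timed on its own. This is a minimal sketch, assuming `run_query` is any zero-argument callable (for example `lambda: client.execute(count_sql)`); the helper name `measure` is invented for this example.

```python
import statistics
import time


def measure(run_query, repeats=100):
    """Time `repeats` sequential calls and summarise the latencies in seconds."""
    latencies = []
    for _ in range(repeats):
        started = time.perf_counter()
        run_query()
        latencies.append(time.perf_counter() - started)
    total = sum(latencies)
    return {
        "count": len(latencies),
        "mean_s": statistics.mean(latencies),
        "median_s": statistics.median(latencies),
        "max_s": max(latencies),
        # Sequential throughput: queries completed per second of query time.
        "qps": len(latencies) / total if total > 0 else float("inf"),
    }
```

Since the calls run one after another, the `qps` value describes a single connection; it says nothing about behaviour under concurrent load.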

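IV. Test conclusions

ClickHouse, insert and query test on a 21-column table: with up to 2 million rows, CPU usage stayed above 100, with a peak of 133.6 and a mean of about 110.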

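1. Frequent inserts are not supported (typically only 1-2 per second are sustainable); beyond that, errors such as dropped connections occur, so inserts have to be batched. The script ran without errors using 2 threads and 1,000 rows per batch; with more than 2 threads, dropped connections and similar errors appeared.

2. Frequent queries are not supported either. The official recommendation is to keep QPS under 100; above that, CPU usage becomes very high and drives up server load.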

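3. Query performance:

1 WHERE condition (Memory): 600,000 rows, 0.33 s

5 WHERE conditions (Memory): 800,000 rows, 0.57 s

5 WHERE conditions (Memory): 1,000,000 rows, 0.54 s

5 WHERE conditions (Memory): 1,120,000 rows, 0.56 s

5 WHERE conditions (Memory): 2,000,000 rows, 0.565 s

5 WHERE conditions (Memory): 5,000,000 rows, 1.2 s (with inserts stopped)

5 WHERE conditions (Memory): 5,600,000 rows, 1.97 s (with inserts stopped)

5 WHERE conditions (TinyLog): 70,000,000 rows, 1 min 47 s

2 WHERE conditions (TinyLog): 104,600,000 rows, 89 s

5 WHERE conditions (TinyLog): 104,600,000 rows, 84 s

10 WHERE conditions (TinyLog): 104,600,000 rows, 87 s

Note: beyond 4.5 million rows, the insert thread and the query thread can no longer run at the same time. Slow queries consume a great deal of memory, and 16 GB of RAM is not enough. The 5-condition WHERE query still completes, in 1-2 s.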
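Dividing the row count by the elapsed time makes the query-performance figures easier to compare across engines. The snippet below only restates a selection of the measurements listed under query performance; the variable and function names are illustrative.

```python
# (label, rows scanned, seconds) copied from the query-performance list
measurements = [
    ("Memory, 1 condition", 600_000, 0.33),
    ("Memory, 5 conditions", 2_000_000, 0.565),
    ("Memory, 5 conditions", 5_600_000, 1.97),
    ("TinyLog, 5 conditions", 70_000_000, 107.0),   # 1 min 47 s
    ("TinyLog, 5 conditions", 104_600_000, 84.0),
]


def rows_per_second(rows, seconds):
    """Scan rate implied by one measurement."""
    return rows / seconds


for label, rows, seconds in measurements:
    rate = rows_per_second(rows, seconds)
    print(f"{label:<22} {rows:>12,} rows  {rate:>12,.0f} rows/s")
```

On these numbers the Memory runs work out to roughly 1.8 to 3.5 million rows per second and the TinyLog runs to roughly 0.65 to 1.25 million rows per second. The runs differ in table size and in whether inserts were active, so the rates are indicative only.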

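(1) Server state at 5 million rows (CPU averaged about 320; of the 16 GB of RAM, 500-800 MB remained free; once writes and queries stopped, CPU returned to normal and about 800 MB of RAM remained free):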

total   used   free   shared   buff/cache   available
15G     5.9G   519M   9.2M     9.1G         9.2G

%CPU    %MEM
429.5   26.0

(2) Server state at 100 million rows (the 1 TB disk is 38% used in total, with an estimated 6% consumed by the test):

total   used   free   shared   buff/cache   available
15G     2.7G   181M   9.2M     12G          12G

%CPU    %MEM
103.7   3.6

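Summary:

1. Concurrent, frequent single-row inserts are not supported; they lead to errors and dropped connections, which can cause data loss.
2. High-concurrency querying is not supported; the official recommendation is QPS <= 100, beyond which server load rises and CPU and memory consumption become excessive.
3. Server requirements are high: for data at the hundred-million-row scale, 16 or more CPU cores and 64 GB or more of RAM are generally recommended.
4. The strengths are fast queries and efficient batch inserts; low-frequency, large-batch inserts are recommended.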
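The recommendation of low-frequency, large-batch inserts can be sketched as a small buffer that collects rows and hands them to the database only when a size threshold is reached. This is an illustration under assumptions: `BatchBuffer` and its `flush_fn` parameter are hypothetical names, and in a real script `flush_fn` would wrap something like `client.execute(sql, rows)`.

```python
class BatchBuffer:
    """Collect rows and flush them in batches of `batch_size`."""

    def __init__(self, flush_fn, batch_size=1000):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._flush_fn = flush_fn      # called with a list of rows
        self._batch_size = batch_size
        self._rows = []

    def add(self, row):
        """Buffer one row; flush automatically once the batch is full."""
        self._rows.append(row)
        if len(self._rows) >= self._batch_size:
            self.flush()

    def flush(self):
        """Send whatever is buffered; does nothing when the buffer is empty."""
        if self._rows:
            rows, self._rows = self._rows, []
            self._flush_fn(rows)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Flush the final partial batch only on a clean exit.
        if exc_type is None:
            self.flush()
        return False


def demo():
    """Feed 2,500 rows through a 1,000-row buffer and return the batch sizes."""
    batches = []
    with BatchBuffer(batches.append, batch_size=1000) as buf:
        for n in range(2500):
            buf.add((n, "feild1-%d" % n))
    return [len(b) for b in batches]
```

The buffer is not thread-safe; with several producer threads, each thread would need its own buffer or the calls would need a lock.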