
芝法酱 Study Notes (2.3): Database and Table Sharding with ShardingSphere

news 2025/8/9 3:01:49

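1. Preface

In the previous examples we used a simplified sales-order report query to show what can be optimized at the index and column-type level when querying large volumes of data. Yet no matter how we tuned it, a single query still took several seconds.
This is a real-world problem: once a system's users generate enough business and the system runs long enough, individual tables accumulate a huge number of rows. Beyond a certain size, performance degrades no matter what you do. Is there a way out?
The most obvious answer is to shard databases and tables. In this business scenario, Chapter 1 already gave a database-sharding scheme that places different users in different databases. A single user's data can still grow large, though, and that is where table sharding comes in.
This section introduces ShardingSphere, one of the most widely used sharding solutions.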

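2. Code

This section covers little theory and mostly shows how to use ShardingSphere. The hard part of learning the framework is the configuration, not the theory, so the code is presented up front this time; you can read it alongside the explanations that follow.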

3. ShardingSphere Configuration

3.1 Version

The ShardingSphere dependency used in this section is:

        <dependency>
            <groupId>org.apache.shardingsphere</groupId>
            <artifactId>shardingsphere-jdbc</artifactId>
            <version>5.5.1</version>
        </dependency>

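3.2 YAML configuration

ShardingSphere has a complex YAML configuration; start with the introduction in the official documentation.
Even with the official documentation it still looks complicated, and it is easy to feel unsure while configuring, so it helps to configure against the source code. This example also involves a monitoring center and an enterprise center. The monitoring center is not sharded, so it can use Spring Boot's default data source, which means the ShardingSphere data source has to be configured by hand. The core code for creating it is: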

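    @Bean(name = "shardingSphereDataSource")
    public DataSource shardingSphereDataSource() throws SQLException, IOException {
        // Locate shardingsphere.yml on the classpath and build the data source from it.
        // Note: a File-based lookup only works while resources sit on the file system
        // (e.g. when run from the IDE), not from inside a packaged jar.
        File file = new File(getClass().getClassLoader().getResource("shardingsphere.yml").getFile());
        DataSource dataSource = YamlShardingSphereDataSourceFactory.createDataSource(file);
        return dataSource;
    }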

Stepping into YamlShardingSphereDataSourceFactory, we can see that the core configuration class is this structure:

@Getter
@Setter
public final class YamlJDBCConfiguration implements YamlConfiguration {

    private String databaseName;

    private Map<String, Map<String, Object>> dataSources = new HashMap<>();

    private Collection<YamlRuleConfiguration> rules = new LinkedList<>();

    private YamlModeConfiguration mode;

    private YamlAuthorityRuleConfiguration authority;

    private YamlSQLParserRuleConfiguration sqlParser;

    private YamlTransactionRuleConfiguration transaction;

    private YamlGlobalClockRuleConfiguration globalClock;

    private YamlSQLFederationRuleConfiguration sqlFederation;

    private YamlSQLTranslatorRuleConfiguration sqlTranslator;

    private YamlLoggingRuleConfiguration logging;

    private Properties props = new Properties();

    ......
}

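The field names in this class are the top-level keys of our YAML file. This section only covers the keys used in this example; for the rest, consult the official documentation.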

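Field	Name	Purpose
databaseName	Database name	Name of the logical database; unless the data source is auto-configured, it has little practical effect
dataSources	Data sources	The individual data sources are configured under this node
mode	Mode	Standalone or cluster mode (this example uses Standalone), plus the repository type (this example uses JDBC)
rules	Rules	The most important part: the sharding rules. It is a list, with one entry allowed per rule type
props	Properties	Properties used by the sharding framework itself; here they turn on SQL logging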

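3.3 Rule configuration

Pressing ctrl + H on the YamlRuleConfiguration class shows the class behind each rule type, which tells us exactly how each rule has to be configured.
In the rules list, ShardingSphere decides which class a list element maps to through a special YAML tag; the data sharding rule, for example, is written as: - !SHARDING
The tag for each rule type can be looked up in the official documentation. The ones used in this example are: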

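Class	Rule type	YAML tag	Purpose
YamlShardingRuleConfiguration	Sharding rule	- !SHARDING	Describes how databases and tables are sharded
YamlSingleRuleConfiguration	Single-table rule	- !SINGLE	Scans the databases for unsharded tables; wildcards are allowed
YamlBroadcastRuleConfiguration	Broadcast-table rule	- !BROADCAST	Lists tables that are joined against but not sharded themselves (a broadcast table exists in every data source)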

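3.3.1 Sharding rule

To see how the sharding rule is configured, read the official documentation together with the source code.
Official documentation: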

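rules:
- !SHARDING
  tables: # Sharding table rules
    <logic_table_name> (+): # Logical table name
      actualDataNodes (?): # Data source name + table name (see the Inline syntax rules)
      databaseStrategy (?): # Database sharding strategy; defaults to the default database strategy. Choose only one of the strategies below
        standard: # Standard strategy for a single sharding key
          shardingColumn: # Sharding column name
          shardingAlgorithmName: # Sharding algorithm name
        complex: # Complex strategy for multiple sharding keys
          shardingColumns: # Sharding column names, comma-separated
          shardingAlgorithmName: # Sharding algorithm name
        hint: # Hint sharding strategy
          shardingAlgorithmName: # Sharding algorithm name
        none: # No sharding
      tableStrategy: # Table sharding strategy, same structure as the database strategy
      keyGenerateStrategy: # Distributed key generation strategy
        column: # Auto-increment column name; if omitted, no key generator is used
        keyGeneratorName: # Key generation algorithm name
      auditStrategy: # Sharding audit strategy
        auditorNames: # Sharding audit algorithm names
          - <auditor_name>
          - <auditor_name>
        allowHintDisable: true # Whether a hint may disable sharding audit
  autoTables: # Auto-sharding table rules
    t_order_auto: # Logical table name
      actualDataSources (?): # Data source names
      shardingStrategy: # Sharding strategy
        standard: # Standard strategy for a single sharding key
          shardingColumn: # Sharding column name
          shardingAlgorithmName: # Auto-sharding algorithm name
  bindingTables (+): # Binding table rules
    - <logic_table_name_1, logic_table_name_2, ...>
    - <logic_table_name_1, logic_table_name_2, ...>
  defaultDatabaseStrategy: # Default database sharding strategy
  defaultTableStrategy: # Default table sharding strategy
  defaultKeyGenerateStrategy: # Default key generation strategy
  defaultShardingColumn: # Default sharding column name

  # Sharding algorithms
  shardingAlgorithms:
    <sharding_algorithm_name> (+): # Sharding algorithm name
      type: # Sharding algorithm type
      props: # Sharding algorithm properties
      # ...

  # Key generation algorithms
  keyGenerators:
    <key_generate_algorithm_name> (+): # Key generation algorithm name
      type: # Key generation algorithm type
      props: # Key generation algorithm properties
      # ...

  # Sharding audit algorithms
  auditors:
    <sharding_audit_algorithm_name> (+): # Sharding audit algorithm name
      type: # Sharding audit algorithm type
      props: # Sharding audit algorithm properties
      # ...

- !BROADCAST
  tables: # Broadcast table list
    - <table_name>
    - <table_name>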

Source code:

@RepositoryTupleEntity("sharding")
@Getter
@Setter
public final class YamlShardingRuleConfiguration implements YamlRuleConfiguration {

    @RepositoryTupleField(type = Type.TABLE)
    private Map<String, YamlTableRuleConfiguration> tables = new LinkedHashMap<>();

    @RepositoryTupleField(type = Type.TABLE)
    private Map<String, YamlShardingAutoTableRuleConfiguration> autoTables = new LinkedHashMap<>();

    @RepositoryTupleField(type = Type.TABLE)
    @RepositoryTupleKeyListNameGenerator(ShardingBindingTableRepositoryTupleKeyListNameGenerator.class)
    private Collection<String> bindingTables = new LinkedList<>();

    @RepositoryTupleField(type = Type.DEFAULT_STRATEGY)
    private YamlShardingStrategyConfiguration defaultDatabaseStrategy;

    @RepositoryTupleField(type = Type.DEFAULT_STRATEGY)
    private YamlShardingStrategyConfiguration defaultTableStrategy;

    @RepositoryTupleField(type = Type.DEFAULT_STRATEGY)
    private YamlKeyGenerateStrategyConfiguration defaultKeyGenerateStrategy;

    @RepositoryTupleField(type = Type.DEFAULT_STRATEGY)
    private YamlShardingAuditStrategyConfiguration defaultAuditStrategy;

    @RepositoryTupleField(type = Type.ALGORITHM)
    private Map<String, YamlAlgorithmConfiguration> shardingAlgorithms = new LinkedHashMap<>();

    @RepositoryTupleField(type = Type.ALGORITHM)
    private Map<String, YamlAlgorithmConfiguration> keyGenerators = new LinkedHashMap<>();

    @RepositoryTupleField(type = Type.ALGORITHM)
    private Map<String, YamlAlgorithmConfiguration> auditors = new LinkedHashMap<>();

    @RepositoryTupleField(type = Type.OTHER)
    private String defaultShardingColumn;

    @RepositoryTupleField(type = Type.OTHER)
    private YamlShardingCacheConfiguration shardingCache;

    @Override
    public Class<ShardingRuleConfiguration> getRuleConfigurationType() {
        return ShardingRuleConfiguration.class;
    }
}

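The documentation is fairly clear here, so you can simply configure by following it.
The one thing worth explaining is the actualDataNodes expression.
We have to tell ShardingSphere which actual tables a logical table may map to, so that it can relate them for us in join queries. These values are also available in the callbacks of a custom sharding algorithm (although you may not need them).
Taking the configuration from this example: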

      consign:
        #logicTable: consign
        actualDataNodes: ds${0..1}.consign_${2022..2024}${1..4},ds${0..1}.consign_0
        tableStrategy:
          standard:
            shardingAlgorithmName: year-month-sharding
            shardingColumn: bill_time_key

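The ${} form enumerates the values that may occur. Because of the way ShardingSphere is designed, the table suffix here must be numeric; the reason is explained later, in the section on pitfalls. Several alternatives can be separated with ",".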
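To see which actual data nodes such an expression stands for, here is a minimal sketch that expands the numeric ${a..b} ranges by hand. It is not ShardingSphere's inline-expression parser (which supports more syntax); the class name InlineRangeExpander and its methods are made up for illustration, and only comma-separated parts with numeric ranges are handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: expands only numeric ${a..b} ranges, nothing else.
public class InlineRangeExpander {

    private static final Pattern RANGE = Pattern.compile("\\$\\{(\\d+)\\.\\.(\\d+)\\}");

    /** Expands every comma-separated part of the expression into concrete names. */
    public static List<String> expand(String expression) {
        List<String> result = new ArrayList<>();
        for (String part : expression.split(",")) {
            expandPart(part.trim(), result);
        }
        return result;
    }

    // Replaces the first range with each of its values, then recurses on the rest.
    private static void expandPart(String part, List<String> out) {
        Matcher matcher = RANGE.matcher(part);
        if (!matcher.find()) {
            out.add(part);
            return;
        }
        int from = Integer.parseInt(matcher.group(1));
        int to = Integer.parseInt(matcher.group(2));
        for (int i = from; i <= to; i++) {
            expandPart(part.substring(0, matcher.start()) + i + part.substring(matcher.end()), out);
        }
    }

    public static void main(String[] args) {
        expand("ds${0..1}.consign_${2022..2024}${1..4},ds${0..1}.consign_0")
                .forEach(System.out::println);
    }
}
```

For the expression used in this example, the sketch yields 2 × 3 × 4 quarterly tables plus 2 consign_0 tables, 26 data nodes in total.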
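Note: for joins across sharded tables to work properly, every combination of tables that may be joined must be listed in bindingTables; otherwise a join between two sharded tables is routed as a Cartesian product of their shards.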

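3.3.2 Single-table rule

This rule has to be configured; without it, only the logical tables listed in the sharding rule can be queried, which is quite frustrating. The error looks like this: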

Cause: org.apache.shardingsphere.infra.exception.kernel.metadata.TableNotFoundException: Table or view 'item' does not exist.

Wildcards are allowed in this rule, for example:

  - !SINGLE
    tables:
      # Load all single tables
      - "ds0.*"

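3.3.3 Broadcast-table rule

When a sharded table is joined with an unsharded one, as consign is joined with item in this example, and no database-sharding column is configured for the unsharded table, the following error is raised: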

### Cause: java.sql.SQLException: Unknown exception.
More details: java.lang.NullPointerException: Cannot invoke "String.equalsIgnoreCase(String)" because "shardingColumn" is null
; uncategorized SQLException; SQL state [HY000]; error code [30000]; Unknown exception.

In that case, adding item to the broadcast tables fixes it:

  - !BROADCAST
    tables: # Broadcast table list
      - item

3.4 Complete configuration for this example

mode:
  type: Standalone
  repository:
    type: JDBC
databaseName: mysql
dataSources:
  ds0:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://192.168.0.64:3306/study2024-class009-busy001?useUnicode=true&characterEncoding=utf-8&useSSL=false
    username: dbMgr
    password: qqhilvMgAl@7
  ds1:
    dataSourceClassName: com.zaxxer.hikari.HikariDataSource
    driverClassName: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://192.168.0.64:3306/study2024-class009-busy002?useUnicode=true&characterEncoding=utf-8&useSSL=false
    username: dbMgr
    password: qqhilvMgAl@7
rules:
  - !SHARDING
    defaultDatabaseStrategy:
      standard:
        shardingAlgorithmName: enterprise-sharding
        shardingColumn: enp_id
    defaultTableStrategy:
      none:
    shardingAlgorithms:
      year-month-sharding:
        type: CUSTOM_YEAR_MONTH
      enterprise-sharding:
        type: ENTERPRISE-SHARDING
    tables:
      consign:
        #logicTable: consign
        actualDataNodes: ds${0..1}.consign_${2022..2024}${1..4},ds${0..1}.consign_0
        tableStrategy:
          standard:
            shardingAlgorithmName: year-month-sharding
            shardingColumn: bill_time_key
      consign_header:
        #logicTable: consign_header
        actualDataNodes: ds${0..1}.consign_header_${2022..2024}${1..4},ds${0..1}.consign_header_0
        tableStrategy:
          standard:
            shardingAlgorithmName: year-month-sharding
            shardingColumn: bill_time_key
    bindingTables:
      - consign_header,consign
  - !SINGLE
    tables:
      # Load all single tables
      - "ds0.*"
  - !BROADCAST
    tables: # Broadcast table list
      - item
props:
  sql-show: true

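4. Custom Sharding Algorithms

In real projects we usually do not use the built-in algorithms and write our own sharding rules instead.

4.1 Writing the algorithms

This example has two sharding algorithms. One shards tables by year and quarter. The other shards databases: based on the database information carried in the JWT, it tells the system which database to query. The code comes first, the explanation afterwards.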

4.1.1 YearMonthTableShardingAlgorithm

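import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import org.apache.shardingsphere.sharding.api.sharding.standard.PreciseShardingValue;
import org.apache.shardingsphere.sharding.api.sharding.standard.RangeShardingValue;
import org.apache.shardingsphere.sharding.api.sharding.standard.StandardShardingAlgorithm;

// CommonUtil is a utility class from this project (import omitted).
public class YearMonthTableShardingAlgorithm implements StandardShardingAlgorithm<Long> {

    // Precise routing: one bill time maps to exactly one table.
    @Override
    public String doSharding(Collection<String> availableTargetNames, PreciseShardingValue<Long> shardingValue) {
        String tableName = shardingValue.getLogicTableName();
        Long billTimeSecond = shardingValue.getValue();
        LocalDateTime localDateTime = CommonUtil.parseFromSecond(billTimeSecond);
        int year = localDateTime.getYear();
        int monVal = localDateTime.getMonthValue();
        // Months 1-3 -> quarter 1, 4-6 -> quarter 2, and so on.
        int season = (monVal + 2) / 3;
        if (year < 2022) {
            // Everything before 2022 lives in the archive table <logic>_0.
            return tableName + "_0";
        } else {
            return tableName + "_" + year + season;
        }
    }

    // Range routing: collect every quarterly table the time range touches.
    @Override
    public Collection<String> doSharding(Collection<String> collection, RangeShardingValue<Long> rangeShardingValue) {
        List<String> rtn = new ArrayList<String>();
        String tableName = rangeShardingValue.getLogicTableName();
        // Both endpoints are read directly, so the range must be bounded on both sides.
        Long begTimeL = rangeShardingValue.getValueRange().lowerEndpoint();
        Long endTimeL = rangeShardingValue.getValueRange().upperEndpoint();
        LocalDateTime beginTime = CommonUtil.parseFromSecond(begTimeL);
        LocalDateTime endTime = CommonUtil.parseFromSecond(endTimeL);
        int yearBeg = beginTime.getYear();
        int yearEnd = endTime.getYear();
        int monBeg = beginTime.getMonthValue();
        int monEnd = endTime.getMonthValue();
        int seasonBeg = (monBeg + 2) / 3;
        int seasonEnd = (monEnd + 2) / 3;
        if (yearBeg < 2022) {
            // Include the archive table, then continue from 2022 Q1.
            rtn.add(tableName + "_0");
            seasonBeg = 1;
            yearBeg = 2022;
        }
        for (int i = yearBeg; i <= yearEnd; i++) {
            int curSeasonBeg = i > yearBeg ? 1 : seasonBeg;
            int curSeasonEnd = i < yearEnd ? 4 : seasonEnd;
            for (int j = curSeasonBeg; j <= curSeasonEnd; j++) {
                rtn.add(tableName + "_" + i + "" + j);
            }
        }
        return rtn;
    }

    @Override
    public String getType() {
        return "CUSTOM_YEAR_MONTH"; // Custom algorithm type name
    }
}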

4.1.2 EnterpriseShardingAlgorithm

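import java.util.Collection;
import java.util.List;

import lombok.extern.slf4j.Slf4j;
import org.apache.shardingsphere.sharding.api.sharding.standard.PreciseShardingValue;
import org.apache.shardingsphere.sharding.api.sharding.standard.RangeShardingValue;
import org.apache.shardingsphere.sharding.api.sharding.standard.StandardShardingAlgorithm;

// ITokenUtil, SpringUtil, DatasourceSetUtil, AuthObject and StringUtils come from
// this project and its dependencies (imports omitted).
@Slf4j
public class EnterpriseShardingAlgorithm implements StandardShardingAlgorithm<Long> {

    @Override
    public String getType() {
        return "ENTERPRISE-SHARDING"; // Custom algorithm type name
    }

    // Database choice, in priority order: explicit prompt, then JWT db code, then ds0.
    @Override
    public String doSharding(Collection<String> availableTargetNames, PreciseShardingValue<Long> shardingValue) {
        ITokenUtil tokenUtil = SpringUtil.getBean(ITokenUtil.class);
        String prompt = DatasourceSetUtil.getDbPrompt();
        if (StringUtils.hasText(prompt)) {
            return prompt;
        }
        if (tokenUtil.hasTokenObject()) {
            AuthObject authObject = tokenUtil.getAuthObject();
            return authObject.getDbCode();
        }
        return "ds0";
    }

    // Range queries resolve to the same single database as precise ones.
    @Override
    public Collection<String> doSharding(Collection<String> availableTargetNames, RangeShardingValue<Long> shardingValue) {
        String prompt = DatasourceSetUtil.getDbPrompt();
        if (StringUtils.hasText(prompt)) {
            return List.of(prompt);
        }
        ITokenUtil tokenUtil = SpringUtil.getBean(ITokenUtil.class);
        if (tokenUtil.hasTokenObject()) {
            AuthObject authObject = tokenUtil.getAuthObject();
            return List.of(authObject.getDbCode());
        }
        return List.of("ds0");
    }
}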

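4.1.3 Explanation

Here we implement StandardShardingAlgorithm; ComplexKeysShardingAlgorithm and HintShardingAlgorithm can be implemented as well, and their usage is covered in the official documentation. Only StandardShardingAlgorithm is discussed in detail here.
The first callback, doSharding(Collection availableTargetNames, PreciseShardingValue shardingValue), handles sharding for equality conditions (= and IN). The second, doSharding(Collection collection, RangeShardingValue rangeShardingValue), handles sharding for range queries.
The getType callback returns the algorithm's name, which ties the class to the type value in the configuration.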
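The routing arithmetic of the table algorithm in 4.1.1 can be tried out without ShardingSphere. The sketch below mirrors the precise and range callbacks in plain Java. The class QuarterRouting and its method names are made up for illustration, and since CommonUtil.parseFromSecond is not shown in this post, the sketch assumes the sharding value is epoch seconds interpreted in UTC.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone version of the year/quarter routing logic.
public class QuarterRouting {

    // Years before this one are kept in the archive table <logic>_0.
    private static final int FIRST_SHARDED_YEAR = 2022;

    // Assumption: the bill time is epoch seconds, read as UTC.
    private static LocalDateTime parse(long epochSecond) {
        return LocalDateTime.ofEpochSecond(epochSecond, 0, ZoneOffset.UTC);
    }

    // Months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4.
    private static int quarterOf(LocalDateTime time) {
        return (time.getMonthValue() + 2) / 3;
    }

    /** Precise case (= / IN): exactly one target table. */
    public static String preciseTable(String logicTable, long epochSecond) {
        LocalDateTime time = parse(epochSecond);
        if (time.getYear() < FIRST_SHARDED_YEAR) {
            return logicTable + "_0";
        }
        return logicTable + "_" + time.getYear() + quarterOf(time);
    }

    /** Range case: every table touched by [fromSecond, toSecond]. */
    public static List<String> rangeTables(String logicTable, long fromSecond, long toSecond) {
        LocalDateTime from = parse(fromSecond);
        LocalDateTime to = parse(toSecond);
        List<String> tables = new ArrayList<>();
        int firstYear = from.getYear();
        int firstQuarter = quarterOf(from);
        if (firstYear < FIRST_SHARDED_YEAR) {
            tables.add(logicTable + "_0");
            firstYear = FIRST_SHARDED_YEAR;
            firstQuarter = 1;
        }
        for (int year = firstYear; year <= to.getYear(); year++) {
            int begin = year > firstYear ? 1 : firstQuarter;
            int end = year < to.getYear() ? 4 : quarterOf(to);
            for (int quarter = begin; quarter <= end; quarter++) {
                tables.add(logicTable + "_" + year + quarter);
            }
        }
        return tables;
    }

    public static void main(String[] args) {
        long may2023 = LocalDateTime.of(2023, 5, 15, 0, 0).toEpochSecond(ZoneOffset.UTC);
        long nov2021 = LocalDateTime.of(2021, 11, 1, 0, 0).toEpochSecond(ZoneOffset.UTC);
        System.out.println(preciseTable("consign", may2023));
        System.out.println(rangeTables("consign", nov2021, may2023));
    }
}
```

A precise value therefore hits a single table, while a range that starts before 2022 also pulls in the archive table consign_0. The narrower the time range in the query, the fewer tables have to be scanned.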

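4.2 META-INF configuration

Writing the algorithms is not enough for the framework to recognize them. They also have to be registered through Java SPI: under resources/META-INF/services, create a file named after the SPI interface, org.apache.shardingsphere.sharding.spi.ShardingAlgorithm, and list the algorithm classes in it: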

indi.zhifa.study2024.common.auth.sharding.YearMonthTableShardingAlgorithm
indi.zhifa.study2024.common.auth.sharding.EnterpriseShardingAlgorithm
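To make the link between the type value in the YAML and getType() concrete, here is a minimal sketch of a lookup by type name. It is not ShardingSphere's code: the TypedAlgorithm interface and the AlgorithmRegistry class are hypothetical, and the candidates are passed in by hand, whereas the framework discovers the registered classes through the SPI file above.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical registry that resolves an algorithm by its type name.
public class AlgorithmRegistry {

    /** Hypothetical stand-in for an SPI interface that exposes a type name. */
    public interface TypedAlgorithm {
        String getType();
    }

    private final List<TypedAlgorithm> candidates;

    public AlgorithmRegistry(List<TypedAlgorithm> candidates) {
        this.candidates = candidates;
    }

    /** Returns the first candidate whose getType() equals the configured type. */
    public Optional<TypedAlgorithm> find(String configuredType) {
        return candidates.stream()
                .filter(each -> each.getType().equals(configuredType))
                .findFirst();
    }

    public static void main(String[] args) {
        TypedAlgorithm yearMonth = () -> "CUSTOM_YEAR_MONTH";
        TypedAlgorithm enterprise = () -> "ENTERPRISE-SHARDING";
        AlgorithmRegistry registry = new AlgorithmRegistry(List.of(yearMonth, enterprise));
        // A type that was registered is found; an unregistered one is not.
        System.out.println(registry.find("CUSTOM_YEAR_MONTH").isPresent());
        System.out.println(registry.find("NOT_REGISTERED").isPresent());
    }
}
```

The point of the sketch is only that a class which is never registered can never be matched, no matter what the YAML says, which is why the registration step cannot be skipped.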

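5. Data Source Configuration

When the data source is configured by hand and used together with MyBatis-Plus (mp), the SqlSessionFactory still has to be set up as described earlier, following MyBatis-Plus's auto-configuration. That boilerplate is not reproduced in this post; see the reference code for it. Only the core part is shown here: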

@Bean(name = "shardingSphereDataSource")
public DataSource shardingSphereDataSource() throws SQLException, IOException {
    File file = new File(getClass().getClassLoader().getResource("shardingsphere.yml").getFile());
    DataSource dataSource = YamlShardingSphereDataSourceFactory.createDataSource(file);
    return dataSource;
}

@Bean(name = "shardingSqlSessionFactory")
public SqlSessionFactory shardingSqlSessionFactory(@Qualifier("shardingSphereDataSource") DataSource dataSource) throws Exception {
    MybatisSqlSessionFactoryBean factory = new MybatisSqlSessionFactoryBean();
    factory.setDataSource(dataSource);
    // Project helper that applies the MyBatis-Plus settings (not shown in this post).
    enableMpSqlSessionFactory(factory);
    return factory.getObject();
}

@Primary
@Bean(name = "shardingTransactionManager")
public PlatformTransactionManager shardingTransactionManager(@Qualifier("shardingSphereDataSource") DataSource dataSource) {
    return new DataSourceTransactionManager(dataSource);
}

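6. Pitfalls

ShardingSphere has a pitfall in table sharding: actual table names must have the form logical name + _ + digits.
For example consign_20221. Never write consign_2022_1, or the validation of the bindingTables configuration will fail.
The relevant code is in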
org.apache.shardingsphere.sharding.rule.checker.ShardingRuleChecker

private boolean isValidActualTableName(final ShardingTable sampleShardingTable, final ShardingTable shardingTable) {
    for (String each : sampleShardingTable.getActualDataSourceNames()) {
        Collection<String> sampleActualTableNames = sampleShardingTable.getActualTableNames(each).stream()
                .map(actualTableName -> actualTableName.replace(sampleShardingTable.getTableDataNode().getPrefix(), ""))
                .collect(Collectors.toSet());
        Collection<String> actualTableNames = shardingTable.getActualTableNames(each).stream()
                .map(optional -> optional.replace(shardingTable.getTableDataNode().getPrefix(), ""))
                .collect(Collectors.toSet());
        if (!sampleActualTableNames.equals(actualTableNames)) {
            return false;
        }
    }
    return true;
}
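The check above compares the table-name suffixes left over after removing each table's prefix. The following sketch reproduces that comparison in plain Java to show why consign_2022_1 breaks it. It rests on an assumption that should be verified against your ShardingSphere version: that getPrefix() is the first actual table name with its trailing digits stripped. The class BindingSuffixCheck and its methods are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical re-creation of the binding-table suffix comparison.
public class BindingSuffixCheck {

    /** Assumed prefix rule: the first actual table name without its trailing digits. */
    public static String prefixOf(List<String> actualTables) {
        return actualTables.get(0).replaceAll("\\d+$", "");
    }

    /** What remains of each actual table name once the prefix is removed. */
    public static Set<String> suffixes(List<String> actualTables) {
        String prefix = prefixOf(actualTables);
        return actualTables.stream()
                .map(each -> each.replace(prefix, ""))
                .collect(Collectors.toSet());
    }

    /** Two tables pass the check only if their suffix sets are identical. */
    public static boolean sameSuffixes(List<String> sample, List<String> other) {
        return suffixes(sample).equals(suffixes(other));
    }

    public static void main(String[] args) {
        // Digits only after the last underscore: prefixes are "consign_" and "consign_header_".
        System.out.println(sameSuffixes(
                List.of("consign_20221", "consign_20231", "consign_0"),
                List.of("consign_header_20221", "consign_header_20231", "consign_header_0")));
        // Extra underscore: the prefix becomes "consign_2022_", which does not
        // match the 2023 tables, so the leftover "suffixes" differ.
        System.out.println(sameSuffixes(
                List.of("consign_2022_1", "consign_2023_1"),
                List.of("consign_header_2022_1", "consign_header_2023_1")));
    }
}
```

Under that assumption, the year ends up inside the prefix when the name is written as consign_2022_1, so tables from other years keep their full names after the replacement and the two suffix sets can no longer be equal.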