Requirements
- This is a GPS navigation application: each vehicle produces roughly 3,000 GPS points per day. With 3,000 daily active vehicles that is about 9 million rows a day, which is around the practical limit for a single database table, so at minimum the data has to be split into per-day tables.
- Update 2023-07-28: there are now 80,000 vehicles online and about 2 million rows per day, already far beyond the original expectation. The business mostly queries by day, and performance is still acceptable.
Approach
- Isolate the track-point data from other business data at the database level (a dedicated RDS instance in production) so it cannot affect other business workloads.
- Use ShardingSphere to shard the point data into per-day tables.
- Since the data is produced as a linear stream, pagination is not needed.
Implementation
- Add the dependencies; the relatively mature 4.x release line is used.
<dependency>
    <groupId>org.apache.shardingsphere</groupId>
    <artifactId>sharding-jdbc-core</artifactId>
    <version>4.1.1</version>
</dependency>
<dependency>
    <groupId>org.apache.shardingsphere</groupId>
    <artifactId>sharding-jdbc-spring-namespace</artifactId>
    <version>4.1.1</version>
</dependency>
<!-- Note: use the plain connection pool artifact without the -starter suffix, otherwise it conflicts with ShardingSphere's SQL factory classes -->
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>druid</artifactId>
    <version>1.2.11</version>
</dependency>
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.47</version>
</dependency>
<dependency>
    <groupId>com.baomidou</groupId>
    <artifactId>mybatis-plus-boot-starter</artifactId>
    <version>3.4.3.4</version>
</dependency>
- Data source configuration: all data sources are handed to ShardingSphere to manage; data that is not sharded goes through the default data source.
spring:
  jackson:
    time-zone: Asia/Shanghai
    date-format: yyyy-MM-dd HH:mm:ss
  shardingsphere:
    datasource:
      names: ds0,ds1
      ds0:
        url: jdbc:mysql://192.168.31.250:3306/ds0?useUnicode=true&characterEncoding=utf8&useSSL=false&serverTimezone=Asia/Shanghai
        username: root
        password: root
        type: com.alibaba.druid.pool.DruidDataSource
        driverClassName: com.mysql.jdbc.Driver
        initialSize: 10
        minIdle: 10
        maxActive: 20
        maxWait: 60000
        timeBetweenEvictionRunsMillis: 60000
        minEvictableIdleTimeMillis: 300000
        validationQuery: SELECT 1
        poolPreparedStatements: true
        maxPoolPreparedStatementPerConnectionSize: 20
      ds1:
        url: jdbc:mysql://192.168.31.250:3306/ds1?useUnicode=true&characterEncoding=utf8&useSSL=false&serverTimezone=Asia/Shanghai
        username: root
        password: root
        type: com.alibaba.druid.pool.DruidDataSource
        driverClassName: com.mysql.jdbc.Driver
        initialSize: 10
        minIdle: 10
        maxActive: 20
        maxWait: 60000
        timeBetweenEvictionRunsMillis: 60000
        minEvictableIdleTimeMillis: 300000
        validationQuery: SELECT 1
        poolPreparedStatements: true
        maxPoolPreparedStatementPerConnectionSize: 20
    sharding:
      defaultDataSourceName: ds0
- Sharding rules: since only one table needs to be sharded, only one entry under tables is required.
sharding:
  defaultDataSourceName: ds0
  tables:
    track:
      actual-data-nodes: ds1.track_$->{2022..2200}_$->{1..12}_${1..31}
      table-strategy:
        standard:
          shardingColumn: locationTime
          # precise sharding strategy, used for = and IN
          preciseAlgorithmClassName: com.chebaidu.ludingding.tabletest.utils.DateShardingAlgorithmByDay
          # range sharding strategy, used for <, >, and BETWEEN
          rangeAlgorithmClassName: com.chebaidu.ludingding.tabletest.utils.DateShardingRangeAlgorithmByDay
The two sharding algorithm classes are as follows; a simple implementation that falls back to the default table when no match is found:
import cn.hutool.core.date.DateUtil;
import org.apache.shardingsphere.api.sharding.standard.PreciseShardingAlgorithm;
import org.apache.shardingsphere.api.sharding.standard.PreciseShardingValue;
import java.util.Collection;
import java.util.Date;
import java.util.Objects;

public class DateShardingAlgorithmByDay implements PreciseShardingAlgorithm<Date> {
    final static String tablePrefix = "track";
    final static String linkChar = "_";

    @Override
    public String doSharding(Collection<String> tablesNames, PreciseShardingValue<Date> preciseShardingValue) {
        var dateValue = preciseShardingValue.getValue();
        // fall back to the base table when the value is null
        if (Objects.isNull(dateValue)) return tablePrefix;
        var suffix = DateUtil.year(dateValue) + linkChar + DateUtil.month(dateValue) + linkChar + DateUtil.dayOfMonth(dateValue);
        for (String tablesName : tablesNames) {
            if (tablesName.endsWith(suffix)) return tablesName;
        }
        return tablePrefix;
    }
}
import cn.hutool.core.date.DateUtil;
import com.baomidou.mybatisplus.core.toolkit.CollectionUtils;
import org.apache.shardingsphere.api.sharding.standard.RangeShardingAlgorithm;
import org.apache.shardingsphere.api.sharding.standard.RangeShardingValue;
import java.util.*;
import java.util.stream.Collectors;

public class DateShardingRangeAlgorithmByDay implements RangeShardingAlgorithm<Date> {
    final static String tablePrefix = "track";
    final static String linkChar = "_";

    @Override
    public Collection<String> doSharding(Collection<String> tableNames, RangeShardingValue<Date> rangeShardingValue) {
        var startDate = rangeShardingValue.getValueRange().lowerEndpoint();
        var endDate = rangeShardingValue.getValueRange().upperEndpoint();
        List<String> result = new ArrayList<>();
        tableNames.forEach(x -> {
            // rebuild the date from the table name suffix, e.g. track_2022_11_5 -> 2022_11_5
            var yyyyMMDD = Arrays.stream(x.split(linkChar)).skip(1).collect(Collectors.joining("_"));
            var tableDate = DateUtil.parseDate(yyyyMMDD);
            if ((startDate.before(tableDate) || Objects.equals(startDate, tableDate))
                    && (endDate.after(tableDate) || Objects.equals(endDate, tableDate))) {
                result.add(x);
            }
        });
        if (CollectionUtils.isEmpty(result)) return List.of(tablePrefix);
        return result;
    }
}
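For reference, this is roughly how the logical table is queried from MyBatis-Plus in this setup (the entity and mapper below are a hypothetical sketch, not part of the original project). An equality or IN condition on locationTime goes through the precise algorithm above, a range condition through the range algorithm:

import com.baomidou.mybatisplus.annotation.TableName;
import com.baomidou.mybatisplus.core.conditions.query.QueryWrapper;
import com.baomidou.mybatisplus.core.mapper.BaseMapper;
import org.apache.ibatis.annotations.Mapper;
import java.util.Date;
import java.util.List;

// Hypothetical entity mapped to the logical table "track"; ShardingSphere rewrites
// the table name to the physical per-day table based on the locationTime condition.
@TableName("track")
class Track {
    private Long id;
    private Long vehicleId;
    private Double lng;
    private Double lat;
    private Date locationTime;
    // getters/setters omitted for brevity
}

@Mapper
interface TrackMapper extends BaseMapper<Track> {
}

class TrackQueryExample {
    // A BETWEEN condition on the sharding column is routed by the range algorithm;
    // an =/IN condition would be routed by the precise algorithm.
    static List<Track> queryDay(TrackMapper mapper, Date dayStart, Date dayEnd) {
        return mapper.selectList(new QueryWrapper<Track>().between("locationTime", dayStart, dayEnd));
    }
}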
- Write some simple tests; the items to verify are:
  - The default data source can be queried/written normally
    - OK: ShardingSphere parses the SQL and routes it to the default data source
    - Since the default data source is not the one being sharded, this extra parsing feels redundant
  - The per-day track_xxxxxx tables can be created through the logical table
    - Failed: CREATE TABLE statements are not routed by the sharding rules; judging by the logs they are not parsed at all
    - This does not block usage, but ShardingSphere loads table meta information at startup, so it also does not recognize tables created later; the per-day tables therefore have to be created directly on the physical database (see the sketch after this list)
  - Writes to track are routed to the correct per-day table
    - Failed: everything was written to the default track table. The cause was a logic error: DateUtil.month(dateValue) is 0-based, so the month in the generated suffix was wrong
  - Queries against track are routed by the sharding logic
    - Exceptions:
    - When the range has no upper bound an exception is thrown; the range code was changed to:
      var endDate = range.hasUpperBound()? range.upperEndpoint():null;
    - Error: Table 'ldd_track.track_2022_11_31' doesn't exist, although November has no 31st. Debugging shows that the candidate table set is generated entirely from the line
      actual-data-nodes: ds1.track_$->{2022..2200}_$->{1..12}_${1..31}
      without checking which tables actually exist in the database, so the range-matching logic has to filter out the invalid table names the rule produces.
    - Solution: handle it with Groovy. A closer look at the docs shows that the part after $-> in actual-data-nodes is a closure expression: any expression that returns a list works, and multiple parameters are combined as a Cartesian product. The expression is changed to iterate day by day, so only real dates are generated (learning Groovy was not wasted after all):
      actual-data-nodes: ds1.track_$->{(Date.parse("yyyy-MM-dd","2022-11-11")..Date.parse("yyyy-MM-dd","2023-12-31")).collect{ it.format("yyyy_M_d")}}
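Since CREATE TABLE is not routed, the per-day physical tables have to be created directly against the track database outside ShardingSphere. A minimal sketch of a job that pre-creates tomorrow's table (the JdbcTemplate wiring, schedule, and "CREATE TABLE ... LIKE track" DDL are assumptions for illustration, not from the original project, and it assumes @EnableScheduling is active):

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: ShardingSphere does not route DDL through the sharding rule,
// so the per-day table is created directly on the physical track data source.
@Component
class TrackTableCreator {
    private final JdbcTemplate trackJdbcTemplate; // assumed to point at the ds1 / ldd_track database

    TrackTableCreator(JdbcTemplate trackJdbcTemplate) {
        this.trackJdbcTemplate = trackJdbcTemplate;
    }

    // Pre-create tomorrow's table every day at 23:00; LIKE copies the base table's structure.
    @Scheduled(cron = "0 0 23 * * *")
    void createTomorrowTable() {
        String suffix = LocalDate.now().plusDays(1)
                .format(DateTimeFormatter.ofPattern("yyyy_M_d"));
        trackJdbcTemplate.execute("CREATE TABLE IF NOT EXISTS track_" + suffix + " LIKE track");
    }
}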
Improvements
- Unify table names to the track_20221111 form, without separators in the middle
- Use MyBatis Plus dynamic-datasource (@DS) to manage data source switching, so the non-sharded data source avoids unnecessary SQL parsing and the wrong-data-source problem when creating tables is solved (a usage sketch follows the data source configuration class below)
The adjusted implementation is as follows
- Add the dependency
<dependency>
    <groupId>com.baomidou</groupId>
    <artifactId>dynamic-datasource-spring-boot-starter</artifactId>
    <version>3.5.2</version>
</dependency>
- Configuration
spring:
  datasource:
    dynamic:
      druid:
        initialSize: 10
        minIdle: 10
        maxActive: 20
        maxWait: 60000
        timeBetweenEvictionRunsMillis: 60000
        minEvictableIdleTimeMillis: 300000
        validationQuery: SELECT 1
        poolPreparedStatements: true
        maxPoolPreparedStatementPerConnectionSize: 20
        init-connection-sqls: SELECT 1
      datasource:
        master:
          url: jdbc:mysql://192.168.31.250:3306/ludingding?useUnicode=true&characterEncoding=utf8&useSSL=false&serverTimezone=Asia/Shanghai
          username: root
          password: root
          driverClassName: com.mysql.jdbc.Driver
      primary: master
      strict: false
  shardingsphere:
    props:
      sql:
        show: true
    datasource:
      names: ds0
      ds0:
        url: jdbc:mysql://192.168.31.250:3306/ldd_track?useUnicode=true&characterEncoding=utf8&useSSL=false&serverTimezone=Asia/Shanghai
        username: root
        password: root
        type: com.alibaba.druid.pool.DruidDataSource
        driverClassName: com.mysql.jdbc.Driver
        initialSize: 10
        minIdle: 10
        maxActive: 20
        maxWait: 60000
        timeBetweenEvictionRunsMillis: 60000
        minEvictableIdleTimeMillis: 300000
        validationQuery: SELECT 1
        init-connection-sqls: SELECT 1
        poolPreparedStatements: true
        maxPoolPreparedStatementPerConnectionSize: 20
    sharding:
      tables:
        track:
          actual-data-nodes: ds0.track_$->{(Date.parse("yyyy-MM-dd","2022-11-11")..Date.parse("yyyy-MM-dd","2024-12-31")).collect{ it.format("yyyyMMdd")}}
          table-strategy:
            standard:
              shardingColumn: locationTime
              # precise sharding strategy, used for = and IN
              preciseAlgorithmClassName: com.chebaidu.ludingding.tabletest.utils.DateShardingAlgorithmByDay
              # range sharding strategy, used for <, >, and BETWEEN
              rangeAlgorithmClassName: com.chebaidu.ludingding.tabletest.utils.DateShardingRangeAlgorithmByDay
- Data source configuration class, based mainly on https://blog.csdn.net/dls_123/article/details/114653348, although its comment about sharding-jdbc having four kinds of data sources does not seem quite right
import com.baomidou.dynamic.datasource.DynamicRoutingDataSource;
import com.baomidou.dynamic.datasource.provider.AbstractDataSourceProvider;
import com.baomidou.dynamic.datasource.provider.DynamicDataSourceProvider;
import com.baomidou.dynamic.datasource.spring.boot.autoconfigure.DataSourceProperty;
import com.baomidou.dynamic.datasource.spring.boot.autoconfigure.DynamicDataSourceCreatorAutoConfiguration;
import com.baomidou.dynamic.datasource.spring.boot.autoconfigure.DynamicDataSourceProperties;
import org.apache.shardingsphere.shardingjdbc.spring.boot.SpringBootConfiguration;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.autoconfigure.AutoConfigureBefore;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;
import javax.annotation.Resource;
import javax.sql.DataSource;
import java.util.Map;

// Note: SpringBootConfiguration here is ShardingSphere's
// org.apache.shardingsphere.shardingjdbc.spring.boot.SpringBootConfiguration,
// not Spring Boot's own annotation; mixing them up may cause errors
@AutoConfigureBefore({DynamicDataSourceCreatorAutoConfiguration.class, SpringBootConfiguration.class})
@Configuration
public class DataSourceConfig {

    @Resource
    private DynamicDataSourceProperties properties;

    @Resource
    @Qualifier("shardingDataSource")
    private DataSource shardingSphereDataSource;

    @Bean
    public DynamicDataSourceProvider dynamicDataSourceProvider() {
        Map<String, DataSourceProperty> datasourceMap = properties.getDatasource();
        return new AbstractDataSourceProvider() {
            @Override
            public Map<String, DataSource> loadDataSources() {
                Map<String, DataSource> dataSourceMap = createDataSourceMap(datasourceMap);
                // register the ShardingSphere data source under the key "sharding"
                dataSourceMap.put("sharding", shardingSphereDataSource);
                return dataSourceMap;
            }
        };
    }

    /**
     * Make the dynamic-datasource routing data source the primary DataSource bean
     */
    @Primary
    @Bean
    public DataSource dataSource() {
        DynamicRoutingDataSource dataSource = new DynamicRoutingDataSource();
        dataSource.setPrimary(properties.getPrimary());
        dataSource.setStrict(properties.getStrict());
        dataSource.setStrategy(properties.getStrategy());
        dataSource.setP6spy(properties.getP6spy());
        dataSource.setSeata(properties.getSeata());
        return dataSource;
    }
}
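With the data sources registered above, switching is done through dynamic-datasource's @DS annotation. A minimal usage sketch (the mapper, service, and SQL below are hypothetical, not from the original project): only methods marked @DS("sharding") go through the ShardingSphere data source registered under the key "sharding" in dynamicDataSourceProvider(); everything else stays on the primary "master" source and skips ShardingSphere's SQL parsing.

import com.baomidou.dynamic.datasource.annotation.DS;
import org.apache.ibatis.annotations.Mapper;
import org.apache.ibatis.annotations.Param;
import org.apache.ibatis.annotations.Select;
import org.springframework.stereotype.Service;
import java.util.Date;
import java.util.List;
import java.util.Map;

// Hypothetical mapper against the logical "track" table; ShardingSphere rewrites
// the table name to the physical per-day tables based on the locationTime condition.
@Mapper
interface TrackQueryMapper {
    @Select("select * from track where locationTime between #{start} and #{end}")
    List<Map<String, Object>> selectBetween(@Param("start") Date start, @Param("end") Date end);
}

@Service
class TrackService {
    private final TrackQueryMapper trackQueryMapper;

    TrackService(TrackQueryMapper trackQueryMapper) {
        this.trackQueryMapper = trackQueryMapper;
    }

    // Routed to the ShardingSphere data source registered as "sharding" above;
    // methods without @DS("sharding") use the default "master" data source.
    @DS("sharding")
    public List<Map<String, Object>> listByDay(Date dayStart, Date dayEnd) {
        return trackQueryMapper.selectBetween(dayStart, dayEnd);
    }
}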
- Sharding algorithm classes
import cn.hutool.core.date.DateUtil;
import org.apache.shardingsphere.api.sharding.standard.PreciseShardingAlgorithm;
import org.apache.shardingsphere.api.sharding.standard.PreciseShardingValue;
import java.util.Collection;
import java.util.Date;
import java.util.Objects;

public class DateShardingAlgorithmByDay implements PreciseShardingAlgorithm<Date> {
    final static String tablePrefix = "track";

    @Override
    public String doSharding(Collection<String> tablesNames, PreciseShardingValue<Date> preciseShardingValue) {
        var dateValue = preciseShardingValue.getValue();
        // fall back to the base table when the value is null
        if (Objects.isNull(dateValue)) return tablePrefix;
        var suffix = DateUtil.format(dateValue, "yyyyMMdd");
        for (String tablesName : tablesNames) {
            if (tablesName.endsWith(suffix)) return tablesName;
        }
        return tablePrefix;
    }
}
import cn.hutool.core.date.DateUtil;
import org.apache.shardingsphere.api.sharding.standard.RangeShardingAlgorithm;
import org.apache.shardingsphere.api.sharding.standard.RangeShardingValue;
import org.springframework.util.CollectionUtils;
import java.util.*;

public class DateShardingRangeAlgorithmByDay implements RangeShardingAlgorithm<Date> {
    final static String tablePrefix = "track";
    final static String linkChar = "_";
    final static Date defaultStartDate = DateUtil.parseDate("2022-11-11");
    final static Date defaultEndDay = DateUtil.parse("2024-12-31 23:59:59");

    @Override
    public Collection<String> doSharding(Collection<String> tableNames, RangeShardingValue<Date> rangeShardingValue) {
        var range = rangeShardingValue.getValueRange();
        var startDate = range.hasLowerBound() ? range.lowerEndpoint() : null;
        var endDate = range.hasUpperBound() ? range.upperEndpoint() : null;
        return getTableNameList(startDate, endDate);
    }

    List<String> getTableNameList(Date startDate, Date endDate) {
        var now = new Date();
        startDate = Objects.isNull(startDate) ? defaultStartDate : startDate;
        // if endDate is missing or after the current time, clamp it to now
        endDate = Objects.isNull(endDate) ? now : endDate.after(now) ? now : endDate;
        var currentDay = startDate;
        List<String> tableNames = new ArrayList<>();
        while (true) {
            var suffix = DateUtil.format(currentDay, "yyyyMMdd");
            var tableName = tablePrefix + linkChar + suffix;
            tableNames.add(tableName);
            currentDay = DateUtil.offsetDay(currentDay, 1);
            if (currentDay.after(endDate)) break;
        }
        // if the requested range extends beyond the configured shard range, include the base table as well
        if (startDate.before(defaultStartDate) || endDate.after(defaultEndDay)) tableNames.add(tablePrefix);
        if (CollectionUtils.isEmpty(tableNames)) return List.of(tablePrefix);
        return tableNames;
    }
}
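A quick sanity check of the range logic (a throwaway snippet, not part of the original code): because the table names are generated by iterating day by day, a range that crosses a month boundary can no longer produce a non-existent date like track_2022_11_31.

import cn.hutool.core.date.DateUtil;

// Throwaway check; assumes it sits in the same package as DateShardingRangeAlgorithmByDay,
// since getTableNameList is package-private.
class RangeAlgorithmCheck {
    public static void main(String[] args) {
        var algorithm = new DateShardingRangeAlgorithmByDay();
        var tables = algorithm.getTableNameList(
                DateUtil.parseDate("2022-11-30"),
                DateUtil.parseDate("2022-12-01"));
        System.out.println(tables); // expected: [track_20221130, track_20221201]
    }
}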
- Open issues:
  - ShardingSphere's table scan at startup is slow: with two years' worth of tables pre-created (700+), the startup scan takes about 30 seconds. It is not a big problem, and the scan does not seem to matter for this sharding setup, so in principle it could be removed from the source; people online have reportedly done exactly that.
  - Update 2023/07/28: this can be mitigated with the configuration below, which lets each query use up to 5 connections, so 5 tables are scanned concurrently and the time drops more than fivefold (from 80s to 12s in testing). The same setting also speeds up normal queries: with per-day sharding, a query spanning more than one day hits the tables in parallel and the results are then merged.
props:
  max.connections.size.per.query: 5