Requirements

  • A GPS navigation app: each vehicle produces around 3,000 GPS points per day, so with 3,000 daily active vehicles that is roughly 9 million rows a day, past the practical limit of a normal single table, so the data has to be split at least by day
  • Update 2023-07-28: there are already 80,000 vehicles online and about 2 million rows of data per day, far beyond the original expectation; the business mostly queries by day and the speed is still decent

Approach

  • Isolate the point data from the rest of the business data in a separate database (a dedicated RDS instance in production) so it cannot affect other business
  • Use ShardingSphere to shard the point data into daily tables
  • Since the data is produced as a linear stream, no pagination is needed

Practice

  1. Add the dependencies; the relatively mature 4.x line is used
        <dependency>
            <groupId>org.apache.shardingsphere</groupId>
            <artifactId>sharding-jdbc-core</artifactId>
            <version>4.1.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.shardingsphere</groupId>
            <artifactId>sharding-jdbc-spring-namespace</artifactId>
            <version>4.1.1</version>
        </dependency>

        <!-- Note: use the plain druid artifact without the starter suffix,
             otherwise it conflicts with ShardingSphere's SQL factory classes -->
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>druid</artifactId>
            <version>1.2.11</version>
        </dependency>
    
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.47</version>
        </dependency>
        <dependency>
            <groupId>com.baomidou</groupId>
            <artifactId>mybatis-plus-boot-starter</artifactId>
            <version>3.4.3.4</version>
        </dependency>
    
  2. Data source configuration: all data sources are handed over to ShardingSphere, and data that is not sharded goes through the default data source
spring:
  jackson:
    time-zone: Asia/Shanghai
    date-format: yyyy-MM-dd HH:mm:ss
  shardingsphere:
    datasource:
      names: ds0,ds1
      ds0:
        url: jdbc:mysql://192.168.31.250:3306/ds0?useUnicode=true&characterEncoding=utf8&useSSL=false&serverTimezone=Asia/Shanghai
        username: root
        password: root
        type: com.alibaba.druid.pool.DruidDataSource
        driverClassName: com.mysql.jdbc.Driver
        initialSize: 10
        minIdle: 10
        maxActive: 20
        maxWait: 60000
        timeBetweenEvictionRunsMillis: 60000
        minEvictableIdleTimeMillis: 300000
        validationQuery: SELECT 1
        poolPreparedStatements: true
        maxPoolPreparedStatementPerConnectionSize: 20
      ds1:
        url: jdbc:mysql://192.168.31.250:3306/ds1?useUnicode=true&characterEncoding=utf8&useSSL=false&serverTimezone=Asia/Shanghai
        username: root
        password: root
        type: com.alibaba.druid.pool.DruidDataSource
        driverClassName: com.mysql.jdbc.Driver
        initialSize: 10
        minIdle: 10
        maxActive: 20
        maxWait: 60000
        timeBetweenEvictionRunsMillis: 60000
        minEvictableIdleTimeMillis: 300000
        validationQuery: SELECT 1
        poolPreparedStatements: true
        maxPoolPreparedStatementPerConnectionSize: 20
    sharding:
      defaultDataSourceName: ds0
  3. Sharding rule; since only one table needs to be sharded, a single entry under tables is enough
    sharding:
      defaultDataSourceName: ds0
      tables:
        track:
          actual-data-nodes: ds1.track_$->{2022..2200}_$->{1..12}_${1..31}
          table-strategy:
            standard:
              shardingColumn: locationTime
              #precise sharding strategy, used for = and IN
              preciseAlgorithmClassName: com.chebaidu.ludingding.tabletest.utils.DateShardingAlgorithmByDay
              #range sharding strategy, used for <, > and BETWEEN
              rangeAlgorithmClassName: com.chebaidu.ludingding.tabletest.utils.DateShardingRangeAlgorithmByDay

The two sharding algorithm classes are shown below; a simple implementation that falls back to the default table when no match is found:

import cn.hutool.core.date.DateUtil;
import org.apache.shardingsphere.api.sharding.standard.PreciseShardingAlgorithm;
import org.apache.shardingsphere.api.sharding.standard.PreciseShardingValue;

import java.util.Collection;
import java.util.Date;
import java.util.Objects;

public class DateShardingAlgorithmByDay implements PreciseShardingAlgorithm<Date> {
   final  static String tablePrefix = "track";
   final static  String linkChar = "_";

   @Override
   public String doSharding(Collection<String> tablesNames, PreciseShardingValue<Date> preciseShardingValue) {
       var dateValue = preciseShardingValue.getValue();
       //fall back to the main table when the sharding value is null
       if(Objects.isNull(dateValue)) return tablePrefix;
       var suffix = DateUtil.year(dateValue) + linkChar + DateUtil.month(dateValue) + linkChar + DateUtil.dayOfMonth(dateValue);
       for (String tablesName : tablesNames) {
           if(tablesName.endsWith(suffix)) return tablesName;
       }
       return tablePrefix;
   }
}
import cn.hutool.core.date.DateUtil;
import com.baomidou.mybatisplus.core.toolkit.CollectionUtils;
import org.apache.shardingsphere.api.sharding.standard.RangeShardingAlgorithm;
import org.apache.shardingsphere.api.sharding.standard.RangeShardingValue;

import java.util.*;
import java.util.stream.Collectors;

public class DateShardingRangeAlgorithmByDay implements RangeShardingAlgorithm<Date> {
    final  static String tablePrefix = "track";
    final static  String linkChar = "_";

    @Override
    public Collection<String> doSharding(Collection<String> tableNames, RangeShardingValue<Date> rangeShardingValue) {
        var startDate = rangeShardingValue.getValueRange().lowerEndpoint();
        var endDate = rangeShardingValue.getValueRange().upperEndpoint();
        List<String> result = new ArrayList<>();
        tableNames.forEach(x-> {
            var yyyyMMDD = Arrays.stream(x.split(linkChar)).skip(1).collect(Collectors.joining("_"));
            var tableDate = DateUtil.parseDate(yyyyMMDD);
            if ((startDate.before(tableDate) || Objects.equals(startDate, tableDate))
                    && (endDate.after(tableDate) || Objects.equals(endDate, tableDate))) {
                result.add(x);
            }
        });
        if(CollectionUtils.isEmpty(result)) return List.of(tablePrefix);
        return result;
    }
}
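
For context, here is a minimal sketch of how the two algorithms are triggered from MyBatis-Plus. The Track entity, TrackMapper and their fields are hypothetical; only the logical table name track and the sharding column locationTime come from the configuration above, and it is assumed the physical column uses the same name.

import com.baomidou.mybatisplus.annotation.TableName;
import com.baomidou.mybatisplus.core.conditions.query.QueryWrapper;
import com.baomidou.mybatisplus.core.mapper.BaseMapper;

import java.util.Date;
import java.util.List;

// Hypothetical entity for the logical table "track"; only the table name and
// the sharding column locationTime are taken from the configuration above.
@TableName("track")
class Track {
    private Long id;
    private Long vehicleId;    // assumed field
    private Double lng;        // assumed field
    private Double lat;        // assumed field
    private Date locationTime; // the sharding column
    // getters and setters omitted
}

// Plain MyBatis-Plus mapper, registered the usual way (e.g. via @MapperScan)
interface TrackMapper extends BaseMapper<Track> {
}

class TrackQueryExample {

    // "=" and "IN" conditions on the sharding column go through the precise algorithm
    List<Track> queryAtExactTime(TrackMapper mapper, Date time) {
        return mapper.selectList(new QueryWrapper<Track>().eq("locationTime", time));
    }

    // "BETWEEN", "<" and ">" conditions go through the range algorithm
    List<Track> queryByDay(TrackMapper mapper, Date dayStart, Date dayEnd) {
        return mapper.selectList(new QueryWrapper<Track>().between("locationTime", dayStart, dayEnd));
    }
}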
  4. Write some simple tests; what needs to be verified is as follows
    • The default data source can be read and written normally
      • OK, ShardingSphere parsed the SQL and routed it to the default data source
      • Since the default data source is not sharded at all, this parsing feels a bit redundant
    • The sharded track_xxxxxx tables can be created through the logical table
      • Failed: CREATE TABLE statements are not routed by the sharding rules, and judging from the logs they are not even parsed
      • This does not block usage, but ShardingSphere loads table meta information at startup, so tables created later are not recognized either
    • Data written to the track table is routed to the correct daily table by the sharding logic
      • Failed: everything was written into the default track table. The cause was a logic error: DateUtil.month(dateValue) is 0-based, so the month in the table suffix was wrong
    • Queries on the track table are routed by the sharding logic
      • Exceptions:
        1. An exception is thrown when the range has no upper bound; change the range query code to the following
            var endDate = range.hasUpperBound()? range.upperEndpoint():null;
          
        2. Error: Table 'ldd_track.track_2022_11_31' doesn't exist; the problem is that November has no 31st at all

        Some debugging showed that the set of candidate tables is derived entirely from the line actual-data-nodes: ds1.track_$->{2022..2200}_$->{1..12}_${1..31}, with no regard for which tables actually exist in the database, so the range-matching logic would have to filter out the impossible combinations itself.
        Solution: handle it with Groovy. Reading the docs more closely, the part of actual-data-nodes after $-> is really a Groovy closure expression; anything that returns a list is acceptable, and multiple expressions are combined as their Cartesian product, so simply enumerate the days directly as shown below. Learning Groovy was not wasted after all; it still comes in handy occasionally.

        	actual-data-nodes: ds1.track_$->{(Date.parse("yyyy-MM-dd","2022-11-11")..Date.parse("yyyy-MM-dd","2023-12-31")).collect{ it.format("yyyy_M_d")}}
        

Improvements

  • Use table names of the form track_20221111 throughout; no separators between the date parts
  • Use MyBatis-Plus dynamic-datasource (@DS) to manage data source switching, so the non-sharded data source avoids unnecessary SQL parsing and table creation no longer hits the wrong data source

The adjusted implementation is as follows

  1. Add the dependency
        <dependency>
            <groupId>com.baomidou</groupId>
            <artifactId>dynamic-datasource-spring-boot-starter</artifactId>
            <version>3.5.2</version>
        </dependency>

  2. Configuration
spring:
  datasource:
    dynamic:
      druid:
        initialSize: 10
        minIdle: 10
        maxActive: 20
        maxWait: 60000
        timeBetweenEvictionRunsMillis: 60000
        minEvictableIdleTimeMillis: 300000
        validationQuery: SELECT 1
        poolPreparedStatements: true
        maxPoolPreparedStatementPerConnectionSize: 20
        init-connection-sqls: SELECT 1
      datasource:
        master:
          url: jdbc:mysql://192.168.31.250:3306/ludingding?useUnicode=true&characterEncoding=utf8&useSSL=false&serverTimezone=Asia/Shanghai
          username: root
          password: root
          driverClassName: com.mysql.jdbc.Driver
      primary: master
      strict: false
  shardingsphere:
    props:
      sql:
        show: true
    datasource:
      names: ds0
      ds0:
        url: jdbc:mysql://192.168.31.250:3306/ldd_track?useUnicode=true&characterEncoding=utf8&useSSL=false&serverTimezone=Asia/Shanghai
        username: root
        password: root
        type: com.alibaba.druid.pool.DruidDataSource
        driverClassName: com.mysql.jdbc.Driver
        initialSize: 10
        minIdle: 10
        maxActive: 20
        maxWait: 60000
        timeBetweenEvictionRunsMillis: 60000
        minEvictableIdleTimeMillis: 300000
        validationQuery: SELECT 1
        init-connection-sqls: SELECT 1
        poolPreparedStatements: true
        maxPoolPreparedStatementPerConnectionSize: 20
    sharding:
      tables:
        track:
          actual-data-nodes: ds0.track_$->{(Date.parse("yyyy-MM-dd","2022-11-11")..Date.parse("yyyy-MM-dd","2024-12-31")).collect{ it.format("yyyyMMdd")}}
          table-strategy:
            standard:
              shardingColumn: locationTime
              #precise sharding strategy, used for = and IN
              preciseAlgorithmClassName: com.chebaidu.ludingding.tabletest.utils.DateShardingAlgorithmByDay
              #range sharding strategy, used for <, > and BETWEEN
              rangeAlgorithmClassName: com.chebaidu.ludingding.tabletest.utils.DateShardingRangeAlgorithmByDay
  3. Data source configuration class, based mainly on https://blog.csdn.net/dls_123/article/details/114653348, though its note that sharding-jdbc has four kinds of data sources does not seem quite right; a usage sketch follows the class
import com.baomidou.dynamic.datasource.DynamicRoutingDataSource;
import com.baomidou.dynamic.datasource.provider.AbstractDataSourceProvider;
import com.baomidou.dynamic.datasource.provider.DynamicDataSourceProvider;
import com.baomidou.dynamic.datasource.spring.boot.autoconfigure.DataSourceProperty;
import com.baomidou.dynamic.datasource.spring.boot.autoconfigure.DynamicDataSourceCreatorAutoConfiguration;
import com.baomidou.dynamic.datasource.spring.boot.autoconfigure.DynamicDataSourceProperties;
import org.apache.shardingsphere.shardingjdbc.spring.boot.SpringBootConfiguration;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.autoconfigure.AutoConfigureBefore;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;

import javax.annotation.Resource;
import javax.sql.DataSource;
import java.util.Map;

//Note: the SpringBootConfiguration here must be ShardingSphere's org.apache.shardingsphere.shardingjdbc.spring.boot.SpringBootConfiguration,
//  not Spring Boot's own, otherwise startup may fail
@AutoConfigureBefore({DynamicDataSourceCreatorAutoConfiguration.class, SpringBootConfiguration.class})
@Configuration
public class DateSourceConfig {

        @Resource
        private DynamicDataSourceProperties properties;

        @Resource
        @Qualifier("shardingDataSource")
        private DataSource shardingSphereDataSource;


        @Bean
        public DynamicDataSourceProvider dynamicDataSourceProvider() {
            Map<String, DataSourceProperty> datasourceMap = properties.getDatasource();
            return new AbstractDataSourceProvider() {
                @Override
                public Map<String, DataSource> loadDataSources() {
                    Map<String, DataSource> dataSourceMap = createDataSourceMap(datasourceMap);
                    dataSourceMap.put("sharding", shardingSphereDataSource);
                    return dataSourceMap;
                }
            };
        }

        /**
         * Make the dynamic-datasource routing data source the primary DataSource
         */
        @Primary
        @Bean
        public DataSource dataSource() {
            DynamicRoutingDataSource dataSource = new DynamicRoutingDataSource();
            dataSource.setPrimary(properties.getPrimary());
            dataSource.setStrict(properties.getStrict());
            dataSource.setStrategy(properties.getStrategy());
            dataSource.setP6spy(properties.getP6spy());
            dataSource.setSeata(properties.getSeata());
            return dataSource;
        }
}
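
With the provider above, the ShardingSphere data source is registered under the name sharding while master remains the primary. A minimal sketch of how the switch is then used (the service classes below are hypothetical; @DS comes from dynamic-datasource):

import com.baomidou.dynamic.datasource.annotation.DS;
import org.springframework.stereotype.Service;

// Regular business data stays on the primary "master" data source by default,
// so ShardingSphere never has to parse its SQL.
@Service
class VehicleService {
    // ... normal mappers running against master
}

// Only the track point data goes through the ShardingSphere data source,
// which was registered as "sharding" in dynamicDataSourceProvider() above.
@Service
@DS("sharding")
class TrackService {
    // ... reads and writes on the logical "track" table, sharded by day
}

Only SQL issued under @DS("sharding") passes through ShardingSphere; everything else goes straight to master, which addresses both points listed under Improvements: no unnecessary parsing for non-sharded SQL, and table creation can be aimed at the correct data source explicitly.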

  4. Sharding strategy classes; a quick sanity check of the range logic follows them
import cn.hutool.core.date.DateUtil;
import org.apache.shardingsphere.api.sharding.standard.PreciseShardingAlgorithm;
import org.apache.shardingsphere.api.sharding.standard.PreciseShardingValue;

import java.util.Collection;
import java.util.Date;
import java.util.Objects;

public class DateShardingAlgorithmByDay implements PreciseShardingAlgorithm<Date> {
    final  static String tablePrefix = "track";

    @Override
    public String doSharding(Collection<String> tablesNames, PreciseShardingValue<Date> preciseShardingValue) {
        var dateValue = preciseShardingValue.getValue();
        //fall back to the main table when the sharding value is null
        if(Objects.isNull(dateValue)) return tablePrefix;
        var suffix = DateUtil.format(dateValue, "yyyyMMdd");
        for (String tablesName : tablesNames) {
            if(tablesName.endsWith(suffix)) return tablesName;
        }
        return tablePrefix;
    }
}
import cn.hutool.core.date.DateUtil;
import org.apache.shardingsphere.api.sharding.standard.RangeShardingAlgorithm;
import org.apache.shardingsphere.api.sharding.standard.RangeShardingValue;
import org.springframework.util.CollectionUtils;

import java.util.*;

public class DateShardingRangeAlgorithmByDay implements RangeShardingAlgorithm<Date> {
    final  static String tablePrefix = "track";
    final static  String linkChar = "_";
    final static Date defaultStartDate = DateUtil.parseDate("2022-11-11");
    final static Date defaultEndDay = DateUtil.parse("2024-12-31 23:59:59");


    @Override
    public Collection<String> doSharding(Collection<String> tableNames, RangeShardingValue<Date> rangeShardingValue) {
        var range = rangeShardingValue.getValueRange();
        var startDate = range.hasLowerBound()? range.lowerEndpoint():null;
        var endDate = range.hasUpperBound()? range.upperEndpoint():null;
        return getTableNameList(startDate,endDate);
    }

    List<String> getTableNameList(Date startDate,Date endDate){
        var now = new Date();
        startDate = Objects.isNull(startDate)? defaultStartDate:startDate;
        //if endDate is later than the current time, clamp it to now
        endDate = Objects.isNull(endDate)?now:endDate.after(now)?now:endDate;
        var currentDay = startDate;
        List<String> tableNames = new ArrayList<>();
        while (true) {
            var suffix = DateUtil.format(currentDay, "yyyyMMdd");
            var tableName = tablePrefix + linkChar + suffix;
            tableNames.add(tableName);
            currentDay = DateUtil.offsetDay(currentDay, 1);
            if (currentDay.after(endDate)) break;
        }
        if(startDate.before(defaultStartDate) || endDate.after(defaultEndDay)) tableNames.add(tablePrefix);
        if(CollectionUtils.isEmpty(tableNames)) return List.of(tablePrefix);
        return tableNames;
    }
}
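
A quick sanity check of the range logic above (a minimal sketch; it calls the package-private getTableNameList directly, so it is assumed to sit in the same package as DateShardingRangeAlgorithmByDay):

import cn.hutool.core.date.DateUtil;

class RangeAlgorithmSanityCheck {
    public static void main(String[] args) {
        var algorithm = new DateShardingRangeAlgorithmByDay();

        // A three-day range should expand to exactly three daily tables:
        // [track_20230701, track_20230702, track_20230703]
        System.out.println(algorithm.getTableNameList(
                DateUtil.parseDate("2023-07-01"),
                DateUtil.parseDate("2023-07-03")));

        // A missing lower bound should fall back to the default start date 2022-11-11:
        // [track_20221111, track_20221112]
        System.out.println(algorithm.getTableNameList(
                null,
                DateUtil.parseDate("2022-11-12")));
    }
}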
  5. Remaining issues:
    1. ShardingSphere's startup table scan is slow: with two years of tables pre-created (700+ of them), scanning at startup takes about 30 seconds. The impact is small, and the scan does not seem to matter for sharding anyway, so in principle it could be removed from the source code; some people online have done exactly that. A sketch of the table pre-creation job itself follows after this list.
    2. Update 2023/07/28: this can be mitigated with the ShardingSphere configuration below, which lets each query use up to 5 connections, so 5 tables are scanned concurrently and the time drops by more than 5x (from 80s to 12s in testing). The same setting also speeds up queries: with daily sharding, a query spanning more than one day is executed against the tables on multiple threads and the results are then merged.
    props:
      max:
        connections:
          size:
            per:
              query: 5
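
As noted above, CREATE TABLE is not routed by the sharding rules and ShardingSphere only picks tables up at startup, so the daily tables are pre-created ahead of time. A minimal sketch of such a pre-creation job, assuming an empty template table track exists in the physical ldd_track database, that it can be cloned with CREATE TABLE ... LIKE, and that a JdbcTemplate pointing at that database is available:

import cn.hutool.core.date.DateUtil;
import org.springframework.jdbc.core.JdbcTemplate;

import java.util.Date;

// Illustrative pre-creation job: clones the empty template table "track" into
// one physical table per day, named track_yyyyMMdd to match the sharding logic.
class TrackTablePreCreator {

    private final JdbcTemplate jdbcTemplate; // must point at the physical ldd_track database

    TrackTablePreCreator(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    void createDailyTables(Date from, Date to) {
        Date day = from;
        while (!day.after(to)) {
            String table = "track_" + DateUtil.format(day, "yyyyMMdd");
            // CREATE TABLE ... LIKE copies the template's columns and indexes
            jdbcTemplate.execute("CREATE TABLE IF NOT EXISTS " + table + " LIKE track");
            day = DateUtil.offsetDay(day, 1);
        }
    }
}

Since ShardingSphere only loads table metadata at startup, the tables need to exist before the dates they cover are first queried, hence creating them well in advance.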