阅读 86

sharding-JDBC内幕之subquery

subquery

子查询指一个查询语句嵌套在另一个查询语句内部的查询,在 SELECT 子句中先计算子查询,子查询结果作为外层另一个查询的过滤条件,查询可以基于一个表或者多个表,子查询中常用的操作符有 ANY(SOME)、ALL、IN 和 EXISTS。常见形式如下

数据中间件一般处理subquery思路是拆分SQL语句,先查询内部语句结果作为外部条件继续查询,整体流程类似下图

另外一种处理方式就是确定subquery内部查询与外部查询是否在同一个分片(db)上,如果可以确定这层关系,则可以将SQL直接发送到某个分片上,一般在非代理数据中间件实现中,可以考虑这种方式规避拆分subquery,尤其在oracle嵌套分页查询时需要面临这个问题。

sharding subquery

了解subquery后,我们来看看sharding-JDBC是怎么处理subquery的,希望读者对原理有了解后,能正确使用子查询,首先我们来验证上边这个场景,SQL如下

select * from t_order where order_id in (select order_id from t_order_item where order_name ='zhangsan')
复制代码

执行结果有点出乎意料

java.lang.IllegalStateException: Must have sharding column with subquery.

 at com.google.common.base.Preconditions.checkState(Preconditions.java:173)
 at org.apache.shardingsphere.core.route.router.sharding.ParsingSQLRouter.checkSubqueryShardingValues(ParsingSQLRouter.java:111)
 at org.apache.shardingsphere.core.route.router.sharding.ParsingSQLRouter.route(ParsingSQLRouter.java:75)
 at org.apache.shardingsphere.core.route.StatementRoutingEngine.route(StatementRoutingEngine.java:56)
复制代码

提示我们必须要分片键,我们可以把t_order_item表分片键改为order_name

 t_order_item:
      actualDataNodes: ds_0.t_order_item
      tableStrategy:
        inline:
          shardingColumn: order_name
          algorithmExpression: t_order_item
      databaseStrategy:
        inline:
          shardingColumn: order_name
          algorithmExpression: ds_0
复制代码

执行结果还是同样的错误,将SQL改写如下

select * from t_order where order_id in (select order_id from t_order_item where order_id =1001)
复制代码

执行正常,不猜了,看下源码是怎么处理的,这块和路由相关,先看下路由模块代码

    public SQLRouteResult route(final String logicSQL, final List<Object> parameters, final SQLStatement sqlStatement) {
      //此处OptimizedStatement属于ShardingSelectOptimizedStatement,主要是对SQL中的关注的部分从语法树中提取,比如order by , 聚合函数等
        OptimizedStatement optimizedStatement = ShardingOptimizeEngineFactory.newInstance(sqlStatement).optimize(shardingRule, shardingMetaData.getTable(), logicSQL, parameters, sqlStatement);
        boolean needMergeShardingValues = isNeedMergeShardingValues(optimizedStatement);
        if (optimizedStatement instanceof ShardingConditionOptimizedStatement && needMergeShardingValues) {
            checkSubqueryShardingValues(optimizedStatement, ((ShardingConditionOptimizedStatement) optimizedStatement).getShardingConditions());
            mergeShardingConditions(((ShardingConditionOptimizedStatement) optimizedStatement).getShardingConditions());
        }
        RoutingResult routingResult = RoutingEngineFactory.newInstance(shardingRule, shardingMetaData.getDataSource(), optimizedStatement).route();
        if (needMergeShardingValues) {
            Preconditions.checkState(1 == routingResult.getRoutingUnits().size(), "Must have one sharding with subquery.");
        }
        if (optimizedStatement instanceof ShardingInsertOptimizedStatement) {
            setGeneratedValues((ShardingInsertOptimizedStatement) optimizedStatement);
        }
        SQLRouteResult result = new SQLRouteResult(optimizedStatement);
        result.setRoutingResult(routingResult);
        return result;
    }
复制代码

optimize方法

    public ShardingSelectOptimizedStatement optimize(final ShardingRule shardingRule,
                                                     final ShardingTableMetaData shardingTableMetaData, final String sql, final List<Object> parameters, final SelectStatement sqlStatement) {
        WhereClauseShardingConditionEngine shardingConditionEngine = new WhereClauseShardingConditionEngine(shardingRule, shardingTableMetaData);
        WhereClauseEncryptConditionEngine encryptConditionEngine = new WhereClauseEncryptConditionEngine(shardingRule.getEncryptRule(), shardingTableMetaData);
        GroupByEngine groupByEngine = new GroupByEngine();
        OrderByEngine orderByEngine = new OrderByEngine();
        SelectItemsEngine selectItemsEngine = new SelectItemsEngine(shardingTableMetaData);
        PaginationEngine paginationEngine = new PaginationEngine();
      //核心点,获取条件,包括where条件和subquery条件
        List<ShardingCondition> shardingConditions = shardingConditionEngine.createShardingConditions(sqlStatement, parameters);
        List<EncryptCondition> encryptConditions = encryptConditionEngine.createEncryptConditions(sqlStatement);
        GroupBy groupBy = groupByEngine.createGroupBy(sqlStatement);
        OrderBy orderBy = orderByEngine.createOrderBy(sqlStatement, groupBy);
        SelectItems selectItems = selectItemsEngine.createSelectItems(sql, sqlStatement, groupBy, orderBy);
        Pagination pagination = paginationEngine.createPagination(sqlStatement, selectItems, parameters);
        ShardingSelectOptimizedStatement result = new ShardingSelectOptimizedStatement(sqlStatement, shardingConditions, encryptConditions, groupBy, orderBy, selectItems, pagination);
        setContainsSubquery(sqlStatement, result);
        return result;
    }
复制代码

createShardingConditions方法

    public List<ShardingCondition> createShardingConditions(final SQLStatement sqlStatement, final List<Object> parameters) {
        if (!(sqlStatement instanceof WhereSegmentAvailable)) {
            return Collections.emptyList();
        }
        List<ShardingCondition> result = new ArrayList<>();
      //获取where条件
        Optional<WhereSegment> whereSegment = ((WhereSegmentAvailable) sqlStatement).getWhere();
      //关键点,查询中涉及到的所有表
        Tables tables = new Tables(sqlStatement);
        if (whereSegment.isPresent()) {
            result.addAll(createShardingConditions(tables, whereSegment.get().getAndPredicates(), parameters));
        }
      //获取subquery条件
        Collection<SubqueryPredicateSegment> subqueryPredicateSegments = sqlStatement.findSQLSegments(SubqueryPredicateSegment.class);
        for (SubqueryPredicateSegment each : subqueryPredicateSegments) {
          //获取具体的subquery查询条件
            Collection<ShardingCondition> subqueryShardingConditions = createShardingConditions(tables, each.getAndPredicates(), parameters);
            if (!result.containsAll(subqueryShardingConditions)) {
                result.addAll(subqueryShardingConditions);
            }
        }
        return result;
    }
复制代码

我们就用上边SQL看下Tables tables涉及的表

从解析结果中可以知道并未对子查询内部表名提取,则暗含着内部与外部是同一个表,验证我们猜测对不对

createShardingConditions方法获取subquery条件

   private Collection<ShardingCondition> createShardingConditions(final Tables tables, final Collection<AndPredicate> andPredicates, final List<Object> parameters) {
        Collection<ShardingCondition> result = new LinkedList<>();
        for (AndPredicate each : andPredicates) {
        //从子查询条件中获取路由单元,即column-shardingValue数据结构
            Map<Column, Collection<RouteValue>> routeValueMap = createRouteValueMap(tables, each, parameters);
            if (routeValueMap.isEmpty()) {
                return Collections.emptyList();
            }
            result.add(createShardingCondition(routeValueMap));
        }
        return result;
    }
复制代码

createRouteValueMap方法

private Map<Column, Collection<RouteValue>> createRouteValueMap(final Tables tables, final AndPredicate andPredicate, final List<Object> parameters) {
    Map<Column, Collection<RouteValue>> result = new HashMap<>();
    for (PredicateSegment each : andPredicate.getPredicates()) {
        Optional<String> tableName = tables.findTableName(each.getColumn(), shardingTableMetaData);
        //判断当前条件中的column是不是分片键,上一步我们知道tables中只有t_order一个表
        //由于当前column是 order_name 并不是t_order表的分片键,所以此处并未构造出RouteValue,
        //所有就看到第一开始的异常信息 Must have sharding column with subquery.
        if (!tableName.isPresent() || !shardingRule.isShardingColumn(each.getColumn().getName(), tableName.get())) {
            continue;
        }
        Column column = new Column(each.getColumn().getName(), tableName.get());
        Optional<RouteValue> routeValue = ConditionValueGeneratorFactory.generate(each.getRightValue(), column, parameters);
        if (!routeValue.isPresent()) {
            continue;
        }
        if (!result.containsKey(column)) {
            result.put(column, new LinkedList<RouteValue>());
        }
        result.get(column).add(routeValue.get());
    }
    return result;
}
复制代码

到这块基本处理逻辑应该了解的差不多,subquery应该是处理自生表的嵌套子查询即一个表的嵌套子查询,那我们可以推断如下的SQL应该也是可以正常执行的

select * from t_order where order_id in (select order_name from t_order_item where order_id = 1001)
复制代码

直接结果和推断的一样,SQL直接发往其中一个db上,这明显不正确,因为order_name 和order_id 没有任何关系,并且t_order_item的分片键并不是order_id,分析到这基本结束。

the end

由上可见sharding-JDBC对子查询支持并没有我们想象的那么强大,希望读者了解原理后可以正确的使用sharding subquery,我觉得对于subquery处理还是有不足的地方,比如类似SQL select * from t_order where order_id in (select order_name from t_order_item where order_id = 1001) 涉及多表子查询应该抛异常获取提示信息,不能直接根据 subquery条件order_id = 1001 与外部表的 路由算法直接路由,因为可能存在两张表毫无关系的情况。希望读者清楚sharding-JDBC处理对subquery的处理逻辑,正确的使用subquery

关注下面的标签,发现更多相似文章
评论