Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizer convert hash semi join with nulleq condition to cross semi join with other condition #57583

Closed
Tracked by #57366
windtalker opened this issue Nov 21, 2024 · 0 comments · Fixed by #58075
Closed
Tracked by #57366
Assignees
Labels
affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. affects-8.5 This bug affects the 8.5.x(LTS) versions. severity/moderate sig/planner SIG: Planner type/bug The issue is confirmed as a bug.

Comments

@windtalker
Copy link
Contributor

Bug Report

Please answer these questions before submitting your issue. Thanks!
In TiDB, it will convert intersect/except into a hash join with nulleq join condition:

mysql> create table t1(id int, v1 int, v2 int, v3 int);
Query OK, 0 rows affected (1.47 sec)

mysql> create table t2(id int, v1 int, v2 int, v3 int);
Query OK, 0 rows affected (1.19 sec)
mysql> explain select id from t1 intersect select id from t1;
+------------------------------+----------+-----------+---------------+-------------------------------------------------------------+
| id                           | estRows  | task      | access object | operator info                                               |
+------------------------------+----------+-----------+---------------+-------------------------------------------------------------+
| HashJoin_7                   | 6400.00  | root      |               | semi join, equal:[nulleq(test.t1.id, test.t1.id)]           |
| ├─TableReader_18(Build)      | 10000.00 | root      |               | data:TableFullScan_17                                       |
| │ └─TableFullScan_17         | 10000.00 | cop[tikv] | table:t1      | keep order:false, stats:pseudo                              |
| └─HashAgg_12(Probe)          | 8000.00  | root      |               | group by:test.t1.id, funcs:firstrow(test.t1.id)->test.t1.id |
|   └─TableReader_13           | 8000.00  | root      |               | data:HashAgg_8                                              |
|     └─HashAgg_8              | 8000.00  | cop[tikv] |               | group by:test.t1.id,                                        |
|       └─TableFullScan_11     | 10000.00 | cop[tikv] | table:t1      | keep order:false, stats:pseudo                              |
+------------------------------+----------+-----------+---------------+-------------------------------------------------------------+

As you can see, HashJoin_7 is a hash join with nulleq join condition
However, if the query is a little bit complex, TiDB optimizer will generate a CARTESIAN semi join

mysql> explain select t1.id from t1 join t2 on t1.v1 = t2.v2 intersect select t1.id from t1 join t2 on t1.v1 = t2.v2;
+----------------------------------+----------+-----------+---------------+----------------------------------------------------------------+
| id                               | estRows  | task      | access object | operator info                                                  |
+----------------------------------+----------+-----------+---------------+----------------------------------------------------------------+
| HashJoin_15                      | 6393.60  | root      |               | CARTESIAN semi join, other cond:nulleq(test.t1.id, test.t1.id) |
| ├─HashJoin_26(Build)             | 12487.50 | root      |               | inner join, equal:[eq(test.t1.v1, test.t2.v2)]                 |
| │ ├─TableReader_33(Build)        | 9990.00  | root      |               | data:Selection_32                                              |
| │ │ └─Selection_32               | 9990.00  | cop[tikv] |               | not(isnull(test.t2.v2))                                        |
| │ │   └─TableFullScan_31         | 10000.00 | cop[tikv] | table:t2      | keep order:false, stats:pseudo                                 |
| │ └─TableReader_30(Probe)        | 9990.00  | root      |               | data:Selection_29                                              |
| │   └─Selection_29               | 9990.00  | cop[tikv] |               | not(isnull(test.t1.v1))                                        |
| │     └─TableFullScan_28         | 10000.00 | cop[tikv] | table:t1      | keep order:false, stats:pseudo                                 |
| └─HashAgg_16(Probe)              | 7992.00  | root      |               | group by:test.t1.id, funcs:firstrow(test.t1.id)->test.t1.id    |
|   └─HashJoin_17                  | 12487.50 | root      |               | inner join, equal:[eq(test.t1.v1, test.t2.v2)]                 |
|     ├─TableReader_24(Build)      | 9990.00  | root      |               | data:Selection_23                                              |
|     │ └─Selection_23             | 9990.00  | cop[tikv] |               | not(isnull(test.t2.v2))                                        |
|     │   └─TableFullScan_22       | 10000.00 | cop[tikv] | table:t2      | keep order:false, stats:pseudo                                 |
|     └─TableReader_21(Probe)      | 9990.00  | root      |               | data:Selection_20                                              |
|       └─Selection_20             | 9990.00  | cop[tikv] |               | not(isnull(test.t1.v1))                                        |
|         └─TableFullScan_19       | 10000.00 | cop[tikv] | table:t1      | keep order:false, stats:pseudo                                 |
+----------------------------------+----------+-----------+---------------+----------------------------------------------------------------+

The direct cause seems in
https://github.com/pingcap/tidb/tree/c091dba89718599dc1b3d3e45f9b0308a8d49ef0/pkg/planner/core/operator/logicalop/logical_join.go#L200

For inner/semi join, it will collect all join's condition together with the predicate, and call LogicalJoin::extractOnCondition to re-extract the condition, and in
https://github.com/pingcap/tidb/tree/c091dba89718599dc1b3d3e45f9b0308a8d49ef0/pkg/planner/core/operator/logicalop/logical_join.go#L1214
only eq is treated as equal condition:
https://github.com/pingcap/tidb/tree/c091dba89718599dc1b3d3e45f9b0308a8d49ef0/pkg/planner/core/operator/logicalop/logical_join.go#L1257-L1261

1. Minimal reproduce step (Required)

2. What did you expect to see? (Required)

3. What did you see instead (Required)

4. What is your TiDB version? (Required)

@windtalker windtalker added type/bug The issue is confirmed as a bug. sig/planner SIG: Planner labels Nov 21, 2024
@hawkingrei hawkingrei self-assigned this Nov 21, 2024
@hawkingrei hawkingrei added the affects-8.5 This bug affects the 8.5.x(LTS) versions. label Dec 16, 2024
@hawkingrei hawkingrei added affects-8.1 This bug affects the 8.1.x(LTS) versions. may-affects-8.5 labels Dec 16, 2024
@hawkingrei hawkingrei added affects-7.5 This bug affects the 7.5.x(LTS) versions. and removed type/bug The issue is confirmed as a bug. sig/planner SIG: Planner severity/moderate may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-6.5 may-affects-7.1 affects-8.1 This bug affects the 8.1.x(LTS) versions. may-affects-8.5 affects-8.5 This bug affects the 8.5.x(LTS) versions. labels Dec 16, 2024
@hawkingrei hawkingrei added type/bug The issue is confirmed as a bug. sig/planner SIG: Planner affects-8.1 This bug affects the 8.1.x(LTS) versions. affects-8.5 This bug affects the 8.5.x(LTS) versions. severity/moderate affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. and removed may-affects-7.5 labels Dec 16, 2024
@ti-chi-bot ti-chi-bot bot closed this as completed in f744959 Dec 16, 2024
ti-chi-bot bot pushed a commit that referenced this issue Jan 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. affects-8.5 This bug affects the 8.5.x(LTS) versions. severity/moderate sig/planner SIG: Planner type/bug The issue is confirmed as a bug.
Projects
None yet
4 participants