[fix](mtmv) Fix getting related partition table wrongly when multi base partition table exists #34781
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
run buildall
TPC-H: Total hot run time: 41800 ms
TPC-DS: Total hot run time: 187483 ms
a4ad559 to 12d0109
run buildall
12d0109 to 9c1f063
run buildall
TPC-H: Total hot run time: 40779 ms
TPC-DS: Total hot run time: 170445 ms
ClickBench: Total hot run time: 30.74 s
run buildall
run compile
run buildall
…se partition table exists
a2688f8 to a280d9e
run buildall
TPC-H: Total hot run time: 41141 ms
TPC-DS: Total hot run time: 169590 ms
ClickBench: Total hot run time: 30.42 s
...-core/src/main/java/org/apache/doris/nereids/rules/exploration/mv/MaterializedViewUtils.java
Outdated
List<Object> catalogRelationObjs = materializedViewPlan.collectToList(
        planTreeNode -> planTreeNode instanceof CatalogRelation);
ImmutableMultimap.Builder<TableIdentifier, CatalogRelation> tableCatalogRelationMultimapBuilder =
        ImmutableMultimap.builder();
use expectedSize builder
ImmutableMultimap.Builder doesn't seem to have an expectedSize builder
ImmutableMap.builderWithExpectedSize()
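The thread above is about pre-sizing the collection that groups catalog relations by table. As a rough illustration only, here is a minimal sketch of that grouping idea using plain JDK collections rather than Guava, so it is not the PR's actual code. `RelationGrouping`, `RelationRef`, `groupByTable`, and the table names are hypothetical stand-ins for `CatalogRelation` and `TableIdentifier`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RelationGrouping {
    /** Hypothetical stand-in for a scan of a catalog table inside a plan. */
    public static final class RelationRef {
        final String tableName;
        final int relationId;

        public RelationRef(String tableName, int relationId) {
            this.tableName = tableName;
            this.relationId = relationId;
        }
    }

    /** Groups relations by table name, pre-sizing the map from the input size. */
    public static Map<String, List<RelationRef>> groupByTable(List<RelationRef> relations) {
        // HashMap resizes once its size exceeds capacity * 0.75, so request enough
        // capacity up front to hold one key per relation without rehashing.
        int capacity = (int) (relations.size() / 0.75f) + 1;
        Map<String, List<RelationRef>> byTable = new HashMap<>(capacity);
        for (RelationRef relation : relations) {
            byTable.computeIfAbsent(relation.tableName, k -> new ArrayList<>()).add(relation);
        }
        return byTable;
    }

    public static void main(String[] args) {
        List<RelationRef> relations = List.of(
                new RelationRef("test1", 0),
                new RelationRef("test2", 1),
                new RelationRef("test1", 2)); // same table scanned twice, e.g. a self-join
        Map<String, List<RelationRef>> byTable = groupByTable(relations);
        System.out.println("test1 -> " + byTable.get("test1").size() + " relation(s)");
        System.out.println("test2 -> " + byTable.get("test2").size() + " relation(s)");
    }
}
```

A multimap is the natural shape here because one table can appear several times in a plan. The pre-sizing is only an optimization and does not change the result.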
run buildall
TPC-H: Total hot run time: 41188 ms
TPC-DS: Total hot run time: 167213 ms
ClickBench: Total hot run time: 30.56 s
LGTM
PR approved by anyone and no changes requested.
PR approved by at least one committer and no changes requested.
LGTM
…se partition table exists (#34781)

Fix getting the wrong related partition table when multiple partitioned base tables exist.

For example, given the following base table definitions:

    CREATE TABLE `test1` (
      `pre_batch_no` VARCHAR(100) NULL COMMENT 'pre_batch_no',
      `batch_no` VARCHAR(100) NULL COMMENT 'batch_no',
      `vin_type1` VARCHAR(50) NULL COMMENT 'vin',
      `upgrade_day` date COMMENT 'upgrade_day'
    ) ENGINE=OLAP
    unique KEY(`pre_batch_no`,`batch_no`, `vin_type1`, `upgrade_day`)
    COMMENT 'OLAP'
    PARTITION BY RANGE(`upgrade_day`) (
      FROM ("2024-03-20") TO ("2024-03-31") INTERVAL 1 DAY
    )
    DISTRIBUTED BY HASH(`vin_type1`) BUCKETS 10
    PROPERTIES ( "replication_num" = "1" );

    CREATE TABLE `test2` (
      `batch_no` VARCHAR(100) NULL COMMENT 'batch_no',
      `vin_type2` VARCHAR(50) NULL COMMENT 'vin',
      `status` VARCHAR(50) COMMENT 'status',
      `upgrade_day` date not null COMMENT 'upgrade_day'
    ) ENGINE=OLAP
    Duplicate KEY(`batch_no`,`vin_type2`)
    COMMENT 'OLAP'
    PARTITION BY RANGE(`upgrade_day`) (
      FROM ("2024-01-01") TO ("2024-01-10") INTERVAL 1 DAY
    )
    DISTRIBUTED BY HASH(`vin_type2`) BUCKETS 10
    PROPERTIES ( "replication_num" = "1" );

If you create a partitioned materialized view that is partitioned by `t1.upgrade_day` using the following query, creation will succeed:

    select t1.upgrade_day, t1.batch_no, t1.vin_type1
    from (
      SELECT batch_no, vin_type1, upgrade_day
      FROM test1
      where batch_no like 'c%'
      group by batch_no, vin_type1, upgrade_day
    ) t1
    left join (
      select batch_no, vin_type2, status
      from test2
      group by batch_no, vin_type2, status
    ) t2 on t1.vin_type1 = t2.vin_type2;
… optimize the fail reason (#35562)

This depends on #34781.

1. Materialized view partition tracking supports date_trunc, and the failure reason is improved.
2. It supports creating a partitioned materialized view as follows; this materialized view will be refreshed partition by partition, by day:

    CREATE MATERIALIZED VIEW mv_6
    BUILD IMMEDIATE REFRESH AUTO ON MANUAL
    partition by(date_trunc(date_alias, 'day'))
    DISTRIBUTED BY RANDOM BUCKETS 2
    PROPERTIES ('replication_num' = '1')
    AS
    SELECT
      date_trunc(t1.L_SHIPDATE, 'hour') as date_alias,
      t2.O_ORDERDATE,
      t1.L_QUANTITY,
      t2.O_ORDERSTATUS,
      count(distinct case when t1.L_SUPPKEY > 0 then t2.O_ORDERSTATUS else null end) as cnt_1
    from (select * from lineitem where L_SHIPDATE in ('2017-01-30')) t1
    left join (select * from orders where O_ORDERDATE in ('2017-01-30')) t2
      on t1.L_ORDERKEY = t2.O_ORDERKEY
    group by t1.L_SHIPDATE, t2.O_ORDERDATE, t1.L_QUANTITY, t2.O_ORDERSTATUS;
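The materialized view in this commit message truncates the ship date to the hour in the SELECT list and then to the day in the partition clause. A small sketch of why nesting the two truncations still yields day-level partitions, using `java.time` as a stand-in for `date_trunc`; this is an illustration under that assumption, not Doris's implementation, and `DateTruncRollup`, `truncHour`, and `truncDay` are hypothetical names.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

public class DateTruncRollup {
    /** Truncates to the hour, analogous to date_trunc(ts, 'hour') in the SELECT list. */
    public static LocalDateTime truncHour(LocalDateTime ts) {
        return ts.truncatedTo(ChronoUnit.HOURS);
    }

    /** Truncates to the day, analogous to date_trunc(ts, 'day') in the partition clause. */
    public static LocalDateTime truncDay(LocalDateTime ts) {
        return ts.truncatedTo(ChronoUnit.DAYS);
    }

    public static void main(String[] args) {
        LocalDateTime shipTime = LocalDateTime.of(2017, 1, 30, 13, 45, 12);
        // Day-truncating the hour-truncated alias gives the same value as
        // day-truncating the original timestamp directly.
        LocalDateTime viaAlias = truncDay(truncHour(shipTime));
        LocalDateTime direct = truncDay(shipTime);
        System.out.println(viaAlias);                // prints 2017-01-30T00:00
        System.out.println(viaAlias.equals(direct)); // prints true
    }
}
```

Because the coarser unit absorbs the finer one, each row of the base table falls into exactly one day bucket regardless of the intermediate hour alias.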
…elated partition side (#43531)

### What problem does this PR solve?

Related PR: #34781

Problem Summary:

With the table definitions below, creating the following partitioned materialized view throws an exception:

    ERROR 1105 (HY000): errCode = 2, detailMessage = Unable to find a suitable base table for partitioning, the fail reason is can't not find valid partition track column

    CREATE MATERIALIZED VIEW mv_10086
    BUILD IMMEDIATE REFRESH AUTO ON MANUAL
    partition by(l_orderkey)
    DISTRIBUTED BY RANDOM BUCKETS 2
    PROPERTIES ('replication_num' = '1')
    AS
    select l1.*, O_CUSTKEY
    from lineitem_list_partition l1
    left outer join orders_list_partition
      on l1.l_shipdate = o_orderdate;

Table definitions:

    CREATE TABLE `orders_list_partition` (
      `o_orderkey` BIGINT not NULL,
      `o_custkey` INT NULL,
      `o_orderstatus` VARCHAR(1) NULL,
      `o_totalprice` DECIMAL(15, 2) NULL,
      `o_orderpriority` VARCHAR(15) NULL,
      `o_clerk` VARCHAR(15) NULL,
      `o_shippriority` INT NULL,
      `o_comment` VARCHAR(79) NULL,
      `o_orderdate` DATE NULL
    ) ENGINE=OLAP
    DUPLICATE KEY(`o_orderkey`, `o_custkey`)
    COMMENT 'OLAP'
    PARTITION BY list(o_orderkey) (
      PARTITION p1 VALUES in ('1'),
      PARTITION p2 VALUES in ('2'),
      PARTITION p3 VALUES in ('3'),
      PARTITION p4 VALUES in ('4')
    )
    DISTRIBUTED BY HASH(`o_orderkey`) BUCKETS 3
    PROPERTIES ( "replication_num" = "1" );

    CREATE TABLE `lineitem_list_partition` (
      `l_orderkey` BIGINT not NULL,
      `l_linenumber` INT NULL,
      `l_partkey` INT NULL,
      `l_suppkey` INT NULL,
      `l_quantity` DECIMAL(15, 2) NULL,
      `l_extendedprice` DECIMAL(15, 2) NULL,
      `l_discount` DECIMAL(15, 2) NULL,
      `l_tax` DECIMAL(15, 2) NULL,
      `l_returnflag` VARCHAR(1) NULL,
      `l_linestatus` VARCHAR(1) NULL,
      `l_commitdate` DATE NULL,
      `l_receiptdate` DATE NULL,
      `l_shipinstruct` VARCHAR(25) NULL,
      `l_shipmode` VARCHAR(10) NULL,
      `l_comment` VARCHAR(44) NULL,
      `l_shipdate` DATE NULL
    ) ENGINE=OLAP
    DUPLICATE KEY(l_orderkey, l_linenumber, l_partkey, l_suppkey )
    COMMENT 'OLAP'
    PARTITION BY list(l_orderkey) (
      PARTITION p1 VALUES in ('1'),
      PARTITION p2 VALUES in ('2'),
      PARTITION p3 VALUES in ('3')
    )
    DISTRIBUTED BY HASH(`l_orderkey`) BUCKETS 3
    PROPERTIES ("replication_num" = "1" );

### Release note

Fix partition track column failure when 'select *' is used on the related partition side.
Proposed changes
Fix getting the wrong related partition table when multiple partitioned base tables exist.
For example, with two partitioned base tables, `test1` and `test2`, each partitioned by RANGE(`upgrade_day`), creating a partitioned materialized view that is partitioned by
t1.upgrade_day
will succeed.
Further comments
If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...