[feature](mtmv) pick some mtmv pr from master #37651

seawinde · 2024-07-11T06:49:00Z

Proposed changes

cherry-pick to 2.1
pr: #36318
commitId: c199947

pr: #36111
commitId: 35ebef6

pr: #36175
commitId: 4c8e66b

pr: #36414
commitId: 5e009b5

pr: #36770
commitId: 19e2126

pr: #36567
commitId: 3da8351

…e function is distinct (apache#36318)

## Proposed changes

This extends the ability to rewrite queries by materialized view. For example, given the mv definition

> CREATE MATERIALIZED VIEW mv1
> BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
> DISTRIBUTED BY RANDOM BUCKETS 2
> PROPERTIES ('replication_num' = '1')
> AS
> select
> count(o_totalprice),
> o_shippriority,
> o_orderstatus,
> bin(o_orderkey)
> from orders
> group by
> o_orderstatus,
> o_shippriority,
> bin(o_orderkey);

the query at the end of this description can be rewritten by the materialized view successfully. Although `sum(distinct o_shippriority)` in the query does not appear in the mv output, the query aggregate function is distinct and its argument is a group by dimension of the mv. In this scenario, `sum(distinct o_shippriority)` can use the mv group dimension `o_shippriority` directly and the result is correct.

The following distinct aggregate functions are supported currently; others will be supported in the future on demand:

- max(distinct arg)
- min(distinct arg)
- sum(distinct arg)
- avg(distinct arg)
- count(distinct arg)

> select
> count(o_totalprice),
> max(distinct o_shippriority),
> min(distinct o_shippriority),
> avg(distinct o_shippriority),
> sum(distinct o_shippriority) / count(distinct o_shippriority),
> o_orderstatus,
> bin(o_orderkey)
> from orders
> group by
> o_orderstatus,
> bin(o_orderkey);
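To illustrate why a distinct aggregate can be answered from an mv group dimension, here is a minimal sketch using Python's built-in `sqlite3` rather than Doris. The sample rows, the `cnt` alias and the plain table standing in for the mv are assumptions made for the example, and `bin(o_orderkey)` is left out because SQLite has no such function. It does not show how Doris performs the rewrite internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    o_orderkey INTEGER, o_orderstatus TEXT,
    o_shippriority INTEGER, o_totalprice REAL);
-- Hypothetical sample rows, including a NULL dimension value.
INSERT INTO orders VALUES
    (1, 'o', 1, 10.0), (2, 'o', 1, 20.0), (3, 'o', 2, 30.0),
    (4, 'k', 3, 40.0), (5, 'k', 3, 50.0), (6, 'k', NULL, 60.0);
-- A plain table standing in for the materialized view.
CREATE TABLE mv1 AS
SELECT count(o_totalprice) AS cnt, o_shippriority, o_orderstatus
FROM orders
GROUP BY o_orderstatus, o_shippriority;
""")

# The query evaluated directly on the base table.
direct = conn.execute("""
SELECT o_orderstatus,
       count(o_totalprice),
       max(DISTINCT o_shippriority),
       min(DISTINCT o_shippriority),
       avg(DISTINCT o_shippriority),
       sum(DISTINCT o_shippriority),
       count(DISTINCT o_shippriority)
FROM orders
GROUP BY o_orderstatus
ORDER BY o_orderstatus
""").fetchall()

# The same query answered from the mv. Each (o_orderstatus, o_shippriority)
# pair appears once in mv1, so the DISTINCT keyword is no longer needed;
# count(o_totalprice) is rolled up by summing the per-group counts.
rewritten = conn.execute("""
SELECT o_orderstatus,
       sum(cnt),
       max(o_shippriority),
       min(o_shippriority),
       avg(o_shippriority),
       sum(o_shippriority),
       count(o_shippriority)
FROM mv1
GROUP BY o_orderstatus
ORDER BY o_orderstatus
""").fetchall()

print(direct == rewritten)
```

The key point is that the mv already holds one row per distinct value of the dimension within each remaining group, which is exactly the input a distinct aggregate needs.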

doris-robot · 2024-07-11T06:49:05Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

seawinde · 2024-07-11T06:49:52Z

run buildall

…async mv (apache#36111)

Support using current_date() when creating an async materialized view by adding 'enable_nondeterministic_function' = 'true' to the properties of the materialized view. `enable_nondeterministic_function` is false by default.

Here is an example; it will succeed:

> CREATE MATERIALIZED VIEW mv_name
> BUILD DEFERRED REFRESH AUTO ON MANUAL
> DISTRIBUTED BY RANDOM BUCKETS 2
> PROPERTIES (
> 'replication_num' = '1',
> 'enable_nondeterministic_function' = 'true'
> )
> AS
> SELECT *, unix_timestamp(k3, '%Y-%m-%d %H:%i-%s') from ${tableName} where current_date() > k3;

Note: unix_timestamp is nondeterministic when called without parameters. It is deterministic when called with parameters, as here, where it formats the date column k3.

Another example; it will also succeed:

> CREATE MATERIALIZED VIEW mv_name
> BUILD DEFERRED REFRESH AUTO ON MANUAL
> DISTRIBUTED BY RANDOM BUCKETS 2
> PROPERTIES (
> 'replication_num' = '1',
> 'enable_nondeterministic_function' = 'true'
> )
> AS
> SELECT *, unix_timestamp() from ${tableName} where current_date() > k3;

Although unix_timestamp() is nondeterministic, the statement succeeds because 'enable_nondeterministic_function' = 'true' is set in the properties.
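The rule described above can be sketched as a small check. This is a toy model, not Doris's implementation: the function lists, the regular expression and the helper names `nondeterministic_calls` and `can_create_mv` are hypothetical, and the sketch assumes that creation is rejected when the property is left at its default of false.

```python
import re

# Hypothetical lists for illustration only; not Doris's function registry.
ALWAYS_NONDETERMINISTIC = {"current_date", "now"}
NONDETERMINISTIC_WITHOUT_ARGS = {"unix_timestamp"}

# Matches "name(" and captures an immediately following ")" if the call has no arguments.
CALL = re.compile(r"\b([a-z_]+)\s*\(\s*(\))?", re.IGNORECASE)


def nondeterministic_calls(sql):
    """Return the names of nondeterministic function calls found in the SQL text."""
    found = []
    for match in CALL.finditer(sql):
        name = match.group(1).lower()
        has_no_args = match.group(2) is not None
        if name in ALWAYS_NONDETERMINISTIC:
            found.append(name)
        elif name in NONDETERMINISTIC_WITHOUT_ARGS and has_no_args:
            found.append(name)
    return found


def can_create_mv(sql, properties):
    """Allow nondeterministic calls only when the property is explicitly 'true'."""
    enabled = properties.get("enable_nondeterministic_function", "false") == "true"
    return enabled or not nondeterministic_calls(sql)


with_format = "SELECT *, unix_timestamp(k3, '%Y-%m-%d %H:%i-%s') FROM t WHERE current_date() > k3"
without_args = "SELECT *, unix_timestamp() FROM t WHERE current_date() > k3"
opt_in = {"enable_nondeterministic_function": "true"}

print(nondeterministic_calls(with_format))
print(can_create_mv(without_args, opt_in))
```

A real implementation would inspect the analyzed plan instead of matching text, but the distinction between `unix_timestamp()` and `unix_timestamp(k3, ...)` is the same.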

…by (apache#36175)

This issue was introduced by apache#35562. After that pr, creating a partitioned materialized view like the following fails with the message: Unable to find a suitable base table for partitioning

> CREATE MATERIALIZED VIEW mvName
> BUILD IMMEDIATE REFRESH AUTO ON MANUAL
> PARTITION BY (date_trunc(month_alias, 'month'))
> DISTRIBUTED BY RANDOM BUCKETS 2
> PROPERTIES (
> 'replication_num' = '1'
> )
> AS
> SELECT date_trunc(`k2`,'day') AS month_alias, k3, count(*)
> FROM tableName
> GROUP BY date_trunc(`k2`,'day'), k3;

This pr supports creating a partitioned materialized view when `date_trunc` is used in the group by clause.

… rewrite by partition rolled up mv (apache#36414)

This issue was introduced by apache#35562. When the mv is a partition rolled up mv (rolled up by date_trunc) and the base table adds a new partition, a query that is rewritten successfully by the partition mv loses the data of the new partition. This pr fixes the problem.

For example, the mv definition is:

> CREATE MATERIALIZED VIEW roll_up_mv
> BUILD IMMEDIATE REFRESH AUTO ON MANUAL
> partition by (date_trunc(`col1`, 'month'))
> DISTRIBUTED BY RANDOM BUCKETS 2
> PROPERTIES ('replication_num' = '1')
> AS
> select
> date_trunc(`l_shipdate`, 'day') as col1,
> l_shipdate,
> o_orderdate,
> l_partkey,
> l_suppkey,
> sum(o_totalprice) as sum_total
> from lineitem
> left join orders on lineitem.l_orderkey = orders.o_orderkey and l_shipdate = o_orderdate
> group by
> col1,
> l_shipdate,
> o_orderdate,
> l_partkey,
> l_suppkey;

If we run the insert command

> insert into lineitem values
> (1, 2, 3, 4, 5.5, 6.5, 7.5, 8.5, 'o', 'k', '2023-11-21', '2023-11-21', '2023-11-21', 'a', 'b', 'yyyyyyyyy');

and then run the following query, the result does not include the data of the 2023-11-21 partition:

> select
> date_trunc(`l_shipdate`, 'day') as col1,
> l_shipdate,
> o_orderdate,
> l_partkey,
> l_suppkey,
> sum(o_totalprice) as sum_total
> from lineitem
> left join orders on lineitem.l_orderkey = orders.o_orderkey and l_shipdate = o_orderdate
> group by
> col1,
> l_shipdate,
> o_orderdate,
> l_partkey,
> l_suppkey;
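One way to picture the problem is to track which base table day partitions each monthly mv partition has absorbed. The sketch below is a simplified model under stated assumptions, not the logic of the fix itself: the helper names are hypothetical, the October dates are invented sample partitions, and the policy of reading a whole stale month from the base table is an assumption chosen for clarity.

```python
from datetime import date


def month_of(day):
    """Truncate a date to the first day of its month, like date_trunc(d, 'month')."""
    return day.replace(day=1)


def plan_partition_sources(base_days, days_in_mv):
    """Split the data into month partitions readable from the mv and
    base table day partitions that must still be read directly.

    base_days:  day partitions that currently exist in the base table
    days_in_mv: day partitions the mv had absorbed at its last refresh
    """
    base_days = set(base_days)
    days_in_mv = set(days_in_mv)
    # A month is stale if the base table holds a day in it that the mv has not seen.
    stale_months = {month_of(d) for d in base_days - days_in_mv}
    from_mv = sorted({month_of(d) for d in days_in_mv} - stale_months)
    from_base = sorted(d for d in base_days if month_of(d) in stale_months)
    return from_mv, from_base


refreshed = [date(2023, 10, 17), date(2023, 10, 18)]   # hypothetical existing partitions
after_insert = refreshed + [date(2023, 11, 21)]        # the new partition from the example

from_mv, from_base = plan_partition_sources(after_insert, refreshed)
print(from_mv, from_base)
```

In this model the 2023-11-21 partition ends up in `from_base`, so a rewritten plan that reads only the mv would miss it, which matches the symptom described above.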

…stability (apache#36770)

When a query is union rewritten by materialized view, the final plan chosen by CBO is unstable. So the regression test only checks whether the mv rewrite succeeds, and does not check whether the mv is chosen by CBO. Making sure the mv is chosen by CBO will be handled in another pr that fixes this thoroughly.

…n, because low level mv aggregate roll up (apache#36567)

When the query is an aggregate and its group by expressions are fewer than the materialized view's group by expressions, the query can be rewritten without a roll up aggregate if the extra dimensions in the materialized view can be eliminated with functional dependencies.

For example, the mv definition is:

> CREATE MATERIALIZED VIEW mv
> BUILD IMMEDIATE REFRESH AUTO ON MANUAL
> DISTRIBUTED BY RANDOM BUCKETS 2
> PROPERTIES ('replication_num' = '1')
> AS
> select
> l_orderkey,
> l_partkey,
> l_suppkey,
> o_orderkey,
> o_custkey,
> ps_partkey,
> cast(
> sum(
> IFNULL(ps_suppkey, 0) * IFNULL(ps_partkey, 0)
> ) as decimal(28, 8)
> ) as agg2
> from lineitem_1
> inner join orders_1 on lineitem_1.l_orderkey = orders_1.o_orderkey
> inner join partsupp_1 on l_partkey = partsupp_1.ps_partkey
> and l_suppkey = partsupp_1.ps_suppkey
> where partsupp_1.ps_suppkey > 1
> group by
> l_orderkey,
> l_partkey,
> l_suppkey,
> o_orderkey,
> o_custkey,
> ps_partkey;

The query is:

> select
> l_orderkey,
> l_partkey,
> l_suppkey,
> o_orderkey,
> o_custkey,
> cast(
> sum(
> IFNULL(ps_suppkey, 0) * IFNULL(ps_partkey, 0)
> ) as decimal(28, 8)
> ) as agg2
> from lineitem_1
> inner join orders_1 on lineitem_1.l_orderkey = orders_1.o_orderkey
> inner join partsupp_1 on l_partkey = partsupp_1.ps_partkey
> and l_suppkey = partsupp_1.ps_suppkey
> where partsupp_1.ps_suppkey > 1
> group by
> l_orderkey,
> l_partkey,
> l_suppkey,
> o_orderkey,
> o_custkey;

We can see that the query does not group by `ps_partkey`, which is in the mv group by expressions. Normally a roll up aggregate is added on top of the materialized view if the mv has more group by dimensions than the query. In this case, however, we can derive functional dependencies from the join conditions (`l_partkey = ps_partkey` and `l_suppkey = ps_suppkey`): `ps_partkey` is determined by `l_partkey`, which the query already groups by. So we do not need to add a roll up aggregate on the materialized view in the rewritten plan. This improves performance and is beneficial for nested materialized view rewrite.
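The decision described above can be sketched with equivalence classes built from the inner join equalities. This is a simplified illustration, not the Doris optimizer code: the function name `needs_rollup` is hypothetical, and only column equalities are treated as functional dependencies.

```python
def needs_rollup(mv_group_by, query_group_by, equal_pairs):
    """Return True if some extra mv dimension is not determined by the query's
    group by columns, so a roll up aggregate would still be required.

    equal_pairs: column equalities taken from inner join conditions.
    """
    parent = {}

    def find(col):
        # Union-find lookup with path halving.
        parent.setdefault(col, col)
        while parent[col] != col:
            parent[col] = parent[parent[col]]
            col = parent[col]
        return col

    for left, right in equal_pairs:
        parent[find(left)] = find(right)

    query_classes = {find(col) for col in query_group_by}
    extra_dims = [col for col in mv_group_by if col not in query_group_by]
    # An extra dimension is harmless if it is equal to a column the query groups by.
    return any(find(col) not in query_classes for col in extra_dims)


mv_dims = ["l_orderkey", "l_partkey", "l_suppkey", "o_orderkey", "o_custkey", "ps_partkey"]
query_dims = ["l_orderkey", "l_partkey", "l_suppkey", "o_orderkey", "o_custkey"]
join_equalities = [
    ("l_orderkey", "o_orderkey"),
    ("l_partkey", "ps_partkey"),
    ("l_suppkey", "ps_suppkey"),
]

print(needs_rollup(mv_dims, query_dims, join_equalities))
```

With the join equalities from the example, `ps_partkey` falls in the same class as `l_partkey`, so no roll up is needed. Dropping that equality makes the function report that a roll up is required.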

seawinde · 2024-07-11T07:07:54Z

run buildall

seawinde added 5 commits July 11, 2024 15:00

morrySnow changed the title ~~[feature](mtmv) Support to use mv group dimension when query aggregate function is distinct (#36318)~~ [feature](mtmv) pick some mtmv pr from master Jul 12, 2024

morrySnow merged commit ffa9e49 into apache:branch-2.1 Jul 12, 2024
22 of 23 checks passed

yiguolei mentioned this pull request Jul 19, 2024

2.1.5 Release Notes #38111

