[feature](mtmv) pick some mtmv pr from master #37651
Merged
morrySnow merged 6 commits into apache:branch-2.1 from seawinde:suport_use_mv_dimension_when_query_distinct on Jul 12, 2024
…e function is distinct (apache#36318)

## Proposed changes

This extends query rewrite by materialized view. For example, the MV definition is:

> CREATE MATERIALIZED VIEW mv1
> BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
> DISTRIBUTED BY RANDOM BUCKETS 2
> PROPERTIES ('replication_num' = '1')
> AS
> select
>   count(o_totalprice),
>   o_shippriority,
>   o_orderstatus,
>   bin(o_orderkey)
> from orders
> group by
>   o_orderstatus,
>   o_shippriority,
>   bin(o_orderkey);

The following query can be rewritten by the materialized view even though `sum(distinct o_shippriority)` does not appear in the MV output. Because the query's aggregate function is distinct and its argument is a group-by dimension of the MV, `sum(distinct o_shippriority)` can be computed directly from the MV group-by column `o_shippriority`, and the result is correct.

> select
>   count(o_totalprice),
>   max(distinct o_shippriority),
>   min(distinct o_shippriority),
>   avg(distinct o_shippriority),
>   sum(distinct o_shippriority) / count(distinct o_shippriority),
>   o_orderstatus,
>   bin(o_orderkey)
> from orders
> group by
>   o_orderstatus,
>   bin(o_orderkey);

The following distinct aggregate functions are currently supported; others will be supported in the future on demand:
- max(distinct arg)
- min(distinct arg)
- sum(distinct arg)
- avg(distinct arg)
- count(distinct arg)
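To see why the distinct values can be taken from the MV's group-by column, the sketch below emulates mv1 as a plain SQLite table and compares the query on the base table with a hand-rewritten query on the MV. It is an illustration under stated assumptions (toy rows, SQLite instead of Doris, a `bin()` helper registered with `create_function`, and `sum(distinct ...)` and `count(distinct ...)` kept as separate columns), not the Doris rewrite rule itself.

```python
import sqlite3

# Toy setup: SQLite stands in for Doris and "mv1" is an ordinary table
# filled with the result of the MV definition above.
conn = sqlite3.connect(":memory:")
# SQLite has no bin(); register one returning the binary string of an integer.
conn.create_function("bin", 1, lambda x: format(x, "b"))

conn.execute("CREATE TABLE orders (o_orderkey INT, o_orderstatus TEXT, "
             "o_shippriority INT, o_totalprice REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "o", 1, 10.0), (1, "o", 2, 20.0), (1, "o", 2, 30.0),
     (2, "f", 3, 40.0), (2, "f", 3, 50.0), (3, "o", 1, None)],
)

# mv1 groups by o_orderstatus, o_shippriority, bin(o_orderkey).
conn.execute("""
CREATE TABLE mv1 AS
SELECT count(o_totalprice) AS cnt, o_shippriority, o_orderstatus,
       bin(o_orderkey) AS bin_key
FROM orders
GROUP BY o_orderstatus, o_shippriority, bin(o_orderkey)
""")

QUERY_ON_BASE = """
SELECT count(o_totalprice), max(DISTINCT o_shippriority), min(DISTINCT o_shippriority),
       avg(DISTINCT o_shippriority), sum(DISTINCT o_shippriority),
       count(DISTINCT o_shippriority), o_orderstatus, bin(o_orderkey)
FROM orders
GROUP BY o_orderstatus, bin(o_orderkey)
ORDER BY o_orderstatus, bin(o_orderkey)
"""

# Hand-rewritten query: count() rolls up as sum(cnt); the DISTINCT aggregates
# read mv1's o_shippriority group-by column, whose distinct values within each
# coarser group are exactly the distinct values in the base table.
QUERY_ON_MV = """
SELECT sum(cnt), max(DISTINCT o_shippriority), min(DISTINCT o_shippriority),
       avg(DISTINCT o_shippriority), sum(DISTINCT o_shippriority),
       count(DISTINCT o_shippriority), o_orderstatus, bin_key
FROM mv1
GROUP BY o_orderstatus, bin_key
ORDER BY o_orderstatus, bin_key
"""

base_rows = conn.execute(QUERY_ON_BASE).fetchall()
mv_rows = conn.execute(QUERY_ON_MV).fetchall()
print(base_rows == mv_rows)  # prints True
```

Non-distinct `sum` or `count` over `o_shippriority` could not be answered this way, because the MV collapses duplicate values of its group-by column into one row per group.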
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
run buildall
…async mv (apache#36111)

Support using current_date() when creating an async materialized view by adding 'enable_nondeterministic_function' = 'true' to the properties. `enable_nondeterministic_function` defaults to false.

Here is an example that will succeed:

> CREATE MATERIALIZED VIEW mv_name
> BUILD DEFERRED REFRESH AUTO ON MANUAL
> DISTRIBUTED BY RANDOM BUCKETS 2
> PROPERTIES (
>   'replication_num' = '1',
>   'enable_nondeterministic_function' = 'true'
> )
> AS
> SELECT *, unix_timestamp(k3, '%Y-%m-%d %H:%i-%s') from ${tableName} where current_date() > k3;

Note: unix_timestamp is nondeterministic when called without arguments. It is deterministic when called with arguments, as here, where it converts the date column k3 using the given format.

Another example that will succeed:

> CREATE MATERIALIZED VIEW mv_name
> BUILD DEFERRED REFRESH AUTO ON MANUAL
> DISTRIBUTED BY RANDOM BUCKETS 2
> PROPERTIES (
>   'replication_num' = '1',
>   'enable_nondeterministic_function' = 'true'
> )
> AS
> SELECT *, unix_timestamp() from ${tableName} where current_date() > k3;

Although unix_timestamp() is nondeterministic here, creation succeeds because 'enable_nondeterministic_function' = 'true' is set in the properties.
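The rule above can be sketched as a validation step. Everything here is hypothetical (`FunctionCall`, `is_nondeterministic`, `check_mv_definition`, and the function classification, which covers only `current_date` and `unix_timestamp` from the examples); it is not how Doris implements the check.

```python
from dataclasses import dataclass, field

# Hypothetical classification: only the functions named in the commit are covered.
ALWAYS_NONDETERMINISTIC = {"current_date"}


@dataclass
class FunctionCall:
    name: str
    args: list = field(default_factory=list)


def is_nondeterministic(call: FunctionCall) -> bool:
    if call.name in ALWAYS_NONDETERMINISTIC:
        return True
    # unix_timestamp() reads the current time; with arguments it only converts its input.
    if call.name == "unix_timestamp":
        return len(call.args) == 0
    return False


def check_mv_definition(calls, properties) -> bool:
    # Property values are strings in the PROPERTIES clause; the default is false.
    allowed = properties.get("enable_nondeterministic_function", "false").lower() == "true"
    offending = [c.name for c in calls if is_nondeterministic(c)]
    if offending and not allowed:
        raise ValueError(f"nondeterministic functions not allowed: {offending}")
    return True


# Second example from the commit: unix_timestamp() and current_date() with the property set.
calls = [FunctionCall("unix_timestamp"), FunctionCall("current_date")]
print(check_mv_definition(calls, {"replication_num": "1",
                                  "enable_nondeterministic_function": "true"}))  # prints True
```

Without the property, the same `calls` list raises `ValueError`, matching the stated default of false.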
…by (apache#36175)

This was introduced by apache#35562. After that PR, creating a partitioned materialized view as follows would fail with the message "Unable to find a suitable base table for partitioning":

CREATE MATERIALIZED VIEW mvName
BUILD IMMEDIATE REFRESH AUTO ON MANUAL
PARTITION BY (date_trunc(month_alias, 'month'))
DISTRIBUTED BY RANDOM BUCKETS 2
PROPERTIES (
  'replication_num' = '1'
)
AS
SELECT date_trunc(`k2`,'day') AS month_alias, k3, count(*)
FROM tableName
GROUP BY date_trunc(`k2`,'day'), k3;

This PR supports creating a partitioned materialized view when `date_trunc` is in the GROUP BY clause.
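One way to see why `month_alias` can drive month partitioning: truncating a timestamp to the day and then to the month gives the same result as truncating it to the month directly, so each MV row maps to exactly one month of the base column `k2`. The `date_trunc` helper below is a hypothetical Python stand-in that supports only the two units used in the example.

```python
from datetime import datetime


def date_trunc(ts: datetime, unit: str) -> datetime:
    # Hypothetical stand-in for SQL date_trunc, limited to 'day' and 'month'.
    if unit == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "month":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported unit: {unit}")


samples = [datetime(2023, 10, 31, 23, 59), datetime(2023, 11, 1, 0, 0),
           datetime(2023, 11, 21, 8, 30)]
for k2 in samples:
    month_alias = date_trunc(k2, "day")  # the GROUP BY expression
    # Partitioning on date_trunc(month_alias, 'month') puts each row in the same
    # month as truncating the base column k2 directly.
    assert date_trunc(month_alias, "month") == date_trunc(k2, "month")
```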
… rewrite by partition rolled up mv (apache#36414)

This was introduced by apache#35562. When a partitioned MV is rolled up by date_trunc and a new partition is added to the base table, a query rewritten by the partitioned MV would miss the data in the new partition. This PR fixes the problem. For example:

MV definition:

CREATE MATERIALIZED VIEW roll_up_mv
BUILD IMMEDIATE REFRESH AUTO ON MANUAL
partition by (date_trunc(`col1`, 'month'))
DISTRIBUTED BY RANDOM BUCKETS 2
PROPERTIES ('replication_num' = '1')
AS
select date_trunc(`l_shipdate`, 'day') as col1, l_shipdate, o_orderdate, l_partkey, l_suppkey, sum(o_totalprice) as sum_total
from lineitem
left join orders on lineitem.l_orderkey = orders.o_orderkey and l_shipdate = o_orderdate
group by col1, l_shipdate, o_orderdate, l_partkey, l_suppkey;

Run the insert command:

insert into lineitem values (1, 2, 3, 4, 5.5, 6.5, 7.5, 8.5, 'o', 'k', '2023-11-21', '2023-11-21', '2023-11-21', 'a', 'b', 'yyyyyyyyy');

Then run the following query. Before this fix, the result does not include the 2023-11-21 partition data:

select date_trunc(`l_shipdate`, 'day') as col1, l_shipdate, o_orderdate, l_partkey, l_suppkey, sum(o_totalprice) as sum_total
from lineitem
left join orders on lineitem.l_orderkey = orders.o_orderkey and l_shipdate = o_orderdate
group by col1, l_shipdate, o_orderdate, l_partkey, l_suppkey;
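The failure mode can be modeled as a freshness check. In this hypothetical sketch (not Doris code; the per-partition version numbers and all helper names are assumptions), base-table partitions are per day, MV partitions are per month, and a month is treated as stale if any day partition it covers was added, dropped, or changed since the last refresh. Answering a query from a stale rolled-up partition would miss the new rows, which is the bug described above; the actual fix in the PR may work differently.

```python
from datetime import date


def month_of(d: date) -> date:
    # Mirrors date_trunc(..., 'month') for a day-level partition key.
    return d.replace(day=1)


def stale_months(base_now: dict, base_at_refresh: dict) -> set:
    """MV month partitions whose covered day partitions were added, dropped or changed."""
    changed = {p for p in base_now.keys() | base_at_refresh.keys()
               if base_now.get(p) != base_at_refresh.get(p)}
    return {month_of(p) for p in changed}


def can_use_mv(query_months, base_now: dict, base_at_refresh: dict) -> bool:
    # Only months whose MV partition is still fresh may be answered from the MV.
    return not (set(query_months) & stale_months(base_now, base_at_refresh))


# Hypothetical state: {day partition: version} at MV refresh time and now.
at_refresh = {date(2023, 10, 17): 1, date(2023, 11, 20): 1}
now = {**at_refresh, date(2023, 11, 21): 1}  # the insert adds data for 2023-11-21

print(stale_months(now, at_refresh))                     # {datetime.date(2023, 11, 1)}
print(can_use_mv([date(2023, 10, 1)], now, at_refresh))  # True
print(can_use_mv([date(2023, 11, 1)], now, at_refresh))  # False
```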
…stability (apache#36770)

When a query is rewritten by a materialized view using union rewrite, the final plan chosen by the CBO is unstable. So the regression test only checks whether the MV rewrite succeeds, not whether the rewritten plan is chosen by the CBO. Making sure the rewritten plan is chosen by the CBO will be fixed thoroughly in another PR.
…n, because low level mv aggregate roll up (apache#36567)

When the query is an aggregate whose group-by expressions are fewer than the materialized view's group-by expressions, and the extra dimensions in the MV can be eliminated through functional dependencies, the query can be rewritten without a roll-up aggregate. For example:

MV definition:

CREATE MATERIALIZED VIEW mv
BUILD IMMEDIATE REFRESH AUTO ON MANUAL
DISTRIBUTED BY RANDOM BUCKETS 2
PROPERTIES ('replication_num' = '1')
AS
select
  l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, ps_partkey,
  cast(sum(IFNULL(ps_suppkey, 0) * IFNULL(ps_partkey, 0)) as decimal(28, 8)) as agg2
from lineitem_1
inner join orders_1 on lineitem_1.l_orderkey = orders_1.o_orderkey
inner join partsupp_1 on l_partkey = partsupp_1.ps_partkey and l_suppkey = partsupp_1.ps_suppkey
where partsupp_1.ps_suppkey > 1
group by l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey, ps_partkey;

Query:

select
  l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey,
  cast(sum(IFNULL(ps_suppkey, 0) * IFNULL(ps_partkey, 0)) as decimal(28, 8)) as agg2
from lineitem_1
inner join orders_1 on lineitem_1.l_orderkey = orders_1.o_orderkey
inner join partsupp_1 on l_partkey = partsupp_1.ps_partkey and l_suppkey = partsupp_1.ps_suppkey
where partsupp_1.ps_suppkey > 1
group by l_orderkey, l_partkey, l_suppkey, o_orderkey, o_custkey;

The query does not use `ps_partkey`, which is in the MV's group-by expressions. Normally a roll-up aggregate is added on top of the materialized view when the MV has more group-by dimensions than the query. In this case, however, the join condition `l_partkey = ps_partkey` gives a functional dependency: `ps_partkey` is determined by `l_partkey`, which the query still groups by. So no roll-up aggregate is needed on the materialized view in the rewritten plan. This improves performance and benefits nested materialized view rewrite.
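The functional-dependency argument can be checked on toy data with Python's built-in SQLite. The tables are trimmed to the columns that matter (`orders_1`, `o_orderkey` and `o_custkey` are omitted, and the decimal cast is dropped), and the rows are made up; this illustrates why no roll-up is needed, not how Doris performs the rewrite.

```python
import sqlite3

# Toy data; names follow the example above, trimmed to the relevant columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lineitem_1 (l_orderkey INT, l_partkey INT, l_suppkey INT);
CREATE TABLE partsupp_1 (ps_partkey INT, ps_suppkey INT);
INSERT INTO lineitem_1 VALUES (1, 10, 2), (1, 10, 2), (2, 20, 3), (3, 10, 3);
INSERT INTO partsupp_1 VALUES (10, 2), (20, 3), (10, 3);
""")

JOIN = """
FROM lineitem_1 INNER JOIN partsupp_1
  ON l_partkey = ps_partkey AND l_suppkey = ps_suppkey
WHERE ps_suppkey > 1
"""
AGG = "sum(IFNULL(ps_suppkey, 0) * IFNULL(ps_partkey, 0))"

# MV-like result: grouped by l_orderkey, l_partkey, l_suppkey, ps_partkey.
mv = conn.execute(
    f"SELECT l_orderkey, l_partkey, l_suppkey, ps_partkey, {AGG} {JOIN}"
    " GROUP BY l_orderkey, l_partkey, l_suppkey, ps_partkey"
    " ORDER BY l_orderkey, l_partkey, l_suppkey").fetchall()

# Query: same aggregate without ps_partkey in GROUP BY.
query = conn.execute(
    f"SELECT l_orderkey, l_partkey, l_suppkey, {AGG} {JOIN}"
    " GROUP BY l_orderkey, l_partkey, l_suppkey"
    " ORDER BY l_orderkey, l_partkey, l_suppkey").fetchall()

# The join forces ps_partkey = l_partkey, so dropping ps_partkey merges no MV
# groups: the query result is a plain projection of the MV rows.
projected = [(o, p, s, agg) for (o, p, s, _ps_partkey, agg) in mv]
print(projected == query)  # prints True
```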
run buildall
Proposed changes
cherry-pick to 2.1
pr: #36318
commitId: c199947
pr: #36111
commitId: 35ebef6
pr: #36175
commitId: 4c8e66b
pr: #36414
commitId: 5e009b5
pr: #36770
commitId: 19e2126
pr: #36567
commitId: 3da8351