Answer Query Using Materialized Views #298
FYI: This PR is based on Incremental Materialized Views (IMV) and should wait until #280 is merged first, and a FIXME:
Datalog program:
Q(T,Y,D) :- Movie(I,T,Y,G), Y ≥ 1950, G = "comedy", Director(I,D), Actor(I,D)
V2(I,T,Y) :- Movie(I,T,Y,G), Y ≥ 1950, G = "comedy"
V3(I,D) :- Director(I,D), Actor(I,D)
V2 and V3 are useful for answering Q.
We had an offline discussion: this is not related to MVP0 and may be considered in the future.
Answer Query Using Materialized Views (AQUMV for short) is used to compute part or all of a query from materialized views during planning. It could provide massive improvements in query processing time, especially for aggregation queries over large tables [1].
AQUMV usually uses Incremental Materialized Views (IMV) as candidates, as IMVs usually have real-time data when there are write operations on the related tables.
AQUMV is actually an equivalent transformation on the Query tree.
A materialized view (MV) could be used to compute a query if:
1. The view contains all rows needed by the query expression. If the MV has more rows than the query wants, an additional filter may be added if possible.
2. All output expressions can be computed from the output of the view. The output expressions could be fully or partially matched from the MV's target list.
3. Cost-based. There may be multiple valid MV candidates, or selecting from the MV may be no better than selecting from the origin table (e.g. when it has an index), so let the planner decide the best one.
Construct rows by splitting the MV query quals (mv_query_quals) and the query quals (origin_query_quals) into a difference set and an intersection set. The post_quals, formed by {origin_query_quals - mv_query_quals}, are processed against the MV query's target list and rewritten to the MV relation's target list expressions.
Construct column expressions using a greedy algorithm. Sort the MV query's target list by complexity, and try to rewrite expressions in that order. Expressions that have no Vars are kept as-is at the upper level (Const expressions) or are rewritten if there are corresponding expressions.
Reference:
[1] Optimizing Queries Using Materialized Views: A Practical, Scalable Solution. https://courses.cs.washington.edu/courses/cse591d/01sp/opt_views.pdf
Authored-by: Zhang Mingli [email protected]
LGTM
Answer Query Using Materialized Views (AQUMV for short) is used to compute part or all of a query from materialized views during planning.
It could provide massive improvements in query processing time, especially for aggregation queries over large tables [1].
AQUMV usually uses Incremental Materialized Views (IMV) as candidates, as IMVs usually have up-to-date data when there are write operations on the related tables.
Example:
Answer Query Using Materialized Views:
This example shows what AQUMV can do.
The MV mvt1 has the quals c1 > 30 and c1 < 40 on table aqumv_t1, and the rows we want to query satisfy c1 > 30 and c1 < 40 and sqrt(abs(c2)) > 5.8. All rows in mvt1 already meet the requirement c1 > 30 and c1 < 40, and mvt1 has a column mc3 corresponding to abs(c2) in aqumv_t1, so adding the qual sqrt(mc3) > 5.8 to mvt1 filters exactly the rows we want.
The target list we want is sqrt(abs(abs(c2) - c1 - 1) + abs(c2)). Since mvt1 has the columns abs(abs(c2) - c1 - 1) AS mc4 and abs(c2) AS mc3, we can compute the target expression from mvt1 as sqrt(mc4 + mc3).
So a query with that target list and those quals on aqumv_t1 could be rewritten to select sqrt(mc4 + mc3) from mvt1 with the single qual sqrt(mc3) > 5.8.
In this example AQUMV brings a significant improvement: Time: 7384.329 ms (00:07.384) -> Time: 45.701 ms. Not only are the rows reduced to the matching results, but the expressions also no longer have to be evaluated for each row. See more in README.cbdb.aqumv, the code, and reference [1].
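The equivalence claimed in this example can be checked on a small scale. The sketch below is a stand-in under stated assumptions, not the planner code from this PR: it uses Python's sqlite3 module instead of the database, a plain table in place of the materialized view, an invented data set, and a user-registered sqrt function. Only the names aqumv_t1, mvt1, mc3, mc4 and the expressions themselves come from the description; the column c3 and the helper build_demo are hypothetical.

```python
import math
import sqlite3

# Query against the base table, assembled from the quals and target list in the text.
ORIGIN_QUERY = """
    SELECT sqrt(abs(abs(c2) - c1 - 1) + abs(c2))
    FROM aqumv_t1
    WHERE c1 > 30 AND c1 < 40 AND sqrt(abs(c2)) > 5.8
"""

# Same answer computed from the view columns only.
REWRITTEN_QUERY = "SELECT sqrt(mc4 + mc3) FROM mvt1 WHERE sqrt(mc3) > 5.8"


def build_demo():
    """Create an in-memory database with a base table and a view-like table."""
    conn = sqlite3.connect(":memory:")
    # Register sqrt explicitly, since not every SQLite build ships math functions.
    conn.create_function("sqrt", 1, math.sqrt)
    conn.execute("CREATE TABLE aqumv_t1 (c1 INTEGER, c2 INTEGER, c3 INTEGER)")
    # Invented data: c2 takes both negative and positive values around c1 = 35.
    rows = [(i, i * i - 1200, i) for i in range(1, 101)]
    conn.executemany("INSERT INTO aqumv_t1 VALUES (?, ?, ?)", rows)
    # A plain table stands in for the materialized view (assumption).
    conn.execute("""
        CREATE TABLE mvt1 AS
        SELECT abs(c2) AS mc3, abs(abs(c2) - c1 - 1) AS mc4
        FROM aqumv_t1
        WHERE c1 > 30 AND c1 < 40
    """)
    return conn


if __name__ == "__main__":
    conn = build_demo()
    original = sorted(conn.execute(ORIGIN_QUERY).fetchall())
    rewritten = sorted(conn.execute(REWRITTEN_QUERY).fetchall())
    print("same result:", original == rewritten)
```

The rewritten query never touches aqumv_t1 and never recomputes abs(c2) or abs(abs(c2) - c1 - 1), which is where the saving in the timings above would come from.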
Internal talk on Feishu (in Chinese): https://hashdata.feishu.cn/minutes/obcn419j6v19e47snk2pfj6e
Slides: https://hashdata.feishu.cn/file/GeMBbVRMNowL7Cxh52acWJD6n0e
AQUMV is actually an equivalent transformation on the Query tree.
A materialized view (MV) could be used to compute a query if:
1. The view contains all rows needed by the query expression. If the MV has more rows than the query wants, an additional filter may be added if possible.
2. All output expressions can be computed from the output of the view. The output expressions could be fully or partially matched from the MV's target list.
3. Cost-based. There may be multiple valid MV candidates, or selecting from the MV may be no better than selecting from the origin table (e.g. when it has an index), so let the planner decide the best one.
Construct rows by splitting the MV query quals (mv_query_quals) and the query quals (origin_query_quals) into a difference set and an intersection set. The post_quals, formed by {origin_query_quals - mv_query_quals}, are processed against the MV query's target list and rewritten to the MV relation's target list expressions.
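The qual split can be sketched with plain set operations. This is a simplified illustration, not the code in this PR: quals are assumed to be already normalized conjunct strings that match syntactically, and split_quals is a hypothetical helper. Rejecting the view when it has a qual the query lacks is my reading of the "view contains all rows needed" condition, not something the text spells out.

```python
def split_quals(origin_query_quals, mv_query_quals):
    """Return (intersection, post_quals), or None if the view cannot be used.

    Both arguments are iterables of normalized conjunct strings (assumption).
    """
    origin = set(origin_query_quals)
    mv = set(mv_query_quals)

    intersection = origin & mv   # already enforced by the view
    mv_only = mv - origin        # filters the query never asked for
    post_quals = origin - mv     # still to be applied on top of the view

    # If the view filters on something the query does not, the view may be
    # missing rows the query needs, so it is not a candidate (assumed rule).
    if mv_only:
        return None
    return intersection, post_quals


if __name__ == "__main__":
    result = split_quals(
        ["c1 > 30", "c1 < 40", "sqrt(abs(c2)) > 5.8"],
        ["c1 > 30", "c1 < 40"],
    )
    print(result)
```

The post_quals returned here still refer to base-table columns; they only become usable once their expressions are rewritten in terms of the view's columns.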
Construct column expressions using a greedy algorithm. Sort the MV query's target list by complexity, and try to rewrite expressions in that order.
Expressions that have no Vars are kept as-is at the upper level (Const expressions) or are rewritten if there are corresponding expressions.
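A minimal sketch of the greedy rewrite, under assumptions: expressions are modelled as nested tuples rather than planner nodes, complexity is taken to be the node count, and matching is exact structural equality. The helpers var, call, size, replace, and rewrite are hypothetical names for illustration; only mc3, mc4 and the expressions come from the example in this description.

```python
def var(name):
    """A reference to a base-table column."""
    return ("var", name)


def call(fn, *args):
    """A function or operator applied to argument expressions."""
    return ("call", fn) + args


def size(expr):
    """Node count, used here as the complexity measure (assumption)."""
    if isinstance(expr, tuple) and expr[0] == "call":
        return 1 + sum(size(a) for a in expr[2:])
    return 1


def replace(expr, pattern, replacement):
    """Replace every subtree equal to pattern with replacement."""
    if expr == pattern:
        return replacement
    if isinstance(expr, tuple) and expr[0] == "call":
        return expr[:2] + tuple(replace(a, pattern, replacement) for a in expr[2:])
    return expr


def has_base_var(expr):
    """True if any base-table column reference is left in expr."""
    if isinstance(expr, tuple):
        if expr[0] == "var":
            return True
        if expr[0] == "call":
            return any(has_base_var(a) for a in expr[2:])
    return False


def rewrite(expr, mv_targets):
    """Rewrite expr over view columns, or return None if that is impossible.

    mv_targets is a list of (expression, view_column_name) pairs.
    """
    # Greedy step: try the most complex view expressions first.
    ordered = sorted(mv_targets, key=lambda t: size(t[0]), reverse=True)
    for tle_expr, mv_col in ordered:
        expr = replace(expr, tle_expr, ("mvcol", mv_col))
    # Constants need no view column; leftover base columns mean failure.
    return None if has_base_var(expr) else expr


if __name__ == "__main__":
    c1, c2 = var("c1"), var("c2")
    abs_c2 = call("abs", c2)
    mc4_expr = call("abs", call("-", call("-", abs_c2, c1), 1))
    mv_targets = [(abs_c2, "mc3"), (mc4_expr, "mc4")]
    target = call("sqrt", call("+", mc4_expr, abs_c2))
    print(rewrite(target, mv_targets))
```

Ordering matters in this sketch: if abs(c2) were replaced by mc3 before the larger expression was tried, abs(abs(c2) - c1 - 1) would no longer match mc4, and the remaining c1 could not be resolved from a view that exposes only mc3 and mc4.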
This PR is a start of AQUMV. For MVP0:
Only SELECT FROM a single relation is supported, both for the mv_query and the origin_query.
Below are not supported now:
Reference:
[1] Optimizing Queries Using Materialized Views: A Practical, Scalable Solution. https://courses.cs.washington.edu/courses/cse591d/01sp/opt_views.pdf
Authored-by: Zhang Mingli [email protected]