Answer Query Using Materialized Views #298

avamingli · 2023-11-14T02:45:43Z

AQUMV for short, is used to compute part or all of a Query from materialized views during planning.
It could provide massive improvements in query processing time, especially for aggregation queries over large tables[1].

AQUMV usually uses Incremental Materialized Views(IMV) as candidates, as IMV usually have the up-to-date data when there are writable operations on related tables.

Example:

create table aqumv_t1(c1 int, c2 int, c3 int) distributed by (c1);
insert into aqumv_t1 select i, i+1, i+2 from generate_series(1, 100000000) i;
analyze aqumv_t1;
gpadmin=# explain(costs off)
select sqrt(abs(abs(c2) - c1 - 1) + abs(c2)) from aqumv_t1 where c1 > 30 and c1 < 40 and sqrt(abs(c2)) > 5.8;
                                                 QUERY PLAN

----------------------------------------------------------------------------------------------------
---------
 Gather Motion 3:1  (slice1; segments: 3)
   ->  Seq Scan on aqumv_t1
         Filter: ((c1 > 30) AND (c1 < 40) AND (sqrt((abs(c2))::double precision) > '5.8'::double pre
cision))
 Optimizer: Postgres query optimizer
(4 rows)

select sqrt(abs(abs(c2) - c1 - 1) + abs(c2)) from aqumv_t1 where c1 > 30 and c1 < 40 and sqrt(abs(c2)) > 5.8;
       sqrt
-------------------
                 6
 6.082762530298219
 6.244997998398398
 5.916079783099616
 6.164414002968976
 6.324555320336759
 5.830951894845301
(7 rows)

Time: 7384.329 ms (00:07.384)

Answer Query Using Materialized Views:

create materialized view mvt1 as
  select c1 as mc1, c2 as mc2, abs(c2) as mc3, abs(abs(c2) - c1 - 1) as mc4
  from aqumv_t1 where c1 > 30 and c1 < 40;
analyze mvt1;

set answer_query_using_materialized_views = on;
select sqrt(abs(abs(c2) - c1 - 1) + abs(c2)) from aqumv_t1 where c1 > 30 and c1 < 40 and sqrt(abs(c2)) > 5.8;
       sqrt
-------------------
                 6
 6.082762530298219
 6.244997998398398
 5.830951894845301
 5.916079783099616
 6.164414002968976
 6.324555320336759
(7 rows)

Time: 45.701 ms

gpadmin=# explain(verbose, costs off)
select sqrt(abs(abs(c2) - c1 - 1) + abs(c2)) from aqumv_t1 where c1 > 30 and c1 < 40 and sqrt(abs(c2)) > 5.8;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   Output: (sqrt(((mc4 + mc3))::double precision))
   ->  Seq Scan on public.mvt1
         Output: sqrt(((mc4 + mc3))::double precision)
         Filter: (sqrt((mvt1.mc3)::double precision) > '5.8'::double precision)
 Settings: answer_query_using_materialized_views = 'on', optimizer = 'off'
 Optimizer: Postgres query optimizer
(7 rows)

This perfect example shows AQUMV's magic,
The mv has the quals : c1 > 30 and c1 < 40 from table aqumv_t1, and the rows we want to query c1 > 30 and c1 < 40 and sqrt(abs(c2)) > 5.8.

It means that all rows in mv meet the requirement c1 > 30 and c1 < 40 and mvt1 has column mc3 corresponding to abs(c2) in aqumv_t1, so that adding a qual sqrt(mc3) > 5.8 to mvt1 will filter all rows we want.

And the target list we want is sqrt(abs(abs(c2) - c1 - 1) + abs(c2)) while mvt1 has the column : abs(abs(c2) - c1 - 1) AS mc4, abs(c2) AS mc3, we could compute the target expression from mvt1:
sqrt(mc4 + mc3).

And the query :

select sqrt(abs(abs(c2) - c1 - 1) + abs(c2)) from aqumv_t1 where c1 > 30 and c1 < 40 and sqrt(abs(c2)) > 5.8;

could be rewritten to:

select sqrt(mc4 + mc3) from mvt1 where sqrt(mc3) > 5.8;

This example shows AQUMV has significant improvements Time: 7384.329 ms (00:07.384) -> Time: 45.701 ms, not only the rows are reduced to the corresponding results, but also expressions cloud be eliminated for each row.

See more in README.cbdb.aqumv and the codes and reference[1].
And internal talk in feishu(Chinese): https://hashdata.feishu.cn/minutes/obcn419j6v19e47snk2pfj6e
slide: https://hashdata.feishu.cn/file/GeMBbVRMNowL7Cxh52acWJD6n0e

AQUMV is actually a Equivalent Transformation on Query tree.
A materialized view(MV) could be use to compute a Query if:

The view contains all rows needed by the query expression.
If MV has more rows than query wants, additional filter may be addedif possible.
All output expressions can be computed from the output of the view.
The output expressions could be fully or partially matched from MV's target list.
Cost-based.
There may be multiple valid MV candidates, or select from MV is not better than select from origin table(ex: has an index and etc), let planner decide the best one.

Construct rows by splitting MV query quals(mv_query_quals) and Query quals (origin_query_quals) to difference set and intersection set. And post_quals formed by:{origin_query_quals - mv_query_quals} will be processed by MV query's target list, and rewritten to MV relation's target list expressions.

Construct columns expressions using a Greedy Algorithm. Sort the MV query's target list by complexity, and try to rewrite expressions by that order.
Expressions that have no Vars are kept to upper(Const Expressions) or be rewritten if there were corresponding expressions.

This pr is a start of AQUMV, for MVP0:
Only support SELECT FROM a single relation both for mv_query and the origin_query.
Below are not supported now:

AGG
Subquery
Order by(for origin_query)
Join
Sublink
Group by
Window Functions
CTE
Distinct On
Refresh Materialized View
Create AS

Reference:
[1] Optimizing Queries Using Materialized Views: A Practical,
Scalable Solution.
https://courses.cs.washington.edu/courses/cse591d/01sp/opt_views.pdf

Authored-by: Zhang Mingli [email protected]

fix #ISSUE_Number

Change logs

Describe your change clearly, including what problem is being solved or what feature is being added.

If it has some breaking backward or forward compatibility, please clary.

Why are the changes needed?

Describe why the changes are necessary.

Does this PR introduce any user-facing change?

If yes, please clarify the previous behavior and the change this PR proposes.

How was this patch tested?

Please detail how the changes were tested, including manual tests and any relevant unit or integration tests.

Contributor's Checklist

Here are some reminders and checklists before/when submitting your pull request, please check them:

Make sure your Pull Request has a clear title and commit message. You can take git-commit template as a reference.
Sign the Contributor License Agreement as prompted for your first-time contribution(One-time setup).
Learn the coding contribution guide, including our code conventions, workflow and more.
List your communication in the GitHub Issues or Discussions (if has or needed).
Document changes.
Add tests for the change
Pass make installcheck
Pass make -C src/test installcheck-cbdb-parallel
Feel free to request cloudberrydb/dev team for review and approval when your PR is ready🥳

avamingli · 2023-11-14T02:50:53Z

FYI: This pr is based on Incremental Materialized Views(IMV), should wait for #280 merged first, and a FIXME:
/* AQUMV_FIXME_MVP: TOOD use Incremental Materialized View. */ resolved.

src/backend/optimizer/plan/planner.c

src/test/regress/sql/aqumv.sql

src/backend/optimizer/plan/planner.c

yjhjstz · 2023-11-17T03:01:30Z

Datalog program:

Q(T,Y,D) :− Movie(I,T,Y,G),Y ≥1950,G ="comedy"
                   Director(I,D),Actor(I,D)

V2 (I,T,Y) :− Movie(I,T,Y,G),Y ≥1950,G ="comedy"

V3(I,D) :−Director(I,D),Actor(ID,D)

V2 and V3 are useful for answering Q.
Q'(T,Y,D) :− V2(I,T,Y), V3(I,D)
How do we express this case ?

avamingli · 2023-11-19T13:04:57Z

Datalog program:
Q(T,Y,D) :− Movie(I,T,Y,G),Y ≥1950,G ="comedy"
                   Director(I,D),Actor(I,D)

V2 (I,T,Y) :− Movie(I,T,Y,G),Y ≥1950,G ="comedy"

V3(I,D) :−Director(I,D),Actor(ID,D)
V2 and V3 are useful for answering Q. Q'(T,Y,D) :− V2(I,T,Y), V3(I,D) How do we express this case ?

We have an offline discussion, it's not related to this MVP0, may be considered in the future.
MVP0 has the ability to process multiple mv candidates but only for one relation, see more in READEME.cbdb.aqumv

src/backend/optimizer/plan/planner.c

src/backend/optimizer/README.cbdb.aqumv

src/backend/optimizer/plan/planner.c

AQUMV for short, is used to compute part or all of a Query from materialized views during planning. It could provide massive improvements in query processing time, especially for aggregation queries over large tables[1]. AQUMV usually uses Incremental Materialized Views(IMV) as candidates, as IMV usually have real time data when there are writable operations on related tables. AQUMV is actually a Equivalent Transformation on Query tree. A materialized view(MV) could be use to compute a Query if: 1.The view contains all rows needed by the query expression. If MV has more rows than query wants, additional filter may be added if possible. 2.All output expressions can be computed from the output of the view. The output expressions could be fully or partially matched from MV's target list. 3.Cost-based. There may be multiple valid MV candidates, or select from MV is not better than select from origin table(ex: has an index and etc), let planner decide the best one. Construct rows by splitting MV query quals(mv_query_quals) and Query quals (origin_query_quals) to difference set and intersection set. And post_quals formed by:{origin_query_quals - mv_query_quals} will be processed by MV query's target list, and rewritten to MV relation's target list expressions. Construct columns expressions using a Greedy Algorithm. Sort the MV query's target list by complexity, and try to rewrite expressions by that order. Expressions that have no Vars are kept to upper(Const Expressions) or be rewritten if there were corresponding expressions. Reference: [1] Optimizing Queries Using Materialized Views: A Practical, Scalable Solution. https://courses.cs.washington.edu/courses/cse591d/01sp/opt_views.pdf Authored-by: Zhang Mingli [email protected]

yjhjstz · 2023-12-01T04:03:06Z

LGTM

avamingli self-assigned this Nov 14, 2023

avamingli force-pushed the answer_query_using_materialized_views_mvp0 branch from 0e71814 to a7f459e Compare November 14, 2023 02:55

avamingli requested review from yjhjstz and my-ship-it November 14, 2023 03:24

yjhjstz reviewed Nov 16, 2023

View reviewed changes

src/backend/optimizer/plan/planner.c Show resolved Hide resolved

yjhjstz reviewed Nov 16, 2023

View reviewed changes

src/backend/optimizer/plan/planner.c Outdated Show resolved Hide resolved

src/test/regress/sql/aqumv.sql Show resolved Hide resolved

yjhjstz reviewed Nov 16, 2023

View reviewed changes

src/backend/optimizer/plan/planner.c Show resolved Hide resolved

yjhjstz reviewed Nov 16, 2023

View reviewed changes

src/backend/optimizer/plan/planner.c Outdated Show resolved Hide resolved

avamingli force-pushed the answer_query_using_materialized_views_mvp0 branch from a7f459e to 2e06466 Compare November 19, 2023 13:40

yjhjstz reviewed Nov 23, 2023

View reviewed changes

avamingli force-pushed the answer_query_using_materialized_views_mvp0 branch from 2e06466 to b584b5f Compare November 23, 2023 07:05

yjhjstz reviewed Nov 23, 2023

View reviewed changes

src/backend/optimizer/plan/planner.c Show resolved Hide resolved

avamingli force-pushed the answer_query_using_materialized_views_mvp0 branch from b584b5f to e977d0e Compare November 24, 2023 05:58

yjhjstz reviewed Nov 24, 2023

View reviewed changes

src/backend/optimizer/plan/planner.c Outdated Show resolved Hide resolved

avamingli force-pushed the answer_query_using_materialized_views_mvp0 branch from e977d0e to 4078ac3 Compare November 24, 2023 07:33

yjhjstz reviewed Nov 29, 2023

View reviewed changes

src/backend/optimizer/plan/planner.c Show resolved Hide resolved

avamingli force-pushed the answer_query_using_materialized_views_mvp0 branch from 4078ac3 to 9c2951c Compare December 1, 2023 03:01

yjhjstz reviewed Dec 1, 2023

View reviewed changes

src/backend/optimizer/plan/planner.c Outdated Show resolved Hide resolved

avamingli force-pushed the answer_query_using_materialized_views_mvp0 branch from 9c2951c to 7522c1c Compare December 1, 2023 03:50

yjhjstz approved these changes Dec 1, 2023

View reviewed changes

my-ship-it merged commit 2ffc7e6 into apache:main Dec 1, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Answer Query Using Materialized Views #298

Answer Query Using Materialized Views #298

avamingli commented Nov 14, 2023 •

edited

Loading

avamingli commented Nov 14, 2023

yjhjstz commented Nov 17, 2023

avamingli commented Nov 19, 2023

yjhjstz commented Dec 1, 2023

Answer Query Using Materialized Views #298

Answer Query Using Materialized Views #298

Conversation

avamingli commented Nov 14, 2023 • edited Loading

Change logs

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Contributor's Checklist

avamingli commented Nov 14, 2023

yjhjstz commented Nov 17, 2023

avamingli commented Nov 19, 2023

yjhjstz commented Dec 1, 2023

avamingli commented Nov 14, 2023 •

edited

Loading