airframe-sql: Invalid aggregation key propagation #2646

Closed
xerial opened this issue Dec 17, 2022 · 0 comments
xerial commented Dec 17, 2022

This issue has been present since airframe-sql 22.12.3. I'm working on a fix.
cc: @takezoe

[original]
select name, count(*) cnt from (
  select id, arbitrary(name) name from A
  group by 1
)
group by 1


[resolved]
SELECT arbitrary(name) AS name, count(*) AS cnt   <--------- The first selectItem should be just `name`
FROM (SELECT id, arbitrary(name) AS name FROM default.A GROUP BY id) 
GROUP BY arbitrary(name)  <------------- This also needs to be just `name`
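
One way to see why the resolved form above is wrong is to run both shapes of the query against a small database. The sketch below uses SQLite through Python's `sqlite3` module as a stand-in engine, with `min()` substituted for `arbitrary()` because SQLite has no `arbitrary()` aggregate. The sample rows are invented for illustration and do not come from this issue.

```python
import sqlite3

# Stand-in setup: an in-memory SQLite table named A with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO A VALUES (?, ?)",
    [(1, "a"), (1, "a"), (2, "a"), (3, "b")],
)

# min() replaces arbitrary() here; any aggregate shows the same effect.
SUBQUERY = "SELECT id, min(name) AS name FROM A GROUP BY id"

# Expected resolution: the outer query refers to the subquery's output column `name`.
expected_sql = (
    f"SELECT name, count(*) AS cnt FROM ({SUBQUERY}) GROUP BY name ORDER BY name"
)
rows = conn.execute(expected_sql).fetchall()
print(rows)  # [('a', 2), ('b', 1)]

# Wrong resolution: the aggregate expression is propagated into the outer query.
wrong_sql = (
    f"SELECT min(name) AS name, count(*) AS cnt FROM ({SUBQUERY}) GROUP BY min(name)"
)
try:
    conn.execute(wrong_sql)
    wrong_error = None
except sqlite3.OperationalError as e:
    # SQLite rejects an aggregate function used as a grouping key.
    wrong_error = str(e)
print(wrong_error)
```

The outer query only sees the subquery's output column `name`, so grouping by that column is well defined, while a grouping key that is itself an aggregate call is rejected by this engine.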

[original plan]
[Aggregate]: (id:?, name:?) => (name:?, cnt:?)
  - SingleColumn(Id(name))
  - SingleColumn(FunctionCall(count, AllColumns(*), distinct:false, window:None) as cnt)
  - GroupingKey(Literal(1),Some(NodeLocation(5,10)))
  [Aggregate]:  => (id:?, name:?)
    - SingleColumn(Id(id))
    - SingleColumn(FunctionCall(arbitrary, Id(name), distinct:false, window:None) as name)
    - GroupingKey(Literal(1),Some(NodeLocation(3,12)))
    [TableRef]
      - A

[resolved plan]
[Aggregate]: (id:long, name:?) => (name:?, cnt:?)
  - SingleColumn(SingleColumn(FunctionCall(arbitrary, name:string <- [A.name], distinct:false, window:None) as name))
  - SingleColumn(FunctionCall(count, AllColumns(), distinct:false, window:None) as cnt)
  - GroupingKey(FunctionCall(arbitrary, name:string <- [A.name], distinct:false, window:None),Some(NodeLocation(2,14)))
  [Aggregate]: (id:long, name:string) => (id:long, name:?)
    - SingleColumn(id:long <- [A.id])
    - SingleColumn(FunctionCall(arbitrary, name:string <- [A.name], distinct:false, window:None) as name)
    - GroupingKey(id:long <- [A.id],None)
    [TableScan]:  => (id:long, name:string)

The problem is that the resolveAggregationKeys rule wrongly replaces Id(name) in the outer grouping key with FunctionCall(arbitrary, ...), the expression that defines `name` in the subquery's outputAttributes:

transformed with resolveAggregationKeys:
[before]
[Aggregate]: (id:?, name:?) => (name:?, cnt:?)
  - SingleColumn(Id(name))
  - SingleColumn(FunctionCall(count, AllColumns(*), distinct:false, window:None) as cnt)
  - GroupingKey(Id(name),Some(NodeLocation(5,10)))
  [Aggregate]:  => (id:?, name:?)
    - SingleColumn(Id(id))
    - SingleColumn(FunctionCall(arbitrary, Id(name), distinct:false, window:None) as name)
    - GroupingKey(Id(id),Some(NodeLocation(3,12)))
    [TableRef]
      - A

[after]
[Aggregate]: (id:long, name:?) => (name:?, cnt:?)
  - SingleColumn(Id(name))
  - SingleColumn(FunctionCall(count, AllColumns(*), distinct:false, window:None) as cnt)
  - GroupingKey(FunctionCall(arbitrary, name:string <- [A.name], distinct:false, window:None),Some(NodeLocation(2,14))) <--- wrong replacement 
  [Aggregate]: (id:long, name:string) => (id:long, name:?)
    - SingleColumn(id:long <- [A.id])
    - SingleColumn(FunctionCall(arbitrary, name:string <- [A.name], distinct:false, window:None) as name)
    - GroupingKey(id:long <- [A.id],None)
    [TableScan]:  => (id:long, name:string)
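
The wrong replacement shown above can be modelled in a few lines. The following is a toy Python sketch, not airframe-sql code: the class names only mirror the labels in the plan dumps, and `resolve_key_by_definition`, `resolve_key_by_reference`, and `ResolvedAttribute` are hypothetical names introduced for illustration. The second function shows one possible direction for a fix (referencing the child's output column instead of inlining its definition); it is an assumption, not necessarily the fix that was applied.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Id:
    """Unresolved identifier, e.g. Id(name)."""
    name: str


@dataclass(frozen=True)
class FunctionCall:
    """Function application, e.g. FunctionCall(arbitrary, Id(name))."""
    func: str
    arg: "Expr"


@dataclass(frozen=True)
class ResolvedAttribute:
    """Hypothetical node: a reference to an output column of the child plan."""
    name: str


Expr = Union[Id, FunctionCall, ResolvedAttribute]


def resolve_key_by_definition(key: Expr, child_outputs: Dict[str, Expr]) -> Expr:
    # Models the reported behavior: the grouping key is replaced with the
    # expression that defines the matching child output column.
    if isinstance(key, Id) and key.name in child_outputs:
        return child_outputs[key.name]
    return key


def resolve_key_by_reference(key: Expr, child_outputs: Dict[str, Expr]) -> Expr:
    # Assumed alternative: keep a reference to the child's output column,
    # so the aggregate expression stays inside the subquery.
    if isinstance(key, Id) and key.name in child_outputs:
        return ResolvedAttribute(key.name)
    return key


# Output columns of the inner Aggregate: name -> defining expression.
subquery_outputs: Dict[str, Expr] = {
    "id": Id("id"),
    "name": FunctionCall("arbitrary", Id("name")),
}

print(resolve_key_by_definition(Id("name"), subquery_outputs))
print(resolve_key_by_reference(Id("name"), subquery_outputs))
```

With the first function the outer grouping key becomes the `arbitrary` call, matching the `[after]` plan; with the second it stays a plain column reference, which corresponds to `GROUP BY name` in the outer query.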