feat(query): Introduced fold_for_prune to optimize index prune #17533

Open
wants to merge 3 commits into base: main

forsaken628
Collaborator

@forsaken628 forsaken628 commented Feb 27, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

For any injective function y = f(x), eq(column, inverse_f(const)) is a necessary condition for eq(f(column), const).
For example, the rows matched by col = '+1'::int include every row matched by col::string = '+1'.

part of #13408

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

@forsaken628 forsaken628 changed the title from "feat(WIP): try_eliminate_cast" to "feat(query): Introduced fold_for_prune to optimize index prune" Feb 28, 2025
@github-actions github-actions bot added the pr-feature (this PR introduces a new feature to the codebase) label Feb 28, 2025
@forsaken628 forsaken628 marked this pull request as ready for review February 28, 2025 11:59
@forsaken628 forsaken628 requested review from sundy-li and b41sh and removed request for sundy-li March 1, 2025 07:49
return true;
}

// Ignore FunctionDomain::MayThrow
Member

I don't see any code that matches this comment.

Collaborator Author

The comment means that ignoring FunctionDomain::MayThrow is a precondition for is_injective_cast. I will refine it later.

true
}
(DataType::Number(src), DataType::Decimal(_)) if src.is_integer() => true,
(DataType::Number(_), DataType::String) => true,
Member

We need to check that the string literal matches [1-9][0-9]*.
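One way to picture the check requested here is a round-trip test: a string literal can only equal the output of an integer-to-string cast if formatting the parsed value reproduces the literal exactly. The sketch below is an illustration, not code from this PR. The function name `is_canonical_i16_literal` is hypothetical, and it relies on Rust's own `parse` and `to_string` semantics, which may differ from Databend's cast functions. It is also slightly broader than the regex above, since it accepts `0` and negative values.

```rust
/// Returns true only if `s` is exactly the text that formatting some i16
/// would produce (hypothetical helper, assuming Rust's formatting rules).
fn is_canonical_i16_literal(s: &str) -> bool {
    match s.parse::<i16>() {
        // Parsing accepts forms such as "+1" and "01", so compare the
        // re-formatted value with the original literal.
        Ok(v) => v.to_string() == s,
        // Not an i16 at all (non-numeric or out of range).
        Err(_) => false,
    }
}
```

Under these assumptions, `"1"` and `"-5"` are canonical, while `"+1"`, `"01"`, and `"abc"` are not, so an equality against any of the latter could never match a cast result.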

raw expr : eq(CAST(x::Int16 AS String), '+1')
checked expr : eq<String, String>(to_string<Int16>(x), "+1")
input domain : {0: Number(Int16(SimpleDomain { min: 1, max: 100 }))}
optimized expr : eq<Int16, Int16>(x, 1_i16)
Member

The optimized expr should be false.

Collaborator Author

fold_for_prune is similar to a bloom filter in that the expressions it generates are all necessary but not sufficient conditions.

In fact, most of the rewrites in the tests above are necessary but not sufficient. For example, the only sufficient condition for to_int8(bool) = 2 is false: the original expression matches the empty set, and only the empty set is a subset of the empty set.
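The "necessary but not sufficient" argument can be checked exhaustively for the Int16 test case discussed above. The following is a minimal sketch, not code from this PR: the function names are hypothetical, and Rust's `to_string` stands in for the engine's `to_string<Int16>`. It shows that every row matching the original predicate also matches the rewritten one, so skipping blocks where the rewritten predicate cannot hold never drops a matching row.

```rust
/// Original predicate: CAST(x AS String) = '+1'.
fn original(x: i16) -> bool {
    x.to_string() == "+1"
}

/// Rewritten predicate used only for pruning: x = 1.
fn rewritten(x: i16) -> bool {
    x == 1
}

/// The rewrite is a necessary condition if original(x) implies rewritten(x)
/// for every possible input value.
fn is_necessary_condition() -> bool {
    (i16::MIN..=i16::MAX).all(|x| !original(x) || rewritten(x))
}

/// A block with value range [min, max] can be skipped when no value in the
/// range satisfies the rewritten predicate.
fn can_prune_block(min: i16, max: i16) -> bool {
    !(min..=max).any(rewritten)
}
```

With the input domain from the test (min 1, max 100), the block is kept even though the original predicate matches no row in it. That costs a block read but does not change the query result, which is the same trade-off a bloom filter false positive makes.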

@@ -395,7 +395,7 @@ impl BloomIndex {
 )?;

 let (new_expr, _) =
-ConstantFolder::fold_with_domain(&expr, &domains, &self.func_ctx, &BUILTIN_FUNCTIONS);
+ConstantFolder::fold_for_prune(&expr, &domains, &self.func_ctx, &BUILTIN_FUNCTIONS);
Member

It looks like fold_for_prune is meant to rewrite an equality expression over a cast column, turning col::dest_type = val into col = val::src_type. We can do that rewrite directly inside the visit_expr_column_eq_constant function, so there is no need to define a separate fold_for_prune function. If val::src_type fails, the result must be false.
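The rewrite suggested in this comment can be sketched for a single case, an Int16 column cast to String and compared with a string constant. This is an illustration under stated assumptions, not the actual visit_expr_column_eq_constant logic: the `Pruned` enum and `rewrite_cast_eq` function are hypothetical, and Rust's `parse::<i16>` stands in for val::src_type, whose accepted syntax in Databend may differ.

```rust
/// Result of rewriting `col::String = val` for an Int16 column
/// (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum Pruned {
    /// Prune with `col = v` instead of the cast expression.
    ColumnEq(i16),
    /// The constant cannot be cast back, so no row can match.
    AlwaysFalse,
}

/// Cast the constant back to the column's source type. A failed cast means
/// the equality can never hold.
fn rewrite_cast_eq(val: &str) -> Pruned {
    match val.parse::<i16>() {
        Ok(v) => Pruned::ColumnEq(v),
        Err(_) => Pruned::AlwaysFalse,
    }
}
```

Note that `"+1"` parses successfully here, so the sketch yields `ColumnEq(1)`: a necessary condition for the original predicate rather than an exact equivalent, which is the distinction debated earlier in this review.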

Labels
pr-feature this PR introduces a new feature to the codebase

3 participants