[Feature](Cloud) Support session variable disable_file_cache and enable_segment_cache in query #37141
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
clang-tidy review says "All clean, LGTM! 👍"
run buildall
TPC-H: Total hot run time: 39896 ms
TPC-DS: Total hot run time: 172155 ms
ClickBench: Total hot run time: 30.85 s
run buildall
TPC-H: Total hot run time: 39752 ms
TPC-DS: Total hot run time: 172565 ms
ClickBench: Total hot run time: 30 s
run buildall
TPC-H: Total hot run time: 39910 ms
TPC-DS: Total hot run time: 173427 ms
ClickBench: Total hot run time: 30.47 s
run p0
run feut
run buildall
PR approved by at least one committer and no changes requested.
TPC-H: Total hot run time: 41916 ms
TPC-DS: Total hot run time: 169551 ms
ClickBench: Total hot run time: 30.39 s
run buildall
TPC-H: Total hot run time: 41931 ms
TPC-DS: Total hot run time: 168324 ms
ClickBench: Total hot run time: 30.89 s
…apache#37141

Session variable `disable_file_cache` is processed as "disposable file cache" in beta_rowset_reader.cpp.

```
if (_read_context->runtime_state != nullptr) {
    _read_options.io_ctx.query_id = &_read_context->runtime_state->query_id();
    _read_options.io_ctx.read_file_cache =
            _read_context->runtime_state->query_options().enable_file_cache;
    _read_options.io_ctx.is_disposable =
            _read_context->runtime_state->query_options().disable_file_cache;
}
```

We use the disposable cache to avoid IO amplification and to avoid evicting a large amount of already-cached data (the "normal cache"). We cannot set the read option's cache policy to "no cache" because that may cause IO amplification: every page IO would trigger a remote IO, which is a performance disaster.
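As a rough illustration of the mapping in the snippet above, here is a minimal, self-contained C++ sketch. `QueryOptions`, `IOContext`, `make_io_context`, `CachePolicy` and `classify` are hypothetical stand-ins written for this example, not the actual Doris types; only the two field assignments follow the snippet.

```cpp
// Hypothetical stand-ins for the Doris types referenced in the snippet above.
// The real definitions live in the Doris BE source and carry many more fields.
struct QueryOptions {
    bool enable_file_cache = true;    // query option read in the snippet
    bool disable_file_cache = false;  // session variable discussed in this PR
};

struct IOContext {
    bool read_file_cache = true;
    bool is_disposable = false;
};

// Mirrors the two assignments in the snippet: disable_file_cache does not
// switch the file cache off, it only marks the read as disposable.
inline IOContext make_io_context(const QueryOptions& opts) {
    IOContext ctx;
    ctx.read_file_cache = opts.enable_file_cache;
    ctx.is_disposable = opts.disable_file_cache;
    return ctx;
}

// Assumed classification, for illustration only.
enum class CachePolicy { kNoCache, kNormal, kDisposable };

inline CachePolicy classify(const IOContext& ctx) {
    if (!ctx.read_file_cache) {
        return CachePolicy::kNoCache;
    }
    return ctx.is_disposable ? CachePolicy::kDisposable : CachePolicy::kNormal;
}
```

Under these assumptions, a query that sets `disable_file_cache=true` while `enable_file_cache` stays on is classified as `kDisposable`, not `kNoCache`. That is the behavior the commit message argues for: reads still go through a cache, but without displacing the normal cached data.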
…le_segment_cache in query (#37141)

Currently, whether to read from the file cache or from remote storage is controlled by the BE config `enable_file_cache` in cloud mode.

This PR proposes to control the file cache behavior via session variables when executing queries in cloud mode. With such session variables, cache behavior can be controlled per query or per session without changing BE configs, which is convenient for cases such as:

1. **Performance test**. Test query performance when reading from the local file cache versus from remote storage.
2. **Data correctness**. Check whether a problem with certain tables or queries is caused by the file cache.

The read path has three kinds of caches: segment cache, page cache and file cache.

| module | cache | BE config | session variable |
|--------|-------|-----------|------------------|
| Segment | segment cache | disable_segment_cache | **enable_segment_cache** (supported by this PR) |
| PageIO | page cache | disable_storage_page_cache | enable_page_cache |
| FileReader | file cache | enable_file_cache | **disable_file_cache** (supported by this PR) |

The modifications in this PR:

- **enable_segment_cache**: a new session variable that controls whether the segment cache is used.
- **disable_file_cache**: previously applied only to the write path in cloud mode. With this PR it also applies to the read path when executing queries.

With this PR, data is read from remote storage without cache:

```sql
set enable_segment_cache=false;
set enable_page_cache=false;
set disable_file_cache=true;
```

Co-authored-by: Gavin Chou <[email protected]>
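Proposed changes
Currently, whether to read from the file cache or from remote storage is controlled by the BE config `enable_file_cache` in cloud mode.
This PR proposes to control the file cache behavior via session variables when executing queries in cloud mode.
With such session variables, cache behavior can be controlled per query or per session without changing BE configs, which is convenient for cases such as:
1. **Performance test**. Test query performance when reading from the local file cache versus from remote storage.
2. **Data correctness**. Check whether a problem with certain tables or queries is caused by the file cache.

The read path has three kinds of caches: segment cache, page cache and file cache.
The modifications in this PR:
- **enable_segment_cache**: a new session variable that controls whether the segment cache is used.
- **disable_file_cache**: previously applied only to the write path in cloud mode. With this PR it also applies to the read path when executing queries.

With this PR, data is read from remote storage without cache when the session sets `enable_segment_cache=false`, `enable_page_cache=false` and `disable_file_cache=true`.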
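As a rough illustration of how the three session-level toggles combine on the read path, the following C++ sketch lists which cache layers a query would consult. It is a deliberately simplified model: `SessionVars` and `caches_consulted` are hypothetical names made up for this example, and the BE configs and any special handling inside the file cache are ignored.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the three session variables named in this PR.
// Defaults here are assumptions for the example, not verified Doris defaults.
struct SessionVars {
    bool enable_segment_cache = true;
    bool enable_page_cache = true;
    bool disable_file_cache = false;
};

// Returns the cache layers a read would consult, in read-path order
// (Segment -> PageIO -> FileReader), under the simplified model above.
inline std::vector<std::string> caches_consulted(const SessionVars& vars) {
    std::vector<std::string> layers;
    if (vars.enable_segment_cache) {
        layers.push_back("segment cache");
    }
    if (vars.enable_page_cache) {
        layers.push_back("page cache");
    }
    if (!vars.disable_file_cache) {
        layers.push_back("file cache");
    }
    return layers;
}
```

In this model, turning off the segment cache and page cache and setting `disable_file_cache` yields an empty list, which corresponds to the "read from remote storage without cache" configuration described in the PR.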