-
Notifications
You must be signed in to change notification settings - Fork 314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix async coroutine limit not respected and add s3/gcs chunk size #3080
Conversation
Signed-off-by: Yee Hing Tong <[email protected]>
Code Review Agent Run Status
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #3080 +/- ##
===========================================
- Coverage 74.82% 51.83% -23.00%
===========================================
Files 202 202
Lines 21446 21454 +8
Branches 2763 2763
===========================================
- Hits 16048 11120 -4928
- Misses 4626 9727 +5101
+ Partials 772 607 -165 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Code Review Agent Run Status
|
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Code Review Agent Run Status
|
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Code Review Agent Run Status
|
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about https://github.com/flyteorg/flytekit/blob/master/flytekit/core/base_task.py#L638 ?
Edit: this one is ok since it's just parallelizing outputs.
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Code Review Agent Run #c3e341Actionable Suggestions - 0Additional Suggestions - 1
Review Details
|
Changelist by BitoThis pull request implements the following key changes.
|
) Signed-off-by: Yee Hing Tong <[email protected]>
) (#3083) Signed-off-by: Yee Hing Tong <[email protected]>
…yteorg#3080) Signed-off-by: Yee Hing Tong <[email protected]> Signed-off-by: lu00122 <[email protected]>
Tracking issue
flyteorg/flyte#6188
Why are the changes needed?
Please see issue.
What changes were proposed in this pull request?
The moment the asyncio.Task is created, it is available for running. Once the
await
hits on the run coroutines in batches, all those tasks start running, rendering any batching of coroutines moot.In testing, we also noticed that the s3fs and gcsfs default write chunk sizes were 50MB. We felt that was a bit high and in testing noticed no performance decrease (actually saw possibly non-statistically significant increase) in performance. We are therefore changing it by default to 25MB. Also this setting and the number of coroutine batches the list & dict transformers run will be configurable via environment variables.
The new env vars are:
_F_P_WRITE_CHUNK_SIZE
- This is the number of bytes and by default is set to"26214400"
(25 * 2^20)_F_TE_MAX_COROS
- This is the number of coroutines the type engine will use for lists/dicts and is set to"10"
.How was this patch tested?
Tested on our byoc clusters, and by the OSS OP.
Setup process
Screenshots
Check all the applicable boxes
Related PRs
Docs link
Summary by Bito
This pull request introduces improvements to the Flytekit library by adding configurable chunk sizes for S3 and GCS uploads and enhancing coroutine batching in the type engine. The chunk size for S3/GCS uploads is now set to 25MB by default and can be configured via an environment variable.Unit tests added: False
Estimated effort to review (1-5, lower is better): 2