perf: optimizing db WorkflowAppLog index #14710

Open: wants to merge 1 commit into main

horochx (Contributor) commented Mar 3, 2025

Summary

Currently, most of the slow queries in our database originate from the get_paginate_workflow_app_logs method of WorkflowAppService. There appears to be room to optimize the indexes on the WorkflowAppLog table.

To address this, I have added the fields that query uses most frequently, created_at and workflow_run_id, to the index.
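A minimal sketch of why appending created_at to the tenant/app index can help a paginated, newest-first query. It uses Python's built-in sqlite3 purely as a stand-in (Dify's production database is typically PostgreSQL, whose planner differs), and the table layout, the columns of workflow_app_log_app_idx, and the query shape are assumptions for illustration, not the actual get_paginate_workflow_app_logs SQL. With created_at as the trailing index column, the planner can read rows in the requested order instead of sorting every matching row first.

```python
import sqlite3

# Assumed, simplified shape of the workflow_app_logs table (illustration only).
SCHEMA = """
CREATE TABLE workflow_app_logs (
    id TEXT PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    app_id TEXT NOT NULL,
    workflow_run_id TEXT NOT NULL,
    created_at TEXT NOT NULL
)
"""

# Assumed paginated query: one app's logs, newest first.
PAGE_QUERY = """
SELECT id FROM workflow_app_logs
WHERE tenant_id = ? AND app_id = ?
ORDER BY created_at DESC
LIMIT 20 OFFSET 0
"""


def query_plan(index_columns: str) -> list[str]:
    """Create an in-memory table with the given index columns and return the plan."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        conn.execute(
            "CREATE INDEX workflow_app_log_app_idx "
            f"ON workflow_app_logs ({index_columns})"
        )
        rows = conn.execute("EXPLAIN QUERY PLAN " + PAGE_QUERY, ("t1", "a1")).fetchall()
        # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
        return [row[-1] for row in rows]
    finally:
        conn.close()


before = query_plan("tenant_id, app_id")
after = query_plan("tenant_id, app_id, created_at")
print("before:", before)  # expect an extra "USE TEMP B-TREE FOR ORDER BY" step
print("after: ", after)   # index order satisfies ORDER BY created_at DESC
```

The sort step that disappears in the second plan is the part that grows with the number of logs per app, which is why the trailing created_at column matters for deep pagination.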

Close #14752

Screenshots

Before After
... ...

Checklist

Important

Please review the checklist below before submitting your pull request.

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran dev/reformat (backend) and cd web && npx lint-staged (frontend) to appease the lint gods

dosubot (bot) added the labels size:XS (This PR changes 0-9 lines, ignoring generated files.) and 💪 enhancement (New feature or request) on Mar 3, 2025
crazywoola requested a review from laipz8200 on March 3, 2025 06:51
crazywoola (Member)

Hello, please open an issue and link it in the description.

horochx (Contributor, Author) commented Mar 3, 2025

> Hello, please open an issue and link it in the description.

I've created issue #14752 and attached the relevant SQL statements.

I also considered adding a partial index on WorkflowRun for status = failed, which might improve query performance further. However, I wasn't sure whether that use case applies generally or only to my situation, so I didn't pursue it.
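A partial index covers only the rows that match its WHERE clause, so it stays small when failed runs are rare. Below is a minimal sketch using Python's sqlite3 as a stand-in; PostgreSQL accepts the same CREATE INDEX ... WHERE syntax. The workflow_runs columns and the index name workflow_run_failed_idx are hypothetical. The key behavior it shows: the planner only considers the partial index when the query's own filter implies the index condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified workflow_runs table (illustration only).
conn.execute(
    """
    CREATE TABLE workflow_runs (
        id TEXT PRIMARY KEY,
        tenant_id TEXT NOT NULL,
        app_id TEXT NOT NULL,
        status TEXT NOT NULL,
        created_at TEXT NOT NULL
    )
    """
)
# Hypothetical partial index: only rows with status = 'failed' are indexed.
conn.execute(
    "CREATE INDEX workflow_run_failed_idx "
    "ON workflow_runs (tenant_id, app_id, created_at) "
    "WHERE status = 'failed'"
)

BASE = "SELECT id FROM workflow_runs WHERE tenant_id = ? AND app_id = ? AND "
QUERIES = {
    # The literal matches the index's WHERE clause, so the index qualifies.
    # (SQLite cannot match a partial index against a bound parameter here.)
    "failed": BASE + "status = 'failed' ORDER BY created_at DESC",
    # 'succeeded' does not imply status = 'failed', so the index is ignored.
    "succeeded": BASE + "status = 'succeeded' ORDER BY created_at DESC",
}

plans = {
    name: [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, ("t1", "a1"))]
    for name, sql in QUERIES.items()
}
for name, plan in plans.items():
    print(f"status = {name!r}: {plan}")
conn.close()
```

Whether this pays off depends on how often the logs page is filtered by failed status in practice, which is the open question raised above.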

bowenliang123 (Contributor)

It's reasonable to add the field to the existing index workflow_app_log_app_idx and to introduce a new index on workflow_run_id for joining.

Also, please make sure all the DDL operations are included in the DB migrations by running flask db migrate to generate them.
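For the join, the benefit of an index on workflow_run_id is that each lookup of a run's log row becomes an index search instead of a full scan of the log table. A minimal sketch with Python's sqlite3 as a stand-in, a simplified table, and a hypothetical index name workflow_app_log_workflow_run_id_idx; it compares the plan for the per-row probe a nested-loop join would issue. The real schema change should still come from a migration generated with flask db migrate.

```python
import sqlite3

# Assumed, simplified shape of the workflow_app_logs table (illustration only).
SCHEMA = """
CREATE TABLE workflow_app_logs (
    id TEXT PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    app_id TEXT NOT NULL,
    workflow_run_id TEXT NOT NULL,
    created_at TEXT NOT NULL
);
"""

# The probe a nested-loop join performs for each workflow_runs row it visits.
LOOKUP = "SELECT id FROM workflow_app_logs WHERE workflow_run_id = ?"


def lookup_plan(with_run_id_index: bool) -> list[str]:
    """Return the EXPLAIN QUERY PLAN details for LOOKUP, with or without the index."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        if with_run_id_index:
            # Hypothetical name for the new index suggested in review.
            conn.execute(
                "CREATE INDEX workflow_app_log_workflow_run_id_idx "
                "ON workflow_app_logs (workflow_run_id)"
            )
        rows = conn.execute("EXPLAIN QUERY PLAN " + LOOKUP, ("run-1",)).fetchall()
        return [row[-1] for row in rows]
    finally:
        conn.close()


without_index = lookup_plan(False)
with_index = lookup_plan(True)
print("without:", without_index)  # full table scan
print("with:   ", with_index)     # index search on workflow_run_id
```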

horochx (Contributor, Author) commented Mar 4, 2025

> It's reasonable to add the field to the existing index workflow_app_log_app_idx and to introduce a new index on workflow_run_id for joining.
>
> Also, please make sure all the DDL operations are included in the DB migrations by running flask db migrate to generate them.

Thank you for the suggestion. I haven't run flask db migrate yet because, when I tested it, I noticed that the main branch still has some pending model changes that aren't reflected in the migrations. I'm unsure whether this is intended in Dify's development workflow, so I've left it as is for now 😂.

laipz8200 (Member)

Could you describe a performance test on a large-scale dataset? I'd like to understand how significant the impact of this improvement is. Thank you.
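One possible shape for such a test, offered only as a rough harness: it fills an in-memory SQLite table with synthetic rows, pages through one app's logs with the old and the new index definition, and prints its own timings rather than quoting any. SQLite, the row count, the data distribution, the index columns, and the query are all assumptions; a meaningful answer would still need EXPLAIN ANALYZE against PostgreSQL with production-sized data.

```python
import random
import sqlite3
import time

# Assumed paginated query; created_at is unique below, so ordering is deterministic.
PAGE_QUERY = (
    "SELECT id FROM workflow_app_logs WHERE tenant_id = ? AND app_id = ? "
    "ORDER BY created_at DESC LIMIT ? OFFSET ?"
)


def build(index_columns: str, n_rows: int, n_apps: int = 50, seed: int = 0) -> sqlite3.Connection:
    """Create an in-memory table of synthetic logs plus one index on index_columns."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE workflow_app_logs (id TEXT PRIMARY KEY, tenant_id TEXT, "
        "app_id TEXT, workflow_run_id TEXT, created_at INTEGER)"
    )
    # Logs arrive in time order and are spread randomly across apps.
    rows = (
        (f"log-{i}", "t1", f"app-{rng.randrange(n_apps)}", f"run-{i}", 1_700_000_000 + i)
        for i in range(n_rows)
    )
    conn.executemany("INSERT INTO workflow_app_logs VALUES (?, ?, ?, ?, ?)", rows)
    conn.execute(
        f"CREATE INDEX workflow_app_log_app_idx ON workflow_app_logs ({index_columns})"
    )
    return conn


def time_pages(conn: sqlite3.Connection, pages: int = 50, page_size: int = 20) -> tuple[float, list]:
    """Fetch consecutive pages for app-0 and return (elapsed seconds, pages)."""
    start = time.perf_counter()
    results = [
        conn.execute(PAGE_QUERY, ("t1", "app-0", page_size, page * page_size)).fetchall()
        for page in range(pages)
    ]
    return time.perf_counter() - start, results


if __name__ == "__main__":
    N_ROWS = 100_000  # raise toward production scale for a meaningful comparison
    for label, cols in (("old", "tenant_id, app_id"), ("new", "tenant_id, app_id, created_at")):
        conn = build(cols, n_rows=N_ROWS)
        elapsed, _ = time_pages(conn)
        print(f"{label} index ({cols}): {elapsed * 1000:.1f} ms for 50 pages")
        conn.close()
```

Both index variants must return identical pages; only the time to produce them should differ.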

Successfully merging this pull request may close this issue: get_paginate_workflow_app_logs experiencing slow query
4 participants