[Spike] Privacy Request Limit Concurrency / Too Many Open Connections (Timebox: 3 days) #810
Labels: enhancement (new feature or request)
To test:

```sh
PGPASSWORD=<postgres password from toml file> watch 'psql -h localhost -p 5432 -U postgres -d app -c "SELECT sum(numbackends) FROM pg_stat_database"'
```

This polls the fidesops Postgres database for the number of open connections every 2 seconds (watch's default interval).
POST {{host}}/privacy-request

```json
[
  {
    "requested_at": "2021-08-30T16:09:37.359Z",
    "identity": {"email": "[email protected]"},
    "policy_key": "{{policy_key}}"
  },
  {
    "requested_at": "2021-08-30T16:09:37.359Z",
    "identity": {"email": "[email protected]"},
    "policy_key": "{{policy_key}}"
  },
  {
    "requested_at": "2021-08-30T16:09:37.359Z",
    "identity": {"email": "[email protected]"},
    "policy_key": "{{policy_key}}"
  },
  {
    "requested_at": "2021-08-30T16:09:37.359Z",
    "identity": {"email": "[email protected]"},
    "policy_key": "{{policy_key}}"
  },
  {
    "requested_at": "2021-08-30T16:09:37.359Z",
    "identity": {"email": "[email protected]"},
    "policy_key": "{{policy_key}}"
  },
  {
    "requested_at": "2021-08-30T16:09:37.359Z",
    "identity": {"email": "[email protected]"},
    "policy_key": "{{policy_key}}"
  }
]
```
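For repeatability, the same batch can be submitted from a short script. This is a minimal sketch, assuming the `requests` library is installed; the host URL, token, email, and policy key are placeholders, not values from this issue:

```python
# Hypothetical helper: submit six identical privacy requests in one
# batch to put load on the connection pool. HOST, TOKEN, POLICY_KEY,
# and the identity email are all placeholders.
import requests

HOST = "http://localhost:8080/api/v1"  # placeholder fidesops URL
TOKEN = "<oauth access token>"         # placeholder
POLICY_KEY = "<policy_key>"            # placeholder

batch = [
    {
        "requested_at": "2021-08-30T16:09:37.359Z",
        "identity": {"email": "user@example.com"},  # placeholder identity
        "policy_key": POLICY_KEY,
    }
    for _ in range(6)
]

resp = requests.post(
    f"{HOST}/privacy-request",
    json=batch,
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
print(resp.status_code, resp.json())
```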
Early thoughts: connections are quickly swamped. There are numerous ROLLBACK entries and idle connections waiting on another query. It seems like we're adding new connections rather than reusing existing ones; you can very quickly get over 100.
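To see where those connections sit, a per-state breakdown of pg_stat_activity makes the idle pile-up visible. A minimal sketch, assuming psycopg2 is installed and the same local connection details as the watch command above:

```python
# Break open connections down by state (active, idle, idle in
# transaction, ...). Connection details mirror the psql command above;
# the password is a placeholder.
import psycopg2

conn = psycopg2.connect(
    host="localhost",
    port=5432,
    dbname="app",
    user="postgres",
    password="<postgres password from toml file>",  # placeholder
)
with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT state, count(*) FROM pg_stat_activity "
        "WHERE datname = 'app' GROUP BY state ORDER BY count(*) DESC"
    )
    for state, count in cur.fetchall():
        print(f"{state or 'no state'}: {count}")
conn.close()
```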
I'm guessing we are using SQLAlchemy's default connection pool settings.
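For context, SQLAlchemy's default QueuePool allows pool_size=5 plus max_overflow=10, so each Engine can hold up to 15 connections; if an Engine is created per task or thread, that multiplies quickly. A minimal sketch of setting the limits explicitly (DATABASE_URI is a placeholder):

```python
# Sketch: make the pool limits explicit rather than relying on the
# QueuePool defaults (pool_size=5, max_overflow=10, i.e. up to 15
# connections per Engine). DATABASE_URI is a placeholder.
from sqlalchemy import create_engine

DATABASE_URI = "postgresql://postgres:<password>@localhost:5432/app"

engine = create_engine(
    DATABASE_URI,
    pool_size=5,         # persistent connections kept open in the pool
    max_overflow=10,     # extra connections allowed under burst load
    pool_pre_ping=True,  # check connections are alive before handing them out
)
```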
seanpreston pushed a commit that referenced this issue on Jul 27, 2022:
* Reduce number of open connections:
  - Limit task concurrency to two per worker.
  - Create one Engine per Celery process, which opens up a connection pool. Create one Session per Celery process and use that session across privacy requests.
  - Close the session after the privacy request has finished executing. This just resets the session and returns connections back to the pool; it can be reused.
  - Remove unnecessary places where the session is closed manually, because the session is being used as a context manager and is already closed through that.
  - Pass the same Session that the privacy request is using through to TaskResources to be re-used to create ExecutionLogs instead of opening up a new Session.
  - Don't close the session when passing it into the ExecutionLog; wait until the entire privacy request is complete/exited.
* Define "self" for run_privacy_task - it's the task itself. For mypy's benefit, define that the session is a context manager.
* Make a session non-optional for graph_task.run_access_request, graph_task.run_erasure, and for instantiating TaskResources.
* Use missing db fixture.
* Add missing db resource.
* Update test to reflect new behavior: disabling a datasource while a request is in progress can cause related collections to be skipped once the current session is expired and the connection config has the most recent state. Because the same Session used to run the PrivacyRequest is now used for ExecutionLogs, saving an ExecutionLog runs a session.commit(), which expires the Session and causes the ConnectionConfig to have the most recent state the next time it is accessed.
* Update CHANGELOG.
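A rough sketch of the per-process Engine/Session pattern the commit message describes, using a custom Task base class. The class, setting, and task names here are illustrative, not the actual fidesops implementation:

```python
# Illustrative sketch: one Engine (and its pool) plus one sessionmaker
# per Celery worker process, with the Session closed (returned to the
# pool) after each privacy request. Names are hypothetical.
from celery import Celery, Task
from sqlalchemy import create_engine
from sqlalchemy.orm import Session, sessionmaker

DATABASE_URI = "postgresql://postgres:<password>@localhost:5432/app"  # placeholder
celery_app = Celery("fidesops")  # placeholder app name

class DatabaseTask(Task):
    _session_maker = None

    def get_new_session(self) -> Session:
        # Celery Task instances are singletons per worker process, so
        # this lazily builds exactly one Engine and pool per process.
        if self._session_maker is None:
            engine = create_engine(DATABASE_URI, pool_size=2, max_overflow=1)
            self._session_maker = sessionmaker(bind=engine)
        return self._session_maker()

@celery_app.task(base=DatabaseTask, bind=True)
def run_privacy_request_task(self: DatabaseTask, privacy_request_id: str) -> None:
    session = self.get_new_session()
    try:
        ...  # execute the privacy request using this session throughout
    finally:
        # close() resets the session and returns its connection to the
        # pool; the pool itself stays alive for the next task.
        session.close()
```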
sanders41 pushed a commit that referenced this issue on Sep 22, 2022 (same commit message as above).
Is your feature request related to a specific problem?
Yes: we open up too many connections against the fidesops Postgres database.
Describe the solution you'd like
Add sane defaults limiting how many privacy requests execute simultaneously. We used to have this when we capped the number of concurrent threads under asyncio (Reduce Number of Concurrent Threads #145), but now that we've moved to Celery we could use the `--concurrency` flag. We should also verify that we're closing connections after we open them.
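For illustration, the same cap can be set in Celery configuration instead of on the command line; worker_concurrency is a standard Celery setting, though the app name here is a placeholder:

```python
# Sketch: the programmatic equivalent of `celery worker --concurrency=2`.
from celery import Celery

celery_app = Celery("fidesops")  # placeholder app name
celery_app.conf.update(
    worker_concurrency=2,  # at most two tasks execute concurrently per worker
)
```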
Describe alternatives you've considered, if any
A description of any alternative solutions or features you've considered.
Additional context
If we open too many connections to the datastore (mainly the application database) concurrently, it can't do its job properly. We introduced this issue when we introduced Celery. In the future we can tune this further with other methods to improve accuracy and performance.