
[Spike] Privacy Request Limit Concurrency / Too Many Open Connections (Timebox: 3 days) #810

Closed
pattisdr opened this issue Jul 6, 2022 · 1 comment · Fixed by #944

pattisdr commented Jul 6, 2022

Is your feature request related to a specific problem?

  • Locally, when executing multiple consecutive privacy requests, you can run into:

```
worker_1            | [2022-07-06 13:03:43,882: WARNING/ForkPoolWorker-1]   File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 584, in connect
worker_1            |     return self.dbapi.connect(*cargs, **cparams)
worker_1            | [2022-07-06 13:03:43,883: WARNING/ForkPoolWorker-1]   File "/usr/local/lib/python3.9/site-packages/psycopg2/__init__.py", line 122, in connect
worker_1            |     conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
worker_1            | [2022-07-06 13:03:43,885: WARNING/ForkPoolWorker-1] sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL:  sorry, too many clients already
```

as we open up too many connections against the fidesops postgres database.

  • Also, some time after running all these privacy requests, I still have a lot of open connections. Investigate if there's been a regression in closing connections.

Describe the solution you'd like

  1. Add sane defaults for limiting how many privacy requests execute concurrently. We used to have this when we capped the number of concurrent threads under asyncio (Reduce Number of Concurrent Threads #145), but now that we've moved to Celery, we could use the --concurrency flag (see the sketch after this list).

  2. Verify we're closing connections after we open them.
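
A minimal sketch of what such a default could look like, assuming the Celery app is configured in Python (the app name and values below are illustrative, not the actual fidesops configuration):

```python
from celery import Celery

app = Celery("fidesops")  # hypothetical app name for illustration

app.conf.update(
    worker_concurrency=2,          # execute at most 2 tasks per worker
    worker_prefetch_multiplier=1,  # don't reserve extra tasks per child process
)
```

The same cap can also be passed on the command line via `celery -A <app> worker --concurrency=2`.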


Additional context

If we open too many connections to the datastore (mainly the application database) concurrently, it can't do its job properly. We introduced this issue when we introduced Celery. In the future we can tune this further with other methods to improve performance.

@pattisdr pattisdr added the enhancement New feature or request label Jul 6, 2022
@pattisdr pattisdr changed the title from "Privacy Request Limit Concurrency" to "Privacy Request Limit Concurrency / Too Many Open Connections" Jul 6, 2022
@pattisdr pattisdr assigned pattisdr and unassigned pattisdr Jul 11, 2022

pattisdr commented Jul 13, 2022

To test:

  1. In a terminal, run:

```
PGPASSWORD=<postgres password from toml file> watch 'psql -h localhost -p 5432 -U postgres -d app -c "SELECT sum(numbackends) FROM pg_stat_database"'
```

This polls the fidesops postgres database for its total connection count every 2 seconds (watch's default interval).

  2. In Postman, after running the prerequisite calls for "Minimum API calls to create an access request", send multiple requests to create privacy requests. To quickly exceed the default number of allowed connections (100), post several requests at once. Here, I am just running access requests for the same identity, and my graph only has the postgres collections in it.

POST {{host}}/privacy-request

```json
[
    {
        "requested_at": "2021-08-30T16:09:37.359Z",
        "identity": {"email": "[email protected]"},
        "policy_key": "{{policy_key}}"
    },
    {
        "requested_at": "2021-08-30T16:09:37.359Z",
        "identity": {"email": "[email protected]"},
        "policy_key": "{{policy_key}}"
    },
    {
        "requested_at": "2021-08-30T16:09:37.359Z",
        "identity": {"email": "[email protected]"},
        "policy_key": "{{policy_key}}"
    },
    {
        "requested_at": "2021-08-30T16:09:37.359Z",
        "identity": {"email": "[email protected]"},
        "policy_key": "{{policy_key}}"
    },
    {
        "requested_at": "2021-08-30T16:09:37.359Z",
        "identity": {"email": "[email protected]"},
        "policy_key": "{{policy_key}}"
    },
    {
        "requested_at": "2021-08-30T16:09:37.359Z",
        "identity": {"email": "[email protected]"},
        "policy_key": "{{policy_key}}"
    }
]
```
Early thoughts:

Connections are quickly swamped. pg_stat_activity shows numerous ROLLBACK entries from connections sitting idle, waiting on the client for another query. It seems like we're opening new connections rather than reusing existing ones. You can very quickly get over 100.

One such pg_stat_activity row, idle after a ROLLBACK and waiting on ClientRead:

```
16384 | app     | 472 |       10 | postgres |                  | 172.18.0.10 |                 |       60566 | 2022-07-12 20:20:39.683792+00 |                               | 2022-07-12 20:20:40.780926+00 | 2022-07-12 20:20:40.781194+00 | Client          | ClientRead          | idle   |             |              | ROLLBACK
```
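
To see at a glance where connections are piling up, the same stats can be grouped by state and last query; a quick sketch using psycopg2 (the host, database, and credential placeholders mirror the local dev setup above):

```python
import psycopg2

# Count open connections per (state, last query) to spot idle sessions
# left hanging after a ROLLBACK. Credentials are placeholders.
conn = psycopg2.connect(
    host="localhost", port=5432, dbname="app",
    user="postgres", password="<postgres password from toml file>",
)
with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT state, query, count(*) FROM pg_stat_activity "
        "GROUP BY state, query ORDER BY count(*) DESC"
    )
    for state, query, count in cur.fetchall():
        print(count, state, query)
conn.close()
```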

I'm guessing we are using SQLAlchemy's default QueuePool. I haven't looked into what changed with the addition of Celery yet, but we need to make sure the connection pool is not being shared across several celery workers. Setting a concurrency limit on the celery workers should help, but I think we also need to make sure we're reusing connections (see the sketch below).
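
One way to guarantee the pool isn't shared, sketched under the assumption of Celery's default prefork pool: build the Engine (and therefore its QueuePool) inside each worker process after the fork, so child processes never inherit pooled connections from the parent (the DSN and pool sizes are placeholders):

```python
from celery.signals import worker_process_init
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

SessionLocal = None  # rebuilt in each worker process


@worker_process_init.connect
def init_db(**kwargs):
    """Create a fresh Engine after the fork so its connection pool is
    private to this worker process rather than shared across forks."""
    global SessionLocal
    engine = create_engine(
        "postgresql://postgres:<password>@db:5432/app",  # placeholder DSN
        pool_size=5,     # steady-state connections for this process
        max_overflow=5,  # temporary burst headroom
    )
    SessionLocal = sessionmaker(bind=engine)
```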

@pattisdr pattisdr self-assigned this Jul 13, 2022
@pattisdr pattisdr self-assigned this Jul 20, 2022
@adriaaaa adriaaaa changed the title from "Privacy Request Limit Concurrency / Too Many Open Connections" to "[Spike] Privacy Request Limit Concurrency / Too Many Open Connections (Timebox: 3 days)" Jul 20, 2022
seanpreston pushed a commit that referenced this issue Jul 27, 2022
* Reduce number of open connections:

- Limit task concurrency to two per worker.
- Create one Engine per celery process which opens up a connection pool.  Create one Session per celery process and use that session across privacy requests.
- Close the session after the privacy request has finished executing.  This just resets the session and returns its connections to the pool, so it can be reused.
- Remove unnecessary places where session is closed manually because the session is being used as a context manager and is already closed through that.
- Pass the same Session that the privacy request is using through to TaskResources to be re-used to create ExecutionLogs instead of opening up a new Session.
- Don't close the session when passing it into the Execution Log, wait until the entire privacy request is complete/exited.

* Define "self" for run_privacy_task - it's the task itself.

For mypy's benefit, declare that the session is a context manager.

* Make the session non-optional for graph_task.run_access_request, graph_task.run_erasure, and for instantiating TaskResources.

* Use missing db fixture.

* Add missing db resource.

* Update test to reflect new behavior that disabling a datasource while a request is in progress can cause related collections to be skipped once the current session is expired and the connection config has the most recent state.

Because the same Session that is being used to run the PrivacyRequest is now being used for ExecutionLogs, the process of saving an ExecutionLog runs a session.commit() which expires the Session and causes the ConnectionConfig to have the most recent state the next time it is accessed.

* Update CHANGELOG.
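
The session lifecycle those bullets describe might look roughly like this (a sketch, not the actual fidesops code; run_access_request is a stand-in for graph_task.run_access_request, and SessionLocal is the per-process sessionmaker):

```python
from sqlalchemy.orm import Session, sessionmaker


def run_access_request(session: Session) -> None:
    """Stand-in for graph_task.run_access_request: executes the access
    graph, reusing the same session for ExecutionLog writes."""
    ...


def run_privacy_request_task(SessionLocal: sessionmaker) -> None:
    # One Session per privacy request, used as a context manager
    # (SQLAlchemy 1.4+): exiting the block calls close(), which resets
    # the Session and returns its connections to the pool for reuse.
    with SessionLocal() as session:
        run_access_request(session)
        session.commit()
```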
sanders41 pushed a commit that referenced this issue Sep 22, 2022