Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sessionstorage is constantly growing #5010

Open
xshadow opened this issue Apr 17, 2021 · 6 comments
Open

Sessionstorage is constantly growing #5010

xshadow opened this issue Apr 17, 2021 · 6 comments
Labels

Comments

@xshadow
Copy link

xshadow commented Apr 17, 2021

The session storage seems to be constantly growing
When running etherpad we see a constantly growing number of session storage values in the database.

Is there a way to clean them up? We already looked into the script, which didn't help. https://github.com/ether/etherpad-lite/blob/develop/src/bin/deleteAllGroupSessions.js . So it seems that this sessions are no group sessions.

A standard session storage entry looks like:

| sessionstorage:----KdezcUojWiGfuzBgo1L-OnF4SbbZ | {"cookie":{"path":"/","_expires":null,"originalMaxAge":null,"httpOnly":true,"sameSite":"Lax","secure":false}} |
| sessionstorage:---0sPTO0IIvtWBpbYpccywxSImVqSf_ | {"cookie":{"path":"/","_expires":null,"originalMaxAge":null,"httpOnly":true,"sameSite":"Lax","secure":false}} |
| sessionstorage:---2pUJfjIE00yXpIkrzvHvUEOCLKI2X | {"cookie":{"path":"/","_expires":null,"originalMaxAge":null,"httpOnly":true,"sameSite":"Lax","secure":false}} |

Server (please complete the following information):

  • Etherpad version: 1.8.13
  • OS: Debian Buster
  • Node.js version (node --version): v14.16.1
  • npm version (npm --version): 6.14.12

Additional context

All session storage keys values from mariadb:

MariaDB [etherpad]> SELECT COUNT(*) FROM store WHERE `key` LIKE 'sessionstorage%';
+----------+
| COUNT(*) |
+----------+
| 16306299 |
+----------+
1 row in set (14.974 sec)

None session storage keys values from mariadb:

MariaDB [etherpad]> SELECT COUNT(*) FROM store WHERE `key` NOT LIKE 'sessionstorage%';
+----------+
| COUNT(*) |
+----------+
| 54156416 |
+----------+
1 row in set (36.645 sec)
@webzwo0i
Copy link
Member

webzwo0i commented Apr 17, 2021

My first guess is that you can delete them after stopping Etherpad in case you run the default configuration (no special auth plugins and settings.json with requireAuthentication:false, requireAuthorization:false), but I'll take a deeper look when I have more time. Afaik sessions would just be re-generated as soon as users reconnect - but I need to be sure about this. They are different from group sessions (which you'd use, if you run an instance that can't be used without a valid session and generate the sessions via API), and thus as you noticed the script won't work.

They are also not used for author information (user names/author attribution across pads).

What they are used for is accessing /admin endpoints and other auth stuff depending on your configuration/plugins used.

#4898 has more discussion on the underlying issue.

@xshadow
Copy link
Author

xshadow commented Apr 17, 2021

Thanks for etherpad as software and the fast reply. Thanks for pointing me to this issue.

If I calculated this correct, those 16 million session won't require more space than 300MB, so this is fine. But I think it would be great if they could expire automatically or by or a session garbage collector job :)

@stale
Copy link

stale bot commented Jun 16, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix Wont Fix these things, no hate. label Jun 16, 2021
@webzwo0i
Copy link
Member

Closing this, as #4898 has more info and in the course of fixing #4898 we will need a way to purge old sessions anyway (manually or even automatically).

@webzwo0i
Copy link
Member

webzwo0i commented Dec 3, 2021

Reopening again, as #4898 does not necessarly require cleaning up old sessions...

@webzwo0i webzwo0i reopened this Dec 3, 2021
@stale stale bot removed wontfix Wont Fix these things, no hate. labels Dec 3, 2021
@webzwo0i webzwo0i added Bug and removed Question labels Dec 3, 2021
@rhansen
Copy link
Member

rhansen commented Dec 3, 2021

Design idea:

  • express-session config changes:
    • set maxAge, value TBD (maybe 7 days so the user doesn't have to keep logging in?)
    • set rolling=true so that a session stays alive as long as the user stays connected
    • set saveUninitialized=false to lower the load on the db (though it might not help in our case because I think we always modify the session object)
    • set resave=false to lower the load on the db (though it might not help due to rolling=true)
  • SessionStore changes:
    • implement the touch() method
    • when creating/updating/touching a session, (re)set a timer that auto-deletes the session's db record when it expires
  • call req.session.touch() each time the server receives a socket.io ping or message from the client

A problem with the above scheme: The constant .touch() calls might increase the load on the db considerably. We could reduce db load by skipping a db write unless it would extend the saved lifetime by more than some threshold. (In other words: Trade accuracy for reduced load.) For example, we could set maxAge to 14 days and only update the db record if it would extend the lifetime by 7 days or more. That should reduce db load to one write per week per session, which is trivial.

Another problem: We would need to clean up old session records after a dirty shutdown, or after upgrading from a version of Etherpad that doesn't expire sessions. ueberdb doesn't have cursor support so we can't just iterate over all records that match sessionstorage:*. We could add cursor support to ueberdb, but that would take a lot of effort. Alternatively, with some clever key prefixing we can iterate over old sessions in small batches. Here's one approach:

  • Add two new numeric db records:
    • instanceId: A counter that is incremented each time Etherpad starts up.
    • caughtUpToInstanceId: Used to clean up session state from previous Etherpad runs. Always less than or equal to instanceId.
  • When saving a session, use key `sessionstore:${instanceId}:${sid}` (instead of the current `sessionstorage:${sid}`) and delete records `sessionstore:${caughtUpToInstanceId}:${sid}` through `sessionstore:${instanceId - 1}:${sid}`.
  • When looking up a session ID, fetch all keys `sessionstore:${caughtUpToInstanceId}:${sid}` through `sessionstore:${instanceId}:${sid}` (inclusive) in parallel. Of the non-null results, use the one with the highest instance ID.
  • On startup, launch a worker thread to clean up or migrate session state from previous runs:
    for (; caughtUpToInstanceId < instanceId; ++caughtUpToInstanceId) {
      // The following query should return a small number of keys (the number of
      // sessions that were active when instance caughtUpToInstanceId exited).
      for (const key of getAllDbKeysWithPrefix(`sessionstore:${caughtUpToInstanceId}:`)) {
        const sess = getDbValue(key);
        deleteDbRecord(key);
        // If the session is still valid, save under the current instanceId and start a cleanup timer.
        if (!isSessionExpired(sess)) saveSession(sess);
      }
    }

To clean up legacy `sessionstorage:${sid}` records, we could do something like this:

const alphabet = 'abcdefghijklmnopqrstuvwxyz';
const sidCharset = `_-0123456789${alphabet}${alphabet.toUpperCase()}`;
for (const chars of cartesianProductGenerator(Array(4).fill(sidCharset))) {
  for (const key of getAllDbKeysWithPrefix(`sessionstorage:${chars.join('')}`) {
    deleteDbRecord(key);
  }
}      

But it would probably be better to just have the user issue a native DB query to delete all of the legacy records.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants