[24.2] Fix private role name performance issue #19679
This won't work: it breaks logic that assumes a private role name contains a user's email.
Or we could paginate the list of roles. That, at least, will fix the issue with the timeout. Anything else may be too intrusive for a bug fix. I'll try that.
I've tried exploring the column_property and the relationship approach, with the goal of having user emails corresponding to private roles loaded eagerly, so that we don't have the N+1 problem each time we retrieve roles. I think this won't work. Here's what I tried:
_private_role_user = relationship(
    "User",
    secondary="user_role_association",
    primaryjoin="and_(Role.id == UserRoleAssociation.role_id, Role.type == 'private')",
    secondaryjoin="UserRoleAssociation.user_id == User.id",
    lazy="joined",
    uselist=True,  # we need this to be True to catch errors in the name property
    viewonly=True,
)
@hybrid_property
def name(self):
    if self.type == "private":
        # Users eagerly loaded through the _private_role_user relationship.
        users = self._private_role_user
        if len(users) == 0 or users[0] is None or users[0].email is None:
            raise Exception("Did not find user for private role")
        elif len(users) > 1:
            raise Exception(f"Found multiple({len(users)}) users for one private role")
        return users[0].email
    else:
        return self._name

Role retrieval works correctly: we get role names or user emails (for private roles), and the database is called only once. However, this piles complexity onto what should have been a simple mapping. As a result:
stmt = (
    select(Role)
    .join(Role.users)
    .where(UserRoleAssociation.user_id == user.id)
    .where(Role.type == Role.types.SHARING)
)

which used to generate this SQL:

SELECT role.id, role.name, role.type
FROM role JOIN user_role_association ON role.id = user_role_association.role_id
WHERE user_role_association.user_id = 42 AND role.type = 'sharing';

But now it would be this:

SELECT role.id, role.name, role.type, user_1.id AS id_1, user_1.email
FROM role JOIN user_role_association ON role.id = user_role_association.role_id
LEFT OUTER JOIN (user_role_association AS user_role_association_1 JOIN user AS user_1 ON user_role_association_1.user_id = user_1.id) ON role.id = user_role_association_1.role_id AND role.type = 'private'
WHERE user_role_association.user_id = 42 AND role.type = 'sharing';

...which requires a call to

In summary, I don't think this approach will work. Instead, I'll add pagination and a page limit to the API endpoint. I'll also try to fix this on the dev branch by addressing the root cause: we should never rely on string comparison (of a private role name to a user email) to determine permissions.
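A minimal sketch of the pagination idea mentioned above, written against the standard-library sqlite3 module rather than the project's actual API or ORM layer. The simplified `role` table, the `MAX_PAGE_LIMIT` value, and the `get_roles_page` helper are all hypothetical names chosen for illustration; the real endpoint's parameters and cap are not stated in this thread.

```python
import sqlite3

# Hypothetical cap on page size; the actual limit is not given in this thread.
MAX_PAGE_LIMIT = 100


def get_roles_page(conn, offset=0, limit=MAX_PAGE_LIMIT):
    """Return one page of roles, never more than MAX_PAGE_LIMIT rows."""
    if offset < 0 or limit < 1:
        raise ValueError("offset must be >= 0 and limit must be >= 1")
    limit = min(limit, MAX_PAGE_LIMIT)  # clamp oversized requests to the cap
    # A stable ORDER BY keeps pages consistent between requests.
    return conn.execute(
        "SELECT id, name, type FROM role ORDER BY id LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()


# Simplified stand-in for the role table, filled with 250 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE role (id INTEGER PRIMARY KEY, name TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO role (name, type) VALUES (?, ?)",
    [(f"role-{i}", "private") for i in range(250)],
)

first_page = get_roles_page(conn, offset=0, limit=1000)  # request exceeds the cap
last_page = get_roles_page(conn, offset=200, limit=100)  # only 50 rows remain
```

Clamping the requested limit, rather than rejecting it, is one possible design choice; it bounds the work done per request regardless of what the client asks for.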
Specify correct role name for private roles. See comment in make_role definition for why this must be set.
Retrieves a mapping of private role ids to associated users' emails. Used to look up a user's email when generating a displayed_name for private roles.
Display the associated user email as the role name for private roles, and the role name for non-private roles.
Test failures are irrelevant.
Thanks!
Fixes #19654
The gist of the issue: #19654 (comment)
TODO:
Solution:
Intercept Role data after it has been retrieved from the database but before it is sent with the Response. Make one database call that retrieves a mapping of private role IDs to associated user emails, load it into a set, and use it to augment the Role data as needed: replace the names of private roles with the corresponding emails, or add the emails as an additional field.
Note: there's much duplication here, but to get rid of it we'd have to touch many places (fastapi controllers, services, legacy controllers, managers), and that kind of refactoring should be done on the dev branch. I've tried to change only what's necessary to fix the bug.
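The solution described above could be sketched roughly as follows. This is not the code from this PR: it uses the standard-library sqlite3 module and a simplified schema, and the `galaxy_user` table layout, `get_private_role_emails`, and `replace_private_role_names` are hypothetical names introduced only for illustration. Falling back to the stored name when no user is found is also an assumption.

```python
import sqlite3


def get_private_role_emails(conn: sqlite3.Connection) -> dict[int, str]:
    """Return {private role id: associated user's email} with a single query."""
    rows = conn.execute(
        """
        SELECT role.id, galaxy_user.email
        FROM role
        JOIN user_role_association AS ura ON ura.role_id = role.id
        JOIN galaxy_user ON galaxy_user.id = ura.user_id
        WHERE role.type = 'private'
        """
    ).fetchall()
    return dict(rows)


def replace_private_role_names(roles: list[dict], emails: dict[int, str]) -> list[dict]:
    """Return copies of the role dicts with private role names replaced by emails."""
    result = []
    for role in roles:
        role = dict(role)  # copy, so the retrieved data is left untouched
        if role["type"] == "private":
            # Assumption: keep the stored name if no associated user is found.
            role["name"] = emails.get(role["id"], role["name"])
        result.append(role)
    return result


# Simplified stand-in schema with two private roles and one sharing role.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE galaxy_user (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE role (id INTEGER PRIMARY KEY, name TEXT, type TEXT);
    CREATE TABLE user_role_association (user_id INTEGER, role_id INTEGER);
    INSERT INTO galaxy_user VALUES (1, 'alice@example.org'), (2, 'bob@example.org');
    INSERT INTO role VALUES
        (1, 'private-1', 'private'), (2, 'private-2', 'private'), (3, 'lab', 'sharing');
    INSERT INTO user_role_association VALUES (1, 1), (2, 2), (1, 3), (2, 3);
    """
)

# Role data as it might look after retrieval, before building the response.
roles = [
    {"id": role_id, "name": name, "type": role_type}
    for role_id, name, role_type in conn.execute("SELECT id, name, type FROM role ORDER BY id")
]
emails = get_private_role_emails(conn)
augmented = replace_private_role_names(roles, emails)
print([role["name"] for role in augmented])
# prints ['alice@example.org', 'bob@example.org', 'lab']
```

The email lookup costs one query no matter how many roles are returned, which is the property the solution relies on.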
Discarded approaches:
Reason for discarding: breaks logic on the client that assumes a private role name to contain a user's email.
(also applies to using any new non-mapped field, like "displayed_name")
Reasons for discarding: [24.2] Fix private role name performance issue #19679 (comment)
Reason for discarding: requires major changes on the client (role names displayed in dropdowns, etc.)
Performance
I've tested this locally on 40K users+roles+associations. The roles API endpoint on the release branch is indeed unusable (as reported in the linked issue), whereas this branch is fine.
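To illustrate why the difference grows with the number of roles, here is a small sketch that counts database round trips for an N+1 lookup versus a single joined query. It is not the benchmark used above: the schema is a simplified stand-in, and `CountingConnection`, `emails_n_plus_one`, and `emails_single_query` are hypothetical helpers, with N kept small so the example runs instantly.

```python
import sqlite3


class CountingConnection:
    """Thin wrapper that counts how many statements are sent to the database."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def execute(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params)


def emails_n_plus_one(db):
    """One query for the roles, then one more query per role."""
    role_ids = [row[0] for row in db.execute("SELECT id FROM role WHERE type = 'private' ORDER BY id")]
    emails = []
    for role_id in role_ids:
        row = db.execute(
            "SELECT u.email FROM user_role_association AS a "
            "JOIN galaxy_user AS u ON u.id = a.user_id WHERE a.role_id = ?",
            (role_id,),
        ).fetchone()
        emails.append(row[0])
    return emails


def emails_single_query(db):
    """One joined query that returns every private role's user email."""
    rows = db.execute(
        "SELECT u.email FROM role AS r "
        "JOIN user_role_association AS a ON a.role_id = r.id "
        "JOIN galaxy_user AS u ON u.id = a.user_id "
        "WHERE r.type = 'private' ORDER BY r.id"
    )
    return [row[0] for row in rows]


N = 200  # small stand-in for the 40K rows mentioned above
raw = sqlite3.connect(":memory:")
raw.executescript(
    """
    CREATE TABLE galaxy_user (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE role (id INTEGER PRIMARY KEY, name TEXT, type TEXT);
    CREATE TABLE user_role_association (user_id INTEGER, role_id INTEGER);
    """
)
raw.executemany("INSERT INTO galaxy_user VALUES (?, ?)", [(i, f"user{i}@example.org") for i in range(1, N + 1)])
raw.executemany("INSERT INTO role VALUES (?, ?, 'private')", [(i, f"private-{i}") for i in range(1, N + 1)])
raw.executemany("INSERT INTO user_role_association VALUES (?, ?)", [(i, i) for i in range(1, N + 1)])

slow, fast = CountingConnection(raw), CountingConnection(raw)
same_result = emails_n_plus_one(slow) == emails_single_query(fast)
print(slow.count, fast.count)  # prints 201 1
```

Both functions return the same emails; only the number of statements differs, and that number scales linearly with the role count in the N+1 case.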