Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[24.2] Fix private role name performance issue #19679

Conversation

jdavcs
Copy link
Member

@jdavcs jdavcs commented Feb 21, 2025

Fixes #19654

The gist of the issue: #19654 (comment)

TODO:

  • Add fix for notifications (select roles)

Solution:

Intercept Role data after it has been retrieved from the database but before it's been sent with the Response. Make one call to the database retrieving a mapping of private role IDs to associated user emails, load into a set and use to augment the Role data as needed, replacing names of private roles with corresponding emails, or adding them as an additional field.

Note: there's much duplication here, but to get rid of it we'd have to touch many places (fastapi controllers, services, legacy controllers, managers), and that kind of refactoring should be done on the dev branch. I've tried to change only what's necessary to fix the bug.

Discarded approaches:

  • Set the private role's description field to "Private role for [user' current email]" at runtime. (That field is null by default in newer versions and "private role for [user email]" in older versions.) The value is not persisted and is derived from the associated user's current email.
    Reason for discarding: breaks logic on the client that assumes a private role name to contain a user's email.
    (also applies for using any new non-mapped field, like "displayed_name')
  • A relationship mapping added to the Role model definition that eagerly loads associated user's email.
    Reasons for discarding: [24.2] Fix private role name performance issue #19679 (comment)
  • Use limit + pagination.
    Reason for discarding: requires major changes on the client (role names displayed in dropdowns, etc.)

Performance

I've tested this locally on 40K users+roles+associations. The roles api endpoint on the release branch is indeed unusable (as reported in the linked issue), whereas this branch is fine.

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
  • Try to edit permissions for libraries, library datasets, datasets, histories; and in the admin: role permissions, group permissions (on create and edit): observe that role names are displayed as names for non-private roles and emails of associated users for private roles - both in the drop downs and the selected values.

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@jdavcs jdavcs added kind/bug area/performance area/database Galaxy's database or data access layer labels Feb 21, 2025
@jdavcs jdavcs requested a review from a team February 21, 2025 19:01
@github-actions github-actions bot added this to the 25.0 milestone Feb 21, 2025
@jdavcs jdavcs marked this pull request as draft February 21, 2025 20:48
@jdavcs jdavcs changed the title [24.2] Dev role name displayed name description [24.2] Fix private role name performance issue Feb 21, 2025
@jdavcs
Copy link
Member Author

jdavcs commented Feb 21, 2025

This won't work. This breaks logic that assumes a private role name to contain a user's email.

@jdavcs
Copy link
Member Author

jdavcs commented Feb 22, 2025

Or we could paginate the list of roles. That, at least, will fix the issue with the timeout. Anything else may be too intrusive for a bug fix. I'll try that.

@jdavcs
Copy link
Member Author

jdavcs commented Feb 27, 2025

I've tried exploring the column_property and the relationship approach, with the goal of having user emails corresponding to private roles loaded eagerly, so that we don't have the N+1 problem each time we retrieve roles. I think this won't work. Here's what I tried:

  1. Add a relationship:
    _private_role_user = relationship(
        "User",
        secondary="user_role_association",
        primaryjoin="and_(Role.id == UserRoleAssociation.role_id, Role.type == 'private')",
        secondaryjoin="UserRoleAssociation.user_id == User.id",
        lazy="joined",
        uselist=True,  # we need this to be True to catch errors in the name property
        viewonly=True,
    )
  1. Use that relationship in the name hybrid property: if the role is private, return the associated user's email, otherwise return the role name. As a side benefit, it's easy to check for and raise errors if there's no user for a private role or if there's more than one:
@hybrid_property                                                                                
def name(self):                                                                                 
    if self.type == "private":                                                                  
        if len(self.private_role_user) == 0 or self.private_role_user[0] is None or self.private_role_user[0].email is None:  
            raise Exception("Did not find user for private role")                               
        elif len(self.private_role_user) > 1:                                                   
            raise Exception(f"Found multiple({len(self.private_role_user)}) users for one private role")         
        return self.private_role_user[0].email                                                  
    else:                                                                                       
        return self._name                                                                       
    return self._name 

Role retrieval works correctly: we get role names or user emails (for private roles), and the database is called only once.

However, this piles complexity onto what should have been a simple mapping. As a result:

  1. Selecting a role record from the database now requires multiple joins.
  2. As a consequence, the performance of a simple select(Role) is 10x worse. Here's the analysis from main:
  • Current version without the clever joins:
galaxy_main=> explain analyze SELECT r.id, r.name, r.type FROM role r;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on role r  (cost=0.00..9991.74 rows=387174 width=34) (actual time=0.016..76.857 rows=390077 loops=1)
 Planning Time: 0.049 ms
 Execution Time: 94.613 ms
(3 rows) 
  • New version with the joins:
galaxy_main=> explain analyze SELECT r.id, r.name, r.type, u.id, u.email FROM role r LEFT OUTER JOIN (user_role_association ura JOIN galaxy_user u ON ura.user_id = u.id) ON r.id = ura.role_id AND r.type = 'private';
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=34183.84..42888.51 rows=387174 width=61) (actual time=380.354..838.836 rows=390078 loops=1)
   Hash Cond: (ura.role_id = r.id) 
   Join Filter: ((r.type)::text = 'private'::text)
   Rows Removed by Join Filter: 582
   ->  Hash Join  (cost=19352.42..27055.17 rows=381682 width=31) (actual time=195.749..457.354 rows=390364 loops=1)
         Hash Cond: (ura.user_id = u.id) 
         ->  Seq Scan on user_role_association ura  (cost=0.00..6700.82 rows=381682 width=8) (actual time=1.095..61.483 rows=390364 loops=1)
         ->  Hash  (cost=14474.41..14474.41 rows=390241 width=27) (actual time=192.554..192.556 rows=390273 loops=1)
               Buckets: 524288  Batches: 1  Memory Usage: 26829kB
               ->  Seq Scan on galaxy_user u  (cost=0.00..14474.41 rows=390241 width=27) (actual time=0.022..79.469 rows=390273 loops=1)
   ->  Hash  (cost=9991.74..9991.74 rows=387174 width=34) (actual time=182.390..182.391 rows=390078 loops=1)
         Buckets: 524288  Batches: 1  Memory Usage: 29791kB
         ->  Seq Scan on role r  (cost=0.00..9991.74 rows=387174 width=34) (actual time=0.013..56.502 rows=390078 loops=1)
 Planning Time: 0.408 ms
 Execution Time: 854.367 ms
(15 rows) 
  1. When we retrieve roles, we should call unique() on the returned Result object ("...as it contains results that include joined eager loads against collections"); and this error will happen at runtime depending on which roles are returned.
  2. Most importantly, since select(Role) results in implicit joins, we run the risk of breaking anything that involves join(Role)! For example, in model/security.py we have a relatively simple query:
 stmt = ( 
     select(Role)
     .join(Role.users)
     .where(UserRoleAssociation.user_id == user.id)
     .where(Role.type == Role.types.SHARING)
 )

which used to generate this SQL:

SELECT role.id, role.name, role.type
FROM role JOIN user_role_association ON role.id = user_role_association.role_id 
WHERE user_role_association.user_id = 42 AND role.type = 'sharing';

But now it would be this:

SELECT role.id, role.name, role.type, user_1.id AS id_1, user_1.email
FROM role JOIN user_role_association ON role.id = user_role_association.role_id LEFT OUTER JOIN (user_role_association AS user_role_association_1 JOIN user AS user_1 ON user_role_association_1.user_id = user_1.id) ON role.id = user_role_association_1.role_id AND role.type = 'private'
WHERE user_role_association.user_id = 42 AND role.type = 'sharing';

...which requires a call to unique() on the result, and may or may not break... Yikes!!

In summary, I don't think this approach will work. Instead, I'll add pagination + a page limit to the API endpoint. Also, I'll try to fix this on the dev branch by addressing the root cause: we should never rely on string comparison (of private role name to user email) to determine permissions.

jdavcs added 4 commits March 10, 2025 11:13
Specify correct role name for private roles. See comment in make_role
definition for why this must be set.
@jdavcs jdavcs force-pushed the 24.2_role_name_displayed_name_description branch from c79cd77 to ecec57d Compare March 10, 2025 15:13
@jdavcs jdavcs force-pushed the 24.2_role_name_displayed_name_description branch from 590dbc0 to a7fcbec Compare March 11, 2025 04:34
@jdavcs jdavcs marked this pull request as ready for review March 11, 2025 04:48
@jdavcs
Copy link
Member Author

jdavcs commented Mar 11, 2025

test failures irrelevant

Copy link
Member

@mvdbeek mvdbeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@jdavcs jdavcs force-pushed the 24.2_role_name_displayed_name_description branch from 2b285b5 to ef09280 Compare March 11, 2025 17:07
@jdavcs jdavcs merged commit bd75278 into galaxyproject:release_24.2 Mar 11, 2025
52 of 55 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/database Galaxy's database or data access layer area/performance kind/bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants