Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

537: Endpoint to list frontend users #554

Merged
merged 8 commits into from
Jan 12, 2023

Conversation

Vechtomov
Copy link
Collaborator

Resolves #537

@Vechtomov
Copy link
Collaborator Author

Vechtomov commented Jan 8, 2023

I've left a couple of questions here #537, don't forget to consider them during review

@Vechtomov Vechtomov changed the title [WIP] 537: Endpoint to list frontend users 537: Endpoint to list frontend users Jan 8, 2023
@Vechtomov Vechtomov marked this pull request as ready for review January 8, 2023 23:20
@fozziethebeat fozziethebeat added this to the Admin MVP milestone Jan 9, 2023
users = users.filter(User.auth_method == auth_method)

if ge:
users = users.filter(User.display_name >= ge)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which is the display name of a User? I think that is a string, so how is comparing a string greater than ge?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is indeed possible and intended here: https://www.educba.com/postgresql-compare-strings/

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strings can be compared in alphabetical order and PostgreSQL supports string comparison operators. https://stackoverflow.com/questions/40601324/comparing-strings-in-postgres-using-comparison-operators

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we are comparing alphabetically? The display_name creation does follow an order when created?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we are comparing alphabetically. I'm not sure what you mean by following an order when created

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean I do not understand which could be the use case in which we might want to compare alphabetically.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're also trying to figure it out in the main thread :)

Copy link
Collaborator

@yk yk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the PR, I left a few comments.

Regarding your questions:

  • yes, for now I'd also accept an api_id like the other method. It's kinda ugly but only trusted frontends can submit one anyway
  • the endpoints returning a list of users is fine
  • I'm really not a fan of the username string comparisons here. as far as I can tell, we don't even have an index to support a prefix query here (although maybe the composite index would do the work). I'd be much more in favor of a classic limit and offset pagination, instead of the lt, gt, and max_count. @olliestanley is there a particular reason we're doing the prefix queries?

@@ -39,3 +39,7 @@ def prepare_tree(tree: list[Message], tree_id: UUID) -> protocol.MessageTree:
tree_messages.append(prepare_message(message))

return protocol.MessageTree(id=tree_id, messages=tree_messages)


def prepare_user(u: User) -> protocol.User:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is well-named, I'd go with something descriptive like db_user_to_protocol_user or you could add a @staticmethod to the protocol user class like from_db_user to construct one from a db user. Or a to_protocol_user method on the db user class that transforms the db user into a protocol user. I think I'd go for the last option, since the second would create a back-dependency from shared to backend, and doesn't result in a separate extra lonely function like here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

if lt:
users = users.filter(User.display_name < lt)

users = users.order_by(User.display_name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the query planner will probably take care of this, but could we do the sorting before the lt and gt comparisons?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment on lines 82 to 83
if ge:
users = users.filter(User.display_name >= ge)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this parameter should really be called gte in this case

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@olliestanley
Copy link
Collaborator

  • I'm really not a fan of the username string comparisons here. as far as I can tell, we don't even have an index to support a prefix query here (although maybe the composite index would do the work). I'd be much more in favor of a classic limit and offset pagination, instead of the lt, gt, and max_count. @olliestanley is there a particular reason we're doing the prefix queries?

@andreaskoepf would have to confirm on the reasoning for this, but I assumed it was for cases like wanting to lookup a user by username without knowing auth_method or api_client_id, so you could use these criteria to list all users with a certain username regardless of auth_method, api_client_id

@yk
Copy link
Collaborator

yk commented Jan 10, 2023

@andreaskoepf would have to confirm on the reasoning for this, but I assumed it was for cases like wanting to lookup a user by username without knowing auth_method or api_client_id, so you could use these criteria to list all users with a certain username regardless of auth_method, api_client_id

but then let's add a prefix filter (w/ corresponding index). probably the majority of use cases of the endpoint will be listing and paging, and it seems quite cumbersome to do that with the current implementation, especially when we impose some upper limit on the number of users returned (which we have to).

@Vechtomov
Copy link
Collaborator Author

Hi, @yk. Thanks for the good review. I've made changes but I still don't understand why we need to pass api_client_id for untrusted clients. We pass api_key with the request and it uniquely defines api_client.

@yk
Copy link
Collaborator

yk commented Jan 10, 2023

Hi, @yk. Thanks for the good review. I've made changes but I still don't understand why we need to pass api_client_id for untrusted clients. We pass api_key with the request and it uniquely defines api_client.

I think in your question, you referred to another endpoint where this is done. The reasoning there was the following: a trusted api key should be able to also see users of other keys, and would pass those keys along, i.e. saying "give me users of this other key". I honestly don't see a super good reason why we might do this, so I'd be comfortable leaving that away and just saying trusted frontends can just read globally.

@andreaskoepf
Copy link
Collaborator

andreaskoepf commented Jan 12, 2023

I am bit irritated by the paging discussion here.

I deliberately asked to select by interval-starting point + limit. Naive paging with offset and limit is not scalable. Selecting result elements BY OFFSET (!) means in many cases the db-engine runs a linear search over the results until it reaches the indices of the index-window to return. For larger datasets and high offsets this has very bad performance characteristics.

Regarding indices: BTREE indices store strings ordered (based on the configured collation) and are always used string comparisions == <= < > >= etc are used .. query engine will also use them for LIKE "prefix%" conditions.

@github-actions
Copy link

pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md

Copy link
Collaborator

@andreaskoepf andreaskoepf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should probably be protocol.FrontEndUser instead of protocol.User .. but I can do the fix quickly. Will merge now.. I think the paging-strategy has to be discussed. To me naive paging is the noob option .. but you guys all seem to love it. We should just add naive index paging everywhere and we will see what happens.

@andreaskoepf andreaskoepf merged commit 0d646e7 into LAION-AI:main Jan 12, 2023
@Vechtomov Vechtomov deleted the frontend-users-list branch January 13, 2023 18:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Endpoint to list frontend users
6 participants