Add aggregate query for datasets #193

Open
wants to merge 1 commit into
base: main

stuartmcalpine
Collaborator

Add the aggregate_datasets function to the Query class.

Allow for COUNT and SUM queries in the dataset table.

e.g., datareg.Query.aggregate_datasets("nfiles", agg_func="sum") would sum the total number of files in all datasets.

The rules are the same as for general queries: by default both the production and working schemas are queried and the results are joined. If query_mode != "both", only the query_mode schema is searched.
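For readers who want a concrete picture of the behaviour described above, here is a minimal sketch using the standard-library sqlite3 module. It is not the code in this PR: the table names, the allowed-column set, and the standalone `aggregate_datasets` function are hypothetical stand-ins, and it assumes that "joining" the two schemas means adding the per-schema values, which only makes sense for count and sum.

```python
import sqlite3

# Hypothetical stand-ins for the two schemas; not the real registry layout.
TABLES = {"working": "working_dataset", "production": "production_dataset"}
ALLOWED_FUNCS = {"count", "sum"}
ALLOWED_COLUMNS = {"dataset_id", "nfiles"}  # assumed column names


def aggregate_datasets(conn, column, agg_func="count", query_mode="both"):
    """Aggregate `column` over one schema, or over both when query_mode is "both"."""
    if agg_func not in ALLOWED_FUNCS:
        raise ValueError(f"unsupported agg_func: {agg_func!r}")
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown column: {column!r}")
    if query_mode == "both":
        modes = list(TABLES)
    elif query_mode in TABLES:
        modes = [query_mode]
    else:
        raise ValueError(f"unknown query_mode: {query_mode!r}")

    total = 0
    for mode in modes:
        # Identifiers were checked against fixed sets above, so the f-string is safe here.
        sql = f"SELECT {agg_func}({column}) FROM {TABLES[mode]}"
        (value,) = conn.execute(sql).fetchone()
        total += value or 0  # SUM over zero rows returns NULL (None)
    return total


def demo_connection():
    """Build an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    for table in TABLES.values():
        conn.execute(f"CREATE TABLE {table} (dataset_id INTEGER, nfiles INTEGER)")
    conn.executemany("INSERT INTO working_dataset VALUES (?, ?)", [(1, 3), (2, 5)])
    conn.executemany("INSERT INTO production_dataset VALUES (?, ?)", [(3, 10)])
    return conn
```

With the demo data, `aggregate_datasets(conn, "nfiles", agg_func="sum")` adds the working and production totals, while passing `query_mode="working"` restricts the query to the one schema.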

Collaborator

JoanneBogart left a comment


This looks good as it stands, but if you have time it could be enhanced somewhat:

  • add support for min, max and avg
  • allow queries on at least some other tables (alias, keyword, maybe dataset_keyword) but only if agg_func is "count"
  • for anything other than "count", check that the column value type is numeric
  • as a convenience for callers, for "count" allow the column argument to be None. The routine should then pick a column or just issue
    select count(*) from the_schema.the_table where...
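One possible shape for these checks is sketched below as a pure function that validates its arguments and returns the SQL text. Everything in it is an assumption made for illustration: the function name `build_aggregate_sql`, the table names, and the column-type map are hypothetical and do not reflect the actual schema or the code in this PR. The where clause is left out.

```python
AGG_FUNCS = {"count", "sum", "min", "max", "avg"}

# Hypothetical table -> {column: Python type} map; the real schema may differ.
TABLE_COLUMNS = {
    "dataset": {"dataset_id": int, "nfiles": int, "total_disk_space": float, "name": str},
    "alias": {"alias_id": int, "alias": str},
    "keyword": {"keyword_id": int, "keyword": str},
}
NUMERIC_TYPES = (int, float)


def build_aggregate_sql(table, column=None, agg_func="count"):
    """Validate the request and return the aggregate SELECT statement."""
    if agg_func not in AGG_FUNCS:
        raise ValueError(f"unsupported agg_func: {agg_func!r}")
    if table not in TABLE_COLUMNS:
        raise ValueError(f"unknown table: {table!r}")
    # Tables other than dataset are only opened up for "count".
    if table != "dataset" and agg_func != "count":
        raise ValueError(f"only 'count' is allowed on table {table!r}")

    if column is None:
        # Convenience case: no column given, so count rows directly.
        if agg_func != "count":
            raise ValueError("a column is required unless agg_func is 'count'")
        return f"SELECT count(*) FROM {table}"

    columns = TABLE_COLUMNS[table]
    if column not in columns:
        raise ValueError(f"unknown column {column!r} in table {table!r}")
    # Anything other than count needs a numeric column.
    if agg_func != "count" and columns[column] not in NUMERIC_TYPES:
        raise TypeError(f"{agg_func!r} requires a numeric column, got {column!r}")
    return f"SELECT {agg_func}({column}) FROM {table}"
```

One caveat if this were combined with the default "both" mode: count and sum can be added across schemas and min and max can be reduced again, but an average cannot be recovered from two per-schema averages without also knowing the row counts.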

2 participants