Add aggregate query for datasets #193

stuartmcalpine · 2025-03-14T21:59:03Z

Add the aggregate_datasets function to the Query class.

Allow COUNT and SUM queries on the dataset table.

For example, datareg.Query.aggregate_datasets("nfiles", agg_func="sum") returns the total number of files across all datasets.

The rules are the same as for general queries: by default, both the production and working schemas are queried and the results are combined. If query_mode != "both", only the query_mode schema is searched.
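A rough sketch of the query_mode behaviour described above, using SQLite attached databases as stand-ins for the production and working schemas. The names `make_demo_connection` and `aggregate_sketch`, the `dataset_id` column, and the choice to combine per-schema results by adding them are assumptions for illustration only, not the code in this PR.

```python
import sqlite3

SCHEMAS = ("production", "working")      # schema names taken from the description
ALLOWED_FUNCS = ("count", "sum")         # the two aggregates this PR mentions
ALLOWED_COLUMNS = ("dataset_id", "nfiles")  # hypothetical column whitelist


def make_demo_connection():
    """Build an in-memory database with one `dataset` table per schema."""
    conn = sqlite3.connect(":memory:")
    # Attach both schemas first; ATTACH is not allowed inside a transaction.
    for schema in SCHEMAS:
        conn.execute(f"ATTACH DATABASE ':memory:' AS {schema}")
    demo_rows = {"production": [10, 20], "working": [1, 2, 3]}
    for schema, nfiles in demo_rows.items():
        conn.execute(
            f"CREATE TABLE {schema}.dataset "
            "(dataset_id INTEGER PRIMARY KEY, nfiles INTEGER)"
        )
        conn.executemany(
            f"INSERT INTO {schema}.dataset (nfiles) VALUES (?)",
            [(n,) for n in nfiles],
        )
    conn.commit()
    return conn


def aggregate_sketch(conn, column, agg_func="count", query_mode="both"):
    """Aggregate `column` over one schema, or over both and add the results."""
    if agg_func not in ALLOWED_FUNCS:
        raise ValueError(f"unsupported agg_func: {agg_func!r}")
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown column: {column!r}")
    if query_mode == "both":
        schemas = SCHEMAS
    elif query_mode in SCHEMAS:
        schemas = (query_mode,)
    else:
        raise ValueError(f"unknown query_mode: {query_mode!r}")

    total = 0
    for schema in schemas:
        sql = f"SELECT {agg_func.upper()}({column}) FROM {schema}.dataset"
        (value,) = conn.execute(sql).fetchone()
        total += value or 0  # SUM over an empty table yields NULL (None)
    return total


conn = make_demo_connection()
print(aggregate_sketch(conn, "nfiles", agg_func="sum"))                        # 36
print(aggregate_sketch(conn, "nfiles", agg_func="sum", query_mode="working"))  # 6
```

Adding the per-schema results only works for COUNT and SUM; other aggregates would need a different way of combining the two schemas.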

JoanneBogart

This looks good as it stands, but if you have time it could be enhanced somewhat:

- Add support for min, max and avg.
- Allow queries on at least some other tables (alias, keyword, maybe dataset_keyword), but only if agg_func is "count".
- For anything other than "count", check that the column value type is numeric.
- As a convenience for callers, allow the column argument to be None for "count". The routine should then either pick a column or just issue select count(*) from the_schema.the_table where...
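One possible shape for these checks, as a sketch only. The function name `build_aggregate_sql`, the `TABLE_COLUMNS` metadata, and every column name other than nfiles are hypothetical; the real schema and Query class may look quite different.

```python
AGG_FUNCS = ("count", "sum", "min", "max", "avg")
NUMERIC_TYPES = (int, float)

# Hypothetical column metadata; only "nfiles" appears in this PR.
TABLE_COLUMNS = {
    "dataset": {"dataset_id": int, "name": str, "nfiles": int},
    "keyword": {"keyword_id": int, "keyword": str},
}


def build_aggregate_sql(schema, table, column=None, agg_func="count"):
    """Validate the request and return the aggregate SELECT statement."""
    if not schema.isidentifier():
        raise ValueError(f"invalid schema name: {schema!r}")
    if agg_func not in AGG_FUNCS:
        raise ValueError(f"unsupported agg_func: {agg_func!r}")
    if table not in TABLE_COLUMNS:
        raise ValueError(f"unsupported table: {table!r}")
    # Tables other than dataset are only opened up for counting.
    if table != "dataset" and agg_func != "count":
        raise ValueError(f"only 'count' is allowed on table {table!r}")

    if column is None:
        # Convenience case: no column given, so count rows directly.
        if agg_func != "count":
            raise ValueError("a column is required unless agg_func is 'count'")
        return f"SELECT COUNT(*) FROM {schema}.{table}"

    columns = TABLE_COLUMNS[table]
    if column not in columns:
        raise ValueError(f"unknown column {column!r} in table {table!r}")
    # sum/min/max/avg are restricted to numeric columns, as suggested.
    if agg_func != "count" and columns[column] not in NUMERIC_TYPES:
        raise ValueError(f"column {column!r} is not numeric")
    return f"SELECT {agg_func.upper()}({column}) FROM {schema}.{table}"


print(build_aggregate_sql("working", "keyword"))
print(build_aggregate_sql("production", "dataset", "nfiles", "avg"))
```

Checking names against a fixed whitelist before formatting them into the statement also keeps caller-supplied strings out of the SQL text.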

Add aggregate query for datasets

63e4bcf

stuartmcalpine requested a review from JoanneBogart March 14, 2025 22:00

JoanneBogart reviewed Mar 14, 2025
