Add pydantic curation model and improve curation format and merging rules #3760

Draft: wants to merge 4 commits into base: main

alejoe91
Member

This PR goes in the direction of adding more structure to the curation format.

By defining a Pydantic model, we can add proper descriptions, types, and validation strategies for the curation.
This will make the format easier to validate and easier for third-party software to adopt.

alejoe91 added the "curation" (Related to curation module) label on Mar 11, 2025
sparsity_overlap: float = 0.75,
new_id_strategy: str = "append",
return_new_unit_ids: bool = False,
format: str = "memory",
Collaborator

zm711 commented on Mar 11, 2025

One general question that will matter for typing going forward. We have been moving toward writing:

"append" | "new"

for typing, but static type analysis programs don't like this, so I assume pydantic won't either. However, str is not accurate either, because the argument doesn't accept any string, only specific strings. So in this case, should we move the library over to

Literal["append", "new"]

I forget the actual options, so "new" is just something I made up for the example.

Or does pydantic only accept str and not Literal?
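
For reference, here is a minimal sketch of what the Literal spelling could look like, using only the standard library. The names `NewIdStrategy` and `merge_units_sketch` are hypothetical, and "new" is the made-up placeholder option from this comment, so this is not the actual signature in the PR. Note that Literal takes comma-separated values rather than `|`.

```python
from typing import Literal, get_args

# Hypothetical alias for illustration; "new" is a placeholder, not a real option.
NewIdStrategy = Literal["append", "new"]


def merge_units_sketch(new_id_strategy: NewIdStrategy = "append") -> str:
    """Illustrative stand-in for a function that accepts a restricted string."""
    allowed = get_args(NewIdStrategy)  # the tuple of permitted values
    # Type hints are not enforced at runtime, so the check is explicit here.
    if new_id_strategy not in allowed:
        raise ValueError(
            f"new_id_strategy must be one of {allowed}, got {new_id_strategy!r}"
        )
    return new_id_strategy
```

With this spelling, editors and static type checkers can show the permitted values to the end user, and a validation library that understands Literal can reuse the same annotation instead of a hand-written check.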

Member Author

I think pydantic only accepts Literal.

Why do you say that type analysis programs don't like "append" | "new" for typing?

Collaborator

I don't know. In VS Code I only get a warning saying that the types "append" and "new" are not defined. Others (I think Heberto) have asked why we don't use Literal["append", "new"], so maybe he is seeing the typing warning too. I just want to make sure we fit the pydantic model while still being useful to the end user. Saying str is not useful to an end user who relies on type hints, because the argument is actually a Literal. I think adding Literal clutters things, but if we are now relying on a tool that expects Literal, then we have to use it, and we should move the whole code base in that direction for consistency.

I think static type analysis programs assume "append" should be a type, because we are not specifying that it is a literal. So although Python allows it, I think static type checkers don't know what to do with it. It is a little similar to the Optional vs. optional debate in Python type hinting.
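
A small sketch of why the bare form trips up tooling: `"append" | "new"` is an operation on two str instances, not on types, so it fails as soon as something tries to evaluate it, whereas the Literal form evaluates to a proper type. The helper name `evaluates_as_annotation` is made up for this illustration.

```python
from typing import Literal


def evaluates_as_annotation(source: str) -> bool:
    """Return True if `source` can be evaluated as an annotation expression."""
    try:
        eval(source, {"Literal": Literal})
    except TypeError:
        # str does not implement `|`, so '"append" | "new"' ends up here.
        return False
    return True


bare_ok = evaluates_as_annotation('"append" | "new"')
literal_ok = evaluates_as_annotation('Literal["append", "new"]')
print(bare_ok, literal_ok)  # False True
```

The bare form only goes unnoticed where the annotation is never evaluated, for example in a docstring or under `from __future__ import annotations`. Any tool that resolves annotations at runtime has to evaluate them, which is presumably where the incompatibility shows up.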

Member Author

Actually, I think using "append" | "new" is not supported...

Collaborator

That is exactly what I'm saying!

I prefer it, but it is not supported. So we need to switch! I don't want us to switch to str; I want us to switch to Literal.

Member Author

Got it! That makes sense to me :)
