Add pydantic curation model and improve curation format and merging rules #3760
base: main
sparsity_overlap: float = 0.75,
new_id_strategy: str = "append",
return_new_unit_ids: bool = False,
format: str = "memory",
One general question that will matter for typing going forward. We have been moving toward writing "append" | "new" in type hints, but type analysis programs don't like this, so I assume pydantic won't either. str, however, is not accurate either, because the function doesn't accept any string, only specific strings. So in this case, should we move the library over to Literal["append", "new"]? (I forget the actual argument values, so "new" is just something I made up for the example.) Or does pydantic only accept str and not Literal?
I think pydantic only accepts Literal.
Why do you say that type analysis programs don't like "append" | "new" for typing?
I don't know. In VS Code I only get a warning saying that the types in "append" | "new" are not defined. Others (I think Heberto) have asked why we don't use Literal["append", "new"], so maybe he is seeing the typing warning too. I just want to make sure we fit the pydantic model while also being useful to the end user. Saying str is not useful to an end user who relies on type hints, because the argument is actually a Literal. I think adding Literal clutters things, but if we are now relying on a tool that expects Literal, then we have to use it, and we should move the whole code base in that direction for consistency.
I think static type analysis programs assume that "append" should be a type because we are not specifying that it is a literal. So although Python allows it, I think static type checkers don't know what to do with it. It is a little similar to the Optional, optional debate in type hinting in Python.
Actually, I think using "append" | "new" is not supported...
That is exactly what I'm saying!
I prefer it, but it is not supported. So we need to switch! I don't want us to switch to str; I want us to switch to Literal.
Got it! That makes sense to me :)
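For reference, here is a minimal sketch of the distinction discussed in this thread, using only the standard library. The alias name `NewIdStrategy` and the function `pick_strategy` are hypothetical, and "new" is only the placeholder value used in the comments above, not a confirmed option of the real argument.

```python
from typing import Literal, get_args

# Hypothetical alias: restricts the argument to specific strings.
# "new" is the made-up placeholder from the discussion, not a real option.
NewIdStrategy = Literal["append", "new"]


def pick_strategy(new_id_strategy: NewIdStrategy = "append") -> str:
    """Return the strategy after checking it against the Literal's values."""
    allowed = get_args(NewIdStrategy)  # the strings listed in the Literal
    if new_id_strategy not in allowed:
        raise ValueError(f"new_id_strategy must be one of {allowed}")
    return new_id_strategy


# A bare union of strings is not a type expression: evaluating it fails
# because str does not implement the | operator.
try:
    "append" | "new"
    bare_union_ok = True
except TypeError:
    bare_union_ok = False
```

With a `Literal` annotation, both static checkers and runtime code can recover the allowed values, whereas `str` accepts any string and a bare `"append" | "new"` cannot be evaluated as an annotation at all.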
…e into curation-pydantic
This PR goes in the direction of adding more structure to the curation format.
By defining a Pydantic model, we can add proper descriptions, types, and validation strategies for the curation.
This will make the format easier to validate and easier for third-party software to adopt.
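
As an illustration of what descriptions, types, and validation look like in a Pydantic model, here is a small sketch. It is not the model added by this PR: the class name `MergeOptions` is hypothetical, the fields simply mirror the parameters visible in the diff above, the description strings are placeholders, and "new" is again only the made-up example value from the review thread.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class MergeOptions(BaseModel):
    """Hypothetical model, for illustration only (not the PR's curation model)."""

    sparsity_overlap: float = Field(0.75, description="Placeholder description.")
    # Literal restricts the field to specific strings; "new" is a placeholder value.
    new_id_strategy: Literal["append", "new"] = Field(
        "append", description="Placeholder description."
    )
    return_new_unit_ids: bool = Field(False, description="Placeholder description.")


def is_valid(**kwargs) -> bool:
    """Return True if the keyword arguments pass the model's validation."""
    try:
        MergeOptions(**kwargs)
    except ValidationError:
        return False
    return True
```

Under these assumptions, `is_valid(new_id_strategy="replace")` is rejected by the `Literal` field, while a plain `str` annotation would have accepted it silently.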