-
Notifications
You must be signed in to change notification settings - Fork 571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add type annotations to constructor parameters in Literal #1498
Add type annotations to constructor parameters in Literal #1498
Conversation
With the caveat that ... for the last few years, I've been working in C++ and I have only recently returned to Python which means that I haven't had any experience of using type annotations but the rationale of "opening up Python code to easier static analysis and refactoring" as described in PEP484 seems completely appropriate for a library and for this change ... your decision looks sound to me, fwiw.
Your call, you're in driving seat.
Ew. Sometimes it's more work negotiating the contrib process than it is to make the fix. Cheers Graham |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your change looks good but more is needed to make it pass type validation, I made a PR against your branch to fix it.
Same here, though I've been working with Kotlin instead of Python for the last few years. I haven't had any experience with type annotations either, but was excited when I heard about it.
Cool, yeah it seems like going with the type annotations is the way to go then.
Definitely. |
rdflib/term.py
Outdated
@@ -548,7 +556,7 @@ def __new__(cls, lexical_or_value, lang=None, datatype=None, normalize=None): | |||
"per http://www.w3.org/TR/rdf-concepts/#section-Graph-Literal" | |||
) | |||
|
|||
if lang and not _is_valid_langtag(lang): | |||
if lang is not None and not _is_valid_langtag(lang): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense, though would like @gjhiggins to also sign off on it as this is what was in #1494.
The reason this makes sense to me is that _is_valid_langtag
should be true for any lang tag, and if it is not true for a lang tag then the literal should not be constructed.
With the way it was if lang and not _is_valid_langtag(lang):
there is range of values for lang
for which _is_valid_langtag
would be false, but for which Literal objects would be constructed - and those literal objects would then have invalid language tags.
But I also think the typing here is very useful to prevent any confusion.
This should also ideally only be merged with #1497 - and we could do with a couple of more tests for this, some ideas:
I would be happy to add them myself afterwards though. |
Regarding |
Basically it should be (and with your changes is I think) impossible to construct a Literal that does not have a valid language tag. |
Yep, I agree that it will fail with the current implementation of the code, but I guess my more general question is whether we should be including the actual unit test for this at all? My objection to including the unit test is (as I mentioned) the (potential) large increase in unit tests and argument verification code if we were to apply this same logic and write unit tests passing in random objects to functions that have defined the types they support in their type annotations. Also, having this test means that if we refactor this function, we can't assume that |
I would say yes we should, because semantically there will never be a case where a Literal should construct if it is supplied with an empty dict as lang tag, and that the appropriate exception for this would always be ValueError. Refactoring may change how this exception occurs, but it should always occur when the language tag is not valid, and it will never be valid if it is not a string that is at least 1 character long and complies with the other requirements detailed in the rdf spec.
The right way to test this would be with parameterized tests, I will submit another PR against your branch with #1497 cherry picked in and with parameterized tests added to show you what I mean. Also I think covering some important edge cases would be best, to ensure that we don't get regressions and especially that the same regression this is fixing does not prop up again.
I don't think there is any reason to do runtime type checks, merely by the consequence of checking _is_valid_langtag will already exclude all values that are not strings. Maybe the exact error message could use some tuning there, but already throwing a ValueError (which will be the case) will be a massive improvement over the current behaviour. |
@veyndan I made another PR against your branch: https://github.com/veyndan/rdflib/pull/2 This cherry picks the changes from #1497 into this branch and then adds additional tests using pytest parametrize. In general it is nice to have PRs as small as possible but #1497 and this PR is very linked and I think both are fairly safe and uncontroversial. PS: Thanks for your contribution and patience. |
I understand that
Parameterized tests don't really change the number of tests that are run though, nor do they eliminate the extra code required to write these tests. Granted the tests would be shorter, but they're still something that need to be maintained, and the presence of them in any format should be considered. Just to make sure I fully understand your position, here's what both our positions are (I believe — and please correct me if I'm wrong):
Thanks for the PR, but I'll just hold off on reviewing it until we've resolved our discussion. Also, #1497 has been merged to master now. |
The answer to this question is not really very straight forward. In brief, I think having something like this would be wrong: if not isinstance(lang, str):
raise TypeError(f"Type of lang {lang!r} is invalid") And actually if you did this, mypy would error out, because However in this case (state of https://github.com/veyndan/rdflib/pull/2) Now as to the tests, there is a well defined value space for the language tag: it must be a string that conforms to the RDF spec for lang tag, and this value space is something that has to be checked outside of the type system as it is not expressible in the type system, and if this check does not pass an error should be raised, and we know this check should never pass for any value in this array, because they do not lie within the value space: So to me, it is reasonable to test that a value of |
I don't believe that's true based on the documentation where it says the
I definitely agree that we should raise a With the raising of Are you aware of such a feature of mypy that would cause the following to cause an error in mypy?
|
I think the best way to resolve this is to start from where we agree:
If To me, I do expect I think there is maybe some argument for providing a way to disable all value space validation for cases where someone wants best effort processing of broken RDF, but given value space validation is done, and it is always done, I think it is very reasonable to always expect an error when the provided value falls outside of the value space, and Consider this code: from typing import Any, Optional, Type
import pytest
class EvenNumber:
value: int
def __init__(self, value: int) -> None:
if value % 2 != 0:
raise ValueError(f"{value} is not divisible by 2")
@pytest.mark.parametrize(
"value, error",
[
(2, None),
(1, ValueError),
({}, TypeError),
([], TypeError),
("not a number", TypeError),
],
)
def test_value_space(value: Any, error: Optional[Type[Exception]]) -> None:
if error is not None:
with pytest.raises(error):
EvenNumber(value)
else:
EvenNumber(value) This generates the following output: $ poetry run pytest tests/test_valspace.py
============================================================================ test session starts ============================================================================
platform linux -- Python 3.9.9, pytest-6.2.5, py-1.10.0, pluggy-0.13.1
rootdir: /home/iwana/sw/d/gitlab.com/aucampia/pvt/scratchpad/tech/py3, configfile: pyproject.toml
plugins: subtests-0.5.0, cov-2.12.1
collected 5 items
tests/test_valspace.py::test_value_space[2-None] PASSED [ 20%]
tests/test_valspace.py::test_value_space[1-ValueError] PASSED [ 40%]
tests/test_valspace.py::test_value_space[value2-TypeError] PASSED [ 60%]
tests/test_valspace.py::test_value_space[value3-TypeError] PASSED [ 80%]
tests/test_valspace.py::test_value_space[not a number-TypeError] PASSED [100%]
============================================================================= 5 passed in 0.02s ============================================================================= I did no explicit type checks here, but I still get
Today I learn, I guess this behaviour makes some sense, regardless though, I still do not think that the following is needed: if not isinstance(lang, str):
raise TypeError(f"Type of lang {lang!r} is invalid") And again I mainly don't think it is needed because a type error will already be raised if lang is not a string with your code as is.
I think maybe pyright can do it, but I am not sure that really is not that big a problem, anyway not at the moment. And I think it is also good to consider that a lot of rdflib users don't use mypy, and this may remain the case for a long time, because rdflib is used a lot in academic and scientific settings. |
Thanks for the write-up! I think you raised some really good points, but I think I'll still need to mull it over a bit more. In order to not hold up these PRs any more than they already have, I'll add the proposed tests (thank you for all the PRs you've made against my fork btw!) |
_value needs an explicit type and also added types for: - `lexical_or_value` - `datatype` - `normalize`
I think this maybe need a rebasing from master to get cleaner diff, not sure why it shows tests being added in diff (master...veyndan:literal-lang-TypeAnnotation) from master if they are already there: Lines 48 to 72 in eaa353f
|
Rebased onto master |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks
@gjhiggins From your comment here #1494 (comment), I opted to go for type annotations instead of explicit exceptions. I guess the difference between cleaner code by just having type annotations with no explicit runtime checks and having more code with explicit runtime checks is a subjective one, so I'd be interested to hear what you think. Again, just pulled this out into a separate PR as I feel it's out of the scope of the other PR.
Also, the base branch for this should be #1494, but I'm not sure how to do that in GitHub when both the branches are upstream.