Missing "T" in RFC3339 print? #185

Closed
aaronsteers opened this issue Nov 8, 2024 · 8 comments
Labels
question Further information is requested

Comments

@aaronsteers

aaronsteers commented Nov 8, 2024

I'm seeing a surprising result from format_rfc3339().

I thought that this would always contain a "T" to delimit the date and time part, but I'm actually seeing space as a delimiter.

$ poetry run python
Python 3.10.12 (main, Aug  1 2023, 18:25:02) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import whenever
>>> now = whenever.Instant.now()
>>> print(now)
2024-11-08T19:41:17.404921Z
>>> now.__str__()
'2024-11-08T19:41:17.404921Z'
>>> now.format_rfc3339()
'2024-11-08 19:41:17.404921Z'

The Instant class seems to correctly use the "T" delimiter in the implicit __str__() operation, but calling format_rfc3339() gives me the space delimiter instead - which I do not want.

Context: I'm hoping to move from Pendulum to whenever, and trying to ensure I have deterministic string renderings when I serialize these in a jsonified output format.

@ariebovenberg
Owner

ariebovenberg commented Nov 8, 2024

Hi @aaronsteers , thanks for posting.

RFC3339 has a "T" by default, but allows other characters—space being the most common option.

NOTE: ISO 8601 defines date and time separated by "T".
Applications using this syntax may choose, for the sake of
readability, to specify a full-date and full-time separated by
(say) a space character.

(source: RFC3339 spec)

Over time, "RFC3339" has become almost synonymous with this "readable" alternative to ISO8601. For this reason, I decided to use a space character since the format_common_iso() method also exists and does have a "T" separator. Would that work for you?

>>> Instant.now().format_common_iso()
'2024-11-08T20:59:45.850463Z'

I've thought of adding a sep= argument to format_rfc3339(), but it's not high prio since format_common_iso() should also work.
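To make the separator idea concrete, here is a minimal standard-library sketch. It does not use whenever, and `format_utc` is a hypothetical helper name rather than anything the library provides; it only illustrates how a `sep` parameter switches between the two renderings discussed above.

```python
from datetime import datetime, timezone


def format_utc(dt: datetime, sep: str = "T") -> str:
    """Hypothetical helper (not part of whenever): UTC timestamp with a chosen separator."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    text = dt.astimezone(timezone.utc).isoformat(sep=sep)
    # After converting to UTC, isoformat() always ends in "+00:00" (6 characters);
    # swap that suffix for "Z".
    return text[:-6] + "Z"


moment = datetime(2024, 11, 8, 19, 41, 17, 404921, tzinfo=timezone.utc)
print(format_utc(moment))           # "T" separator
print(format_utc(moment, sep=" "))  # space separator
```

Making "T" the default keeps the output valid under both ISO 8601 and RFC 3339, and the space becomes an explicit opt-in.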

PS: there is one subtler difference between ISO8601 and RFC3339: ISO8601 can represent UTC offsets at second-level precision, i.e. 2020-01-01T00:00:00+00:00:01 is valid ISO8601 but not RFC3339. Unless you're dealing with pre-1950s timezones, you shouldn't have to worry about this though.
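The offset difference can be reproduced with the standard library alone. The sketch below assumes a simplified regular expression for the RFC 3339 `date-time` shape (it checks layout only, not value ranges), and `looks_like_rfc3339` is a hypothetical name.

```python
import re
from datetime import datetime, timedelta, timezone

# Simplified shape of an RFC 3339 date-time; field values are not range-checked.
RFC3339_SHAPE = re.compile(
    r"\d{4}-\d{2}-\d{2}"            # full-date
    r"[Tt ]"                        # separator: "T", "t", or a space
    r"\d{2}:\d{2}:\d{2}(?:\.\d+)?"  # time with optional fractional seconds
    r"(?:[Zz]|[+-]\d{2}:\d{2})"     # offset: "Z" or +hh:mm / -hh:mm, nothing finer
)


def looks_like_rfc3339(text: str) -> bool:
    """Hypothetical helper: True if text has the RFC 3339 date-time layout."""
    return RFC3339_SHAPE.fullmatch(text) is not None


# A UTC offset of one second, which the datetime module allows since Python 3.7.
odd_offset = datetime(2020, 1, 1, tzinfo=timezone(timedelta(seconds=1))).isoformat()
print(odd_offset, looks_like_rfc3339(odd_offset))
print(looks_like_rfc3339("2020-01-01T00:00:00+00:00"))
```

The first string carries the `+00:00:01` offset and is rejected by the pattern; the second, with a minute-level offset, is accepted.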

edit: punctuation

ariebovenberg added the "question" label (Further information is requested) Nov 8, 2024
@ariebovenberg
Owner

Context: I'm hoping to move from Pendulum to whenever, and trying to ensure I have deterministic string renderings when I serialize these in a jsonified output format.

My aim is definitely to provide 'least surprise' and deterministic output. Taking a look at the docs of format_rfc3339(), I see it mistakenly mentions the T. I'll have a fix out soon.

@gazpachoking

gazpachoking commented Nov 8, 2024

I don't really have a preference one way or the other, but that section of RFC 3339 is quite confusing to me. It seems to mention the space is an allowable separator right after saying that the separator should be 'T'. Section 5.5 also says that this is a subset of ISO8601, and using a space would mean it isn't ISO8601 compliant anymore...

EDIT: I may have been wrong about the T being required in ISO8601, but Appendix A of RFC3339 seems to indicate that the T is required in that specification:

ISO 8601 states that the "T" may be omitted under some circumstances. This grammar requires the "T" to avoid ambiguity.

@ariebovenberg
Owner

ariebovenberg commented Nov 8, 2024

@gazpachoking 100% agree that the whole situation is unfortunate. The phrasing in RFC3339 is terrible. It even implies any character is fine!

Since the RFC itself is a bit ambiguous, I decided to implement what most devs expect from RFC3339 and what most datetime libraries do (i.e. space, T, or underscore)

edit: missing words

@ariebovenberg
Owner

The 0.6.12 release, published just now, includes the relevant improvements to the docs.

@aaronsteers
Author

aaronsteers commented Nov 8, 2024

@ariebovenberg - Thanks for the quick and thorough reply. The docs update is very welcome, thanks for that!

I like your proposal to use the ISO option, and I think it will meet my use case. When considering the output (and not the parsing requirements), the ISO 8601 output seems to also be compliant with RFC3339, but with the "T" delimiter guaranteed.

What I struggle with is that ISO (as you mention) has so many variants that I worry it is not very helpful as a promised output format that will be 100% machine readable by all consumers. For my own internal documentation, is it safe to say that for most use cases (especially those derived from Instant) our ISO output is also compliant with the "T" variant of RFC3339?

@ariebovenberg
Owner

ariebovenberg commented Nov 9, 2024

is it safe to say that for most use cases (esp those derived from Instant) that our ISO output is also compliant with the "T" variant of RFC3339?

Yes. The format YYYY-MM-DDThh:mm:ssZZ is RFC3339 compliant (except in the rare case the offset has second-level precision), and it is explicitly documented in the format_common_iso() docstring, which makes it part of the stable API. If the format of format_common_iso() ever changes (I highly doubt it), it will be considered breaking and get an appropriate release and mention in the changelog.


I worry that it is not super helpful as a promised output format [..]

I can relate. What's frustrating is that it was RFC3339's explicit goal to reduce this variability, but I can't understand why they stopped short of doing just that. Also fun fact: they allow T and Z to be lowercase too! So 2020-01-01t00:00:00z is also valid—maddening!
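A consumer that has to accept every permitted variant can fold them into one canonical form before comparing or storing values. This is a minimal sketch under that assumption; `normalize_rfc3339` is a hypothetical helper, not a whenever function, and it does no validation beyond checking the separator.

```python
def normalize_rfc3339(text: str) -> str:
    """Hypothetical helper: rewrite the separator as "T" and uppercase a trailing "z"."""
    date_part, sep, time_part = text[:10], text[10:11], text[11:]
    if sep not in ("T", "t", " "):
        raise ValueError(f"unexpected date/time separator: {sep!r}")
    # The time part holds only digits, punctuation, and possibly "z",
    # so upper() affects nothing except a lowercase "z".
    return f"{date_part}T{time_part.upper()}"


print(normalize_rfc3339("2020-01-01t00:00:00z"))
print(normalize_rfc3339("2024-11-08 19:41:17.404921Z"))
```

Both calls print the same layout, with an uppercase "T" and "Z".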

In the future, there will be a .format() method so you can explicitly say something like: YYYY-MM-DDThh:mm:ssZZ.

edit: grammar

@aaronsteers
Author

aaronsteers commented Nov 9, 2024

@ariebovenberg - Appreciate the thoughtful replies. Frustration here is admittedly more with the community and standards - not an issue with this library.

As a side note, Pendulum (apparently) changed its default __str__() logic from "RFC3339 with a 'T'" to "RFC3339 with a space" in the 3.0 release, which was a major reason we decided not to upgrade to it. In a large software package, there are too many places where a datetime-like object might be implicitly cast to a string: during JSON serialization of the object itself, and also after it is added to a dictionary and the dictionary itself is the thing being serialized.

To leave no room for accidentally using the wrong string format, I was considering subclassing whenever's core classes to provide an explicit __str__() behavior matching the RFC3339-T layout. In testing so far, it looks like this probably won't be necessary, since this is what we get from format_common_iso(), which I think is also the default behavior for __str__(). We'll have unit tests in our package to detect any behavior change between versions. Anyway, I just wanted to share our general approach to getting the "best" deterministic output every time, or at least the best for our use case.
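The approach described above (an explicit rendering at the serialization boundary, plus a regression test that pins it) might look roughly like the sketch below. It uses the standard library's `datetime` so that it runs without extra dependencies; with whenever, the encoder would presumably call `format_common_iso()` instead. The names `encode_timestamp` and `test_timestamp_rendering_is_stable` are hypothetical.

```python
import json
from datetime import datetime, timezone


def encode_timestamp(value: object) -> str:
    """Fallback for json.dumps(): render datetimes explicitly, never via str()."""
    if isinstance(value, datetime):
        if value.tzinfo is None:
            raise TypeError("refusing to serialize a naive datetime")
        # UTC isoformat() ends in "+00:00" (6 characters); replace it with "Z".
        return value.astimezone(timezone.utc).isoformat(sep="T")[:-6] + "Z"
    raise TypeError(f"not JSON serializable: {type(value).__name__}")


def test_timestamp_rendering_is_stable() -> None:
    """Pin the exact output so a dependency upgrade cannot change it silently."""
    payload = {
        "created_at": datetime(2024, 11, 8, 19, 41, 17, 404921, tzinfo=timezone.utc)
    }
    rendered = json.dumps(payload, default=encode_timestamp)
    assert rendered == '{"created_at": "2024-11-08T19:41:17.404921Z"}'


test_timestamp_rendering_is_stable()
```

Because the format is chosen in one place, a timestamp nested inside a dictionary is rendered the same way as one serialized on its own.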

Closing this issue as resolved - thanks again for your help!


3 participants