Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for issue #21 #22

Merged
merged 1 commit into from
Aug 20, 2014
Merged

Fix for issue #21 #22

merged 1 commit into from
Aug 20, 2014

Conversation

johnerikhalse
Copy link
Contributor

This change will force dates to always be in the english locale. Applying this code will break one unit test in OpenWayback on machines with non-english locale because the test itself is not locale-safe.

Fixes #21

@nlevitt
Copy link
Member

nlevitt commented Sep 10, 2014

Isn't Locale.ENGLISH still ambiguous as far as date formats? I would think Locale.US would be safer...

@anjackson
Copy link
Member

Given that the underlying issue is the default time zone that is used when parsing date strings, Locale.US would still be ambiguous. The behaviour appears to be deterministic as long as the Locale is fixed, rather of being picked up from the environment. This is presumably because each Locale has a default time zone.

If this assumption is undesirable, I think we can avoid it by explicitly setting the time zone to UTC:

SimpleDateFormat f = new SimpleDateFormat("yyyy-MMM-dd HH:mm:ss");
f.setTimeZone(TimeZone.getTimeZone("UTC"));

@nlevitt
Copy link
Member

nlevitt commented Sep 10, 2014

Are you sure that's the underlying issue? I think timezone and locale are orthogonal system settings, and the code already had df.setTimeZone(TimeZone.getTimeZone("GMT")) before this change. I'm curious what the underlying issue is, how this bug was discovered. I haven't found it explained anywhere.

@anjackson
Copy link
Member

So, when running tests locally or via CI on Travis, we found tests would sometime fail. This appears to be because our test machines are in different locales, and with Travis the locale can be different each time.

As I'm in en_GB, I would usually see errors like this:

Running org.archive.wayback.archivalurl.ArchivalUrlRequestParserTest
Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.239 sec <<< FAILURE!
testPathDatelessWithDateHeader(org.archive.wayback.archivalurl.ArchivalUrlRequestParserTest) Time elapsed: 0.04 sec <<< FAILURE!
junit.framework.AssertionFailedError: expected:<Fri Apr 25 05:15:51 UTC 2014> but was:<Thu Apr 24 21:15:51 UTC 2014>

Other times, Travis might see this:

testResourceRecord(org.archive.wayback.resourcestore.resourcefile.WarcResourceTest): Unparseable date: "mer., 03 sept. 2014 13:32:56 GMT"

It seems the code and the test data have the Locale.US assumption built in, or to but that another way, you'll have to override your locale if you want to reproduce this issue.

I guess this was only discussed in the OpenWayback teleconferences (?). You are, of course, welcome to join those if you wish.

EDIT:

Having gone back to the logs, I guess we probably need to use this to lock everything down properly:

SimpleDateFormat f = new SimpleDateFormat("yyyy-MMM-dd HH:mm:ss", Locale.US);
f.setTimeZone(TimeZone.getTimeZone("UTC"));

i.e. Enforce the Locale and the TZ to ensure both the problems outlined above are resolved.

@nlevitt
Copy link
Member

nlevitt commented Sep 11, 2014

Thanks for explaining Andrew. The first test failure you mention is puzzling, but maybe you were in the midst of trying something funky with the timezone. I agree with your conclusion that the locale and timezone should be specified. Thinking more about it, the current code is probably fine. I don't see anything in SimpleDateFormat that looks like it would vary between e.g. en_US and en_GB. (Some of the formats tha strptime(3) accepts probably would differ, but SimpleDateFormat doesn't have those.) And I think GMT and UTC are equivalent for practical purposes.

@johnerikhalse
Copy link
Contributor Author

For our use of SimpleDateFormat, I agree with Noah that there is no difference between en, en_US and en_GB. Because of this, I chose Locale.ENGLISH because it is not coupled to country. It is the language for formatting day names we are interested in.

For the same reason I think UTC should be preferred over GMT. The difference between them have no practical impact on our use, but UTC is an atomic time scale and not a time zone.

These reasons are more political than functional. Since OpenWayback is an international project, I would prefer choosing the non-geographic variants when there is no practical reason for doing otherwise.

@anjackson
Copy link
Member

Having tests a bit more I can verify that Locale does not appear to affect the TZ. The default TZ is picked up from the environment instead:

        String dateString = "01:00:00:000 Mar 8, 1992";

        SimpleDateFormat f = new SimpleDateFormat("H:mm:ss:SSS MMM d, yyyy");
        System.out.println(f.parse(dateString));

        SimpleDateFormat f_en_UK = new SimpleDateFormat(
                "H:mm:ss:SSS MMM d, yyyy", Locale.UK);
        f_en_UK.setTimeZone(TimeZone.getTimeZone("GMT"));
        System.out.println(f_en_UK.parse(dateString));

        SimpleDateFormat f_en_US = new SimpleDateFormat(
                "H:mm:ss:SSS MMM d, yyyy", Locale.US);
        f_en_US.setTimeZone(TimeZone.getTimeZone("EST"));
        System.out.println(f_en_US.parse(dateString));

...giving...

Sun Mar 08 01:00:00 GMT 1992
Sun Mar 08 01:00:00 GMT 1992
Sun Mar 08 06:00:00 GMT 1992

So, to ensure dates are always in English and UTC, we need to set both the locale and the TZ.

@johnerikhalse johnerikhalse deleted the ForceEnglishLocale branch September 23, 2014 11:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

All dates should be independent of locale settings
3 participants