Move away from zip and use tar+gzip instead #343

Closed
leostera opened this issue Dec 4, 2015 · 9 comments
Assignees
Labels
architecture Organization of the pages per language, platform, etc.

Comments

@leostera
Contributor

leostera commented Dec 4, 2015

Just about that. Refactor the build scripts so they use tar instead.

It'd require refactoring the node-client to use node-tar instead of unzip2. But that should be fairly straightforward since they share the same .Extract interface.

@igorshubovych added the "page edit" label (Changes to an existing page(s).) on Dec 5, 2015
@igorshubovych igorshubovych self-assigned this Dec 5, 2015
@Permafacture

While someone is at it, xz (LZMA) compression is preferred to gzip.
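One way to check the xz suggestion before committing to it is to measure both on the same input. The following is only a sketch using Python's standard `tarfile` module, not the project's build scripts; the page names and contents are made up for illustration, and results on the real pages may differ.

```python
import io
import tarfile


def archive_size(files, mode):
    """Return the size in bytes of a tar archive written with the given mode."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


# Hypothetical pages: similar structure, slightly different content.
files = {
    f"pages/common/cmd{i}.md": (
        f"# cmd{i}\n\n> Example command number {i}.\n\n"
        f"- Run it:\n\n`cmd{i} {{{{file}}}}`\n"
    ).encode()
    for i in range(20)
}

raw_size = archive_size(files, "w")     # plain tar, no compression
gz_size = archive_size(files, "w:gz")   # tar + gzip
xz_size = archive_size(files, "w:xz")   # tar + xz (LZMA)
print(raw_size, gz_size, xz_size)
```

Which of gzip or xz comes out smaller depends on the input, so the numbers should be taken from the actual pages directory rather than from this toy data.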

@igorshubovych
Collaborator

@Ostera do you think we need it?

IMO it does not solve any problem at the moment, but may create potential issues for the clients who rely on it.

@leostera
Contributor Author

Ultimately using tar vs zip provides you not only the ability to choose between gzip or bzip2 compression, but also to store UNIX metadata (uid, gid, permissions) – zip will only store MSDOS metadata (hidden, system, other crap).
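To illustrate the metadata point, here is a minimal sketch with Python's standard `tarfile` module showing that mode, uid and gid survive a round trip through a gzipped tarball. The file name and the ownership values are hypothetical, and this is not code from the tldr build scripts.

```python
import io
import tarfile


def roundtrip_metadata(mode, uid, gid):
    """Write one file with the given UNIX metadata, then read the entry back."""
    data = b"#!/bin/sh\necho hello\n"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        info = tarfile.TarInfo("bin/hello.sh")  # hypothetical member name
        info.size = len(data)
        info.mode = mode
        info.uid = uid
        info.gid = gid
        tf.addfile(info, io.BytesIO(data))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r:gz") as tf:
        return tf.getmember("bin/hello.sh")


member = roundtrip_metadata(mode=0o755, uid=1000, gid=1000)
print(oct(member.mode), member.uid, member.gid)
```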

Then there's another advantage, which is how gzip is applied compared to zip (both typically use the same DEFLATE algorithm, but zip compresses each file separately).

gzip compresses one big blob (the tarball), which means that any strings repeated across pages are compressed more efficiently. So a zip of 100 identical small files can be up to roughly 100 times bigger than a gzipped tarball of the same files.
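The per-file versus whole-stream difference can be measured directly. This is a rough sketch with Python's standard `zipfile` and `tarfile` modules; the page content and file names are invented, and the exact ratio depends on file sizes and on header overhead in both formats.

```python
import io
import tarfile
import zipfile


def build_archives(files):
    """Return (zip_bytes, tar_gz_bytes) holding the same files."""
    zip_buf = io.BytesIO()
    with zipfile.ZipFile(zip_buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)  # each member is deflated on its own

    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w:gz") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))  # gzip sees one continuous stream
    return zip_buf.getvalue(), tar_buf.getvalue()


# Hypothetical page, stored 100 times under different names.
page = b"# tar\n\n> Archiving utility.\n\n- Extract an archive:\n\n`tar xf {{file.tar}}`\n" * 4
files = {f"pages/common/cmd{i}.md": page for i in range(100)}

zip_bytes, tar_gz_bytes = build_archives(files)
print(len(zip_bytes), len(tar_gz_bytes))
```

The zip carries a local header and a central directory entry per file and cannot reuse matches across members, so it grows roughly linearly with the number of copies, while the gzipped tarball stays small.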

@igorshubovych
Collaborator

@Ostera
You are right, it is 63 KB vs 143 KB at the moment.
Probably we should do both to let people migrate.

@leostera
Contributor Author

Precious kilobytes! I'd introduce gzip in the next release and then drop zip support on the one after that so people have time to migrate.
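During such a transition a client could accept either format. The sketch below is an assumption about how that might look, written with Python's standard library rather than the node client's actual code; `read_pages` is a hypothetical helper that picks the reader from the archive's magic bytes.

```python
import io
import tarfile
import zipfile


def read_pages(archive):
    """Return {member name: contents} from a zip or a gzipped tar archive."""
    if archive[:4] == b"PK\x03\x04":  # zip local file header signature
        with zipfile.ZipFile(io.BytesIO(archive)) as zf:
            return {n: zf.read(n) for n in zf.namelist() if not n.endswith("/")}
    if archive[:2] == b"\x1f\x8b":  # gzip magic number
        with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tf:
            return {m.name: tf.extractfile(m).read()
                    for m in tf.getmembers() if m.isfile()}
    raise ValueError("unrecognized archive format")


# Build the same (made-up) page in both formats and compare the results.
name, data = "pages/common/tar.md", b"# tar\n\n> Archiving utility.\n"

zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr(name, data)

tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w:gz") as tf:
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tf.addfile(info, io.BytesIO(data))

from_zip = read_pages(zip_buf.getvalue())
from_tar = read_pages(tar_buf.getvalue())
print(from_zip == from_tar)
```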

@igorshubovych
Collaborator

They are precious.
Some people are using tldr on mobile clients. Now imagine they update the package every week.

@leostera changed the title from "Move away from zip and use tar instead" to "Move away from zip and use tar+gzip instead" on Feb 11, 2016
@waldyrious added the "architecture" label (Organization of the pages per language, platform, etc.) and removed the "page edit" label (Changes to an existing page(s).) on Aug 31, 2016
@agnivade
Member

agnivade commented Oct 6, 2016

Closing it as discussed on tldr-pages/tldr-node-client#9. TLDR: Clients should prefer git over manually downloading .zip archives.

@pepa65
Contributor

pepa65 commented Mar 12, 2017

Since we're talking about roughly 100kB for the whole archive, smaller than most current webpages, I think downloading the whole archive shouldn't be such a problem every once in a while.

@waldyrious
Member

waldyrious commented Apr 25, 2017

@pepa65 while that approach may work now, IMO it's neither scalable nor elegant. But in any case, this issue was created back when clients (especially the node one) were tightly coupled to the tldr-pages repo, and we want to move away from that, by providing a spec that any client can follow.

So, as long as clients follow the spec's recommendations, they can still use the full zip download approach, since the archives (tldr.zip and index.json) are currently still generated on every commit to this repo. We won't invest time in changing the archive format as suggested in this issue, since we don't recommend that method of updating clients' local cache of pages. We also won't commit to supporting archive generation indefinitely (e.g. if some part of the pipeline breaks), but we won't go out of our way to deliberately curtail that service either, at least not without prior discussion with the clients' authors :)


6 participants