Why NDJSON instead of JSON? #9068

Open
alexandernst opened this issue May 31, 2024 · 3 comments

@alexandernst

yarn info (and others?) outputs NDJSON instead of JSON, and I can't find a reason why you'd have picked NDJSON over JSON.

JSON is already very easily parseable with jq and the size increase in the output is negligible compared to NDJSON. Maybe switch to JSON and make it easier to parse yarn's output?

@Daniel15
Member

Daniel15 commented Jul 15, 2024

jq should handle newline-delimited JSON fine.

Are you talking about Yarn 1.x or 4.x? This repo is for 1.x, which is frozen and not getting updates.

@alexandernst
Author

Sure, jq can handle NDJSON, but why pick NDJSON instead of plain JSON in the first place? It seems a weird decision given that NDJSON is not that common and, although jq handles it properly, there are most probably a ton of other tools that won't handle NDJSON.

I believe the output of this particular command is the same for 1.x and 4.x.

@snydergd

When there is a large amount of data, I've found that it can be a challenge to efficiently handle the JSON file in code, because the choices I've seen are either:

  1. Load the entire file into memory as an object, for example json.load(f) in Python. I probably have enough memory to do it a few times at least, but it still doesn't scale very nicely. I also can't start processing the data until the file has finished writing.
  2. Do some sort of event-based JSON parsing, similar to what SAX/StAX are to XML in Java, which in my experience tends to be a lot more cumbersome.

NDJSON (or JSON Lines, which I think is now the newer name) solves this problem by letting me take in a single object at a time by reading the file line by line. Parsing/splitting lines has been trivial in every development environment/stack that I've worked with.
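As a rough illustration of that line-at-a-time reading, here is a minimal Python sketch. The helper name iter_ndjson and the sample records are made up for the example and are not real yarn output; the only assumption is that the stream holds one JSON value per line.

```python
import io
import json
from typing import Any, Iterator, TextIO


def iter_ndjson(stream: TextIO) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of the stream."""
    for line in stream:
        line = line.strip()
        if line:  # tolerate blank lines between records
            yield json.loads(line)


# Hypothetical sample data standing in for a real NDJSON stream.
sample = io.StringIO(
    '{"name": "a", "version": "1.0.0"}\n'
    '{"name": "b", "version": "2.1.0"}\n'
)

# Each record is available as soon as its line has been read;
# nothing requires the whole input to be in memory at once.
names = [record["name"] for record in iter_ndjson(sample)]
print(names)
```

Because the generator only ever holds one line at a time, the same function should work on a file that is still being written or on a pipe.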

I am trying to think of a scenario where you would be coordinating the run of yarn info and feeding the data into another tool, but wouldn't be able to do the transformation into a JSON array that you are talking about. It would be dead simple to do with a Node.js script, or you could use jq, or you could even do it with something as primitive as bash scripting.
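To show how small that transformation is, here is a sketch in Python rather than Node.js. The function name ndjson_to_array and the sample lines are hypothetical; it assumes every non-blank input line is a complete JSON value.

```python
import json
from typing import Iterable


def ndjson_to_array(lines: Iterable[str]) -> str:
    """Collect NDJSON lines into a single JSON array string."""
    records = [json.loads(line) for line in lines if line.strip()]
    return json.dumps(records)


# Hypothetical NDJSON input; any iterable of lines works the same way,
# including sys.stdin when the script sits at the end of a pipe.
sample_lines = ['{"name": "a"}\n', '{"name": "b"}\n']
result = ndjson_to_array(sample_lines)
print(result)
```

Note that this gives up the streaming benefit, since the array can only be emitted once every line has been read. With jq, the slurp option (jq -s .) does the same conversion.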

I would love to discuss the scenarios where NDJSON is proving problematic for you.

3 participants