incorrect JSON format #12

Closed
bishopZ opened this issue Mar 30, 2016 · 3 comments

Comments

@bishopZ

bishopZ commented Mar 30, 2016

From the README
"A lot of event logs contain JSON objects nowadays (e.g. GitHub Archive). pgfutter expects each line to have a valid JSON object. Importing JSON is only supported for Postgres 9.3 and Postgres 9.4 due to the JSON type.

Create friends.json.

{"name": "Jacob", "age": 26, "friends": ["Anthony"]}
{"name": "Anthony", "age": 25, "friends": []}
{"name": "Emma", "age": 28, "friends": ["Jacob", "Anthony"]}"

This is not valid JSON. Please see http://json.org/ to see what I mean.

Any plans to support properly formatted JSON?

@lukasmartinelli
Owner

This format is called JSON Lines (http://jsonlines.org/): each line contains a valid JSON object.
The format works well because a single 30 GB JSON document would be close to impossible to handle, whereas line-delimited records can be processed one at a time.
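To illustrate why line-delimited records are easy to process, here is a minimal Python sketch that parses the README's friends.json sample one line at a time. The helper name iter_json_lines is hypothetical and is not part of pgfutter; it only shows that each line stands alone as a JSON value, so memory use depends on the longest line rather than on the file size.

```python
import io
import json


def iter_json_lines(stream):
    """Yield one parsed value per non-empty line of a JSON Lines stream.

    Hypothetical helper for illustration; not part of pgfutter.
    """
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number} is not valid JSON: {exc}") from exc


# The sample from the README, held in memory here instead of a file.
friends = io.StringIO(
    '{"name": "Jacob", "age": 26, "friends": ["Anthony"]}\n'
    '{"name": "Anthony", "age": 25, "friends": []}\n'
    '{"name": "Emma", "age": 28, "friends": ["Jacob", "Anthony"]}\n'
)

# Only one record is parsed at a time while the generator is consumed.
names = [record["name"] for record in iter_json_lines(friends)]
print(names)
```

For a real file, passing an open file object instead of the StringIO should behave the same way, since both are iterated line by line.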

What is invalid about that JSON object?

{"name": "Jacob", "age": 26, "friends": ["Anthony"]}

The bottom of the README explains how to import a single JSON document.
https://github.com/lukasmartinelli/pgfutter#import-single-json-object

@bishopZ
Author

bishopZ commented Mar 30, 2016

Any ideas on how to create a JSONLines document from a valid JSON array of objects?

I guess I could manually remove the comma from the 10,000 rows I want to import.

@lukasmartinelli
Owner

> I guess I could manually remove the comma from the 10,000 rows I want to import.

We had the same problem in #9.

You have a JSON object with 10k rows? That might still fit into memory if you try pgfutter jsonobj document.json. You will then get a single object in the database, and you can use PostgreSQL JSON operators to move it into a relational schema.

Or you can use jq to pull it out.

Given this JSON file.

{
  "results": [
    {
      "text": "@twitterapi  http://tinyurl.com/ctrefg",
      "to_user_id": 396524,
      "to_user": "TwitterAPI",
      "from_user": "jkoum"
    },
    {
      "text": "@twitterapi  http://tinyurl.com/jubu",
      "to_user_id": 40314,
      "to_user": "TwitterAPI",
      "from_user": "jubu"
    }
  ]
}

You can extract the results array as JSON Lines like this.

cat test.json | jq -c '.results[]'
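If jq is not available, the same conversion can be sketched in Python with only the standard library. This is an assumption-laden illustration, not something pgfutter provides: the function name to_json_lines and its key parameter are hypothetical. With key=None it treats the document itself as the array of objects (the case asked about earlier in the thread); with key="results" it mirrors the jq filter above. Like jq, it loads the whole document into memory first.

```python
import json


def to_json_lines(document, key=None):
    """Serialize a list of records as JSON Lines text, one record per line.

    Hypothetical helper for illustration. If key is given, the records are
    read from document[key]; otherwise document itself must be the list.
    """
    records = document if key is None else document[key]
    if not isinstance(records, list):
        raise TypeError("expected a JSON array of records")
    # Compact separators keep each record on one line, similar to jq -c.
    return "".join(
        json.dumps(record, separators=(",", ":"), ensure_ascii=False) + "\n"
        for record in records
    )


# The sample document from the comment above.
source = json.loads("""
{
  "results": [
    {"text": "@twitterapi  http://tinyurl.com/ctrefg", "to_user_id": 396524,
     "to_user": "TwitterAPI", "from_user": "jkoum"},
    {"text": "@twitterapi  http://tinyurl.com/jubu", "to_user_id": 40314,
     "to_user": "TwitterAPI", "from_user": "jubu"}
  ]
}
""")

json_lines = to_json_lines(source, key="results")
print(json_lines, end="")
```

The resulting text can be written to a file and imported line by line. For a plain top-level array, calling to_json_lines(array) without a key should be enough.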

@bishopZ bishopZ closed this as completed Mar 30, 2016