Implement eland.DataFrame.to_json #661

Merged
merged 15 commits into from
Feb 15, 2024

bartbroere
Contributor

Dumping an Elastic index to CSV was the right solution for me at first. After a while, CSV no longer offered all the guarantees I was looking for; for example, a CSV record can span multiple lines if any of its values contain line breaks.

For that reason, JSON Lines was a more suitable format. Pandas' DataFrame.to_json can generate ".jsonl" files by supplying lines=True, orient='records'. This also lets us reuse the earlier solution that streams output from Elastic and appends it to a file, so the entire dataset never needs to fit in memory.

This pull request implements streaming an Elasticsearch index to .jsonl, falling back to to_pandas().to_json(...) for cases where streaming is harder to do.
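
To illustrate the chunked-append idea with plain pandas, here is a minimal sketch. It is not the code from this pull request: the helper name `append_chunk_as_jsonl` and the two in-memory chunks (standing in for batches streamed from Elasticsearch) are assumptions made for the example. It relies only on `DataFrame.to_json(orient="records", lines=True)` returning a string when no path is given.

```python
import json
import os
import tempfile

import pandas as pd


def append_chunk_as_jsonl(chunk: pd.DataFrame, path: str) -> None:
    """Append one DataFrame chunk to a JSON Lines file (hypothetical helper)."""
    text = chunk.to_json(orient="records", lines=True)
    # Some pandas versions omit the trailing newline; add it so the next
    # chunk starts on a fresh line.
    if text and not text.endswith("\n"):
        text += "\n"
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(text)


# Two small chunks stand in for batches streamed from Elasticsearch.
chunks = [
    pd.DataFrame({"id": [1, 2], "note": ["plain", "line one\nline two"]}),
    pd.DataFrame({"id": [3], "note": ["another"]}),
]

path = os.path.join(tempfile.mkdtemp(), "dump.jsonl")
for chunk in chunks:
    append_chunk_as_jsonl(chunk, path)

# Each physical line is one complete record, even when a value
# contains a line break (it is escaped inside the JSON string).
with open(path, encoding="utf-8") as handle:
    records = [json.loads(line) for line in handle]
```

Only one chunk is held in memory at a time, and the value containing a line break still occupies a single line in the file, which is the guarantee CSV could not give.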

Member

@pquentin pquentin left a comment

Thanks for this!

@bartbroere bartbroere marked this pull request as ready for review February 7, 2024 12:06
@bartbroere bartbroere requested a review from pquentin February 7, 2024 21:31
Member

@pquentin pquentin left a comment

Thanks! LGTM.

@pquentin
Member

pquentin commented Feb 8, 2024

buildkite test this please

@bartbroere
Contributor Author

> Thanks! LGTM.

Nice! I removed the two linting errors. Sorry about that.

@pquentin
Member

pquentin commented Feb 8, 2024

buildkite test this please

@pquentin
Member

buildkite test this please

@pquentin pquentin merged commit 33cf029 into elastic:main Feb 15, 2024
4 checks passed
2 participants