Gzip CSV for Benchmark Cmd Line Results #177

Closed
stanbrub opened this issue Oct 4, 2023 · 1 comment · Fixed by #218
Assignees: stanbrub
Labels: enhancement (New feature or request)

Comments
@stanbrub (Collaborator)

stanbrub commented Oct 4, 2023

The CSV files generated for benchmark runs are meant to be readable rather than small, so they contain a lot of redundancy; in the metrics CSV, for example, the benchmark name is repeated on every row. This is not a problem while writing tests and doing local runs, but the redundant data adds up once the files are uploaded to GCloud. Furthermore, with the increasing use of query snippets that download data for queries, transfer size can become an impediment.

Rather than turning the Benchmark results into a database with references to keep the data smaller, save the CSV files as csv.gz when doing command-line runs. A new property to turn this on and off may or may not be necessary: when a test is run in an IDE, don't compress; when it's run from the command line, compress.
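If the Java route is chosen, the compression step could be as simple as streaming each finished result file through the JDK's built-in GZIPOutputStream. A minimal sketch (class name, method, and file paths are illustrative, not the actual Benchmark code):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

public class GzipCsv {

    /** Compress a CSV file to a sibling file.csv.gz using the JDK's gzip stream. */
    public static Path gzip(Path csv) throws IOException {
        Path gz = csv.resolveSibling(csv.getFileName() + ".gz");
        try (InputStream in = Files.newInputStream(csv);
             OutputStream out = new GZIPOutputStream(Files.newOutputStream(gz))) {
            in.transferTo(out);  // stream copy; avoids loading the CSV into memory
        }
        return gz;
    }

    public static void main(String[] args) throws IOException {
        // Demo with a temp file standing in for a benchmark result CSV
        Path csv = Files.createTempFile("benchmark-metrics", ".csv");
        Files.writeString(csv, "benchmark,metric,value\nJoinTables,rate,123\n");
        Path gz = gzip(csv);
        System.out.println("wrote " + gz.getFileName());
    }
}
```

Doing it in Java keeps the behavior identical whether the run happens in CI or on a developer's machine, whereas gzipping in the GitHub workflow keeps the test code untouched.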

  • Use Java to compress the files, or just run gzip in the GitHub workflow?
  • Compress existing files stored in the GCloud bucket?
    • Can file dates be preserved?
  • Update the Demo rsync workflow step not to use the -J option for compression
    • An added bonus is eliminating the annoying gsutil message about how it's more efficient to compress the files in GCloud than to do it on-the-fly for every download
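Alternatively, the gzip-in-workflow option from the first bullet could look something like the step below; the results directory and the commented-out bucket name are placeholders, not the actual Demo workflow values:

```shell
set -euo pipefail

# Placeholder for the benchmark output directory
RESULTS_DIR="${RESULTS_DIR:-results}"
mkdir -p "$RESULTS_DIR"

# Compress every result CSV in place, producing file.csv.gz
find "$RESULTS_DIR" -name '*.csv' -exec gzip -f {} +

# Then sync without -J, since the files are already compressed at rest, e.g.:
#   gsutil rsync -r "$RESULTS_DIR" gs://example-benchmarks
```

Compressing before upload means gsutil serves the stored bytes as-is instead of re-compressing on every download.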

NOTES

  • read_csv does not handle downloading http(s) files that are compressed
    • In fact, if you specify a local file as "file:///data/file.csv.gz", it won't handle that either
    • If you specify the local file as "/data/file.csv.gz", it does work
  • On the demo server, reading csv.gz vs csv does not improve demo performance
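For a client that fetches the compressed files itself (rather than relying on read_csv's URL handling), decompressing a local .csv.gz is straightforward with the Python standard library. A minimal sketch, with an illustrative function name and path:

```python
import csv
import gzip


def read_gzipped_csv(path):
    """Read a local .csv.gz into a list of rows using only the stdlib.

    gzip.open in text mode ("rt") decompresses on the fly, so csv.reader
    sees plain CSV text; newline="" is the csv module's recommended setting.
    """
    with gzip.open(path, mode="rt", newline="") as f:
        return list(csv.reader(f))
```

This works on a plain local path like "/data/file.csv.gz", matching the form that read_csv also accepts per the note above.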
@stanbrub stanbrub added the enhancement New feature or request label Oct 4, 2023
@stanbrub stanbrub self-assigned this Nov 1, 2023
@stanbrub (Collaborator, Author)

stanbrub commented Nov 6, 2023

Wrote a script that uses Python asyncio to download GCloud files to a cache.
download.py.txt
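The general shape of such a downloader might look like the sketch below; the function names, concurrency limit, and use of urllib are assumptions for illustration, not taken from the attached download.py.txt:

```python
import asyncio
import os
import urllib.request


async def download_one(url: str, dest: str, sem: asyncio.Semaphore) -> str:
    """Fetch one file, skipping files already present in the cache."""
    if os.path.exists(dest):
        return dest
    async with sem:
        loop = asyncio.get_running_loop()
        # urllib is blocking, so run the fetch off the event loop
        await loop.run_in_executor(None, urllib.request.urlretrieve, url, dest)
    return dest


async def download_all(urls, cache_dir, max_concurrent=8):
    """Download all URLs into cache_dir with bounded concurrency."""
    os.makedirs(cache_dir, exist_ok=True)
    sem = asyncio.Semaphore(max_concurrent)
    tasks = [
        download_one(u, os.path.join(cache_dir, os.path.basename(u)), sem)
        for u in urls
    ]
    return await asyncio.gather(*tasks)
```

The semaphore bounds how many transfers run at once, and the cache check makes repeated runs cheap since already-downloaded files are skipped.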

@stanbrub stanbrub linked a pull request Nov 14, 2023 that will close this issue