The CSV files currently generated for benchmark runs are meant to be readable rather than small. There is a lot of redundancy; for example, the metrics CSV repeats the benchmark name over and over. This is not an issue when writing the tests and doing test runs, but once the results are uploaded to GCloud, it can add up to a lot. Furthermore, as more query snippets download this data for their queries, transfer size can become an impediment.
Rather than turning the benchmark results into a database with references to keep the data smaller, save the CSV files as .csv.gz when doing command-line runs. A new property to turn this on and off may or may not be necessary. When the test is run in an IDE, don't compress; when it's run from the command line, compress.
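A minimal Java sketch of the command-line vs IDE switch, assuming a hypothetical system property `benchmark.csv.compress` that the command-line run sets with `-Dbenchmark.csv.compress=true` and IDE runs leave unset. The class and method names are illustrative, not existing benchmark code; the compression itself is the JDK's `java.util.zip.GZIPOutputStream`.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.GZIPOutputStream;

final class CsvResultWriter {
    // Hypothetical property name; the issue leaves open whether a property is needed at all.
    static final String COMPRESS_PROPERTY = "benchmark.csv.compress";

    // True only when the property is set to "true" (e.g. -Dbenchmark.csv.compress=true on a
    // command-line run); an IDE run that doesn't set it keeps writing plain, readable CSV.
    static boolean shouldCompress() {
        return Boolean.getBoolean(COMPRESS_PROPERTY);
    }

    // Writes the lines to <baseName>.csv, or to <baseName>.csv.gz when compress is true,
    // and returns the path that was written.
    static Path write(Path dir, String baseName, List<String> lines, boolean compress) throws IOException {
        Path file = dir.resolve(baseName + (compress ? ".csv.gz" : ".csv"));
        OutputStream out = Files.newOutputStream(file);
        if (compress) {
            out = new GZIPOutputStream(out); // gzip-wrap the same file stream
        }
        // Closing the writer also finishes the gzip trailer and closes the file.
        try (Writer w = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
            for (String line : lines) {
                w.write(line);
                w.write('\n');
            }
        }
        return file;
    }
}
```

A call such as `CsvResultWriter.write(resultsDir, "metrics", rows, CsvResultWriter.shouldCompress())` would keep a single code path for both kinds of runs, with only the file extension and the stream wrapping changing.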
Use Java to compress the files, or just run gzip in the GitHub workflow?
Compress the existing files stored in the GCloud bucket?
Can file dates be preserved?
Update the Demo rsync workflow step to not use the -J option for compression
An added bonus is eliminating the annoying gsutil message about how it's more efficient to compress the files in GCloud than to compress them on the fly for every download
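For the Java option and the file-date question, a minimal sketch that gzips an existing CSV next to itself and copies the original last-modified time onto the .gz file before deleting the original. The class and method names are hypothetical. This only covers local file times; whether a date survives in the GCloud bucket depends on how the upload step is run. If the workflow runs GNU gzip instead, gzip itself documents that it keeps the original file's access and modification times when it replaces a file with its .gz.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.zip.GZIPOutputStream;

final class GzipInPlace {
    // Compresses source to source + ".gz", carries over source's last-modified time,
    // deletes source, and returns the path of the compressed file.
    static Path gzipPreservingTime(Path source) throws IOException {
        Path target = source.resolveSibling(source.getFileName() + ".gz");
        FileTime modified = Files.getLastModifiedTime(source); // read before anything changes
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            in.transferTo(out); // stream the whole file through the compressor
        }
        Files.setLastModifiedTime(target, modified); // set after the stream is closed
        Files.delete(source);
        return target;
    }
}
```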
NOTES
read_csv does not handle downloading compressed http(s) files
In fact, if you specify a local file as "file:///data/file.csv.gz", it doesn't handle that either
If you specify the local file as "/data/file.csv.gz", it does
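Given these notes, one possible workaround, sketched in Java under the assumption that the caller controls what location string reaches read_csv: turn `file:` URIs into plain local paths (the form read_csv was observed to accept), and decompress remote .gz data into a temporary plain .csv first. The helper names are hypothetical; nothing here is part of read_csv itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;

final class CsvLocationHelper {
    // "file:///data/file.csv.gz" -> "/data/file.csv.gz"; other strings are returned unchanged.
    static String toLocalPath(String location) {
        if (location.startsWith("file:")) {
            return Paths.get(URI.create(location)).toString();
        }
        return location;
    }

    // Decompresses a gzip stream (e.g. an http(s) download) into a temporary .csv file.
    static Path gunzipToTempCsv(InputStream compressed) throws IOException {
        Path tmp = Files.createTempFile("benchmark-", ".csv");
        try (InputStream in = new GZIPInputStream(compressed)) {
            // REPLACE_EXISTING because createTempFile has already created the file.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp;
    }
}
```

For an http(s) source, the stream could come from `URI.create(url).toURL().openStream()`, and the returned temporary path would then be what gets passed to read_csv.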
On the demo server, reading csv.gz instead of csv does not improve demo performance