
Overall Fastest/Slowest #3

Open
Hypercubed opened this issue Dec 23, 2023 · 21 comments

Comments

@Hypercubed

Hypercubed commented Dec 23, 2023

First, thank you for your work on this project!

Do you think it would be worthwhile to add an overall result to the benchmark run? Something like "Fastest is"/"Slowest is". Maybe the simplest would be to do this per file... otherwise a suite method like:

suite('name', () => {
  bench("a", () => {
    blackbox(xxx);
  });

  bench("b", () => {
    blackbox(xxx);
  });
});

Willing to work on a PR if that's a feature you're interested in.

@romdotdog
Owner

Hi. I'll try my hand at implementing this. I'll let you know if it somehow falls through; at a minimum I'd be happy to review a PR.

@Hypercubed
Author

Let me know how it goes.... FYI this is output from hyperfine:

Summary
  ./build/hello-zig ran
    1.37 ± 2.28 times faster than ./build/hello-rust
    2.15 ± 3.64 times faster than bc ./bc/hello.bc
    2.42 ± 3.95 times faster than ./build/hello-ghc
    2.42 ± 4.40 times faster than ./build/hello-go
    2.53 ± 4.03 times faster than ./build/hello-cpp
    7.38 ± 9.54 times faster than ./build/hello-dart.exe
   28.47 ± 36.37 times faster than python3 ./python/hello.py
   34.53 ± 44.83 times faster than bun ./js/hello.js
   42.28 ± 52.85 times faster than bun ./ts/hello.ts
   65.02 ± 83.96 times faster than node ./js/hello.js
   66.50 ± 85.45 times faster than ./build/hello-deno
   68.43 ± 88.83 times faster than deno run ./js/hello.js
   73.08 ± 94.20 times faster than deno run ./ts/hello.ts
  146.76 ± 187.52 times faster than ruby ./ruby/hello.rb
  268.15 ± 324.65 times faster than racket ./racket/hello.rkt
  279.57 ± 354.98 times faster than ./build/hello-racket
  589.13 ± 765.03 times faster than tsx ./ts/hello.ts
  966.16 ± 1212.96 times faster than elixir ./elixir/hello.exs

@romdotdog
Owner

I have finished. It is now live as v3, and I included a short note about it in the README.md. It was a bit tedious, but I got it done using your syntax.

Here's an example of the output:
[screenshot of the suite summary output]

I hope this resolves your issue. If you need anything else, let me know.

@Hypercubed
Author

Looks great! I'll give it a try soon!

@romdotdog
Owner

It occurred to me that I forgot to clear the existing benches when you write multiple suites, so I pushed a patch as v3.0.1 in case you had already updated as-tral.

@Hypercubed
Author

Looks really good... a question about the output. How are the tests ordered? It looks like the "Relative to XXX" test is the first test and the rest of the list is in descending order (slow -> fast)? Is that right? Showing only the % change can be ambiguous. The output from hyperfine, BTW, is relative to the fastest, with the rest in ascending order (fast -> slow).

@Hypercubed
Author

Here is another tool's (benchmarkjs) output:

    InstanceOf x 44,909,045 ops/sec ±288.80% (2 runs sampled)
    Array.isArray x 20,286,249 ops/sec ±89.81% (2 runs sampled)
    Object.prototype.toString.call x 1,268,658 ops/sec ±116.03% (2 runs sampled)

*Fastest is __InstanceOf__*

@romdotdog
Owner

Why would showing only the delta be ambiguous? The delta is relative to the first test; it is in no way related to a change from the last test.

@Hypercubed
Author

It's kind of pedantic, and we (users) can usually figure it out, but:

Hyperfine shows "XX times faster"... this is clearly "higher is faster". With a +/-XX% it's not entirely clear whether positive means faster or slower. In your case a larger positive number means slower, and a more negative number means faster. Even typing that, I'm not 100% clear.

Here is what I am seeing:

Suite add-small finished
Relative to MpZ#add
BigInt#add              delta: [+156.54% +159.73% +163.01%] (p = 0.00 < 0.05)
MpZ#__uadd              delta: [-21.084% -19.914% -18.836%] (p = 0.00 < 0.05)
MpZ#_uaddU32            delta: [-24.498% -23.379% -22.237%] (p = 0.00 < 0.05)

At a glance I'm not sure which is fastest.

This is all a little pedantic, but I think if the list were presented relative to the fastest... then all numbers would become positive (slower) and the results would be clearer IMO.
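To make the suggestion concrete, here is a minimal sketch (plain TypeScript with a hypothetical `Result` shape, not as-tral's actual internals) of presenting a suite relative to the fastest bench, sorted fast -> slow, so every delta is a positive "X% slower":

```typescript
// Hypothetical result shape: mean time in nanoseconds per benchmark.
interface Result {
  name: string;
  meanNs: number;
}

// Sort ascending (fast -> slow) and report each remaining entry as a
// positive slowdown relative to the fastest entry.
function summarize(results: Result[]): string[] {
  const sorted = [...results].sort((a, b) => a.meanNs - b.meanNs);
  const fastest = sorted[0];
  return sorted.slice(1).map((r) => {
    const pct = ((r.meanNs / fastest.meanNs - 1) * 100).toFixed(2);
    return `${r.name}: +${pct}% slower than ${fastest.name}`;
  });
}
```

With the mean times from the thread (MpZ#__uadd 332.89ns, MpZ#add 365.25ns, BigInt#add 953.44ns), this lists MpZ#add first and BigInt#add second, both with positive deltas.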

@romdotdog
Owner

I reversed the order and pushed it as v3.0.2.

@Hypercubed
Author

I don't think it is so much about the order... but about what it is relative to. It should start with the fastest on top, with all the others relative to that one (ideally fast -> slow).

In my output:

Suite add-small finished
Relative to MpZ#add
BigInt#add              delta: [+156.54% +159.73% +163.01%] (p = 0.00 < 0.05)
MpZ#__uadd              delta: [-21.084% -19.914% -18.836%] (p = 0.00 < 0.05)
MpZ#_uaddU32            delta: [-24.498% -23.379% -22.237%] (p = 0.00 < 0.05)

You can see that BigInt#add is slower, but MpZ#__uadd is faster than the baseline MpZ#add.

@romdotdog
Owner

This is technically difficult, actually. I'll see what I can do in terms of freeing up time on my schedule.

@Hypercubed
Author

I understand.... it's just a very nice-to-have IMO. Thank you for your work!

@Hypercubed
Author

Maybe I'm completely misunderstanding these results. I have three tests in a suite. I've manually ordered them from fastest to slowest (based on my expectations before running). The three tests have the following times:

MpZ#__uadd              time: [286.26ns 332.89ns 388.27ns]
MpZ#add                 time: [321.01ns 365.25ns 416.48ns]
BigInt#add              time: [852.7ns 953.44ns 1076.5ns]

Based on these times, MpZ#__uadd is the fastest (as expected), followed by MpZ#add (~1.1x slower) and BigInt#add (~3x slower). But here is the summary:

Suite add large finished
Relative to MpZ#__uadd
BigInt#add              delta: [+194.48% +236.46% +285.51%] (p = 0.00 < 0.05)
MpZ#add                 delta: [-4.0351% +9.0659% +25.553%] (p = 0.00 < 0.05)

BigInt#add has a positive delta and MpZ#add a negative one.

Full output is below:

Benchmarking add large suite

Benchmarking MpZ#__uadd: Warming up for 3000ms
Benchmarking MpZ#__uadd: Collecting 100 samples in estimated 5000.3ms (15M iterations)
Benchmarking MpZ#__uadd: Analyzing
MpZ#__uadd              time: [286.26ns 332.89ns 388.27ns]
                        change: [-34.746% -18.658% +0.9114%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3%)
  1 (1%) high mild
  2 (2%) high severe

Benchmarking MpZ#add: Warming up for 3000ms
Benchmarking MpZ#add: Collecting 100 samples in estimated 5000.2ms (10M iterations)
Benchmarking MpZ#add: Analyzing
MpZ#add                 time: [321.01ns 365.25ns 416.48ns]
                        change: [-26.528% -12.62% +4.2109%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1%)
  1 (1%) high severe

Benchmarking BigInt#add: Warming up for 3000ms
Benchmarking BigInt#add: Collecting 100 samples in estimated 5004.3ms (4.8M iterations)
Benchmarking BigInt#add: Analyzing
BigInt#add              time: [852.7ns 953.44ns 1076.5ns]
                        change: [+3.1275% +17.514% +33.906%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7%)
  2 (2%) high mild
  5 (5%) high severe

Suite add large finished
Relative to MpZ#__uadd
BigInt#add              delta: [+194.48% +236.46% +285.51%] (p = 0.00 < 0.05)
MpZ#add                 delta: [-4.0351% +9.0659% +25.553%] (p = 0.00 < 0.05)

@Hypercubed
Author

Hypercubed commented Jan 13, 2024

Is it possible these deltas are comparing against previous runs? After clearing the as-tral directory I'm getting this:

Suite add large finished
Relative to undefined

@romdotdog
Owner

The leftmost number is the lower bound of the confidence interval, the middle is the mean, and the rightmost is the upper bound of the confidence interval.

You're looking at the leftmost number, which means, very generally, that at its best, it performs 4% better than the other algorithm, but usually (the mean) it's 9% worse.

Using statistics helps give more nuanced views of results like this. I hope this clears it up.
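One way to read such an interval programmatically (a sketch, not as-tral code): if the whole delta interval is positive the candidate is slower than the baseline, if the whole interval is negative it is faster, and if the interval straddles zero the comparison is inconclusive at that confidence level.

```typescript
type Verdict = "faster" | "slower" | "inconclusive";

// Classify a delta confidence interval, given its lower and upper
// bounds in percent relative to the baseline.
function classify(lower: number, upper: number): Verdict {
  if (upper < 0) return "faster"; // even the upper bound is negative
  if (lower > 0) return "slower"; // even the lower bound is positive
  return "inconclusive"; // the interval straddles zero
}
```

With the values from the thread: `classify(-4.0351, 25.553)` is `"inconclusive"` (MpZ#add vs. MpZ#__uadd is not clearly resolved), while `classify(194.48, 285.51)` is `"slower"` (BigInt#add is unambiguously slower).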

@romdotdog
Owner

After clearing the as-tral directory I'm getting this...

That's a bug because I accidentally made some suite code conditional on having a baseline. I'll fix it soon.

@Hypercubed
Author

The leftmost number is the lower bound of the confidence interval, the middle is the mean, and the rightmost is the higher bound of the confidence interval.

You're looking at the leftmost number, which means, very generally, that at its best, it performs 4% better than the other algorithm, but usually (the mean) it's 9% worse.

Using statistics helps give more nuanced views of results like this. I hope this clears it up.

It does... TY

@Hypercubed
Author

Hello @romdotdog

Two issues I'm seeing:

  • I think you had it correct the first time... the suite results are being sorted descending (slowest -> fastest)... since they are relative to the suiteBenchmark (which is hopefully the fastest), the remaining results should be sorted ascending (fastest -> slowest)... IMO.

  • If I'm correct, the benchmark results are written to the as-tral folder with filenames matching the currentBench name. If benchmarks from different suites (or files) have the same name, they will collide. Guessing the files should be named something like ${fileName}-${currentSuite}-${currentBench} (or a hash of that).

Again... willing to submit PRs if you agree these are issues and are accepting.
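The second point could be sketched like this (the `${fileName}-${currentSuite}-${currentBench}` key is from the comment above; hashing it is an illustrative choice, not as-tral's actual scheme, and `.json` is an assumed extension):

```typescript
import { createHash } from "crypto";

// Build a stable, filesystem-safe cache filename from the three parts
// that together identify a benchmark uniquely.
function cacheFileName(fileName: string, suite: string, bench: string): string {
  const key = `${fileName}-${suite}-${bench}`;
  // Hash the key so characters like "#" in bench names, or identical
  // bench names in different suites, can never collide on disk.
  const digest = createHash("sha256").update(key).digest("hex").slice(0, 16);
  return `${digest}.json`;
}
```

With this scheme, two benches both named `MpZ#add` in different suites map to different files.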

@romdotdog
Owner

romdotdog commented Jan 28, 2024

as-tral already orders from fastest to slowest (as of v3.0.2), so I don't know what you mean by the first issue.

I'm happy to take PRs for the second issue.

@Hypercubed
Author

Hypercubed commented Jan 29, 2024

Am I reading this incorrectly again?

Benchmarking add large suite

Benchmarking MpZ#__uadd (large): Warming up for 3000ms
Benchmarking MpZ#__uadd (large): Collecting 100 samples in estimated 5000.9ms (7.3M iterations)
Benchmarking MpZ#__uadd (large): Analyzing
MpZ#__uadd (large)      time: [624.47ns 669.37ns 713.25ns]
                        change: [+58.52% +76.242% +97.166%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2%)
  1 (1%) high mild
  1 (1%) high severe

Benchmarking MpZ#add (large): Warming up for 3000ms
Benchmarking MpZ#add (large): Collecting 100 samples in estimated 5000.4ms (6.9M iterations)
Benchmarking MpZ#add (large): Analyzing
MpZ#add (large)         time: [596.15ns 649.3ns 702.27ns]
                        change: [+6.1317% +24.565% +46.281%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2%)
  1 (1%) high mild
  1 (1%) high severe

Benchmarking BigInt#add (large): Warming up for 3000ms
Benchmarking BigInt#add (large): Collecting 100 samples in estimated 5008.5ms (2.6M iterations)
Benchmarking BigInt#add (large): Analyzing
BigInt#add (large)      time: [1540.3ns 1674.8ns 1814.3ns]
                        change: [+26.107% +40.863% +56.35%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1%)
  1 (1%) high mild

Suite add large finished
Relative to MpZ#__uadd (large)
BigInt#add (large)      delta: [+147.02% +180.45% +220.59%] (p = 0.00 < 0.05)
MpZ#add (large)         delta: [-5.1881% +11.75% +35.888%] (p = 0.00 < 0.05)

The results for the three benchmarks are MpZ#__uadd 669.37ns, MpZ#add 649.3ns, and BigInt#add 1674.8ns. The output (Relative to MpZ#__uadd) lists BigInt#add and then MpZ#add. Isn't BigInt#add (+180%, 1674.8ns) slower than MpZ#add (+11.75%, 649.3ns)?
