
Overall Fastest/Slowest #3

Open
Hypercubed opened this issue Dec 23, 2023 · 21 comments

Comments

@Hypercubed

Hypercubed commented Dec 23, 2023

First, thank you for your work on this project!

Do you think it would be worthwhile to add an overall result to the benchmark run? Something like "Fastest is"/"Slowest is". Maybe the simplest would be to do this per file... otherwise a suite method like:

suite('name', () => {
  bench("a", () => {
    blackbox(xxx);
  });

  bench("b", () => {
    blackbox(xxx);
  });
});

Willing to work on a PR if that's a feature you're interested in.

@romdotdog
Owner

Hi. I'll try my hand at implementing this. I'll let you know if it somehow falls through; at a minimum I'd be happy to review a PR.

@Hypercubed
Author

Let me know how it goes.... FYI this is output from hyperfine:

Summary
  ./build/hello-zig ran
    1.37 ± 2.28 times faster than ./build/hello-rust
    2.15 ± 3.64 times faster than bc ./bc/hello.bc
    2.42 ± 3.95 times faster than ./build/hello-ghc
    2.42 ± 4.40 times faster than ./build/hello-go
    2.53 ± 4.03 times faster than ./build/hello-cpp
    7.38 ± 9.54 times faster than ./build/hello-dart.exe
   28.47 ± 36.37 times faster than python3 ./python/hello.py
   34.53 ± 44.83 times faster than bun ./js/hello.js
   42.28 ± 52.85 times faster than bun ./ts/hello.ts
   65.02 ± 83.96 times faster than node ./js/hello.js
   66.50 ± 85.45 times faster than ./build/hello-deno
   68.43 ± 88.83 times faster than deno run ./js/hello.js
   73.08 ± 94.20 times faster than deno run ./ts/hello.ts
  146.76 ± 187.52 times faster than ruby ./ruby/hello.rb
  268.15 ± 324.65 times faster than racket ./racket/hello.rkt
  279.57 ± 354.98 times faster than ./build/hello-racket
  589.13 ± 765.03 times faster than tsx ./ts/hello.ts
  966.16 ± 1212.96 times faster than elixir ./elixir/hello.exs

@romdotdog
Owner

I have finished. It is now live as v3, and I included a short note about it in the README.md. It was a bit tedious, but I got it done using your syntax.

Here's an example of the output:
[screenshot of the suite summary output]

I hope this resolves your issue. If you need anything else, let me know.

@Hypercubed
Author

Looks great! I'll give it a try soon!

@romdotdog
Owner

It occurred to me that I forgot to clear the existing benches when you write multiple suites, so I pushed a patch as v3.0.1 in case you had already updated as-tral.

@Hypercubed
Author

Looks really good... a question about the output. How are the tests ordered? It looks like the "Relative to XXX" test is the first test and the rest of the list is in descending order (slow -> fast)? Is that right? Showing only the % change can be ambiguous. The output from hyperfine, BTW, is relative to the fastest, with the rest in ascending order (fast -> slow).

@Hypercubed
Author

Here is another tool's (benchmarkjs) output:

    InstanceOf x 44,909,045 ops/sec ±288.80% (2 runs sampled)
    Array.isArray x 20,286,249 ops/sec ±89.81% (2 runs sampled)
    Object.prototype.toString.call x 1,268,658 ops/sec ±116.03% (2 runs sampled)

*Fastest is __InstanceOf__*

@romdotdog
Owner

Why would showing only the delta be ambiguous? The delta is relative to the first test; it is in no way related to a change from the last test.

@Hypercubed
Author

It's kind of pedantic, and we (users) can usually figure it out, but:

Hyperfine shows "XX times faster"... this is clearly "higher is faster". With a +/-XX% it's not entirely clear whether positive means faster or slower. In your case a larger positive number means slower, and a more negative number means faster. Even typing that, I'm not 100% clear.

Here is what I am seeing:

Suite add-small finished
Relative to MpZ#add
BigInt#add              delta: [+156.54% +159.73% +163.01%] (p = 0.00 < 0.05)
MpZ#__uadd              delta: [-21.084% -19.914% -18.836%] (p = 0.00 < 0.05)
MpZ#_uaddU32            delta: [-24.498% -23.379% -22.237%] (p = 0.00 < 0.05)

At a glance I'm not sure which is fastest.

This is all a little pedantic, but I think if the list were presented relative to the fastest... then all numbers would become positive (slower) and the results would be clearer IMO.
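To make the suggestion concrete, here is a minimal sketch (plain TypeScript with a hypothetical `Result` shape, not as-tral's actual internals) of presenting a suite relative to the fastest bench, sorted fast -> slow, so every delta is a positive "X% slower":

```typescript
// Hypothetical result shape: mean time in nanoseconds per benchmark.
interface Result {
  name: string;
  meanNs: number;
}

// Sort ascending (fast -> slow) and report each remaining entry as a
// positive slowdown relative to the fastest entry.
function summarize(results: Result[]): string[] {
  const sorted = [...results].sort((a, b) => a.meanNs - b.meanNs);
  const fastest = sorted[0];
  return sorted.slice(1).map((r) => {
    const pct = ((r.meanNs / fastest.meanNs - 1) * 100).toFixed(2);
    return `${r.name}: +${pct}% slower than ${fastest.name}`;
  });
}
```

With the mean times from the thread (MpZ#__uadd 332.89ns, MpZ#add 365.25ns, BigInt#add 953.44ns), this lists MpZ#add first and BigInt#add second, both with positive deltas.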

@romdotdog
Owner

I reversed the order and pushed it as v3.0.2.

@Hypercubed
Author

I don't think it is so much about the order... but about what it is relative to. It should start with the fastest on top, with all the others relative to that one (ideally fast -> slow).

In my output:

Suite add-small finished
Relative to MpZ#add
BigInt#add              delta: [+156.54% +159.73% +163.01%] (p = 0.00 < 0.05)
MpZ#__uadd              delta: [-21.084% -19.914% -18.836%] (p = 0.00 < 0.05)
MpZ#_uaddU32            delta: [-24.498% -23.379% -22.237%] (p = 0.00 < 0.05)

You can see that BigInt#add is slower, but MpZ#__uadd is faster than the baseline MpZ#add.

@romdotdog
Owner

This is technically difficult, actually. I'll see what I can do in terms of freeing up time on my schedule.

@Hypercubed
Author

I understand.... it's just a very nice-to-have IMO. Thank you for your work!

@Hypercubed
Author

Maybe I'm completely misunderstanding these results. I have three tests in a suite. I've manually ordered them from fastest to slowest (based on my expectations before running). The three tests have the following times:

MpZ#__uadd              time: [286.26ns 332.89ns 388.27ns]
MpZ#add                 time: [321.01ns 365.25ns 416.48ns]
BigInt#add              time: [852.7ns 953.44ns 1076.5ns]

Based on these times, MpZ#__uadd is the fastest (as expected), followed by MpZ#add (~1.1x slower) and BigInt#add (~3x slower). But here is the summary:

Suite add large finished
Relative to MpZ#__uadd
BigInt#add              delta: [+194.48% +236.46% +285.51%] (p = 0.00 < 0.05)
MpZ#add                 delta: [-4.0351% +9.0659% +25.553%] (p = 0.00 < 0.05)

BigInt#add has a positive delta and MpZ#add a negative one.

Full output is below:

Benchmarking add large suite

Benchmarking MpZ#__uadd: Warming up for 3000ms
Benchmarking MpZ#__uadd: Collecting 100 samples in estimated 5000.3ms (15M iterations)
Benchmarking MpZ#__uadd: Analyzing
MpZ#__uadd              time: [286.26ns 332.89ns 388.27ns]
                        change: [-34.746% -18.658% +0.9114%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3%)
  1 (1%) high mild
  2 (2%) high severe

Benchmarking MpZ#add: Warming up for 3000ms
Benchmarking MpZ#add: Collecting 100 samples in estimated 5000.2ms (10M iterations)
Benchmarking MpZ#add: Analyzing
MpZ#add                 time: [321.01ns 365.25ns 416.48ns]
                        change: [-26.528% -12.62% +4.2109%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1%)
  1 (1%) high severe

Benchmarking BigInt#add: Warming up for 3000ms
Benchmarking BigInt#add: Collecting 100 samples in estimated 5004.3ms (4.8M iterations)
Benchmarking BigInt#add: Analyzing
BigInt#add              time: [852.7ns 953.44ns 1076.5ns]
                        change: [+3.1275% +17.514% +33.906%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7%)
  2 (2%) high mild
  5 (5%) high severe

Suite add large finished
Relative to MpZ#__uadd
BigInt#add              delta: [+194.48% +236.46% +285.51%] (p = 0.00 < 0.05)
MpZ#add                 delta: [-4.0351% +9.0659% +25.553%] (p = 0.00 < 0.05)

@Hypercubed
Author

Hypercubed commented Jan 13, 2024

Is it possible these deltas are comparing against previous runs? After clearing the as-tral directory I'm getting this:

Suite add large finished
Relative to undefined

@romdotdog
Owner

The leftmost number is the lower bound of the confidence interval, the middle is the mean, and the rightmost is the upper bound of the confidence interval.

You're looking at the leftmost number, which means, very generally, that at its best, it performs 4% better than the other algorithm, but usually (the mean) it's 9% worse.

Using statistics helps give more nuanced views of results like this. I hope this clears it up.
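One way to read such an interval programmatically (a sketch, not as-tral code): if the whole delta interval is positive the candidate is slower than the baseline, if the whole interval is negative it is faster, and if the interval straddles zero the comparison is inconclusive at that confidence level.

```typescript
type Verdict = "faster" | "slower" | "inconclusive";

// Classify a delta confidence interval, given its lower and upper
// bounds in percent relative to the baseline.
function classify(lower: number, upper: number): Verdict {
  if (upper < 0) return "faster"; // even the upper bound is negative
  if (lower > 0) return "slower"; // even the lower bound is positive
  return "inconclusive"; // the interval straddles zero
}
```

With the values from the thread: `classify(-4.0351, 25.553)` is `"inconclusive"` (MpZ#add vs. MpZ#__uadd is not clearly resolved), while `classify(194.48, 285.51)` is `"slower"` (BigInt#add is unambiguously slower).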

@romdotdog
Owner

After clearing the as-tral directory I'm getting this...

That's a bug because I accidentally made some suite code conditional on having a baseline. I'll fix it soon.

@Hypercubed
Author

The leftmost number is the lower bound of the confidence interval, the middle is the mean, and the rightmost is the higher bound of the confidence interval.

You're looking at the leftmost number, which means, very generally, that at its best, it performs 4% better than the other algorithm, but usually (the mean) it's 9% worse.

Using statistics helps give more nuanced views of results like this. I hope this clears it up.

It does... TY

@Hypercubed
Author

Hello @romdotdog

Two issues I'm seeing:

  • I think you had it correct the first time... the suite results are being sorted descending (slowest -> fastest)... since they are relative to the suiteBenchmark (which is hopefully the fastest), the remaining results should be sorted ascending (fastest -> slowest)... IMO.

  • If I'm correct, the benchmark results are written to the as-tral folder with filenames matching the currentBench name. If benchmarks from different suites (or files) have the same name, they will collide. Guessing the files should be named something like ${fileName}-${currentSuite}-${currentBench} (or a hash of that).

Again... willing to submit PRs if you agree these are issues and are accepting.
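The second point could be sketched like this (the `${fileName}-${currentSuite}-${currentBench}` key is from the comment above; hashing it is an illustrative choice, not as-tral's actual scheme, and `.json` is an assumed extension):

```typescript
import { createHash } from "crypto";

// Build a stable, filesystem-safe cache filename from the three parts
// that together identify a benchmark uniquely.
function cacheFileName(fileName: string, suite: string, bench: string): string {
  const key = `${fileName}-${suite}-${bench}`;
  // Hash the key so characters like "#" in bench names, or identical
  // bench names in different suites, can never collide on disk.
  const digest = createHash("sha256").update(key).digest("hex").slice(0, 16);
  return `${digest}.json`;
}
```

With this scheme, two benches both named `MpZ#add` in different suites map to different files.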

@romdotdog
Owner

romdotdog commented Jan 28, 2024

as-tral already orders from fastest to slowest (as of v3.0.2), so I don't know what you mean by the first issue.

I'm happy to take PRs for the second issue.

@Hypercubed
Author

Hypercubed commented Jan 29, 2024

Am I reading this incorrectly again?

Benchmarking add large suite

Benchmarking MpZ#__uadd (large): Warming up for 3000ms
Benchmarking MpZ#__uadd (large): Collecting 100 samples in estimated 5000.9ms (7.3M iterations)
Benchmarking MpZ#__uadd (large): Analyzing
MpZ#__uadd (large)      time: [624.47ns 669.37ns 713.25ns]
                        change: [+58.52% +76.242% +97.166%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2%)
  1 (1%) high mild
  1 (1%) high severe

Benchmarking MpZ#add (large): Warming up for 3000ms
Benchmarking MpZ#add (large): Collecting 100 samples in estimated 5000.4ms (6.9M iterations)
Benchmarking MpZ#add (large): Analyzing
MpZ#add (large)         time: [596.15ns 649.3ns 702.27ns]
                        change: [+6.1317% +24.565% +46.281%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2%)
  1 (1%) high mild
  1 (1%) high severe

Benchmarking BigInt#add (large): Warming up for 3000ms
Benchmarking BigInt#add (large): Collecting 100 samples in estimated 5008.5ms (2.6M iterations)
Benchmarking BigInt#add (large): Analyzing
BigInt#add (large)      time: [1540.3ns 1674.8ns 1814.3ns]
                        change: [+26.107% +40.863% +56.35%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1%)
  1 (1%) high mild

Suite add large finished
Relative to MpZ#__uadd (large)
BigInt#add (large)      delta: [+147.02% +180.45% +220.59%] (p = 0.00 < 0.05)
MpZ#add (large)         delta: [-5.1881% +11.75% +35.888%] (p = 0.00 < 0.05)

The results for the three benchmarks are MpZ#__uadd 669.37ns, MpZ#add 649.3ns, and BigInt#add 1674.8ns. The output (Relative to MpZ#__uadd) lists BigInt#add and then MpZ#add. Isn't BigInt#add (+180%, 1674.8ns) slower than MpZ#add (+11.75%, 649.3ns)?
