
feat(api): add a cost_tracker option to callreadonly rpc specifying the CostTracker to use for evaluation #5828

Open · wants to merge 2 commits into base: develop
Conversation

alexjtupper

Description

When evaluating a Clarity contract function, Clarity tracks its runtime cost using a CostTracker. The choice of CostTracker can significantly impact the time it takes to complete the evaluation (execution time).

This PR adds an optional cost_tracker query parameter to the callreadonly RPC endpoint, allowing the client to specify which CostTracker to use, potentially allowing the client to choose a faster evaluation.

Applicable issues

None

Additional info (benefits, drawbacks, caveats)

Currently, the callreadonly RPC endpoint uses a LimitedCostTracker initialised with LimitedCostTracker::new_mid_block(). At initialisation, this LimitedCostTracker loads the on-chain contracts specifying how costs are calculated from the store. Then, during contract evaluation, this CostTracker will evaluate a cost function for each cost-incurring operation during evaluation. By doing so, the callreadonly endpoint, through this LimitedCostTracker, can enforce cost-related limits during execution.

An alternative to the above is LimitedCostTracker::Free. This cost tracker doesn't load the on-chain cost-related contracts and returns a cost of 0 when computing the cost of each cost-incurring operation during evaluation. It doesn't evaluate any cost functions, and it can't enforce any cost-related limits.
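
To make the difference concrete, here is a minimal sketch of the two modes. The TrackerMode type below is an illustrative stand-in, not the actual clarity LimitedCostTracker API: the limited variant charges each operation against a budget, while the free variant does no work and enforces nothing.

```rust
// Illustrative model of the two cost-tracking modes (not clarity's real API).
#[derive(Debug)]
enum TrackerMode {
    // Tracks a running cost and enforces a limit.
    Limited { spent: u64, limit: u64 },
    // Tracks nothing and never rejects an operation.
    Free,
}

impl TrackerMode {
    // Charge `cost` units; returns Err if the limit would be exceeded.
    fn add_cost(&mut self, cost: u64) -> Result<(), String> {
        match self {
            TrackerMode::Limited { spent, limit } => {
                if *spent + cost > *limit {
                    Err(format!("cost limit {} exceeded", limit))
                } else {
                    *spent += cost;
                    Ok(())
                }
            }
            // The free tracker performs no bookkeeping and enforces nothing.
            TrackerMode::Free => Ok(()),
        }
    }
}

fn main() {
    let mut limited = TrackerMode::Limited { spent: 0, limit: 10 };
    let mut free = TrackerMode::Free;
    for _ in 0..20 {
        assert!(free.add_cost(1).is_ok()); // always succeeds
    }
    // The limited tracker accepts exactly 10 unit-cost operations.
    let accepted = (0..20).filter(|_| limited.add_cost(1).is_ok()).count();
    println!("limited tracker accepted {} of 20 operations", accepted);
}
```

The real LimitedCostTracker also evaluates on-chain cost functions per operation, which is where the extra execution time comes from; the sketch only models the enforcement behaviour.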

This benchmark, comparing execution times of contract evaluation (through the Environment interface) with both OwnedEnvironment::new_free() and OwnedEnvironment::new_max_limit(), shows that using LimitedCostTracker::Free can result in significantly faster execution. It shows a significant (10-20x) increase in execution time when using the default, non-free CostTracker, and that this increase is positively related to the number of operations in the function (crudely measured as the number of executed function calls in the benchmarked contract function). YMMV.

Example benchmark output:

Median execution time (µs) from sample of 20 executions
Complexity (n_calls) |            Free |         Limited
10                   |             407 |            5099
20                   |             644 |            9876
30                   |             947 |           14871
40                   |            1290 |           19446
50                   |            1526 |           24314
60                   |            1824 |           28957
70                   |            2197 |           34191
80                   |            2414 |           38670
90                   |            2700 |           43223
100                  |            2987 |           48151

The speed-up from using LimitedCostTracker::Free may be particularly valuable to people running semi-private follower nodes for the purpose of having their own RPC endpoint (i.e. a node that is p2p publicly connected but for which the RPC endpoint is intended for internal use). The reduction in execution time from this change could result in significantly higher throughput for the callreadonly endpoint for a given target latency. However, this comes at the cost of not being able to enforce cost-related limits.

Considerations

This PR takes the approach of allowing the client to specify which of the two aforementioned LimitedCostTracker initialisations to use (via a cost_tracker query parameter) - giving the client the ability to make the tradeoff between performance and limit enforcement for their use-case.

However, it is not clear whether this could be abused on nodes with public facing RPC endpoints - with public clients deciding to evaluate contracts without cost control.

We could consider adding a node config option to enable or disable this query parameter. When disabled, the parameter would be ignored, falling back to the current default behaviour.
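
Sketching what such a toggle might look like in the node's TOML config — the section and key name here are hypothetical, invented for illustration, and not an existing stacks-node setting:

```toml
[connection_options]
# Hypothetical setting: when false (the default), the proposed
# cost_tracker query parameter on the callreadonly endpoint would be
# ignored and the node would always use the default LimitedCostTracker.
allow_free_cost_tracker = false
```

Defaulting to disabled would preserve today's behaviour for nodes with public-facing RPC endpoints.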

Checklist

  • Test coverage for new or modified code paths
  • Changelog is updated
  • Required documentation changes (e.g., docs/rpc/openapi.yaml and rpc-endpoints.md for v2 endpoints, event-dispatcher.md for new events)
  • New clarity functions have corresponding PR in clarity-benchmarking repo
  • New integration test(s) added to bitcoin-tests.yml

@alexjtupper alexjtupper requested review from a team as code owners February 12, 2025 20:21
@kantai
Contributor

kantai commented Feb 12, 2025

So there's a couple questions about the benchmark that I think need to be resolved before thinking about adding this option.

First -- the benchmarks linked use a MemoryBackingStore. This means that the cost of loading the "cost contracts" is nearly free. In the actual stacks node's RPC interface, it has to load/initialize the cost contracts by reading from the stacks chain state. This could be much more costly, and could be the dominating factor for the performance difference seen between a Free tracker (which does not need to load/initialize the cost contracts) and the limited tracker. If that's the case, then the problem for RPC endpoints could be resolved by caching the cost contracts (this would be nice because it would speed up the RPC endpoint for people running public nodes as well).

Second -- it's important to evaluate this sort of thing with a release build. The release build shows similar performance differences that increase with complexity, but the overhead is itself much smaller in real terms:

$ cargo run --release
Complexity (n_calls) |            Free |         Limited
10                   |              63 |             790
20                   |             119 |            1521
30                   |             183 |            2349
40                   |             235 |            3104
50                   |             285 |            3851
60                   |             342 |            4559
70                   |             407 |            5435
80                   |             472 |            5957
90                   |             506 |            6706
100                  |             559 |            7472

A difference of 7 ms is definitely a real difference, but was the total overhead of the RPC endpoint in the use case you were seeing just 7 ms? Or was it more significant?

@alexjtupper
Author

Thanks for taking a look at this @kantai


First -- the benchmarks linked use a MemoryBackingStore. This means that the cost of loading the "cost contracts" is nearly free. [...] This could be much more costly, and could be the dominating factor for the performance difference...

Agreed - loading the contracts could be a significant cost and isn't measured in the benchmark.

I'll look into running a fork with either server-timing headers in the callreadonly response (measuring the contract loading time and the evaluation time separately) or with these timings in the logs.

On caching as a potential solution:

in callreadonly, we load the chain tip before we instantiate the CostTracker, so we could cache the cost_function_references, cost_contracts and contract_call_circuits TrackerData fields using the tip block ID as a key (if the tip changes, we would have to check for contract changes). Since the API allows the client to specify the tip ID, we could use a small LRU cache to accommodate multiple tips.

To construct the LimitedCostTracker from the cached TrackerData fields, I think the simplest change would be to add a public method to LimitedCostTracker: the TrackerData fields are private from the perspective of the stackslib package, and stackslib would be the natural place to define the cache. This new method would let the caller pass in the cost_function_references, cost_contracts and contract_call_circuits needed for the TrackerData, and would skip the cost_tracker.load_costs() call.
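
As a sketch of the caching idea — the types and field contents below are simplified stand-ins for the real TrackerData, and the real cache would live in stackslib — a small LRU cache keyed by tip block ID might look like:

```rust
use std::collections::{HashMap, VecDeque};

// Simplified stand-in for the cached TrackerData fields; the real
// types live in the clarity crate and are richer than plain strings.
#[derive(Clone, Debug)]
struct CachedTrackerData {
    cost_function_references: Vec<String>,
    cost_contracts: Vec<String>,
    contract_call_circuits: Vec<String>,
}

// A small LRU cache keyed by tip block ID.
struct TrackerDataCache {
    capacity: usize,
    entries: HashMap<String, CachedTrackerData>,
    order: VecDeque<String>, // front = most recently used
}

impl TrackerDataCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, entries: HashMap::new(), order: VecDeque::new() }
    }

    // Look up the data for a tip, marking it most recently used.
    fn get(&mut self, tip: &str) -> Option<CachedTrackerData> {
        let data = self.entries.get(tip).cloned()?;
        self.order.retain(|k| k != tip);
        self.order.push_front(tip.to_string());
        Some(data)
    }

    // Insert data for a tip, evicting the least recently used entry
    // if the cache is full.
    fn put(&mut self, tip: String, data: CachedTrackerData) {
        if !self.entries.contains_key(&tip) && self.entries.len() >= self.capacity {
            if let Some(evicted) = self.order.pop_back() {
                self.entries.remove(&evicted);
            }
        }
        self.order.retain(|k| k != &tip);
        self.order.push_front(tip.clone());
        self.entries.insert(tip, data);
    }
}

fn main() {
    let mut cache = TrackerDataCache::new(2);
    let data = CachedTrackerData {
        cost_function_references: vec!["cost-fn".into()],
        cost_contracts: vec!["costs-3".into()],
        contract_call_circuits: vec![],
    };
    cache.put("tip-a".into(), data.clone());
    cache.put("tip-b".into(), data.clone());
    cache.get("tip-a"); // touch tip-a so tip-b becomes least recently used
    cache.put("tip-c".into(), data); // evicts tip-b
    println!("tip-b still cached: {}", cache.get("tip-b").is_some());
}
```

A cache hit would feed the cached fields into the proposed public constructor, skipping load_costs(); a miss would fall back to the current initialisation path and populate the cache.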


Second -- it's important to evaluate this sort of thing with a release build.

Good point - I'll update the README to include the --release option in the example.

The benchmark is designed to show the relationship between contract complexity, cost tracker and execution time; as noted in the README, the absolute numbers are somewhat meaningless.

A difference of 7 ms is definitely a real difference, but was the total overhead of the RPC endpoint in the use case you were seeing just 7 ms? Or was it more significant?

The original motivation for running the benchmark and exploring this option was that I had observed a speed up from ~30ms to ~7ms by switching from the non-free to the free LimitedCostTracker. It's hard to replicate this in a way that I can share - and I imagine the results will vary drastically based on the contract. Hence the benchmark.


TL;DR

I'll try running another experiment in a fork (off 3.1.0.0.3), logging evaluation times and contract loading times on mainnet with both the free and non-free cost trackers and run some traffic with a range of contracts.

If you have any ideas for which contracts would be good to test - let me know!

@aldur aldur added this to the 3.1.0.0.6 milestone Feb 13, 2025
@aldur aldur linked an issue Feb 13, 2025 that may be closed by this pull request
@kantai
Contributor

kantai commented Feb 13, 2025

I'll try running another experiment in a fork (off 3.1.0.0.3), logging evaluation times and contract loading times on mainnet with both the free and non-free cost trackers and run some traffic with a range of contracts.

Great!

If it turns out to be the case that caching would get us most of the performance benefit here, I'd suggest trying to make the caching system work.

If switching to the free tracker ends up still being necessary, then my suggestion would be to refactor this PR somewhat. Ultimately, we want the node operator to be able to configure this (rather than the client), which means a new config setting will have to be plumbed through the ConnectionOptions and ConnectionOptionsFile structs.

@alexjtupper
Author

For the contracts I'm working with, I'm seeing the following median times (µs) for each stage of the callreadonly endpoint:

Cost Tracker Type | Cost Tracker Initialization | Evaluation | Handler
Mid block         |                         957 |     6262.5 |  7903.5
Free              |                           0 |     3995.5 |    4688

In this case:

  1. The non-free cost tracker initialisation takes roughly 1 ms
  2. Evaluation takes about 2 ms longer with the non-free tracker
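
Those round figures follow directly from the medians in the table; a quick arithmetic check (values in µs, taken from the table above):

```rust
// Difference between the mid-block and free medians, converted to ms.
fn diff_ms(mid_block_us: f64, free_us: f64) -> f64 {
    (mid_block_us - free_us) / 1000.0
}

fn main() {
    println!("init overhead:    {:.1} ms", diff_ms(957.0, 0.0));     // ~1.0 ms
    println!("eval overhead:    {:.1} ms", diff_ms(6262.5, 3995.5)); // ~2.3 ms
    println!("handler overhead: {:.1} ms", diff_ms(7903.5, 4688.0)); // ~3.2 ms
}
```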

Two outstanding questions I have:

  1. Why is this so different from the benchmark (I would expect some difference in the evaluation times, but 1.5x above is very different from the 15x in the benchmark)?
  2. Does this change significantly under load (I previously saw a speed up from 30ms to 7ms for the handler, now I'm seeing something smaller - perhaps because I'm running a small sample of requests in series)?

@alexjtupper
Author

@kantai: what do you reckon to me continuing with your suggested refactor given my findings on mainnet? I can also include an approach to caching and reusing the cost contracts to shave off some of that 1ms cost tracker initialisation.

@kantai
Contributor

kantai commented Feb 18, 2025

what do you reckon to me continuing with your suggested refactor given my findings on mainnet?

Yeah, I think that refactoring the configuration to support the free tracker makes sense in this case.

@aldur aldur modified the milestones: 3.1.0.0.6, 3.1.0.0.7 Feb 21, 2025
@kantai
Contributor

kantai commented Feb 26, 2025

@alexjtupper -- the latest develop includes some performance improvements for the cost tracking.

Updating to use the latest revision of clarity/stacks-common in your benchmark:

diff --git a/Cargo.toml b/Cargo.toml
index 120525f..a7504ff 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -4,7 +4,7 @@ version = "0.1.0"
 edition = "2021"
 
 [dependencies]
-clarity = { git = "https://github.com/stacks-network/stacks-core", tag = "3.1.0.0.5", version = "0.0.1", features = [
+clarity = { git = "https://github.com/stacks-network/stacks-core", rev = "3a935b5c8d92550f520812cca9655b407269cfa7", features = [
     'testing',
 ] }
-stacks-common = { git = "https://github.com/stacks-network/stacks-core", tag = "3.1.0.0.5", version = "0.0.1" }
+stacks-common = { git = "https://github.com/stacks-network/stacks-core", rev = "3a935b5c8d92550f520812cca9655b407269cfa7" }

The measured runtimes seem much closer now: the overhead is more like 10-12%.

Median execution time (µs) from sample of 20 executions
Complexity (n_calls) |            Free |         Limited
10                   |              66 |              72
20                   |             119 |             130
30                   |             180 |             199
40                   |             230 |             251
50                   |             286 |             312
60                   |             342 |             375
70                   |             399 |             444
80                   |             465 |             494
90                   |             512 |             554
100                  |             565 |             622

Is this performance close enough that you no longer want to configure the free tracker?

@aldur aldur modified the milestones: 3.1.0.0.7, 3.1.0.0.8 Feb 28, 2025
Successfully merging this pull request may close these issues.

Option to disable cost tracking for read-only calls