[Load Testing] 10B Relay Load Test #1106

Open
18 tasks
okdas opened this issue Mar 5, 2025 · 3 comments

okdas commented Mar 5, 2025

Objective

Create and execute a large-scale load test with billions of relays to validate the network's scalability limits under extreme conditions.

Origin Document

This is a follow-up to the permissionless demand load testing work in issue #742. Based on the progress made and issues identified, we need a dedicated test to push the network to its theoretical limits.

Goals

  • Validate the network's ability to handle massive scale with hundreds of suppliers, thousands of services, and thousands of applications
  • Identify any bottlenecks or failure points when processing billions of relays
  • Gather performance metrics to inform future scalability improvements
  • Verify that previous infrastructure issues have been resolved at scale

Deliverables

  • A. Test environment setup:

    • 1. Register thousands (1000s) of services
    • 2. Deploy and configure hundreds (100s) of suppliers
    • 3. Register thousands (1000s) of applications
    • 4. Configure load generation for billions of relays (a rough load-driver sketch follows this list)
  • B. Execution and monitoring:

    • 1. Run the load test in controlled phases with increasing load
    • 2. Monitor system performance metrics (CPU, memory, disk, network)
    • 3. Track on-chain events and transaction processing
    • 4. Document any failures or degradation points
  • C. Analysis and reporting:

    • 1. Analyze performance data and identify bottlenecks
    • 2. Document maximum sustainable throughput
    • 3. Compare results with previous load tests
    • 4. Provide recommendations for further improvements
  • D. Documentation & Automation

    • 1. Document the tools used for A1 - A4 (@okdas)
    • 2. Prepare another X Thread similar to the one in the origin document (@Olshansk)
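
For A4, since the issue doesn't pin down the load driver itself, below is a minimal sketch of what the relay generator could look like. This is not the project's actual load-testing tooling; the gateway URL, service path ("anvil"), JSON-RPC payload, and the worker/volume constants are all placeholder assumptions that would be replaced by the real test configuration.

```go
// relay_loadgen.go -- minimal sketch of a relay load driver, NOT the project's
// actual load-testing harness. The gateway URL, service path, JSON-RPC payload,
// and concurrency/volume constants are placeholder assumptions.
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"sync"
	"sync/atomic"
	"time"
)

const (
	gatewayURL  = "http://localhost:3000/v1/anvil" // hypothetical relay endpoint
	workers     = 256                              // concurrent senders
	totalRelays = 1_000_000                        // raised per phase toward the full target
)

var payload = []byte(`{"jsonrpc":"2.0","id":1,"method":"eth_blockNumber","params":[]}`)

func main() {
	var ok, failed atomic.Int64
	jobs := make(chan struct{}, workers)
	client := &http.Client{Timeout: 10 * time.Second}

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range jobs {
				resp, err := client.Post(gatewayURL, "application/json", bytes.NewReader(payload))
				if err != nil {
					failed.Add(1)
					continue
				}
				io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
				resp.Body.Close()
				if resp.StatusCode == http.StatusOK {
					ok.Add(1)
				} else {
					failed.Add(1)
				}
			}
		}()
	}

	start := time.Now()
	for i := 0; i < totalRelays; i++ {
		jobs <- struct{}{}
	}
	close(jobs)
	wg.Wait()

	elapsed := time.Since(start)
	fmt.Printf("ok=%d failed=%d rate=%.0f relays/s\n",
		ok.Load(), failed.Load(), float64(ok.Load())/elapsed.Seconds())
}
```

In practice, multiple instances of something like this (or the existing load-testing tooling) would be run per phase (B1), ramping `workers` and `totalRelays` while the system metrics in B2 are captured.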

Non-goals / Non-deliverables

  • Fixing issues discovered during the load test (these should be tracked in separate tickets)
  • Involving community members in the load test execution
  • Testing other aspects of the network not related to relay, claim & proof processing
  • Optimizing the code based on test results

Creator: @okdas
Co-Owners: @red-0ne @Olshansk

@okdas okdas self-assigned this Mar 5, 2025
@okdas okdas added this to Shannon Mar 5, 2025
@github-project-automation github-project-automation bot moved this to 📋 Backlog in Shannon Mar 5, 2025
@okdas okdas added infra Infra or tooling related improvements, additions or fixes loadtest Work related to load testing labels Mar 5, 2025
@okdas okdas added this to the MainNet Launch milestone Mar 5, 2025
@Olshansk Olshansk moved this from 📋 Backlog to 🏗 In progress in Shannon Mar 5, 2025

Olshansk commented Mar 7, 2025

@okdas I made some minor NITs & Edits to the GitHub issue. You can view the diff (just in case you're not familiar with this feature).

[Screenshot: diff of the issue edits]


One concrete question I have is re this:

  • Involving community members in the load test execution

I was envisioning that we'd ask community members to run some suppliers and non-anvil backend services.

Before we reach out, I wanted to ask: what did you have in mind?


okdas commented Mar 7, 2025

I was envisioning that we'd ask community members to run some suppliers and non-anvil backend services.

We won't have full visibility into the performance metrics as we don't "control" (i.e. maintain) that infrastructure.

We could potentially ask the community to stake suppliers, run RelayMiners, and use them for the load test. However, this adds an additional layer of indirection that does not contribute to the goals of this test.

I would like to keep the option (and preference) of relying on our own Suppliers + RelayMiners only.

Before we reach out, I wanted to ask: what did you have in mind?

  1. Provision another k8s cluster in a different region.
  2. Add new onchain services
  3. Stake/restake Applications and Suppliers
  4. Deploy new RelayMiners in (1)
  5. Use an NGINX static response as the backend for each RelayMiner, regardless of the service (see the sketch after the pros list below)

Pros of this approach:

  • We are isolating onchain scalability
  • We are isolating RelayMiner bottlenecks & performance
  • We are not dependent on external parties
  • We do not need to investigate backend/node issues
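
Since the NGINX config itself isn't spelled out here, the sketch below shows the same idea as step 5 in Go: a stand-in backend that returns a fixed response for every request, so backend cost stays constant and any degradation observed comes from the onchain and RelayMiner layers. The response body and port are placeholder assumptions.

```go
// static_backend.go -- stand-in for the "static response" backend in step 5.
// Every RelayMiner would point at a server like this (or an equivalent NGINX
// location block), which returns a fixed JSON body regardless of the request.
package main

import (
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.Write([]byte(`{"jsonrpc":"2.0","id":1,"result":"0x1"}`))
	})
	log.Fatal(http.ListenAndServe(":8547", nil)) // placeholder port
}
```

In the actual deployment this would most likely just be an NGINX `return 200` directive in a catch-all `location` block, which keeps the backend effectively free and isolates the layers we actually want to measure.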


Olshansk commented Mar 7, 2025

@okdas Made some edits to your message (for clarity for myself & others) but the plan looks 👌 to me.

We'll handle this ourselves.

My key ask is to ensure that we have different services, but the actual contents of the RelayMiner's backend response -- as you clearly state -- are out of scope.
