Add zinnia spark spot check #2

Merged
merged 21 commits into from
Feb 17, 2025
Conversation

pyropy
Contributor

@pyropy pyropy commented Feb 6, 2025

Adds a basic utility for performing spot checks on Spark retrievals. By default, it performs a full retrieval on all deals for the current round.

Implementation plan specified here
Closes CheckerNetwork/roadmap#226

This was referenced Feb 6, 2025
@pyropy pyropy force-pushed the add-zinnia-spark-spot-check branch from 72196c3 to eb1d72e Compare February 7, 2025 12:05
@pyropy
Contributor Author

pyropy commented Feb 7, 2025

I have rebased onto main, given that #3 is a predecessor to #2 but was opened at a later time (hence the force-push).

@pyropy pyropy self-assigned this Feb 7, 2025
@juliangruber
Member

I have rebased onto main, given that #3 is a predecessor to #2 but was opened at a later time (hence the force-push).

@pyropy GitHub has a feature for this. If PR A depends on PR B, set the target branch of PR A to PR B's branch. Then, when PR B gets merged, GitHub automatically changes PR A to target main.

@pyropy
Contributor Author

pyropy commented Feb 7, 2025

I have rebased onto main, given that #3 is a predecessor to #2 but was opened at a later time (hence the force-push).

@pyropy GitHub has a feature for this. If PR A depends on PR B, set the target branch of PR A to PR B's branch. Then, when PR B gets merged, GitHub automatically changes PR A to target main.

Did not know about that, that's awesome! Thanks for letting me know 🙏🏻

@pyropy pyropy marked this pull request as ready for review February 10, 2025 12:02
Member

@juliangruber juliangruber left a comment


Great work! 👏

lib/constants.js Outdated
export const SPARK_VERSION = '1.15.0'
export const MAX_CAR_SIZE = 200 * 1024 * 1024 // 200 MB
export const APPROX_ROUND_LENGTH_IN_MS = 20 * 60_000 // 20 minutes
Member


This constant can be removed since the spot checker isn't enforcing a CAR size limit anywhere

Contributor Author


It is actually used to define the maximum size of the buffer we read CAR bytes into:

const carBuffer = new ArrayBuffer(0, { maxByteLength: MAX_CAR_SIZE })

Member


Thanks, I didn't see that! Can it happen that we read more than MAX_CAR_SIZE into the buffer, and if so, what happens then?

Contributor Author


Usually, when we try to read more bytes into the buffer than it allows, the ArrayBuffer instance throws an error, for example:

RangeError: ArrayBuffer.prototype.resize: Invalid length parameter
    at ArrayBuffer.resize (<anonymous>)
    at SpotChecker.fetchCAR (file:///home/srdjan/Development/checker-network/spark-spot-check/lib/spot-checker.js:126:21)
    at eventLoopTick (ext:core/01_core.js:178:11)
    at async SpotChecker.executeSpotCheck (file:///home/srdjan/Development/checker-network/spark-spot-check/lib/spot-checker.js:75:5)
    at async SpotChecker.spotCheck (file:///home/srdjan/Development/checker-network/spark-spot-check/lib/spot-checker.js:158:5)
    at async SpotChecker.run (file:///home/srdjan/Development/checker-network/spark-spot-check/lib/spot-checker.js:183:24)
    at async file:///home/srdjan/Development/checker-network/spark-spot-check/main.js:35:1

I have reduced the size from 200 MB to 1 MB, as we're currently only reading 200 bytes (there is some overhead due to CAR encoding, hence the bigger upper limit of 1 MB).

Contributor Author


I’ve removed this limit since we’re no longer loading the CAR file into memory. Instead, we're now reading a chunked stream returned by Lassie.

We now use the maxByteLength parameter, which defines the threshold for the number of bytes downloaded before interrupting the retrieval.
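A minimal sketch of that threshold idea, with illustrative names (readUpTo and fakeBody are not part of the codebase): count bytes as chunks stream in and stop once maxByteLength is reached.

```javascript
// Consume a chunked stream, stopping once maxByteLength bytes have arrived.
async function readUpTo (stream, maxByteLength) {
  const chunks = []
  let received = 0
  for await (const chunk of stream) {
    chunks.push(chunk)
    received += chunk.length
    if (received >= maxByteLength) break // interrupt the retrieval here
  }
  return { chunks, received }
}

// Simulate a chunked response body with an async generator: ten 100-byte chunks.
async function * fakeBody () {
  for (let i = 0; i < 10; i++) yield new Uint8Array(100)
}

// With a 250-byte threshold, reading stops after the third chunk arrives.
const { received } = await readUpTo(fakeBody(), 250)
```

Note that the last chunk is kept whole, so the total downloaded can slightly exceed the threshold (here 300 bytes for a 250-byte limit).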

// '12' is the multihash code for sha2-256; '20' (hex) is the digest length (32 bytes = 256 bits)
stats.carChecksum = '1220' + encodeHex(digest)
}
await verifyContent(cid, carBytes)
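For reference, the '1220' prefix above is a multihash header: 0x12 identifies sha2-256 and 0x20 is the 32-byte digest length. A standalone sketch (using Node's crypto module, not the project's encodeHex helper) of how such a checksum string is built:

```javascript
import { createHash } from 'node:crypto'

// Build a sha2-256 multihash hex string: '12' (sha2-256 code) + '20'
// (32-byte digest length) + the hex-encoded digest itself.
function carChecksum (carBytes) {
  const digest = createHash('sha256').update(carBytes).digest('hex')
  return '1220' + digest
}

const checksum = carChecksum(new TextEncoder().encode('example CAR bytes'))
```

The resulting string is always 68 hex characters: 4 for the multihash header plus 64 for the 32-byte digest.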
Member


What exactly is verifyContent checking? Will it only verify that the root block payload in carBytes matches the hash in cid? IIUC, that's how I implemented this verification in Spark, where it makes sense, since we are fetching the root block only.

When we are fetching more than a root block, then it would be best to verify the entire content. Otherwise, the server can return a CAR with a root block followed by a random payload, and we will accept it as a valid response.

To verify the entire content, we need to download the entire Merkle tree (the full CAR file).

If we want to work with byte ranges, then we may be able to use IPFS HTTP Gateway feature where the client asks for a byte range in the file and the gateway is expected to send not only those bytes, but also additional content required to perform content-verification all the way up to the root cid.

Another option is to skip content verification. The downside is that without content verification, our spot check cannot tell whether the SP is serving the real content or just random bytes.

Let's discuss!

Member


I see that you are already using entity-bytes below, so maybe my comment above is not relevant as it's based on an incomplete understanding of this pull request.

@pyropy pyropy marked this pull request as draft February 11, 2025 12:39
Member

@bajtos bajtos left a comment


The PR looks great at the high level.

I think the biggest remaining question is how many bytes to retrieve while being able to do proper content verification. As we discussed in our call earlier today, one option is to use Lassie for all retrievals (both Graphsync and HTTP). Lassie performs content verification and can offload large payloads to disk.

I am concerned about having two subtly different copies of Spark's lib files. It's fine for a proof of concept, but once we move to a more production-ready version, I would like us to explore how to refactor the original Spark checker code to support both "regular" checks and these new "spot" checks. Ideally, the new spot checker should re-use Spark checker's lib files with no modifications needed. (We can even move the spot checker's entry point to the Spark checker repository, similarly to how we have manual-check.js, but that's open for discussion.)

@pyropy
Contributor Author

pyropy commented Feb 11, 2025

The PR looks great at the high level.

I think the biggest remaining question is how many bytes to retrieve while being able to do proper content verification. As we discussed in our call earlier today, one option is to use Lassie for all retrievals (both Graphsync and HTTP). Lassie performs content verification and can offload large payloads to disk.

I think that for now we should piggyback on the Lassie retrievals, as it enables us to perform content verification.

What I have found is that we're able to perform validation over each chunk that we receive as response:

const reader = await CarBlockIterator.fromIterable(res.body)
for await (const block of reader) {
    await validateBlock(block)
}

I am not sure if this is correct, but if it is, we can either perform a full retrieval or cancel the retrieval after some set period while we validate streamed blocks on the fly.

I am concerned about having two subtly different copies of Spark's lib files. It's fine for a proof of concept, but once we move to a more production-ready version, I would like us to explore how to refactor the original Spark checker code to support both "regular" checks and these new "spot" checks. Ideally, the new spot checker should re-use Spark checker's lib files with no modifications needed. (We can even move the spot checker's entry point to the Spark checker repository, similarly to how we have manual-check.js, but that's open for discussion.)

I completely agree with you on this one. The original idea was to add the spot check to the Spark codebase, but I was afraid of introducing too many changes at once to such a critical part of the codebase, so I decided to create a new repository as a PoC. I think we can move it to the main codebase once we are sure that spot checks make sense and that the ones we perform are correct.

@bajtos
Member

bajtos commented Feb 12, 2025

The PR looks great at the high level.
I think the biggest remaining question is how many bytes to retrieve while being able to do proper content verification. As we discussed in our call earlier today, one option is to use Lassie for all retrievals (both Graphsync and HTTP). Lassie performs content verification and can offload large payloads to disk.

I think that for now we should piggyback on the Lassie retrievals, as it enables us to perform content verification.

What I have found is that we're able to perform validation over each chunk that we receive as response:

const reader = await CarBlockIterator.fromIterable(res.body)
for await (const block of reader) {
    await validateBlock(block)
}

I am not sure if this is correct, but if it is, we can either perform a full retrieval or cancel the retrieval after some set period while we validate streamed blocks on the fly.

It looks correct to me. IIRC, Lassie is re-ordering the blocks inside the returned CAR file to enable streaming validation like the one you are performing in your for-await loop.

I believe Lassie already performs content verification; therefore, it's not strictly necessary to do content verification inside the spot checker. It should not harm, though.

lib/multiaddr.js Outdated
Member


I think we can remove this file and not call validateHttpMultiaddr() since we are delegating all retrievals to Lassie. We can let Lassie validate the multiaddr.

What do you think?

The only downside I see is that the spot-checker may be able to retrieve successfully from a multiaddr value that the real Spark checker does not support, which can be confusing.

So maybe it's better to keep the validation check in place 🤷🏻

Contributor Author


I originally removed this in favor of Lassie's validation logic but later decided to add it back. When given an invalid multiaddr, Lassie responds with a 400 status code and a generic error message like "invalid providers parameter", without providing much detail about the issue.

@pyropy pyropy marked this pull request as ready for review February 12, 2025 13:18
}

/**
* @param {object} args
Member


Indentation looks way off 😅

Member


This applies to more jsdoc comments

Member


Please in the future review PRs before submitting them :)

Member

@juliangruber juliangruber Feb 12, 2025


I will continue review after the indentation has been fixed, because that will make the diff easier to read

Contributor Author

@pyropy pyropy Feb 12, 2025


I ran npx standard --fix, and that’s the result 😅

By the way, what have you been using on Spark?

Contributor Author


I've reformatted everything using VSCode and then ran npx standard --fix again, and now it seems fine. I've been using neovim for the past couple of days, so maybe that's the issue.

Member


I haven't been using any code formatter, but I think @bajtos uses one

Member


Not sure how this got messed up 😅

Member


IIRC, standard has no opinions on whitespace in code comments. It annoys me, I hope we will find a better solution as part of CheckerNetwork/roadmap#166

@pyropy pyropy merged commit 255133e into main Feb 17, 2025
@pyropy pyropy deleted the add-zinnia-spark-spot-check branch February 17, 2025 07:41
Successfully merging this pull request may close these issues.

First round of spot checks on SPs to validate Spark scores