Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

/eth/v1/node/syncing is not really up to the task #77

Closed
rolfyone opened this issue Sep 3, 2020 · 8 comments
Closed

/eth/v1/node/syncing is not really up to the task #77

rolfyone opened this issue Sep 3, 2020 · 8 comments

Comments

@rolfyone
Copy link
Contributor

rolfyone commented Sep 3, 2020

Sorry for the late flagging of this, I've just started actually moving some of the teku endpoints to serve new endpoints.

Syncing is a slightly more complicated domain than it probably appears on first glance.

A given node has a good idea of what slot it's at, and can pretty easily find what slot other peers are at, and based on trailing distances or some other metric decide it actually needs to sync data from a peer.

The current endpoint arguably over-simplifies the scope of the problem, by only having 2 fields:

head_slot: uint64,
sync_distance: uint64

Head slot describes 'head slot the node is trying to reach', and sync_distance is 0 if the node is in sync.

It's probably not uncommon for nodes to be in some kind of state where they're a little behind and just processing gossip to catch up on the state of play.

Lighthouse/teku nodes had syncing endpoint returning an explicit syncing flag, the head_slot of the best peer, the current_slot that the node is up to, and the start_slot (which i have never really needed, but is basically the first non final slot from the nodes perspective, so could potentially be winding back to that point if it ends up syncing a different node)

The value in this is that it builds a picture on the state of play of the node you're querying.

  • explicitly told if the node is syncing via a boolean

  • have the slot the node is at, and the slot of its best peer.

To best map currently, I would basically have to look at whether our node is syncing, and if it is immediately pass back our current slot, and sync_distance 0 to say we're not syncing.
If we are syncing, head_slot would then be the head_slot of the best peer that I have, and sync_distance would be head_slot - current_slot

Is this something we should consider adding more context to before going too far down the implementation route for this endpoint?

If we do stay with current description, we should probably at the very least rename the 'head_slot' due to that being a term currently in use by ethstats, and I think we're actually wanting 'target_slot' or similar.

@mpetrunic
Copy link
Contributor

head_slot is your nodes current calculated head slot. syncing_distance is amount of slots between you head_slot and calculated network head_slot. If your node is synced or gossiping blocks, you can just put sync_distance=0. Target slot is easy to calculate -> head_slot + sync_distance.

@rolfyone
Copy link
Contributor Author

rolfyone commented Sep 3, 2020

the comment for head_slot is Head slot node is trying to reach

so from what you're saying the comment is completely misleading? if it's YOUR head slot, that's the exact opposite of what I understood it to be..

@mpetrunic
Copy link
Contributor

This is embarrassing, I even implemented it like I said in lodestar but spec clearly says otherwise. 😞
Should we change api description to says "Beacon nodes head slot" or rename to target_slot?

@mcdee
Copy link
Contributor

mcdee commented Sep 3, 2020

Given the confusion, perhaps rename "head_slot" to "chain_head_slot"?

Also unsure about the idea of a target slot outlined above. Surely the target slot is the chain head slot, in which case sync distance is the chain head slot minus the node's highest synced slot.

I'm not a fan of trying to keep track of the peers' highest synced slot; it is a very volatile number and I'm not sure that knowing it will give much value to an end user. Perhaps I'm missing the use case here.

@ajsutton
Copy link
Contributor

ajsutton commented Sep 3, 2020

I think it's pretty set that "head slot" means the slot of the block at the node's current chain head because that's how eth2stats labels it.

In terms of the API I think it probably needs to provide more explicit information. There's three questions I'd normally call an API like this to answer:

  1. Is the node in sync or not?
  2. Where is it up to?
  3. How much longer does it have to go?

The ETH1 syncing APIs (https://besu.hyperledger.org/en/stable/Reference/API-Methods/#eth_syncing) are quite useful for answering these questions, providing:

  • syncing: boolean
  • current slot (probably should be head slot in ETH2 terminology - slot of the node's current best block)
  • highest slot (or maybe target slot - the slot the node is trying to sync up to)

ETH1 also provide starting block which I've never understood the reasoning for and a few fast sync specific fields, so I'd omit those. I would also avoid the slightly weird behaviour of only returning the head slot etc when syncing and just syncing: false when not syncing. When in sync I'd expect a response like:

{
  "syncing": false,
  "head_slot": 1000,
  "highest_slot": 1000
}

I think the explicit syncing flag is particularly useful, though perhaps better to flip it to "in_sync". A node is not necessarily out of sync because it has peers with a block at a later slot as they may be for non-canonical forks. It may even be that the node considers itself in sync but is actively running a batch sync to pull in a different fork but which it believes is minority (eg small number of peers on it, more than 50% of attestations on it's current canonical fork etc). I guess you can communicate that by returning a distance of 0 in that case but that requires a bunch more understanding to interpret the result instead of having more explicit fields.

@rolfyone
Copy link
Contributor Author

rolfyone commented Sep 6, 2020

if we were to update the description of head slot to something like suggested that'd be fine. I was just completely confused with how to work it given what the description was.
Also happy with Adrians suggestion if that ends up the solution, I might work on a different endpoint for a while and see how this pans out :)

@rolfyone
Copy link
Contributor Author

i think we've all moved past this now, closing.

@paulhauner
Copy link
Contributor

paulhauner commented Jan 13, 2021

Lighthouse is facing this issue as well: sigp/lighthouse#2148 (review). We also have the is_syncing flag that Teku has. We can work around the standard API removing it by declaring that syncing = sync_distance > 0.

However, I find this unattractive. Firstly, it's a pain for users. Consider a user who wants to use curl to ask the age old question of "is my node syncing?" Seeing "is_syncing: true" is going to make their lives a whole lot easier.

Secondly, the API is being opinionated by saying: It's not possible to be "not synced" with a sync distance greater than zero . This seems to be reasonable at first, but it's also reasonable for a BN to say "I might only have a few blocks left to sync and I'm synced-enough, so lets turn on all the duties endpoints and health endpoint". In this case we're unable to communicate that we're still expecting a few more blocks but are considering ourselves to be synced.

I think it's pretty clear-cut that UX is better with an is_syncing value. It's subjective if you want to do the sync tolerance stuff I talk about, but regardless I think it makes it makes the case that adding is_syncing makes the API more explicit and expressive in ways that are useful to some.

EDIT: I opened a PR at #121

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants