diff --git a/docs/architecture/adr-024-high-throughput-recovery.md b/docs/architecture/adr-024-high-throughput-recovery.md
new file mode 100644
index 0000000000..d3089f565f
--- /dev/null
+++ b/docs/architecture/adr-024-high-throughput-recovery.md
@@ -0,0 +1,42 @@
# ADR 024: High Throughput Recovery

## Changelog

- 2025/01/29: Initial draft (@evan-forbes)

## Status

Proposed

## Context

The Celestia protocol will likely separate block propagation into two phases: "preparation", for distributing data before the block is created, and "recovery", for distributing data after the block has been created. In order to utilize the data distributed before the block is created, the recovery phase must also be pull-based. Therefore, the constraints for recovery are:

- 100% of the block data MUST be delivered to >2/3 of the voting power before the ProposalTimeout is reached
- MUST use pull-based gossip

## Decision

TBD

## Detailed Design

- [Messages](./assets/adr024/messages.md)
- [Handlers and State](./assets/adr024/handlers_and_state.md)
- [Connecting to Consensus](./assets/adr024/connecting_to_consensus.md)

## Alternative Approaches

### PBBT w/o erasure encoding

### No broadcast tree

## Consequences

### Positive

### Negative

### Neutral

## References

diff --git a/docs/architecture/assets/adr024/connecting_to_consensus.md b/docs/architecture/assets/adr024/connecting_to_consensus.md
new file mode 100644
index 0000000000..85188031cd
--- /dev/null
+++ b/docs/architecture/assets/adr024/connecting_to_consensus.md
@@ -0,0 +1,119 @@
# Backwards Compatible Block Propagation

This document is an extension of ADR 024.

## Intro

Changes to gossiping protocols need to be backwards compatible with the existing mechanism to allow for seamless upgrades. This means that the gossiping mechanisms need to be hot-swappable. This is challenging because the consensus reactor and consensus state have their own propagation mechanism and were not designed to be easily modifiable.

## Compatibility with the Consensus Reactor

Minimally invasive modularity can be added by not touching the consensus state and by using the same entry points that exist now, namely the consensus reactor's internal message channel to the consensus state. While far from optimal from an engineering or even performance perspective, simply adding (yet another) syncing routine lets us sync the data from the block propagation reactor to the consensus state.

```go
// syncData periodically checks that all block parts held by the data routine
// are pushed through to the consensus state.
func (cs *State) syncData() {
    for {
        select {
        case <-cs.Quit():
            return
        case <-time.After(time.Millisecond * SyncDataInterval):
            if cs.dr == nil {
                continue
            }

            cs.mtx.RLock()
            h, r := cs.Height, cs.Round
            pparts := cs.ProposalBlockParts
            pprop := cs.Proposal
            completeProp := cs.isProposalComplete()
            cs.mtx.RUnlock()

            if completeProp {
                continue
            }

            prop, parts, _, has := cs.dr.GetProposal(h, r)
            if !has {
                continue
            }

            if prop != nil && pprop == nil {
                cs.peerMsgQueue <- msgInfo{&ProposalMessage{prop}, ""}
            }

            if pparts != nil && pparts.IsComplete() {
                continue
            }

            for i := 0; i < int(parts.Total()); i++ {
                if pparts != nil {
                    if p := pparts.GetPart(i); p != nil {
                        continue
                    }
                }

                part := parts.GetPart(i)
                if part == nil {
                    continue
                }
                // use the height and round read under the lock above rather
                // than re-reading the unsynchronized fields
                cs.peerMsgQueue <- msgInfo{&BlockPartMessage{h, r, part}, ""}
            }
        }
    }
}
```

This allows the old routine, alongside the rest of the consensus state logic, to function as it used to for peers that have yet to migrate to newer versions. If a peer does not indicate during the handshake that it is using the new block propagation reactor, then the old gossiping routines are spun up as normal when the peer is added to the consensus reactor. However, if the peer has indicated that it is using the new block propagation reactor, then the old routines are simply not spun up. Something along the lines of the code below should suffice.

```go
func legacyPropagation(peer p2p.Peer) (bool, error) {
    legacyBlockProp := true
    ni, ok := peer.NodeInfo().(p2p.DefaultNodeInfo)
    if !ok {
        return false, errors.New("wrong NodeInfo type. Expected DefaultNodeInfo")
    }

    for _, ch := range ni.Channels {
        if ch == types.BlockPropagationChannel {
            legacyBlockProp = false
            break
        }
    }

    return legacyBlockProp, nil
}
```

## Compatibility with Parity Data

Adding parity data is highly advantageous for broadcast trees and pull-based gossip. However, the added parity data must also be committed to by the proposer. At the moment, the proposer commits over the block data via the `PartSetHeader`. In order to be backwards compatible, we can't break this. Simultaneously, we don't want to add excessive overhead by requiring commitments to be computed twice. To solve this dilemma, we can simply reuse the first commitment and add a second parity commitment computed identically to the original `PartSetHeader` hash.

Setting the `PartSetHeader` hash to the zero value and not using it is an option. Since this is a consensus-breaking change, changing the commitment in the `CompactBlock` can be done at the same time.

diff --git a/docs/architecture/assets/adr024/handlers_and_state.md b/docs/architecture/assets/adr024/handlers_and_state.md
new file mode 100644
index 0000000000..8002870255
--- /dev/null
+++ b/docs/architecture/assets/adr024/handlers_and_state.md
@@ -0,0 +1,3 @@
# Logic and State

The PBBT reactor logic is described at a high level in the spec.

diff --git a/docs/architecture/assets/adr024/messages.md b/docs/architecture/assets/adr024/messages.md
new file mode 100644
index 0000000000..6c2d20e4d6
--- /dev/null
+++ b/docs/architecture/assets/adr024/messages.md
@@ -0,0 +1,146 @@
# PBBT Messages and Validation Logic

At a high level, all flavors of PBBT have four message types: `Commitment`, `Have`, `Want`, and `Data`.
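Before the individual message definitions, the sketch below gives a rough picture of how a receiving peer might dispatch these four message types. The handler and type names are hypothetical placeholders for illustration only, not the actual reactor code.

```go
// Simplified stand-ins for the proto messages defined below; illustrative only.
type (
    commitment   struct{} // CompactBlock
    haveParts    struct{} // HaveParts
    wantParts    struct{} // WantParts
    recoveryPart struct{} // RecoveryPart
)

// handleMessage sketches the high-level dispatch a peer performs for each of
// the four PBBT message types.
func handleMessage(msg any) {
    switch msg.(type) {
    case *commitment:
        // Verify the proposer's signature over the sign bytes, then record the
        // commitments so that later Have and Data messages can be checked.
    case *haveParts:
        // Verify the merkle proof against the committed roots, then reply with
        // a Want bit array covering the parts this node is still missing.
    case *wantParts:
        // Respond with Data messages for every wanted part this node holds.
    case *recoveryPart:
        // Check the part hash against the matching Have message before storing
        // the part and attempting block reconstruction.
    }
}
```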
## Commitment

```proto
message TxMetaData {
  bytes hash = 1;
  uint32 start = 2;
  uint32 end = 3;
}

// CompactBlock commits to the transactions included in a proposal.
message CompactBlock {
  int64 height = 1;
  int32 round = 2;
  bytes bp_hash = 3;
  repeated TxMetaData blobs = 4;
  bytes signature = 5;
}
```

The compact block is signed by the proposer and verified by converting it to sign bytes and checking the included signature against the proposer's public key.

> Note: This signature is separate from the proposal signature, as it is purely
> related to block propagation and not meant to be part of the proposal. This
> allows block propagation to be backwards compatible with older
> implementations.

The `TxMetaData` contains the hash of the PFB for the blob transaction that it commits to, alongside the `start` and `end`. `start` is the inclusive index of the starting byte in the protobuf-encoded block. `end` is the index of the last byte occupied by the blob transaction.

The `pbbt_root` is generated by taking the merkle root over each of the blob transactions in `BlobMetaData` and `Have` messages.

Verification:

- The signature MUST be valid using the sign bytes of the compact block and the public key of the expected proposer for that height and round.

## Have

```proto
message HaveParts {
  bytes hash = 1;
  int64 height = 2;
  int32 round = 3;
  tendermint.crypto.Proof proof = 4 [(gogoproto.nullable) = false];
}
```

Verification:

- The merkle proof MUST be verified using the roots included in the `CompactBlock` for that height and round. If the data is parity data, then it MUST use the `parity_root`; if the data is original block data, then it MUST use the `PartSetHeaderRoot`.

## Want

```proto
message WantParts {
  tendermint.libs.bits.BitArray parts = 1 [(gogoproto.nullable) = false];
  int64 height = 2;
  int32 round = 3;
}
```

## Data

```proto
message RecoveryPart {
  int64 height = 1;
  int32 round = 2;
  uint32 index = 3;
  bytes data = 4;
}
```

Verification:

- The hash of the bytes in the `data` field MUST match that of the corresponding `Have` message.

### Parity Data

Parity data is required for all practical broadcast trees. This becomes problematic mainly because transactions downloaded before the block is created need to be used during recovery. Using erasure encoding means that the data must be chunked into equal-sized pieces, and all transactions in a chunk must have been downloaded in order to use it alongside parity data to reconstruct the block. Most scenarios would likely be fine; however, it would be possible for a node to have downloaded a large portion of the block but have no complete parts, rendering all of the parity data useless. The way to fix this while remaining backwards compatible is to still commit over and propagate parts, but to erasure encode smaller chunks of those parts, aka `SubParts`.

```go
const (
    SubPartsPerPart uint32 = 32
    SubPartSize            = BlockPartSizeBytes / SubPartsPerPart
)

type Part struct {
    Index uint32            `json:"index"`
    Bytes cmtbytes.HexBytes `json:"bytes"`
    Proof merkle.Proof      `json:"proof"`
}

// SubPart is a portion of a part and block that is used for generating parity
// data.
type SubPart struct {
    Index uint32            `json:"index"`
    Bytes cmtbytes.HexBytes `json:"bytes"`
}

// SubParts breaks a block part into smaller, equal-sized sub-parts.
func (p *Part) SubParts() []SubPart {
    sps := make([]SubPart, SubPartsPerPart)
    for i := uint32(0); i < SubPartsPerPart; i++ {
        sps[i] = SubPart{
            Index: i,
            Bytes: p.Bytes[i*SubPartSize : (i+1)*SubPartSize],
        }
    }
    return sps
}

func PartFromSubParts(index uint32, sps []SubPart) *Part {
    if len(sps) != int(SubPartsPerPart) {
        panic(fmt.Sprintf("invalid number of subparts: %d", len(sps)))
    }
    b := make([]byte, 0, BlockPartSizeBytes)
    for _, sp := range sps {
        b = append(b, sp.Bytes...)
    }
    return &Part{
        Index: index,
        Bytes: b,
    }
}
```
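To make the sub-part scheme more concrete, the sketch below shows one way parity data could be generated from the sub-parts of the original block parts. It builds on the `Part` and `SubPart` types above; the choice of erasure-coding library (klauspost/reedsolomon), the 1:1 data-to-parity ratio, and the `GenerateParity` helper itself are illustrative assumptions, not decisions made by this ADR.

```go
import "github.com/klauspost/reedsolomon"

// GenerateParity is an illustrative helper (not part of this ADR) that erasure
// encodes the sub-parts of the given block parts, producing an equal number of
// parity sub-parts. A real implementation would need to respect the encoder's
// shard-count limits and handle a short final part.
func GenerateParity(parts []*Part) ([]SubPart, error) {
    // Each sub-part becomes one data shard for the erasure encoder.
    var shards [][]byte
    for _, p := range parts {
        for _, sp := range p.SubParts() {
            shards = append(shards, sp.Bytes)
        }
    }

    dataShards := len(shards)
    enc, err := reedsolomon.New(dataShards, dataShards)
    if err != nil {
        return nil, err
    }

    // Append empty parity shards for the encoder to fill.
    for i := 0; i < dataShards; i++ {
        shards = append(shards, make([]byte, SubPartSize))
    }
    if err := enc.Encode(shards); err != nil {
        return nil, err
    }

    // Wrap the parity shards as SubParts so they can be committed to and
    // gossiped just like the original data.
    parity := make([]SubPart, dataShards)
    for i := 0; i < dataShards; i++ {
        parity[i] = SubPart{
            Index: uint32(dataShards + i),
            Bytes: shards[dataShards+i],
        }
    }
    return parity, nil
}
```

During recovery, a node holding any combination of data and parity sub-parts that adds up to the number of data shards could call `Reconstruct` on the same encoder to recover whatever is missing, which is what makes partially downloaded parts usable alongside parity data.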