From 0f9f765d5cbb0989c9363a6b9dca3cb44779ebe7 Mon Sep 17 00:00:00 2001 From: Roman Date: Thu, 25 Aug 2022 21:31:33 -0400 Subject: [PATCH 01/10] docs: state-compatibility documentation --- CONTRIBUTING.md | 131 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 131 insertions(+) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index b817757a3d1..01144ea8e57 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -333,6 +333,137 @@ There are several steps that go into a major release * Update chain JSON schema's recommended versions in `chain.schema.json` located in the root directory. +## Patch and Minor Releases + +### Backport Flow + +For patch and minor releases, we should already have +a release branch available in the repository. For example, +for any v11 release, we have a `v11.x` branch. + +Therefore, for any change made to the `main` and, as long as it is +**state-compatible**, we must backport it to the last major release branch e.g. +`v11.x` when the next major release is v12. + +This helps to minimize the diff for a major upgrade review process. + +Additionally, it helps to safely and incrementally test +state-compatible changes by doing smaller patch releases. Contrary +to a major release, there is always an opportunity to safely downgrade +for any minor or patch release. + +### State-compatibility + +It is critical for the patch and minor releases to be state-machine compatible with +prior releases in the same major version. For example, v11.2.1 must be +compatible with v11.1.0 and v11.0.0. + +This is to ensure **determinism** i.e. that given the same input, the nodes +will always produce the same output. + +State-incompatibility is allowed for major upgrades because all nodes in the network +perform it at the same time. Therefore, after the upgrade, the nodes continue +functioning in the determistic way. + +The state compatibiliy is ensured by taking the app hash of the state and comparing it +to the app hash with the rest of the network. Essentially, app hash is a hash of +hashes of every store's Merkle root. + +For example, at height n, we compute: +`app hash = hash(hash(root of x/epochs), hash(root of x/gamm)...)` + +Then, Tendermint ensures that app hash hash of local node matches the app hash +of the network. Please note that the explanation and examples are simplified. + +#### Sources of State-incompatibility + +##### Creating Additional State + +By erronously creating additional state, we can cause the app hash to differ +accross nodes in the network. Therefore, this must be avoided. + +##### Returning Different Errors Given Same Input + +Version A +```go +func (sk Keeper) validateAmount(ctx context.Context, amount sdk.Int) error { + if amount.IsNegative() { + return sdkerrors.Wrap(sdkerrors.ErrInvalidRequest, "amount must be positive or zero") + } + return nil, errors.New("error") +} +``` + +Version B +```go +func (sk Keeper) validateAmount(ctx context.Context, amount sdk.Int) error { + if amount.IsNegative() || amount.IsZero() { + return sdkerrors.Wrap(sdkerrors.ErrInvalidRequest, "amount must be positive") + } + return nil, errors.New("error") +} +``` + +Note that now an amount of 0 can be valid in "Version A". However, not in "Version B". +Therefore, if some nodes are running "Version A" and others are running "Version B", +the final app hash might not be deterministic. + +##### Variability in Gas Usage + +For transaction flows (or any other flow that consumes gas), it is important +that the gas usage is deterministic. + +Currently, gas usage is being Merklized in state. As a result, reordering functions +becomes risky. + +Suppose my gas limit is 2000 and 1600 is used up before entering +`someInternalMethod`. Consider the following: + +```go +func someInternalMethod(ctx sdk.Context) { + object1 := readOnlyFunction1(ctx) # consumes 1000 gas + object2 := readOnlyFunction2(ctx) # consumes 500 gas + doStuff(ctx, object1, object2) +} +``` +- It will run out of gas with `gasUsed = 2600` where 2600 getting merkelized +into the tx results. + +```go +func someInternalMethod(ctx sdk.Context) { + object2 := readOnlyFunction2(ctx) # consumes 500 gas + object1 := readOnlyFunction1(ctx) # consumes 1000 gas + doStuff(ctx, object1, object2) +} +``` +- It will run out of gas with `gasUsed = 2100` where 2100 is getting merkelized +into the tx results. + +Therefore, we introduced a state-incompatibility by merklizing diverging gas +usage. + +##### Network Requests + +It is critical to avoid performing network requests since it is common +for services to be unavailable or rate-limit. As a result, nodes +may get diverging responses, leading to state-breakage. + +##### Randomness + +Another critical property that should be avoided due to the likelyhood +of leading the nodes to resulting in different state. + +#### Parallelism and Shared State + +Threads, Goroutines might preempt differently in different hardware. Therefore, +they should be avoided for the sake of determinism. Additionally, it is hard +to predict when multi-threaded state can be updated. As a result, it can +result it non-determinism. + +##### Hardware Errors + +This is out of the developer's control but is mentioned for completeness. + ### Pre-release auditing process For every module with notable changes, we assign someone who was not a primary author of those changes to review the entire module. From 8ef2f1845d9c7dd3f422d3f1953e36e5cf112e73 Mon Sep 17 00:00:00 2001 From: Roman Date: Thu, 25 Aug 2022 21:41:01 -0400 Subject: [PATCH 02/10] updates --- CONTRIBUTING.md | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 01144ea8e57..6211adc7e08 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -345,7 +345,7 @@ Therefore, for any change made to the `main` and, as long as it is **state-compatible**, we must backport it to the last major release branch e.g. `v11.x` when the next major release is v12. -This helps to minimize the diff for a major upgrade review process. +This helps to minimize the diff of a major upgrade review process. Additionally, it helps to safely and incrementally test state-compatible changes by doing smaller patch releases. Contrary @@ -363,24 +363,25 @@ will always produce the same output. State-incompatibility is allowed for major upgrades because all nodes in the network perform it at the same time. Therefore, after the upgrade, the nodes continue -functioning in the determistic way. +functioning in a deterministic way. -The state compatibiliy is ensured by taking the app hash of the state and comparing it -to the app hash with the rest of the network. Essentially, app hash is a hash of +The state compatibility is ensured by taking the app hash of the state and comparing it +to the app hash with the rest of the network. Essentially, an app hash is a hash of hashes of every store's Merkle root. For example, at height n, we compute: `app hash = hash(hash(root of x/epochs), hash(root of x/gamm)...)` -Then, Tendermint ensures that app hash hash of local node matches the app hash +Then, Tendermint ensures that the app hash of the local node matches the app hash of the network. Please note that the explanation and examples are simplified. #### Sources of State-incompatibility ##### Creating Additional State -By erronously creating additional state, we can cause the app hash to differ -accross nodes in the network. Therefore, this must be avoided. +By erroneously creating database entries that exist in Version A but not in +Version B, we can cause the app hash to differ across nodes running +these versions in the network. Therefore, this must be avoided. ##### Returning Different Errors Given Same Input @@ -413,7 +414,7 @@ the final app hash might not be deterministic. For transaction flows (or any other flow that consumes gas), it is important that the gas usage is deterministic. -Currently, gas usage is being Merklized in state. As a result, reordering functions +Currently, gas usage is being Merklized in the state. As a result, reordering functions becomes risky. Suppose my gas limit is 2000 and 1600 is used up before entering @@ -439,26 +440,25 @@ func someInternalMethod(ctx sdk.Context) { - It will run out of gas with `gasUsed = 2100` where 2100 is getting merkelized into the tx results. -Therefore, we introduced a state-incompatibility by merklizing diverging gas +Therefore, we introduced a state-incompatibility by merklezing diverging gas usage. ##### Network Requests It is critical to avoid performing network requests since it is common for services to be unavailable or rate-limit. As a result, nodes -may get diverging responses, leading to state-breakage. +may get diverging responses, leading to state breakage. ##### Randomness -Another critical property that should be avoided due to the likelyhood -of leading the nodes to resulting in different state. +Another critical property that should be avoided due to the likelihood +of leading the nodes to result in a different state. #### Parallelism and Shared State -Threads, Goroutines might preempt differently in different hardware. Therefore, +Threads and Goroutines might preempt differently in different hardware. Therefore, they should be avoided for the sake of determinism. Additionally, it is hard -to predict when multi-threaded state can be updated. As a result, it can -result it non-determinism. +to predict when the multi-threaded state can be updated. ##### Hardware Errors From 3288e5c8537c4b2764287216b98801ae79b8dad9 Mon Sep 17 00:00:00 2001 From: Roman Akhtariev Date: Fri, 26 Aug 2022 17:12:47 -0400 Subject: [PATCH 03/10] updates --- CONTRIBUTING.md | 114 +++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 103 insertions(+), 11 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 6211adc7e08..6435e55a806 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -365,9 +365,40 @@ State-incompatibility is allowed for major upgrades because all nodes in the net perform it at the same time. Therefore, after the upgrade, the nodes continue functioning in a deterministic way. -The state compatibility is ensured by taking the app hash of the state and comparing it -to the app hash with the rest of the network. Essentially, an app hash is a hash of -hashes of every store's Merkle root. +#### Scope + +The state-machine scope includes the following areas: + +- All messages supported by the chain + +- Transaction gas usage + +- Whitelisted queries + +- All `BeginBlock`/`EndBlock` logic + +The following are **NOT* in the state-machine scope: + +- Events + +- Queries that are not whitelisted + +- CLI interfaces + +#### Validating State-Compatibility + +Tendermint ensures state compatibility by validating a number +of hashes that can be found here: +https://github.com/tendermint/tendermint/blob/9f76e8da150414ce73eed2c4f248947b657c7587/proto/tendermint/types/types.proto#L70-L77 + +Among the hashes that are commonly affected by our work and cause +problems are the `AppHash` and `LastResultsHash`. To avoid these problems, let's now examine how these hashes work. + +##### App Hash + +Cosmos-SDK takes an app hash of the state, and propagates it to Tendermint which, +in turn, compares it to the app hash of the rest of the network. +An app hash is a hash of hashes of every store's Merkle root. For example, at height n, we compute: `app hash = hash(hash(root of x/epochs), hash(root of x/gamm)...)` @@ -375,7 +406,46 @@ For example, at height n, we compute: Then, Tendermint ensures that the app hash of the local node matches the app hash of the network. Please note that the explanation and examples are simplified. -#### Sources of State-incompatibility +##### LastResultsHash + +The `LastResultsHash` today contains: +https://github.com/tendermint/tendermint/blob/main/types/results.go#L47-L54 + +- Tx `GasWanted` + +- Tx `GasUsed` + +`GasUsed` being merkelized means that we cannot freely reorder methods that consume gas. +We should also be careful of modifying any validation logic since changing the +locations where we error or pass might affect transaction gas usage. + +There are plans to remove this field from being Merkelized in a subsequent Tendermint release, at which point we will have more flexibility in reordering operations / erroring. + +- Tx response `Data` + +The `Data` field includes the proto message response. Therefore, we cannot +change these in patch releases. + +- Tx response `Code` + +This is an error code that is returned by the transaction flow. In the case of +success, it is `0`. On a general error, it is `1`. Additionally, each module +defines its custom error codes. For example, `x/mint` currently has the +following: +https://github.com/osmosis-labs/osmosis/blob/8ef2f1845d9c7dd3f422d3f1953e36e5cf112e73/x/mint/types/errors.go#L8-L10 + +As a result, it is important to avoid changing custom error codes or change +the semantics of what is valid logic in thransaction flows. + +Note that all of the above stem from `DeliverTx` execution path, which handles: + +- `AnteHandler`'s marked as deliver tx +- `msg.ValidateBasic` +- execution of a message from the message server + +The `DeliverTx` return back to the Tendermint is defined [here][1]. + +#### Major Sources of State-incompatibility ##### Creating Additional State @@ -383,6 +453,14 @@ By erroneously creating database entries that exist in Version A but not in Version B, we can cause the app hash to differ across nodes running these versions in the network. Therefore, this must be avoided. +##### Changing Proto Field Definitions + +For example, if we change a field that gets persisted to the database, +the app hash will differ across nodes running these versions in the network. + +Additionally, this affects `LastResultsHash` because it contains a `Data` field that is a marshaled proto message. + + ##### Returning Different Errors Given Same Input Version A @@ -391,7 +469,7 @@ func (sk Keeper) validateAmount(ctx context.Context, amount sdk.Int) error { if amount.IsNegative() { return sdkerrors.Wrap(sdkerrors.ErrInvalidRequest, "amount must be positive or zero") } - return nil, errors.New("error") + return nil } ``` @@ -401,7 +479,7 @@ func (sk Keeper) validateAmount(ctx context.Context, amount sdk.Int) error { if amount.IsNegative() || amount.IsZero() { return sdkerrors.Wrap(sdkerrors.ErrInvalidRequest, "amount must be positive") } - return nil, errors.New("error") + return nil } ``` @@ -409,6 +487,11 @@ Note that now an amount of 0 can be valid in "Version A". However, not in "Versi Therefore, if some nodes are running "Version A" and others are running "Version B", the final app hash might not be deterministic. +Additionally, a different error message does not matter because it +is not included in any hash. However, an error code `sdkerrors.ErrInvalidRequest` does. +It translates to the `Code` field in the `LastResultsHash` and participates in +its validation. + ##### Variability in Gas Usage For transaction flows (or any other flow that consumes gas), it is important @@ -443,18 +526,25 @@ into the tx results. Therefore, we introduced a state-incompatibility by merklezing diverging gas usage. -##### Network Requests +#### Secondary Limitations To Keep In Mind -It is critical to avoid performing network requests since it is common -for services to be unavailable or rate-limit. As a result, nodes -may get diverging responses, leading to state breakage. +##### Network Requests to External Services + +It is critical to avoid performing network requests to external services +since it is common for services to be unavailable or rate-limit. + +Imagine a service that returns exchange rates when clients query its HTTP endpoint. +This service might experience downtime or be restricted in some geographical areas. + +As a result, nodes may get diverging responses where some +get successful responses while others errors, leading to state breakage. ##### Randomness Another critical property that should be avoided due to the likelihood of leading the nodes to result in a different state. -#### Parallelism and Shared State +##### Parallelism and Shared State Threads and Goroutines might preempt differently in different hardware. Therefore, they should be avoided for the sake of determinism. Additionally, it is hard @@ -486,3 +576,5 @@ We test in testnet & e2e testnet behaviors about every message that has changed We communicate with various integrators if they'd like release-blocking QA testing for major releases * Chainapsis has communicated wanting a series of osmosis-frontend functionalities to be checked for correctness on a testnet as a release blocking item + +[1]:https://github.com/cosmos/cosmos-sdk/blob/d11196aad04e57812dbc5ac6248d35375e6603af/baseapp/abci.go#L293-L303 From e2fda364af93e0fa0b3fa6da7a1a167d08b0127e Mon Sep 17 00:00:00 2001 From: Roman Date: Fri, 26 Aug 2022 18:04:01 -0400 Subject: [PATCH 04/10] updates --- CONTRIBUTING.md | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 6435e55a806..c9853ddfb63 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -377,7 +377,7 @@ The state-machine scope includes the following areas: - All `BeginBlock`/`EndBlock` logic -The following are **NOT* in the state-machine scope: +The following are **NOT** in the state-machine scope: - Events @@ -388,18 +388,19 @@ The following are **NOT* in the state-machine scope: #### Validating State-Compatibility Tendermint ensures state compatibility by validating a number -of hashes that can be found here: -https://github.com/tendermint/tendermint/blob/9f76e8da150414ce73eed2c4f248947b657c7587/proto/tendermint/types/types.proto#L70-L77 +of hashes that can be found [here][2]. -Among the hashes that are commonly affected by our work and cause -problems are the `AppHash` and `LastResultsHash`. To avoid these problems, let's now examine how these hashes work. +`AppHash` and `LastResultsHash` are the common sources of problems stemmnig from our work. +To avoid these problems, let's now examine how these hashes work. ##### App Hash -Cosmos-SDK takes an app hash of the state, and propagates it to Tendermint which, -in turn, compares it to the app hash of the rest of the network. An app hash is a hash of hashes of every store's Merkle root. +Cosmos-SDK takes an app hash of the application state, and propagates +it to Tendermint which, in turn, compares it to the app hash of the +rest of the network. + For example, at height n, we compute: `app hash = hash(hash(root of x/epochs), hash(root of x/gamm)...)` @@ -408,6 +409,9 @@ of the network. Please note that the explanation and examples are simplified. ##### LastResultsHash +`LastResultsHash` is the root hash of all results from the transactions +in the block returned by the ABCI's `DeliverTx`. + The `LastResultsHash` today contains: https://github.com/tendermint/tendermint/blob/main/types/results.go#L47-L54 @@ -578,3 +582,4 @@ We communicate with various integrators if they'd like release-blocking QA testi * Chainapsis has communicated wanting a series of osmosis-frontend functionalities to be checked for correctness on a testnet as a release blocking item [1]:https://github.com/cosmos/cosmos-sdk/blob/d11196aad04e57812dbc5ac6248d35375e6603af/baseapp/abci.go#L293-L303 +[2]https://github.com/tendermint/tendermint/blob/9f76e8da150414ce73eed2c4f248947b657c7587/proto/tendermint/types/types.proto#L70-L77 From 7b365b1d6b6346f82c0e6aca95290f2d291ecac8 Mon Sep 17 00:00:00 2001 From: Roman Date: Fri, 26 Aug 2022 18:09:22 -0400 Subject: [PATCH 05/10] updates --- CONTRIBUTING.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index c9853ddfb63..b98299fc1bb 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -397,7 +397,7 @@ To avoid these problems, let's now examine how these hashes work. An app hash is a hash of hashes of every store's Merkle root. -Cosmos-SDK takes an app hash of the application state, and propagates +Cosmos-SDK [takes an app hash of the application state][4], and propagates it to Tendermint which, in turn, compares it to the app hash of the rest of the network. @@ -412,8 +412,7 @@ of the network. Please note that the explanation and examples are simplified. `LastResultsHash` is the root hash of all results from the transactions in the block returned by the ABCI's `DeliverTx`. -The `LastResultsHash` today contains: -https://github.com/tendermint/tendermint/blob/main/types/results.go#L47-L54 +The [`LastResultsHash`][3] today contains: - Tx `GasWanted` @@ -582,4 +581,6 @@ We communicate with various integrators if they'd like release-blocking QA testi * Chainapsis has communicated wanting a series of osmosis-frontend functionalities to be checked for correctness on a testnet as a release blocking item [1]:https://github.com/cosmos/cosmos-sdk/blob/d11196aad04e57812dbc5ac6248d35375e6603af/baseapp/abci.go#L293-L303 -[2]https://github.com/tendermint/tendermint/blob/9f76e8da150414ce73eed2c4f248947b657c7587/proto/tendermint/types/types.proto#L70-L77 +[2]:https://github.com/tendermint/tendermint/blob/9f76e8da150414ce73eed2c4f248947b657c7587/proto/tendermint/types/types.proto#L70-L77 +[3]:https://github.com/tendermint/tendermint/blob/main/types/results.go#L47-L54 +[4]:https://github.com/osmosis-labs/cosmos-sdk/blob/5c9a51c277d067e0ec5cf48df30a85fae95bcd14/store/rootmulti/store.go#L430 From 51b37a35cc46a2a10c67aa4b0e9bc70c2620e234 Mon Sep 17 00:00:00 2001 From: Roman Date: Fri, 26 Aug 2022 18:11:47 -0400 Subject: [PATCH 06/10] updates --- CONTRIBUTING.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index b98299fc1bb..2a706b8563b 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -395,13 +395,14 @@ To avoid these problems, let's now examine how these hashes work. ##### App Hash -An app hash is a hash of hashes of every store's Merkle root. +An app hash is a hash of hashes of every store's Merkle root that is returned by +ABCI's `Commit()` from Cosmos-SDK to Tendermint. Cosmos-SDK [takes an app hash of the application state][4], and propagates it to Tendermint which, in turn, compares it to the app hash of the rest of the network. -For example, at height n, we compute: +For example, at height n, [we compute][5]: `app hash = hash(hash(root of x/epochs), hash(root of x/gamm)...)` Then, Tendermint ensures that the app hash of the local node matches the app hash @@ -584,3 +585,4 @@ We communicate with various integrators if they'd like release-blocking QA testi [2]:https://github.com/tendermint/tendermint/blob/9f76e8da150414ce73eed2c4f248947b657c7587/proto/tendermint/types/types.proto#L70-L77 [3]:https://github.com/tendermint/tendermint/blob/main/types/results.go#L47-L54 [4]:https://github.com/osmosis-labs/cosmos-sdk/blob/5c9a51c277d067e0ec5cf48df30a85fae95bcd14/store/rootmulti/store.go#L430 +[5]:https://github.com/osmosis-labs/cosmos-sdk/blob/5c9a51c277d067e0ec5cf48df30a85fae95bcd14/store/types/commit_info.go#L40 From 37c3f7f2067e41e9dd494ddcb78677dedfd926ab Mon Sep 17 00:00:00 2001 From: Roman Date: Fri, 26 Aug 2022 18:14:13 -0400 Subject: [PATCH 07/10] updates --- CONTRIBUTING.md | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 2a706b8563b..808f497a34d 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -415,23 +415,20 @@ in the block returned by the ABCI's `DeliverTx`. The [`LastResultsHash`][3] today contains: -- Tx `GasWanted` - -- Tx `GasUsed` +1. Tx `GasWanted` +2. Tx `GasUsed` `GasUsed` being merkelized means that we cannot freely reorder methods that consume gas. We should also be careful of modifying any validation logic since changing the locations where we error or pass might affect transaction gas usage. There are plans to remove this field from being Merkelized in a subsequent Tendermint release, at which point we will have more flexibility in reordering operations / erroring. -- Tx response `Data` - +3. Tx response `Data` The `Data` field includes the proto message response. Therefore, we cannot change these in patch releases. -- Tx response `Code` - +4. Tx response `Code` This is an error code that is returned by the transaction flow. In the case of success, it is `0`. On a general error, it is `1`. Additionally, each module defines its custom error codes. For example, `x/mint` currently has the From 19dc662437ef14924882832ebf102f5303e92ffc Mon Sep 17 00:00:00 2001 From: Roman Date: Fri, 26 Aug 2022 18:15:28 -0400 Subject: [PATCH 08/10] updates --- CONTRIBUTING.md | 49 ++++++++++++++++++++++++++++++++----------------- 1 file changed, 32 insertions(+), 17 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 808f497a34d..4f24ba7e2af 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -30,6 +30,7 @@ To contribute a change proposal, use the following workflow: 1. Commit your changes. Write a simple, straightforward commit message. To learn more, see [How to Write a Git Commit Message](https://chris.beams.io/posts/git-commit/). 2. Push your changes to your remote fork. To add your remote, you can copy/paste the following: + ```sh #Remove origin @@ -53,6 +54,7 @@ To contribute a change proposal, use the following workflow: #e.g. git push my_awesome_new_remote_repo ``` + 3. Create a PR on the Osmosis repository. There should be a PR template to help you do so. 4. Wait for your changes to be reviewed. If you are a maintainer, you can assign your PR to one or more reviewers. If you aren't a maintainer, one of the maintainers will assign a reviewer. 5. After you receive feedback from a reviewer, make the requested changes, commit them to your branch, and push them to your remote fork again. @@ -71,6 +73,7 @@ We'll lay out three examples below (one that uses our format for messages, one t To simplify (and speed up) the process of writing unit tests that fit our standard, we have put together a Gotest template that allows you to easily generate unit tests using built-in functionality for the Vscode Go plugin (complete with parameters listed, basic error checking logic etc.). The following two sections lay out how to generate a unit test automatically using this method. #### 1. Setup + Note: this section assumes you already have the Go plugin for Vscode installed. Please refer to our [IDE setup docs](https://docs.osmosis.zone/developing/osmosis-core/ide-guide.html) if you haven't done any IDE setup yet. Copy the `templates` folder into your `.vscode` folder from our main repo [here](https://github.com/osmosis-labs/osmosis/tree/main/.vscode). This folder has our custom templates for generating tests that fit our testing standards as accurately as possible. @@ -83,9 +86,11 @@ Then, go to your `settings.json` file in your `.vscode` folder and add the follo "[ABSOLUTE PATH TO TEMPLATES FOLDER]" ], ``` + where `"[ABSOLUTE PATH TO TEMPLATES FOLDER]"` should look something like: `"User/ExampleUser/osmosis/.vscode/templates"` #### 2. Generating a unit test + On the function you want to generate a unit test for, right click the function name and select `Go: Generate Unit Tests For Function`. This should take you to an automatically generated template test in the corresponding test file for the file your function is in. If there isn't a corresponding test file, it should automatically generate one, define its package and imports, and generate the test there. ### Rules of thumb for table-driven tests @@ -96,6 +101,7 @@ On the function you want to generate a unit test for, right click the function n 4. Avoid verbosity by creating local variables instead of constantly referring to struct field (e.g. doing `lockupKeeper := suite.App.LockupKeeper` instead of using `suite.App.LockupKeeper` every time). ### Example #1: [Message-Related Test] + This type of test is mainly for functions that would be triggered by incoming messages (we interact directly with the message server since all other metadata is stripped from a message by the time it hits the msg_server): ```go @@ -142,7 +148,9 @@ func(suite *KeeperTestSuite) TestCreateDenom() { } } ``` + ### Example #2: [Keeper-Related Test] + This type of test is mainly for functions that would be triggered by other modules calling public keeper methods (or just to unit-test keeper methods in general): ```go @@ -180,6 +188,7 @@ func(suite *KeeperTestSuite) TestMintExportGenesis() { ``` ### Example #3: [Gamm-Related Test] + Since the GAMM module is core to the Osmosis repo, it might be useful to have a good example of a well-structured GAMM-specific test. This example covers a simple getter function and validates the specific error messages around the function (as opposed to merely the presence of an error): ```go @@ -245,10 +254,10 @@ The file `e2e_setup_test.go` defines the testing suite and contains the core boo The file `e2e_test.go` contains the actual end-to-end integration tests that utilize the testing suite. - Additionally, there is an ability to disable certain components of the e2e suite. This can be done by setting the environment variables. See the [E2E test docs](https://github.com/osmosis-labs/osmosis/blob/main/tests/e2e/README.md) or more details. To get started: + - Run `make test-e2e` - Inspect the logs of the docker containers and see if something it’s there - `docker ps -a #` to list all docker containers @@ -317,21 +326,21 @@ You can also feel free to do `make format` if you're getting errors related to ` There are several steps that go into a major release -* The GitHub release is created via this [GitHub workflow](https://github.com/osmosis-labs/osmosis/blob/main/.github/workflows/release.yml). The workflow is manually triggered from the [osmosis-ci repository](https://github.com/osmosis-labs/osmosis-ci). The workflow uses the `make build-reproducible` command to create the `osmosisd` binaries using the default [Makefile](https://github.com/osmosis-labs/osmosis/blob/main/Makefile#L99). +- The GitHub release is created via this [GitHub workflow](https://github.com/osmosis-labs/osmosis/blob/main/.github/workflows/release.yml). The workflow is manually triggered from the [osmosis-ci repository](https://github.com/osmosis-labs/osmosis-ci). The workflow uses the `make build-reproducible` command to create the `osmosisd` binaries using the default [Makefile](https://github.com/osmosis-labs/osmosis/blob/main/Makefile#L99). -* Make a PR to main, with a cosmovisor config, generated in tandem with the binaries from tool. - * Should be its own PR, as it may get denied for Fork upgrades. +- Make a PR to main, with a cosmovisor config, generated in tandem with the binaries from tool. + - Should be its own PR, as it may get denied for Fork upgrades. -* Make a PR to main to update the import paths and go.mod for the new major release +- Make a PR to main to update the import paths and go.mod for the new major release -* Should also make a commit into every open PR to main to do the same find/replace. (Unless this will cause conflicts) +- Should also make a commit into every open PR to main to do the same find/replace. (Unless this will cause conflicts) -* Do a PR if that commit has conflicts +- Do a PR if that commit has conflicts -* (Eventually) Make a PR that adds a version handler for the next upgrade - * [Add v10 upgrade boilerplate #1649](https://github.com/osmosis-labs/osmosis/pull/1649/files) +- (Eventually) Make a PR that adds a version handler for the next upgrade + - [Add v10 upgrade boilerplate #1649](https://github.com/osmosis-labs/osmosis/pull/1649/files) -* Update chain JSON schema's recommended versions in `chain.schema.json` located in the root directory. +- Update chain JSON schema's recommended versions in `chain.schema.json` located in the root directory. ## Patch and Minor Releases @@ -418,6 +427,7 @@ The [`LastResultsHash`][3] today contains: 1. Tx `GasWanted` 2. Tx `GasUsed` + `GasUsed` being merkelized means that we cannot freely reorder methods that consume gas. We should also be careful of modifying any validation logic since changing the locations where we error or pass might affect transaction gas usage. @@ -425,15 +435,17 @@ locations where we error or pass might affect transaction gas usage. There are plans to remove this field from being Merkelized in a subsequent Tendermint release, at which point we will have more flexibility in reordering operations / erroring. 3. Tx response `Data` + The `Data` field includes the proto message response. Therefore, we cannot change these in patch releases. 4. Tx response `Code` + This is an error code that is returned by the transaction flow. In the case of success, it is `0`. On a general error, it is `1`. Additionally, each module defines its custom error codes. For example, `x/mint` currently has the following: -https://github.com/osmosis-labs/osmosis/blob/8ef2f1845d9c7dd3f422d3f1953e36e5cf112e73/x/mint/types/errors.go#L8-L10 + As a result, it is important to avoid changing custom error codes or change the semantics of what is valid logic in thransaction flows. @@ -461,10 +473,10 @@ the app hash will differ across nodes running these versions in the network. Additionally, this affects `LastResultsHash` because it contains a `Data` field that is a marshaled proto message. - ##### Returning Different Errors Given Same Input Version A + ```go func (sk Keeper) validateAmount(ctx context.Context, amount sdk.Int) error { if amount.IsNegative() { @@ -475,6 +487,7 @@ func (sk Keeper) validateAmount(ctx context.Context, amount sdk.Int) error { ``` Version B + ```go func (sk Keeper) validateAmount(ctx context.Context, amount sdk.Int) error { if amount.IsNegative() || amount.IsZero() { @@ -511,6 +524,7 @@ func someInternalMethod(ctx sdk.Context) { doStuff(ctx, object1, object2) } ``` + - It will run out of gas with `gasUsed = 2600` where 2600 getting merkelized into the tx results. @@ -521,6 +535,7 @@ func someInternalMethod(ctx sdk.Context) { doStuff(ctx, object1, object2) } ``` + - It will run out of gas with `gasUsed = 2100` where 2100 is getting merkelized into the tx results. @@ -561,15 +576,15 @@ For every module with notable changes, we assign someone who was not a primary a Deliverables of review are: -* PR's with in-line code comments for things they had to figure out (or questions) +- PR's with in-line code comments for things they had to figure out (or questions) -* Tests / test comments needed to convince themselves of correctness +- Tests / test comments needed to convince themselves of correctness -* Spec updates +- Spec updates -* Small refactors that helped in understanding / making code conform to consistency stds / improve code signal-to-noise ratio welcome +- Small refactors that helped in understanding / making code conform to consistency stds / improve code signal-to-noise ratio welcome -* (As with all PRs, should not be a monolithic PR that gets PR'd, even though that may be the natural way its first formed) +- (As with all PRs, should not be a monolithic PR that gets PR'd, even though that may be the natural way its first formed) At the moment, we're looking for a tool that lets us statically figure out every message that had something in its code path that changed. Until a tool is found, we must do this manually. From 213ae0ad0b1fa0f6abc5d4fc901ed85156a22289 Mon Sep 17 00:00:00 2001 From: Dev Ojha Date: Sat, 27 Aug 2022 20:08:58 +0200 Subject: [PATCH 09/10] Minor changes --- CONTRIBUTING.md | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 4f24ba7e2af..b880cc1d8f5 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -378,20 +378,20 @@ functioning in a deterministic way. The state-machine scope includes the following areas: -- All messages supported by the chain - -- Transaction gas usage - +- All messages supported by the chain, including: + - Every msg's ValidateBasic method + - Every msg's MsgServer method + - Net gas usage, in all execution paths (WIP to make this not include error flows) + - Error result returned (TODO: Make the error code not merkelized) + - State changes (namely every store write) +- AnteHandlers in "DeliverTx" mode - Whitelisted queries - - All `BeginBlock`/`EndBlock` logic The following are **NOT** in the state-machine scope: - Events - - Queries that are not whitelisted - - CLI interfaces #### Validating State-Compatibility @@ -557,8 +557,10 @@ get successful responses while others errors, leading to state breakage. ##### Randomness -Another critical property that should be avoided due to the likelihood -of leading the nodes to result in a different state. +Randomness cannot be used in the state machine, as the state machine definitionally must be deterministic. +Any time you'd consider using it, instead seed a CSPRNG off of some seed. + +One thing to note is that in golang, iteration order over `map`s is non-deterministic, so to be deterministic you must gather the keys, and sort them all prior to iterating over all values. ##### Parallelism and Shared State From 266ca5c895053f9f044bcd151670ec822e2efc01 Mon Sep 17 00:00:00 2001 From: Dev Ojha Date: Sat, 27 Aug 2022 20:14:01 +0200 Subject: [PATCH 10/10] Fix MD lint --- CONTRIBUTING.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index b880cc1d8f5..0720c3c42e1 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -379,11 +379,11 @@ functioning in a deterministic way. The state-machine scope includes the following areas: - All messages supported by the chain, including: - - Every msg's ValidateBasic method - - Every msg's MsgServer method - - Net gas usage, in all execution paths (WIP to make this not include error flows) - - Error result returned (TODO: Make the error code not merkelized) - - State changes (namely every store write) + - Every msg's ValidateBasic method + - Every msg's MsgServer method + - Net gas usage, in all execution paths (WIP to make this not include error flows) + - Error result returned (TODO: Make the error code not merkelized) + - State changes (namely every store write) - AnteHandlers in "DeliverTx" mode - Whitelisted queries - All `BeginBlock`/`EndBlock` logic