Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: state-compatibility documentation #2514

Merged
merged 10 commits into from
Aug 29, 2022
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
275 changes: 260 additions & 15 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ To contribute a change proposal, use the following workflow:

1. Commit your changes. Write a simple, straightforward commit message. To learn more, see [How to Write a Git Commit Message](https://chris.beams.io/posts/git-commit/).
2. Push your changes to your remote fork. To add your remote, you can copy/paste the following:

```sh

#Remove origin
Expand All @@ -53,6 +54,7 @@ To contribute a change proposal, use the following workflow:

#e.g. git push my_awesome_new_remote_repo
```

3. Create a PR on the Osmosis repository. There should be a PR template to help you do so.
4. Wait for your changes to be reviewed. If you are a maintainer, you can assign your PR to one or more reviewers. If you aren't a maintainer, one of the maintainers will assign a reviewer.
5. After you receive feedback from a reviewer, make the requested changes, commit them to your branch, and push them to your remote fork again.
Expand All @@ -71,6 +73,7 @@ We'll lay out three examples below (one that uses our format for messages, one t
To simplify (and speed up) the process of writing unit tests that fit our standard, we have put together a Gotest template that allows you to easily generate unit tests using built-in functionality for the Vscode Go plugin (complete with parameters listed, basic error checking logic etc.). The following two sections lay out how to generate a unit test automatically using this method.

#### 1. Setup

Note: this section assumes you already have the Go plugin for Vscode installed. Please refer to our [IDE setup docs](https://docs.osmosis.zone/developing/osmosis-core/ide-guide.html) if you haven't done any IDE setup yet.

Copy the `templates` folder into your `.vscode` folder from our main repo [here](https://github.com/osmosis-labs/osmosis/tree/main/.vscode). This folder has our custom templates for generating tests that fit our testing standards as accurately as possible.
Expand All @@ -83,9 +86,11 @@ Then, go to your `settings.json` file in your `.vscode` folder and add the follo
"[ABSOLUTE PATH TO TEMPLATES FOLDER]"
],
```

where `"[ABSOLUTE PATH TO TEMPLATES FOLDER]"` should look something like: `"User/ExampleUser/osmosis/.vscode/templates"`

#### 2. Generating a unit test

On the function you want to generate a unit test for, right click the function name and select `Go: Generate Unit Tests For Function`. This should take you to an automatically generated template test in the corresponding test file for the file your function is in. If there isn't a corresponding test file, it should automatically generate one, define its package and imports, and generate the test there.

### Rules of thumb for table-driven tests
Expand All @@ -96,6 +101,7 @@ On the function you want to generate a unit test for, right click the function n
4. Avoid verbosity by creating local variables instead of constantly referring to struct field (e.g. doing `lockupKeeper := suite.App.LockupKeeper` instead of using `suite.App.LockupKeeper` every time).

### Example #1: [Message-Related Test]

This type of test is mainly for functions that would be triggered by incoming messages (we interact directly with the message server since all other metadata is stripped from a message by the time it hits the msg_server):

```go
Expand Down Expand Up @@ -142,7 +148,9 @@ func(suite *KeeperTestSuite) TestCreateDenom() {
}
}
```

### Example #2: [Keeper-Related Test]

This type of test is mainly for functions that would be triggered by other modules calling public keeper methods (or just to unit-test keeper methods in general):

```go
Expand Down Expand Up @@ -180,6 +188,7 @@ func(suite *KeeperTestSuite) TestMintExportGenesis() {
```

### Example #3: [Gamm-Related Test]

Since the GAMM module is core to the Osmosis repo, it might be useful to have a good example of a well-structured GAMM-specific test. This example covers a simple getter function and validates the specific error messages around the function (as opposed to merely the presence of an error):

```go
Expand Down Expand Up @@ -245,10 +254,10 @@ The file `e2e_setup_test.go` defines the testing suite and contains the core boo

The file `e2e_test.go` contains the actual end-to-end integration tests that utilize the testing suite.


Additionally, there is an ability to disable certain components of the e2e suite. This can be done by setting the environment variables. See the [E2E test docs](https://github.com/osmosis-labs/osmosis/blob/main/tests/e2e/README.md) or more details.

To get started:

- Run `make test-e2e`
- Inspect the logs of the docker containers and see if something it’s there
- `docker ps -a #` to list all docker containers
Expand Down Expand Up @@ -317,41 +326,277 @@ You can also feel free to do `make format` if you're getting errors related to `

There are several steps that go into a major release

* The GitHub release is created via this [GitHub workflow](https://github.com/osmosis-labs/osmosis/blob/main/.github/workflows/release.yml). The workflow is manually triggered from the [osmosis-ci repository](https://github.com/osmosis-labs/osmosis-ci). The workflow uses the `make build-reproducible` command to create the `osmosisd` binaries using the default [Makefile](https://github.com/osmosis-labs/osmosis/blob/main/Makefile#L99).
- The GitHub release is created via this [GitHub workflow](https://github.com/osmosis-labs/osmosis/blob/main/.github/workflows/release.yml). The workflow is manually triggered from the [osmosis-ci repository](https://github.com/osmosis-labs/osmosis-ci). The workflow uses the `make build-reproducible` command to create the `osmosisd` binaries using the default [Makefile](https://github.com/osmosis-labs/osmosis/blob/main/Makefile#L99).

- Make a PR to main, with a cosmovisor config, generated in tandem with the binaries from tool.
- Should be its own PR, as it may get denied for Fork upgrades.

- Make a PR to main to update the import paths and go.mod for the new major release

- Should also make a commit into every open PR to main to do the same find/replace. (Unless this will cause conflicts)

- Do a PR if that commit has conflicts

- (Eventually) Make a PR that adds a version handler for the next upgrade
- [Add v10 upgrade boilerplate #1649](https://github.com/osmosis-labs/osmosis/pull/1649/files)

- Update chain JSON schema's recommended versions in `chain.schema.json` located in the root directory.

## Patch and Minor Releases

### Backport Flow

For patch and minor releases, we should already have
a release branch available in the repository. For example,
for any v11 release, we have a `v11.x` branch.

Therefore, for any change made to the `main` and, as long as it is
**state-compatible**, we must backport it to the last major release branch e.g.
`v11.x` when the next major release is v12.

This helps to minimize the diff of a major upgrade review process.

Additionally, it helps to safely and incrementally test
state-compatible changes by doing smaller patch releases. Contrary
to a major release, there is always an opportunity to safely downgrade
for any minor or patch release.

### State-compatibility

It is critical for the patch and minor releases to be state-machine compatible with
prior releases in the same major version. For example, v11.2.1 must be
compatible with v11.1.0 and v11.0.0.

This is to ensure **determinism** i.e. that given the same input, the nodes
will always produce the same output.

State-incompatibility is allowed for major upgrades because all nodes in the network
perform it at the same time. Therefore, after the upgrade, the nodes continue
functioning in a deterministic way.

#### Scope

The state-machine scope includes the following areas:

- All messages supported by the chain, including:
- Every msg's ValidateBasic method
- Every msg's MsgServer method
- Net gas usage, in all execution paths (WIP to make this not include error flows)
- Error result returned (TODO: Make the error code not merkelized)
- State changes (namely every store write)
- AnteHandlers in "DeliverTx" mode
- Whitelisted queries
- All `BeginBlock`/`EndBlock` logic

The following are **NOT** in the state-machine scope:

- Events
- Queries that are not whitelisted
- CLI interfaces

#### Validating State-Compatibility

Tendermint ensures state compatibility by validating a number
of hashes that can be found [here][2].

`AppHash` and `LastResultsHash` are the common sources of problems stemmnig from our work.
To avoid these problems, let's now examine how these hashes work.

##### App Hash

An app hash is a hash of hashes of every store's Merkle root that is returned by
ABCI's `Commit()` from Cosmos-SDK to Tendermint.

Cosmos-SDK [takes an app hash of the application state][4], and propagates
it to Tendermint which, in turn, compares it to the app hash of the
rest of the network.

For example, at height n, [we compute][5]:
`app hash = hash(hash(root of x/epochs), hash(root of x/gamm)...)`

Then, Tendermint ensures that the app hash of the local node matches the app hash
of the network. Please note that the explanation and examples are simplified.

##### LastResultsHash

`LastResultsHash` is the root hash of all results from the transactions
in the block returned by the ABCI's `DeliverTx`.

The [`LastResultsHash`][3] today contains:

1. Tx `GasWanted`

2. Tx `GasUsed`

`GasUsed` being merkelized means that we cannot freely reorder methods that consume gas.
We should also be careful of modifying any validation logic since changing the
locations where we error or pass might affect transaction gas usage.

There are plans to remove this field from being Merkelized in a subsequent Tendermint release, at which point we will have more flexibility in reordering operations / erroring.

3. Tx response `Data`

The `Data` field includes the proto message response. Therefore, we cannot
change these in patch releases.

* Make a PR to main, with a cosmovisor config, generated in tandem with the binaries from tool.
* Should be its own PR, as it may get denied for Fork upgrades.
4. Tx response `Code`

* Make a PR to main to update the import paths and go.mod for the new major release
This is an error code that is returned by the transaction flow. In the case of
success, it is `0`. On a general error, it is `1`. Additionally, each module
defines its custom error codes. For example, `x/mint` currently has the
following:
<https://github.com/osmosis-labs/osmosis/blob/8ef2f1845d9c7dd3f422d3f1953e36e5cf112e73/x/mint/types/errors.go#L8-L10>

* Should also make a commit into every open PR to main to do the same find/replace. (Unless this will cause conflicts)
As a result, it is important to avoid changing custom error codes or change
the semantics of what is valid logic in thransaction flows.

* Do a PR if that commit has conflicts
Note that all of the above stem from `DeliverTx` execution path, which handles:

* (Eventually) Make a PR that adds a version handler for the next upgrade
* [Add v10 upgrade boilerplate #1649](https://github.com/osmosis-labs/osmosis/pull/1649/files)
- `AnteHandler`'s marked as deliver tx
- `msg.ValidateBasic`
- execution of a message from the message server

* Update chain JSON schema's recommended versions in `chain.schema.json` located in the root directory.
The `DeliverTx` return back to the Tendermint is defined [here][1].

#### Major Sources of State-incompatibility

##### Creating Additional State
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also handle Changing Existing State(Ex changing of existing proto field)


By erroneously creating database entries that exist in Version A but not in
Version B, we can cause the app hash to differ across nodes running
these versions in the network. Therefore, this must be avoided.

##### Changing Proto Field Definitions

For example, if we change a field that gets persisted to the database,
the app hash will differ across nodes running these versions in the network.

Additionally, this affects `LastResultsHash` because it contains a `Data` field that is a marshaled proto message.

##### Returning Different Errors Given Same Input

Version A

```go
func (sk Keeper) validateAmount(ctx context.Context, amount sdk.Int) error {
if amount.IsNegative() {
return sdkerrors.Wrap(sdkerrors.ErrInvalidRequest, "amount must be positive or zero")
}
return nil
}
```

Version B

```go
func (sk Keeper) validateAmount(ctx context.Context, amount sdk.Int) error {
if amount.IsNegative() || amount.IsZero() {
return sdkerrors.Wrap(sdkerrors.ErrInvalidRequest, "amount must be positive")
}
return nil
}
```

Note that now an amount of 0 can be valid in "Version A". However, not in "Version B".
Therefore, if some nodes are running "Version A" and others are running "Version B",
the final app hash might not be deterministic.

Additionally, a different error message does not matter because it
is not included in any hash. However, an error code `sdkerrors.ErrInvalidRequest` does.
It translates to the `Code` field in the `LastResultsHash` and participates in
its validation.

##### Variability in Gas Usage

For transaction flows (or any other flow that consumes gas), it is important
that the gas usage is deterministic.

Currently, gas usage is being Merklized in the state. As a result, reordering functions
becomes risky.

Suppose my gas limit is 2000 and 1600 is used up before entering
`someInternalMethod`. Consider the following:

```go
func someInternalMethod(ctx sdk.Context) {
object1 := readOnlyFunction1(ctx) # consumes 1000 gas
object2 := readOnlyFunction2(ctx) # consumes 500 gas
doStuff(ctx, object1, object2)
}
```

- It will run out of gas with `gasUsed = 2600` where 2600 getting merkelized
into the tx results.

```go
func someInternalMethod(ctx sdk.Context) {
object2 := readOnlyFunction2(ctx) # consumes 500 gas
object1 := readOnlyFunction1(ctx) # consumes 1000 gas
doStuff(ctx, object1, object2)
}
```

- It will run out of gas with `gasUsed = 2100` where 2100 is getting merkelized
into the tx results.

Therefore, we introduced a state-incompatibility by merklezing diverging gas
usage.

#### Secondary Limitations To Keep In Mind

##### Network Requests to External Services

It is critical to avoid performing network requests to external services
since it is common for services to be unavailable or rate-limit.

Imagine a service that returns exchange rates when clients query its HTTP endpoint.
This service might experience downtime or be restricted in some geographical areas.

As a result, nodes may get diverging responses where some
get successful responses while others errors, leading to state breakage.

##### Randomness

Randomness cannot be used in the state machine, as the state machine definitionally must be deterministic.
Any time you'd consider using it, instead seed a CSPRNG off of some seed.

One thing to note is that in golang, iteration order over `map`s is non-deterministic, so to be deterministic you must gather the keys, and sort them all prior to iterating over all values.

##### Parallelism and Shared State

Threads and Goroutines might preempt differently in different hardware. Therefore,
they should be avoided for the sake of determinism. Additionally, it is hard
to predict when the multi-threaded state can be updated.

##### Hardware Errors

This is out of the developer's control but is mentioned for completeness.

### Pre-release auditing process

For every module with notable changes, we assign someone who was not a primary author of those changes to review the entire module.

Deliverables of review are:

* PR's with in-line code comments for things they had to figure out (or questions)
- PR's with in-line code comments for things they had to figure out (or questions)

* Tests / test comments needed to convince themselves of correctness
- Tests / test comments needed to convince themselves of correctness

* Spec updates
- Spec updates

* Small refactors that helped in understanding / making code conform to consistency stds / improve code signal-to-noise ratio welcome
- Small refactors that helped in understanding / making code conform to consistency stds / improve code signal-to-noise ratio welcome

* (As with all PRs, should not be a monolithic PR that gets PR'd, even though that may be the natural way its first formed)
- (As with all PRs, should not be a monolithic PR that gets PR'd, even though that may be the natural way its first formed)

At the moment, we're looking for a tool that lets us statically figure out every message that had something in its code path that changed. Until a tool is found, we must do this manually.

We test in testnet & e2e testnet behaviors about every message that has changed

We communicate with various integrators if they'd like release-blocking QA testing for major releases
* Chainapsis has communicated wanting a series of osmosis-frontend functionalities to be checked for correctness on a testnet as a release blocking item

[1]:https://github.com/cosmos/cosmos-sdk/blob/d11196aad04e57812dbc5ac6248d35375e6603af/baseapp/abci.go#L293-L303
[2]:https://github.com/tendermint/tendermint/blob/9f76e8da150414ce73eed2c4f248947b657c7587/proto/tendermint/types/types.proto#L70-L77
[3]:https://github.com/tendermint/tendermint/blob/main/types/results.go#L47-L54
[4]:https://github.com/osmosis-labs/cosmos-sdk/blob/5c9a51c277d067e0ec5cf48df30a85fae95bcd14/store/rootmulti/store.go#L430
[5]:https://github.com/osmosis-labs/cosmos-sdk/blob/5c9a51c277d067e0ec5cf48df30a85fae95bcd14/store/types/commit_info.go#L40