diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index b817757a3d1..0720c3c42e1 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -30,6 +30,7 @@ To contribute a change proposal, use the following workflow: 1. Commit your changes. Write a simple, straightforward commit message. To learn more, see [How to Write a Git Commit Message](https://chris.beams.io/posts/git-commit/). 2. Push your changes to your remote fork. To add your remote, you can copy/paste the following: + ```sh #Remove origin @@ -53,6 +54,7 @@ To contribute a change proposal, use the following workflow: #e.g. git push my_awesome_new_remote_repo ``` + 3. Create a PR on the Osmosis repository. There should be a PR template to help you do so. 4. Wait for your changes to be reviewed. If you are a maintainer, you can assign your PR to one or more reviewers. If you aren't a maintainer, one of the maintainers will assign a reviewer. 5. After you receive feedback from a reviewer, make the requested changes, commit them to your branch, and push them to your remote fork again. @@ -71,6 +73,7 @@ We'll lay out three examples below (one that uses our format for messages, one t To simplify (and speed up) the process of writing unit tests that fit our standard, we have put together a Gotest template that allows you to easily generate unit tests using built-in functionality for the Vscode Go plugin (complete with parameters listed, basic error checking logic etc.). The following two sections lay out how to generate a unit test automatically using this method. #### 1. Setup + Note: this section assumes you already have the Go plugin for Vscode installed. Please refer to our [IDE setup docs](https://docs.osmosis.zone/developing/osmosis-core/ide-guide.html) if you haven't done any IDE setup yet. Copy the `templates` folder into your `.vscode` folder from our main repo [here](https://github.com/osmosis-labs/osmosis/tree/main/.vscode). This folder has our custom templates for generating tests that fit our testing standards as accurately as possible. @@ -83,9 +86,11 @@ Then, go to your `settings.json` file in your `.vscode` folder and add the follo "[ABSOLUTE PATH TO TEMPLATES FOLDER]" ], ``` + where `"[ABSOLUTE PATH TO TEMPLATES FOLDER]"` should look something like: `"User/ExampleUser/osmosis/.vscode/templates"` #### 2. Generating a unit test + On the function you want to generate a unit test for, right click the function name and select `Go: Generate Unit Tests For Function`. This should take you to an automatically generated template test in the corresponding test file for the file your function is in. If there isn't a corresponding test file, it should automatically generate one, define its package and imports, and generate the test there. ### Rules of thumb for table-driven tests @@ -96,6 +101,7 @@ On the function you want to generate a unit test for, right click the function n 4. Avoid verbosity by creating local variables instead of constantly referring to struct field (e.g. doing `lockupKeeper := suite.App.LockupKeeper` instead of using `suite.App.LockupKeeper` every time). ### Example #1: [Message-Related Test] + This type of test is mainly for functions that would be triggered by incoming messages (we interact directly with the message server since all other metadata is stripped from a message by the time it hits the msg_server): ```go @@ -142,7 +148,9 @@ func(suite *KeeperTestSuite) TestCreateDenom() { } } ``` + ### Example #2: [Keeper-Related Test] + This type of test is mainly for functions that would be triggered by other modules calling public keeper methods (or just to unit-test keeper methods in general): ```go @@ -180,6 +188,7 @@ func(suite *KeeperTestSuite) TestMintExportGenesis() { ``` ### Example #3: [Gamm-Related Test] + Since the GAMM module is core to the Osmosis repo, it might be useful to have a good example of a well-structured GAMM-specific test. This example covers a simple getter function and validates the specific error messages around the function (as opposed to merely the presence of an error): ```go @@ -245,10 +254,10 @@ The file `e2e_setup_test.go` defines the testing suite and contains the core boo The file `e2e_test.go` contains the actual end-to-end integration tests that utilize the testing suite. - Additionally, there is an ability to disable certain components of the e2e suite. This can be done by setting the environment variables. See the [E2E test docs](https://github.com/osmosis-labs/osmosis/blob/main/tests/e2e/README.md) or more details. To get started: + - Run `make test-e2e` - Inspect the logs of the docker containers and see if something it’s there - `docker ps -a #` to list all docker containers @@ -317,21 +326,251 @@ You can also feel free to do `make format` if you're getting errors related to ` There are several steps that go into a major release -* The GitHub release is created via this [GitHub workflow](https://github.com/osmosis-labs/osmosis/blob/main/.github/workflows/release.yml). The workflow is manually triggered from the [osmosis-ci repository](https://github.com/osmosis-labs/osmosis-ci). The workflow uses the `make build-reproducible` command to create the `osmosisd` binaries using the default [Makefile](https://github.com/osmosis-labs/osmosis/blob/main/Makefile#L99). +- The GitHub release is created via this [GitHub workflow](https://github.com/osmosis-labs/osmosis/blob/main/.github/workflows/release.yml). The workflow is manually triggered from the [osmosis-ci repository](https://github.com/osmosis-labs/osmosis-ci). The workflow uses the `make build-reproducible` command to create the `osmosisd` binaries using the default [Makefile](https://github.com/osmosis-labs/osmosis/blob/main/Makefile#L99). + +- Make a PR to main, with a cosmovisor config, generated in tandem with the binaries from tool. + - Should be its own PR, as it may get denied for Fork upgrades. + +- Make a PR to main to update the import paths and go.mod for the new major release + +- Should also make a commit into every open PR to main to do the same find/replace. (Unless this will cause conflicts) + +- Do a PR if that commit has conflicts + +- (Eventually) Make a PR that adds a version handler for the next upgrade + - [Add v10 upgrade boilerplate #1649](https://github.com/osmosis-labs/osmosis/pull/1649/files) + +- Update chain JSON schema's recommended versions in `chain.schema.json` located in the root directory. + +## Patch and Minor Releases + +### Backport Flow + +For patch and minor releases, we should already have +a release branch available in the repository. For example, +for any v11 release, we have a `v11.x` branch. + +Therefore, for any change made to the `main` and, as long as it is +**state-compatible**, we must backport it to the last major release branch e.g. +`v11.x` when the next major release is v12. + +This helps to minimize the diff of a major upgrade review process. + +Additionally, it helps to safely and incrementally test +state-compatible changes by doing smaller patch releases. Contrary +to a major release, there is always an opportunity to safely downgrade +for any minor or patch release. + +### State-compatibility + +It is critical for the patch and minor releases to be state-machine compatible with +prior releases in the same major version. For example, v11.2.1 must be +compatible with v11.1.0 and v11.0.0. + +This is to ensure **determinism** i.e. that given the same input, the nodes +will always produce the same output. + +State-incompatibility is allowed for major upgrades because all nodes in the network +perform it at the same time. Therefore, after the upgrade, the nodes continue +functioning in a deterministic way. + +#### Scope + +The state-machine scope includes the following areas: + +- All messages supported by the chain, including: + - Every msg's ValidateBasic method + - Every msg's MsgServer method + - Net gas usage, in all execution paths (WIP to make this not include error flows) + - Error result returned (TODO: Make the error code not merkelized) + - State changes (namely every store write) +- AnteHandlers in "DeliverTx" mode +- Whitelisted queries +- All `BeginBlock`/`EndBlock` logic + +The following are **NOT** in the state-machine scope: + +- Events +- Queries that are not whitelisted +- CLI interfaces + +#### Validating State-Compatibility + +Tendermint ensures state compatibility by validating a number +of hashes that can be found [here][2]. + +`AppHash` and `LastResultsHash` are the common sources of problems stemmnig from our work. +To avoid these problems, let's now examine how these hashes work. + +##### App Hash + +An app hash is a hash of hashes of every store's Merkle root that is returned by +ABCI's `Commit()` from Cosmos-SDK to Tendermint. + +Cosmos-SDK [takes an app hash of the application state][4], and propagates +it to Tendermint which, in turn, compares it to the app hash of the +rest of the network. + +For example, at height n, [we compute][5]: +`app hash = hash(hash(root of x/epochs), hash(root of x/gamm)...)` + +Then, Tendermint ensures that the app hash of the local node matches the app hash +of the network. Please note that the explanation and examples are simplified. + +##### LastResultsHash + +`LastResultsHash` is the root hash of all results from the transactions +in the block returned by the ABCI's `DeliverTx`. + +The [`LastResultsHash`][3] today contains: + +1. Tx `GasWanted` + +2. Tx `GasUsed` + +`GasUsed` being merkelized means that we cannot freely reorder methods that consume gas. +We should also be careful of modifying any validation logic since changing the +locations where we error or pass might affect transaction gas usage. + +There are plans to remove this field from being Merkelized in a subsequent Tendermint release, at which point we will have more flexibility in reordering operations / erroring. + +3. Tx response `Data` + +The `Data` field includes the proto message response. Therefore, we cannot +change these in patch releases. -* Make a PR to main, with a cosmovisor config, generated in tandem with the binaries from tool. - * Should be its own PR, as it may get denied for Fork upgrades. +4. Tx response `Code` -* Make a PR to main to update the import paths and go.mod for the new major release +This is an error code that is returned by the transaction flow. In the case of +success, it is `0`. On a general error, it is `1`. Additionally, each module +defines its custom error codes. For example, `x/mint` currently has the +following: + -* Should also make a commit into every open PR to main to do the same find/replace. (Unless this will cause conflicts) +As a result, it is important to avoid changing custom error codes or change +the semantics of what is valid logic in thransaction flows. -* Do a PR if that commit has conflicts +Note that all of the above stem from `DeliverTx` execution path, which handles: -* (Eventually) Make a PR that adds a version handler for the next upgrade - * [Add v10 upgrade boilerplate #1649](https://github.com/osmosis-labs/osmosis/pull/1649/files) +- `AnteHandler`'s marked as deliver tx +- `msg.ValidateBasic` +- execution of a message from the message server -* Update chain JSON schema's recommended versions in `chain.schema.json` located in the root directory. +The `DeliverTx` return back to the Tendermint is defined [here][1]. + +#### Major Sources of State-incompatibility + +##### Creating Additional State + +By erroneously creating database entries that exist in Version A but not in +Version B, we can cause the app hash to differ across nodes running +these versions in the network. Therefore, this must be avoided. + +##### Changing Proto Field Definitions + +For example, if we change a field that gets persisted to the database, +the app hash will differ across nodes running these versions in the network. + +Additionally, this affects `LastResultsHash` because it contains a `Data` field that is a marshaled proto message. + +##### Returning Different Errors Given Same Input + +Version A + +```go +func (sk Keeper) validateAmount(ctx context.Context, amount sdk.Int) error { + if amount.IsNegative() { + return sdkerrors.Wrap(sdkerrors.ErrInvalidRequest, "amount must be positive or zero") + } + return nil +} +``` + +Version B + +```go +func (sk Keeper) validateAmount(ctx context.Context, amount sdk.Int) error { + if amount.IsNegative() || amount.IsZero() { + return sdkerrors.Wrap(sdkerrors.ErrInvalidRequest, "amount must be positive") + } + return nil +} +``` + +Note that now an amount of 0 can be valid in "Version A". However, not in "Version B". +Therefore, if some nodes are running "Version A" and others are running "Version B", +the final app hash might not be deterministic. + +Additionally, a different error message does not matter because it +is not included in any hash. However, an error code `sdkerrors.ErrInvalidRequest` does. +It translates to the `Code` field in the `LastResultsHash` and participates in +its validation. + +##### Variability in Gas Usage + +For transaction flows (or any other flow that consumes gas), it is important +that the gas usage is deterministic. + +Currently, gas usage is being Merklized in the state. As a result, reordering functions +becomes risky. + +Suppose my gas limit is 2000 and 1600 is used up before entering +`someInternalMethod`. Consider the following: + +```go +func someInternalMethod(ctx sdk.Context) { + object1 := readOnlyFunction1(ctx) # consumes 1000 gas + object2 := readOnlyFunction2(ctx) # consumes 500 gas + doStuff(ctx, object1, object2) +} +``` + +- It will run out of gas with `gasUsed = 2600` where 2600 getting merkelized +into the tx results. + +```go +func someInternalMethod(ctx sdk.Context) { + object2 := readOnlyFunction2(ctx) # consumes 500 gas + object1 := readOnlyFunction1(ctx) # consumes 1000 gas + doStuff(ctx, object1, object2) +} +``` + +- It will run out of gas with `gasUsed = 2100` where 2100 is getting merkelized +into the tx results. + +Therefore, we introduced a state-incompatibility by merklezing diverging gas +usage. + +#### Secondary Limitations To Keep In Mind + +##### Network Requests to External Services + +It is critical to avoid performing network requests to external services +since it is common for services to be unavailable or rate-limit. + +Imagine a service that returns exchange rates when clients query its HTTP endpoint. +This service might experience downtime or be restricted in some geographical areas. + +As a result, nodes may get diverging responses where some +get successful responses while others errors, leading to state breakage. + +##### Randomness + +Randomness cannot be used in the state machine, as the state machine definitionally must be deterministic. +Any time you'd consider using it, instead seed a CSPRNG off of some seed. + +One thing to note is that in golang, iteration order over `map`s is non-deterministic, so to be deterministic you must gather the keys, and sort them all prior to iterating over all values. + +##### Parallelism and Shared State + +Threads and Goroutines might preempt differently in different hardware. Therefore, +they should be avoided for the sake of determinism. Additionally, it is hard +to predict when the multi-threaded state can be updated. + +##### Hardware Errors + +This is out of the developer's control but is mentioned for completeness. ### Pre-release auditing process @@ -339,15 +578,15 @@ For every module with notable changes, we assign someone who was not a primary a Deliverables of review are: -* PR's with in-line code comments for things they had to figure out (or questions) +- PR's with in-line code comments for things they had to figure out (or questions) -* Tests / test comments needed to convince themselves of correctness +- Tests / test comments needed to convince themselves of correctness -* Spec updates +- Spec updates -* Small refactors that helped in understanding / making code conform to consistency stds / improve code signal-to-noise ratio welcome +- Small refactors that helped in understanding / making code conform to consistency stds / improve code signal-to-noise ratio welcome -* (As with all PRs, should not be a monolithic PR that gets PR'd, even though that may be the natural way its first formed) +- (As with all PRs, should not be a monolithic PR that gets PR'd, even though that may be the natural way its first formed) At the moment, we're looking for a tool that lets us statically figure out every message that had something in its code path that changed. Until a tool is found, we must do this manually. @@ -355,3 +594,9 @@ We test in testnet & e2e testnet behaviors about every message that has changed We communicate with various integrators if they'd like release-blocking QA testing for major releases * Chainapsis has communicated wanting a series of osmosis-frontend functionalities to be checked for correctness on a testnet as a release blocking item + +[1]:https://github.com/cosmos/cosmos-sdk/blob/d11196aad04e57812dbc5ac6248d35375e6603af/baseapp/abci.go#L293-L303 +[2]:https://github.com/tendermint/tendermint/blob/9f76e8da150414ce73eed2c4f248947b657c7587/proto/tendermint/types/types.proto#L70-L77 +[3]:https://github.com/tendermint/tendermint/blob/main/types/results.go#L47-L54 +[4]:https://github.com/osmosis-labs/cosmos-sdk/blob/5c9a51c277d067e0ec5cf48df30a85fae95bcd14/store/rootmulti/store.go#L430 +[5]:https://github.com/osmosis-labs/cosmos-sdk/blob/5c9a51c277d067e0ec5cf48df30a85fae95bcd14/store/types/commit_info.go#L40