Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add migration from v0.2.0 to v0.2.1 #468

Closed
wants to merge 3 commits into from

Conversation

contrun
Copy link
Collaborator

@contrun contrun commented Jan 12, 2025

There is a breaking change from v0.2.0 to v0.2.1, which was introduced in #420. This PR tries to migrate data created from v0.2.0 to v0.2.1. The approach used in this PR is not perfect. There are 2 notable defects.

  1. The ChannelActorState contains many references to the data structures defined in other places. I didn't duplicate all the data definitions (that requires a lot of manual copy and pastes). So if the dependent data structures change in the future, then the migration code presented here may break. For example, if we add additional field to ChannelConstraints (say in the version 0.5.0 we added a new field), then the deserialization of OldChannelActorState and the serialization of NewChannelActorState in this code base will no longer work. This means that if the the user wants to upgrade data from v0.2.0 to v0.5.0, then he/she may have to first upgrade fnn from 0.2.0 to 0.2.2 (assume 0.2.2 contains this PR), and then to 0.5.0.
  2. As noted in the comment, we can't determine which remote nonce we should set for the field last_revoke_and_ack_remote_nonce without the counterparty's help. This is because we don't know if the counterparty received our latest RevokeAndAck or not, and we didn't save the remote nonce we used while creating the RevokeAndAck message. This PR assumes the happy path, i.e. the counterparty received our latest RevokeAndAck. If this is not the case, the user has to force close the channel. The root cause of this problem is that which data to save in the NewChannelActorState depends on the peer's state, and we have no way to interact with the peer in the migration stage.

I have verified that the code here works by the following steps.

  1. Run git checkout v0.2.0 to checkout to the v0.2.0 version code.
  2. Run REMOVE_OLD_STATE=y ./tests/nodes/start.sh to start a few fnn nodes with all new states.
  3. Run git checkout v0.2.1-migration-scripts to change the code base to branch v0.2.1-migration-scripts with some additional scripts to create channel.
  4. Run (cd ./tests/bruno; npm exec -- @usebruno/[email protected] run e2e/before-migration -r --env test) to create a channel.
  5. Stop the nodes started in step 2.
  6. Run cargo run -- -d ./tests/nodes/1 --migrate and cargo run -- -d ./tests/nodes/3 --migrate to migrate (we need to type YES in the terminal prompt).
  7. Run ./tests/nodes/start.sh to start a few fnn nodes to start fnn nodes again.
  8. Run (cd ./tests/bruno; npm exec -- @usebruno/[email protected] run e2e/after-migration -r --env test) to verify that we can make progress after the migration.

@contrun contrun changed the title Add migration for v0.2.0 to v0.2.1 Add migration from v0.2.0 to v0.2.1 Jan 12, 2025
@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 8.97436% with 71 lines in your changes missing coverage. Please review.

Project coverage is 49.95%. Comparing base (096a739) to head (b3573d3).
Report is 2 commits behind head on develop.

Files with missing lines Patch % Lines
src/store/migrations/v20250112205923.rs 6.57% 71 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #468      +/-   ##
===========================================
- Coverage    50.06%   49.95%   -0.12%     
===========================================
  Files           48       49       +1     
  Lines        31408    31486      +78     
===========================================
+ Hits         15725    15728       +3     
- Misses       15683    15758      +75     
Flag Coverage Δ
unittests 49.95% <8.97%> (-0.12%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

// RevokeAndAck from the peer, we will use this nonce to validate the RevokeAndAck message.
#[serde_as(as = "Option<PubNonceAsBytes>")]
pub last_commitment_signed_remote_nonce: Option<PubNonce>,

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you change the order here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is the right order. Previous comments were actually placed near to the wrong fields.

@chenyukang
Copy link
Collaborator

This means that if the the user wants to upgrade data from v0.2.0 to v0.5.0, then he/she may have to first upgrade fnn from 0.2.0 to 0.2.2 (assume 0.2.2 contains this PR), and then to 0.5.0.

This is an assumption we need to hold, the binary of v0.5.0 must contain the code for upgrading from 0.2.0 to 0.2.2(if there is such an upgrading), and the migration will be executed by order.

@quake

@chenyukang
Copy link
Collaborator

As noted in the comment, we can't determine which remote nonce we should set for the field last_revoke_and_ack_remote_nonce without the counterparty's help. This is because we don't know if the counterparty received our latest RevokeAndAck or not, and we didn't save the remote nonce we used while creating the RevokeAndAck message. This PR assumes the happy path, i.e. the counterparty received our latest RevokeAndAck. If this is not the case, the user has to force close the channel. The root cause of this problem is that which data to save in the NewChannelActorState depends on the peer's state, and we have no way to interact with the peer in the migration stage.

This is the most challenge scenario may happened during migration, sometimes it's hard to determine the initial value of new fields.

It's better to consider this when writing the new code, I'm thinking maybe we need a way to detect the necessary of migration, then we can add a check in CI, for this case, the code change in storage tests code maybe a good signal:

v0.2.0...v0.2.1#diff-c8ce48166948578ad1a1ffbb233f3d0c04e43d3a528f59e73437e1e2aefbc869L357-L359

@contrun contrun closed this Jan 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants