
Fix issue #125: sign block with old key if signing key updated in same block #1203

Merged: 19 commits into develop, Aug 31, 2018

Conversation

@abitmore (Member) commented on Jul 29, 2018:

PR for #125.

This PR also contains a fix for the block size check and a fix for removing items from the fork db.

Things to be done:

  • test case for loading object_database from disk which contains signing keys not in genesis
  • test case for witness_update_operation
  • test case for witness_create_operation
  • test case for witness plugin update: I have done some manual tests; it's too much effort to create such test cases.

@abitmore (Member, Author):

Travis complains:

2795751ms th_a object_database.cpp:106 open ] Opening object database from /tmp/graphene-tmp/e16c-550d-aba8-ade6 ...
chain_test: /home/travis/build/bitshares/bitshares-core/libraries/db/include/graphene/db/simple_index.hpp:67: const graphene::db::object& graphene::db::simple_index::insert(graphene::db::object&&) [with T = graphene::chain::global_property_object]: Assertion `!_objects[instance]' failed.
Running 214 test cases...
unknown location(0): fatal error in "change_signing_key": signal: SIGABRT (application abort requested)
/home/travis/build/bitshares/bitshares-core/tests/common/database_fixture.cpp(271): last checkpoint

Related code:

// close the database, flush all data to disk
db.close();
// reopen database, all data should be unchanged
db.open(data_dir.path(), make_genesis, "TEST");

Why does reopening the database file after closing it fail? Any idea how to fix this?

@pmconrad (Contributor):

The database is not meant to be reopened after close. Also see the discussion in #436, #687, #689.

@abitmore (Member, Author):

Destroyed the db object and created a new one. Travis CI no longer complains.
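
For reference, a minimal sketch of that workaround, assuming the test fixture holds the database behind a smart pointer; the helper and member names here are hypothetical and not the exact code in database_fixture.cpp:

#include <graphene/chain/database.hpp>
#include <functional>
#include <memory>

// Close and destroy the old instance, then construct a fresh one and open it
// from the on-disk state, instead of reopening the same instance.
void reopen_database( std::unique_ptr<graphene::chain::database>& db,
                      const fc::path& data_dir,
                      std::function<graphene::chain::genesis_state_type()> make_genesis )
{
   db->close();                                          // flush all data to disk
   db = std::make_unique<graphene::chain::database>();   // old instance is destroyed here
   db->open( data_dir, make_genesis, "TEST" );           // fresh instance reads the stored data
}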

@abitmore force-pushed the 125-signing-key-check branch from 449745f to bf0d807 on August 19, 2018, 14:25
private:
void _apply_block( const signed_block& next_block );
processed_transaction _apply_transaction( const signed_transaction& trx );
void _cancel_bids_and_revive_mpa( const asset_object& bitasset, const asset_bitasset_data_object& bad );

/// Tracks signing keys of specified witnesses; updated only when a block is applied or popped
flat_map< witness_id_type, optional<public_key_type> > _witness_key_cache;
@pmconrad (Contributor):

IMO it is a bad idea to introduce block-dependent state in database that is not covered by object_db. This state must be maintained manually, which is fragile. I think the problem at hand can be solved in a simpler way.

@@ -250,7 +253,7 @@ block_production_condition::block_production_condition_enum witness_plugin::mayb
}

fc::time_point_sec scheduled_time = db.get_slot_time( slot );
-graphene::chain::public_key_type scheduled_key = scheduled_witness( db ).signing_key;
+graphene::chain::public_key_type scheduled_key = *db.find_witness_key_from_cache( scheduled_witness ); // should be valid
@pmconrad (Contributor):

I think instead of using the cache, witness_plugin could simply connect to the database::applied_block signal and use that to retrieve the latest keys of the witness(es) it is interested in.
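
A rough sketch of that suggestion, assuming the plugin keeps its configured witnesses in a set and a small local key map; the member names below are illustrative, not the code that was eventually merged:

// Refresh the locally cached signing keys whenever a block is applied.
// database::applied_block and the witness_id_type(db) lookup exist in graphene;
// _witnesses, _witness_key_cache and _applied_block_connection are illustrative members.
void witness_plugin::plugin_startup()
{
   graphene::chain::database& db = database();
   _applied_block_connection = db.applied_block.connect(
      [this, &db]( const graphene::chain::signed_block& )
      {
         for( graphene::chain::witness_id_type wid : _witnesses )   // witnesses this node produces for
            _witness_key_cache[wid] = wid(db).signing_key;          // latest key from head-block state
      } );
}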

@abitmore (Member, Author):

Essentially you mean to move the cache into the witness plugin. Because there is no signal when a block is popped, in order to deal with chain reorganization the cache would have to live in (extended) chain state. Then, to make it work, we would need to either replay the whole chain to enable the plugin, or add code to refresh the cache (copy data from the main chain state into the extended part) on startup. And in case a reorganization reaches back to the startup head block, we would need to make sure the cache doesn't get emptied.

I don't see how that is simpler so far.

@pmconrad (Contributor):

The witness plugin is only interested in current head block. Blocks are only popped when switching forks, which means there will be block pushes immediately after the pop, which means the cache will automatically be updated in that case.
This is slightly different from the cache in database itself; that one should always be up to date, even right after popping blocks.

Overall, I think even if the resulting code is not significantly simpler, the containment in the witness plugin is a big plus.

@abitmore (Member, Author):

I agree that ideally it's better to put the logic in the witness plugin, because block generation is actually the witness plugin's business.

> Blocks are only popped when switching forks, which means there will be block pushes immediately after the pop, which means the cache will automatically be updated in that case.

One scenario to consider: if we put the cache in the witness plugin and code the plugin to track applied operations only, then after a block is popped the cache may become inconsistent if the popped block contained a witness_update_operation. When a new block is then pushed, if it does not contain the same witness_update_operation, the cache would still be inconsistent. To solve this we would need additional code to detect that a chain reorganization has occurred and refresh the cache accordingly, or simply refresh the whole cache on every block. However, when refreshing the cache we have to fetch data from the main chain state, and since signalling is a multi-threading thing the main state could have changed while we fetch, which means the cache is not 100% reliable.

By putting the cache in the database class we can guarantee its consistency. Perhaps there will still be edge cases, for example if the witness plugin tries to generate a block while the database is in the middle of pushing one.

Another thing is that we don't have unit tests for the witness plugin, so adding more code there means more work and more risk.

@pmconrad (Contributor):

The witness plugin typically needs to keep track of the keys of only one witness. Refreshing this after each pushed block should have minimal cost, and only for the node in question.

@abitmore (Member, Author):

I still don't like refreshing from the plugin by querying chain state directly, because the chain state may already have become dirty by the time we refresh. For example, consider this scenario:

  • there was a witness_update_operation in _pending_transactions
  • a block without that operation is pushed to the database
  • that operation is pushed to the database again soon after the block push finishes
  • the witness plugin receives the "applied_block" signal
  • the witness plugin tries to refresh its local cache, but gets the already-updated key (dirty data)

That said, ideally we should keep a clean state (or a subset of it) in database for plugins and other modules to use. This PR tries to do that; IMHO we can merge it now and refactor it into a more generic module later when necessary.

@pmconrad (Contributor):

I think the applied_block signal is processed synchronously, right after the block has been applied. There shouldn't be any txs pushed in between.
https://github.com/bitshares/bitshares-core/blob/2.0.180612/libraries/chain/db_block.cpp#L550

@abitmore (Member, Author):

Oh, right, I forgot that. The state should be clean when processing the applied_block event.

By the way, why do we need to sleep here and there in the test cases? IIRC chain_test will sometimes segfault without sleeping, e.g.:

fc::usleep(fc::milliseconds(200)); // sleep a while to execute callback in another thread

@pmconrad (Contributor):

I think it's required in some places where API callbacks happen. These are asynchronous.
Shouldn't be necessary in market_rounding_tests IMO.

 if( !(skip & skip_witness_signature) )
-   FC_ASSERT( witness_obj.signing_key == block_signing_private_key.get_public_key() );
+{
+   auto signing_key = find_witness_key_from_cache( witness_id );
@pmconrad (Contributor):

Instead of using the cache, simply move the existing FC_ASSERT down a couple of lines, after _pending_tx_session.reset();
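
A minimal sketch of that suggestion (not the code that was merged), using the check quoted above:

// Clear the pending-transaction session first, then verify the signing key,
// so the check runs against head-block state rather than state already
// modified by pending transactions (e.g. a pending witness_update_operation).
_pending_tx_session.reset();

if( !(skip & skip_witness_signature) )
   FC_ASSERT( witness_obj.signing_key == block_signing_private_key.get_public_key() );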

@abitmore (Member, Author):

It's cheaper to check here than to check after calling _pending_tx_session.reset();.

@pmconrad (Contributor):

This will only hurt the node that is trying to produce without proper keys.

@abitmore (Member, Author):

Even for nodes with proper keys it's still slightly cheaper, because the cache would be much smaller than the object database.

@pmconrad (Contributor):

For nodes with keys it may be cheaper, but the global cache is a penalty for all nodes.

@abitmore self-assigned this on Aug 27, 2018
@abitmore (Member, Author):

Moved the signing key cache to witness plugin as recommended by @pmconrad.

Resolved conflicts:
  libraries/chain/db_block.cpp
@abitmore removed their assignment on Aug 29, 2018
…which means we assume it will be no more than 3 bytes after serialization. 21 bits means at maximum 2^21-1 = 2,097,151 transactions in a block; that is practically safe, although theoretically the size is an unsigned_int, which can be up to 56 bits.
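
For context, a short standalone sketch of that arithmetic, assuming unsigned_int is packed as a varint with 7 payload bits per byte (which is what the 3-byte/21-bit and 56-bit figures above imply):

#include <cassert>
#include <cstddef>
#include <cstdint>

// Varint-style size: 7 payload bits per byte, high bit marks "more bytes follow".
static std::size_t varint_size( uint64_t value )
{
   std::size_t bytes = 1;
   while( value >>= 7 )
      ++bytes;
   return bytes;
}

int main()
{
   assert( varint_size( 2097151 ) == 3 );   // 2^21 - 1 transactions still pack into 3 bytes
   assert( varint_size( 2097152 ) == 4 );   // one more would need a 4th byte
   // 8 bytes * 7 payload bits = 56 bits, the theoretical maximum mentioned above
   return 0;
}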
@pmconrad (Contributor) left a review comment:

Looks good, thanks! (And much simpler than before, IMO.)

The PR contains several unrelated changes. That's OK, IMO; no need to create separate PRs for minor code improvements. We need to keep this balanced though. Perhaps not quite as many unrelated things in the future.

{
// Note: if this check failed (which won't happen in normal situations),
// we would have temporarily broken the invariant that
// _pending_tx_session is the result of applying _pending_tx.
@pmconrad (Contributor):

At this point _pending_tx_session has been cleared, so the comment makes no sense.

@abitmore (Member, Author) commented on Aug 30, 2018:

Yes, _pending_tx_session has been cleared, but _pending_tx hasn't, so they're inconsistent. That's why I wrote the comment.

@pmconrad (Contributor):

Thanks. Never thought of this as an invariant, but I guess it's ok.

@@ -532,7 +538,17 @@ void database::_apply_block( const signed_block& next_block )
uint32_t skip = get_node_properties().skip_flags;
_applied_ops.clear();

FC_ASSERT( (skip & skip_merkle_check) || next_block.transaction_merkle_root == next_block.calculate_merkle_root(), "", ("next_block.transaction_merkle_root",next_block.transaction_merkle_root)("calc",next_block.calculate_merkle_root())("next_block",next_block)("id",next_block.id()) );
+if( !(skip & skip_block_size_check) )
@pmconrad (Contributor):

Should at least soft-fork protect this.
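
For context, a minimal sketch of what the guarded size check might look like, judging from the test below, which compares fc::raw::pack_size against maximum_block_size; the merged code may differ:

if( !(skip & skip_block_size_check) )
{
   const auto& gpo = get_global_properties();
   // reject blocks whose serialized size exceeds the consensus maximum
   FC_ASSERT( fc::raw::pack_size(next_block) <= gpo.parameters.maximum_block_size,
              "Block size exceeds maximum_block_size" );
}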

// should fail to push if it's too large
if( fc::raw::pack_size(maybe_large_block) > gpo.parameters.maximum_block_size )
{
BOOST_CHECK_THROW( db.push_block(maybe_large_block), fc::exception );
@pmconrad (Contributor):

The test should ensure that this is triggered a couple of times. Perhaps increment a counter each time and BOOST_CHECK_EQUAL the counter after the loop ends. (Not sure how often it should trigger...)

@abitmore (Member, Author):

Added a check to make sure it has been triggered at least once; I think that's enough. I didn't use BOOST_CHECK_EQUAL because it would be a pain when tweaking the loop in the future.
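
A minimal sketch of what that check might look like around the test's loop; variable names are illustrative:

uint32_t rejected_count = 0;   // how many times the over-size branch was hit

// ... inside the loop that builds maybe_large_block ...
if( fc::raw::pack_size(maybe_large_block) > gpo.parameters.maximum_block_size )
{
   ++rejected_count;
   BOOST_CHECK_THROW( db.push_block(maybe_large_block), fc::exception );   // should fail to push
}

// after the loop: make sure the over-size path was exercised at least once
BOOST_CHECK( rejected_count > 0 );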

@pmconrad (Contributor):

Ok, I won't insist on the soft fork.

@abitmore merged commit 8189b45 into develop on Aug 31, 2018
@abitmore deleted the 125-signing-key-check branch on August 31, 2018, 01:30