block log pruning: optionally reduce disk usage of block log file during operation by periodically trimming blocks #342

Merged
merged 24 commits into from
Jun 15, 2022


spoonincode
Member

@spoonincode spoonincode commented Jun 2, 2022

Add a new option, block-log-retain-blocks, which when set will periodically reduce the disk usage of the block log to only that of the given number of most recent blocks. This allows operators to limit the disk usage of the block log without stopping and restarting nodeos.

Disk usage is reduced by "poking holes" in the log file, in other words by making it a sparse file. This has implications if the pruned log is tarballed, copied, etc. with an application that is not sparse-file aware: the log may take up more disk space than expected at its destination.
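
As a rough illustration of what "poking holes" means, here is a minimal sketch assuming Linux and glibc's fallocate(). The helper names punch_hole, logical_size and allocated_bytes are hypothetical and are not taken from this PR; the PR's actual implementation may differ.

```cpp
#include <cassert>
#include <fcntl.h>         // open(), fallocate() (Linux/glibc specific)
#include <linux/falloc.h>  // FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE
#include <sys/stat.h>      // fstat()
#include <unistd.h>

// Hypothetical helper: deallocate the byte range [offset, offset+len) of an
// open file while keeping its logical size. Returns false when the underlying
// filesystem does not support hole punching.
bool punch_hole(int fd, off_t offset, off_t len) {
   assert(fd >= 0 && offset >= 0 && len > 0);
   // PUNCH_HOLE must always be combined with KEEP_SIZE.
   return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, offset, len) == 0;
}

// Logical file size (what `ls -l` reports); hole punching does not change it.
off_t logical_size(int fd) {
   struct stat st{};
   return fstat(fd, &st) == 0 ? st.st_size : -1;
}

// Bytes actually allocated on disk (what `du` reports).
// st_blocks is counted in 512-byte units.
off_t allocated_bytes(int fd) {
   struct stat st{};
   return fstat(fd, &st) == 0 ? static_cast<off_t>(st.st_blocks) * 512 : -1;
}
```

On a filesystem that supports hole punching, allocated_bytes() drops after punch_hole() while logical_size() stays the same. A copy tool that reads the file byte by byte sees the full logical size, which is why the destination copy can end up larger on disk than the source.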

A new command line option, --vacuum, is added to eosio-blocklog to squash the log back to a simple layout, removing the sparse nature of the file. This vacuum operation is also performed automatically should one remove the block-log-retain-blocks option.

Pruning is only periodically performed so in practice a small number of additional blocks (no more than a few megabytes worth of data) will be available beyond the configured number.
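
The retention arithmetic can be sketched as follows. This is an illustration only: first_block_to_retain is a hypothetical name, and the sketch assumes block numbers are contiguous from the first block in the log up to the head block.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: lowest block number that must stay readable when the
// operator asks to retain `retain_blocks` blocks ending at `head_block_num`.
uint32_t first_block_to_retain(uint32_t first_block_num, uint32_t head_block_num, uint32_t retain_blocks) {
   assert(retain_blocks > 0 && first_block_num <= head_block_num);
   // The log holds no more blocks than the limit: nothing to prune.
   if(head_block_num - first_block_num < retain_blocks)
      return first_block_num;
   return head_block_num - retain_blocks + 1;
}
```

Because pruning only runs periodically, the first block actually present in the log can be somewhat lower than this value between pruning passes.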

@spoonincode spoonincode changed the title from "optional automatic trimming of block log file" to "block log pruning: optionally reduce disk usage of block log file during operation by periodically trimming blocks" Jun 6, 2022
@spoonincode spoonincode marked this pull request as ready for review June 8, 2022 06:10
@spoonincode
Member Author

Some tasks to carry over from ship log review:

  • double check whether any preconditions on the added prune/vacuum functions should be more explicit
  • bundle all the various pruning options in a struct instead of making assumptions in various places
  • can probably optimize the hole punching in vacuum better

Contributor

@brianjohnson5972 brianjohnson5972 left a comment

Some of these have to be handled for this PR to merge, some can be put in a github issue to be handled after a release and some can be disputed.

const size_t first_data_pos = get_block_pos(first_block_num);
block_file.seek_end(-sizeof(uint32_t));
const size_t last_data_pos = block_file.tellp();
if(last_data_pos - first_data_pos < 1024*1024*1024) {
Contributor

So we just vacuum on exit if the file is greater than 1G? Did we decide on this because we only thought the time needed to be spent if there was any real size to recover?

Member Author

It was a nice round number to start discussion at. It’s not clear whether there is a preference when/if vacuuming should be performed automatically. I thought I had a strong opinion on this initially but all the ongoing discussions make me uncertain now.

Clearly vacuuming is performed with the manual eosio-blocklog --vacuum command (the block log is fortunate to have this separate tool; ship has no such operation, which makes that decision a little tougher).

For automatic vacuuming there are then two opportunities:

  • On exit, if the size to vacuum is small enough (what’s small enough? 1GB at the moment).
  • On start, if the user no longer specifies a trim value. (the alternative here is that the log stays an “unvacuumed pruned log” but stops performing any more pruning until the user specifies otherwise in the future).

fwiw I've removed automatic vacuuming at the moment until we can come to agreement on the criteria for when it should happen.
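
The two opportunities above can be written down as predicates. This is a sketch of the criteria under discussion, not of merged behavior: the function names are hypothetical, and the 1 GiB threshold is only the starting value mentioned in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Threshold floated in the review; the value was never settled.
constexpr uint64_t vacuum_on_exit_max_bytes = 1024ull * 1024 * 1024;

// Opportunity 1: on exit, vacuum only if the span of retained data is small.
bool should_vacuum_on_exit(uint64_t first_data_pos, uint64_t last_data_pos) {
   assert(first_data_pos <= last_data_pos);
   return last_data_pos - first_data_pos < vacuum_on_exit_max_bytes;
}

// Opportunity 2: on start, vacuum if the log on disk is pruned but the
// operator no longer configures a retain count.
bool should_vacuum_on_start(bool log_is_pruned, bool retain_blocks_configured) {
   return log_is_pruned && !retain_blocks_configured;
}
```

The alternative mentioned for the second case, leaving the log as an unvacuumed pruned log that simply stops pruning, would correspond to should_vacuum_on_start() always returning false.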


//if there is no head block though, bail now, otherwise first_block_num won't actually be available
// and it'll mess this all up
if(!head)
Contributor

You need to fix the index file or else this should be before convert_existing_header_to_vacuumed()

Member Author

Not following this comment, help me out. No head block means no entries in the log, which by my understanding should mean the index is completely empty. But digging into this did turn up another issue, and I’ve added a unit test to cover prune transitions on an empty block log.

Contributor

Then wouldn't no head block mean there was no pruning to be done, so no reason to even start?

Member Author

Once someone enables pruning, the header will have the flag set marking it as a pruned log, so when vacuuming (i.e. converting it back to a non-pruned log) it first clears the flag by rewriting the header via convert_existing_header_to_vacuumed(). Then if !head, there are no blocks and the index is empty, so we're done. Maybe the comment saying "bail" has the wrong connotation -- it's really that we're done here. Otherwise, if there are blocks (head is non-null), the magic has to happen.

Contributor

I agree now with your original code, but I think your new code is wrong. I don't see the convert_existing_header_to_vacuumed() adding the extra "valid non-pruned blocks count" (or whatever that name is) 32 bit number at the end, so why are you removing that 32 bits from the end?

Member Author

(I'm going to refer to the 4 byte block count on the end of the file as the trailer)

Creation of a new log is done in reset(). This can create a new log initially pruned or nonpruned. It'll write out the header with the correct flag (at the end of the function), as well as write out the trailer either by way of append() if there is a first block, or by writing out 0 manually if not (head==nullptr).

Conversion from non-pruned -> pruned occurs in open(). There, after calling prune() it always writes a new header with the flag enabled and the 4 byte trailer to the file. If head==nullptr it'll write out a 0.

vacuum() is called to transition pruned -> non-pruned. The first thing it does is call convert_existing_header_to_vacuumed(), which EOS_ASSERTs that we are indeed currently a pruned log file on disk. Then it writes a new header with the flag cleared (it may also need to transition from genesis_state to chain_id). Once the header is written, one of two things happens: a) if head==nullptr, it simply chops off the trailer that was added by reset()/open(); or b) if there is a head, and therefore first_block_num really exists, it goes on to perform the vacuum procedure, which effectively eliminates the trailer.

Member Author

The original code had a defect: if a log was in pruned mode with no blocks and you vacuumed it (i.e. converted it to non-pruned), that if(!head) return would leave the trailer on the file after clearing the flag. The log then claimed to be non-pruned but still carried the trailer, which of course made it a broken log. Hence the added code to trim off the trailer there.
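
The flag/trailer invariant from this thread can be modeled with a toy in-memory log. Everything here is hypothetical and for illustration only: toy_log, to_pruned and vacuum are not the PR's types or functions, the little-endian trailer encoding is an assumption, and the real vacuum() also relocates block data, which this model skips.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for the log file.
struct toy_log {
   bool pruned = false;         // header flag
   uint32_t block_count = 0;    // 0 models head == nullptr
   std::vector<uint8_t> body;   // bytes that follow the header
};

constexpr std::size_t trailer_size = sizeof(uint32_t);

// non-pruned -> pruned: set the flag and append the 4-byte block count
// (a zero count when the log has no blocks).
void to_pruned(toy_log& log) {
   assert(!log.pruned);
   log.pruned = true;
   for(int i = 0; i < 4; ++i)
      log.body.push_back(static_cast<uint8_t>(log.block_count >> (8 * i)));
}

// pruned -> non-pruned: clear the flag and make sure the trailer is gone,
// including when there are no blocks. Returning early before the resize in
// the empty case would reproduce the defect described above.
void vacuum(toy_log& log) {
   assert(log.pruned);
   assert(log.body.size() >= trailer_size);
   log.pruned = false;
   log.body.resize(log.body.size() - trailer_size);
}
```

The invariant being protected is simple: the trailer is present if and only if the pruned flag is set, whether or not the log contains any blocks.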

* move prune flag to MSb of version field
* adjustments & unit tests for empty log files
* fix an off-by-one when converting to a pruned log: append() corrected it, but a log that was never append()ed to would be left with a bad block count
* compact some code around vacuum header conversion
* an EOS_ASSERT or two for expected preconditions
* add some consts where appropriate & remove prefixed '_' on local variable
* change vacuum status print to be every 5 seconds
* tweak hole punching during vacuum to not leave some gaps of data
* auto-prune on nodeos exit has been disabled at the moment
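
For readers unfamiliar with the "prune flag in the MSb of the version field" idea from the first bullet, here is a minimal sketch. It assumes a 32-bit version field; the names below are hypothetical and not taken from the PR.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the on-disk version is a 32-bit field and its most significant
// bit marks a pruned log.
constexpr uint32_t pruned_flag = 1u << 31;

// Combine a plain version number with the pruned marker.
uint32_t make_version_field(uint32_t version, bool pruned) {
   assert((version & pruned_flag) == 0);  // version numbers must not use the flag bit
   return pruned ? (version | pruned_flag) : version;
}

// True when the stored field marks the log as pruned.
constexpr bool is_pruned(uint32_t version_field) {
   return (version_field & pruned_flag) != 0;
}

// The plain version number with the flag bit masked off.
constexpr uint32_t version_number(uint32_t version_field) {
   return version_field & ~pruned_flag;
}
```

One consequence of this layout is that a reader unaware of the flag sees a pruned log as having a very large, unsupported version number, rather than silently misreading it as an ordinary log.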
@brianjohnson5972
Contributor

brianjohnson5972 commented Jun 14, 2022

I think block-logs-prune-blocks is a confusing config parameter name, since it sounds like the number of blocks that get pruned. Would like a name like prune-block-log-retain.

@spoonincode
Member Author

Maybe block-log-retain-blocks? I'd like to keep the option starting with "block-log"

@brianjohnson5972
Contributor

yeah, go with block-log-retain-blocks

@spoonincode spoonincode merged commit 718ceee into main Jun 15, 2022
@spoonincode spoonincode deleted the blog_trimming branch June 15, 2022 01:06