
[debug client] Limit key import concurrency to 4 #591

Merged
merged 7 commits into main from okdas/debug-client-key-import on Mar 20, 2023

Conversation

okdas
Member

@okdas okdas commented Mar 16, 2023

Description

`make localnet_client_debug` currently fails on machines with a low-ish amount of RAM. Key import loads the 999 validator keys quickly, but the cryptographic key derivation is so memory-intensive that only about 250 keys get imported before the process dies OOM on my 16Gi virtual machine.

Type of change

Please mark the relevant option(s):

  - [ ] New feature, functionality or library
  - [x] Bug fix
  - [ ] Code health or cleanup
  - [ ] Major breaking change
  - [ ] Documentation
  - [ ] Other

List of changes

  - Wrapped the key import in a concurrency limiter (https://github.com/korovkin/limiter), capped at 4 concurrent imports; a sketch follows below

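Roughly what the change looks like, as a minimal self-contained sketch. It assumes the limiter's NewConcurrencyLimiter / Execute / WaitAndClose API and uses a hypothetical importKey placeholder; it is not the exact diff from the PR:

package main

import (
	"fmt"

	"github.com/korovkin/limiter"
)

// importKey is a hypothetical stand-in for the real, memory-hungry
// scrypt-based key derivation and keybase insert.
func importKey(privHexString string) error {
	return nil
}

func main() {
	keys := []string{"aa", "bb", "cc"} // stand-in for the 999 validator keys

	// At most 4 derivations hold their scrypt buffers at once,
	// which bounds peak memory regardless of CPU count.
	limit := limiter.NewConcurrencyLimiter(4)
	for _, k := range keys {
		k := k // capture the loop variable for the closure
		limit.Execute(func() {
			if err := importKey(k); err != nil {
				fmt.Println("import failed:", err)
			}
		})
	}
	// Block until every queued job has finished.
	limit.WaitAndClose()
}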
Testing

  - [x] `make develop_test`
  - [x] `make localnet_client_debug`
  - [x] `make client_start && make client_connect`

Required Checklist

  - [x] I have performed a self-review of my own code
  - [x] I have commented my code, particularly in hard-to-understand areas
  - [x] I have tested my changes using the available tooling
  - [x] I have updated the corresponding CHANGELOG

If Applicable Checklist

  - [ ] I have updated the corresponding README(s); local and/or global
  - [ ] I have added tests that prove my fix is effective or that my feature works
  - [ ] I have added, or updated, mermaid.js diagrams in the corresponding README(s)
  - [ ] I have added, or updated, documentation and mermaid.js diagrams in shared/docs/* if I updated shared/*README(s)

requests:
  memory: "64Mi"
  cpu: "250m"
limits:
  memory: "512Mi"
Member Author

This makes the client process die quickly instead of crashing the control plane when memory is full.

Member

Did you check if this works with docker-compose client?

Member Author

> Did you check if this works with docker-compose client?

Yep, it worked for me!

@Olshansk Olshansk self-requested a review March 17, 2023 00:42
@Olshansk Olshansk added the bug and infra labels Mar 17, 2023
@0xBigBoss
Contributor

@okdas wdyt of trying something like this? It's a bit more efficient since the worker count is tied to the number of CPUs. This implementation saved a bunch of memory on my setup with 12 cores.

root@pocket-v1-cli-client-745744b48-4jlxr:/# client keys list
Before initializeDebugKeybase: Alloc = 1 MiB, TotalAlloc = 1 MiB, Sys = 16 MiB, NumGC = 0
{"level":"debug","time":"2023-03-17T01:53:37Z","message":"Debug keybase initializing... Adding 999 validator keys to the keybase"}
numWorkers:  12
After initializeDebugKeybase: Alloc = 478 MiB, TotalAlloc = 32075 MiB, Sys = 1166 MiB, NumGC = 71
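
The Before/After lines above look like output from a small helper built on runtime.ReadMemStats. The helper itself isn't shown in this thread, so the following is a hypothetical reconstruction:

package main

import (
	"fmt"
	"runtime"
)

// printMemStats prints a snapshot of Go heap usage in the same shape
// as the "Before/After initializeDebugKeybase" lines above.
func printMemStats(label string) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%s: Alloc = %d MiB, TotalAlloc = %d MiB, Sys = %d MiB, NumGC = %d\n",
		label, m.Alloc/1024/1024, m.TotalAlloc/1024/1024, m.Sys/1024/1024, m.NumGC)
}

func main() {
	printMemStats("Before initializeDebugKeybase")
}

And the suggested worker-pool implementation itself: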
//go:build debug

package debug

import (
	"fmt"
	"os"
	"runtime"
	"sync"

	"github.com/pokt-network/pocket/app/client/keybase"
	"github.com/pokt-network/pocket/build"
	"github.com/pokt-network/pocket/logger"
	cryptoPocket "github.com/pokt-network/pocket/shared/crypto"
	"gopkg.in/yaml.v2"
)

const (
	// NOTE: This is the number of validators in the private-keys.yaml manifest file
	numValidators      = 999
	debugKeybaseSuffix = "/.pocket/keys"
)

var (
	// TODO: Allow users to override this value via `datadir` flag or env var or config file
	debugKeybasePath string
)

// Initialise the debug keybase with the 999 validator keys from the private-keys manifest file
func init() {
	homeDir, err := os.UserHomeDir()
	if err != nil {
		logger.Global.Fatal().Err(err).Msg("Cannot find user home directory")
	}
	debugKeybasePath = homeDir + debugKeybaseSuffix

	// Initialise the debug keybase with the 999 validators
	if err := initializeDebugKeybase(); err != nil {
		logger.Global.Fatal().Err(err).Msg("Cannot initialise the keybase with the validator keys")
	}
}

func initializeDebugKeybase() error {
	validatorKeysPairMap, err := parseValidatorPrivateKeysFromEmbeddedYaml()
	if err != nil {
		return err
	}

	// Create/Open the keybase at `$HOME/.pocket/keys`
	kb, err := keybase.NewBadgerKeybase(debugKeybasePath)
	if err != nil {
		return err
	}
	db, err := kb.GetBadgerDB()
	if err != nil {
		return err
	}

	// Add the keys if the keybase contains less than 999
	curAddr, _, err := kb.GetAll()
	if err != nil {
		return err
	}

	// Add validator addresses if not present
	if len(curAddr) < numValidators {
		logger.Global.Debug().Msgf("Debug keybase initializing... Adding %d validator keys to the keybase", numValidators-len(curAddr))

		// Use a Badger WriteBatch to speed up the bulk insert
		wb := db.NewWriteBatch()

		// Create a channel to receive errors from goroutines
		errCh := make(chan error, numValidators)

		// Create a WaitGroup to wait for all goroutines to finish
		var wg sync.WaitGroup

		// Create a worker pool with a limited number of workers
		numWorkers := runtime.NumCPU()
		wg.Add(numWorkers)

		// Create a channel to send tasks to the workers
		taskCh := make(chan string, numValidators)

		// Start the worker pool
		for i := 0; i < numWorkers; i++ {
			go func() {
				defer wg.Done()
				for privHexString := range taskCh {
					// Import the keys into the keybase with no passphrase or hint as these are for debug purposes
					keyPair, err := cryptoPocket.CreateNewKeyFromString(privHexString, "", "")
					if err != nil {
						errCh <- err
						return
					}

					// Use key address as key in DB
					addrKey := keyPair.GetAddressBytes()

					// Encode KeyPair into []byte for value
					keypairBz, err := keyPair.Marshal()
					if err != nil {
						errCh <- err
						return
					}
					if err := wb.Set(addrKey, keypairBz); err != nil {
						errCh <- err
						return
					}
				}
			}()
		}

		// Send tasks to the workers
		for _, privHexString := range validatorKeysPairMap {
			taskCh <- privHexString
		}

		// Close the task channel to signal the workers to exit
		close(taskCh)

		// Wait for all workers to finish
		wg.Wait()

		// Check if any goroutines returned an error
		select {
		case err := <-errCh:
			return err
		default:
		}

		if err := wb.Flush(); err != nil {
			return err
		}
	}

	// Close DB connection
	if err := kb.Stop(); err != nil {
		return err
	}

	return nil
}

// parseValidatorPrivateKeysFromEmbeddedYaml fetches the validator private keys from the embedded build/localnet/manifests/private-keys.yaml manifest file.
func parseValidatorPrivateKeysFromEmbeddedYaml() (map[string]string, error) {

	// Parse the YAML file and load into the config struct
	var config struct {
		ApiVersion string            `yaml:"apiVersion"`
		Kind       string            `yaml:"kind"`
		MetaData   map[string]string `yaml:"metadata"`
		Type       string            `yaml:"type"`
		StringData map[string]string `yaml:"stringData"`
	}
	if err := yaml.Unmarshal(build.PrivateKeysFile, &config); err != nil {
		return nil, err
	}
	validatorKeysMap := make(map[string]string)

	for id, privHexString := range config.StringData {
		validatorKeysMap[id] = privHexString
	}
	return validatorKeysMap, nil
}

@Olshansk Olshansk (Member) left a comment

The approach makes sense, but I'm still trying to catch up and understand how we got here. @0xBigBoss I think it was introduced by #578, but the issue should've been there beforehand, right?

Since this is just a client debug key feature, I'm approving, but would like a 👍 from @deblasis or @0xBigBoss before merging.


@okdas
Member Author

okdas commented Mar 17, 2023

Hey @0xBigBoss, that is a great suggestion. I did try switching to runtime.NumCPU() on my VM with 8 cores, and the client unfortunately still crashed under the 512MiB limit.

To be frank, I don't know if we should let the debug client consume more than 0.5Gi of RAM. We might instead need to adjust the complexity of scrypt (possibly for the debug client only?). For now we will probably merge this as-is to unblock the team.
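
For context on why scrypt dominates memory here: its working set scales with the N and r cost parameters, roughly 128 * N * r bytes per derivation, so a debug-only reduction of N would shrink the footprint. A sketch using golang.org/x/crypto/scrypt; the parameters pocket actually uses aren't shown in this thread, so the numbers below are illustrative only:

package main

import (
	"fmt"

	"golang.org/x/crypto/scrypt"
)

func main() {
	salt := []byte("example-salt")

	// Strong parameters: N=32768, r=8 needs roughly 128*32768*8 bytes,
	// about 32 MiB of working memory per derivation. Four concurrent
	// imports would hold around 128 MiB at once.
	key, err := scrypt.Key([]byte("passphrase"), salt, 32768, 8, 1, 32)
	if err != nil {
		panic(err)
	}
	fmt.Printf("strong: %x...\n", key[:4])

	// Hypothetical debug-only parameters: N=1024 needs about 1 MiB per
	// derivation, at the cost of far weaker key stretching (arguably fine
	// for throwaway debug keys).
	weak, err := scrypt.Key([]byte("passphrase"), salt, 1024, 8, 1, 32)
	if err != nil {
		panic(err)
	}
	fmt.Printf("debug:  %x...\n", weak[:4])
}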

@okdas okdas requested a review from Olshansk March 17, 2023 22:31
@0xBigBoss
Contributor

0xBigBoss commented Mar 17, 2023

@okdas That makes sense on the memory limits. What’s the benefit of re-importing every time a new pod is created? It’s really slow. Could we mount a persistent volume to $HOME/.pocket to prevent this?

@okdas
Member Author

okdas commented Mar 17, 2023

> Could we mount a persistent volume to $HOME/.pocket to prevent this?

Why not? That is a great idea. I added a PVC for that pod. Thanks!
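
The PVC mounted at the keybase path might look like the following. This is a minimal sketch with hypothetical names and sizes, not the actual manifest from the PR:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pocket-client-keybase   # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi              # illustrative size
---
# In the client pod spec (excerpt):
#   volumeMounts:
#     - name: keybase
#       mountPath: /root/.pocket
#   volumes:
#     - name: keybase
#       persistentVolumeClaim:
#         claimName: pocket-client-keybase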

@okdas okdas merged commit 26c592c into main Mar 20, 2023
@okdas okdas deleted the okdas/debug-client-key-import branch March 20, 2023 18:03
bryanchriswhite added a commit that referenced this pull request Mar 21, 2023
* pokt/main:
  [Tooling] Output changelog validation check into the PR automatically (#596)
  [debug client] Limit key import concurrency to 4 (#591)
  [SHARED][CLI] adds message routing type field to debug messages and CLI commands (#561)
dylanlott pushed a commit that referenced this pull request Mar 24, 2023
Labels
bug: Something isn't working - expected behaviour is incorrect
infra: Core infrastructure - not protocol related
Projects
Status: Done
3 participants