Determine via API if factomd is "ready" #671
Comments
Okay, let’s try. Main aim: we need to know when a factomd node is ready to be added back into the Open Node pool. Requirements:
|
Anton is right on target. I would only add that there seem to be two things that we need.
The first is a simple flag that conveys what Anton just mentioned. The flag's name should be decoupled from implementation terminology, so that the implementation is free to evolve. If you want to add additional flags that are tied to implementation details, that would probably be useful too. I realize that there is no perfect way to determine if the API is ready. What we want is not perfection, but an expert consensus that the flag is the best that can reasonably be implemented. If the core devs don't create such a flag, then it falls on devs such as myself to try to figure out the readiness of the API from various bits of information. Not a good situation.
Second, additional system information is needed from factomd so that its health can be assessed externally. Just because factomd says the API is ready doesn't make it so. My previous pseudo-code post was an attempt to ask for all of that relevant information in one shot, instead of having to make multiple calls.
Finally, please don't relegate this to the |
Any ideas on this? |
Having better API calls to query the internal status of factomd would be really useful for many projects, and for ANOs in general. We should be able to determine the following status:
The above should preferably also be implemented with streaming support via WebSocket. If so, it would be beneficial to add things like pending transactions. |
As a workaround, health-check data can be composed from three JSON-RPC method calls: heights, current-minute, and properties
TODO: cherry-pick some other metrics from the Prometheus /metrics endpoint and include them |
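For anyone who wants to try the three-call workaround, here is a minimal sketch of how the calls could be combined into one status document. The endpoint URL (`http://localhost:8088/v2`), the `text/plain` content type, and the helper names `build_request`, `post_json`, and `collect_status` are assumptions for illustration, not part of factomd itself; only the three method names come from this thread.

```python
import json
import urllib.request
from typing import Any, Callable, Dict

# Assumption: factomd's v2 JSON-RPC endpoint on a local node.
FACTOMD_URL = "http://localhost:8088/v2"
# The three methods named in the workaround above.
METHODS = ("heights", "current-minute", "properties")

Transport = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def build_request(method: str, request_id: int = 0) -> Dict[str, Any]:
    """Return a JSON-RPC 2.0 request body for a parameterless method."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method}


def post_json(url: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """POST a JSON payload and decode the JSON reply (5 second timeout)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the content type the node expects.
        headers={"Content-Type": "text/plain"},
    )
    with urllib.request.urlopen(req, timeout=5.0) as resp:
        return json.loads(resp.read().decode("utf-8"))


def collect_status(url: str = FACTOMD_URL,
                   transport: Transport = post_json) -> Dict[str, Any]:
    """Call each method once and gather the results under the method name."""
    status: Dict[str, Any] = {}
    for request_id, method in enumerate(METHODS):
        reply = transport(url, build_request(method, request_id))
        if "error" in reply:
            raise RuntimeError(f"{method} failed: {reply['error']}")
        status[method] = reply.get("result")
    return status
```

Calling `collect_status()` against a running node would return a dict keyed by method name. The `transport` parameter exists only so the function can be exercised without a live node.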
First pass here: https://github.com/FactomProject/factomd/tree/FD-869_add_status_check_to_api test with local network
It should be easy for anyone else to take this over and tune the rest of the parameters |
Also NOTE: it's currently believed that factomd doesn't need a 'Liveness' check other than an API port check at this time. A follower will always be able to recover on its own without a restart. P.S. Authority nodes are controlled via a coordinated restart, but we'll look for an opportunity to add these sorts of container-friendly features during the next big refactor. |
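The API port check mentioned above could look roughly like this. It is a sketch only: the default host and port (`localhost`, `8088`) and the function name `api_port_open` are assumptions, and it tests nothing beyond whether a TCP connection can be opened.

```python
import socket


def api_port_open(host: str = "localhost", port: int = 8088,
                  timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A container orchestrator's built-in TCP probe does the same job; the function is mainly useful for scripts that run outside such a system.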
Ran this by @carryforward TODO: remove all of the 'sync' items & any that are not useful
Would be nice to add: # of connected peers & detection of whether the Authority nodes are stalled (to be used to determine if a node should be used to write new entries).
Also NOTE: the last time I spoke w/ collaborators about this, we narrowed the scope to providing just enough info to signal whether a node should be added to a pool on a load balancer. |
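To make the load-balancer use case concrete, here is one possible heuristic, sketched under loud assumptions. It is not what FD-869 implements: the field names `directoryblockheight` and `leaderheight` are assumed to be present in the `heights` result, and `should_join_pool` and `max_lag` are hypothetical names. The tolerance itself is a judgment call.

```python
from typing import Any, Mapping


def should_join_pool(heights: Mapping[str, Any], max_lag: int = 1) -> bool:
    """Hypothetical rule: admit a node whose saved height is close to the leader height.

    Assumes `heights` carries integer-like `directoryblockheight` and
    `leaderheight` fields; missing or malformed data counts as not ready.
    """
    try:
        saved = int(heights["directoryblockheight"])
        leader = int(heights["leaderheight"])
    except (KeyError, TypeError, ValueError):
        return False
    lag = leader - saved
    # A negative lag is treated as inconsistent data rather than readiness.
    return 0 <= lag <= max_lag
```

For example, `should_join_pool({"directoryblockheight": 100, "leaderheight": 101})` admits the node, while a lag of 50 blocks does not unless `max_lag` is raised.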
From chatting w/ Anton:
[DeFacto] Anton Ilzheev, last Friday at 5:30 PM: Is DB loading finished? |
PR here: #720. Merged into the parchment release. |
released here 1b6f10f |
OK, it looks like FD-869 as implemented didn't solve the problem that the OpenNode team actually wanted solved. Some commentary about that is posted here: #664
Let's talk about the problem we are trying to solve. Let's not post pseudocode or API return structures, but instead try to nail down the problem itself.
It isn't an easy problem to define, as following along with consensus is always a matter of subjectivity (Bitcoin boils this subjectivity down to # of confirmations). The subjectivity is harder to define in Factom.
@ilzheev @jcheroske @ThomasMeier