Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Determine via API if factomd is "ready" #671

Closed
carryforward opened this issue Mar 13, 2019 · 11 comments
Closed

Determine via API if factomd is "ready" #671

carryforward opened this issue Mar 13, 2019 · 11 comments

Comments

@carryforward
Copy link
Contributor

OK, it looks like FD-869 as it was implemented didn't actually solve the problem that the OpenNode team actually wanted solving. Some commentary about that is posted here: #664

Lets talk about what the problem that is trying to be solved. Lets not post psudocode or API structure returns, but instead try to nail down the problem we are trying to solve.

It isn't an easy problem to define, as following along with consensus is always a matter of subjectivity (bitcoin boils this subjectivity down to # of confirmations). The subjectivity is harder to define in factom.

@ilzheev @jcheroske @ThomasMeier

@ilzheev
Copy link
Contributor

ilzheev commented Mar 13, 2019

Okay, let’s try.

Main aim: we need to know, when factomd node is ready to be enabled back into Open Node pool.

Requirements:

  1. Latest entryblock fetched (i.e. read) locally, so any read request (like EC balance or entry ack) returns up-to-date information
  2. Write request (like Commit Entry or Chain), that sent to the factomd, will be processed by network immediately and block won’t be missed.

@jcheroske
Copy link

Anton is right on target. I would only add that there seem to be two things that we need.

The first is a simple flag that conveys what Anton just mentioned. The flag's name should be decoupled from implementation terminology, so that the implementation is free to evolve. If you want to add additional flags that are tied to implementation details, that would probably be useful too.

I realize that there is no perfect way to determine if the API is ready. What we want is not perfection, but an expert consensus that that flag is the best that can reasonably be implemented. If the core devs don't create such a flag, then it falls on devs such as myself to try and figure out the readiness of the API, given various bits of information. Not a good situation.

Second, additional system information is needed from factomd so that it's health can be assessed externally. Just because factomd says the API is ready doesn't make it so. My previous pseudo-code post was an attempt to ask for all of that relevant information in one shot, instead of having to make multiple calls.

Finally, please don't relegate this to the diagnostics call, unless you're planning on making that call a first-class API method. We are asking for a method that will become part of the main API, not some addition to the debug API.

@ilzheev
Copy link
Contributor

ilzheev commented Mar 19, 2019

Any ideas on this?

@TorHogne
Copy link
Contributor

Having better API calls to query the internal status of factomd would be really useful for many projects as well as generally for ANOs.

We should be able to determine the following status:
Ignore/Wait/Syncing/Ready/etc.. As well as if it is a fed or audit.

The above should preferably be implemented with streaming support via web socket as well. If so, it would be beneficial to add things like pending transactions.

@stackdump
Copy link
Contributor

There is a workaround using data composed for health checks using three JSONRPC method calls: heights, current-minute, and properties

{
    "result": {
        "data": {
            "clocks": {
                "spread": 0,
                "spreadTolerance": 10,
                "factomd": 1555446841,
                "proxy": 1555446841
            },
            "isHealthy": true,
            "flags": {
                "isClockSpreadOk": true,
                "isFollowingMinutes": true,
                "isCurrentBlockAgeValid": true,
                "isSynced": true
            },
            "currentMinute": {
                "minute": 2,
                "startTime": 1555446786,
                "age": 55
            },
            "currentBlock": {
                "age": 175,
                "maxAge": 1200,
                "startTime": 1555446666
            },
            "system": {
                "proxyName": "tfa-1",
                "factomdVersion": "6.2.0",
                "factomdApiVersion": "2.0",
                "proxyVersion": "0.5.2"
            },
            "heights": {
                "leader": 188518,
                "entry": 188518,
                "entryBlock": 188518,
                "directoryBlock": 188518
            }
        },
        "message": "Health check succeeded"
    },
    "jsonrpc": "2.0",
    "id": "null"
}

TODO: cherry pick some other metrics from the promethus /metrics endpoint and include
obviously not all data will be present

@stackdump
Copy link
Contributor

stackdump commented Apr 16, 2019

First pass here: https://github.com/FactomProject/factomd/tree/FD-869_add_status_check_to_api

test with local network

factomd --network=LOCAL --count=2 --blktime=30
ork@myork-dev:~/Workspace/factomd$ curl -s localhost:8088/status | python -m json.tool

{   
    "id": 0,
    "jsonrpc": "2.0",
    "result": {
        "BootTime": 1555451714,
        "CurrentBlockStartTime": 1555451714575255658,
        "CurrentMinute": 0,
        "CurrentMinuteStartTime": 1555451714575255658,
        "CurrentTime": 1555451717975470158,
        "EntryBlockDBHeightComplete": 0,
        "EntryBlockDBHeightProcessing": 0,
        "HighestKnownBlock": 0,
        "HighestSavedBlock": 0,
        "LeaderHeight": 1,
        "Syncing": false,
        "SyncingDBSigs": false,
        "SyncingEOMs": false,
        "Version": "6.2.2"
    }
}

This should be easy to take over by anyone else to tune the rest of the parameters

@stackdump
Copy link
Contributor

Also NOTE: it's currently believed that we don't need a 'Liveness' check for factomd other than API port check at this time - A follower will always be able to recover on it's own without restart.

P.S. Authority nodes are controlled via a coordinated restart - but we'll look for opportunity to add these sort of container-friendly features during the next big refactor.

@stackdump
Copy link
Contributor

stackdump commented May 3, 2019

Ran this by @carryforward TODO: remove all of the 'sync' items & any that are not useful

{
    "id": 0,
    "jsonrpc": "2.0",
    "result": {
        "BootTime": 1556889632,
        "CurrentBlockStartTime": 1556913247852724655,
        "CurrentMinute": 6,
        "CurrentMinuteStartTime": 1556913607830885349,
        "CurrentTime": 1556913616053230996,
        "DBHeightComplete": 0,
        "EntryBlockDBHeightComplete": 190961,
        "EntryBlockDBHeightProcessing": 0,
        "HighestKnownBlock": 190962,
        "HighestSavedBlock": 190961,
        "IgnoreDone": true,
        "LeaderHeight": 190962,
        "NodeName": "FNode0",
        "Role": "follower",
        "Running": true,
        "Syncing": false,
        "SyncingDBSigs": false,
        "SyncingEOMs": false,
        "Version": "6.2.2"
    }
}

Would be nice to add : # of connected peers & detection of the Authority nodes are stalled or not ( to be used to determine if a node should be used to write new entries

Also NOTE: last time I spoke w/ collaborators about this - we narrowed the Scope to providing just enough info to be able to signal whether a node should be added to a pool on a load balancer

@stackdump
Copy link
Contributor

From Chatting w/ Anton:

[DeFacto] Anton IlzheevLast Friday at 5:30 PM
Our monitoring script will work in the following way:

Is DB loading finished?
| (yes) Is Node out of Ignore?
|_ (yes) Are node's heights (dblock+entry) + 1 >= leader's height
|_ (yes) Node thinks, that it's ready to use, compare it with other nodes in the pool
We have out of ignore. Need to have data, to determine, that local db loading is finished

@stackdump
Copy link
Contributor

stackdump commented May 5, 2019

PR here #720 Merged into parchment release

@stackdump
Copy link
Contributor

released here 1b6f10f

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants