
Allow substrate-based chains to define their own rpc middleware #9458

Open
librelois opened this issue Jul 29, 2021 · 16 comments
Labels
J2-unconfirmed Issue might be valid, but it’s not yet known.

Comments

@librelois
Contributor

jsonrpc-core allows defining a custom middleware, but Substrate already provides one and it doesn't seem possible to add a second.

For Substrate-based projects to be able to customize, for example, how RPC requests are logged, the Substrate middleware would need to be generic so that it can wrap an optional "sub-middleware".

We have this need at Moonbeam and are currently forced to maintain a fork of sc-rpc-server for it. That is not a sustainable solution, which is why I propose contributing this to Substrate; I'm waiting for an agreement in principle before doing so :)

@github-actions github-actions bot added the J2-unconfirmed Issue might be valid, but it’s not yet known. label Jul 29, 2021
@bkchr
Member

bkchr commented Jul 29, 2021

Would you be open to creating a PR?

@librelois
Contributor Author

Would you be open to creating a PR?

Yes :)

@bkchr
Member

bkchr commented Jul 29, 2021

I think you can create one and then we can discuss what needs to be done.

@dvdplm
Contributor

dvdplm commented Sep 22, 2021

@librelois I'm late to the game here, but do you mind elaborating a bit more on what your use-case for this is? "improved logging" is a bit vague. Do you have a link to your fork I can look at?

@librelois
Contributor Author

We use this fork to log the execution time of a request: https://github.com/purestake/substrate/tree/elois-custom-rpc-middleware

We are still using an old version of Substrate, but rather than updating our fork, we would prefer that PR #9466 be merged so that we don't have to maintain a fork.
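As a rough illustration of what such a fork adds (this is not Moonbeam's actual code), the heart of a timing middleware is just a wrapper that measures how long the handling future takes and logs it once the response is ready; the part that currently requires forking sc-rpc-server is hooking that wrapper into the server. A minimal sketch using only std and the log crate:

```rust
use std::future::Future;
use std::time::Instant;

/// Wrap an RPC-handling future and log how long it took to complete.
/// `method` is only used for the log line; `fut` is whatever future the
/// RPC server would have run for the request anyway.
async fn timed<F, T>(method: &str, fut: F) -> T
where
    F: Future<Output = T>,
{
    let start = Instant::now();
    let response = fut.await;
    log::info!("rpc `{}` served in {:?}", method, start.elapsed());
    response
}
```

A real middleware would apply this wrapper to every incoming request, which is exactly the extension point this issue asks Substrate to expose.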

@dvdplm
Contributor

dvdplm commented Sep 22, 2021

There was work in that area merged quite recently, #9358. Maybe that is useful to you?

@bkchr
Member

bkchr commented Sep 23, 2021

@librelois isn't the PR @dvdplm linked doing exactly what you want?

@crystalin
Contributor

@bkchr and @dvdplm, the change we are trying to introduce here is for multiple reasons. The ability to track the timing of an RPC call is one of them, but we also want to change the logging mechanism and possibly tweak how we handle RPC requests.

(For example, we can't use the RPC debug log as-is because it includes the response payload, which for some requests (like transaction tracing) can amount to tens of megabytes per request.)

@bkchr
Member

bkchr commented Sep 23, 2021

@dvdplm informed me yesterday that with jsonrpsee (the new JSON-RPC backend) the possibility to define a Middleware is no longer available, AFAIU. So your PR would be removed again in the near future.

@crystalin could you maybe tell us exactly what you need, and we will help you get that kind of information? 🙂

@crystalin
Contributor

crystalin commented Sep 23, 2021

@bkchr I see. We will look into jsonrpsee.
Concerning the requirements, it is a bit complex:
We want to be able to track each request, but some of them without the payload (or with a partial payload), as it generates too much data. In addition, having the timing associated with the request would be great.
In our current implementation we filter the payload based on the method in the request.
This is important for us as we are getting a huge number of requests and we need to troubleshoot some of them.
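A minimal sketch of that kind of method-based filtering (the method names and size limit are illustrative, not Moonbeam's actual configuration): every request is logged with its timing, but the payload is omitted for methods known to produce very large responses.

```rust
/// Methods whose payloads are too large to log in full (illustrative list).
const OMIT_PAYLOAD_FOR: &[&str] = &["debug_traceTransaction", "trace_filter"];
/// Above this size, log only the payload length (illustrative limit).
const MAX_LOGGED_PAYLOAD_BYTES: usize = 2_048;

fn log_rpc(method: &str, payload: &str, elapsed_ms: u128) {
    if OMIT_PAYLOAD_FOR.contains(&method) || payload.len() > MAX_LOGGED_PAYLOAD_BYTES {
        // Record only the size, never the body, for blacklisted or oversized payloads.
        log::info!(
            "rpc `{}` took {} ms (payload omitted, {} bytes)",
            method,
            elapsed_ms,
            payload.len()
        );
    } else {
        log::info!("rpc `{}` took {} ms: {}", method, elapsed_ms, payload);
    }
}
```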

In addition to that, I wanted to experiment with a better request handler where the load would be spread across the RPC threads rather than tied to the connection. Otherwise, a request can be computed very quickly but then held for a long time before being sent back if the thread is busy with another request, even if the other RPC threads are totally free.

Those are the two main requirements I had for implementing the RPC middleware.

@dvdplm
Contributor

dvdplm commented Sep 24, 2021

We will look into jsonrpsee.

Now is a great time to be involved as the feature set isn't frozen and we are actively looking for input from the public. Any level of participation is very much appreciated (PRs, issues, docs, examples, ideas and comments).

Preamble. I think that middleware is one of those features that look good on paper but end up being less useful in practice. The idea that "anyone" can add their own code to filter/modify/do whatever to requests is often false: the middleware author needs to know a fair amount of the internal details anyway. Conversely, in cases where a middleware is actually useful, it is so useful that 99% of users need it, in which case it could just as well be a feature.
From a library author/maintainer POV, it can be tricky to design the API in such a way as to let middleware authors do what they need to do, while also keeping the API surface small enough not to hamper our ability to refactor and change the internals.
All of that is to say that I am not 100% against adding a middleware feature to jsonrpsee, but I'll need proper convincing, and if we can get away without it, I'd prefer that (less code => fewer bugs, etc.).

Now to the specifics here.

  • "We want to be able to track each request but some of them without the payload (or with a partial payload)": this is a fair request. I wonder however if we could perhaps move the payload logging to trace, or alternatively add a new rpc_track target. In your own RPCs you're obviously free to add any logging you wish, including conditionally log full/partial payloads. It'd be great to see a code sample here (or a draft PR). FWIW we are very keen on making substrate "observable" and aware that we're not quite there yet.
  • "In addition having the timing associated with the request would be great." – any feedback you can give us on Better RPC prometheus metrics. #9358 is very welcome. It should give you what you need and if it doesn't we can most likely tweak it.
  • "In addition to that, I wanted to experiment with a better request handler where the load would be spread across the RPC threads" – This sounds interesting and I'm curious to hear more. Not sure you'd be able to do this with a middleware though? I feel strongly that thread scheduling is a core library concern and that the tokio scheduler is a pretty high bar to improve on, but maybe I'm misunderstanding your idea here?

@crystalin
Contributor

crystalin commented Sep 24, 2021

@dvdplm Thank you for replying.
I understand your point, and I agree that keeping it simple is probably better, but I'm not convinced that users' needs are similar enough to decide whether 99% of them would find a middleware useful or not.
With the diversity that parachains are bringing to the ecosystem, the need for different handlers is also spreading.

Even in our own parachain, the different roles of our nodes (normal RPC nodes, "tracing" RPC nodes) already push us to customize the RPC handler to behave differently.
We can see what we can fit directly in the library itself. However, having control (as a user) over the logging system of an RPC node seems crucial to me.

Concerning thread scheduling, I believe it can be improved in the library itself. In jsonrpc, each connection made to the server is associated with one of the available threads (we recently added support for customizing the number of threads), depending on whether a thread is free or not. However, this scenario shows how it becomes inefficient:

  • User A connects and is associated with thread1
  • User A sends a very slow request which is handled by thread1. At some point this slow request relies on another thread, temporarily freeing thread1
  • User B connects and is associated with thread1 because it is free at that moment.
  • User B sends a very fast request which is handled by thread1.
  • At that point, thread1 is re-assigned to user A's request (which is finishing) and spends 10 seconds finishing it (this is often the case with tracing, where it sends tens of megabytes of data)
  • User B has to wait those 10 seconds to get their response served.
    In the end, only one thread was used, and user B had to wait 10 seconds for a very fast request.

This is one of several scenarios that are highly inefficient with this logic of tying threads to connections.

Concerning jsonrpsee, where can we start (ticket, PR, ...?) to add ideas and maybe contribute some work?

@librelois
Contributor Author

Thanks @dvdplm :)

I think a universal solution is better than specific solutions, and more in line with the philosophy of Substrate.

I agree that the API should be as minimal as possible to ease maintenance, but I think it's possible to create a middleware with a very flexible API, inspired by the actix or warp APIs for example. I'd be interested in working on this with you :)

We can at least explore different possible APIs and see if we can make something minimal and easy to maintain.

@maciejhirsz
Contributor

maciejhirsz commented Sep 24, 2021

  • User A connects and is associated with thread1

  • User A sends a very slow request which is handled by thread1. At some point this slow request relies on another thread, temporarily freeing thread1

  • User B connects and is associated with thread1 because it is free at that moment.

  • User B sends a very fast request which is handled by thread1.

  • At that point, thread1 is re-assigned to user A's request (which is finishing) and spends 10 seconds finishing it (this is often the case with tracing, where it sends tens of megabytes of data)

  • User B has to wait those 10 seconds to get their response served.
    In the end, only one thread was used, and user B had to wait 10 seconds for a very fast request.

This should never be the case. AFAIK IO is non-blocking in jsonrpc, and it definitely is non-blocking in jsonrpsee (which is a bit easier to reason about when it comes to async, as it uses modern async/await syntax and tokio rather than the now-antiquated mio). In general you should never see a scenario where a thread in a reactor is blocked for an excessive amount of time, unless a particular RPC method is breaking its contract by not offloading heavy compute to a separate thread (either a std thread or using tokio::spawn_blocking).
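For the heavy tracing case discussed above, the usual pattern is the one mentioned here: the RPC method itself moves the expensive work off the async executor with tokio::task::spawn_blocking. A hedged sketch (compute_trace is a stand-in for the real tracing logic, not an existing API):

```rust
use tokio::task;

/// Hypothetical handler for a heavy tracing request: the re-execution and the
/// serialization of a potentially multi-megabyte JSON response both run on
/// tokio's blocking thread pool, so the executor threads stay free to serve
/// other requests in the meantime.
async fn trace_transaction(tx_hash: String) -> Result<String, String> {
    task::spawn_blocking(move || {
        let trace = compute_trace(&tx_hash); // CPU-heavy re-execution (stand-in)
        serde_json::to_string(&trace).map_err(|e| e.to_string())
    })
    .await
    .map_err(|e| e.to_string())? // propagate a JoinError as String; the inner Result is returned as-is
}

/// Stand-in for the real tracing logic.
fn compute_trace(_tx_hash: &str) -> Vec<serde_json::Value> {
    Vec::new()
}
```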

@crystalin
Contributor

crystalin commented Sep 24, 2021

Ideally I agree, but in the described scenario it is the JSON formatter (which runs on the RPC thread) that takes 10 seconds to process all the data (there is a lot!) for tracing requests. There is also time spent sending the data, but I don't remember whether that was on the same thread.

It would probably cost too much to put this on a separate thread for 99% of requests, but it is worth it for those tracing ones.

@maciejhirsz
Contributor

@crystalin that's a relatively easy problem to solve without any need for a middleware. We were already contemplating adding support for a blocking class of methods in jsonrpsee, and we can quite easily offload JSON serialization to a blocking tokio task for those, so we'd be happy to add support for large JSON dumps; feel free to open an issue for it.

Sidenote: we don't pass JSON through serde_json::Value in jsonrpsee, so I'd also expect that whatever tracing info you produce will be much quicker to serialize there.
