Don't use gateway provided requestID for trickle URLs #3408

mjh1 · 2025-02-21T12:31:00Z

Switch back to using generated requestID for trickle URLs and set the gateway request ID as its own logging field.

codecov · 2025-02-21T12:46:04Z

Codecov Report

Attention: Patch coverage is 0% with 84 lines in your changes missing coverage. Please review.

Project coverage is 32.07404%. Comparing base (232df3a) to head (13f9774).
Report is 29 commits behind head on master.

Files with missing lines	Patch %	Lines
ai/worker/runner.gen.go	0.00000%	65 Missing ⚠️
server/ai_http.go	0.00000%	15 Missing ⚠️
core/ai_orchestrator.go	0.00000%	2 Missing ⚠️
ai/worker/worker.go	0.00000%	1 Missing ⚠️
server/ai_process.go	0.00000%	1 Missing ⚠️

Additional details and impacted files

@@                 Coverage Diff                 @@
##              master       #3408         +/-   ##
===================================================
- Coverage   32.11405%   32.07404%   -0.04001%     
===================================================
  Files            147         147                 
  Lines          40789       40843         +54     
===================================================
+ Hits           13099       13100          +1     
- Misses         26916       26969         +53     
  Partials         774         774

Files with missing lines	Coverage Δ
server/rpc.go	`66.66667% <ø> (ø)`
ai/worker/worker.go	`0.00000% <0.00000%> (ø)`
server/ai_process.go	`0.59222% <0.00000%> (ø)`
core/ai_orchestrator.go	`31.06796% <0.00000%> (+0.12136%)`	⬆️
server/ai_http.go	`9.89181% <0.00000%> (-0.03067%)`	⬇️
ai/worker/runner.gen.go	`5.79797% <0.00000%> (-0.18599%)`	⬇️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 232df3a...13f9774. Read the comment docs.

Files with missing lines	Coverage Δ
server/rpc.go	`66.66667% <ø> (ø)`
ai/worker/worker.go	`0.00000% <0.00000%> (ø)`
server/ai_process.go	`0.59222% <0.00000%> (ø)`
core/ai_orchestrator.go	`31.06796% <0.00000%> (+0.12136%)`	⬆️
server/ai_http.go	`9.89181% <0.00000%> (-0.03067%)`	⬇️
ai/worker/runner.gen.go	`5.79797% <0.00000%> (-0.18599%)`	⬇️

🚀 New features to boost your workflow:

❄ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

j0sh

We probably want to add the orch-generated requestID to the orch logging context and return it to the gateway to add to its own context. That way we can filter logs by a specific G-O pair which will come in handy once we do O swapping.

Alternatively, when we construct the mid part of the trickle URL we can make it something like requestID + "-" + gatewayRequestID . I believe we log the full trickle URL in most contexts so that would probably work OK if filtering for a partial match.

mjh1 · 2025-02-25T15:04:59Z

@j0sh I've made a change to include the orch request ID in the logs, I'm thinking we continue to use the gateway requestID as the common ID across all apps to make filtering easy (Evan will be using this), and then also add the other app requestIDs with other field names i.e. orch_request_id.

We could also rename request_id to gateway_request_id across all the apps to make it more explicit wdyt? @ecmulli are you already making use of the request_id in our loki logs? Mind if we rename it?

mjh1 · 2025-02-25T15:08:55Z

@j0sh Also makes sense to return the orch requestID to the gateway. cc @victorges @leszko if we want to return the orch requestID to the gateway, does it seem ok to add that field to the response body or should it be a response header?

ecmulli · 2025-02-25T15:36:44Z

@mjh1 I am fine with renaming it. But we decided on using the stream_id as the common id across logs because it's the only ID that is available everywhere. For example, the request id is not available to the front end apps. Additionally, all investigations start with a front-end stream id and then have to be translated into a request id - it's simpler to avoid this translation.

ecmulli · 2025-02-25T15:39:15Z

Though I also like your recommendation here. We are essentially creating a hierarchy of ids and doing this will make tracing much easier

Stream id
Gateway request id
Other app request ids...

j0sh · 2025-02-25T20:16:12Z

@j0sh Also makes sense to return the orch requestID to the gateway. cc @victorges @leszko if we want to return the orch requestID to the gateway, does it seem ok to add that field to the response body or should it be a response header?

If we're returning the orch request ID to the gateway, why not also send it to the runner as well? I assume that implies putting it in the body. (I think we should also have included the gateway request ID in the body as well; not sure why we are putting data into untyped headers.)

BTW we don't have to do this if we set the mid when constructing the trickle URL to something like requestID + "-" + gatewayRequestID and that would probably be enough when searching the logs.

leszko · 2025-02-26T13:14:54Z

@j0sh Also makes sense to return the orch requestID to the gateway. cc @victorges @leszko if we want to return the orch requestID to the gateway, does it seem ok to add that field to the response body or should it be a response header?

Isn't the requestID already included in the response body, inside publish_url? Then, maybe you can just reuse this function to extract it?

We could include another field in the response body, but the problem is that it's already pretty bad design, this API was design as a sync API to pass data from G into R, but we actually re-use it for fields exchanged between G<>O. So, I'd try to not there anything more than we really need and in the future, re-design it at all.

j0sh · 2025-02-27T00:17:54Z

Isn't the requestID already included in the response body, inside publish_url? Then, maybe you can just reuse this function to extract it?

@leszko Yes the request ID is in the trickle URL and that is probably OK (since we usually print the full trickle URL / path in logs) but we would want to make the mid adjustments that I suggested, so we can get both the orchestrator request ID and gateway request ID in there. That gives us the ability to slice logs by G session or O session as long as the trickle URL is present somewhere in the log line (so, no logging context changes would be necessary).

What I would not do is assume anything about the structure of the trickle URL, eg this function is not a good idea and will cause problems whenever we need to change the URL. We have the ability to send properly structured fields, let's use them.

We could include another field in the response body, but the problem is that it's already pretty bad design, this API was design as a sync API to pass data from G into R, but we actually re-use it for fields exchanged between G<>O. So, I'd try to not there anything more than we really need and in the future, re-design it at all.

Yes it is a little annoying in that we sometimes need to send different data back to the G vs the R. But I don't think this is such a case, because if we send the orch request ID back to the gateway to add to its logging context, we should also send it to the runner.

That being said, I am not sure what the harm is in extending the response body to do what we need it to. Make the fields optional if we have to. Otherwise, why use a schema at all?

leszko · 2025-02-27T09:36:19Z

That being said, I am not sure what the harm is in extending the response body to do what we need it to. Make the fields optional if we have to. Otherwise, why use a schema at all?

Fine, I'm ok with extending the schema. Leave it to @mjh1 to decide.

j0sh · 2025-03-06T18:40:18Z

server/ai_http.go

+			ControlUrl:       &controlUrlOverwrite,
+			Params:           req.Params,
+			GatewayRequestId: &gatewayRequestID,
+			StreamId:         &streamID,


Should we also add the orchestrator generated requestID as well?

j0sh · 2025-03-06T18:44:43Z

core/ai_orchestrator.go

 	// local AIWorker processes job if combined orchestrator/ai worker
 	if orch.node.AIWorker != nil {
-		workerResp, err := orch.node.LiveVideoToVideo(ctx, requestID, streamID, req)
+		workerResp, err := orch.node.LiveVideoToVideo(ctx, gatewayRequestID, streamID, req)


I am a little confused why these additional parameters are needed if they are now in the req ?

Don't use gateway provided requestID for trickle URLs

0770fad

mjh1 requested a review from j0sh February 21, 2025 12:31

github-actions bot added go Pull requests that update Go code AI Issues and PR related to the AI-video branch. labels Feb 21, 2025

j0sh reviewed Feb 24, 2025

View reviewed changes

Include orch request ID in logging

c3e5fa7

rename gateway request id

3c71e93

mjh1 mentioned this pull request Mar 6, 2025

Add request ID fields to request and response body schemas livepeer/ai-runner#442

Merged

Return request ID in response and set worker request ID params

13f9774

mjh1 requested a review from leszko March 6, 2025 14:30

leszko approved these changes Mar 6, 2025

View reviewed changes

j0sh reviewed Mar 6, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't use gateway provided requestID for trickle URLs #3408

Don't use gateway provided requestID for trickle URLs #3408

mjh1 commented Feb 21, 2025 •

edited

Loading

codecov bot commented Feb 21, 2025 •

edited

Loading

j0sh left a comment

mjh1 commented Feb 25, 2025

mjh1 commented Feb 25, 2025

ecmulli commented Feb 25, 2025

ecmulli commented Feb 25, 2025

j0sh commented Feb 25, 2025

leszko commented Feb 26, 2025

j0sh commented Feb 27, 2025

leszko commented Feb 27, 2025

j0sh Mar 6, 2025

j0sh Mar 6, 2025

Don't use gateway provided requestID for trickle URLs #3408

Are you sure you want to change the base?

Don't use gateway provided requestID for trickle URLs #3408

Conversation

mjh1 commented Feb 21, 2025 • edited Loading

codecov bot commented Feb 21, 2025 • edited Loading

Codecov Report

j0sh left a comment

Choose a reason for hiding this comment

mjh1 commented Feb 25, 2025

mjh1 commented Feb 25, 2025

ecmulli commented Feb 25, 2025

ecmulli commented Feb 25, 2025

j0sh commented Feb 25, 2025

leszko commented Feb 26, 2025

j0sh commented Feb 27, 2025

leszko commented Feb 27, 2025

j0sh Mar 6, 2025

Choose a reason for hiding this comment

j0sh Mar 6, 2025

Choose a reason for hiding this comment

mjh1 commented Feb 21, 2025 •

edited

Loading

codecov bot commented Feb 21, 2025 •

edited

Loading