Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't use gateway provided requestID for trickle URLs #3408

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

mjh1
Copy link
Contributor

@mjh1 mjh1 commented Feb 21, 2025

Switch back to using generated requestID for trickle URLs and set the gateway request ID as its own logging field.

@mjh1 mjh1 requested a review from j0sh February 21, 2025 12:31
@github-actions github-actions bot added go Pull requests that update Go code AI Issues and PR related to the AI-video branch. labels Feb 21, 2025
Copy link

codecov bot commented Feb 21, 2025

Codecov Report

Attention: Patch coverage is 0% with 84 lines in your changes missing coverage. Please review.

Project coverage is 32.07404%. Comparing base (232df3a) to head (13f9774).
Report is 29 commits behind head on master.

Files with missing lines Patch % Lines
ai/worker/runner.gen.go 0.00000% 65 Missing ⚠️
server/ai_http.go 0.00000% 15 Missing ⚠️
core/ai_orchestrator.go 0.00000% 2 Missing ⚠️
ai/worker/worker.go 0.00000% 1 Missing ⚠️
server/ai_process.go 0.00000% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@                 Coverage Diff                 @@
##              master       #3408         +/-   ##
===================================================
- Coverage   32.11405%   32.07404%   -0.04001%     
===================================================
  Files            147         147                 
  Lines          40789       40843         +54     
===================================================
+ Hits           13099       13100          +1     
- Misses         26916       26969         +53     
  Partials         774         774                 
Files with missing lines Coverage Δ
server/rpc.go 66.66667% <ø> (ø)
ai/worker/worker.go 0.00000% <0.00000%> (ø)
server/ai_process.go 0.59222% <0.00000%> (ø)
core/ai_orchestrator.go 31.06796% <0.00000%> (+0.12136%) ⬆️
server/ai_http.go 9.89181% <0.00000%> (-0.03067%) ⬇️
ai/worker/runner.gen.go 5.79797% <0.00000%> (-0.18599%) ⬇️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 232df3a...13f9774. Read the comment docs.

Files with missing lines Coverage Δ
server/rpc.go 66.66667% <ø> (ø)
ai/worker/worker.go 0.00000% <0.00000%> (ø)
server/ai_process.go 0.59222% <0.00000%> (ø)
core/ai_orchestrator.go 31.06796% <0.00000%> (+0.12136%) ⬆️
server/ai_http.go 9.89181% <0.00000%> (-0.03067%) ⬇️
ai/worker/runner.gen.go 5.79797% <0.00000%> (-0.18599%) ⬇️
🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Collaborator

@j0sh j0sh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to add the orch-generated requestID to the orch logging context and return it to the gateway to add to its own context. That way we can filter logs by a specific G-O pair which will come in handy once we do O swapping.

Alternatively, when we construct the mid part of the trickle URL we can make it something like requestID + "-" + gatewayRequestID . I believe we log the full trickle URL in most contexts so that would probably work OK if filtering for a partial match.

@mjh1
Copy link
Contributor Author

mjh1 commented Feb 25, 2025

@j0sh I've made a change to include the orch request ID in the logs, I'm thinking we continue to use the gateway requestID as the common ID across all apps to make filtering easy (Evan will be using this), and then also add the other app requestIDs with other field names i.e. orch_request_id.

We could also rename request_id to gateway_request_id across all the apps to make it more explicit wdyt? @ecmulli are you already making use of the request_id in our loki logs? Mind if we rename it?

@mjh1
Copy link
Contributor Author

mjh1 commented Feb 25, 2025

@j0sh Also makes sense to return the orch requestID to the gateway. cc @victorges @leszko if we want to return the orch requestID to the gateway, does it seem ok to add that field to the response body or should it be a response header?

@ecmulli
Copy link
Member

ecmulli commented Feb 25, 2025

@mjh1 I am fine with renaming it. But we decided on using the stream_id as the common id across logs because it's the only ID that is available everywhere. For example, the request id is not available to the front end apps. Additionally, all investigations start with a front-end stream id and then have to be translated into a request id - it's simpler to avoid this translation.

@ecmulli
Copy link
Member

ecmulli commented Feb 25, 2025

Though I also like your recommendation here. We are essentially creating a hierarchy of ids and doing this will make tracing much easier

  1. Stream id
  2. Gateway request id
  3. Other app request ids...

@j0sh
Copy link
Collaborator

j0sh commented Feb 25, 2025

@j0sh Also makes sense to return the orch requestID to the gateway. cc @victorges @leszko if we want to return the orch requestID to the gateway, does it seem ok to add that field to the response body or should it be a response header?

If we're returning the orch request ID to the gateway, why not also send it to the runner as well? I assume that implies putting it in the body. (I think we should also have included the gateway request ID in the body as well; not sure why we are putting data into untyped headers.)

BTW we don't have to do this if we set the mid when constructing the trickle URL to something like requestID + "-" + gatewayRequestID and that would probably be enough when searching the logs.

@leszko
Copy link
Contributor

leszko commented Feb 26, 2025

@j0sh Also makes sense to return the orch requestID to the gateway. cc @victorges @leszko if we want to return the orch requestID to the gateway, does it seem ok to add that field to the response body or should it be a response header?

Isn't the requestID already included in the response body, inside publish_url? Then, maybe you can just reuse this function to extract it?

We could include another field in the response body, but the problem is that it's already pretty bad design, this API was design as a sync API to pass data from G into R, but we actually re-use it for fields exchanged between G<>O. So, I'd try to not there anything more than we really need and in the future, re-design it at all.

@j0sh
Copy link
Collaborator

j0sh commented Feb 27, 2025

Isn't the requestID already included in the response body, inside publish_url? Then, maybe you can just reuse this function to extract it?

@leszko Yes the request ID is in the trickle URL and that is probably OK (since we usually print the full trickle URL / path in logs) but we would want to make the mid adjustments that I suggested, so we can get both the orchestrator request ID and gateway request ID in there. That gives us the ability to slice logs by G session or O session as long as the trickle URL is present somewhere in the log line (so, no logging context changes would be necessary).

What I would not do is assume anything about the structure of the trickle URL, eg this function is not a good idea and will cause problems whenever we need to change the URL. We have the ability to send properly structured fields, let's use them.

We could include another field in the response body, but the problem is that it's already pretty bad design, this API was design as a sync API to pass data from G into R, but we actually re-use it for fields exchanged between G<>O. So, I'd try to not there anything more than we really need and in the future, re-design it at all.

Yes it is a little annoying in that we sometimes need to send different data back to the G vs the R. But I don't think this is such a case, because if we send the orch request ID back to the gateway to add to its logging context, we should also send it to the runner.

That being said, I am not sure what the harm is in extending the response body to do what we need it to. Make the fields optional if we have to. Otherwise, why use a schema at all?

@leszko
Copy link
Contributor

leszko commented Feb 27, 2025

That being said, I am not sure what the harm is in extending the response body to do what we need it to. Make the fields optional if we have to. Otherwise, why use a schema at all?

Fine, I'm ok with extending the schema. Leave it to @mjh1 to decide.

@mjh1 mjh1 requested a review from leszko March 6, 2025 14:30
ControlUrl: &controlUrlOverwrite,
Params: req.Params,
GatewayRequestId: &gatewayRequestID,
StreamId: &streamID,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also add the orchestrator generated requestID as well?

// local AIWorker processes job if combined orchestrator/ai worker
if orch.node.AIWorker != nil {
workerResp, err := orch.node.LiveVideoToVideo(ctx, requestID, streamID, req)
workerResp, err := orch.node.LiveVideoToVideo(ctx, gatewayRequestID, streamID, req)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little confused why these additional parameters are needed if they are now in the req ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AI Issues and PR related to the AI-video branch. go Pull requests that update Go code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants