
feat: centralize LLM communication tracking in chat model #15146

Open · wants to merge 3 commits into master from planger/improve-chat-recording

Conversation

@planger planger (Contributor) commented Mar 7, 2025

What it does

Currently, our application has fragmented tracking of LLM communications:

  • The chat model doesn't track all LLM messages (e.g. the system message) or multiple LLM requests per user response
  • Chat agents must manually record LLM communications via CommunicationRecordingService
  • Tool calls are tracked in the chat model in content parts, which are rather UI-focused, but tool calls are missing in the
    CommunicationRecordingService

This fragmentation means we lack a single source of truth for chat session information.

This PR centralizes all LLM communication tracking in the chat model by:

  1. Introducing a new ChatLanguageModelService that encapsulates LLM request tracking
  2. Moving related code from the abstract chat agent to this new service
  3. Feeding the CommunicationRecordingService via a simple listener on the chat model (see the sketch below)

As a result, we get:

  • Complete tracking of all LLM requests, responses, and tool calls in one place
  • Simplified interface for chat agents to interact with LLMs
  • Improved data structure to capture tool calls and LLM messages in the History View (with minor UI enhancements)
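
To make step 3 more concrete, here is a minimal sketch of what such a listener could look like. The event shape (onDidChange, kind) and the recording calls are illustrative assumptions, not necessarily the exact API introduced by this PR:

    import { inject, injectable } from '@theia/core/shared/inversify';

    // Sketch only: forwards every completed LLM exchange from the chat model
    // to the CommunicationRecordingService. The event kind and payload names
    // here are assumptions for illustration.
    @injectable()
    export class ChatModelHistoryRecorder {
        @inject(CommunicationRecordingService)
        protected readonly recordingService: CommunicationRecordingService;

        listenTo(chatModel: ChatModel): void {
            chatModel.onDidChange(event => {
                if (event.kind === 'languageModelExchange') {
                    // Covers system messages, multiple LLM requests per user
                    // response, and tool calls.
                    this.recordingService.recordRequest(event.request);
                    this.recordingService.recordResponse(event.response);
                }
            });
        }
    }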

(Screenshot: the History View showing LLM requests, responses, and tool calls)

How to test

Test a few chat requests with tool calls, and:

  • Make sure they end up correctly in the chat model
  • Make sure they show up in the history view

Follow-ups

Currently, requests to the Orchestrator are only tracked because the chat model (as it did before, and still does) specifies the Orchestrator as the agent id of the request. Consequently, these requests also only show up in the history view of the Orchestrator.

We need to find a better solution for the delegation than the one we have now, which is very implicit and quite a special case. I suggest making the delegation a more explicit event (see the sketch below), but didn't want to put this into this PR for now.
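
For illustration, such an explicit delegation event could look roughly like the following sketch (none of these names exist in the code base yet):

    // Hypothetical sketch of an explicit delegation event, as suggested above.
    export interface ChatDelegationEvent {
        kind: 'delegation';
        /** The agent that received the original request, e.g. the Orchestrator. */
        delegatingAgentId: string;
        /** The agent the request was delegated to. */
        targetAgentId: string;
        sessionId: string;
        requestId: string;
    }

With such an event, the history view could attribute delegated requests to both the Orchestrator and the target agent, instead of only to the former.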

Breaking changes

  • This PR introduces breaking changes and requires careful review. If yes, the breaking changes section in the changelog has been updated.

Attribution

Contributed on behalf of STMicroelectronics

Review checklist

Reminder for reviewers

@planger planger force-pushed the planger/improve-chat-recording branch from e49e96d to 7ddbe87 Compare March 7, 2025 10:07
@planger planger requested a review from eneufeld March 7, 2025 10:08

try {
    for await (const chunk of originalStream) {
        if (chunk.tool_calls?.length) {
Contributor:

Isn't this duplicating what we have in our stream parser and in building up the response messages?
Wouldn't it make more sense to hook the monitoring in there?

planger (Author) commented Mar 7, 2025:

Yeah, kind of. The existing code is quite entangled with the response content object merge handling and rather UI-oriented, while this here simply captures all the data. It might still make sense to consolidate the two, though.

Contributor:

I looked through the code and your PR again.
IMHO, with your change we now extend the request object to contain information that we get and parse anyway, but put in the response. This PR adds the same information, differently formatted, into the request, just so that we can add it to the chat history entry in the 'fromRequest' method and use it in the history widget.
IMHO we can just extend the fromResponse method so that the response is an array, and pass the response contents down, formatted in whatever way we need.
Especially with the changes I proposed here: https://github.com/eclipse-theia/theia/pull/15092/files#diff-ce6934330cf08ddabaac8d16f9d9751081668d862eecdd6c49e6dd9cfe3770e2
I would claim that it makes more sense to enhance the response part of the history entry to just accept an array of response contents, and then handle them accordingly in the history renderer.

Contributor:

in the simplest case do something like this:

export function fromResponse(
    agentId: string,
    request: ChatRequestModel,
    args: Partial<CommunicationResponseEntryParam> = {}
): CommunicationResponseEntryParam {
    return {
        agentId: agentId,
        sessionId: request.session.id,
        requestId: request.id,
        // One display string per response content part, e.g. "toolCall: ..."
        response: request.response.response.content.map(c => `${c.kind}: ${c.asDisplayString?.()}`),
        ...args,
    };
}

and in the history card:

{entry.response.map((response, index) => (
    <pre key={`response-${index}`}>{response}</pre>
))}

planger (Author):

Thank you very much for looking into this PR!

I understand your point and this change indeed adds some duplication (in parsing and storing), which we may want to avoid.

However, the main goal of this PR isn't specifically to improve the history recording (that is just something we can now simplify, because we have a centralized place where we can access and capture the information).

The main goal is to have one central place that contains all information about the entire chat request/response. This enables, for example, more powerful listeners (like the history recorder), or deriving tests from a chat model by replaying the chat session against another LLM or by mocking the LLM for testing the agent (see the sketch below) -- all based on the data of the chat model. For that we need the complete and plain information.
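
For instance, a replay-based test could look roughly like this (a minimal sketch; findRecordedResponse and the mock shape are hypothetical):

    // Sketch: mock the LLM with the responses captured in the chat model,
    // then run the agent against it deterministically.
    const mockLlm = {
        id: 'mock-llm',
        request: async (request: LanguageModelRequest) =>
            // Replay the response recorded for an equivalent request.
            findRecordedResponse(chatModel, request)
    };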

Currently we capture parts of this information, but we already parse it into a structure that is optimized for the UI rather than for post-processing, and in this process we lose information that isn't relevant to the UI (e.g. request settings, or multiple LLM requests that were combined into one user-targeted response).

I'm completely open to challenging and reconsidering where and how we capture the information, but I'd like to have one central place that captures the entire information for all subsequent use cases.

It'd probably also be good to have @sdirix in the discussion as I talked about this approach with him recently. Thanks!

Contributor:

Don't get me wrong, I like the centralization, and I agree that the current approach is very UI-centric.
That is why I pointed to the changes to the model in the Thinking PR.
We should align the approach and the classes. I focused on the history part just because it is the biggest code change, line-wise, in your PR 😅
