Copilot AI commented Jan 15, 2026

Motivation and Context

ChatClientAgent was modified to manipulate Activity.Current for distributed tracing, which introduced telemetry concerns into a core component designed to be telemetry-agnostic. This violates separation of concerns: telemetry belongs in a decorator, not in the base agent.

Description

Moved Activity preservation logic from core to decorator:

  • ChatClientAgent (cleaned - telemetry-agnostic)

    • Removed System.Diagnostics import
    • Removed all Activity.Current capture/restore logic (23 assignments removed)
    • Component is now completely agnostic of telemetry concepts
  • OpenTelemetryAgent (enhanced - owns telemetry)

    • ForwardingChatClient.GetResponseAsync: Captures Activity from ForwardedOptions, restores after ConfigureAwait(false)
    • ForwardingChatClient.GetStreamingResponseAsync: Restores Activity before each yield and after streaming completes
    • Leverages existing ForwardedOptions.CurrentActivity pattern
  • Tests

    • Renamed 4 Activity tracing tests to OpenTelemetryAgent_* pattern
    • All tests updated to wrap ChatClientAgent with OpenTelemetryAgent
    • Added OpenTelemetryAgent_WithMockedMcpTool_PreservesTraceId integration test
    • Added comprehensive manual testing documentation for MCP + OpenTelemetry scenarios
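
The kind of TraceId check that a test such as OpenTelemetryAgent_WithMockedMcpTool_PreservesTraceId performs can be sketched roughly as below. This is not the PR's test code. `TraceIdCheck`, `InvokeToolAsync`, and the source name `Demo.TraceIdCheck` are hypothetical, and the "tool" is a stand-in that starts its own child span after an asynchronous hop. Only `System.Diagnostics` types are used.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class TraceIdCheck
{
    // Hypothetical stand-in for a tool invocation: it yields once, then
    // starts its own span and reports that span's TraceId.
    private static async Task<ActivityTraceId> InvokeToolAsync(ActivitySource source)
    {
        await Task.Yield();
        using Activity? toolSpan = source.StartActivity("tool.call");
        return toolSpan?.TraceId ?? default;
    }

    // Returns true when the span started inside the "tool" belongs to the
    // same trace as the span that was current when the call began.
    public static async Task<bool> ChildSharesParentTraceIdAsync()
    {
        using var source = new ActivitySource("Demo.TraceIdCheck");

        // Without a listener that samples the source, StartActivity returns null.
        using var listener = new ActivityListener
        {
            ShouldListenTo = s => s.Name == "Demo.TraceIdCheck",
            Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllData,
        };
        ActivitySource.AddActivityListener(listener);

        using Activity? parent = source.StartActivity("request");
        ActivityTraceId toolTraceId = await InvokeToolAsync(source).ConfigureAwait(false);

        return parent is not null && parent.TraceId == toolTraceId;
    }
}
```

A real test would replace `InvokeToolAsync` with an agent run that triggers a mocked tool and would assert on the TraceIds that the listener records.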

Pattern:

// Without OpenTelemetryAgent: No Activity manipulation
var agent = new ChatClientAgent(chatClient, ...);

// With OpenTelemetryAgent: Full distributed tracing
using var agent = new OpenTelemetryAgent(new ChatClientAgent(chatClient, ...), sourceName);
await agent.RunAsync(messages); // TraceId preserved across MCP tool calls

Architecture: The decorator pattern is applied as intended. ChatClientAgent provides core functionality, and OpenTelemetryAgent adds telemetry as an opt-in concern.
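
As a rough illustration of that split, here is a minimal sketch of a telemetry decorator around a telemetry-free core. `IAgent`, `CoreAgent`, and `TracingAgent` are hypothetical stand-ins and not the Agent Framework types. The real OpenTelemetryAgent and ChatClientAgent have richer APIs that are not reproduced here.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical minimal agent abstraction, for illustration only.
public interface IAgent
{
    Task<string> RunAsync(string input);
}

// Core component: does its work and never touches Activity.
public sealed class CoreAgent : IAgent
{
    public Task<string> RunAsync(string input) => Task.FromResult($"echo: {input}");
}

// Decorator: owns all Activity handling and delegates the real work.
public sealed class TracingAgent : IAgent, IDisposable
{
    private readonly IAgent _inner;
    private readonly ActivitySource _source;

    public TracingAgent(IAgent inner, string sourceName)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        _source = new ActivitySource(sourceName);
    }

    public async Task<string> RunAsync(string input)
    {
        // StartActivity returns null when no listener samples this source,
        // so the decorator costs little when tracing is not configured.
        using Activity? activity = _source.StartActivity("agent.run");
        return await _inner.RunAsync(input).ConfigureAwait(false);
    }

    public void Dispose() => _source.Dispose();
}
```

Callers that want tracing wrap the core agent, for example `new TracingAgent(new CoreAgent(), "Demo.Agent")`. Callers that do not want it use `CoreAgent` directly and take no dependency on tracing behavior.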

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? No
Original prompt

This section details the original issue you should resolve

<issue_title>.NET:ChatClientAgent creates new TraceId for every operation after mcp tool call (http), breaking distributed tracing</issue_title>
<issue_description># ChatClientAgent creates new TraceId for every operation, breaking distributed tracing

Problem

ChatClientAgent creates a new TraceId for every operation (LLM calls, MCP tool executions) instead of continuing the parent HTTP request's trace during MCP tool execution (HttpClientTransport). This breaks distributed tracing entirely.

Expected: One TraceId for entire request with child spans
Actual: New TraceId for every single operation

Example

app.MapPost("/chat", async (IServiceProvider sp) =>
{
    var agent = new ChatClientAgent(chatClient, tools: mcpTools);
    
    Console.WriteLine($"Request TraceId: {Activity.Current?.TraceId}"); // abc123
    
    await foreach (var update in agent.RunStreamingAsync(messages))
    {
    }
});

Result:

  • HTTP Request: TraceId abc123
  • MCP tool executions (HttpClientTransport): TraceId ghi789 ctu376... ❌
  • Second LLM call: TraceId jkl012

Every operation gets disconnected traces instead of one correlated trace.

When It Breaks

TraceId changes immediately after the first HTTP response from LLM:

  1. Request comes in → TraceId A ✅
  2. Agent starts → still TraceId A ✅
  3. POST to LLM → TraceId A ✅
  4. MCP tool execution (HttpClientTransport) → NEW TraceId B ❌
  5. Everything after → NEW TraceIds C, D, E... ❌

What Doesn't Work

  • .UseOpenTelemetry() - makes it worse
  • DistributedContextPropagator.Inject() in handlers - Activity.Current already lost
  • AsyncLocal<Activity> - still breaks
  • Standard OpenTelemetry auto-instrumentation - bypassed

Environment

  • Agent Framework: latest main branch
  • .NET 9.0
  • OpenTelemetry 1.10.0 (AspNetCore + Http instrumentation)
  • Azure.AI.OpenAI with HttpClientPipelineTransport

Impact

Unusable for production - every tool call and LLM invocation creates separate traces. Impossible to correlate multi-step agent operations.

Need guidance on preserving Activity across agent execution lifecycle.

Note: This issue does NOT occur with Semantic Kernel. Using the exact same OpenTelemetry configuration with Semantic Kernel, the TraceId is correctly preserved across all operations (LLM calls, plugin executions, etc.). The problem is specific to ChatClientAgent from Microsoft.Agents.AI.

</issue_description>

<agent_instructions>Please analyze the issue and investigate how the TraceId context for the tracing may be getting lost after a Mcp Call.

Ideally, create additional unit tests that reproduce this error (with and without MCP calls), then either fix the issue through further code updates or identify the likely root cause.</agent_instructions>

Comments on the Issue (you are @copilot in this section)


Copilot AI and others added 3 commits January 15, 2026 11:17
…ndaries

Preserve Activity.Current across ConfigureAwait(false) boundaries by:
1. Capturing Activity.Current at start of RunCoreAsync/RunCoreStreamingAsync
2. Restoring it after each await with ConfigureAwait(false)
3. Ensuring it's set before yielding to consumer code in streaming

This fixes distributed tracing when using ChatClientAgent with tools/MCP.

Co-authored-by: rogerbarreto <[email protected]>
- Enhanced comments to clarify why Activity.Current restoration is needed
- Specified what "consumer code" means in comments
- Improved comment clarity for tracing context flow

Co-authored-by: rogerbarreto <[email protected]>
Copilot AI changed the title from "[WIP] Fix ChatClientAgent to maintain single TraceId for operations" to "Fix Activity/TraceId loss in ChatClientAgent breaking distributed tracing" Jan 15, 2026
Copilot AI requested a review from rogerbarreto January 15, 2026 11:31
…metryAgent

- Remove Activity/telemetry concepts from ChatClientAgent (removed System.Diagnostics using)
- Add Activity preservation logic to OpenTelemetryAgent's ForwardingChatClient
- Update tests to reflect that Activity preservation is an OpenTelemetryAgent feature
- All 155 ChatClientAgent tests pass
- All 26 OpenTelemetryAgent tests pass
- All 4 Activity tracing tests pass

This keeps ChatClientAgent agnostic of telemetry concerns while enabling
Activity/TraceId preservation through the OpenTelemetryAgent decorator.

Co-authored-by: rogerbarreto <[email protected]>
Copilot AI changed the title from "Fix Activity/TraceId loss in ChatClientAgent breaking distributed tracing" to "Refactor: Move Activity preservation from ChatClientAgent to OpenTelemetryAgent" Jan 15, 2026
- Add OpenTelemetryAgent_WithMockedMcpTool_PreservesTraceId unit test
- Test validates Activity/TraceId preservation with mock MCP tool
- Add ModelContextProtocol package reference for future integration tests
- Include comprehensive README with manual integration test instructions
- Provide standalone sample code for manual verification

The mocked test runs automatically and validates the pattern works correctly.
Full integration tests with real Azure OpenAI can be run manually following the README.

Co-authored-by: rogerbarreto <[email protected]>
Copilot AI changed the title from "Refactor: Move Activity preservation from ChatClientAgent to OpenTelemetryAgent" to "Refactor: Move Activity/TraceId preservation from ChatClientAgent to OpenTelemetryAgent" Jan 16, 2026
Development

Successfully merging this pull request may close these issues.

.NET:ChatClientAgent creates new TraceId for every operation after mcp tool call (http), breaking distributed tracing