fix(functions): avoid quadratic-time debug logging in CleanupLLMResult / ParseFunctionCall #10592

Open
pos-ei-don wants to merge 1 commit into mudler:master from pos-ei-don:fix-cleanup-llm-result-quadratic-log

@pos-ei-don

Contributor

What

pkg/functions/parse.go::CleanupLLMResult and ParseFunctionCall both xlog.Debug the full llmresult string twice per call (on entry and after the regex replace loop). The streaming chat path (core/http/endpoints/openai/chat_stream_workers.go:359) calls CleanupLLMResult once per streaming delta chunk with the full accumulated result so far. For an N-chunk generation this means roughly chunk_size * N^2 bytes of debug output total — quadratic in the number of chunks.
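To make the growth shape concrete, here is a small back-of-the-envelope model, not code from this PR. It assumes every chunk has the same size and that each call emits a fixed number of debug records; the function names `totalLoggedBytes` and `totalBoundedBytes` and all numbers in `main` are illustrative only.

```go
package main

import "fmt"

// totalLoggedBytes models a stream of n chunks of chunkSize bytes each, where
// every chunk triggers logsPerCall debug records that each contain the full
// accumulated result so far. The sum over k = 1..n of k*chunkSize grows as n^2.
func totalLoggedBytes(n, chunkSize, logsPerCall int) int {
	total := 0
	for k := 1; k <= n; k++ {
		total += logsPerCall * k * chunkSize
	}
	return total
}

// totalBoundedBytes models the same stream when each record carries at most
// limit bytes of the result, so the total grows linearly in n.
func totalBoundedBytes(n, chunkSize, logsPerCall, limit int) int {
	total := 0
	for k := 1; k <= n; k++ {
		size := k * chunkSize
		if size > limit {
			size = limit
		}
		total += logsPerCall * size
	}
	return total
}

func main() {
	// Hypothetical parameters: 4-byte chunks, 2 records per call, 200-byte cap.
	for _, n := range []int{1000, 10000, 50000} {
		fmt.Printf("n=%d full=%d bounded=%d\n",
			n, totalLoggedBytes(n, 4, 2), totalBoundedBytes(n, 4, 2, 200))
	}
}
```

The model ignores per-record overhead (timestamps, field names), so it understates absolute volume, but the ratio between the two columns shows why the full-payload form dominates as N grows.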

Why this matters

Under LOG_LEVEL=debug I observed this drive a LocalAI container's log volume to about 96 GiB during a single ~50K-token streaming session (SGLang-via-LocalAI backend on a DGX Spark / GB10, sm_121). The resulting disk pressure interacted with the streaming hot loop on the same filesystem and contributed to a host-wide hard hang. Workaround was setting LOG_LEVEL=info, but the quadratic shape is a foot-gun for anyone enabling debug intentionally for field diagnostics — it's not obvious from the code that a single Debug-level field grows superlinearly with response length.

The fix

Replace the four result-content debug arguments with len(...) plus a fixed-size head (200 bytes via a new local truncForLog helper), bounding per-call output to a constant. The debug signal stays useful in practice:

  • the length field lets you observe growth/saturation
  • the first 200 bytes are usually enough to identify which generation is in flight (system prompt prefix or tool-call XML opener)
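For readers without the diff open, a helper of this kind could look roughly like the sketch that follows. This is a guess at the shape, not the helper from the PR: the constant name `logHeadBytes` is hypothetical, and the back-off to a UTF-8 rune boundary is an extra assumption added here so the head never ends in a partial character. The demo uses `fmt` rather than `xlog` to stay self-contained.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// logHeadBytes is a hypothetical name for the fixed cap on logged content.
const logHeadBytes = 200

// truncForLog returns at most logHeadBytes bytes from the start of s.
// If the cut would land inside a multi-byte UTF-8 sequence, it backs off to
// the start of that rune, so the result may be slightly shorter than the cap.
func truncForLog(s string) string {
	if len(s) <= logHeadBytes {
		return s
	}
	cut := logHeadBytes
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut]
}

func main() {
	// Simulate a long accumulated result.
	long := ""
	for i := 0; i < 100; i++ {
		long += "<tool_call>{\"name\":\"f\"}</tool_call>"
	}
	// Log the length plus a bounded head instead of the full payload.
	fmt.Printf("LLM result len=%d head=%q\n", len(long), truncForLog(long))
}
```

Logging `len(...)` next to the head keeps the growth visible while the per-record size stays constant regardless of how long the accumulated result gets.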

The "Replacing" debug entries inside the replacement loop are unchanged. They are linear in the number of configured ReplaceLLMResult entries, not in the result length, so they don't accumulate.

Same fix applied to ParseFunctionCall (mirrors the same pattern, called from the same hot streaming path).

Compatibility

No API change. No behaviour change for LOG_LEVEL != debug (the default). Only the form of two log records changes when debug is enabled.

Verification

I don't have a Go toolchain on the host where I wrote this, so I haven't run go test ./pkg/functions/... locally — the change is intentionally small (4 logger arg-tuples + one helper, all in one file) and CI here will catch anything I missed. Happy to add a regression test that asserts the per-call log payload is bounded if a maintainer thinks that's worth it.
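As a starting point for that regression test, the bounded-payload property can be checked along these lines. This is only a sketch under stated assumptions: `cleanupStandIn` is a hypothetical stand-in for the real function, it writes through the standard library `log` package rather than `xlog`, and the 300-byte ceiling used in `main` is an arbitrary illustration that holds for ASCII input (the `%q` verb can expand non-printable or non-ASCII bytes).

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"strings"
)

const headBytes = 200

// head returns at most headBytes bytes of s (plain byte slice, no rune handling).
func head(s string) string {
	if len(s) <= headBytes {
		return s
	}
	return s[:headBytes]
}

// cleanupStandIn is a hypothetical stand-in for the function under test.
// It logs the length and a bounded head, then returns the input unchanged.
func cleanupStandIn(logger *log.Logger, result string) string {
	logger.Printf("LLM result len=%d head=%q", len(result), head(result))
	return result
}

// maxRecordBytes feeds a growing accumulated result to cleanupStandIn, one
// call per chunk, and returns the largest number of bytes any single call wrote.
func maxRecordBytes(chunks int, chunk string) int {
	var buf bytes.Buffer
	logger := log.New(&buf, "", 0)
	var acc strings.Builder
	largest := 0
	for i := 0; i < chunks; i++ {
		acc.WriteString(chunk)
		buf.Reset()
		cleanupStandIn(logger, acc.String())
		if buf.Len() > largest {
			largest = buf.Len()
		}
	}
	return largest
}

func main() {
	got := maxRecordBytes(1000, "token ")
	fmt.Printf("largest record: %d bytes, bounded: %v\n", got, got <= 300)
}
```

A real test would capture output from the project's logger instead and call CleanupLLMResult and ParseFunctionCall directly; the assertion that matters is that the per-call size stops growing once the result exceeds the cap.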

…t/ParseFunctionCall

The streaming chat path (core/http/endpoints/openai/chat_stream_workers.go)
calls CleanupLLMResult / ParseFunctionCall once per delta chunk with the
*full accumulated* LLM result so far. Both functions xlog.Debug the entire
argument on entry and exit, so a single N-chunk stream emits roughly
chunk_size * N^2 bytes of debug output.

Under LOG_LEVEL=debug this was observed in a recent SGLang-via-LocalAI
session on a DGX Spark host (about 50K tokens, long streaming generation)
to drive container logs to ~96 GiB, which interacted with the streaming
hot loop on the same filesystem and contributed to a host-wide hard hang
once disk pressure built up. Workaround was setting LOG_LEVEL=info, but
the quadratic shape remains a foot-gun for anyone intentionally enabling
debug.

Replace the four result-content debug arguments with len(...) plus a
fixed-size head (200 bytes via a new truncForLog helper), bounding per-
call output to a constant. The debug signal stays useful: the first 200
chars are enough to identify which generation is in flight, and the
length lets you observe growth without paying for the payload itself.

No API change. No behaviour change for LOG_LEVEL != debug.

Signed-off-by: Poseidon <philipp.wacker@ibf-solutions.com>