DRAFT: basic Profiler support and WIP hierarchical model level to op level profiling support #1247

WuSiYu wants to merge 2 commits into ModelTC:main
Conversation
Use `--enable_profiling=MODE` to enable; currently supported modes are `torch_profile` and `nvtx` (for use with NVIDIA Nsight Systems).

To use:
1. Set `GlobalPerfContext.STATIC_DISABLED = False` in `lightllm/utils/profiler.py`.
2. (Optional) Adjust the sampling rate in `lightllm/common/basemodel/basemodel.py` using `GlobalPerfContext.begin_with_sample_rate(xxx)`.
3. Start lightllm and send requests; the results (log and jsonl) will be saved to the working directory.
Code Review
This pull request implements a comprehensive profiling framework featuring kernel-level timing and process-level tracing (Torch Profiler/NVTX), integrated across model layers, Triton kernels, and server APIs. Significant feedback was provided regarding a performance-degrading GPU-CPU synchronization in grouped_matmul, thread-safety vulnerabilities in the global context and decorators, and a bug in wrap_func preventing multiple calls. Further improvements were suggested to optimize memory usage via __slots__, avoid blocking synchronizations in the main thread, and remove redundant JSON validation logic.
```python
m_total = int(expert_to_token_num.sum().item())
grouped_matmul.record_shape(m=m_total, k=k, n=n)
```
The `expert_to_token_num.sum().item()` call is a synchronous operation that forces a GPU-to-CPU synchronization. This will significantly degrade performance, especially since it runs unconditionally even when profiling is disabled (the `record_shape` call becomes a no-op, but the sum and the device-to-host transfer still occur). It should be guarded by an `is_perf_counter_active()` check.
Suggested change:

```diff
-m_total = int(expert_to_token_num.sum().item())
-grouped_matmul.record_shape(m=m_total, k=k, n=n)
+if grouped_matmul.is_perf_counter_active():
+    m_total = int(expert_to_token_num.sum().item())
+    grouped_matmul.record_shape(m=m_total, k=k, n=n)
```
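The effect of guarding the synchronization behind the profiling check can be illustrated with a pure-Python stand-in (`FakeKernel` and the module-level counters are hypothetical; the real code checks `grouped_matmul.is_perf_counter_active()` on the decorated Triton kernel):

```python
class FakeKernel:
    """Hypothetical stand-in for the decorated Triton kernel."""
    profiling_active = False
    recorded = []
    syncs = 0

    def is_perf_counter_active(self):
        return FakeKernel.profiling_active

    def record_shape(self, **dims):
        FakeKernel.recorded.append(dims)

grouped_matmul = FakeKernel()

def launch(expert_to_token_num, k, n):
    # when profiling is off, neither the reduction nor the transfer happens
    if grouped_matmul.is_perf_counter_active():
        FakeKernel.syncs += 1                 # stands in for the .sum().item() sync
        m_total = int(sum(expert_to_token_num))
        grouped_matmul.record_shape(m=m_total, k=k, n=n)

launch([4, 2, 2], k=64, n=128)                # profiling off: no sync at all
FakeKernel.profiling_active = True
launch([4, 2, 2], k=64, n=128)                # profiling on: one sync, one record
```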
```python
stacks = cls.threadlocal_active_counter_stack
cls.threadlocal_active_counter_stack = DefaultDict(list)
counter_list = cls.recorded_counters
cls.recorded_counters = []
```
The `GlobalPerfContext` and the `PerfCounter` decorator are not thread-safe for concurrent inference.
- `finalize_async` clears the `threadlocal_active_counter_stack` and `recorded_counters` for ALL threads in the process. If multiple threads are performing inference simultaneously (e.g., in overlap mode or multi-threaded decoding), one thread calling `finalize_async` will clear or corrupt the profiling data of other threads.
- The decorator assigns attributes like `record_shape` directly to the `wrapped_func` object (lines 401-403). Concurrent calls to the same decorated function will overwrite these attributes, leading to race conditions where one thread records data for another thread's counter.
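A minimal sketch of the per-thread fix (hypothetical `PerfContext` names, not the actual lightllm classes): keeping each thread's counter state in a `threading.local` means finalizing one thread cannot touch another thread's data.

```python
import threading
from collections import defaultdict

class PerfContext:
    """Hypothetical sketch: per-thread profiling state via threading.local."""
    _tls = threading.local()

    @classmethod
    def _state(cls):
        # lazily create this thread's private stack and recorded list
        if not hasattr(cls._tls, "stacks"):
            cls._tls.stacks = defaultdict(list)
            cls._tls.recorded = []
        return cls._tls

    @classmethod
    def record(cls, name, value):
        cls._state().recorded.append((name, value))

    @classmethod
    def finalize(cls):
        # swap out *only this thread's* data; other threads are untouched
        state = cls._state()
        recorded = state.recorded
        state.stacks = defaultdict(list)
        state.recorded = []
        return recorded

# two threads record independently; finalizing one leaves the other intact
results = {}

def worker(tag):
    PerfContext.record(tag, 1)
    results[tag] = PerfContext.finalize()

t1 = threading.Thread(target=worker, args=("a",))
t2 = threading.Thread(target=worker, args=("b",))
t1.start(); t2.start(); t1.join(); t2.join()
```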
```python
def wrapped_func(*args, **kwds):
    with self:
        return func(*args, **kwds)
```
The `wrap_func` implementation reuses the same `PerfCounter` instance (`self`). Since `PerfCounter` prevents restarts (line 353), a function wrapped this way can only be called once; subsequent calls will raise a `RuntimeError`. The wrapper should create a new `PerfCounter` instance per call, similar to the logic in the `__call__` decorator implementation.
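The fix can be sketched as follows (a hypothetical minimal `PerfCounter` standing in for the real one; only the restart guard and the wrapper are modeled): the wrapper constructs a fresh counter on every invocation instead of entering `self` repeatedly.

```python
import functools

class PerfCounter:
    """Hypothetical minimal stand-in: a counter that refuses to be restarted."""
    def __init__(self, name):
        self.name = name
        self._used = False

    def __enter__(self):
        if self._used:
            raise RuntimeError(f"PerfCounter {self.name!r} cannot be restarted")
        self._used = True
        return self

    def __exit__(self, *exc):
        return False

    def wrap_func(self, func):
        @functools.wraps(func)
        def wrapped_func(*args, **kwds):
            # a fresh counter per invocation, instead of reusing `self`
            with PerfCounter(self.name):
                return func(*args, **kwds)
        return wrapped_func

def double(x):
    return x * 2

double = PerfCounter("matmul").wrap_func(double)
```

With the per-call instance, `double` can be invoked any number of times; reusing `self` would make the second call raise.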
```python
]


class PerfCounter:
    __slot__ = ("name", "type", "shapes", "depth",
```

```python
if torch.cuda.is_current_stream_capturing():
    return

torch.cuda.synchronize()
```
`torch.cuda.synchronize()` is called in the main thread during `finalize_async`. This will block the inference loop until all GPU operations are finished, which can significantly impact throughput. Consider recording a completion event and having the worker thread wait for it asynchronously, to avoid blocking the main inference loop.
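The suggested pattern can be sketched with `threading.Event` standing in for a CUDA completion event (all names here are hypothetical; real code would record a `torch.cuda.Event()` on the stream and call `event.synchronize()` from the worker thread):

```python
import threading

def finalize_async(results):
    """Hypothetical sketch: the main thread records a completion marker and
    returns immediately; a worker thread blocks on it instead of the
    inference loop."""
    done = threading.Event()  # stands in for a CUDA event recorded on the stream

    def worker():
        done.wait()                       # the worker, not the main thread, blocks
        results.append("counters flushed")

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return done, t

results = []
done, t = finalize_async(results)
assert results == []    # main thread was not blocked waiting for the "GPU"
done.set()              # GPU work "completes"
t.join(timeout=5)
```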
```python
json_str = json.dumps({key: getattr(counter, key) for key in keys} | {"marker": marker})
_ = json.loads(json_str)  # validate
f.write(json_str + '\n')
```
The validation step `json.loads(json_str)` is redundant and adds unnecessary overhead. `json.dumps` is guaranteed to produce a valid JSON string if it completes successfully.
Suggested change:

```diff
 json_str = json.dumps({key: getattr(counter, key) for key in keys} | {"marker": marker})
-_ = json.loads(json_str)  # validate
 f.write(json_str + '\n')
```
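To illustrate the reviewer's point with a stand-in record (the field names here are invented for the example): `json.dumps` either returns valid JSON or raises before any string exists (e.g. `TypeError` for unserializable values), so a `json.loads` round trip can never catch a new failure.

```python
import json

record = {"name": "grouped_matmul", "shapes": [[128, 256]], "marker": "decode"}
json_str = json.dumps(record)
assert json.loads(json_str) == record   # round trip succeeds by construction

# the only failure mode happens inside dumps itself, before any string exists
unserializable = False
try:
    json.dumps({"bad": {1, 2, 3}})      # sets are not JSON-serializable
except TypeError:
    unserializable = True
```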
```python
logger.error(f"Error in finalizing counters: {e}")
raise e
return
```
Commits:
1. feat(misc): Profiler support — use `--enable_profiling=MODE` to enable; currently supports `torch_profile` and `nvtx` (use with NVIDIA Nsight Systems) modes
2. WIP feat: hierarchical model level to op level profiling support — To use: