
[SPARK-56326] Include streaming query and batch ids in scheduling logs #55166

Open
BrooksWalls wants to merge 3 commits into apache:master from BrooksWalls:SPARK-56326/streamingQueryIdAndBatchIdInSchedulingLogs


@BrooksWalls

What changes were proposed in this pull request?

This change adds the streaming query ID and batch ID to several scheduling log messages to aid in debugging Structured Streaming queries.

Three log lines have been updated to include the query and batch ID:

26/04/02 16:34:01 INFO TaskSetManager: [queryId = 1251e] [batchId = 5] Starting task 0.0 in stage 5.0 (TID 129) (...,executor driver, partition 0, PROCESS_LOCAL, 9728 bytes)

26/04/02 16:34:01 INFO TaskSetManager: [queryId = 1251e] [batchId = 5] Finished task 6.0 in stage 5.0 (TID 135) in 12 ms on ...(executor driver) (6/32)

26/04/02 16:39:09 INFO FairSchedulableBuilder: [queryId = f5660] [batchId = 5] Added task set TaskSet_5.0 to pool default
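For anyone consuming these logs, here is a minimal sketch of how the new prefix could be parsed and used to filter a log file. The regular expression is inferred only from the three sample lines above, not from the Spark source, and the names `parse_tag`, `filter_lines`, and `StreamingTag` are hypothetical helpers introduced for illustration.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Assumption: the prefix always has the shape shown in the sample lines,
# i.e. "[queryId = <id>] [batchId = <number>]".
PREFIX = re.compile(r"\[queryId = (?P<query>[^\]]+)\] \[batchId = (?P<batch>\d+)\]")


class StreamingTag(NamedTuple):
    query_id: str
    batch_id: int


def parse_tag(line: str) -> Optional[StreamingTag]:
    """Return the query/batch tag of a log line, or None if it has no prefix."""
    match = PREFIX.search(line)
    if match is None:
        return None
    return StreamingTag(match.group("query"), int(match.group("batch")))


def filter_lines(
    lines: Iterable[str], query_id: str, batch_id: Optional[int] = None
) -> Iterator[str]:
    """Yield only the lines tagged with the given query (and batch, if given)."""
    for line in lines:
        tag = parse_tag(line)
        if tag is None or tag.query_id != query_id:
            continue
        if batch_id is not None and tag.batch_id != batch_id:
            continue
        yield line
```

Calling `filter_lines(open(path), "1251e", batch_id=5)` on a log file would then yield only the scheduling lines for that query and batch, where `path` is whatever file the logs were written to.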

Why are the changes needed?

When multiple streaming queries run at the same time, the scheduling logs can be difficult to sift through. Including the query and batch ID makes it much easier to isolate the logs for a specific query or batch.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests were added.

Also tested manually by running the Spark shell and redirecting INFO logs to a temporary file. I then ran a basic streaming query and grepped the temporary file for the desired log lines to confirm they included the query and batch ID. I also confirmed that a batch query run in the shell does not include the query and batch ID in its logs.
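The manual check could be scripted roughly as follows. This is only a sketch: the message shapes are copied from the three sample log lines in this description, and `untagged_scheduling_lines` is a hypothetical helper, not part of the patch or its tests.

```python
import re
from typing import Iterable, List

# Assumption: the three updated messages can be recognized by these phrases,
# taken from the sample log lines in this description.
TARGET = re.compile(r"Starting task|Finished task|Added task set")
TAGGED = re.compile(r"\[queryId = [^\]]+\] \[batchId = \d+\]")


def untagged_scheduling_lines(lines: Iterable[str]) -> List[str]:
    """Return the target scheduling lines that lack the queryId/batchId prefix."""
    return [ln for ln in lines if TARGET.search(ln) and not TAGGED.search(ln)]
```

For a log captured while only a streaming query ran, the result would be expected to be empty; for a log from a batch query, every target line would be expected to be returned.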

Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored.

Generated-by: claude

Contributor

@dichlorodiphen left a comment


Generally looks good

@BrooksWalls force-pushed the SPARK-56326/streamingQueryIdAndBatchIdInSchedulingLogs branch from 10d7760 to 92d2f95 on April 2, 2026 20:09