
feat: Add LogElements transform to Java SDK #38533

Open
lalitx17 wants to merge 4 commits into apache:master from lalitx17:add-java-log-elements-transform

Conversation

@lalitx17
Contributor


What does this PR do?

Added a Java SDK LogElements transform for logging each element of a PCollection while passing the elements through unchanged.

Supports SLF4J log levels plus optional prefix, timestamp, window, and pane info logging, mirroring the Python SDK’s LogElements convenience transform.

addresses #38528



Copilot AI review requested due to automatic review settings May 19, 2026 05:06
@github-actions github-actions Bot added the java label May 19, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds a convenient logging utility to the Apache Beam Java SDK, mirroring the existing functionality in the Python SDK. It enables developers to easily inspect PCollection elements during pipeline execution without altering the data flow, which is particularly useful for debugging and monitoring data processing pipelines.

Highlights

  • New LogElements Transform: Introduced a new LogElements PTransform in the Java SDK that allows logging elements of a PCollection while passing them through unchanged.
  • Flexible Logging Configuration: Supports SLF4J log levels and provides optional configuration for including prefixes, timestamps, windowing, and pane information.
  • Testing: Added comprehensive unit tests to verify element preservation, metadata formatting, and display data population.


Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a new LogElements PTransform to the Beam Java core SDK that logs each element of a PCollection (optionally with prefix, timestamp, window, and pane info) and passes elements through unchanged, along with unit tests.

Changes:

  • Introduces LogElements<T> PTransform with fluent builders (trace/debug/info/warn/error/of, withPrefix, withTimestamp, withWindow, withPaneInfo) and display data.
  • Implements an internal LoggingFn DoFn that formats and logs each element at the configured SLF4J level, then outputs it unchanged.
  • Adds LogElementsTest covering element pass-through, format string composition, and display data.
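The thread does not show the exact log line that the transform produces, so the following is only a minimal sketch of how a prefix, the element, and optional metadata could be composed into one message. The class name `FormatSketch`, the `key=value` layout, and the use of `null` to mean "not requested" are all assumptions made for illustration; the PR's own `formatForLogging` takes boolean flags plus Beam types.

```java
/** Hypothetical, Beam-free sketch of composing a log line from optional parts. */
class FormatSketch {

  /**
   * Builds the message. A null argument means that part was not requested.
   * The layout is illustrative and is not the PR's actual output format.
   */
  static String formatForLogging(
      Object element, String prefix, String timestamp, String window, String paneInfo) {
    StringBuilder sb = new StringBuilder();
    if (prefix != null) {
      sb.append(prefix);
    }
    sb.append(element);
    if (timestamp != null) {
      sb.append(", timestamp=").append(timestamp);
    }
    if (window != null) {
      sb.append(", window=").append(window);
    }
    if (paneInfo != null) {
      sb.append(", pane_info=").append(paneInfo);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Only the prefix and timestamp are requested here.
    System.out.println(
        formatForLogging("a", "row: ", "2026-05-19T00:00:00Z", null, null));
  }
}
```

Keeping the formatting in a pure static method like this makes the composition testable without running a pipeline, which is presumably what the format string tests in LogElementsTest rely on.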

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/LogElements.java New PTransform for logging PCollection elements at a configurable SLF4J level with optional metadata.
sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/LogElementsTest.java Unit tests for element pass-through, log formatting, and display data.


Comment on lines +166 to +167

        default:
          LOG.error("{}", message);

Comment on lines +75 to +85

  @Test
  public void testDisplayData() {
    DisplayData displayData =
        DisplayData.from(
            LogElements.of(Level.WARN).withPrefix("row: ").withTimestamp().withWindow());

    assertThat(displayData, hasDisplayItem("level", "WARN"));
    assertThat(displayData, hasDisplayItem("prefix", "row: "));
    assertThat(displayData, hasDisplayItem("withTimestamp", true));
    assertThat(displayData, hasDisplayItem("withWindow", true));
  }

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces the LogElements transform, which allows for logging elements in a PCollection at various log levels with optional metadata such as timestamps and window information. The review feedback highlights a performance concern where log messages are formatted regardless of whether the log level is enabled; it is recommended to wrap the logging logic in a level check to avoid unnecessary overhead. Additionally, the explicit call to setCoder should be removed to avoid potential runtime exceptions and rely on Beam's automatic coder propagation.

Comment on lines +193 to +205
      log(
          level,
          formatForLogging(
              element,
              prefix,
              withTimestamp,
              withWindow,
              withPaneInfo,
              timestamp,
              window,
              paneInfo));
      receiver.output(element);
    }

Contributor


high

The formatForLogging method is called for every element before checking if the log level is enabled. This involves string concatenation and toString() calls (potentially on large objects, windows, or timestamps) which can be very expensive in high-throughput pipelines. You should check if the log level is enabled before constructing the log message.

      if (isLoggingEnabled(level)) {
        log(
            level,
            formatForLogging(
                element,
                prefix,
                withTimestamp,
                withWindow,
                withPaneInfo,
                timestamp,
                window,
                paneInfo));
      }
      receiver.output(element);
    }

    private boolean isLoggingEnabled(Level level) {
      switch (level) {
        case TRACE:
          return LOG.isTraceEnabled();
        case DEBUG:
          return LOG.isDebugEnabled();
        case INFO:
          return LOG.isInfoEnabled();
        case WARN:
          return LOG.isWarnEnabled();
        case ERROR:
        default:
          return LOG.isErrorEnabled();
      }
    }
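
The cost difference behind this suggestion can be seen in a small standalone sketch. It uses `java.util.logging` rather than SLF4J so that it runs without extra dependencies; `Logger.isLoggable` plays the role of `LOG.isDebugEnabled()` and friends. The class names `LazyLogDemo` and `Tracked` are hypothetical and exist only to count `toString()` calls.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical demo: eager vs. guarded message formatting for a disabled level. */
class LazyLogDemo {
  private static final Logger LOG = Logger.getLogger(LazyLogDemo.class.getName());

  /** Element stand-in that counts how often it is converted to a string. */
  static final class Tracked {
    static int toStringCalls = 0;
    private final String value;

    Tracked(String value) {
      this.value = value;
    }

    @Override
    public String toString() {
      toStringCalls++;
      return value;
    }
  }

  /** Eager: the message is built even if the level is disabled. */
  static void logEagerly(Level level, Object element) {
    String message = "element: " + element;
    LOG.log(level, message);
  }

  /** Guarded: the message is built only if the level is enabled. */
  static void logGuarded(Level level, Object element) {
    if (LOG.isLoggable(level)) {
      LOG.log(level, "element: " + element);
    }
  }

  /** Returns {eagerCalls, guardedCalls} for one element logged at a disabled level. */
  static int[] countCalls() {
    LOG.setLevel(Level.WARNING); // FINE is disabled from here on
    Tracked.toStringCalls = 0;
    logEagerly(Level.FINE, new Tracked("a"));
    int eager = Tracked.toStringCalls;

    Tracked.toStringCalls = 0;
    logGuarded(Level.FINE, new Tracked("b"));
    int guarded = Tracked.toStringCalls;
    return new int[] {eager, guarded};
  }

  public static void main(String[] args) {
    int[] calls = countCalls();
    System.out.println("eager=" + calls[0] + " guarded=" + calls[1]); // prints "eager=1 guarded=0"
  }
}
```

Nothing is logged in either case, but the eager path still pays for one `toString()` per element, which is the overhead the review comment is concerned about in high-throughput pipelines.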


  @Override
  public PCollection<T> expand(PCollection<T> input) {
    return input.apply("Log", ParDo.of(new LoggingFn<>(this))).setCoder(input.getCoder());

Contributor


medium

Calling setCoder(input.getCoder()) is generally redundant for a ParDo that returns the same type as its input, as Beam automatically propagates the coder. Additionally, input.getCoder() can throw an IllegalStateException if the coder hasn't been set yet and cannot be inferred at this stage of pipeline construction. It is safer to let the SDK handle coder propagation.

Suggested change
-    return input.apply("Log", ParDo.of(new LoggingFn<>(this))).setCoder(input.getCoder());
+    return input.apply("Log", ParDo.of(new LoggingFn<>(this)));

@github-actions
Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@lalitx17
Contributor Author

lalitx17 commented May 19, 2026

The failed checks seem unrelated to my commit.

@lalitx17
Contributor Author

assign set of reviewers

@github-actions
Contributor

Assigning reviewers:

R: @chamikaramj for label java.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@lalitx17
Contributor Author

retest this please

@github-actions
Contributor

Reviewers are already assigned to this PR: @chamikaramj

@lalitx17
Contributor Author

R: @ahmedabu98 please take a look. Thanks.

@github-actions
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment assign set of reviewers

Contributor

@ahmedabu98 ahmedabu98 left a comment


The transform looks great, but I think testing could be better

Contributor

@ahmedabu98 ahmedabu98 left a comment


LGTM

@ahmedabu98
Contributor

Will merge when tests go green

@lalitx17
Contributor Author

really appreciate the speedy review.

@ahmedabu98
Contributor

Looks like the new test failed:

2026-05-21T17:04:06.3026397Z LogElementsTest > testLogElementsLogsAtConfiguredLevels FAILED
2026-05-21T17:04:06.3057465Z     java.lang.AssertionError at LogElementsTest.java:88

@lalitx17
Contributor Author

So the issue is that the DirectRunner uses slf4j-simple, which writes to stderr, and ExpectedLogs can't capture that (it only works with JUL).
Also, slf4j-simple has TRACE and DEBUG disabled by default.

Solutions like switching from slf4j-simple to slf4j-jdk14 have a high blast radius.

The next most obvious path is to capture stderr (TRACE and DEBUG still won't appear), but that can be flaky, so it's a bad idea.

Lmk if you have a better idea.
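
For context on the JUL limitation, here is a minimal standalone sketch of handler-based log capture with `java.util.logging`. It assumes, without confirming against the Beam source, that JUL-based test utilities work roughly this way: a `Handler` attached to a JUL `Logger` only sees records published through JUL, so output written straight to stderr by another backend never reaches it. The class name `JulCaptureDemo` and the logger name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical demo: capturing JUL records in memory with a custom Handler. */
class JulCaptureDemo {

  /** Stores every record published to the logger it is attached to. */
  static final class CapturingHandler extends Handler {
    final List<LogRecord> records = new ArrayList<>();

    @Override
    public void publish(LogRecord record) {
      records.add(record);
    }

    @Override
    public void flush() {}

    @Override
    public void close() {}
  }

  /** Logs at two levels and returns what the handler captured as "LEVEL: message". */
  static List<String> captureMessages() {
    Logger logger = Logger.getLogger("demo.LogElements"); // hypothetical logger name
    CapturingHandler handler = new CapturingHandler();
    handler.setLevel(Level.ALL);
    logger.setLevel(Level.ALL); // enable fine-grained levels for this logger
    logger.setUseParentHandlers(false); // keep the console quiet during the demo
    logger.addHandler(handler);
    try {
      logger.fine("debug-level message");
      logger.warning("warn-level message");
    } finally {
      logger.removeHandler(handler);
    }
    List<String> captured = new ArrayList<>();
    for (LogRecord record : handler.records) {
      captured.add(record.getLevel() + ": " + record.getMessage());
    }
    return captured;
  }

  public static void main(String[] args) {
    for (String line : captureMessages()) {
      System.out.println(line);
    }
  }
}
```

Unlike stderr capture, this approach also sees low-severity records, because the test controls the logger's level directly instead of depending on the backend's defaults.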
