Skip to content

netty: Allow network errors to override graceful shutdown status#12822

Open
themechbro wants to merge 2 commits into
grpc:masterfrom
themechbro:fix/shutdown-error-masking
Open

netty: Allow network errors to override graceful shutdown status#12822
themechbro wants to merge 2 commits into
grpc:masterfrom
themechbro:fix/shutdown-error-masking

Conversation

@themechbro
Copy link
Copy Markdown
Contributor

Fixes #12812

Motivation

Currently, ClientTransportLifecycleManager acts as a strict latch for the first shutdown status it receives. If a user initiates a graceful shutdown (shutdownStatus with no cause) and the channel subsequently experiences a hard network drop (channelInactive firing a ClosedChannelException), the transport masks the physical network error and propagates the benign graceful status to active streams. This blinds telemetry to actual infrastructure drops during shutdown windows.

Modifications

  • Updated ClientTransportLifecycleManager#notifyShutdown to introduce a status upgrade mechanism. If the existing shutdownStatus is a graceful intent (has no Throwable cause) and the incoming Status represents a hard error (has a Throwable cause), the manager now overwrites the cached status.
  • Added networkErrorOverridesGracefulShutdownStatus to NettyClientTransportTest which perfectly simulates the reproducer by firing a graceful shutdown followed by fireChannelInactive(), asserting that the ClosedChannelException is properly propagated to the transport listener.

Result

Active streams that are forcefully interrupted during a graceful shutdown window will now correctly fail with UNAVAILABLE: channel closed (with the underlying Netty exception) rather than UNAVAILABLE: Channel shutdown invoked.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to ensure that when a Netty client transport is gracefully shut down and a subsequent real network failure occurs, the later failure is not masked by the earlier graceful shutdown status—so active streams can surface the underlying infrastructure error (per #12812).

Changes:

  • Adds a “status upgrade” path in ClientTransportLifecycleManager#notifyShutdown() to replace an already-cached graceful shutdown status with a later status that has a Throwable cause.
  • Adds a new unit test attempting to reproduce “graceful shutdown followed by network drop” behavior in NettyClientTransportTest.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
netty/src/main/java/io/grpc/netty/ClientTransportLifecycleManager.java Adds logic intended to upgrade cached shutdown status when a later shutdown has a cause
netty/src/test/java/io/grpc/netty/NettyClientTransportTest.java Adds a test intended to validate that network errors override graceful shutdown status

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +72 to +79
// Check if the incoming error is just the routine channel closure exception
boolean isClosedChannel = s.getCause() instanceof java.nio.channels.ClosedChannelException;

// Status Upgrade: Overwrite graceful shutdown if a hard network error occurs
if (shutdownStatus.getCause() == null && s.getCause() != null && !isClosedChannel) {
shutdownStatus = s;
return true;
}
Comment on lines 67 to 70
/** Returns {@code true} if was the first shutdown. */
@CanIgnoreReturnValue
public boolean notifyShutdown(Status s, DisconnectError disconnectError) {
notifyGracefulShutdown(s, disconnectError);
Comment on lines +302 to +324
@Test
public void networkErrorOverridesGracefulShutdownStatus() throws Exception {
startServer();
NettyClientTransport transport = newTransport(newNegotiator());
callMeMaybe(transport.start(clientTransportListener));

// 1. Trigger graceful shutdown
Status gracefulStatus = Status.UNAVAILABLE.withDescription("Channel shutdown invoked");
transport.shutdown(gracefulStatus);

// 2. Simulate a real network drop (e.g., Connection Reset)
java.io.IOException networkCause = new java.io.IOException("Connection reset by peer");
transport.channel().pipeline().fireExceptionCaught(networkCause);
transport.channel().pipeline().fireChannelInactive();

// 3. Verify the listener receives the IO error, NOT the graceful status
verify(clientTransportListener, timeout(5000)).transportShutdown(
org.mockito.ArgumentMatchers.argThat(status ->
status != null && status.getCause() instanceof java.io.IOException
),
org.mockito.ArgumentMatchers.any()
);
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

channel shutdown error obtrudes on network errors

2 participants