Core: Keep FileSystem reachable during Avro writes when Hadoop FS cache is disabled#16642

Open
wombatu-kun wants to merge 1 commit into
apache:mainfrom
wombatu-kun:issue/16640-avro-fs-reachable

@wombatu-kun
Contributor

Problem

Follow-up to #16641, which fixed this class of bug for the Parquet write path (reported in #16640). When the Hadoop FileSystem cache is disabled (for example fs.abfs.impl.disable.cache=true), a FileSystem resolved for a write has no shared strong referrer and can be garbage-collected mid-write. On Azure, AzureBlobFileSystem.finalize() then shuts down the thread pool that the open AbfsOutputStream depends on, and the write fails with Could not submit task to executor ... ThreadPoolExecutor [Terminated].
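The failure mode can be reproduced in miniature with plain JDK classes. The sketch below is not Hadoop or Iceberg code: FakeFileSystem and FakeOutputStream are hypothetical stand-ins, and shutdownLikeFinalize() is called explicitly to stand in for the cleanup that the garbage collector would trigger through finalize(). It only illustrates why a stream that holds the pool, but not the FileSystem, starts failing once the pool is terminated.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

class FinalizedPoolSketch {

  /** Hypothetical stand-in for a FileSystem that owns a thread pool. */
  static final class FakeFileSystem {
    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    FakeOutputStream create() {
      // The stream captures the pool, not the FileSystem, so it does not keep the FileSystem alive.
      return new FakeOutputStream(pool);
    }

    /** Stands in for the cleanup that finalize() would perform on an unreachable FileSystem. */
    void shutdownLikeFinalize() {
      pool.shutdownNow();
    }
  }

  /** Hypothetical stand-in for a stream that flushes through the FileSystem's pool. */
  static final class FakeOutputStream {
    private final ExecutorService pool;

    FakeOutputStream(ExecutorService pool) {
      this.pool = pool;
    }

    /** Returns true if the flush task was accepted, false if the pool was already terminated. */
    boolean flush() {
      try {
        pool.submit(() -> { }).get();
        return true;
      } catch (RejectedExecutionException e) {
        return false;
      } catch (Exception e) {
        throw new IllegalStateException(e);
      }
    }
  }

  /** Flushes once before and once after the simulated finalization. */
  static boolean[] run() {
    FakeFileSystem fs = new FakeFileSystem();
    FakeOutputStream stream = fs.create();
    boolean before = stream.flush();
    fs.shutdownLikeFinalize();
    boolean after = stream.flush();
    return new boolean[] {before, after};
  }

  public static void main(String[] args) {
    boolean[] result = run();
    System.out.println("flush accepted before shutdown: " + result[0]);
    System.out.println("flush accepted after shutdown: " + result[1]);
  }
}
```

In the real bug the shutdown is not explicit: it happens whenever the collector decides the FileSystem is unreachable, which is why the failure appears intermittently and only with the cache disabled.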

Root cause

AvroFileAppender keeps only the output stream, not the OutputFile. The data and delete writers that wrap it (DataWriter, PositionDeleteWriter, EqualityDeleteWriter) keep the appender and a location string, but not the OutputFile either. So for an Avro data or delete file written with the cache disabled, nothing keeps the write's FileSystem reachable, and it can be collected while the file is still being written.

Manifests are not affected: ManifestWriter and ManifestListWriter retain the OutputFile themselves, so the FileSystem stays reachable through them. ORC is also unaffected because OrcFileAppender already retains its OutputFile.
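The difference between the affected and unaffected writers comes down to which objects sit on the reference chain. A minimal sketch of the two shapes, using hypothetical stand-in classes (none of these are the real Iceberg or Hadoop types) and a WeakReference to observe reachability:

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

class ReachabilitySketch {

  /** Hypothetical stand-ins for the FileSystem and the stream it produces. */
  static final class FakeFileSystem { }

  static final class FakeStream { }

  /** Hypothetical output file: the only object here that references the FileSystem. */
  static final class FakeOutputFile {
    private final FakeFileSystem fs;
    private final String location;

    FakeOutputFile(FakeFileSystem fs, String location) {
      this.fs = fs;
      this.location = location;
    }

    FakeStream create() {
      return new FakeStream(); // the stream does not reference the FileSystem
    }

    String location() {
      return location;
    }
  }

  /** Keeps only the stream and a location string, like the Avro path described above. */
  static final class StreamOnlyHolder {
    private final FakeStream stream;
    private final String location;

    StreamOnlyHolder(FakeOutputFile file) {
      this.stream = file.create();
      this.location = file.location();
    }
  }

  /** Also keeps the output file, like the manifest writers and the ORC appender described above. */
  static final class FileRetainingHolder {
    private final FakeOutputFile file;
    private final FakeStream stream;

    FileRetainingHolder(FakeOutputFile file) {
      this.file = file;
      this.stream = file.create();
    }
  }

  /** True while the retaining holder is alive, because holder -> file -> FileSystem is a strong chain. */
  static boolean reachableThroughRetainingHolder() {
    FakeFileSystem fs = new FakeFileSystem();
    WeakReference<FakeFileSystem> ref = new WeakReference<>(fs);
    FileRetainingHolder holder = new FileRetainingHolder(new FakeOutputFile(fs, "file:/tmp/example.avro"));
    fs = null;
    System.gc();
    boolean alive = ref.get() != null;
    Reference.reachabilityFence(holder); // keep the holder strongly reachable until here
    return alive;
  }

  /** May be true or false: nothing references the FileSystem, so the collector is free to reclaim it. */
  static boolean reachableThroughStreamOnlyHolder() {
    FakeFileSystem fs = new FakeFileSystem();
    WeakReference<FakeFileSystem> ref = new WeakReference<>(fs);
    StreamOnlyHolder holder = new StreamOnlyHolder(new FakeOutputFile(fs, "file:/tmp/example.avro"));
    fs = null;
    System.gc();
    boolean alive = ref.get() != null;
    Reference.reachabilityFence(holder);
    return alive;
  }

  public static void main(String[] args) {
    System.out.println("retaining holder keeps FileSystem: " + reachableThroughRetainingHolder());
    System.out.println("stream-only holder keeps FileSystem: " + reachableThroughStreamOnlyHolder());
  }
}
```

Only the first result is guaranteed. The second depends on whether System.gc() actually runs a collection, which matches the intermittent nature of the reported failure.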

Change

Retain the OutputFile on AvroFileAppender so its FileSystem stays reachable for the appender's lifetime, mirroring OrcFileAppender. The retained file is also used to include the file location in the write-error message.
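The shape of the change can be sketched as follows. This is a simplified illustration, not the actual diff: SimpleOutputFile and SketchAppender are hypothetical names, the real OutputFile interface is larger, and the exact wording of the error message in the patch is not shown here, so the message text below is an assumption.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class RetainingAppenderSketch {

  /** Hypothetical minimal output-file abstraction. */
  interface SimpleOutputFile {
    OutputStream create();

    String location();
  }

  /** Hypothetical appender that retains its output file for its whole lifetime. */
  static final class SketchAppender implements AutoCloseable {
    // Retained so that whatever the file references (for example a FileSystem) stays reachable.
    private final SimpleOutputFile file;
    private final OutputStream stream;

    SketchAppender(SimpleOutputFile file) {
      this.file = file;
      this.stream = file.create();
    }

    void add(byte[] record) {
      try {
        stream.write(record);
      } catch (IOException e) {
        // The retained file also supplies the location for the error message (wording assumed).
        throw new UncheckedIOException("Failed to write to " + file.location(), e);
      }
    }

    @Override
    public void close() {
      try {
        stream.close();
      } catch (IOException e) {
        throw new UncheckedIOException("Failed to close " + file.location(), e);
      }
    }
  }

  /** Writes one record to a stream that always fails and returns the resulting error message. */
  static String failingWriteMessage(String location) {
    SimpleOutputFile file = new SimpleOutputFile() {
      @Override
      public OutputStream create() {
        return new OutputStream() {
          @Override
          public void write(int b) throws IOException {
            throw new IOException("simulated failure");
          }
        };
      }

      @Override
      public String location() {
        return location;
      }
    };

    try (SketchAppender appender = new SketchAppender(file)) {
      appender.add(new byte[] {1});
      return "no failure";
    } catch (UncheckedIOException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(failingWriteMessage("file:/tmp/example.avro"));
  }
}
```

Holding the file as a field costs one reference per open appender and needs no change to the writers that wrap it, which is presumably why the fix sits at the appender level.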

Tests

Added TestAvroWriteFileSystemReachability, an end-to-end test that writes an Avro position-delete file through PositionDeleteWriter (which drops the OutputFile) with the Hadoop FileSystem cache disabled. It runs against a local FileSystem that mimics AzureBlobFileSystem: its finalize() terminates a thread pool that the open stream depends on, and the stream references the pool rather than the FileSystem. Without the production change, the write's FileSystem is collected mid-write and close() fails with Could not submit task to executor: thread pool was terminated. With the change, the FileSystem stays reachable and the write completes.

Related to #16640 and #16641.

…he is disabled

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the core label Jun 1, 2026