
tar/asm: add NewInputTarStreamWithDone + tests#86

Open
jankaluza wants to merge 2 commits into vbatts:main from jankaluza:container-libs-148

Conversation

@jankaluza

@jankaluza jankaluza commented Jan 29, 2026

Refactor tar disassembly into a shared runInputTarStream goroutine and newInputTarStreamCommon setup helper.

Introduce bestEffortPipeWriter to allow tar parsing and metadata packing to complete even if the consumer closes the pipe early.

Add NewInputTarStreamWithDone, which returns an io.ReadCloser together with a done channel that is signaled when the internal goroutine finishes (or fails), including the final padding-draining phase.

Preserve existing NewInputTarStream behavior and API.

Add unit tests covering:

  • successful full reads
  • early consumer close
  • packer error propagation
  • underlying reader failure while the tar-split goroutine is still running

This is needed to fix containers/container-libs#148. Please check this ticket for more context.

@jankaluza
Author

@mtrmac , can you please check this when you have time? :-)

Contributor

@mtrmac mtrmac left a comment


A fairly quick look. I didn’t read the tests at all yet.

Is there a corresponding consumer PR to prove the concept and demonstrate how the API is used?


Nothing in the commit / comments says why we need it, and I think it’s important to preserve that knowledge. (The PR description does link to a part of the motivation, but that’s not going to be obvious to people just reading the code.)

  • Whether processing the tar stream succeeds, fails inside tar-split, or fails in the consumer of tar-split, we need an ability to block until the goroutine terminates (and the input r reader is safe to deallocate / reuse). I think the done channel works fine for that.

  • If processing the tar stream fails in the consumer of tar-split, and that consumer aborts, the consumer does not want to read all of the tar to completion. In that case, the current code’s goroutine will just hang trying to write to an abandoned pipe, and keep memory allocated forever.

    • Returning a ReadCloser and having the Close signal an abort works (unlike using a Context, which can’t abort a pipe write).
    • I wonder about directly returning a PipeReader. That might allow the consumer to use CloseWithError and potentially allow better diagnostics, OTOH the goroutine is not currently logging anything anyway — and returning a PipeReader would be more of an API commitment. I don’t have a strong opinion.
    • In both of the callers in c/storage, if consuming the tar from this function fails, nothing uses the data produced by storage.Packer. So I don’t think the bestEffortPipeWriter is necessary — it’s actually unwanted for us, because the goroutine would continue consuming the input and burning I/O or CPU (and blocking a caller that just wants a done signal so it can terminate), for no benefit. It would be better if the goroutine saw the pipe write fail and terminated.

@jankaluza jankaluza force-pushed the container-libs-148 branch 2 times, most recently from 39ff334 to 153c68f Compare February 3, 2026 12:33
Refactor tar disassembly into a shared `runInputTarStream`,
`runInputTarStreamGoroutine` and `newInputTarStreamCommon` setup helper.

Add `NewInputTarStreamWithDone`, which returns an `io.ReadCloser`
together with a `done` channel that is signaled when the internal
goroutine finishes (or fails), including the final padding-draining
phase.

Preserve existing `NewInputTarStream` behavior and API.

Add unit tests covering:
- successful full reads
- early consumer close
- packer error propagation
- underlying reader failure while the tar-split goroutine is still
  running

Motivation: callers need a reliable way to (1) abort consumption when
they fail early and (2) block until the background goroutine
has terminated so the underlying input reader can be safely
released/reused. The wrapper guarantees the protocol
(close pipe + send exactly one done value) even on panics, while
preserving prompt termination when the consumer closes early.

Relates: containers/container-libs#148

Signed-off-by: Jan Kaluza <jkaluza@redhat.com>
@jankaluza jankaluza closed this Feb 3, 2026
@jankaluza jankaluza reopened this Feb 4, 2026

Contributor

@mtrmac mtrmac left a comment


Thanks!

I think this all works correctly; but looking at containers/podman#28018 , there’s quite a bit of careful code that needs to happen in the caller.

Clearly that can be done in a helper in c/storage, but maybe we could handle that all in this package?

func ConsumeInputTarStream[T any](r io.Reader, p storage.Packer, fp storage.FilePutter, consumer func(r io.Reader) (T, error)) (T, error) {
   // initialize the pipe
   val, err := consumer(tar)
   // _perhaps_, even, if consumer returned no error,
   // and saw a tar EOF but not a raw bytes EOF, consume
   // the trailing data here, automatically?!

   // wait until goroutine is done

   // return the most relevant error if any
}

(and AFAICS we can even remove the fp parameter, both of our consumers set “discard” there).

Admittedly designing that would be more work in this package, and we can easily take it on in container-libs. OTOH it might be a more elegant interface, easier to learn/use. @vbatts any preference?

p.mu.Lock()
defer p.mu.Unlock()
out := make([]storage.Entry, len(p.entries))
copy(out, p.entries)
Contributor


slices.Clone

Author


slices.Clone does not exist in go1.18, which is the current target for tar-split.


out := buf.Bytes()
if extraPadding > 0 {
	out = append(append([]byte(nil), out...), make([]byte, extraPadding)...)
}
Contributor


slices.Concat

Author


Actually, I now remember why I did not use it: slices.Concat does not exist in go1.18, which is the current target for tar-split.

t.Fatalf("timeout waiting for underlying reader to enter blocked state")
}

// Now "close" the underlying reader while the tar-split goroutine is still running.
Contributor


I was confused here — closing a reader while something is using it would generally be undefined.

“Now trigger an error to be returned from the underlying …” ?

Author


You’re right that "closing" the underlying reader would be undefined, but this test uses closableBlockingReader, which simulates a read failure/cancellation.

It unblocks a pending Read and makes future reads return errUnderlyingClosed.

The goal is to simulate an error returned from the underlying data source while the tar-split goroutine is still running, to verify error propagation and done signaling.

I will change the code to make this easier to understand.

@jankaluza
Author

@mtrmac , thanks for the review. I've pushed a new commit which should address the issues you've found.

For your suggestion about an even better interface, I will wait for @vbatts 's opinion :-).

@jankaluza
Author

@vbatts , do you have any opinion on this PR? :-).

Contributor

@mtrmac mtrmac left a comment


LGTM otherwise.

Refactor `newInputTarStreamCommon` to accept an optional
`done chan<- error` instead of a `withDone` bool.

Clarify documentation to state that callers must fully read the returned
reader.

Signed-off-by: Jan Kaluza <jkaluza@redhat.com>
@jankaluza
Author

@mtrmac , I fixed the Deprecate comment.

Thinking how to proceed now...



Development

Successfully merging this pull request may close these issues.

Race condition in applyDiffWithOptions
