Fix AppendNextTokensToSequences heap overflow #2111

Open

apsonawane wants to merge 3 commits into main from asonawane/heap

Conversation

@apsonawane
Contributor

This pull request adds robust bounds checking to prevent writing past the allocated sequence buffer during token generation, both on CPU and CUDA backends. It ensures that sequence appending operations do not exceed the configured max_length, and introduces early error handling and logging when the maximum length is reached. These changes improve the safety and stability of sequence generation, preventing out-of-bounds writes and making debugging easier.

Sequence bounds checking and safety:

  • Added explicit checks in GreedySearch_Cuda::SampleTopKTopP, GreedySearch_Cuda::AppendTokens, GreedySearch_Cpu::AppendNextTokensToSequences, and BeamSearch_Cpu::AppendNextTokensToSequences to ensure tokens are only appended if the current sequence length is less than max_length, preventing buffer overflows. [1] [2] [3] [4]
  • Updated conditions to use >= instead of == when checking if the sequence has reached max_length, ensuring no further tokens are appended once the limit is reached. [1] [2]
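The bounds check described above can be sketched with a minimal, self-contained example. `SequenceBuffer` and `AppendToken` are illustrative names, not the actual API; the real code guards the append paths of `GreedySearch_Cpu`, `BeamSearch_Cpu`, and the CUDA search in the same spirit:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: a preallocated sequence buffer that refuses to
// append once current_length reaches max_length, so no write can land
// past the end of the allocation.
struct SequenceBuffer {
  std::vector<int> tokens;          // preallocated to max_length slots
  std::size_t current_length = 0;
  std::size_t max_length;

  explicit SequenceBuffer(std::size_t max_len)
      : tokens(max_len), max_length(max_len) {}

  // Returns false (and writes nothing) when the buffer is full.
  // Note the >= comparison rather than ==, matching the PR's change:
  // even if the length somehow overshoots, no further write happens.
  bool AppendToken(int token) {
    if (current_length >= max_length)
      return false;
    tokens[current_length++] = token;
    return true;
  }
};
```

The `>=` comparison is the defensive half of the fix: with `==`, a length that has already passed the limit would silently keep writing out of bounds.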

Error handling and logging:

  • In Generator::GenerateNextToken, added an early runtime error if called when the sequence is already at max_length, providing a clear message for debugging.
  • Added logging for when the maximum sequence length is hit, making it easier to trace and debug issues related to buffer limits. [1] [2] [3]
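The fast-fail behavior can be sketched as follows. `GeneratorState` and this free function are hypothetical stand-ins for `Generator::GenerateNextToken`; the point is the early error with a descriptive message instead of a silent out-of-bounds write:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical, simplified generator state.
struct GeneratorState {
  std::size_t current_length;
  std::size_t max_length;
};

// Fail fast: if the sequence already holds max_length tokens, throw a
// runtime error with a clear message rather than appending further.
int GenerateNextToken(GeneratorState& state) {
  if (state.current_length >= state.max_length)
    throw std::runtime_error(
        "GenerateNextToken called after the sequence reached max_length (" +
        std::to_string(state.max_length) + ")");
  // ... run the model and sample the next token (elided); placeholder:
  ++state.current_length;
  return 0;  // placeholder token id
}
```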

State management improvements:

  • Ensured that the done_ state is only reset if the sequence buffer is not full, preserving the correct state and preventing accidental out-of-bounds writes on subsequent calls. [1] [2]
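A hypothetical sketch of that guard (`SearchState` and `ResetDoneAfterAppend` are illustrative names): `done_` is cleared only while the buffer still has room, so a full buffer stays marked done and later calls cannot append out of bounds:

```cpp
#include <cstddef>

// Hypothetical, simplified search state.
struct SearchState {
  std::size_t current_length = 0;
  std::size_t max_length;
  bool done_ = false;

  explicit SearchState(std::size_t max_len) : max_length(max_len) {}

  // Only reset done_ when there is still room in the buffer; if the
  // sequence is full, done_ stays set and blocks subsequent appends.
  void ResetDoneAfterAppend() {
    if (current_length < max_length)
      done_ = false;
  }
};
```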

Copilot AI review requested due to automatic review settings April 30, 2026 20:54
Contributor

Copilot AI left a comment


Pull request overview

This PR aims to prevent out-of-bounds writes into the preallocated sequence buffers during generation by adding explicit max_length bounds checks across CPU and CUDA search implementations, and by failing fast when GenerateNextToken() is called after reaching max_length.

Changes:

  • Added early-return bounds checks in CPU greedy/beam append paths to prevent writing past the sequences buffer.
  • Added CUDA-side guards to avoid launching append kernels once the sequence length reaches max_length, and updated done/max-length checks.
  • Added an early GenerateNextToken() runtime error when sequence length is already at max_length.
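The CUDA-side guard amounts to a host-side check before the launch: once the sequence is at capacity, the append kernel is never dispatched, so the device buffer is never touched. `ShouldLaunchAppendKernel` is a hypothetical helper and the actual launch is elided:

```cpp
#include <cstddef>

// Hypothetical host-side guard mirroring the PR's CUDA change: skip the
// append-kernel launch entirely once the sequence is at max_length.
bool ShouldLaunchAppendKernel(std::size_t current_length,
                              std::size_t max_length) {
  if (current_length >= max_length) {
    // At capacity: log the condition elsewhere and launch nothing, so
    // no device-side write past the sequences buffer is possible.
    return false;
  }
  // AppendNextTokensKernel<<<grid, block>>>(...);  // real launch elided
  return true;
}
```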

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File descriptions:

  • src/search.cpp: Adds CPU-side bounds checks and adjusts done/reset behavior around appending tokens.
  • src/generators.cpp: Adds a fast-fail guard in GenerateNextToken() when already at max_length.
  • src/cuda/search_cuda.cpp: Adds CUDA-side conditional guards to avoid appending once at/over max_length, plus logging/done handling.

@apsonawane apsonawane requested a review from a team as a code owner May 5, 2026 19:05