fix(enrichment): preserve memory enrichment table state on reload#25547

Open
esensar wants to merge 2 commits into
vectordotdev:master from
esensar:fix/memory-table-reload-state

Conversation

esensar (Contributor) commented Jun 1, 2026

Summary

While working on #25143, it was pointed out (#25143 (comment)) that config reload was not handled for memory enrichment tables: a reload generated new components that were not attached to the tables being queried. This PR fixes that by having the rebuilt tables take over the state of the previous components.

Vector configuration

enrichment_tables:
  memory_table:
    type: memory
    ttl: 60
    flush_interval: 5
    inputs: ["cache_generator"]


sources:
  demo_logs_test:
    type: "demo_logs"
    format: "json"

transforms:
  demo_logs_processor:
    type: "remap"
    inputs: ["demo_logs_test"]
    source: |
      . = parse_json!(.message)
      user_id = get!(., path: ["user-identifier"])

      existing, err = get_enrichment_table_record("memory_table", { "key": user_id })

      if err == null {
        . = existing.value
        .source = "cache"
      } else {
        .referer = parse_url!(.referer)
        .referer.host = encode_punycode!(.referer.host)
        .source = "transform"
      }      

  cache_generator:
    type: "remap"
    inputs: ["demo_logs_processor"]
    source: |
      existing, err = get_enrichment_table_record("memory_table", { "key": get!(., path: ["user-identifier"]) })
      if err != null {
        data = .
        . = set!(value: {}, path: [get!(data, path: ["user-identifier"])], data: data)
      } else {
        . = {}
      }      

sinks:
  console:
    inputs: ["demo_logs_processor"]
    target: "stdout"
    type: "console"
    encoding:
      codec: "json"

How did you test this PR?

Ran Vector with the above configuration and the --watch-config flag. Changed the TTL a couple of times; each time Vector reloaded and kept the table state, as shown by output events still coming from the cache (.source = "cache") rather than being newly generated.

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). See the dd-rust-license-tool for more details.

Sponsored by Quad9

@esensar esensar requested a review from a team as a code owner June 1, 2026 12:51
@github-actions github-actions Bot added the "domain: topology" label (Anything related to Vector's topology code) Jun 1, 2026
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6583fd70e4

Comment thread src/topology/running.rs
Comment on lines +983 to +985
if diff.contains(&input.component)
    || diff.is_changed(key)
    || inputs_to_add.contains(&input)
P1: Replace paused inputs instead of adding duplicates

When a regular sink or transform is changed but keeps the same unchanged upstream input, shutdown_diff pauses the old fanout entry rather than removing it; with this new diff.is_changed(key) branch, reconnecting now sends ControlMessage::Add instead of Replace. The fanout still contains the paused component key, so Fanout::add asserts on a duplicate output id and the topology can panic during a config reload of any changed component whose input did not change.
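
The failure mode described above can be illustrated with a toy model. This is only a sketch and not Vector's code: `ToyFanout`, `ToyError`, and `reconnect` are hypothetical names invented for the illustration, and the toy returns an error where the real fanout is said to assert. It assumes that a paused output keeps its id in the fanout, as the comment describes.

```rust
use std::collections::HashMap;

/// Toy stand-in for a fanout (hypothetical, not Vector's type).
/// Maps an output id to `Some(sink)` when live and `None` when paused.
#[derive(Default)]
struct ToyFanout {
    outputs: HashMap<String, Option<String>>,
}

#[derive(Debug, PartialEq)]
enum ToyError {
    DuplicateOutput(String),
    UnknownOutput(String),
}

impl ToyFanout {
    /// Adds a brand-new output; rejects an id that is already present,
    /// including one that is merely paused.
    fn add(&mut self, id: &str, sink: &str) -> Result<(), ToyError> {
        if self.outputs.contains_key(id) {
            return Err(ToyError::DuplicateOutput(id.to_string()));
        }
        self.outputs.insert(id.to_string(), Some(sink.to_string()));
        Ok(())
    }

    /// Pauses an output: the id stays registered, but no sink is attached.
    fn pause(&mut self, id: &str) -> Result<(), ToyError> {
        match self.outputs.get_mut(id) {
            Some(slot) => {
                *slot = None;
                Ok(())
            }
            None => Err(ToyError::UnknownOutput(id.to_string())),
        }
    }

    /// Swaps the sink behind an id that is already registered.
    fn replace(&mut self, id: &str, sink: &str) -> Result<(), ToyError> {
        match self.outputs.get_mut(id) {
            Some(slot) => {
                *slot = Some(sink.to_string());
                Ok(())
            }
            None => Err(ToyError::UnknownOutput(id.to_string())),
        }
    }
}

/// Chooses the operation from the fanout's own contents: replace when the
/// id is still registered (paused or live), add only when it is absent.
fn reconnect(fanout: &mut ToyFanout, id: &str, sink: &str) -> Result<(), ToyError> {
    if fanout.outputs.contains_key(id) {
        fanout.replace(id, sink)
    } else {
        fanout.add(id, sink)
    }
}
```

In this model, calling `add` for an id that was paused rather than removed fails with `DuplicateOutput`, while `reconnect` succeeds for both paused and absent ids. Whether the real fix should key off the fanout's contents or off the diff is a design question for the PR itself.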

Comment thread src/topology/builder.rs
Comment on lines +223 to +228
if !self.diff.enrichment_tables.is_added(name)
    && let Some(existing_table) = ENRICHMENT_TABLES.get(&table_name)
    && existing_table.stateful()
    && table.stateful()
{
    match table.take_state(existing_table) {
P1: Preserve state in the memory config cache too

For a changed memory enrichment table, this transfers the old handles only into the boxed table that will be inserted into ENRICHMENT_TABLES; however MemoryConfig::get_or_build_memory has already cached the freshly built empty Memory, and the rebuilt sink/source later clone that cached instance. After such a reload, transforms read the preserved registry table while the new memory sink writes to a different empty table, so post-reload inserts are no longer visible to enrichment lookups.
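
The divergence described above can be reproduced in a small model. This is a sketch under stated assumptions, not Vector's implementation: `ToyTable`, its `take_state`, and `reload` are hypothetical names, and the model assumes that the registry copy and the cached copy are separate handles, with only the former re-pointed at the old storage.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Toy stand-in for a memory table: a cloneable handle over shared storage.
#[derive(Clone, Default)]
struct ToyTable {
    store: Arc<Mutex<HashMap<String, String>>>,
}

impl ToyTable {
    fn insert(&self, key: &str, value: &str) {
        self.store
            .lock()
            .unwrap()
            .insert(key.to_string(), value.to_string());
    }

    fn get(&self, key: &str) -> Option<String> {
        self.store.lock().unwrap().get(key).cloned()
    }

    /// Re-points this handle at another table's storage
    /// (hypothetical analogue of taking over the previous state).
    fn take_state(&mut self, existing: &ToyTable) {
        self.store = Arc::clone(&existing.store);
    }

    fn shares_storage_with(&self, other: &ToyTable) -> bool {
        Arc::ptr_eq(&self.store, &other.store)
    }
}

/// Simulates a reload and returns `(reader, writer)`.
/// With `fix_cache = false`, only the registry copy takes over the old
/// state; with `fix_cache = true`, the cached instance does as well.
fn reload(old: &ToyTable, fix_cache: bool) -> (ToyTable, ToyTable) {
    // Freshly built, empty table kept in a config-level cache.
    let mut cached = ToyTable::default();
    // Copy destined for the lookup registry, read by transforms.
    let mut registry = cached.clone();
    registry.take_state(old);
    if fix_cache {
        cached.take_state(old);
    }
    // The rebuilt sink clones whatever the cache holds.
    let writer = cached.clone();
    (registry, writer)
}
```

With `fix_cache = false`, the reader still sees pre-reload entries but never sees anything the writer inserts afterwards, which matches the symptom in the comment. With `fix_cache = true`, both handles share one store, so post-reload inserts are visible to lookups.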

pront (Member) commented Jun 1, 2026

Thanks @esensar! Per our new policy, I will come back to this once the Codex comments are resolved.
