Skip to content

fix: include plugin column types in display_sample_record()#365

Open
3mei wants to merge 6 commits intomainfrom
yev/fix/345-display-sample-record-plugin-columns
Open

fix: include plugin column types in display_sample_record()#365
3mei wants to merge 6 commits intomainfrom
yev/fix/345-display-sample-record-plugin-columns

Conversation

@3mei
Copy link

@3mei 3mei commented Mar 3, 2026

Replace hardcoded column type list with dynamic iteration over
get_column_display_order(), which already includes plugin-registered
types. Column types with dedicated display sections (SEED_DATASET,
IMAGE, LLM_CODE, VALIDATION, LLM_JUDGE) are excluded from the
"Generated Columns" table as before.

Also display side_effect_columns for plugin column types (identified
via plugin_manager), matching the existing behavior for CUSTOM
columns. Side-effect columns are rendered before the primary column
output.

Fixes #345

Replace hardcoded column type list with dynamic iteration over
get_column_display_order(), which already includes plugin-registered
types. Column types with dedicated display sections (SEED_DATASET,
IMAGE, LLM_CODE, VALIDATION, LLM_JUDGE) are excluded from the
"Generated Columns" table as before.

Also display side_effect_columns for plugin column types, matching
the existing behavior for CUSTOM columns.

Fixes #345

Signed-off-by: Yev Meyer <ymeyer@nvidia.com>
Made-with: Cursor
Signed-off-by: Yev Meyer <ymeyer@nvidia.com>
Made-with: Cursor
@3mei 3mei requested a review from a team as a code owner March 3, 2026 14:20
@github-actions
Copy link
Contributor

github-actions bot commented Mar 3, 2026

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.

@3mei
Copy link
Author

3mei commented Mar 3, 2026

I have read the DCO document and I hereby sign the DCO.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 3, 2026

Greptile Summary

This PR fixes issue #345 by replacing the hardcoded column-type list in display_sample_record() with a dynamic loop over get_column_display_order(), which already includes plugin-registered types. It also introduces the is_plugin_column_type() helper in column_types.py and extends the side-effect-column rendering (previously limited to CUSTOM) to all plugin column types.

Key changes:

  • column_types.py: New is_plugin_column_type(column_type) utility that checks if a column type was registered by a plugin via plugin_manager.get_column_generator_plugin_if_exists().
  • visualization.py: non_code_columns is now populated dynamically from get_column_display_order() with proper exclusion of dedicated display columns (SEED_DATASET, IMAGE, LLM_CODE, VALIDATION, LLM_JUDGE); side-effect column rendering now includes plugin types alongside CUSTOM.
  • Tests: New parametrized test test_is_plugin_column_type_false_for_builtins verifies the helper returns False for all built-in enum members; test_display_sample_record_includes_plugin_column_types uses targeted patching to verify both the primary plugin column and its side-effect appear in the saved HTML output.

The refactor is minimal and targeted — it removes a hardcoded list in favor of the canonical display order already used elsewhere in the codebase.

Confidence Score: 5/5

  • Safe to merge — refactor is minimal and well-tested with no identified issues.
  • The PR makes targeted changes to replace a hardcoded column-type list with dynamic iteration over the canonical display order. The new is_plugin_column_type() helper is simple and correctly delegates to the plugin manager. Tests are comprehensive and verify both basic functionality and integration with plugin columns. The refactor removes code duplication and aligns display behavior with the canonical column ordering used elsewhere in the codebase.
  • No files require special attention

Important Files Changed

Filename Overview
packages/data-designer-config/src/data_designer/config/column_types.py Adds is_plugin_column_type() utility that correctly identifies plugin column types by delegating to plugin_manager.get_column_generator_plugin_if_exists(). Handles both DataDesignerColumnType enum members and raw string values correctly. No issues found.
packages/data-designer-config/src/data_designer/config/utils/visualization.py Replaces hardcoded non_code_columns list with dynamic loop over get_column_display_order(), introduces _DEDICATED_DISPLAY_COL_TYPES module-level set for clarity, and extends side-effect column rendering to plugin types. Logic is correct and consistent with existing patterns. The refactor properly delegates to the canonical display order used elsewhere in the codebase.
packages/data-designer-config/tests/config/test_columns.py Adds parametrized test test_is_plugin_column_type_false_for_builtins that comprehensively verifies the is_plugin_column_type() helper returns False for all built-in enum members (both enum form and string value). Test is clear and correct.
packages/data-designer-config/tests/config/utils/test_visualization.py Adds well-structured integration test test_display_sample_record_includes_plugin_column_types using SimpleNamespace and targeted patches to verify plugin column support. Correctly asserts presence of both primary plugin column and side-effect content in rendered HTML output.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[display_sample_record called] --> B[get_column_display_order]
    B --> B1[built-in types list]
    B --> B2[plugin_manager.get_plugin_column_types]
    B2 --> B3[append plugin types]
    B1 & B3 --> C[iterate display_order]

    C --> D{col_type in<br/>_DEDICATED_DISPLAY_COL_TYPES?}
    D -->|Yes: SEED_DATASET, IMAGE,<br/>LLM_CODE, VALIDATION, LLM_JUDGE| E[skip — rendered elsewhere]
    D -->|No| F[get_columns_of_type col_type]
    F --> G[append to non_code_columns]

    G --> H[for col in non_code_columns]
    H --> I{col.drop?}
    I -->|True| H
    I -->|False| J[add primary row to table]
    J --> K{CUSTOM or<br/>is_plugin_column_type?}
    K -->|True| L[iterate side_effect_columns]
    L --> M{output_col in record?}
    M -->|Yes| N[add side-effect row to table]
    M -->|No| L
    K -->|False| H
    N --> L
Loading

Last reviewed commit: 3c66261

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 files reviewed, 5 comments

Edit Code Review Agent Settings | Greptile

3mei added 2 commits March 3, 2026 09:37
Add is_plugin_column_type() helper to column_types.py to avoid
redundant plugin_manager.get_plugin_column_types() calls. Use it
in display_sample_record() to show side_effect_columns for plugin
column types, matching existing CUSTOM behavior.

Signed-off-by: Yev Meyer <ymeyer@nvidia.com>
Made-with: Cursor
Column configs store column_type as a Literal string, not a
DataDesignerColumnType enum. Accept both str and enum to avoid
AttributeError when calling .value on a plain string.

Signed-off-by: Yev Meyer <ymeyer@nvidia.com>
Made-with: Cursor
table.add_row(col.name, convert_to_row_element(record[col.name]))
# Also display side_effect_columns for custom generators
if col.column_type == DataDesignerColumnType.CUSTOM:
if col.column_type == DataDesignerColumnType.CUSTOM or is_plugin_column_type(col.column_type):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_plugin_column_type(col.column_type) invokes plugin_manager.get_column_generator_plugin_if_exists() once for every column in non_code_columns — including every built-in SAMPLER, LLM_TEXT, EXPRESSION, etc. column.

The full set of plugin column types is already known from get_column_display_order() (line 299), so building a frozenset once before the loop avoids repeated plugin-manager lookups:

display_order = get_column_display_order()
builtin_types = set(DataDesignerColumnType)
plugin_col_types = frozenset(ct for ct in display_order if ct not in builtin_types)

non_code_columns = []
for col_type in display_order:
    if col_type not in _DEDICATED_DISPLAY_COL_TYPES:
        non_code_columns.extend(config_builder.get_columns_of_type(col_type))

# ... later ...
if col.column_type == DataDesignerColumnType.CUSTOM or col.column_type in plugin_col_types:

This derives the plugin-type set from a single source of truth without additional plugin-manager calls.

Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/data-designer-config/src/data_designer/config/utils/visualization.py
Line: 312

Comment:
`is_plugin_column_type(col.column_type)` invokes `plugin_manager.get_column_generator_plugin_if_exists()` once for every column in `non_code_columns` — including every built-in SAMPLER, LLM_TEXT, EXPRESSION, etc. column.

The full set of plugin column types is already known from `get_column_display_order()` (line 299), so building a frozenset once before the loop avoids repeated plugin-manager lookups:

```python
display_order = get_column_display_order()
builtin_types = set(DataDesignerColumnType)
plugin_col_types = frozenset(ct for ct in display_order if ct not in builtin_types)

non_code_columns = []
for col_type in display_order:
    if col_type not in _DEDICATED_DISPLAY_COL_TYPES:
        non_code_columns.extend(config_builder.get_columns_of_type(col_type))

# ... later ...
if col.column_type == DataDesignerColumnType.CUSTOM or col.column_type in plugin_col_types:
```

This derives the plugin-type set from a single source of truth without additional plugin-manager calls.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "optimization" would reintroduce the maintenance burden of having to remember to update the frozenset.

Copy link
Contributor

@andreatgretel andreatgretel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(moved to inline comment)

Copy link
Contributor

@andreatgretel andreatgretel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, nice fix! Left a few minor comments. The only thing I'd want a second opinion on is the side-effect column ordering change in visualization.py (side effects now render before the primary column, which flips existing CUSTOM behavior and is inconsistent with how validation/judge render after the columns they describe). @johnnygreco @nabinchha any thoughts on that?

- Move test imports to module level per project convention
- Replace plumbing-only test with one that renders a fake plugin
  column (with side-effect columns) to HTML and asserts the values
  appear in the output
- Add parametrized test for is_plugin_column_type() verifying all
  built-in types return False for both enum and string forms

Signed-off-by: Yev Meyer <ymeyer@nvidia.com>
Made-with: Cursor
Comment on lines +298 to +301
non_code_columns = []
for col_type in get_column_display_order():
if col_type not in _DEDICATED_DISPLAY_COL_TYPES:
non_code_columns.extend(config_builder.get_columns_of_type(col_type))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent display-order change for EXPRESSION columns

The refactor delegates column ordering entirely to get_column_display_order(). Before this PR the hardcoded list placed EXPRESSION second (after SAMPLER):

SAMPLER → EXPRESSION → LLM_TEXT → LLM_STRUCTURED → EMBEDDING → CUSTOM

With get_column_display_order() the canonical order is used, moving EXPRESSION much later:

SAMPLER → LLM_TEXT → LLM_STRUCTURED → EMBEDDING → EXPRESSION → CUSTOM

This is a user-visible change in the "Generated Columns" table for any config that has both EXPRESSION and LLM_TEXT/LLM_STRUCTURED columns. If the reordering is intentional (aligning with the canonical display order), it would be worth a line in the PR description or changelog so users know what to expect.

Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/data-designer-config/src/data_designer/config/utils/visualization.py
Line: 298-301

Comment:
**Silent display-order change for EXPRESSION columns**

The refactor delegates column ordering entirely to `get_column_display_order()`. Before this PR the hardcoded list placed `EXPRESSION` second (after `SAMPLER`):

```
SAMPLER → EXPRESSION → LLM_TEXT → LLM_STRUCTURED → EMBEDDING → CUSTOM
```

With `get_column_display_order()` the canonical order is used, moving `EXPRESSION` much later:

```
SAMPLER → LLM_TEXT → LLM_STRUCTURED → EMBEDDING → EXPRESSION → CUSTOM
```

This is a user-visible change in the "Generated Columns" table for any config that has both EXPRESSION and LLM_TEXT/LLM_STRUCTURED columns. If the reordering is intentional (aligning with the canonical display order), it would be worth a line in the PR description or changelog so users know what to expect.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is arguably the correct change -- it aligns display_sample_record() with the canonical display order already used by config_builder.display() and analysis/reporting. The old hardcoded list was the outlier. cc: @johnnygreco @andreatgretel

Render primary column before side-effect columns, matching the
existing CUSTOM column behavior. Avoids introducing display
ordering discrepancies.

Signed-off-by: Yev Meyer <ymeyer@nvidia.com>
Made-with: Cursor
Copy link
Contributor

@johnnygreco johnnygreco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @3mei

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

display_sample_record() silently omits plugin-generated columns

3 participants