fix: include plugin column types in display_sample_record()#365
fix: include plugin column types in display_sample_record()#365
Conversation
Replace hardcoded column type list with dynamic iteration over get_column_display_order(), which already includes plugin-registered types. Column types with dedicated display sections (SEED_DATASET, IMAGE, LLM_CODE, VALIDATION, LLM_JUDGE) are excluded from the "Generated Columns" table as before. Also display side_effect_columns for plugin column types, matching the existing behavior for CUSTOM columns. Fixes #345 Signed-off-by: Yev Meyer <ymeyer@nvidia.com> Made-with: Cursor Signed-off-by: Yev Meyer <ymeyer@nvidia.com> Made-with: Cursor
|
All contributors have signed the DCO ✍️ ✅ |
|
I have read the DCO document and I hereby sign the DCO. |
Greptile SummaryThis PR fixes issue #345 by replacing the hardcoded column-type list in Key changes:
The refactor is minimal and targeted — it removes a hardcoded list in favor of the canonical display order already used elsewhere in the codebase.
|
| Filename | Overview |
|---|---|
| packages/data-designer-config/src/data_designer/config/column_types.py | Adds is_plugin_column_type() utility that correctly identifies plugin column types by delegating to plugin_manager.get_column_generator_plugin_if_exists(). Handles both DataDesignerColumnType enum members and raw string values correctly. No issues found. |
| packages/data-designer-config/src/data_designer/config/utils/visualization.py | Replaces hardcoded non_code_columns list with dynamic loop over get_column_display_order(), introduces _DEDICATED_DISPLAY_COL_TYPES module-level set for clarity, and extends side-effect column rendering to plugin types. Logic is correct and consistent with existing patterns. The refactor properly delegates to the canonical display order used elsewhere in the codebase. |
| packages/data-designer-config/tests/config/test_columns.py | Adds parametrized test test_is_plugin_column_type_false_for_builtins that comprehensively verifies the is_plugin_column_type() helper returns False for all built-in enum members (both enum form and string value). Test is clear and correct. |
| packages/data-designer-config/tests/config/utils/test_visualization.py | Adds well-structured integration test test_display_sample_record_includes_plugin_column_types using SimpleNamespace and targeted patches to verify plugin column support. Correctly asserts presence of both primary plugin column and side-effect content in rendered HTML output. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[display_sample_record called] --> B[get_column_display_order]
B --> B1[built-in types list]
B --> B2[plugin_manager.get_plugin_column_types]
B2 --> B3[append plugin types]
B1 & B3 --> C[iterate display_order]
C --> D{col_type in<br/>_DEDICATED_DISPLAY_COL_TYPES?}
D -->|Yes: SEED_DATASET, IMAGE,<br/>LLM_CODE, VALIDATION, LLM_JUDGE| E[skip — rendered elsewhere]
D -->|No| F[get_columns_of_type col_type]
F --> G[append to non_code_columns]
G --> H[for col in non_code_columns]
H --> I{col.drop?}
I -->|True| H
I -->|False| J[add primary row to table]
J --> K{CUSTOM or<br/>is_plugin_column_type?}
K -->|True| L[iterate side_effect_columns]
L --> M{output_col in record?}
M -->|Yes| N[add side-effect row to table]
M -->|No| L
K -->|False| H
N --> L
Last reviewed commit: 3c66261
packages/data-designer-config/src/data_designer/config/utils/visualization.py
Outdated
Show resolved
Hide resolved
Add is_plugin_column_type() helper to column_types.py to avoid redundant plugin_manager.get_plugin_column_types() calls. Use it in display_sample_record() to show side_effect_columns for plugin column types, matching existing CUSTOM behavior. Signed-off-by: Yev Meyer <ymeyer@nvidia.com> Made-with: Cursor
Column configs store column_type as a Literal string, not a DataDesignerColumnType enum. Accept both str and enum to avoid AttributeError when calling .value on a plain string. Signed-off-by: Yev Meyer <ymeyer@nvidia.com> Made-with: Cursor
| table.add_row(col.name, convert_to_row_element(record[col.name])) | ||
| # Also display side_effect_columns for custom generators | ||
| if col.column_type == DataDesignerColumnType.CUSTOM: | ||
| if col.column_type == DataDesignerColumnType.CUSTOM or is_plugin_column_type(col.column_type): |
There was a problem hiding this comment.
is_plugin_column_type(col.column_type) invokes plugin_manager.get_column_generator_plugin_if_exists() once for every column in non_code_columns — including every built-in SAMPLER, LLM_TEXT, EXPRESSION, etc. column.
The full set of plugin column types is already known from get_column_display_order() (line 299), so building a frozenset once before the loop avoids repeated plugin-manager lookups:
display_order = get_column_display_order()
builtin_types = set(DataDesignerColumnType)
plugin_col_types = frozenset(ct for ct in display_order if ct not in builtin_types)
non_code_columns = []
for col_type in display_order:
if col_type not in _DEDICATED_DISPLAY_COL_TYPES:
non_code_columns.extend(config_builder.get_columns_of_type(col_type))
# ... later ...
if col.column_type == DataDesignerColumnType.CUSTOM or col.column_type in plugin_col_types:This derives the plugin-type set from a single source of truth without additional plugin-manager calls.
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/data-designer-config/src/data_designer/config/utils/visualization.py
Line: 312
Comment:
`is_plugin_column_type(col.column_type)` invokes `plugin_manager.get_column_generator_plugin_if_exists()` once for every column in `non_code_columns` — including every built-in SAMPLER, LLM_TEXT, EXPRESSION, etc. column.
The full set of plugin column types is already known from `get_column_display_order()` (line 299), so building a frozenset once before the loop avoids repeated plugin-manager lookups:
```python
display_order = get_column_display_order()
builtin_types = set(DataDesignerColumnType)
plugin_col_types = frozenset(ct for ct in display_order if ct not in builtin_types)
non_code_columns = []
for col_type in display_order:
if col_type not in _DEDICATED_DISPLAY_COL_TYPES:
non_code_columns.extend(config_builder.get_columns_of_type(col_type))
# ... later ...
if col.column_type == DataDesignerColumnType.CUSTOM or col.column_type in plugin_col_types:
```
This derives the plugin-type set from a single source of truth without additional plugin-manager calls.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
This "optimization" would reintroduce the maintenance burden of having to remember to update the frozenset.
packages/data-designer-config/src/data_designer/config/utils/visualization.py
Outdated
Show resolved
Hide resolved
packages/data-designer-config/src/data_designer/config/utils/visualization.py
Outdated
Show resolved
Hide resolved
packages/data-designer-config/tests/config/utils/test_visualization.py
Outdated
Show resolved
Hide resolved
packages/data-designer-config/tests/config/utils/test_visualization.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Looks good overall, nice fix! Left a few minor comments. The only thing I'd want a second opinion on is the side-effect column ordering change in visualization.py (side effects now render before the primary column, which flips existing CUSTOM behavior and is inconsistent with how validation/judge render after the columns they describe). @johnnygreco @nabinchha any thoughts on that?
packages/data-designer-config/src/data_designer/config/utils/visualization.py
Show resolved
Hide resolved
- Move test imports to module level per project convention - Replace plumbing-only test with one that renders a fake plugin column (with side-effect columns) to HTML and asserts the values appear in the output - Add parametrized test for is_plugin_column_type() verifying all built-in types return False for both enum and string forms Signed-off-by: Yev Meyer <ymeyer@nvidia.com> Made-with: Cursor
| non_code_columns = [] | ||
| for col_type in get_column_display_order(): | ||
| if col_type not in _DEDICATED_DISPLAY_COL_TYPES: | ||
| non_code_columns.extend(config_builder.get_columns_of_type(col_type)) |
There was a problem hiding this comment.
Silent display-order change for EXPRESSION columns
The refactor delegates column ordering entirely to get_column_display_order(). Before this PR the hardcoded list placed EXPRESSION second (after SAMPLER):
SAMPLER → EXPRESSION → LLM_TEXT → LLM_STRUCTURED → EMBEDDING → CUSTOM
With get_column_display_order() the canonical order is used, moving EXPRESSION much later:
SAMPLER → LLM_TEXT → LLM_STRUCTURED → EMBEDDING → EXPRESSION → CUSTOM
This is a user-visible change in the "Generated Columns" table for any config that has both EXPRESSION and LLM_TEXT/LLM_STRUCTURED columns. If the reordering is intentional (aligning with the canonical display order), it would be worth a line in the PR description or changelog so users know what to expect.
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/data-designer-config/src/data_designer/config/utils/visualization.py
Line: 298-301
Comment:
**Silent display-order change for EXPRESSION columns**
The refactor delegates column ordering entirely to `get_column_display_order()`. Before this PR the hardcoded list placed `EXPRESSION` second (after `SAMPLER`):
```
SAMPLER → EXPRESSION → LLM_TEXT → LLM_STRUCTURED → EMBEDDING → CUSTOM
```
With `get_column_display_order()` the canonical order is used, moving `EXPRESSION` much later:
```
SAMPLER → LLM_TEXT → LLM_STRUCTURED → EMBEDDING → EXPRESSION → CUSTOM
```
This is a user-visible change in the "Generated Columns" table for any config that has both EXPRESSION and LLM_TEXT/LLM_STRUCTURED columns. If the reordering is intentional (aligning with the canonical display order), it would be worth a line in the PR description or changelog so users know what to expect.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
this is arguably the correct change -- it aligns display_sample_record() with the canonical display order already used by config_builder.display() and analysis/reporting. The old hardcoded list was the outlier. cc: @johnnygreco @andreatgretel
Render primary column before side-effect columns, matching the existing CUSTOM column behavior. Avoids introducing display ordering discrepancies. Signed-off-by: Yev Meyer <ymeyer@nvidia.com> Made-with: Cursor
Replace hardcoded column type list with dynamic iteration over
get_column_display_order(), which already includes plugin-registered
types. Column types with dedicated display sections (SEED_DATASET,
IMAGE, LLM_CODE, VALIDATION, LLM_JUDGE) are excluded from the
"Generated Columns" table as before.
Also display side_effect_columns for plugin column types (identified
via plugin_manager), matching the existing behavior for CUSTOM
columns. Side-effect columns are rendered before the primary column
output.
Fixes #345