Conversation
Add arrow_convertor.py with convert_arrow_table_to_dataframe(), which turns the GoodData /binary execution endpoint response into a pandas DataFrame matching the JSON-path output (MultiIndex rows/columns, totals uppercased, transposition handled via x-gdc-view-v1.isTransposed). Wire it into DataFrameFactory.for_exec_def_arrow() replacing the previous .to_pandas() stub.
Add compute_row_totals_indexes() which derives row_totals_indexes from the Arrow table metadata and execution dimension headers, matching the DataFrameMetadata produced by the JSON path. Update for_exec_def_arrow() to return (DataFrame, DataFrameMetadata) for API parity with for_exec_def(), making the two paths interchangeable. primary_labels_from_index/columns are empty dicts as the Arrow path does not support use_primary_labels_in_attributes.
- Compute primary attribute labels from Arrow table metadata and populate DataFrameMetadata.primary_labels_from_index/columns (was returning empty dicts previously) - Add DataFrameFactory.for_arrow_table() for callers who hold a pa.Table already (raw export REST path, future Flight RPC); accepts optional BareExecutionResponse for accurate row_totals_indexes - Make DataFrameMetadata.execution_response Optional to support the no-execution-response path - Export convert_arrow_table_to_dataframe from gooddata_pandas.__init__
Add a thin method that polls the raw export endpoint for an already-submitted export and returns Arrow IPC bytes. Reuses the existing _get_exported_content() polling loop; the caller is responsible for submitting the execution and export request.
- read_result_arrow(): drain response into BytesIO before releasing the connection, eliminating the fragile try/finally ordering - Remove unused _FULL_TYPES_MAPPER dead code from arrow_convertor.py - Remove stale docstring claiming primary_labels are always empty dicts (compute_primary_labels() now populates them) - Add for_arrow_table() to DataFrameFactory class docstring - convert_arrow_table_to_dataframe stub in __init__.py now raises ImportError with a helpful pyarrow install hint instead of being silently absent
[project.optional-dependencies] was incorrectly placed between dependencies and classifiers in both gooddata-pandas and gooddata-sdk pyproject.toml files, causing TOML to parse classifiers as belonging to optional-dependencies instead of [project]. This caused hatchling to fail with "Dependency of option 'classifiers' is invalid". Regenerate uv.lock after fixing the TOML structure. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The keyword form (executionResponse=) breaks the generated API client at runtime — __init__ expects execution_response as a positional arg. Revert to the original call and suppress the ty false-positive on the __new__ signature with type: ignore[invalid-argument-type].
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #1489 +/- ##
==========================================
+ Coverage 77.32% 77.55% +0.22%
==========================================
Files 227 229 +2
Lines 14768 15054 +286
==========================================
+ Hits 11420 11675 +255
- Misses 3348 3379 +31 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
gooddata_api_client is installed as a wheel in CI so ty cannot resolve its .pyi stubs, making the suppressions both ineffective and unnecessary there. Remove them to keep the code clean.
… paths - Parametrize test_compute_primary_labels against all 36 fixtures: verifies compute_primary_labels output against ground truth stored in meta.json, covering _compute_primary_labels_from_inline (identity branch) and _compute_primary_labels_from_fields across all cases. - Add test_primary_labels_from_inline_separate_column: exercises the branch where primaryLabelId != labelId and a separate column exists. - Add test_primary_labels_from_inline_fallback_identity: exercises the fallback branch where the primary label column is absent. - Add test_primary_labels_from_fields_skips_non_string: exercises the continue branch for non-string label_values / primary_label_values. - Add test_for_arrow_table_without_execution_response: tests DataFrameFactory.for_arrow_table with no execution_response, covering the no-server-needed path and verifying empty metadata fields. - Add test_arrow_converter_unknown_types_mapper: tests the ValueError raised for unrecognised types_mapper values.
packages/gooddata-sdk/pyproject.toml
Outdated
| ] | ||
|
|
||
| [project.optional-dependencies] | ||
| arrow = ["pyarrow>=16.1.0"] |
There was a problem hiding this comment.
Nitpick: this is a new dependency. Consider setting the threshold higher e.g., pyarrow>=23.0.1
Jira: CQ-105 risk: low
Jira: CQ-105 risk: low
| grand_totals_position=grand_totals_position, | ||
| ) | ||
|
|
||
| def for_exec_def_arrow( |
There was a problem hiding this comment.
Hm, I would probably do this as a parameter to the existing functions. Because ideally, we would like to use Arrow only moving forward. So some opt-in parameter that we can then flip the default value of would be a better migration experience. Wdyt?
There was a problem hiding this comment.
But the mappers make this a bit awkward 🤔
Jira: CQ-105 risk: low
No description provided.