Skip to content

feat(backend/kernel): wire TSparkParameter through to kernel bind_param#789

Merged
vikrantpuppala merged 2 commits into
mainfrom
feat/kernel-bind-param
May 19, 2026
Merged

feat(backend/kernel): wire TSparkParameter through to kernel bind_param#789
vikrantpuppala merged 2 commits into
mainfrom
feat/kernel-bind-param

Conversation

@vikrantpuppala
Copy link
Copy Markdown
Contributor

@vikrantpuppala vikrantpuppala commented May 15, 2026

Summary

Lifts the NotSupportedError the kernel backend currently raises for parametrized queries. The kernel-side PyO3 binding (Statement.bind_param) shipped in databricks-sql-kernel#18; this PR wires the connector's TSparkParameter shape through to it.

Before: cursor.execute("SELECT ?", [IntegerParameter(42)]) on use_kernel=True raised NotSupportedError. After: positional parameter binding works end-to-end.

What it does

bind_tspark_params(kernel_stmt, parameters) in type_mapping.py forwards each TSparkParameter to the kernel as (ordinal, value.stringValue, type):

  • ordinal is the 1-based position in the parameters list (matches kernel's SEA-wire convention).
  • value.stringValue is the connector's already-string-encoded value (or None for SQL NULL).
  • type is the SQL type name the connector produces ("INT", "DECIMAL(10,2)", etc.).

The connector's ordinal: bool flag is checked only to reject named bindings — kernel v0 doesn't accept named bindings on the SEA wire, so we surface that at the connector layer with a pointed NotSupportedError rather than letting the server reject.

What's supported

Every type the connector's native parameter classes emit:

  • BOOLEAN, TINYINT / SMALLINT / INT / BIGINT, FLOAT / DOUBLE
  • STRING, DATE, TIMESTAMP / TIMESTAMP_NTZ, INTERVAL
  • DECIMAL(p,s) (precision/scale carried in the SQL type string; kernel parses them)
  • VOID / Python None

What's not supported (known gaps, raises NotSupportedError)

  • Named bindings (cursor.execute(":foo = ?", [param_named("foo", ...)])) — kernel v0 SEA wire is positional-only.
  • Compound types (ARRAY / MAP / STRUCT) — kernel parser rejects; surfaces as KernelError(InvalidArgument).
  • BINARY — SEA wire limitation kernel-side; documented workaround is hex-encode + unhex(?) in SQL.

Test plan

  • 6 new unit tests in tests/unit/test_kernel_type_mapping.py for bind_tspark_params — positional forwarding, None / VOID, named-binding rejection, missing-type defaulting, empty list.
  • Replaced test_execute_command_rejects_parameters (which previously asserted the NotSupportedError) with test_execute_command_forwards_parameters_to_bind_param: stubs the kernel statement, asserts bind_param is called once per TSparkParameter in order with 1-based ordinals, and execute() fires after binding.
  • Full kernel unit suite: 106/106 pass (up from 100 on main).
  • 3 new live e2e tests in tests/e2e/test_kernel_backend.py against dogfood:
    • Mixed-type round-trip (IntegerParameter, StringParameter, BooleanParameter)
    • None parameter
    • DECIMAL parameter (precision/scale flows through the SQL type string)
  • Full kernel e2e suite: 14/14 pass against dogfood (locally with the kernel wheel built from databricks-sql-kernel main).

This pull request and its description were written by Isaac.

Lifts the NotSupportedError that execute_command currently raises
for parametrized queries. The kernel-side PyO3 binding for
Statement.bind_param landed in databricks-sql-kernel#18; this
commit wires the connector's TSparkParameter shape through to it.

Implementation:

- New `bind_tspark_params(kernel_stmt, parameters)` in
  type_mapping.py forwards each TSparkParameter to the kernel as
  `(ordinal, value.stringValue, type)`. ordinal is the 1-based
  position in the parameters list; the connector's `ordinal: bool`
  flag is checked only to reject named bindings (kernel v0 doesn't
  accept them on the wire).
- execute_command no longer raises on `parameters=[...]`. The
  query_tags branch stays — that's a separate gap.

Tests:

- 6 new unit tests in tests/unit/test_kernel_type_mapping.py for
  the mapper:
  - positional forwarding preserves ordering and (ordinal, value, type)
  - None value forwards as SQL NULL
  - VOID passes through verbatim (kernel parser ignores value for VOID)
  - named bindings raise NotSupportedError with a pointed message
  - missing TSparkParameter.type defaults to STRING (defensive)
  - empty parameters list is a no-op
- `test_execute_command_rejects_parameters` (which previously
  asserted the NotSupportedError) replaced with
  `test_execute_command_forwards_parameters_to_bind_param` — stubs
  the kernel statement and verifies bind_param is called once per
  TSparkParameter in order with 1-based ordinals, and execute
  fires after binding.
- 3 new e2e tests in tests/e2e/test_kernel_backend.py against
  dogfood:
  - mixed-type round-trip (INT, STRING, BOOLEAN) via the
    connector's native IntegerParameter/StringParameter/BooleanParameter
  - None parameter (VoidParameter → SQL NULL)
  - DECIMAL parameter with precision/scale carried in the SQL type
    string (auto-inferred — explicit-arg path has a pre-existing
    bug in native.py where format-args are swapped)

106/106 kernel unit tests pass.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
@vikrantpuppala vikrantpuppala force-pushed the feat/kernel-bind-param branch from aa8bd24 to 19c0bb6 Compare May 18, 2026 10:24
@vikrantpuppala vikrantpuppala changed the base branch from feat/kernel-backend to main May 18, 2026 10:24
@gopalldb
Copy link
Copy Markdown

P1 — Important (should fix)

  1. ArrayParameter / MapParameter are silently bound as NULL — data corruption risk
    type_mapping.py:113-147, cross-ref parameters/native.py:470-535

The PR docstring says compound types "route through the kernel parser which currently rejects them." Reality: ArrayParameter.as_tspark_param() and MapParameter.as_tspark_param() build
TSparkParameter(type="ARRAY"/"MAP", ordinal=True) with tsp.value = None and the real payload on tsp.arguments (a list of TSparkParameterValueArg). bind_tspark_params only reads param.value and param.type;
arguments is never inspected. So:

  • _tspark_param_value_str returns None
  • kernel_stmt.bind_param(1, None, "ARRAY") is called
  • Kernel binds a typed NULL — likely succeeds — and the user's list contents are silently discarded

cur.execute("SELECT array_contains(?, 1)", [ArrayParameter([1,2,3])]) returns NULL with no error from either side. This contradicts the documented behavior.

Fix: Detect compound types up front in bind_tspark_params and raise NotSupportedError:
if (param.type or "").split("(", 1)[0].upper() in {"ARRAY", "MAP", "STRUCT"}
or getattr(param, "arguments", None):
raise NotSupportedError(
f"Compound parameter types (got {param.type!r}) are not yet supported"
)
Add a unit test enforcing the guard.

  1. test_bind_tspark_params_void_passes_through pins a shape the connector never produces
    tests/unit/test_kernel_type_mapping.py:141-149

The test builds _mk_param(type="VOID", value="None") → p.value = TSparkParameterValue(stringValue="None"). But VoidParameter._tspark_param_value() (parameters/native.py:337-339) returns Python None, so the
real wire shape is TSparkParameter(value=None, type="VOID", ordinal=True) and _tspark_param_value_str returns None (because param.value is None). The test asserts [(1, "None", "VOID")] — production call is
(1, None, "VOID"). The _tspark_param_value_str docstring at type_mapping.py:99-110 is misleading on the same point. Code is correct; the test just doesn't verify it.

Fix: Change _mk_param(type="VOID", value="None") to _mk_param(type="VOID", value=None) and assert [(1, None, "VOID")]. Correct the docstring.

  1. Named-binding discriminator can mis-route when ordinal is None
    type_mapping.py:134

Thrift defaults ordinal=None, name=None. The check if getattr(param, "ordinal", None) is False and getattr(param, "name", None): only triggers when ordinal is exactly False. A TSparkParameter with
ordinal=None and a non-empty name (defensive caller, or a future code path that forgets to set ordinal) skips the guard and binds positionally with the name silently dropped. In-tree callers always set
ordinal=True/False, so unlikely today — but the guard is presented as a robustness net.

Fix:
name = getattr(param, "name", None)
if name and getattr(param, "ordinal", None) is not True:
raise NotSupportedError(...)

P2 — Minor

  • Misleading lazy-import comment (client.py:243-248) — claims to avoid eager pyarrow load, but KernelResultSet (imported at top) already imports description_from_arrow_schema from type_mapping. Move to
    module top or fix the comment.
  • NotSupportedError re-imported inside the loop body (type_mapping.py:135) — cached, but cleaner at module top.
  • Partial-bind state on per-param failure (client.py:240-273) — relies on stmt.close() releasing partially-bound state. Worth a one-line confirming comment.
  • SqlType.* refactor verified safe — SqlType.BOOLEAN == "boolean", SqlType.INT == "int" etc., all lowercase matching the prior literals. No cursor.description consumer break.
  • type=None defaults to "STRING" (type_mapping.py:142) — connector's native classes always set type, so this only triggers for hand-rolled TSparkParameter. Consider raising instead of silently defaulting, to
    surface upstream bugs.

vikrantpuppala added a commit that referenced this pull request May 19, 2026
Address review feedback on #789:

- ArrayParameter / MapParameter / StructParameter put their payload
  on TSparkParameter.arguments (not .value); the previous binding
  path forwarded value=None and silently bound a typed NULL. Reject
  compound types up front with NotSupportedError so callers see a
  clear error instead of silent data loss.
- Tighten the named-binding guard to fire whenever a name is set
  and ordinal is not exactly True (Thrift defaults ordinal to None;
  the prior check only triggered on ordinal=False).
- Correct test_bind_tspark_params_void_passes_through: VoidParameter
  yields value=None on the wire, so the production call is
  (1, None, "VOID"), not (1, "None", "VOID"). Fix the misleading
  docstring on _tspark_param_value_str to match.
- Remove the misleading lazy-import comment in client.py — pyarrow
  is already loaded transitively via KernelResultSet. Move
  bind_tspark_params and NotSupportedError imports to module top.

Co-authored-by: Isaac
@vikrantpuppala
Copy link
Copy Markdown
Contributor Author

Thanks @gopalldb — addressed in f384f1a. Summary:

P1 #1 — compound types silently binding NULL (data loss)

You were right, this was a real bug. Added a guard in bind_tspark_params that rejects ARRAY / MAP / STRUCT (and any param with arguments set) up front with NotSupportedError. Detects both the SQL-type name (strips (...) and <...> suffixes so MAP(string,int) and STRUCT<a:int> are caught) and the presence of arguments so a hand-rolled compound TSparkParameter is also covered. 6 parametrized + 1 standalone unit tests enforce the guard.

P1 #2 — VOID test pinned the wrong shape

Confirmed: VoidParameter._tspark_param_value() returns Python None, so the production call is (1, None, "VOID"). Updated the test and corrected the misleading docstring on _tspark_param_value_str.

P1 #3 — named-binding discriminator

Tightened from ordinal is False and namename and ordinal is not True, so a future caller that forgets to set ordinal=True with a name set still gets rejected instead of silently dropping the name. Added a defensive test for the ordinal=None case.

P2 cleanups

  • Removed the misleading lazy-import comment in client.py. You're right that pyarrow is already loaded transitively via KernelResultSet (which imports description_from_arrow_schema from type_mapping at module top). Import moved to module top.
  • NotSupportedError moved out of the loop body to module top in type_mapping.py.

P2 not changed (with rationale)

  • type=None defaulting to "STRING": left as-is. The existing test test_bind_tspark_params_missing_type_defaults_to_string pins this defensive behavior, and the connector's native classes always set type so this only triggers for hand-rolled TSparkParameter. Happy to flip to raising if you'd prefer to surface upstream bugs more aggressively.
  • Partial-bind state confirming comment: not added. stmt.close() in the finally block is the standard kernel lifecycle pattern used throughout this file; calling it out only here felt asymmetric.

Tests

  • 35/35 in tests/unit/test_kernel_type_mapping.py pass (was 28; added 7 new cases).
  • Full kernel unit suite: 114/114 (was 106).
  • E2E suite unchanged from PR description.

Address review feedback on #789:

- ArrayParameter / MapParameter / StructParameter put their payload
  on TSparkParameter.arguments (not .value); the previous binding
  path forwarded value=None and silently bound a typed NULL. Reject
  compound types up front with NotSupportedError so callers see a
  clear error instead of silent data loss.
- Tighten the named-binding guard to fire whenever a name is set
  and ordinal is not exactly True (Thrift defaults ordinal to None;
  the prior check only triggered on ordinal=False).
- Correct test_bind_tspark_params_void_passes_through: VoidParameter
  yields value=None on the wire, so the production call is
  (1, None, "VOID"), not (1, "None", "VOID"). Fix the misleading
  docstring on _tspark_param_value_str to match.
- Remove the misleading lazy-import comment in client.py — pyarrow
  is already loaded transitively via KernelResultSet. Move
  bind_tspark_params and NotSupportedError imports to module top.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
@vikrantpuppala vikrantpuppala force-pushed the feat/kernel-bind-param branch 2 times, most recently from f384f1a to b60b3c9 Compare May 19, 2026 09:25
@vikrantpuppala vikrantpuppala merged commit 98da799 into main May 19, 2026
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants