Cursor.description reports TIMESTAMP_NTZ columns as 'timestamp' — the _NTZ distinction is lost on the SELECT path
Summary
For any SELECT-style query, cursor.description returns the same type_code ('timestamp') for columns/expressions of Spark type TIMESTAMP and Spark type TIMESTAMP_NTZ. The _NTZ distinction is erased before it reaches the DB-API caller, so downstream code that relies on cursor.description to decide whether a column is timezone-aware cannot tell the two apart.
This is the cursor-description analog of the SQLAlchemy reflection bug fixed in #295 / #296. That fix landed only on the DESCRIBE-driven reflection path; the SELECT path was not touched.
It is also structurally identical to the still-open #336 ("Cursor().description reports NULL (VOID) and INTERVAL fields as 'string'") — same root cause, different victim type.
Reproduction
from databricks import sql

with sql.connect(
    server_hostname=...,
    http_path=...,
    access_token=...,
) as conn:
    with conn.cursor() as cur:
        cur.execute(
            "SELECT "
            " CAST('2024-10-07 12:00:00' AS TIMESTAMP) AS tz_aware, "
            " CAST('2024-10-07 12:00:00' AS TIMESTAMP_NTZ) AS tz_naive"
        )
        for col in cur.description:
            print(col[0], "->", col[1])
Observed output (against a Databricks SQL Warehouse, connector 4.2.6):
tz_aware -> timestamp
tz_naive -> timestamp
The same collapse happens for SELECT <ntz_column> FROM <table> where the underlying column is genuinely TIMESTAMP_NTZ (verified via DESCRIBE), and for explicit CAST(... AS TIMESTAMP_NTZ) over any expression.
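A minimal sketch of the table-column case, run inside the with conn.cursor() block above (the table name demo_ntz is hypothetical):

# Hypothetical table; DESCRIBE reports the column as timestamp_ntz,
# but cursor.description still collapses it to 'timestamp'.
cur.execute("CREATE TABLE IF NOT EXISTS demo_ntz (t TIMESTAMP_NTZ)")
cur.execute("SELECT t FROM demo_ntz")
print(cur.description[0][1])  # -> 'timestamp'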
The repo's own unit-test fixture pins this behavior:
tests/unit/test_util.py
("timestamp_column", "timestamp", None, None, None, None, None),
("timestamp_ntz_column", "timestamp", None, None, None, None, None),
Expected
cursor.description[i][1] should be 'timestamp_ntz' for genuine TIMESTAMP_NTZ columns/expressions, and 'timestamp' for TIMESTAMP. This matches the string returned by DESCRIBE on the same column and the type names already used by the SQLAlchemy dialect's reflection map (where "timestamp_ntz" is a valid key — see databricks-sqlalchemy/src/databricks/sqlalchemy/_parse.py).
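Concretely, the reproduction above would then print:

tz_aware -> timestamp
tz_naive -> timestamp_ntz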
Root cause
In src/databricks/sql/backend/thrift_backend.py, _col_to_description derives the type_code from the Thrift TTypeId enum name:
# thrift_backend.py, _col_to_description
name = ttypes.TTypeId._VALUES_TO_NAMES[type_entry.primitiveEntry.type]
# Drop _TYPE suffix
cleaned_type = (name[:-5] if name.endswith("_TYPE") else name).lower()
The Thrift TTypeId enum has a single TIMESTAMP_TYPE; there is no TIMESTAMP_NTZ_TYPE. Both Spark TIMESTAMP and Spark TIMESTAMP_NTZ columns arrive over the wire with primitiveEntry.type = TIMESTAMP_TYPE, so after the suffix-strip + lowercase both end up as 'timestamp'.
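A self-contained illustration of the collapse (the real code reads the name from the Thrift enum; here it is inlined, since TIMESTAMP_TYPE is the only value either Spark type can arrive as):

# Both Spark TIMESTAMP and TIMESTAMP_NTZ map to the single enum member
# TIMESTAMP_TYPE, so the derived type_code is identical for both:
name = "TIMESTAMP_TYPE"
cleaned_type = (name[:-5] if name.endswith("_TYPE") else name).lower()
print(cleaned_type)  # 'timestamp', regardless of the original Spark type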
The same function already contains the mechanism needed to recover the lost distinction. Further down in _col_to_description:
# Extract variant type from field if available
if field is not None:
    try:
        if field.metadata and b"Spark:DataType:SqlName" in field.metadata:
            sql_type = field.metadata.get(b"Spark:DataType:SqlName")
            if sql_type == b"VARIANT":
                cleaned_type = "variant"
    except Exception as e:
        logger.debug(f"Could not extract variant type from field: {e}")
The Arrow field metadata key Spark:DataType:SqlName carries the true Spark SQL type name, including the TIMESTAMP_NTZ distinction. PR #560 used this hook to recover the VARIANT type that was being collapsed to 'string'. The same hook can recover TIMESTAMP_NTZ collapsed to 'timestamp'.
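For illustration, a stand-alone sketch of what that metadata looks like on an Arrow field (assumes pyarrow; the exact Arrow storage type the server uses is incidental here, only the metadata mechanism matters):

import pyarrow as pa

# The server attaches the true Spark SQL type name as field metadata;
# a TIMESTAMP_NTZ column arrives roughly like this:
field = pa.field(
    "tz_naive",
    pa.timestamp("us"),
    metadata={b"Spark:DataType:SqlName": b"TIMESTAMP_NTZ"},
)
print(field.metadata[b"Spark:DataType:SqlName"])  # b'TIMESTAMP_NTZ'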
Proposed fix
Extend the existing Spark:DataType:SqlName override in _col_to_description to also handle TIMESTAMP_NTZ:
if field is not None:
    try:
        if field.metadata and b"Spark:DataType:SqlName" in field.metadata:
            sql_type = field.metadata.get(b"Spark:DataType:SqlName")
            if sql_type == b"VARIANT":
                cleaned_type = "variant"
            elif sql_type == b"TIMESTAMP_NTZ":
                cleaned_type = "timestamp_ntz"
    except Exception as e:
        logger.debug(f"Could not extract type from field metadata: {e}")
The same shape would cleanly extend to the cases in #336 (VOID, INTERVAL) if their Spark:DataType:SqlName metadata is populated by DBR — worth verifying as part of this fix.
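A hedged sketch of that extension as a stand-alone helper (the b"VOID" and INTERVAL spellings are assumptions this issue has not verified against DBR):

def type_code_from_sql_name(cleaned_type, sql_type):
    # sql_type is the raw Spark:DataType:SqlName metadata value (bytes) or None.
    if sql_type == b"VARIANT":
        return "variant"
    if sql_type == b"TIMESTAMP_NTZ":
        return "timestamp_ntz"
    if sql_type == b"VOID":  # assumed spelling, unverified
        return "void"
    if sql_type is not None and sql_type.startswith(b"INTERVAL"):
        return "interval"  # assumed spelling family, unverified
    return cleaned_type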
The corresponding unit-test fixture in tests/unit/test_util.py would need to be updated to assert "timestamp_ntz" (rather than "timestamp") for the timestamp_ntz_column row.
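With the fix applied, the fixture rows shown above would read:

("timestamp_column", "timestamp", None, None, None, None, None),
("timestamp_ntz_column", "timestamp_ntz", None, None, None, None, None),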
Impact
Any consumer that uses cursor.description to classify columns as timezone-aware vs timezone-naive on Databricks must currently treat every timestamp column as tz-aware, or fall back to a separate DESCRIBE round trip to recover the _NTZ suffix. This is a real-world blocker for tools building on the connector — for example, any validation that requires "this time axis is tz-naive" cannot be expressed against a Databricks TIMESTAMP_NTZ column based on connector output alone.
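Until then, a minimal sketch of that DESCRIBE fallback (the helper name is hypothetical; assumes a plain table reference and that DESCRIBE returns col_name and data_type as its first two columns):

def timestamp_kinds(cursor, table):
    # Extra round trip to recover the _NTZ suffix that cursor.description drops.
    cursor.execute(f"DESCRIBE {table}")
    return {
        row[0]: row[1].lower()  # e.g. 'timestamp' or 'timestamp_ntz'
        for row in cursor.fetchall()
        if row[1] and row[1].lower().startswith("timestamp")
    }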
Bumping the connector pin does not help: 4.2.6 (the latest at the time of writing) still collapses the type. The TIMESTAMP_NTZ-related CHANGELOG entries since the #295 / #296 fix concern parameter binding (TimestampNTZParameter) and SQLAlchemy reflection; neither touches the cursor.description path.
Related
- #336 ("Cursor().description reports NULL (VOID) and INTERVAL fields as 'string'"): same root cause (_col_to_description derives the type from the TTypeId enum), different victim types (VOID, INTERVAL).
- TIMESTAMP_NTZ as a user-facing SQLAlchemy type: complementary, not a fix for the cursor path.
- PR #560: introduced the Spark:DataType:SqlName override hook used here.
Environment
- databricks-sql-connector 4.2.6
- Backend: Databricks SQL Warehouse
- Behavior is independent of Python and OS version.