SPJ with Bucket Partition Key: Error: Wrong class, expected java.lang.CharSequence, but was java.lang.Integer #15349

@ammarchalifah

Apache Iceberg version

Tried both:

  • iceberg-spark-runtime-3.5_2.12-1.10.1.jar
  • /usr/share/aws/iceberg/lib/iceberg-spark3-runtime.jar (from AWS EMR)

Query engine

Spark 3.5.6

Please describe the bug 🐞

I have a target table and an input table, both stored as Iceberg tables. Based on the table metadata, the partition spec of the target table is:

"partition-specs":[{"spec-id":0,"fields":[{"name":"collected_date","transform":"identity","source-id":29,"field-id":1000},{"name":"user_id_bucket","transform":"bucket[32]","source-id":2,"field-id":1001}]}]

And the partition spec of the input table is:

"partition-specs":[{"spec-id":0,"fields":[{"name":"collected_date","transform":"identity","source-id":29,"field-id":1000},{"name":"user_id_bucket","transform":"bucket[32]","source-id":2,"field-id":1001}]}]

The user_id field is a string column, but its bucketed partition value is an Integer. I'm triggering a storage-partitioned join (SPJ) by running a MERGE INTO that joins on user_id:

MERGE INTO target AS t
USING source AS s
ON t.post_id = s.post_id AND t.user_id = s.user_id
WHEN MATCHED AND (t.collected_at IS NULL OR t.collected_at <= s.collected_at)
    THEN UPDATE SET *
WHEN NOT MATCHED
    THEN INSERT *
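
For context on why user_id_bucket holds an Integer even though user_id is a string: per the Iceberg spec, bucket[N] hashes the UTF-8 bytes of a string with 32-bit Murmur3 (x86 variant, seed 0) and takes the non-negative hash modulo N. Below is a minimal pure-Python sketch of that computation. It is not the code Iceberg runs, and the function names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """32-bit Murmur3 (x86 variant); returns an unsigned 32-bit int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    n_blocks = len(data) // 4

    # Body: mix each little-endian 4-byte block into the hash state.
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32

    # Tail: the remaining 1 to 3 bytes, if any.
    tail = data[4 * n_blocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k

    # Finalization: fold in the length, then avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def bucket(value: str, num_buckets: int) -> int:
    """Bucket index for a string: (hash & Integer.MAX_VALUE) % N."""
    h = murmur3_x86_32(value.encode("utf-8"))
    return (h & 0x7FFFFFFF) % num_buckets


# "some-user-id" is a placeholder, not a value from the tables in this report.
print(bucket("some-user-id", 32))
```

The result is always an int in the range 0 to N-1, so a partition tuple for these tables carries an integer for user_id_bucket while the source column stays a string.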

The MERGE INTO fails with this error:

pyspark.errors.exceptions.captured.IllegalArgumentException: Wrong class, expected java.lang.CharSequence, but was java.lang.Integer, for object: 1

Here's the trace:

2026-02-17 15:06:12,666 - job - ERROR - Error: Wrong class, expected java.lang.CharSequence, but was java.lang.Integer, for object: 1
Traceback (most recent call last):
  File "/mnt/yarn/usercache/hadoop/appcache/application_1771340623606_0001/container_1771340623606_0001_01_000001/pyspark.zip/pyspark/sql/session.py", line 1631, in sql
  File "/mnt/yarn/usercache/hadoop/appcache/application_1771340623606_0001/container_1771340623606_0001_01_000001/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/mnt/yarn/usercache/hadoop/appcache/application_1771340623606_0001/container_1771340623606_0001_01_000001/pyspark.zip/pyspark/errors/exceptions/captured.py", line 185, in deco
pyspark.errors.exceptions.captured.IllegalArgumentException: Wrong class, expected java.lang.CharSequence, but was java.lang.Integer, for object: 1

Both the target and source tables are stored with format-version=2.
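
The report above does not list the session settings in use. For anyone trying to reproduce, here is a hedged sketch of the Spark and Iceberg properties commonly set to enable SPJ on Spark 3.4+; the exact set and values used in this job are an assumption. The helper names (`apply_conf`, `without_spj`) are hypothetical, and `spark` is expected to be an existing SparkSession.

```python
# ASSUMPTION: these are commonly used SPJ settings, not the values
# confirmed for the failing job in this report.
SPJ_CONF = {
    "spark.sql.sources.v2.bucketing.enabled": "true",
    "spark.sql.sources.v2.bucketing.pushPartValues.enabled": "true",
    "spark.sql.iceberg.planning.preserve-data-grouping": "true",
    "spark.sql.requireAllClusterKeysForCoPartition": "false",
    "spark.sql.sources.v2.bucketing.partiallyClusteredDistribution.enabled": "true",
}


def apply_conf(spark, conf: dict) -> None:
    """Set each property on the session's runtime configuration."""
    for key, value in conf.items():
        spark.conf.set(key, value)


def without_spj(conf: dict) -> dict:
    """Return a copy of conf with v2 bucketing (and therefore SPJ) turned off."""
    out = dict(conf)
    out["spark.sql.sources.v2.bucketing.enabled"] = "false"
    return out
```

Calling `apply_conf(spark, SPJ_CONF)` before the MERGE INTO, then repeating the run with `apply_conf(spark, without_spj(SPJ_CONF))`, may help show whether the failure only appears when the SPJ code path is taken. This is a diagnostic idea, not a confirmed workaround.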

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time

Labels: bug (Something isn't working)
