Commit 5dabf9f

chore: Add Comet released artifacts and links to maven
1 parent c0d5d81 commit 5dabf9f

1 file changed

Lines changed: 26 additions & 87 deletions

docs/source/user-guide/latest/compatibility.md

@@ -69,6 +69,32 @@ this can be overridden by setting `spark.comet.regexp.allowIncompatible=true`.
6969
Comet's support for window functions is incomplete and known to be incorrect. It is disabled by default and
7070
should not be used in production. The feature will be enabled in a future release. Tracking issue: [#2721](https://github.com/apache/datafusion-comet/issues/2721).
7171

72+
## Round-Robin Partitioning
73+
74+
Comet's native shuffle implementation of round-robin partitioning (`df.repartition(n)`) is not compatible with
75+
Spark's implementation and is disabled by default. It can be enabled by setting
76+
`spark.comet.native.shuffle.partitioning.roundrobin.enabled=true`.
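
As a minimal sketch of how the setting above might be supplied, the snippet below renders it as `--conf key=value` arguments for `spark-submit`. The configuration key comes from the text; the helper names `comet_round_robin_conf` and `to_spark_submit_args` are hypothetical and exist only for this illustration.

```python
# Config key taken from the text above; the helper names are hypothetical.
ROUND_ROBIN_KEY = "spark.comet.native.shuffle.partitioning.roundrobin.enabled"


def comet_round_robin_conf(enabled: bool = True) -> dict:
    """Return the Spark conf entry that toggles Comet's native round-robin shuffle."""
    return {ROUND_ROBIN_KEY: "true" if enabled else "false"}


def to_spark_submit_args(conf: dict) -> list:
    """Render a conf dict as `--conf key=value` arguments for spark-submit."""
    args = []
    for key in sorted(conf):
        args.extend(["--conf", f"{key}={conf[key]}"])
    return args


if __name__ == "__main__":
    # Prints the arguments to append to a spark-submit command line.
    print(" ".join(to_spark_submit_args(comet_round_robin_conf())))
```

The same key/value pair could also be set when building a session, for example through `SparkSession.builder.config(key, value)` in PySpark.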
77+
78+
**Why the incompatibility exists:**
79+
80+
Spark's round-robin partitioning sorts rows by their binary `UnsafeRow` representation before assigning them to
81+
partitions. This ensures deterministic output for fault tolerance (task retries produce identical results).
82+
Comet uses the Arrow format internally, whose binary layout is completely different from that of `UnsafeRow`, making it
83+
impossible to match Spark's exact partition assignments.
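
The effect described above can be illustrated with a small pure-Python sketch. It is not Spark's or Comet's code: the two `encode_*` functions are hypothetical stand-ins for two different binary row layouts, and the rows are made up. The sketch shows that sorting before round-robin assignment removes the dependence on arrival order, and that the resulting assignment still depends on which byte encoding drives the sort.

```python
def round_robin(rows, num_partitions):
    """Assign rows to partitions purely by their position in the input."""
    return {row: i % num_partitions for i, row in enumerate(rows)}


def sorted_round_robin(rows, num_partitions, encode):
    """Sort rows by an encoded byte form first, then assign round-robin."""
    ordered = sorted(rows, key=encode)
    return round_robin(ordered, num_partitions)


# Hypothetical stand-ins for two different binary row layouts.
def encode_a(row):
    return repr(row).encode("utf-8")


def encode_b(row):
    return repr(row[::-1]).encode("utf-8")


first_attempt = [("a", 4), ("b", 3), ("c", 2), ("d", 1)]
retry = [("c", 2), ("a", 4), ("d", 1), ("b", 3)]  # same rows, different arrival order

# Without sorting, a retry that sees rows in another order assigns them differently.
unsorted_differs = round_robin(first_attempt, 2) != round_robin(retry, 2)

# With sorting, both attempts agree, as long as the same encoding is used.
sorted_matches = (
    sorted_round_robin(first_attempt, 2, encode_a)
    == sorted_round_robin(retry, 2, encode_a)
)

# A different encoding sorts the rows differently, so assignments no longer match.
encoding_matters = (
    sorted_round_robin(first_attempt, 2, encode_a)
    != sorted_round_robin(first_attempt, 2, encode_b)
)
```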
84+
85+
**Comet's approach:**
86+
87+
Instead of true round-robin assignment, Comet implements round-robin as hash partitioning on ALL columns. This
88+
achieves the same semantic goals:
89+
90+
- **Even distribution**: Rows are spread roughly evenly across partitions, provided the hash values vary enough;
91+
data with little variation can still produce skew
92+
- **Deterministic**: Same input always produces the same partition assignments (important for fault tolerance)
93+
- **No semantic grouping**: Unlike hash partitioning on specific columns, this doesn't group related rows together
94+
95+
The only difference is that Comet's partition assignments will differ from Spark's. When results are sorted,
96+
they will be identical to Spark's. Unsorted results may have a different row order.
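
A rough sketch of the idea in this section, under stated assumptions: `zlib.crc32` over `repr(row)` is a stand-in hash chosen only because it is stable across runs, not the hash function Comet or Spark uses, and the function names and sample rows are hypothetical. It compares position-based round-robin with hashing every column of the row, and checks that both yield the same rows once the collected output is sorted.

```python
import zlib


def hash_partition(rows, num_partitions):
    """Pick each row's partition from a hash of all of its column values."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # Stand-in hash: CRC32 over repr(row); not the hash Comet or Spark uses.
        digest = zlib.crc32(repr(row).encode("utf-8"))
        partitions[digest % num_partitions].append(row)
    return partitions


def round_robin_partition(rows, num_partitions):
    """Pick each row's partition from its position in the input."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions


def collect(partitions):
    """Concatenate partitions in partition order, like reading a shuffle output."""
    return [row for part in partitions for row in part]


rows = [(i, f"name-{i}") for i in range(10)]
by_hash = hash_partition(rows, 3)
by_position = round_robin_partition(rows, 3)

# Same input always lands in the same partitions.
deterministic = hash_partition(rows, 3) == by_hash

# A row's partition does not depend on where it appears in the input.
order_independent = (
    [sorted(p) for p in hash_partition(rows[::-1], 3)] == [sorted(p) for p in by_hash]
)

# Collected row order may differ between the two schemes, but sorted output matches.
same_rows_when_sorted = sorted(collect(by_hash)) == sorted(collect(by_position))
```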
97+
7298
## Cast
7399

74100
Cast operations in Comet fall into three levels of support:
@@ -84,104 +110,17 @@ Cast operations in Comet fall into three levels of support:
84110

85111
### Legacy Mode
86112

87-
<!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
88-
89113
<!--BEGIN:CAST_LEGACY_TABLE-->
90-
<!-- prettier-ignore-start -->
91-
| | binary | boolean | byte | date | decimal | double | float | integer | long | short | string | timestamp |
92-
|---|---|---|---|---|---|---|---|---|---|---|---|---|
93-
| binary | - | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | C | N/A |
94-
| boolean | N/A | - | C | N/A | U | C | C | C | C | C | C | U |
95-
| byte | U | C | - | N/A | C | C | C | C | C | C | C | U |
96-
| date | N/A | U | U | - | U | U | U | U | U | U | C | U |
97-
| decimal | N/A | C | C | N/A | - | C | C | C | C | C | C | U |
98-
| double | N/A | C | C | N/A | I | - | C | C | C | C | C | U |
99-
| float | N/A | C | C | N/A | I | C | - | C | C | C | C | U |
100-
| integer | U | C | C | N/A | C | C | C | - | C | C | C | U |
101-
| long | U | C | C | N/A | C | C | C | C | - | C | C | U |
102-
| short | U | C | C | N/A | C | C | C | C | C | - | C | U |
103-
| string | C | C | C | C | I | C | C | C | C | C | - | I |
104-
| timestamp | N/A | U | U | C | U | U | U | U | C | U | C | - |
105-
<!-- prettier-ignore-end -->
106-
107-
**Notes:**
108-
- **decimal -> string**: There can be formatting differences in some cases because Spark uses scientific notation where Comet does not
109-
- **double -> decimal**: There can be rounding differences
110-
- **double -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
111-
- **float -> decimal**: There can be rounding differences
112-
- **float -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
113-
- **string -> date**: Only supports years between 262143 BC and 262142 AD
114-
- **string -> decimal**: Does not support fullwidth Unicode digits (e.g. \\uFF10)
115-
or strings containing null bytes (e.g. \\u0000)
116-
- **string -> timestamp**: Not all valid formats are supported
117114
<!--END:CAST_LEGACY_TABLE-->
118115

119116
### Try Mode
120117

121-
<!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
122-
123118
<!--BEGIN:CAST_TRY_TABLE-->
124-
<!-- prettier-ignore-start -->
125-
| | binary | boolean | byte | date | decimal | double | float | integer | long | short | string | timestamp |
126-
|---|---|---|---|---|---|---|---|---|---|---|---|---|
127-
| binary | - | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | C | N/A |
128-
| boolean | N/A | - | C | N/A | U | C | C | C | C | C | C | U |
129-
| byte | U | C | - | N/A | C | C | C | C | C | C | C | U |
130-
| date | N/A | U | U | - | U | U | U | U | U | U | C | U |
131-
| decimal | N/A | C | C | N/A | - | C | C | C | C | C | C | U |
132-
| double | N/A | C | C | N/A | I | - | C | C | C | C | C | U |
133-
| float | N/A | C | C | N/A | I | C | - | C | C | C | C | U |
134-
| integer | U | C | C | N/A | C | C | C | - | C | C | C | U |
135-
| long | U | C | C | N/A | C | C | C | C | - | C | C | U |
136-
| short | U | C | C | N/A | C | C | C | C | C | - | C | U |
137-
| string | C | C | C | C | I | C | C | C | C | C | - | I |
138-
| timestamp | N/A | U | U | C | U | U | U | U | C | U | C | - |
139-
<!-- prettier-ignore-end -->
140-
141-
**Notes:**
142-
- **decimal -> string**: There can be formatting differences in some cases because Spark uses scientific notation where Comet does not
143-
- **double -> decimal**: There can be rounding differences
144-
- **double -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
145-
- **float -> decimal**: There can be rounding differences
146-
- **float -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
147-
- **string -> date**: Only supports years between 262143 BC and 262142 AD
148-
- **string -> decimal**: Does not support fullwidth Unicode digits (e.g. \\uFF10)
149-
or strings containing null bytes (e.g. \\u0000)
150-
- **string -> timestamp**: Not all valid formats are supported
151119
<!--END:CAST_TRY_TABLE-->
152120

153121
### ANSI Mode
154122

155-
<!-- WARNING! DO NOT MANUALLY MODIFY CONTENT BETWEEN THE BEGIN AND END TAGS -->
156-
157123
<!--BEGIN:CAST_ANSI_TABLE-->
158-
<!-- prettier-ignore-start -->
159-
| | binary | boolean | byte | date | decimal | double | float | integer | long | short | string | timestamp |
160-
|---|---|---|---|---|---|---|---|---|---|---|---|---|
161-
| binary | - | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | C | N/A |
162-
| boolean | N/A | - | C | N/A | U | C | C | C | C | C | C | U |
163-
| byte | U | C | - | N/A | C | C | C | C | C | C | C | U |
164-
| date | N/A | U | U | - | U | U | U | U | U | U | C | U |
165-
| decimal | N/A | C | C | N/A | - | C | C | C | C | C | C | U |
166-
| double | N/A | C | C | N/A | I | - | C | C | C | C | C | U |
167-
| float | N/A | C | C | N/A | I | C | - | C | C | C | C | U |
168-
| integer | U | C | C | N/A | C | C | C | - | C | C | C | U |
169-
| long | U | C | C | N/A | C | C | C | C | - | C | C | U |
170-
| short | U | C | C | N/A | C | C | C | C | C | - | C | U |
171-
| string | C | C | C | C | I | C | C | C | C | C | - | I |
172-
| timestamp | N/A | U | U | C | U | U | U | U | C | U | C | - |
173-
<!-- prettier-ignore-end -->
174-
175-
**Notes:**
176-
- **decimal -> string**: There can be formatting differences in some cases because Spark uses scientific notation where Comet does not
177-
- **double -> decimal**: There can be rounding differences
178-
- **double -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
179-
- **float -> decimal**: There can be rounding differences
180-
- **float -> string**: There can be differences in precision. For example, the input "1.4E-45" will produce 1.0E-45 instead of 1.4E-45
181-
- **string -> date**: Only supports years between 262143 BC and 262142 AD
182-
- **string -> decimal**: Does not support fullwidth Unicode digits (e.g. \\uFF10)
183-
or strings containing null bytes (e.g. \\u0000)
184-
- **string -> timestamp**: ANSI mode not supported
185124
<!--END:CAST_ANSI_TABLE-->
186125

187126
See the [tracking issue](https://github.com/apache/datafusion-comet/issues/286) for more details.
