Conversation
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@             Coverage Diff              @@
##               main    #3052      +/-   ##
=============================================
- Coverage     56.12%   20.79%   -35.33%
+ Complexity      976      493      -483
=============================================
  Files           119      175       +56
  Lines         11743    16165     +4422
  Branches       2251     2681      +430
=============================================
- Hits           6591     3362     -3229
- Misses         4012    12273     +8261
+ Partials       1140      530      -610
```

☔ View full report in Codecov by Sentry.
```rust
        file_source,
    )
    .with_projection_indices(Some(projection_vector))
    .with_table_partition_cols(partition_fields)
```
`with_table_partition_cols` is now derived automatically; see the upgrade guide:
https://datafusion.apache.org/library-user-guide/upgrading.html#refactoring-of-filesource-constructors-and-filescanconfigbuilder-to-accept-schemas-upfront
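For reference, a minimal sketch of the DF52 shape the guide describes (the import path and the two-argument `FileScanConfigBuilder::new` follow the upgrade guide and should be treated as assumptions; `object_store_url`, `file_source`, and `projection_vector` come from the surrounding code):

```rust
// Sketch only: partition columns now travel with the schema handed to the
// file source, so the builder no longer needs with_table_partition_cols.
use datafusion::datasource::physical_plan::FileScanConfigBuilder;

let config = FileScanConfigBuilder::new(object_store_url, file_source)
    .with_projection_indices(Some(projection_vector))
    .build();
```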
```rust
        make_decimal_scalar(a, precision, scale, &f)
    }
    ScalarValue::Float32(_) | ScalarValue::Float64(_) => Ok(ColumnarValue::Scalar(
        ScalarValue::try_from_array(&round(&[a.to_array()?, args[1].to_array(1)?])?, 0)?,
```
The direct `round` function is private now, so this creates a UDF instead.
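A hedged sketch of that workaround: `datafusion::functions::math::round()` returns the `round` `ScalarUDF`, which can stand in for the now-private kernel (shown at the expression level here; at execution time the UDF would instead be invoked through `ScalarUDF::invoke_with_args`):

```rust
use datafusion::functions::math::round;
use datafusion::prelude::{col, lit};

// Build round(a, 2) through the public UDF instead of the private kernel.
let expr = round().call(vec![col("a"), lit(2_i64)]);
```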
```rust
    // Determine the schema to use for ParquetSource
    let table_schema = if let Some(ref data_schema) = data_schema {
```
There is also `TableSchema::with_table_partition_cols`, which might make this easier.
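Roughly what that suggestion could look like; `TableSchema::from_file_schema` follows the upgrade guide's naming, and `fallback_schema` is a hypothetical stand-in for whatever the else branch currently selects:

```rust
// Sketch only: carry the partition columns on the TableSchema itself
// instead of branching when picking the schema for ParquetSource.
let table_schema = TableSchema::from_file_schema(
    data_schema.clone().unwrap_or(fallback_schema),
)
.with_table_partition_cols(partition_fields.clone());
```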
```rust
    // Check for CastColumnExpr and replace with spark_expr::Cast
    // CastColumnExpr is in datafusion_physical_expr::expressions
    if let Some(cast) = expr
        .as_any()
        .downcast_ref::<datafusion::physical_expr::expressions::CastColumnExpr>()
    {
```
Note from call: we are trading a CastColumnExpr for a Cast. The latter doesn't have the ability to handle struct casts, so we should check whether we are casting from struct to struct and, if so, not replace it with the Spark-compatible cast.

I think this is another example of why we need to unify CastColumnExpr and Cast: ideally you'd be able to cast from struct<c1: int, c2: int>[] -> struct<c1: text>[] while applying your Spark casting rules to the c1: int -> c1: text cast, but having DataFusion handle the list -> struct -> literal part.
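A small sketch of the proposed guard (an illustration, not the final patch):

```rust
use arrow::datatypes::DataType;

/// True when both sides of the cast are structs; in that case keep the
/// CastColumnExpr, since the Spark-compatible Cast cannot handle
/// struct-to-struct casts.
fn is_struct_to_struct(from: &DataType, to: &DataType) -> bool {
    matches!(from, DataType::Struct(_)) && matches!(to, DataType::Struct(_))
}
```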
Force-pushed from 5dabf9f to e639805.
I'm seeing issues like this. Checking if this can be addressed in Comet or in DataFusion. Created a native test to reproduce.
hm, somehow with DF52 the data array comes to [...]

UPD: [...]
This builds on the DF52 migration work in PR apache#3052 with fixes for failing Rust tests. Pushing as a draft PR for CI validation.

Fixes:
- Date32 +/- Int8/Int16/Int32 arithmetic: use SparkDateAdd/SparkDateSub UDFs, since DF52's arrow-arith only supports Date32 +/- Interval types
- Schema adapter nested types: replace equals_datatype with PartialEq (==) so struct field name differences are detected and spark_parquet_convert is invoked for field-name-based selection
- Schema adapter complex nested casts: add a fallback path (wrap_all_type_mismatches) when the default adapter fails for complex nested type casts (List<Struct>, Map)
- Schema adapter CastColumnExpr replacement: route Struct/List/Map casts through CometCastColumnExpr with spark_parquet_convert, and simple scalar casts through the Spark Cast
- Dictionary unpack tests: restructure polling to handle DF52's FilterExec batch coalescer, which accumulates rows before returning

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
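To illustrate the Date32 item above: Date32 stores days since the Unix epoch as an i32, so date-plus-integer can be computed on the raw day counts. A minimal sketch, not Comet's actual SparkDateAdd (which also follows Spark's null and overflow semantics):

```rust
use arrow::array::{Date32Array, Int32Array};

// Add a per-row day count to Date32 values via the underlying i32 days.
fn date32_add_days(dates: &Date32Array, days: &Int32Array) -> Date32Array {
    dates
        .iter()
        .zip(days.iter())
        .map(|(d, n)| match (d, n) {
            (Some(d), Some(n)) => Some(d + n),
            _ => None, // propagate nulls
        })
        .collect()
}
```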
Replaced by #3470
Which issue does this PR close?
Closes #3046.
Rationale for this change
What changes are included in this PR?
How are these changes tested?