Comet is an open-source project and contributors are welcome to work on any issues at any time, but we find it helpful to have a roadmap for some of the major items that require coordination between contributors.
Iceberg table reads are now fully native, powered by a scan operator backed by Iceberg-rust (#2528). We anticipate major improvements in the next few releases, including bringing Iceberg table format V3 features (e.g., encryption) to the reader.
Comet has experimental support for Spark 4.0, but there is more work to do (#1637), such as enabling more Spark SQL tests and fully implementing ANSI support (#313) for all supported expressions.
Iceberg table scans support Dynamic Partition Pruning (DPP) filters generated by Spark's PlanDynamicPruningFilters
optimizer rule (#3349). However, we still need to bring this functionality to our Parquet reader. Furthermore,
Spark's PlanAdaptiveDynamicPruningFilters optimizer rule runs after Comet's rules, so DPP with Adaptive Query
Execution requires a redesign of Comet's plan translation. We are focused on implementing DPP to keep Comet competitive
on benchmarks that benefit from this feature, such as TPC-DS. This effort can be tracked at #3510.
In addition to the major initiatives above, we have the following ongoing areas of work:
- Adding support for more Spark expressions
- Moving more expressions to the datafusion-spark crate in the core DataFusion repository
- Performance tuning
- Nested type support improvements