Add apache-datafusion provider skelton #64998

Open
gopidesupavan wants to merge 2 commits into apache:main from gopidesupavan:apache-datafusion-provider

@gopidesupavan
Member


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

@boring-cyborg boring-cyborg Bot added the area:dev-tools, area:providers, backport-to-v3-2-test, and kind:documentation labels Apr 10, 2026
@gopidesupavan gopidesupavan changed the title from "Add apache-datafusion provider" to "Add apache-datafusion provider skelton" Apr 10, 2026
@kaxil kaxil requested a review from Copilot April 10, 2026 19:55
Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@Srabasti
Contributor

Thanks for adding this new provider @gopidesupavan!!
It looks like the static checks are failing because the new provider apache.datafusion is not listed in the file below.
https://github.com/apache/airflow/blob/main/airflow-core/docs/extra-packages-ref.rst

Direct link:
https://github.com/apache/airflow/blob/main/airflow-core/docs/extra-packages-ref.rst#apache-software-extras

I suggest adding this row to the relevant table:

| apache.datafusion | pip install apache-airflow[apache.datafusion] | Apache.datafusion hooks and operators |

Contributor

@Srabasti Srabasti left a comment

Link "https://airflow.apache.org/docs/apache-airflow-providers-apache-datafusion/0.1.0" gives 404 Error, as below.

Image

I would be happy to update the link if you permit Sir! Was curious to learn more about this new provider, hence was checking out the links.

@gopidesupavan
Member Author

Link "https://airflow.apache.org/docs/apache-airflow-providers-apache-datafusion/0.1.0" gives 404 Error, as below.

Image I would be happy to update the link if you permit Sir! Was curious to learn more about this new provider, hence was checking out the links.

These will be published as part of the release, so no change is required for now.

@gopidesupavan
Member Author

Link "https://airflow.apache.org/docs/apache-airflow-providers-apache-datafusion/0.1.0" gives 404 Error, as below.

Image I would be happy to update the link if you permit Sir! Was curious to learn more about this new provider, hence was checking out the links.

Yes, take a look at https://datafusion.apache.org/python/ to see the functionality it provides.

+---------------------+-----------------------------------------------------+------------------------------------------------+
| apache-beam         | ``pip install 'apache-airflow[apache-beam]'``       | Apache Beam operators & hooks                  |
+---------------------+-----------------------------------------------------+------------------------------------------------+
| apache-datafusion   | ``pip install 'apache-airflow[apache-datafusion]'`` | Apache DataFusion provider package             |
Member

Alphabetical order: apache-cassandra should come before apache-datafusion here. The entry was inserted between apache-beam and apache-cassandra, which breaks the alphabetical sort the rest of the table follows. Move it down two rows so the order reads beam, cassandra, datafusion, drill.
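
The ordering rule in this comment can be checked mechanically. Below is a minimal sketch, assuming the extras names have already been extracted from the first column of the table. The helper name `first_out_of_order` and the two sample lists are hypothetical and are not part of the Airflow tooling.

```python
from typing import Optional, Tuple


def first_out_of_order(names: list[str]) -> Optional[Tuple[str, str]]:
    """Return the first adjacent pair that breaks ascending order, or None."""
    for previous, current in zip(names, names[1:]):
        if previous > current:
            return previous, current
    return None


# Row order as submitted in this PR (hypothetical extract of the first column).
submitted = ["apache-beam", "apache-datafusion", "apache-cassandra", "apache-drill"]
# Row order requested in the review.
requested = ["apache-beam", "apache-cassandra", "apache-datafusion", "apache-drill"]

print(first_out_of_order(submitted))  # ('apache-datafusion', 'apache-cassandra')
print(first_out_of_order(requested))  # None
```

The same check would apply to the labeler keys in .github/boring-cyborg.yml, since that file is said to follow the same alphabetical convention.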

Comment thread .github/boring-cyborg.yml
provider:apache-beam:
  - providers/apache/beam/**

provider:apache-datafusion:
Member

Same alphabetical-order issue as the docs table. Current order is apache-beam -> apache-datafusion -> apache-cassandra -> apache-drill. The labeler entries are alphabetical elsewhere in this file. Move the apache-datafusion block below apache-cassandra.


.. note::

    This provider is currently not ready and only contains the initial package skeleton.
Member

This file is marked "AUTOMATICALLY GENERATED" at the top, and the "currently not ready / package skeleton" sentence isn't in PROVIDER_README_TEMPLATE.rst.jinja2. It will be clobbered the next time the README is regenerated (at release time). If you want a durable disclaimer, the right place is either the template or the description: field in provider.yaml.

description: |
  `Apache DataFusion <https://datafusion.apache.org/>`__

state: not-ready
Member

Other in-tree skeleton providers (vespa, akeyless, common-ai, informatica) use state: ready with lifecycle: incubation even when the package only contains the skeleton. state: not-ready here will exclude the provider from regular builds and releases (see valid_states handling in dev/breeze/src/airflow_breeze/utils/packages.py). Is the intent to defer the first release until hooks/operators land? If so, fine. If you wanted "release at 0.1.0 as an incubating provider," switch to state: ready to match the others.

provider_info = "airflow.providers.apache.datafusion.get_provider_info:get_provider_info"

[tool.flit.module]
name = "airflow.providers.apache.datafusion"
Member

Other in-tree flit providers (cassandra, vespa, akeyless, ...) carry an explicit [tool.flit.sdist] block directly after [tool.flit.module], with the comment "Explicit sdist contents so the build does not rely on VCS information (flit 4.0 makes --no-use-vcs the default -- see pypa/flit#782)." This file is missing that block, so the sdist contents will depend on VCS state. Looks like the pyproject was generated from an older template -- regenerating (or copying the block from a recent provider like providers/apache/cassandra/pyproject.toml) should add it.



def test_example():
    assert True
Member

assert True will pass even if the provider metadata is wrong, the entry point is broken, or the package fails to import. Since this PR adds a new provider package, it would be worth asserting something real -- e.g. that airflow.providers.apache.datafusion.get_provider_info.get_provider_info() returns the expected package-name and name. That catches both the import path and the provider registration.
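
A sketch of what such a test might look like follows. To keep it runnable on its own, it uses a local stand-in for `get_provider_info`; in the real test the stand-in would be replaced by an import from `airflow.providers.apache.datafusion.get_provider_info`. The package name is taken from the docs URL quoted earlier in this thread, while the `name` value is an assumption.

```python
def get_provider_info() -> dict:
    """Stand-in for the generated get_provider_info() of the new provider.

    The values are assumptions for illustration; the real function is
    generated from provider.yaml.
    """
    return {
        "package-name": "apache-airflow-providers-apache-datafusion",
        "name": "Apache DataFusion",  # assumed display name
    }


def test_provider_info_identifies_datafusion():
    # Importing and calling the function exercises the import path;
    # the assertions check the registration metadata.
    info = get_provider_info()
    assert info["package-name"] == "apache-airflow-providers-apache-datafusion"
    assert info["name"] == "Apache DataFusion"


test_provider_info_identifies_datafusion()
```

Unlike `assert True`, this fails if the module cannot be imported or if the metadata drifts from what provider.yaml declares.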

@github-actions
Contributor

github-actions Bot commented May 28, 2026

uv.lock on main just moved via #67860 ("[main] Upgrade important CI environment"), commit 707e316 and this PR currently conflicts.

Quickest fix:

git fetch upstream main && git rebase upstream/main
rm uv.lock && uv lock
git add uv.lock && git rebase --continue
git push --force-with-lease

Automated nudge — ignore if you're not ready to rebase. This comment is updated in place on future uv.lock bumps.
