From e9fceb7d5a7b27bca9f23d1f17563727f02ebfd3 Mon Sep 17 00:00:00 2001
From: Andrew Lamb
Date: Mon, 16 Mar 2026 09:30:09 -0400
Subject: [PATCH 1/6] Improve getting started / testing guides for Humans and Agents

---
 AGENTS.md                                | 29 ++++---------
 .../development_environment.md           | 41 ++++++++++++++++---
 docs/source/contributor-guide/index.md   | 19 ++++++++-
 docs/source/contributor-guide/testing.md | 33 +++++++++++++++
 4 files changed, 93 insertions(+), 29 deletions(-)

diff --git a/AGENTS.md b/AGENTS.md
index eeedbd8bc45ec..9bb980af45aba 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -2,33 +2,18 @@

 ## Developer Documentation

+- [Quick Start Setup](docs/source/contributor-guide/development_environment.md#quick-start)
+- [Testing Quick Start](docs/source/contributor-guide/testing.md#testing-quick-start)
+- [Before Submitting a PR](docs/source/contributor-guide/index.md#before-submitting-a-pr)
 - [Contributor Guide](docs/source/contributor-guide/index.md)
 - [Architecture Guide](docs/source/contributor-guide/architecture.md)

 ## Before Committing

-Before committing any changes, you **must** run the following checks and fix any issues:
-
-```bash
-cargo fmt --all
-cargo clippy --all-targets --all-features -- -D warnings
-```
-
-- `cargo fmt` ensures consistent code formatting across the project.
-- `cargo clippy` catches common mistakes and enforces idiomatic Rust patterns. All warnings must be resolved (treated as errors via `-D warnings`).
-
-Do not commit code that fails either of these checks.
+See [Before Submitting a PR](docs/source/contributor-guide/index.md#before-submitting-a-pr)
+for the required formatting and lint checks.

 ## Testing

-Run relevant tests before submitting changes:
-
-```bash
-cargo test --all-features
-```
-
-For SQL logic tests:
-
-```bash
-cargo test -p datafusion-sqllogictest
-```
+See the [Testing quick start](docs/source/contributor-guide/testing.md#testing-quick-start)
+for the recommended pre-PR test commands.
diff --git a/docs/source/contributor-guide/development_environment.md b/docs/source/contributor-guide/development_environment.md
index 77910b3540dc1..0daea8f2c84ad 100644
--- a/docs/source/contributor-guide/development_environment.md
+++ b/docs/source/contributor-guide/development_environment.md
@@ -21,7 +21,38 @@

 This section describes how you can get started at developing DataFusion.

-## Windows setup
+## Quick Start
+
+For the fastest path to a working local environment, follow these steps
+from the repository root:
+
+```shell
+# 1. Install Rust (https://rust-lang.org/tools/install/) and verify version with
+rustup show
+
+# 2. Install protoc 3.15+ (see details below)
+protoc --version
+
+# 3. Download test data used by examples and many tests
+git submodule update --init --recursive
+
+# 4. Build the workspace
+cargo build
+
+# 5. Verify that Rust integration tests can be run
+cargo test -p datafusion --test parquet_integration
+
+# 6. Verify that sqllogictests can run
+cargo test --profile=ci --test sqllogictests
+```
+
+Notes:
+
+- The pinned Rust version is defined in `rust-toolchain.toml`.
+- `protoc` is required to compile DataFusion from source.
+- Some tests and examples rely on git submodule data being present locally.
+
+## Windows Setup

 ```shell
 wget https://az792536.vo.msecnd.net/vms/VMBuild_20190311/VirtualBox/MSEdge/MSEdge.Win10.VirtualBox.zip
@@ -34,7 +65,7 @@ cargo build

 DataFusion has support for [dev containers](https://containers.dev/) which may be used for developing DataFusion
 in an isolated environment either locally or remote if desired. Using dev containers for developing
-DataFusion is not a requirement by any means but is available for those where doing local development could be tricky
+DataFusion is not a requirement but is available where doing local development could be tricky
 such as with Windows and WSL2, those with older hardware, etc.

 For specific details on IDE support for dev containers see the documentation for [Visual Studio Code](https://code.visualstudio.com/docs/devcontainers/containers),
@@ -42,11 +73,11 @@ For specific details on IDE support for dev containers see the documentation for
 [Rust Rover](https://www.jetbrains.com/help/rust/connect-to-devcontainer.html), and
 [GitHub Codespaces](https://docs.github.com/en/codespaces/setting-up-your-project-for-codespaces/adding-a-dev-container-configuration/introduction-to-dev-containers).

-## Protoc Installation
+## `protoc` Installation

 Compiling DataFusion from sources requires an installed version of the protobuf compiler, `protoc`.

-On most platforms this can be installed from your system's package manager
+On most platforms this can be installed from your system's package manager. For example

 ```
 # Ubuntu
@@ -71,7 +102,7 @@ libprotoc 3.15.0

 Alternatively a binary release can be downloaded from the [Release Page](https://github.com/protocolbuffers/protobuf/releases) or [built from source](https://github.com/protocolbuffers/protobuf/blob/main/src/README.md).

-## Bootstrap environment
+## Bootstrap Environment

 DataFusion is written in Rust and it uses a standard rust toolkit:

diff --git a/docs/source/contributor-guide/index.md b/docs/source/contributor-guide/index.md
index 2ee8a2aaac6cc..5d54cfe13921f 100644
--- a/docs/source/contributor-guide/index.md
+++ b/docs/source/contributor-guide/index.md
@@ -32,8 +32,10 @@ community as well as get more familiar with Rust and the relevant codebases.

 ## Development Environment

-Setup your development environment [here](development_environment.md), and learn
-how to test the code [here](testing.md).
+Start with the [Development Environment Quick Start](development_environment.md#quick-start).
+
+For more detail, see the full [development environment guide](development_environment.md)
+and the [testing guide](testing.md).

 ## Finding and Creating Issues to Work On

@@ -99,6 +101,19 @@ If you are concerned that a larger design will be lost in a string of small PRs,

 Note all commits in a PR are squashed when merged to the `main` branch so there is one commit per PR after merge.

+## Before Submitting a PR
+
+Before submitting a PR, run the standard formatting and lint checks and fix any
+issues they report:
+
+```bash
+./ci/scripts/rust_fmt.sh
+./ci/scripts/rust_clippy.sh
+```
+
+These scripts are the same checks run in CI for Rust formatting and clippy.
+You should also run any relevant commands from the [testing quick start](testing.md#testing-quick-start).
+
 ## Conventional Commits & Labeling PRs

 We generate change logs for each release using an automated process that will categorize PRs based on the title
diff --git a/docs/source/contributor-guide/testing.md b/docs/source/contributor-guide/testing.md
index 43b727211de77..b519ba22c5f0b 100644
--- a/docs/source/contributor-guide/testing.md
+++ b/docs/source/contributor-guide/testing.md
@@ -23,6 +23,39 @@ Tests are critical to ensure that DataFusion is working properly and is not
 accidentally broken during refactorings. All new features
 should have test coverage and the entire test suite is run as part of CI.

+## Testing Quick Start
+
+While developing a feature or bug fix, best practice is to run the smallest set
+of tests that gives confidence for your change, then expand as needed.
+
+Initially, run the tests in the crates you changed. For example if you made changes
+to files in `datafusion-optimizer/src`, run the corresponding crate tests:
+
+```shell
+cargo test -p datafusion-optimizer
+```
+
+Then, run the `sqllogictest` suite, which is the main regression suite for SQL
+behavior and covers most DataFusion features.
+
+```shell
+# run sqllogictests
+cargo test --profile=ci --test sqllogictests
+```
+
+Finally, before submitting a PR, run the tests for the core `datafusion` and
+`datafusion-cli` crates
+
+```shell
+cargo test -p datafusion
+cargo test -p datafusion-cli
+```
+
+Some integration tests require optional external services such as Docker-backed
+containers and may skip when unavailable.
+
+## Testing Overview
+
 DataFusion has several levels of tests in its [Test Pyramid] and tries to follow
 the Rust standard [Testing Organization] described in [The Book].


From 2ed084f0097937e9a37cdc36c93668f06e99656f Mon Sep 17 00:00:00 2001
From: Andrew Lamb
Date: Mon, 16 Mar 2026 15:34:37 -0400
Subject: [PATCH 2/6] touchups

---
 AGENTS.md                                                | 2 +-
 docs/source/contributor-guide/development_environment.md | 4 ++--
 docs/source/contributor-guide/testing.md                 | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/AGENTS.md b/AGENTS.md
index 9bb980af45aba..04d8dc4d9b95b 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -15,5 +15,5 @@ for the required formatting and lint checks.

 ## Testing

-See the [Testing quick start](docs/source/contributor-guide/testing.md#testing-quick-start)
+See the [Testing Quick Start](docs/source/contributor-guide/testing.md#testing-quick-start)
 for the recommended pre-PR test commands.
diff --git a/docs/source/contributor-guide/development_environment.md b/docs/source/contributor-guide/development_environment.md
index 0daea8f2c84ad..4d04d38c746f8 100644
--- a/docs/source/contributor-guide/development_environment.md
+++ b/docs/source/contributor-guide/development_environment.md
@@ -27,7 +27,7 @@ For the fastest path to a working local environment, follow these steps
 from the repository root:

 ```shell
-# 1. Install Rust (https://rust-lang.org/tools/install/) and verify version with
+# 1. Install Rust (https://rust-lang.org/tools/install/) and verify the active toolchain with
 rustup show

 # 2. Install protoc 3.15+ (see details below)
@@ -77,7 +77,7 @@ For specific details on IDE support for dev containers see the documentation for

 Compiling DataFusion from sources requires an installed version of the protobuf compiler, `protoc`.

-On most platforms this can be installed from your system's package manager. For example
+On most platforms this can be installed from your system's package manager. For example:

 ```
 # Ubuntu
diff --git a/docs/source/contributor-guide/testing.md b/docs/source/contributor-guide/testing.md
index b519ba22c5f0b..022e07d9d0674 100644
--- a/docs/source/contributor-guide/testing.md
+++ b/docs/source/contributor-guide/testing.md
@@ -28,7 +28,7 @@ should have test coverage and the entire test suite is run as part of CI.
 While developing a feature or bug fix, best practice is to run the smallest set
 of tests that gives confidence for your change, then expand as needed.

-Initially, run the tests in the crates you changed. For example if you made changes
+Initially, run the tests in the crates you changed. For example, if you made changes
 to files in `datafusion-optimizer/src`, run the corresponding crate tests:

 ```shell
@@ -44,7 +44,7 @@ cargo test --profile=ci --test sqllogictests
 ```

 Finally, before submitting a PR, run the tests for the core `datafusion` and
-`datafusion-cli` crates
+`datafusion-cli` crates:

 ```shell
 cargo test -p datafusion

From 20db17d8c3eff7e0111188eed6c626b6ac0d83b8 Mon Sep 17 00:00:00 2001
From: Andrew Lamb
Date: Mon, 16 Mar 2026 15:36:26 -0400
Subject: [PATCH 3/6] clean

---
 docs/source/contributor-guide/testing.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/docs/source/contributor-guide/testing.md b/docs/source/contributor-guide/testing.md
index 022e07d9d0674..304df064dc8b3 100644
--- a/docs/source/contributor-guide/testing.md
+++ b/docs/source/contributor-guide/testing.md
@@ -39,7 +39,6 @@ Then, run the `sqllogictest` suite, which is the main regression suite for SQL
 behavior and covers most DataFusion features.

 ```shell
-# run sqllogictests
 cargo test --profile=ci --test sqllogictests
 ```


From 350b910b58b7064bd7251501b9cd671715dd2d0e Mon Sep 17 00:00:00 2001
From: Andrew Lamb
Date: Wed, 18 Mar 2026 14:44:36 -0400
Subject: [PATCH 4/6] Stronger AGENTS.md instructions

---
 AGENTS.md | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/AGENTS.md b/AGENTS.md
index 04d8dc4d9b95b..acb0cbccaeec6 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -10,8 +10,12 @@

 ## Before Committing

-See [Before Submitting a PR](docs/source/contributor-guide/index.md#before-submitting-a-pr)
-for the required formatting and lint checks.
+Before committing any changes, you MUST follow the instructions in
+[Before Submitting a PR](docs/source/contributor-guide/index.md#before-submitting-a-pr)
+and ensure the required checks listed there pass. Do not commit code that
+fails any of those checks.
+
+When creating a PR, you MUST follow the [PR template](.github/pull_request_template.md).

 ## Testing


From 37d8daa82ba078d0751e2d8bb73d78fc00960582 Mon Sep 17 00:00:00 2001
From: Andrew Lamb
Date: Thu, 19 Mar 2026 11:38:59 -0400
Subject: [PATCH 5/6] Update docs/source/contributor-guide/testing.md

Co-authored-by: Yongting You <2010youy01@gmail.com>
---
 docs/source/contributor-guide/testing.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/docs/source/contributor-guide/testing.md b/docs/source/contributor-guide/testing.md
index 304df064dc8b3..6b8e4568ec8ab 100644
--- a/docs/source/contributor-guide/testing.md
+++ b/docs/source/contributor-guide/testing.md
@@ -35,8 +35,7 @@ to files in `datafusion-optimizer/src`, run the corresponding crate tests:
 cargo test -p datafusion-optimizer
 ```

-Then, run the `sqllogictest` suite, which is the main regression suite for SQL
-behavior and covers most DataFusion features.
+Then, run the `sqllogictest` suite, which provides a strong speed–coverage tradeoff for development: it runs quickly while offering broad regression coverage across most SQL behavior in DataFusion.

 ```shell
 cargo test --profile=ci --test sqllogictests
 ```

From 6411347502ed76a66ae049fc2e972c05649a195a Mon Sep 17 00:00:00 2001
From: Andrew Lamb
Date: Thu, 19 Mar 2026 11:48:22 -0400
Subject: [PATCH 6/6] Update to use rust_lint rather

---
 docs/source/contributor-guide/index.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/source/contributor-guide/index.md b/docs/source/contributor-guide/index.md
index 5d54cfe13921f..4ace4be49499b 100644
--- a/docs/source/contributor-guide/index.md
+++ b/docs/source/contributor-guide/index.md
@@ -103,15 +103,15 @@ Note all commits in a PR are squashed when merged to the `main` branch so there

 ## Before Submitting a PR

-Before submitting a PR, run the standard formatting and lint checks and fix any
-issues they report:
+Before submitting a PR, run the standard non-functional checks. PRs must pass
+before merge.

 ```bash
-./ci/scripts/rust_fmt.sh
-./ci/scripts/rust_clippy.sh
+./dev/rust_lint.sh
+# use `--write` to automatically fix some formatting and lint errors
+# ./dev/rust_lint.sh --write --allow-dirty
 ```

-These scripts are the same checks run in CI for Rust formatting and clippy.
 You should also run any relevant commands from the [testing quick start](testing.md#testing-quick-start).

 ## Conventional Commits & Labeling PRs
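The testing guide in this series recommends running the tests of the crates you changed first. In the DataFusion repository a crate's directory and its Cargo package name do not always line up (the `datafusion-optimizer` package lives under `datafusion/optimizer/`, and `datafusion/core/` holds the `datafusion` package), so the sketch below reads package names from the nearest `Cargo.toml` instead of guessing from paths. It is a hypothetical helper, not part of DataFusion or of this patch series: `crate_for_path` and `changed_crates` are illustrative names, and it assumes you source it from the repository root, that your base branch is named `main`, and that each `Cargo.toml` declares `name = "..."` inside its `[package]` table.

```shell
# Hypothetical helper, not part of DataFusion. Source it from the repository
# root: it maps changed files to the Cargo packages that contain them.

# Print the [package] name from the nearest Cargo.toml above a file path.
# Prints nothing for files outside any crate (e.g. docs/).
crate_for_path() {
  _dir=$(dirname "$1")
  while [ "$_dir" != "." ] && [ "$_dir" != "/" ]; do
    if [ -f "$_dir/Cargo.toml" ]; then
      # Only accept `name = "..."` inside [package]; tables such as [lib]
      # or [[bench]] may also have a `name` key.
      awk '
        /^\[/ { in_pkg = ($0 == "[package]") }
        in_pkg && /^name[ \t]*=/ {
          gsub(/^name[ \t]*=[ \t]*"|".*$/, "")
          print
          exit
        }
      ' "$_dir/Cargo.toml"
      return 0
    fi
    _dir=$(dirname "$_dir")
  done
}

# Print each package touched since this branch forked from a base branch
# (default: main), including uncommitted edits to tracked files.
changed_crates() {
  _base=$(git merge-base HEAD "${1:-main}") || return 1
  git diff --name-only "$_base" | while IFS= read -r _path; do
    crate_for_path "$_path"
  done | sort -u
}
```

After sourcing the functions, `for c in $(changed_crates); do cargo test -p "$c"; done` runs each affected package's tests; pass another base as the first argument (for example `upstream/main`, if that is what your remote is called) when your local `main` is out of date. `cargo metadata` would be a sturdier source of package locations, at the cost of parsing its JSON output.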