diff --git a/AGENTS.md b/AGENTS.md
index eeedbd8bc45ec..acb0cbccaeec6 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -2,33 +2,22 @@
 
 ## Developer Documentation
 
+- [Quick Start Setup](docs/source/contributor-guide/development_environment.md#quick-start)
+- [Testing Quick Start](docs/source/contributor-guide/testing.md#testing-quick-start)
+- [Before Submitting a PR](docs/source/contributor-guide/index.md#before-submitting-a-pr)
 - [Contributor Guide](docs/source/contributor-guide/index.md)
 - [Architecture Guide](docs/source/contributor-guide/architecture.md)
 
 ## Before Committing
 
-Before committing any changes, you **must** run the following checks and fix any issues:
+Before committing any changes, you MUST follow the instructions in
+[Before Submitting a PR](docs/source/contributor-guide/index.md#before-submitting-a-pr)
+and ensure the required checks listed there pass. Do not commit code that
+fails any of those checks.
 
-```bash
-cargo fmt --all
-cargo clippy --all-targets --all-features -- -D warnings
-```
-
-- `cargo fmt` ensures consistent code formatting across the project.
-- `cargo clippy` catches common mistakes and enforces idiomatic Rust patterns. All warnings must be resolved (treated as errors via `-D warnings`).
-
-Do not commit code that fails either of these checks.
+When creating a PR, you MUST follow the [PR template](.github/pull_request_template.md).
 
 ## Testing
 
-Run relevant tests before submitting changes:
-
-```bash
-cargo test --all-features
-```
-
-For SQL logic tests:
-
-```bash
-cargo test -p datafusion-sqllogictest
-```
+See the [Testing Quick Start](docs/source/contributor-guide/testing.md#testing-quick-start)
+for the recommended pre-PR test commands.
diff --git a/docs/source/contributor-guide/development_environment.md b/docs/source/contributor-guide/development_environment.md
index 77910b3540dc1..4d04d38c746f8 100644
--- a/docs/source/contributor-guide/development_environment.md
+++ b/docs/source/contributor-guide/development_environment.md
@@ -21,7 +21,38 @@
 
 This section describes how you can get started at developing DataFusion.
 
-## Windows setup
+## Quick Start
+
+For the fastest path to a working local environment, follow these steps
+from the repository root:
+
+```shell
+# 1. Install Rust (https://rust-lang.org/tools/install/) and verify the active toolchain with
+rustup show
+
+# 2. Install protoc 3.15+ (see details below)
+protoc --version
+
+# 3. Download test data used by examples and many tests
+git submodule update --init --recursive
+
+# 4. Build the workspace
+cargo build
+
+# 5. Verify that Rust integration tests can be run
+cargo test -p datafusion --test parquet_integration
+
+# 6. Verify that sqllogictests can run
+cargo test --profile=ci --test sqllogictests
+```
+
+Notes:
+
+- The pinned Rust version is defined in `rust-toolchain.toml`.
+- `protoc` is required to compile DataFusion from source.
+- Some tests and examples rely on git submodule data being present locally.
+
+## Windows Setup
 
 ```shell
 wget https://az792536.vo.msecnd.net/vms/VMBuild_20190311/VirtualBox/MSEdge/MSEdge.Win10.VirtualBox.zip
@@ -34,7 +65,7 @@ cargo build
 
 DataFusion has support for [dev containers](https://containers.dev/) which may be used for developing DataFusion in an isolated
 environment either locally or remote if desired. Using dev containers for developing
-DataFusion is not a requirement by any means but is available for those where doing local development could be tricky
+DataFusion is not a requirement but is available where doing local development could be tricky,
 such as with Windows and WSL2, those with older hardware, etc.
 
 For specific details on IDE support for dev containers see the documentation for [Visual Studio Code](https://code.visualstudio.com/docs/devcontainers/containers),
@@ -42,11 +73,11 @@ For specific details on IDE support for dev containers see the documentation for
 [Rust Rover](https://www.jetbrains.com/help/rust/connect-to-devcontainer.html), and
 [GitHub Codespaces](https://docs.github.com/en/codespaces/setting-up-your-project-for-codespaces/adding-a-dev-container-configuration/introduction-to-dev-containers).
 
-## Protoc Installation
+## `protoc` Installation
 
 Compiling DataFusion from sources requires an installed version of the protobuf compiler, `protoc`.
 
-On most platforms this can be installed from your system's package manager
+On most platforms this can be installed from your system's package manager. For example:
 
 ```
 # Ubuntu
@@ -71,7 +102,7 @@ libprotoc 3.15.0
 
 Alternatively a binary release can be downloaded from the [Release Page](https://github.com/protocolbuffers/protobuf/releases) or [built from source](https://github.com/protocolbuffers/protobuf/blob/main/src/README.md).
 
-## Bootstrap environment
+## Bootstrap Environment
 
 DataFusion is written in Rust and it uses a standard rust toolkit:
 
diff --git a/docs/source/contributor-guide/index.md b/docs/source/contributor-guide/index.md
index 2ee8a2aaac6cc..4ace4be49499b 100644
--- a/docs/source/contributor-guide/index.md
+++ b/docs/source/contributor-guide/index.md
@@ -32,8 +32,10 @@ community as well as get more familiar with Rust and the relevant codebases.
 
 ## Development Environment
 
-Setup your development environment [here](development_environment.md), and learn
-how to test the code [here](testing.md).
+Start with the [Development Environment Quick Start](development_environment.md#quick-start).
+
+For more detail, see the full [development environment guide](development_environment.md)
+and the [testing guide](testing.md).
 
 ## Finding and Creating Issues to Work On
 
@@ -99,6 +101,19 @@ If you are concerned that a larger design will be lost in a string of small PRs,
 
 Note all commits in a PR are squashed when merged to the `main` branch so there is one commit per PR after merge.
 
+## Before Submitting a PR
+
+Before submitting a PR, run the standard non-functional checks. PRs must pass
+these checks before they can be merged.
+
+```bash
+./dev/rust_lint.sh
+# use `--write` to automatically fix some formatting and lint errors
+# ./dev/rust_lint.sh --write --allow-dirty
+```
+
+You should also run any relevant commands from the [testing quick start](testing.md#testing-quick-start).
+
 ## Conventional Commits & Labeling PRs
 
 We generate change logs for each release using an automated process that will categorize PRs based on the title
diff --git a/docs/source/contributor-guide/testing.md b/docs/source/contributor-guide/testing.md
index 43b727211de77..6b8e4568ec8ab 100644
--- a/docs/source/contributor-guide/testing.md
+++ b/docs/source/contributor-guide/testing.md
@@ -23,6 +23,37 @@ Tests are critical to ensure that DataFusion is working properly and is not
 accidentally broken during refactorings. All new features should have test
 coverage and the entire test suite is run as part of CI.
 
+## Testing Quick Start
+
+While developing a feature or bug fix, best practice is to run the smallest set
+of tests that gives confidence for your change, then expand as needed.
+
+Initially, run the tests in the crates you changed. For example, if you made changes
+to files in `datafusion/optimizer/src`, run the corresponding crate tests:
+
+```shell
+cargo test -p datafusion-optimizer
+```
+
+Then, run the `sqllogictest` suite, which provides a strong speed–coverage tradeoff for development: it runs quickly while offering broad regression coverage across most SQL behavior in DataFusion.
+
+```shell
+cargo test --profile=ci --test sqllogictests
+```
+
+Finally, before submitting a PR, run the tests for the core `datafusion` and
+`datafusion-cli` crates:
+
+```shell
+cargo test -p datafusion
+cargo test -p datafusion-cli
+```
+
+Some integration tests require optional external services such as Docker-backed
+containers and may be skipped when those services are unavailable.
+
+## Testing Overview
+
 DataFusion has several levels of tests in its [Test Pyramid] and tries to follow
 the Rust standard [Testing Organization] described in [The Book].
 
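The pre-PR sequence this change documents (lint with `./dev/rust_lint.sh`, then the crates you changed, then `sqllogictests`, then `datafusion` and `datafusion-cli`) can be chained into one shell function. This is a minimal sketch, not part of this PR or the DataFusion repository: `step`, `pre_pr_checks`, and the `DRY_RUN` variable are hypothetical names, and it assumes a POSIX shell started from the repository root.

```shell
# Hypothetical helpers (not part of DataFusion): chain the documented
# pre-PR checks and stop at the first failing command.
# Assumes the current directory is the repository root.

# Print a command, then run it unless DRY_RUN=1 (hypothetical variable).
step() {
  printf '==> %s\n' "$*"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    return 0
  fi
  "$@"
}

# Usage: pre_pr_checks [crate...]
# Each argument is a crate you changed, e.g. datafusion-optimizer.
pre_pr_checks() {
  # Formatting and lint checks from "Before Submitting a PR"
  step ./dev/rust_lint.sh || return 1
  # Tests for each changed crate
  for changed_crate in "$@"; do
    step cargo test -p "$changed_crate" || return 1
  done
  # Broad SQL regression coverage
  step cargo test --profile=ci --test sqllogictests || return 1
  # Core crates, run last before submitting
  step cargo test -p datafusion || return 1
  step cargo test -p datafusion-cli
}
```

For example, `DRY_RUN=1 pre_pr_checks datafusion-optimizer` only prints the five commands; without `DRY_RUN=1` the same call runs them in order and returns a non-zero status at the first failure. Ordering the cheaper, narrower checks first follows the quick start's advice to start small and expand.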