-
Notifications
You must be signed in to change notification settings - Fork 1.9k
docs: add Docker-based workflow for building documentation #19863
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
docs: add Docker-based workflow for building documentation #19863
Conversation
- Installs Rust 1.92.0, cargo-depgraph, and all doc dependencies - Provides isolated, reproducible build environment - Fixes line endings for cross-platform compatibility (Windows/Unix) - No additional host setup required beyond Docker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds Docker-based infrastructure to enable building DataFusion documentation in a containerized environment without requiring manual installation of dependencies (Rust, cargo-depgraph, Sphinx, etc.).
Changes:
- Adds Dockerfile with all necessary documentation build dependencies (Rust 1.92.0, Python packages, cargo-depgraph, graphviz)
- Updates docker-compose.yml to include a
docsservice for easy documentation building - Updates docs/README.md with Docker-based workflow instructions as the recommended approach
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| docs/Dockerfile | Defines container image with all doc build dependencies, handles line endings for cross-platform compatibility |
| docker-compose.yml | Adds docs service configuration for simplified Docker workflow |
| docs/README.md | Documents Docker-based build option as recommended approach, maintains existing local installation instructions |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
- Creates separate entrypoint script instead of complex bash in CMD - Improves readability and makes logic easier to test and modify - Cleaner Dockerfile with ENTRYPOINT pattern - Script intelligently checks for CRLF before converting - Handles both build-time and runtime line ending fixes
docs/Dockerfile
Outdated
| # Install Rust 1.92.0 | ||
| ENV RUST_VERSION=1.92.0 | ||
| ENV RUSTUP_VERSION=1.27.1 | ||
| RUN set -eux; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're probably better off using a Rust image and installing Python as likely it'd be simpler
docs/Dockerfile
Outdated
|
|
||
| ENV PATH="/root/.cargo/bin:${PATH}" | ||
|
|
||
| # Install cargo-depgraph |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove these LLM comments they are not useful
docker-compose.yml
Outdated
| # specific language governing permissions and limitations | ||
| # under the License. | ||
|
|
||
| services: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need a docker-compose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point - docker-compose is overkill for this use case. I've removed it. Users can simply run:
docker build -t datafusion-docs ./docs
docker run --rm -v $(pwd):/datafusion datafusion-docs
docs/Dockerfile
Outdated
| # Copy the entire repository | ||
| COPY . /work |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copying the entire repository into the docker container does not seem correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I've removed the COPY . /work approach. Instead, the repository should be mounted as a volume at runtime. This keeps the image lean and allows users to build docs from their current working tree without rebuilding the image.
docs/entrypoint.sh
Outdated
| # Fix line endings when running (for volume-mounted files from Windows) only if needed | ||
| if find /work -name '*.sh' -type f -print0 | xargs -0 grep -Il $'\r' >/dev/null 2>&1; then | ||
| echo "Converting CRLF to LF in shell scripts..." | ||
| find /work -name '*.sh' -type f -exec sh -c 'tr -d "\r" < "$1" > "$1.tmp" && mv "$1.tmp" "$1"' _ {} \; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I question the need for this logic; why exactly do we need it in the entrypoint script and in the build stage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point - I've removed the CRLF conversion logic. Users should configure git properly (git config core.autocrlf input on Windows) rather than having the container fix line endings at runtime. This simplifies the setup significantly.
- Use rust:1.92.0-bookworm base image instead of python with manual Rust install - Remove docker-compose.yml (unnecessary for this use case) - Remove entrypoint.sh and CRLF conversion logic - Use volume mount at runtime instead of copying entire repository Usage: docker build -t datafusion-docs ./docs docker run --rm -v C:\Users\HP\Music\Apache_org\data\datafusion:/datafusion datafusion-docs
|
Thanks for the feedback @Jefffrey! I've simplified the approach significantly: Changes made: ✅ Switched to rust:1.92.0-bookworm base image (install Python on top instead of vice versa) Tested locally and the docs build successfully. Please take another look!> docker build -t datafusion-docs ./docs Tested locally and the docs build successfully. Please take another look! |
This PR adds a Docker-based workflow for building DataFusion documentation, allowing contributors to build docs without installing additional host dependencies.
The container includes all required tools (Rust 1.92.0, cargo-depgraph, Sphinx, etc.) and provides a simple, reproducible build process.
This addresses #19777.
What's included:
Usage: