Skip to content

Conversation

@GaneshPatil7517
Copy link
Contributor

This PR adds a Docker-based workflow for building DataFusion documentation, allowing contributors to build docs without installing additional host dependencies.

The container includes all required tools (Rust 1.92.0, cargo-depgraph, Sphinx, etc.) and provides a simple, reproducible build process.

This addresses #19777.

What's included:

  • Dockerfile with all doc build dependencies
  • docker-compose.yml for easy usage
  • Updated docs/README.md with clear instructions
  • Cross-platform compatibility (handles Windows/Unix line endings)

Usage:

docker-compose run --rm docs bash build.sh
# or
docker build -t datafusion-docs -f docs/Dockerfile .
docker run --rm -v $(pwd):/work datafusion-docs bash build.sh

- Installs Rust 1.92.0, cargo-depgraph, and all doc dependencies
- Provides isolated, reproducible build environment
- Fixes line endings for cross-platform compatibility (Windows/Unix)
- No additional host setup required beyond Docker
Copilot AI review requested due to automatic review settings January 17, 2026 06:44
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Jan 17, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds Docker-based infrastructure to enable building DataFusion documentation in a containerized environment without requiring manual installation of dependencies (Rust, cargo-depgraph, Sphinx, etc.).

Changes:

  • Adds Dockerfile with all necessary documentation build dependencies (Rust 1.92.0, Python packages, cargo-depgraph, graphviz)
  • Updates docker-compose.yml to include a docs service for easy documentation building
  • Updates docs/README.md with Docker-based workflow instructions as the recommended approach

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File Description
docs/Dockerfile Defines container image with all doc build dependencies, handles line endings for cross-platform compatibility
docker-compose.yml Adds docs service configuration for simplified Docker workflow
docs/README.md Documents Docker-based build option as recommended approach, maintains existing local installation instructions

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

GaneshPatil7517 and others added 6 commits January 17, 2026 12:26
Co-authored-by: Copilot <[email protected]>
- Creates separate entrypoint script instead of complex bash in CMD
- Improves readability and makes logic easier to test and modify
- Cleaner Dockerfile with ENTRYPOINT pattern
- Script intelligently checks for CRLF before converting
- Handles both build-time and runtime line ending fixes
docs/Dockerfile Outdated
# Install Rust 1.92.0
ENV RUST_VERSION=1.92.0
ENV RUSTUP_VERSION=1.27.1
RUN set -eux; \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're probably better off using a Rust image and installing Python as likely it'd be simpler

docs/Dockerfile Outdated

ENV PATH="/root/.cargo/bin:${PATH}"

# Install cargo-depgraph
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove these LLM comments they are not useful

# specific language governing permissions and limitations
# under the License.

services:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a docker-compose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - docker-compose is overkill for this use case. I've removed it. Users can simply run:
docker build -t datafusion-docs ./docs
docker run --rm -v $(pwd):/datafusion datafusion-docs

docs/Dockerfile Outdated
Comment on lines 60 to 61
# Copy the entire repository
COPY . /work
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copying the entire repository into the docker container does not seem correct

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I've removed the COPY . /work approach. Instead, the repository should be mounted as a volume at runtime. This keeps the image lean and allows users to build docs from their current working tree without rebuilding the image.

# Fix line endings when running (for volume-mounted files from Windows) only if needed
if find /work -name '*.sh' -type f -print0 | xargs -0 grep -Il $'\r' >/dev/null 2>&1; then
echo "Converting CRLF to LF in shell scripts..."
find /work -name '*.sh' -type f -exec sh -c 'tr -d "\r" < "$1" > "$1.tmp" && mv "$1.tmp" "$1"' _ {} \;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I question the need for this logic; why exactly do we need it in the entrypoint script and in the build stage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point - I've removed the CRLF conversion logic. Users should configure git properly (git config core.autocrlf input on Windows) rather than having the container fix line endings at runtime. This simplifies the setup significantly.

- Use rust:1.92.0-bookworm base image instead of python with manual Rust install
- Remove docker-compose.yml (unnecessary for this use case)
- Remove entrypoint.sh and CRLF conversion logic
- Use volume mount at runtime instead of copying entire repository

Usage:
  docker build -t datafusion-docs ./docs
  docker run --rm -v C:\Users\HP\Music\Apache_org\data\datafusion:/datafusion datafusion-docs
@GaneshPatil7517
Copy link
Contributor Author

Thanks for the feedback @Jefffrey! I've simplified the approach significantly:

Changes made:

✅ Switched to rust:1.92.0-bookworm base image (install Python on top instead of vice versa)
✅ Removed docker-compose.yml - overkill for this use case
✅ Removed entrypoint.sh and CRLF conversion logic - users should configure git properly
✅ Removed COPY . /work - use volume mount at runtime instead
✅ Removed unnecessary comments
Usage is now simply:
docker build -t datafusion-docs ./docs
docker run --rm -v $(pwd):/datafusion datafusion-docs

Tested locally and the docs build successfully. Please take another look!> docker build -t datafusion-docs ./docs
docker run --rm -v $(pwd):/datafusion datafusion-docs

Tested locally and the docs build successfully. Please take another look!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants