Improve Docker-in-Docker (DinD) support #248

Merged
jarlah merged 14 commits into main from feat/docker-in-docker-support
Mar 17, 2026

@jarlah
Member

@jarlah jarlah commented Mar 16, 2026

Summary

Adds comprehensive Docker-in-Docker / Docker-outside-of-Docker support so that tests can run inside a container with a shared Docker socket:

docker run -v /var/run/docker.sock:/var/run/docker.sock -v $(pwd):/app -w /app elixir:1.17 bash -c "mix deps.get && MIX_ENV=test mix test"

Core changes

  • Container networking mode: At startup, detects if the bridge gateway is unreachable (hairpin NAT) and switches to connecting to containers via their internal IPs on the bridge network. New APIs get_host(container) and get_port(container, port) return the correct host/port based on the connection mode.
  • Ryuk internal IP fallback: When connecting to Ryuk via gateway:mapped_port fails, falls back to container_ip:8080 (internal port) since both containers are on the same bridge network.
  • Connect timeout: :gen_tcp.connect for Ryuk now has a 5-second timeout instead of the default :infinity, preventing indefinite hangs.
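
The startup probe described above could look roughly like the following. This is a minimal sketch, not the code merged in this PR: the module name `ModeProbeSketch`, the function `detect_mode/3`, and the choice of which gateway port to probe are assumptions for illustration. Only `:gen_tcp.connect/4` with an explicit timeout is taken from the description.

```elixir
defmodule ModeProbeSketch do
  @moduledoc "Hypothetical sketch of the gateway probe; not the library's implementation."

  # Same 5-second bound the PR applies to the Ryuk connection.
  @connect_timeout_ms 5_000

  # Returns :gateway when a TCP connection to gateway:port succeeds within the
  # timeout, otherwise :container_ip (connect via internal IPs on the bridge).
  # Which port is probed is an assumption of this sketch.
  def detect_mode(gateway, port, timeout \\ @connect_timeout_ms) do
    opts = [:binary, active: false]

    case :gen_tcp.connect(String.to_charlist(gateway), port, opts, timeout) do
      {:ok, socket} ->
        :gen_tcp.close(socket)
        :gateway

      {:error, _reason} ->
        :container_ip
    end
  end
end
```

Passing the timeout explicitly matters here: without it the probe itself could hang on an unreachable gateway, which is the failure the connect-timeout bullet addresses.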

Host detection improvements

  • TESTCONTAINERS_HOST_OVERRIDE env var / tc.host.override property to explicitly set the Docker host, bypassing auto-detection (matches testcontainers-java)
  • TESTCONTAINERS_DOCKER_SOCKET_OVERRIDE env var to override the Docker socket path mounted into Ryuk
  • /proc/net/route fallback: Parses the kernel route table for the default gateway when bridge network inspection fails
  • Better container detection: Checks /proc/1/cgroup for docker|kubepods|lxc|containerd patterns when /.dockerenv doesn't exist
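
For readers unfamiliar with /proc/net/route: each row holds whitespace-separated columns (interface, destination, gateway, ...), and addresses are 8 hex digits in little-endian byte order. A sketch of extracting the default gateway from that format follows. The module `RouteSketch` and its function names are hypothetical and are not the functions added in this PR.

```elixir
defmodule RouteSketch do
  @moduledoc "Hypothetical sketch of /proc/net/route parsing; not the library's implementation."

  # Takes the file contents as a string and returns {:ok, "a.b.c.d"} for the
  # first default route (destination 00000000) with a non-zero gateway,
  # or :error when no such route exists.
  def default_gateway(route_table) when is_binary(route_table) do
    route_table
    |> String.split("\n", trim: true)
    # The first line is the column header.
    |> Enum.drop(1)
    |> Enum.map(&String.split/1)
    |> Enum.find_value(:error, fn
      [_iface, "00000000", "00000000" | _rest] -> nil
      [_iface, "00000000", gateway_hex | _rest] -> decode_gateway(gateway_hex)
      _other -> nil
    end)
  end

  # The kernel prints the address little-endian, so the bytes are reversed.
  defp decode_gateway(hex) do
    case Base.decode16(hex, case: :mixed) do
      {:ok, <<d, c, b, a>>} -> {:ok, "#{a}.#{b}.#{c}.#{d}"}
      _ -> nil
    end
  end
end
```

For example, a gateway column of `010011AC` decodes to `172.17.0.1`, the usual default bridge gateway.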

Container module updates

All built-in container modules updated to use get_host(container) / get_port(container, port):
postgres, mysql, redis, cassandra, ceph, minio, rabbitmq, emqx, kafka, toxiproxy, selenium

PortWaitStrategy and HttpWaitStrategy also updated to use the DinD-aware APIs.

Custom network handling

Containers on custom Docker networks fall back to docker_hostname:mapped_port since the test container can't reach them via internal IP (different network).
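
The resolution rule described in this PR (internal IP and internal port in container networking mode, mapped port on the Docker host otherwise or when the container sits on a custom network) can be summarized in a small pure function. This is a sketch under assumptions: `EndpointSketch.endpoint/4` and the plain-map container shape with `:ip_address`, `:network`, and `:mapped_ports` keys are hypothetical stand-ins, not the library's `get_host`/`get_port` implementation.

```elixir
defmodule EndpointSketch do
  @moduledoc "Hypothetical sketch of host/port resolution; not the library's implementation."

  # mode: :gateway (standard) or :container_ip (container networking mode).
  # container: map with assumed keys :ip_address, :network (nil on the default
  # bridge) and :mapped_ports (%{internal_port => mapped_port}).

  # Container networking mode and default bridge: talk to the container directly.
  def endpoint(:container_ip, %{network: nil, ip_address: ip}, _docker_host, internal_port)
      when is_binary(ip) do
    {ip, internal_port}
  end

  # Standard mode, or a container on a custom network: use the mapped port.
  def endpoint(_mode, %{mapped_ports: mapped}, docker_host, internal_port) do
    {docker_host, Map.fetch!(mapped, internal_port)}
  end
end
```

With a container `%{ip_address: "172.17.0.3", network: nil, mapped_ports: %{5432 => 32768}}`, container networking mode resolves port 5432 to `{"172.17.0.3", 5432}`, while standard mode resolves it to the Docker host and port 32768.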

Test updates

  • Tests that hardcoded 127.0.0.1 or localhost updated to use get_host(container) / get_port(container, port)
  • Tests incompatible with DooD tagged with @tag :dood_limitation and auto-excluded when running inside a container
  • 11 new unit tests for route parsing and container detection logic
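
Tag-based exclusion of this kind is typically wired up in test/test_helper.exs. The sketch below shows one way to do it. The module `DoodTagsSketch` is hypothetical, and the `/.dockerenv` check stands in for whatever detection function the project actually calls.

```elixir
defmodule DoodTagsSketch do
  @moduledoc "Hypothetical sketch of tag-based exclusion; not the project's test helper."

  # Tags to exclude, given whether the test run happens inside a container.
  def excluded_tags(true), do: [:dood_limitation]
  def excluded_tags(false), do: []
end

# In test/test_helper.exs (the detection call is an assumption):
#
#   in_container? = File.exists?("/.dockerenv")
#   ExUnit.start(exclude: DoodTagsSketch.excluded_tags(in_container?))
#
# In a test module, a DooD-incompatible test is then marked with:
#
#   @tag :dood_limitation
#   test "containers talk over a custom network" do
#     ...
#   end
```

Excluded tests remain runnable on demand with `mix test --include dood_limitation`.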

DooD limitations (tagged tests)

These tests are auto-excluded in DooD environments:

  • Custom network tests (toxiproxy integration, network hostname) — test container not on the custom network
  • Kafka integration — advertised.listeners requires the container IP at startup, not dynamically configurable in KRaft mode
  • Host network mode — binds to Docker host, not the test container
  • Selenium / EMQX custom config — slow startup timeouts in nested Docker

Test plan

  • 11 new unit tests for route parsing and container detection
  • DooD verification: all tests pass (with expected exclusions) inside docker run
  • Standard mode: no regressions when running directly on host

🤖 Generated with Claude Code

jarlah and others added 13 commits March 16, 2026 19:22
- Add TESTCONTAINERS_HOST_OVERRIDE env var and tc.host.override property
  to allow explicit host override, bypassing auto-detection
- Add TESTCONTAINERS_DOCKER_SOCKET_OVERRIDE env var to override the
  Docker socket path mounted into Ryuk
- Improve gateway fallback by parsing /proc/net/route when bridge
  gateway inspection fails, instead of falling back to localhost
- Improve container detection by checking /proc/1/cgroup as fallback
  when /.dockerenv doesn't exist (handles more container runtimes)
- Add unit tests for route parsing and container detection logic

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
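
The container detection fallback from the commit above can be sketched as follows. The module `ContainerDetectSketch` and its function names are hypothetical. The `/.dockerenv` check, the `/proc/1/cgroup` path, and the `docker|kubepods|lxc|containerd` pattern come from the PR description.

```elixir
defmodule ContainerDetectSketch do
  @moduledoc "Hypothetical sketch of container detection; not the library's implementation."

  # True when the cgroup file contents mention a known container runtime.
  def cgroup_indicates_container?(cgroup_contents) when is_binary(cgroup_contents) do
    Regex.match?(~r/docker|kubepods|lxc|containerd/, cgroup_contents)
  end

  # Checks /.dockerenv first and falls back to /proc/1/cgroup. The paths are
  # parameters only so the function can be exercised against fixture files.
  def in_container?(dockerenv_path \\ "/.dockerenv", cgroup_path \\ "/proc/1/cgroup") do
    File.exists?(dockerenv_path) or
      case File.read(cgroup_path) do
        {:ok, contents} -> cgroup_indicates_container?(contents)
        {:error, _reason} -> false
      end
  end
end
```

Keeping the pattern match in a function that takes a string makes it straightforward to unit test without touching /proc.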
:gen_tcp.connect had no timeout (default :infinity), causing an
indefinite hang when the bridge gateway IP is unreachable due to hairpin
NAT issues in DooD environments. Add a 5-second connect timeout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When connecting to Ryuk via docker_hostname:mapped_port fails (common
in DooD environments due to hairpin NAT), fall back to connecting via
the container's internal IP on its internal port (8080). Both the test
runner container and Ryuk are on the same bridge network by default,
so direct IP access works reliably.

Also extracts try_tcp_connect/2 helper to reduce duplication.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
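
A sketch of the fallback order described in the commit above: try the primary address first, then the internal address, each attempt bounded by a timeout. The module `RyukConnectSketch` and its `connect/3` and `try_connect/3` functions are hypothetical. They are not the `try_tcp_connect/2` helper this commit extracts, whose exact signature is not shown here.

```elixir
defmodule RyukConnectSketch do
  @moduledoc "Hypothetical sketch of connect-with-fallback; not the library's implementation."

  @timeout_ms 5_000

  # primary:  {docker_hostname, mapped_port}
  # fallback: {container_ip, internal_port}, e.g. {ip, 8080} for Ryuk
  # Returns {:ok, socket} from the first address that accepts the connection,
  # or the {:error, reason} of the fallback attempt.
  def connect({host, port}, {fallback_host, fallback_port}, timeout \\ @timeout_ms) do
    with {:error, _reason} <- try_connect(host, port, timeout) do
      try_connect(fallback_host, fallback_port, timeout)
    end
  end

  defp try_connect(host, port, timeout) do
    opts = [:binary, active: false, packet: :line]
    :gen_tcp.connect(String.to_charlist(host), port, opts, timeout)
  end
end
```

Because each attempt has its own timeout, the worst case in this sketch is two timeouts back to back rather than an indefinite hang.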
When running inside a container with a shared Docker socket (DooD),
the bridge gateway's mapped ports may be unreachable due to hairpin
NAT. This change detects that scenario at startup by probing the
gateway, and switches to "container networking mode" where:

- get_host(container) returns container.ip_address instead of gateway
- get_port(container, port) returns the internal port directly

All built-in container modules (postgres, mysql, redis, kafka, etc.)
now use these DooD-aware APIs, so tests work automatically in both
standard and DooD environments without any configuration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- PortWaitStrategy now uses get_host(container) and
  get_port(container, port) at wait time, overriding the IP set at
  construction. This fixes Selenium and EMQX containers in DooD.
- Update tests that hardcoded 127.0.0.1 or localhost to use the
  DooD-aware APIs instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove unused Container alias from kafka_container_test
- Skip host network test in DooD (inherently incompatible since host
  networking binds to Docker host, not the test container)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
In container_ip mode, containers on custom Docker networks are not
reachable from the test container via internal IP (different network).
Detect this via the container.network field and fall back to the
standard docker_hostname:mapped_port approach for those containers.

Also fix unused alias warnings in port_wait_strategy and
kafka_container_test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Kafka: In DooD, clients connect to the container's internal IP so
Kafka must advertise on the internal port. Use a BROKER listener
name in container mode to avoid advertising the unreachable
bridge gateway address.

Toxiproxy test: Use get_host(container) instead of get_host() so
the API URL uses the correct host in DooD.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests that require custom Docker networks or have slow container
startup in nested Docker are tagged with @tag :dood_limitation and
automatically excluded when running inside a container.

Affected tests:
- Toxiproxy integration (custom network)
- Network hostname communication (custom network)
- EMQX custom config (slow startup timeout)
- Selenium (slow startup timeout)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Kafka needs to advertise an address reachable by clients. In DooD,
the container IP is only known after startup. Use after_start to
run kafka-configs.sh and update the advertised listener to
BROKER://container_ip:internal_port, so KafkaEx clients can resolve
the broker address correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Kafka's advertised.listeners cannot be dynamically updated to use
the container's internal IP after startup. This is a known limitation
that testcontainers-java solves with custom startup script injection.
Tag these tests for exclusion in DooD environments.

Also reverts the kafka-configs.sh after_start approach which doesn't
work reliably (not a dynamic config in KRaft mode).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Since Kafka tests are tagged dood_limitation, the DooD-specific
listener config is dead code. Revert to original with_kraft_config.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace runtime container check with tag-based exclusion, consistent
with all other DooD-incompatible tests. Restore original 127.0.0.1
since this test only runs outside containers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jarlah
Member Author

jarlah commented Mar 16, 2026

Results from running docker in docker:

docker run -v /var/run/docker.sock:/var/run/docker.sock -v $(pwd):/app -w /app elixir:1.17 bash -c "mix deps.get && MIX_ENV=test mix test"

Results:

==> testcontainers
Compiling 124 files (.ex)
Generated testcontainers app

19:41:38.433 [error] backend port not found: :inotifywait

Running ExUnit with seed: 891013, max_cases: 32
Excluding tags: [:dood_limitation]

.............................................................................********************************************************************************
Ryuk has been disabled. This can cause unexpected behavior in your environment.
********************************************************************************

................................................
Finished in 37.4 seconds (28.0s async, 9.4s sync)
132 tests, 0 failures, 7 excluded

@jarlah
Member Author

jarlah commented Mar 16, 2026

@gossi take a look if you have time?

@jarlah jarlah added the enhancement New feature or request label Mar 16, 2026
@jarlah jarlah linked an issue Mar 16, 2026 that may be closed by this pull request
…-support

# Conflicts:
#	lib/testcontainers.ex
@jarlah
Member Author

jarlah commented Mar 17, 2026

Also merging this and releasing a 2.1.0-rc1 soon to test both this DinD support and the Docker Compose support.

@jarlah jarlah merged commit eb2803e into main Mar 17, 2026
8 checks passed
@jarlah jarlah deleted the feat/docker-in-docker-support branch March 17, 2026 06:11

Development

Successfully merging this pull request may close these issues.

Does not work with Docker in Docker