[WIP] feat: planner reconciler by cbartz · Pull Request #732 · canonical/github-runner-operator

cbartz · 2026-02-20T12:44:25Z

Applicable spec: ISD-4104

Overview

This PR adds the reconciliation mode based on the pressure coming from the planner.

It will be activated upon integration with the planner.

After successful validation in production, there will be a follow-up PR that removes the legacy reconciliation mode.

The PR is based on the work in #705 .

Rationale

With the new design, we will combine the reactive/non-reactive mode into one reconcilation based on a pressure information from th planner.

Juju Events Changes

n/a

Module Changes

Library Changes

n/a

Checklist

The charm style guide was applied.
The contributing guide was applied.
The changes are compliant with ISD054 - Managing Charm Complexity
The documentation for charmhub is updated.
The PR is tagged with appropriate label (urgent, trivial, complex).
The changelog is updated with changes that affects the users of the charm.
The application version number is updated in github-runner-manager/pyproject.toml.

…SD4104

# Conflicts: # docs/changelog.md

- Make planner settings optional in manager config (planner_url, planner_token) so service can start without planner relation. - Wire planner endpoint/token from charm relation into runtime state: - add PlannerConfig in CharmState - resolve planner token from Juju secret - pass planner config through factories.py into ApplicationConfiguration. - Update CLI startup to support dual modes with a shared lock: - planner mode: run PressureReconciler - legacy mode: fall back to start_reconcile_service when planner config is absent - add explicit legacy-support comment in CLI. - Refactor PressureReconciler behavior: - inline create stream loop (remove _streaming_loop) - add reconcile_interval (minutes) and fallback_runners - cache last pressure in create loop - run delete loop on timer using cached pressure (no streaming in delete path) - on planner stream errors, reconcile to fallback_runners - fix delete comparison to use desired_total vs current_total - accept external shared lock. - Update tests: - config unit test for optional planner fields - reconciler unit tests for fallback, cached delete behavior, and delete comparison fix - planner state extraction/clearing tests moved to test_charm_state.py - factory test for planner config wiring - fixture updates for new planner_config field.

- Replaced stream_sequence/stream_interval_seconds with initial_pressure: float = 1.0 - Added _pressure_file_path(port), _read_pressure(path, default), and _pressure_stream_gen(path, default) as module-level helpers (this also brought complexity below the C901 limit) - _make_app now writes initial_pressure to /tmp/planner_stub_{port}_pressure.json at startup; all endpoints read/write that file dynamically - Added POST /control/pressure route - Added PlannerStub.set_pressure(value) method github-runner-manager/tests/integration/conftest.py - Added from openstack.compute.v2.server import Server as OpenstackServer - Added wait_for_runner(...) and wait_for_no_runners(...) as module-level functions (moved from test_debug_ssh.py / new) github-runner-manager/tests/integration/test_debug_ssh.py - Removed local wait_for_runner definition - Added from .conftest import wait_for_runner github-runner-manager/tests/integration/test_planner_runner.py (new) - planner_app fixture: wires PlannerStub(initial_pressure=1.0) + RunningApplication - test_planner_pressure_spawns_and_cleans_runner: waits for runner at pressure=1, sets pressure=0, waits for cleanup tests/integration/test_charm_runner.py - Deleted test_planner_pressure_spawns_and_cleans_single_runner and its exclusive imports (requests, ActiveStatus, Model) tests/integration/conftest.py - Removed mock_planner_http_app and mock_planner_http_unit_ip fixtures (verified no remaining references)

github-actions · 2026-02-20T13:39:13Z

TICS Quality Gate

✔️ Passed

github-runner-operator

See the results in the TICS Viewer

The following files have been checked for this project

github-runner-manager/src/github_runner_manager/cli.py
github-runner-manager/src/github_runner_manager/configuration/base.py
github-runner-manager/src/github_runner_manager/manager/pressure_reconciler.py
github-runner-manager/src/github_runner_manager/manager/runner_manager.py
github-runner-manager/src/github_runner_manager/openstack_cloud/openstack_runner_manager.py
github-runner-manager/src/github_runner_manager/planner_client.py
github-runner-manager/tests/integration/application.py
github-runner-manager/tests/integration/conftest.py
github-runner-manager/tests/integration/factories.py
github-runner-manager/tests/integration/planner_stub.py
github-runner-manager/tests/integration/test_debug_ssh.py
github-runner-manager/tests/integration/test_planner_runner.py
github-runner-manager/tests/unit/test_config.py
github-runner-manager/tests/unit/test_planner_client.py
github-runner-manager/tests/unit/test_pressure_reconciler.py
github-runner-manager/tests/unit/test_runner_scaler.py
lib/charms/data_platform_libs/v0/data_interfaces.py
lib/charms/grafana_agent/v0/cos_agent.py
lib/charms/operator_libs_linux/v1/systemd.py
src/charm_state.py
src/factories.py
tests/integration/helpers/common.py
tests/unit/conftest.py
tests/unit/test_charm_state.py
tests/unit/test_factories.py

.github/workflows/tics.yaml / TICS / TICS GitHub Action

Copilot

Pull request overview

This pull request introduces a planner-driven pressure reconciliation mode for the GitHub runner manager, enabling dynamic scaling of runners based on pressure signals from a planner service. The implementation follows a dual-loop architecture (create and delete) designed to minimize API calls while maintaining low-latency scale-up.

Changes:

Adds PressureReconciler and PlannerClient to enable planner-driven runner reconciliation
Introduces planner_config extraction from Juju relations with endpoint and token handling via secrets
Implements a dual-loop architecture: streaming create loop for low-latency scale-up, timer-based delete loop for cleanup
Extends RunnerManager with cleanup_runners() method that returns cleaned VM IDs
Updates CLI to conditionally start planner mode or legacy mode based on configuration
Adds comprehensive unit tests for planner client and pressure reconciler
Includes integration test infrastructure with a lightweight planner stub server
Documents design decisions in ADR-001

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
github-runner-manager/src/github_runner_manager/planner_client.py	New HTTP client for planner service with flavor/pressure APIs
github-runner-manager/src/github_runner_manager/manager/pressure_reconciler.py	Core pressure reconciler with dual-loop architecture
github-runner-manager/src/github_runner_manager/manager/runner_manager.py	Added cleanup_runners method returning cleaned VM IDs
github-runner-manager/src/github_runner_manager/configuration/base.py	Added optional planner_url and planner_token fields
github-runner-manager/src/github_runner_manager/cli.py	Conditional startup logic for planner vs legacy mode
github-runner-manager/src/github_runner_manager/openstack_cloud/openstack_runner_manager.py	Fixed null reference for runner_proxy_config in cloud-init
src/charm_state.py	Added PlannerConfig extraction from relations with secret handling
src/factories.py	Pass planner_url and planner_token to ApplicationConfiguration
github-runner-manager/tests/unit/test_pressure_reconciler.py	Unit tests for pressure reconciler behavior
github-runner-manager/tests/unit/test_planner_client.py	Unit tests for planner client API interactions
github-runner-manager/tests/integration/planner_stub.py	Lightweight Flask-based planner stub for integration testing
github-runner-manager/tests/integration/test_planner_runner.py	Integration test for full planner-driven lifecycle
tests/integration/helpers/common.py	Refactored wait_for helper to support both sync and async functions
docs/adr/001_pressure_reconciler_design.md	Architecture decision record documenting design rationale
docs/changelog.md	Added entry for planner reconciler feature

Copilot · 2026-02-23T09:45:04Z

github-runner-manager/tests/unit/test_pressure_reconciler.py

+class _FakeManager:
+    """Lightweight runner manager stub for testing the reconciler."""
+
+    def __init__(self, runners_count: int = 0):
+        """Initialize with an optional number of pre-existing runners."""
+        self._runners = [object() for _ in range(runners_count)]
+        self.created_args: list[int] = []
+        self.cleanup_called = 0
+
+    def get_runners(self) -> list[object]:
+        """Return the current list of runners."""
+        return list(self._runners)
+
+    def create_runners(self, num: int, metadata: object):  # noqa: ARG002
+        """Record the creation request and extend the internal runner list."""
+        self.created_args.append(num)
+        if num > 0:
+            self._runners.extend(object() for _ in range(num))
+
+    def cleanup_runners(self):
+        """Increment the cleanup counter."""
+        self.cleanup_called += 1


The _FakeManager test stub is missing the delete_runners method that is called by PressureReconciler._handle_timer_reconcile when scaling down. This will cause test_handle_timer_reconcile_uses_desired_total_not_raw_pressure to fail with an AttributeError when the timer reconcile path attempts to delete runners. Add a delete_runners method to the fake manager that tracks deletion calls.

Copilot · 2026-02-23T09:45:04Z

github-runner-manager/tests/integration/planner_stub.py

+def _pressure_file_path(port: int) -> Path:
+    """Return the path to the pressure state file for the given port.
+
+    Port-namespaced to allow parallel test execution without conflicts.
+    """
+    return Path(f"/tmp/planner_stub_{port}_pressure.json")


The pressure file path uses /tmp which can be subject to race conditions and security issues in multi-user environments. Consider using tempfile.gettempdir() with a secure subdirectory or tempfile.NamedTemporaryFile to ensure the file is created with proper permissions and in a secure location. Additionally, the port-based naming doesn't guarantee uniqueness if multiple test runs with the same port happen concurrently.

Copilot · 2026-02-23T09:45:04Z

github-runner-manager/src/github_runner_manager/manager/pressure_reconciler.py

+        planner_client=PlannerClient(
+            PlannerConfiguration(base_url=config.planner_url, token=config.planner_token)


The build_pressure_reconciler function is called only when config.planner_url and config.planner_token are truthy (line 129 in cli.py), so the Optional fields will not be None at runtime. However, the type system doesn't know this. Consider adding an explicit assertion or type guard at the beginning of build_pressure_reconciler to document this precondition and satisfy type checkers: assert config.planner_url is not None and config.planner_token is not None.

Copilot · 2026-02-23T09:45:05Z

github-runner-manager/src/github_runner_manager/planner_client.py

+                is_disabled=data.get("is_disabled"),
+                minimum_pressure=data.get("minimum_pressure"),
+            )
+        except (requests.RequestException, ValueError, requests.HTTPError) as exc:


The exception handling catches requests.HTTPError separately from requests.RequestException, but HTTPError is a subclass of RequestException. This makes the explicit requests.HTTPError clause redundant since it will never be reached. Remove requests.HTTPError from the except clause on line 112 (and similarly on line 138 if present).

Suggested change

except (requests.RequestException, ValueError, requests.HTTPError) as exc:

except (requests.RequestException, ValueError) as exc:

Copilot · 2026-02-23T09:45:05Z

github-runner-manager/src/github_runner_manager/configuration/base.py

        openstack_configuration: Configuration for authorization to a OpenStack host.
+        planner_url: Base URL of the planner service.
+        planner_token: Bearer token to authenticate against the planner service.
        reconcile_interval: Seconds to wait between reconciliation.


The documentation for reconcile_interval is inconsistent. In ApplicationConfiguration (base.py:55), it's documented as "Seconds to wait between reconciliation", but the implementation in PressureReconcilerConfig (pressure_reconciler.py:47, 143) treats it as minutes (multiplies by 60 to get seconds). The integration test comment (test_planner_runner.py:56-57) also confirms the field is intended to be in minutes. Update the documentation in base.py line 55 to say "Minutes to wait between reconciliation" for consistency.

Suggested change

reconcile_interval: Seconds to wait between reconciliation.

reconcile_interval: Minutes to wait between reconciliation.

cbartz · 2026-02-23T15:07:05Z

Closing, as I will split this PR into two

yanksyoon and others added 30 commits January 22, 2026 01:41

feat: initial planner client & pressure reconciler

a35ac14

fix: remove duplicate get pressure

0c6aa56

test: add docstrings

2b28a13

test: unit tests for pressure reconciler

ac48174

test: integration tests for pressure reconciler

8074715

docs: update changelog

29d86c4

chore: update application version

5156870

Merge remote-tracking branch 'origin' into feat/pressure-reconciler-I…

3a3277b

…SD4104

fix: remove invalid import

e397b16

Merge branch 'main' into feat/pressure-reconciler-ISD4104

9b8712f

test: fix docstring issues & refactor lint

c0e3413

Merge remote-tracking branch 'origin' into feat/pressure-reconciler-I…

0c04da4

…SD4104

Merge branch 'main' into feat/pressure-reconciler-ISD4104

d87d9eb

Merge branch 'main' into feat/planner-reconciler-ISD4104

a3c7b44

# Conflicts: # docs/changelog.md

update test_charm_state

da6b357

add integration test

f727741

REVERT ME: only run test_charm_runner on push

018a344

fix cli

70e5ebe

grant by secret name

41a769b

REVERT me: Outcomment test

83451a4

fix streaming handler

47f3d2a

REVERT ME: add tmate

2bb278f

fix PlannerHandler

a95eaf9

fix helper

304f70b

improvements

a9298fb

revert integration_test.yaml

b946175

REVERT ME: test test_planner_runner on push

6d65f14

fix bug in pressure_reconcile -> scale down if necessary

60bd838

cbartz added 12 commits February 20, 2026 08:48

use longer reconcile in test

5408db3

fix test path

b69ad26

ignore push to test-

ef6b000

fix test

fdfcac3

fix passing aproxy always even without runner_proxy_config

427708d

create build_pressure_reconciler

c32b919

inline session setup

06b5183

lint

cb7b877

remove session.trust_env = False

b4a5c2f

adapt test doc string

399b8c8

revert CI push

eaeb41b

add ADR

93f06bf

cbartz mentioned this pull request Feb 20, 2026

[WIP] feat: pressure reconciler #705

Draft

7 tasks

github-actions bot added the Libraries: Out of sync label Feb 20, 2026

javierdelapuente requested a review from Copilot February 23, 2026 09:39

Copilot started reviewing on behalf of javierdelapuente February 23, 2026 09:40 View session

Copilot AI reviewed Feb 23, 2026

View reviewed changes

cbartz closed this Feb 23, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] feat: planner reconciler#732

[WIP] feat: planner reconciler#732
cbartz wants to merge 42 commits intomainfrom
feat/planner-reconciler-ISD4104

cbartz commented Feb 20, 2026

Uh oh!

github-actions bot commented Feb 20, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 23, 2026

Uh oh!

Copilot AI Feb 23, 2026

Uh oh!

Copilot AI Feb 23, 2026

Uh oh!

Copilot AI Feb 23, 2026

Uh oh!

Copilot AI Feb 23, 2026

Uh oh!

cbartz commented Feb 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		planner_client=PlannerClient(
		PlannerConfiguration(base_url=config.planner_url, token=config.planner_token)

	except (requests.RequestException, ValueError, requests.HTTPError) as exc:
	except (requests.RequestException, ValueError) as exc:

	reconcile_interval: Seconds to wait between reconciliation.
	reconcile_interval: Minutes to wait between reconciliation.

Conversation

cbartz commented Feb 20, 2026

Overview

Rationale

Juju Events Changes

Module Changes

Library Changes

Checklist

Uh oh!

github-actions bot commented Feb 20, 2026

TICS Quality Gate

✔️ Passed

github-runner-operator

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

cbartz commented Feb 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants