Add serving endpoint metrics export with Prometheus parser #415

Open
jralfonsog wants to merge 2 commits into databricks-solutions:main from jralfonsog:feat/serving-monitoring

@jralfonsog

Summary

  • Metrics export: export_serving_endpoint_metrics() wraps the SDK's export_metrics API, returning both parsed structured dicts and raw Prometheus text
  • Prometheus parser: _parse_prometheus_metrics() handles the full OpenMetrics exposition format — HELP/TYPE metadata, labels, histogram buckets, and optional timestamps
  • Structured output: Each metric becomes {"name", "labels", "value", "help", "type"} so LLMs can reason about endpoint health without parsing raw text
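The parsing step described above can be sketched roughly as follows. This is an illustrative reimplementation, not the PR's `_parse_prometheus_metrics()`: the function name `parse_prometheus_text`, the regular expressions, and the `"untyped"` fallback are assumptions. It covers HELP/TYPE metadata, labels, histogram suffixes, and trailing timestamps, and emits the `{"name", "labels", "value", "help", "type"}` shape from the summary.

```python
import re

# Illustrative sketch only; names and regexes are assumptions, not the PR's code.
_SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"  # metric name
    r"(?:\{(?P<labels>.*)\})?"              # optional {label="value",...}
    r"\s+(?P<value>\S+)"                    # sample value
    r"(?:\s+(?P<ts>-?\d+))?\s*$"            # optional timestamp, discarded
)
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')
_HISTOGRAM_SUFFIXES = ("_bucket", "_sum", "_count")


def parse_prometheus_text(text):
    """Parse Prometheus text exposition into a list of metric dicts."""
    help_by_name, type_by_name, metrics = {}, {}, []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("#"):
            parts = line.split(None, 3)
            if len(parts) >= 3 and parts[1] == "HELP":
                help_by_name[parts[2]] = parts[3] if len(parts) > 3 else ""
            elif len(parts) >= 4 and parts[1] == "TYPE":
                type_by_name[parts[2]] = parts[3]
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # skip lines that are not valid samples
        name = match.group("name")
        # Map histogram series (e.g. foo_bucket) back to their family (foo)
        # so they pick up the family's HELP and TYPE metadata.
        family = name
        if family not in type_by_name:
            for suffix in _HISTOGRAM_SUFFIXES:
                if name.endswith(suffix) and name[: -len(suffix)] in type_by_name:
                    family = name[: -len(suffix)]
                    break
        # Label values are kept as written; escape sequences are not decoded.
        labels = dict(_LABEL_RE.findall(match.group("labels") or ""))
        metrics.append(
            {
                "name": name,
                "labels": labels,
                "value": float(match.group("value")),
                "help": help_by_name.get(family, ""),
                "type": type_by_name.get(family, "untyped"),
            }
        )
    return metrics
```

Discarding the timestamp rather than storing it matches the behavior noted in the test plan; a caller that needs sample times would have to add a key for `match.group("ts")`.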

Available metrics

| Metric | Type | Description |
| --- | --- | --- |
| cpu_usage_percentage | gauge | CPU utilization across replicas |
| mem_usage_percentage | gauge | Memory utilization across replicas |
| request_count_total | gauge | Requests processed per minute |
| request_4xx_count_total | gauge | Client error rate |
| request_5xx_count_total | gauge | Server error rate |
| request_latency_ms | histogram | Round-trip latency (P50/P99 via buckets) |
| gpu_usage_percentage | gauge | GPU utilization (GPU workloads only) |
| gpu_memory_usage_percentage | gauge | GPU memory (GPU workloads only) |
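To illustrate how P50/P99 might be derived from the `request_latency_ms` buckets, here is a minimal sketch using the linear interpolation that Prometheus's `histogram_quantile` applies to cumulative buckets. It assumes the parsed output follows the usual convention of `request_latency_ms_bucket` samples carrying an `le` label, and that only one series is present; both helper names are hypothetical and not part of the PR.

```python
import math


def buckets_from_metrics(metrics, family):
    """Collect (upper_bound, cumulative_count) pairs for one histogram family.

    Assumes parsed dicts with "name", "labels", and "value" keys, and bucket
    samples named <family>_bucket with an "le" label (an assumption here).
    """
    return [
        (float(m["labels"]["le"]), m["value"])
        for m in metrics
        if m["name"] == f"{family}_bucket" and "le" in m["labels"]
    ]


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative histogram buckets."""
    ordered = sorted(buckets)
    if not ordered or ordered[-1][1] <= 0:
        return math.nan  # no observations recorded
    rank = q * ordered[-1][1]
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in ordered:
        if count >= rank:
            if math.isinf(bound):
                # Cannot interpolate inside the +Inf bucket; fall back to the
                # highest finite upper bound.
                return prev_bound
            if count == prev_count:
                return bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

The result is only as precise as the bucket boundaries allow, so a quantile that falls in the `+Inf` bucket is reported as the largest finite bound rather than a true latency.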

Changes

| Layer | File | What changed |
| --- | --- | --- |
| Core | serving/endpoints.py | +export_serving_endpoint_metrics(), +_parse_prometheus_metrics(), added re import and ResourceDoesNotExist/NotFound for typed error handling |
| Core | serving/__init__.py | Export new function |
| MCP | tools/serving.py | +1 @mcp.tool(timeout=30) wrapper |
| Tests | tests/unit/test_serving_metrics.py | 9 unit tests: parser (gauge, counter, histogram, empty, no-labels, timestamps) + export (success, not-found, empty) |

Test plan

  • 9 unit tests pass (parser + export function)
  • Ruff lint + format pass (line-length=120, py311)
  • Integration tested against aws-fe workspace — returned 3 metrics for a scaled-to-zero sklearn endpoint
  • Prometheus timestamp handling verified (real API returns timestamps, parser strips them correctly)
  • Not-found returns clean error via typed SDK exceptions
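The not-found behavior in the test plan could look something like the sketch below. It is not the PR's implementation: `EndpointNotFound` is a local stand-in for the SDK's typed exceptions (`ResourceDoesNotExist`/`NotFound`), `fetch_text` is a hypothetical callable standing in for the SDK call, and the returned keys are assumed for illustration.

```python
from typing import Callable


class EndpointNotFound(Exception):
    """Local stand-in for the SDK's typed not-found exceptions (assumption)."""


def export_metrics_safely(name: str, fetch_text: Callable[[str], str]) -> dict:
    """Fetch raw metrics text and turn a missing endpoint into a clean error dict.

    fetch_text is a hypothetical callable that returns Prometheus text for an
    endpoint name and raises EndpointNotFound when the endpoint does not exist.
    """
    try:
        raw = fetch_text(name)
    except EndpointNotFound:
        # Only the typed not-found error is caught; other failures propagate.
        return {
            "endpoint": name,
            "error": f"Serving endpoint '{name}' not found",
            "sample_count": 0,
            "raw": "",
        }
    # Count sample lines, ignoring blanks and "#" metadata/comment lines.
    samples = [ln for ln in raw.splitlines() if ln.strip() and not ln.startswith("#")]
    return {"endpoint": name, "error": None, "sample_count": len(samples), "raw": raw}
```

Catching only the typed exception keeps unrelated failures (authentication, network) visible to the caller instead of masking them as "not found".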

This pull request was AI-assisted by Isaac

- Docstrings: opening """ on its own line
- Returns sections: bullet list format for dict keys

Co-authored-by: Isaac