Add Model Serving CRUD tools (create, update, delete endpoints) #413

Open
jralfonsog wants to merge 5 commits into databricks-solutions:main from jralfonsog:feat/serving-crud
jralfonsog commented Apr 2, 2026

Summary

  • Create endpoint: create_serving_endpoint() with blocking (wait=True) and non-blocking (wait=False) modes, suited to classical ML (~2 min) and GenAI agent (~15 min) deployments respectively
  • Update endpoint: update_serving_endpoint() to deploy new model versions, change workload size, or modify traffic routing for A/B testing
  • Delete endpoint: delete_serving_endpoint() with idempotent not-found handling (returns status instead of raising)
  • Extended entity config: Full ServedEntityInput support — GPU workload types, environment variables (with secret refs), custom concurrency, provisioned throughput, instance profiles
  • Resource tracking: Integrates with manifest system (track_resource on create, remove_resource on delete) and registers a deleter for cleanup
  • Docstring cleanup: Existing tools trimmed for token efficiency
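
The wait flag and the idempotent delete described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `FakeServingClient`, the local `NotFound` class, the helper names `create_endpoint` and `delete_endpoint`, and the status strings are all assumptions made so the example runs without a workspace.

```python
import time
from dataclasses import dataclass, field


class NotFound(Exception):
    """Local stand-in for a typed not-found error (assumption, not the SDK class)."""


@dataclass
class FakeServingClient:
    """Hypothetical in-memory client; maps endpoint name to polls left until ready."""

    pending: dict[str, int] = field(default_factory=dict)

    def create(self, name: str, polls_until_ready: int = 2) -> None:
        self.pending[name] = polls_until_ready

    def state(self, name: str) -> str:
        if name not in self.pending:
            raise NotFound(name)
        remaining = self.pending[name]
        self.pending[name] = max(0, remaining - 1)
        return "READY" if remaining == 0 else "NOT_READY"

    def delete(self, name: str) -> None:
        if name not in self.pending:
            raise NotFound(name)
        del self.pending[name]


def create_endpoint(client, name, wait=True, poll_seconds=0.0, timeout_seconds=5.0):
    """Start creation; with wait=True, poll until the endpoint is ready or time runs out."""
    client.create(name)
    if not wait:
        # Non-blocking mode: return immediately so the caller can poll later.
        return {"name": name, "status": "creating"}
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if client.state(name) == "READY":
            return {"name": name, "status": "ready"}
        time.sleep(poll_seconds)
    return {"name": name, "status": "timeout"}


def delete_endpoint(client, name):
    """Delete an endpoint; a missing endpoint is reported as a status, not raised."""
    try:
        client.delete(name)
    except NotFound:
        return {"name": name, "status": "not_found"}
    return {"name": name, "status": "deleted"}
```

With this shape, calling `delete_endpoint` twice on the same name is safe: the second call simply returns a not-found status, which is convenient for cleanup routines that may run more than once.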

Supported ServedEntityInput fields

| Field | Description |
| --- | --- |
| `entity_name` | UC model path (e.g. "catalog.schema.model") |
| `entity_version` | Model version string |
| `workload_size` | "Small"/"Medium"/"Large" (mutually exclusive with custom concurrency) |
| `scale_to_zero_enabled` | Scale to zero when idle |
| `workload_type` | GPU selection: "CPU", "GPU_SMALL", "GPU_MEDIUM", "GPU_LARGE", "MULTIGPU_MEDIUM" |
| `environment_vars` | Dict of env vars, supports {{secrets/scope/key}} refs |
| `min/max_provisioned_concurrency` | Custom concurrency (multiples of 4) |
| `min/max_provisioned_throughput` | Tokens/sec for foundation models |
| `name` | Custom served entity name |
| `instance_profile_arn` | AWS IAM instance profile |
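
As an illustration of the fields above, the entity configs below show one plausible shape for a workload-size entity and a custom-concurrency entity, with two small checks for the secret-reference syntax and the multiples-of-4 rule. The dict layout, the expanded key names `min_provisioned_concurrency` / `max_provisioned_concurrency`, and the helpers `secret_env_vars` and `concurrency_problems` are assumptions for this sketch; the PR's own builder may validate differently.

```python
import re

# Matches the {{secrets/scope/key}} reference form listed in the table.
SECRET_REF = re.compile(r"^\{\{secrets/[^/{}]+/[^/{}]+\}\}$")

# Workload-size entity with a GPU workload type and a secret-backed env var.
sized_entity = {
    "entity_name": "catalog.schema.model",
    "entity_version": "1",
    "workload_size": "Small",
    "workload_type": "GPU_SMALL",
    "scale_to_zero_enabled": True,
    "environment_vars": {"API_KEY": "{{secrets/scope/key}}", "LOG_LEVEL": "info"},
}

# Custom-concurrency entity: no workload_size, bounds are multiples of 4.
concurrency_entity = {
    "entity_name": "catalog.schema.model",
    "entity_version": "2",
    "min_provisioned_concurrency": 4,
    "max_provisioned_concurrency": 8,
}


def secret_env_vars(entity: dict) -> list[str]:
    """Return the names of env vars whose value is a secret reference."""
    env = entity.get("environment_vars", {})
    return sorted(k for k, v in env.items() if SECRET_REF.match(v))


def concurrency_problems(entity: dict) -> list[str]:
    """List concurrency bounds that are not multiples of 4."""
    problems = []
    for key in ("min_provisioned_concurrency", "max_provisioned_concurrency"):
        value = entity.get(key)
        if value is not None and value % 4 != 0:
            problems.append(f"{key}={value} is not a multiple of 4")
    return problems
```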

Changes

| Layer | File | What changed |
| --- | --- | --- |
| Core | `serving/endpoints.py` | +3 CRUD functions, enhanced `_build_served_entity_inputs` with all SDK fields, `_extract_endpoint_summary` surfaces extended fields |
| Core | `serving/__init__.py` | Export new functions |
| MCP | `tools/serving.py` | +3 `@mcp.tool` wrappers with manifest integration, documented extended entity fields |
| Tests | `tests/unit/test_serving.py` | 13 unit tests: CRUD operations + GPU, env vars, custom concurrency, provisioned throughput |

Design decisions

  • Typed SDK exceptions (ResourceDoesNotExist, NotFound) instead of string matching
  • Mutually exclusive scaling modes: workload_size vs custom concurrency auto-detected from input keys
  • Extended fields only included when present: _extract_endpoint_summary omits null extended fields to keep responses clean
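
The last two decisions might look something like the sketch below: the scaling mode is inferred from which keys are present, and null extended fields are dropped from the summary. The function names `detect_scaling_mode` and `compact_summary`, the mode labels, the "default" fallback, and the `EXTENDED_FIELDS` list are illustrative assumptions rather than the helpers in `serving/endpoints.py`.

```python
CONCURRENCY_KEYS = {"min_provisioned_concurrency", "max_provisioned_concurrency"}
THROUGHPUT_KEYS = {"min_provisioned_throughput", "max_provisioned_throughput"}

# Hypothetical set of fields treated as "extended" in the summary.
EXTENDED_FIELDS = {"workload_type", "environment_vars", "instance_profile_arn"}


def detect_scaling_mode(entity: dict) -> str:
    """Infer the scaling mode from input keys; reject mixed modes."""
    modes = []
    if "workload_size" in entity:
        modes.append("workload_size")
    if entity.keys() & CONCURRENCY_KEYS:
        modes.append("custom_concurrency")
    if entity.keys() & THROUGHPUT_KEYS:
        modes.append("provisioned_throughput")
    if len(modes) > 1:
        raise ValueError(f"Scaling modes are mutually exclusive, got: {modes}")
    # Assumption: with no scaling keys, leave the choice to the service default.
    return modes[0] if modes else "default"


def compact_summary(summary: dict) -> dict:
    """Drop extended fields whose value is None; keep all other keys as-is."""
    return {
        key: value
        for key, value in summary.items()
        if not (key in EXTENDED_FIELDS and value is None)
    }
```

Detecting the mode from keys keeps the tool signature small: callers pass one entity dict, and a conflicting combination fails early with a clear error instead of reaching the API.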

Test plan

  • 13 unit tests pass (CRUD + all extended config scenarios)
  • Ruff lint + format pass (line-length=120, py311)
  • Manual verification of existing serving tools unaffected

This pull request was AI-assisted by Isaac

Add GPU workload type, environment variables, custom concurrency,
provisioned throughput, instance profiles, and custom entity names
to the served entity builder. Three scaling modes are now supported
as mutually exclusive options.

Co-authored-by: Isaac

- Docstrings: opening """ on its own line
- MCP module header: add tool listing with _find_endpoint_by_name helper
- Returns sections: bullet list format for dict keys
- Manifest imports: late imports in try blocks
- Identity: create passes get_default_tags() and with_description_footer()
  to tag endpoints as "Built with Databricks AI Dev Kit"
- Idempotent create: rename to create_or_update_serving_endpoint
- Core create_serving_endpoint: new optional tags and description params

Co-authored-by: Isaac