17 changes: 16 additions & 1 deletion docs/index.rst
@@ -26,6 +26,12 @@ Python client for `YDB <https://ydb.tech/>`_ — a fault-tolerant distributed SQ
   coordination
   scheme

.. toctree::
   :hidden:
   :caption: Observability

   opentelemetry

.. toctree::
   :hidden:
   :caption: Reference
@@ -82,7 +88,7 @@ Distributed Coordination
------------------------

The :doc:`coordination` page covers distributed semaphores and leader election. If you
need to limit concurrent access to a shared resource across multiple processes or hosts,
this is the service to use.

Schema Management
@@ -103,6 +109,15 @@ use the ``@ydb_retry`` decorator. Skipping this section is a common source of pr
incidents.


Observability
-------------

The :doc:`opentelemetry` page explains how to add distributed tracing to your
application using OpenTelemetry. One call to ``enable_tracing()`` instruments
query sessions, transactions, and connection pool operations — so you can
visualize request flow in Jaeger, Grafana, or any OpenTelemetry-compatible backend.


API Reference
-------------

225 changes: 225 additions & 0 deletions docs/opentelemetry.rst
@@ -0,0 +1,225 @@
OpenTelemetry Tracing
=====================

The SDK provides built-in distributed tracing via `OpenTelemetry <https://opentelemetry.io/>`_.
When enabled, key YDB operations — such as session creation, query execution, transaction
commit/rollback, and driver initialization — produce OpenTelemetry spans. Trace
context is automatically propagated to the YDB server through gRPC metadata using the
`W3C Trace Context <https://www.w3.org/TR/trace-context/>`_ standard.

Tracing is **zero-cost when disabled**: the SDK uses no-op stubs by default, so there is
no overhead unless you explicitly opt in.


Installation
------------

OpenTelemetry packages are not included by default. Install the SDK with the
``opentelemetry`` extra:

.. code-block:: sh

   pip install ydb[opentelemetry]

This pulls in ``opentelemetry-api``. You will also need ``opentelemetry-sdk`` and an
exporter for your tracing backend, for example:

.. code-block:: sh

   # OTLP/gRPC exporter (works with Jaeger, Tempo, and others)
   pip install opentelemetry-exporter-otlp-proto-grpc


Enabling Tracing
----------------

Call ``enable_tracing()`` once, **after** configuring your OpenTelemetry tracer provider
and **before** creating a ``Driver``:

.. code-block:: python

   from opentelemetry import trace
   from opentelemetry.sdk.trace import TracerProvider
   from opentelemetry.sdk.trace.export import BatchSpanProcessor
   from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
   from opentelemetry.sdk.resources import Resource

   import ydb
   from ydb.opentelemetry import enable_tracing

   # 1. Set up OpenTelemetry
   resource = Resource(attributes={"service.name": "my-service"})
   provider = TracerProvider(resource=resource)
   provider.add_span_processor(
       BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317"))
   )
   trace.set_tracer_provider(provider)

   # 2. Enable YDB tracing
   enable_tracing()

   # 3. Use the SDK as usual; spans are created automatically
   with ydb.Driver(endpoint="grpc://localhost:2136", database="/local") as driver:
       driver.wait(timeout=5)
       with ydb.QuerySessionPool(driver) as pool:
           pool.execute_with_retries("SELECT 1")

   # Flush any buffered spans before the process exits
   provider.shutdown()

``enable_tracing()`` accepts an optional ``tracer`` argument. If omitted, the SDK
obtains a tracer named ``"ydb.sdk"`` from the global tracer provider.


What Is Instrumented
--------------------

The following operations produce spans:

.. list-table::
   :header-rows: 1
   :widths: 35 20 45

   * - Span Name
     - Kind
     - Description
   * - ``ydb.Driver.Initialize``
     - INTERNAL
     - Driver wait / endpoint discovery.
   * - ``ydb.CreateSession``
     - CLIENT
     - Creating a new query session.
   * - ``ydb.ExecuteQuery``
     - CLIENT
     - Executing a query (including ``execute_with_retries``).
   * - ``ydb.CommitTransaction``
     - CLIENT
     - Committing an explicit transaction.
   * - ``ydb.RollbackTransaction``
     - CLIENT
     - Rolling back a transaction.

All spans are nested under the currently active span, so wrapping your application
logic in a parent span produces a complete trace tree:

.. code-block:: python

   tracer = trace.get_tracer(__name__)

   with tracer.start_as_current_span("handle-request"):
       pool.execute_with_retries("SELECT 1")
       # ↳ ydb.CreateSession   (if a new session is needed)
       # ↳ ydb.ExecuteQuery


Span Attributes
---------------

Every YDB span carries these semantic attributes:

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Attribute
     - Description
   * - ``db.system.name``
     - Always ``"ydb"``.
   * - ``db.namespace``
     - Database path (e.g. ``"/local"``).
   * - ``server.address``
     - Endpoint host.
   * - ``server.port``
     - Endpoint port.
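
As a rough illustration of where these values come from, the sketch below derives the four attributes from the ``endpoint`` and ``database`` arguments used in the examples on this page. ``base_span_attributes`` is a hypothetical helper written for this illustration; how the SDK computes the attributes internally is not shown here.

```python
from urllib.parse import urlsplit


def base_span_attributes(endpoint, database):
    """Build the attributes listed above from driver connection settings."""
    parts = urlsplit(endpoint)  # e.g. scheme="grpc", netloc="localhost:2136"
    return {
        "db.system.name": "ydb",
        "db.namespace": database,
        "server.address": parts.hostname,
        "server.port": parts.port,  # int, or None if the endpoint has no port
    }


attrs = base_span_attributes("grpc://localhost:2136", "/local")
for key, value in attrs.items():
    print(f"{key} = {value!r}")
```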

Additional attributes are set when available:

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Attribute
     - Description
   * - ``ydb.session.id``
     - Session identifier.
   * - ``ydb.node.id``
     - YDB node that handled the request.
   * - ``ydb.tx.id``
     - Transaction identifier.

On errors, the span also records:

- ``error.type`` — ``"ydb_error"``, ``"transport_error"``, or the Python exception class name.
- ``db.response.status_code`` — the YDB status code name (e.g. ``"SCHEME_ERROR"``).
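
The mapping from an exception to these two attributes might look roughly like the sketch below. It is an assumption-laden illustration: ``FakeYdbError`` and ``FakeTransportError`` are hypothetical stand-ins for the SDK's exception types, ``error_attributes`` is an invented helper, and setting ``db.response.status_code`` only for server-reported errors is a guess rather than documented behaviour.

```python
class FakeYdbError(Exception):
    """Hypothetical stand-in for an error reported by the YDB server."""

    def __init__(self, message, status_name):
        super().__init__(message)
        self.status_name = status_name


class FakeTransportError(Exception):
    """Hypothetical stand-in for a gRPC / network level failure."""


def error_attributes(exc):
    """Classify an exception into the error attributes described above."""
    if isinstance(exc, FakeYdbError):
        return {
            "error.type": "ydb_error",
            "db.response.status_code": exc.status_name,
        }
    if isinstance(exc, FakeTransportError):
        return {"error.type": "transport_error"}
    # Anything else falls back to the Python exception class name.
    return {"error.type": type(exc).__name__}


print(error_attributes(FakeYdbError("table not found", "SCHEME_ERROR")))
print(error_attributes(FakeTransportError("connection reset")))
print(error_attributes(ValueError("bad parameter")))
```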


Trace Context Propagation
-------------------------

When tracing is enabled, the SDK automatically injects trace context headers into
every gRPC call to YDB using the globally configured OpenTelemetry propagator
(``opentelemetry.propagate.inject``). By default, OpenTelemetry uses the
`W3C Trace Context <https://www.w3.org/TR/trace-context/>`_ propagator, which adds
``traceparent`` and ``tracestate`` headers.

The YDB server expects W3C Trace Context headers, so the default propagator configuration
works out of the box. This allows the server to correlate client spans with
server-side processing, enabling end-to-end trace visibility across the entire
request path.
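
For reference, a ``traceparent`` value has the form ``version-traceid-parentid-flags`` in lowercase hex. The sketch below formats and parses such a header by hand to show what travels in the gRPC metadata; in practice the OpenTelemetry propagator does this for you. ``make_traceparent`` and ``parse_traceparent`` are hypothetical helpers, and the sketch skips some checks the specification requires (for example, rejecting all-zero IDs).

```python
import re

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def make_traceparent(trace_id, span_id, sampled=True):
    """Format integer IDs as a version-00 traceparent header value."""
    flags = 0x01 if sampled else 0x00
    return f"00-{trace_id:032x}-{span_id:016x}-{flags:02x}"


def parse_traceparent(header):
    """Split a traceparent header value into its fields."""
    match = _TRACEPARENT_RE.match(header)
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    version, trace_id, span_id, flags = match.groups()
    return {
        "version": version,
        "trace_id": int(trace_id, 16),
        "span_id": int(span_id, 16),
        "sampled": bool(int(flags, 16) & 0x01),
    }


# Example IDs taken from the W3C Trace Context specification.
header = make_traceparent(0x4BF92F3577B34DA6A3CE929D0E0E4736, 0x00F067AA0BA902B7)
print(header)
print(parse_traceparent(header))
```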


Async Usage
-----------

Tracing works identically with the async driver. Call ``enable_tracing()`` once at
startup:

.. code-block:: python

   import asyncio
   import ydb
   from ydb.opentelemetry import enable_tracing

   enable_tracing()

   async def main():
       async with ydb.aio.Driver(
           endpoint="grpc://localhost:2136",
           database="/local",
       ) as driver:
           await driver.wait(timeout=5)
           async with ydb.aio.QuerySessionPool(driver) as pool:
               await pool.execute_with_retries("SELECT 1")

   asyncio.run(main())



Using a Custom Tracer
---------------------

To use a specific tracer instead of the global one:

.. code-block:: python

   from opentelemetry import trace
   from ydb.opentelemetry import enable_tracing

   my_tracer = trace.get_tracer("my.custom.tracer")
   enable_tracing(tracer=my_tracer)


Running the Examples
--------------------

The ``examples/opentelemetry/`` directory contains ready-to-run examples with a Docker
Compose setup that starts YDB, an OTLP collector, Tempo, Prometheus, and Grafana:

.. code-block:: sh

   cd examples/opentelemetry
   docker compose -f compose-e2e.yaml up -d

   # Run the example
   python example.py

Open `http://localhost:3000 <http://localhost:3000>`_ (Grafana) to explore the
collected traces via the Tempo data source.
61 changes: 61 additions & 0 deletions examples/opentelemetry/compose-e2e.yaml
@@ -0,0 +1,61 @@
version: "3.3"
services:
  ydb:
    image: ydbplatform/local-ydb:trunk
    restart: always
    hostname: localhost
    platform: linux/amd64
    environment:
      YDB_DEFAULT_LOG_LEVEL: NOTICE
      GRPC_TLS_PORT: "2135"
      GRPC_PORT: "2136"
      MON_PORT: "8765"
      YDB_USE_IN_MEMORY_PDISKS: "true"
    command: [ "--config-path", "/ydb_config/ydb-config-with-tracing.yaml" ]
    ports:
      - "2135:2135"
      - "2136:2136"
      - "8765:8765"
    volumes:
      - ./ydb_config:/ydb_config:ro

  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: [ "--config=/etc/otelcol/config.yaml" ]
    volumes:
      - ./otel-collector-config.yaml:/etc/otelcol/config.yaml:ro
    ports:
      - "4317:4317"
      - "4318:4318"
      - "9464:9464"
      - "13133:13133"
      - "13317:55679"

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yaml:/etc/prometheus/prometheus.yml:ro
    ports:
      - "9090:9090"
    depends_on: [ otel-collector ]

  tempo:
    image: grafana/tempo:2.4.1
    command: [ "-config.file=/etc/tempo.yaml" ]
    volumes:
      - ./tempo.yaml:/etc/tempo.yaml:ro
    ports:
      - "3200:3200"
    depends_on: [ otel-collector ]

  grafana:
    image: grafana/grafana:10.4.2
    environment:
      GF_AUTH_ANONYMOUS_ENABLED: "true"
      GF_AUTH_ANONYMOUS_ORG_ROLE: "Admin"
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
      - ./grafana/dashboards:/var/lib/grafana/dashboards:ro
    ports:
      - "3000:3000"
    depends_on: [ prometheus, tempo ]
65 changes: 65 additions & 0 deletions examples/opentelemetry/example.py
@@ -0,0 +1,65 @@
"""Minimal example: OpenTelemetry tracing for YDB Python SDK."""

import asyncio

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource

import ydb
from ydb.opentelemetry import enable_tracing

resource = Resource(attributes={"service.name": "ydb-example"})
provider = TracerProvider(resource=resource)
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317")))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)
enable_tracing(tracer)

ENDPOINT = "grpc://localhost:2136"
DATABASE = "/local"


def sync_example():
    """Sync: session execute and transaction execute + commit."""
    with ydb.Driver(endpoint=ENDPOINT, database=DATABASE) as driver:
        driver.wait(timeout=5)

        with ydb.QuerySessionPool(driver) as pool:
            with tracer.start_as_current_span("sync-example"):
                pool.execute_with_retries("SELECT 1")

                def tx_callee(session):
                    with session.transaction() as tx:
                        list(tx.execute("SELECT 1"))
                        tx.commit()

                pool.retry_operation_sync(tx_callee)


async def async_example():
    """Async: session execute and transaction execute + commit."""
    async with ydb.aio.Driver(endpoint=ENDPOINT, database=DATABASE) as driver:
        await driver.wait(timeout=5)

        async with ydb.aio.QuerySessionPool(driver) as pool:
            with tracer.start_as_current_span("async-example"):
                await pool.execute_with_retries("SELECT 1")

                async def tx_callee(session):
                    async with session.transaction() as tx:
                        result = await tx.execute("SELECT 1")
                        async for _ in result:
                            pass
                        await tx.commit()

                await pool.retry_operation_async(tx_callee)


sync_example()
asyncio.run(async_example())

provider.shutdown()
5 changes: 5 additions & 0 deletions examples/opentelemetry/grafana/dashboards/README.md
@@ -0,0 +1,5 @@
This folder is intentionally left empty.

Grafana is provisioned with Tempo + Prometheus datasources; use **Explore** to search traces.

