You can customize the Prometheus metrics endpoint using the following environment variables:

| Environment variable | Description | Required | Default |
| :------------------- | :---------- | :------- | :------ |
| `LIGHTDASH_GC_DURATION_BUCKETS` | Buckets for duration histogram in seconds | | `0.001, 0.01, 0.1, 1, 2, 5` |
| `LIGHTDASH_EVENT_LOOP_MONITORING_PRECISION` | Precision for event loop monitoring in milliseconds. Must be greater than zero. | | `10` |
| `LIGHTDASH_PROMETHEUS_LABELS` | Labels to add to all metrics. Must be valid JSON. | | |
| `LIGHTDASH_CUSTOM_METRICS_CONFIG_PATH` | Path to a JSON config file for custom event-driven counter metrics. | | |
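For example, the JSON-valued variables above might be set like this (the label names and config path are assumptions for illustration):

```shell
# Hypothetical example: add static labels to all metrics and point Lightdash
# at a custom metrics config file. Label names and the path are assumptions.
export LIGHTDASH_PROMETHEUS_LABELS='{"region":"us-east-1","tier":"production"}'
export LIGHTDASH_CUSTOM_METRICS_CONFIG_PATH=/etc/lightdash/custom-metrics.json

# LIGHTDASH_PROMETHEUS_LABELS must be valid JSON; sanity-check it before starting:
echo "$LIGHTDASH_PROMETHEUS_LABELS" | python3 -m json.tool > /dev/null && echo "labels are valid JSON"
```

Invalid JSON in `LIGHTDASH_PROMETHEUS_LABELS` is an easy misconfiguration to catch before deploying, which is why the sketch validates it up front.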

## Available metrics


### PostgreSQL metrics

These metrics provide information about the PostgreSQL connection pool:

| Metric | Type | Description | Labels |
| :----- | :--- | :---------- | :----- |
| `pg_pool_max_size` | gauge | Max size of the PG pool | |
| `pg_pool_size` | gauge | Current size of the PG pool | |
| `pg_active_connections` | gauge | Number of active connections in the PG pool | |
| `pg_idle_connections` | gauge | Number of idle connections in the PG pool | |
| `pg_queued_queries` | gauge | Number of queries waiting in the PG pool queue | |
| `pg_connection_acquire_time` | histogram | Time to acquire a connection from the PG pool in milliseconds | |
| `pg_query_duration` | histogram | Histogram of PG query execution time in milliseconds | |

### Queue metrics

| Metric | Type | Description |
| :----- | :--- | :---------- |
| `queue_size` | gauge | Number of jobs in the queue |

### Query metrics

These metrics track query execution performance. The `context` label is either `scheduled` or `interactive` based on the execution context.

| Metric | Type | Description | Labels |
| :----- | :--- | :---------- | :----- |
| `lightdash_query_status_total` | counter | Total number of queries by terminal status | `status`, `context` |
| `lightdash_query_state_transitions_total` | counter | Query state transitions | `from`, `to`, `context` |
| `lightdash_query_queue_wait_duration_seconds` | histogram | Time spent waiting in queue before execution | `context` |
| `lightdash_query_total_duration_seconds` | histogram | Total query duration from creation to results ready | `context` |
| `lightdash_query_warehouse_duration_seconds` | histogram | Warehouse query execution duration | `warehouse_type`, `context` |
| `lightdash_query_overhead_duration_seconds` | histogram | Lightdash overhead: total duration minus warehouse execution time | `context` |
| `lightdash_query_cache_hit_total` | counter | Total number of query cache hits and misses | `result`, `context`, `has_pre_aggregate_match` |

### Pre-aggregate metrics

These metrics track the pre-aggregate system, including materialization, DuckDB resolution, and file management:

| Metric | Type | Description | Labels |
| :----- | :--- | :---------- | :----- |
| `lightdash_pre_aggregate_match_total` | counter | Total number of pre-aggregate match attempts | `result`, `miss_reason`, `format` |
| `lightdash_pre_aggregate_materialization_total` | counter | Total number of pre-aggregate materializations by outcome | `status`, `trigger` |
| `lightdash_pre_aggregate_active_materializations` | gauge | Current number of active pre-aggregate materializations | |
| `lightdash_pre_aggregate_materialization_duration_seconds` | histogram | Pre-aggregate materialization duration | `status`, `trigger` |
| `lightdash_pre_aggregate_materialization_poll_duration_seconds` | histogram | Time spent polling for materialization query completion in seconds | `status`, `trigger` |
| `lightdash_pre_aggregate_materialization_warehouse_duration_seconds` | histogram | Warehouse execution time during materialization in seconds | `status`, `trigger` |
| `lightdash_pre_aggregate_materialization_promote_duration_seconds` | histogram | Time to check file size and promote materialization to active in seconds | `status`, `trigger` |
| `lightdash_pre_aggregate_materialization_file_size_bytes` | histogram | File size of pre-aggregate materialization in bytes | `format` |
| `lightdash_pre_aggregate_parquet_conversion_duration_seconds` | histogram | Duration of JSONL to Parquet conversion | `status` |
| `lightdash_pre_aggregate_duckdb_resolution_total` | counter | Total number of DuckDB pre-aggregate resolution attempts | `status`, `reason` |
| `lightdash_pre_aggregate_duckdb_resolution_duration_seconds` | histogram | DuckDB pre-aggregate resolution duration | `status` |
| `lightdash_pre_aggregate_duckdb_query_latency_seconds` | histogram | Total DuckDB query latency in seconds | |
| `lightdash_pre_aggregate_duckdb_parquet_read_duration_seconds` | histogram | Time spent in READ_PARQUET operators in seconds | |
| `lightdash_pre_aggregate_duckdb_bytes_read` | histogram | Bytes read from S3/parquet by DuckDB queries | |
| `lightdash_pre_aggregate_duckdb_scan_amplification` | histogram | Ratio of rows scanned to rows returned in DuckDB queries | |
| `lightdash_pre_aggregate_fallback_total` | counter | Total number of opportunistic pre-aggregate fallbacks to warehouse | `reason` |

### AI agent metrics

These metrics track the performance of the AI agent:

| Metric | Type | Description | Labels |
| :----- | :--- | :---------- | :----- |
| `ai_agent_generate_response_duration_ms` | histogram | AI agent generate response time in milliseconds | |
| `ai_agent_stream_response_duration_ms` | histogram | AI agent stream response time in milliseconds | |
| `ai_agent_stream_first_chunk_ms` | histogram | AI agent time to first chunk (any type) | |
| `ai_agent_ttft_ms` | histogram | AI agent time to first token (TTFT) | `model`, `mode` |

### S3 metrics

| Metric | Type | Description | Labels |
| :----- | :--- | :---------- | :----- |
| `lightdash_s3_results_upload_duration_seconds` | histogram | S3 results upload duration | `source` |

### Custom event metrics

Lightdash supports operator-configurable Prometheus counter metrics that are driven by application events. These are defined via a JSON configuration file specified by the `LIGHTDASH_CUSTOM_METRICS_CONFIG_PATH` environment variable.

Each entry in the config file creates a counter metric that increments when a matching application event fires. This allows you to track custom business-level metrics such as user logins or query executions without modifying the application code.
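As a sketch, such a config file might be created and wired up as follows. The JSON fields shown (`name`, `help`, `event`) are assumptions for illustration only; consult the Lightdash source or release notes for the exact supported schema.

```shell
# Illustration only: the JSON field names below are assumptions, not the
# documented Lightdash schema. Writes a config file and points Lightdash at it.
mkdir -p /tmp/lightdash
cat > /tmp/lightdash/custom-metrics.json <<'EOF'
[
  {
    "name": "lightdash_user_logins_total",
    "help": "Total number of user logins",
    "event": "user.logged_in"
  }
]
EOF
export LIGHTDASH_CUSTOM_METRICS_CONFIG_PATH=/tmp/lightdash/custom-metrics.json

# Verify the file is valid JSON before restarting the service:
python3 -m json.tool "$LIGHTDASH_CUSTOM_METRICS_CONFIG_PATH" > /dev/null && echo "config OK"
```

Since the file is only read at startup, validating it before restarting the service avoids a failed boot from a stray comma.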

## Using metrics for monitoring and alerting

You can use these metrics to create dashboards and alerts in your monitoring system, for example to watch connection pool saturation, query queue wait times, or cache hit rates.
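As an illustration, an alert on sustained queue wait times might use a PromQL expression like the following (the percentile and threshold are assumptions, not recommended values):

```promql
# Hypothetical alert expression: fire when the 95th-percentile time queries
# spend waiting in the queue exceeds 30 seconds over the last 5 minutes.
histogram_quantile(
  0.95,
  sum(rate(lightdash_query_queue_wait_duration_seconds_bucket[5m])) by (le, context)
) > 30
```

Because `lightdash_query_queue_wait_duration_seconds` is a histogram, its `_bucket` series can be aggregated with `histogram_quantile`; grouping by `context` keeps scheduled and interactive workloads separate.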