30 changes: 13 additions & 17 deletions references/pre-aggregates/getting-started.mdx
Original file line number Diff line number Diff line change
@@ -80,7 +80,7 @@ If you're using Lightdash YAML instead of dbt model YAML, see the [Lightdash YAM
| `granularity` | No | Time granularity for the `time_dimension`. Valid values: `hour`, `day`, `week`, `month`, `quarter`, `year`. Must be paired with `time_dimension`. |
| `max_rows` | No | Maximum number of rows to store in the materialization. If the aggregation exceeds this limit, the result is truncated. Must be a positive integer. |
| `refresh` | No | Schedule configuration for automatic re-materialization. See [Scheduling refreshes](#scheduling-refreshes). |
| `materialization_role` | No | Fixed user attributes to use when materializing the pre-aggregate. See [Materialization role](#materialization-role). |
| `materialization_role` | No | Fixed access context to use when materializing the pre-aggregate. This is useful when your model or joined tables use [`required_attributes`](/references/tables#required-attributes) or [`any_attributes`](/references/tables#any-attributes). See [Materialization role](#materialization-role). |

<Note>
If you specify `time_dimension`, you **must** also specify `granularity`, and vice versa.
@@ -239,11 +239,16 @@ You can set `max_rows` to cap the size of a materialization. If the aggregation

## Materialization role

If your model uses user attributes in SQL, the user who triggers the materialization can affect what gets stored in the pre-aggregate.
`materialization_role` is useful when access to the model depends on [`required_attributes`](/references/tables#required-attributes) or [`any_attributes`](/references/tables#any-attributes).

For example, if the model has `sql_filter: "customers.region IN (${ld.attr.allowed_regions})"`, the materialized table will only contain the regions that the admin who triggered the materialization happens to have access to. If that admin has `allowed_regions: ["EMEA"]`, the pre-aggregate will only contain EMEA rows.
For example, if a joined table is only available to users with `region_access: emea`, then materializing a pre-aggregate without a fixed access context could produce different results depending on who triggered the build.

Use `materialization_role` to define a fixed set of user attributes for materializing the pre-aggregate. At query time, Lightdash ignores `materialization_role` and uses the real viewer user's attributes instead, so a user with `allowed_regions: ["EMEA"]` still only sees EMEA rows.
Use `materialization_role` to make materialization run with a stable set of [user attributes](/references/workspace/user-attributes).

This is intended for access control fields such as:

- [`required_attributes`](/references/tables#required-attributes)
- [`any_attributes`](/references/tables#any-attributes)

<Tabs>
<Tab title="dbt v1.9 and earlier">
@@ -254,7 +259,6 @@ Use `materialization_role` to define a fixed set of user attributes for material
joins:
- join: customers
sql_on: ${customers.customer_id} = ${orders.customer_id}
sql_filter: "customers.region IN (${ld.attr.allowed_regions})"
pre_aggregates:
- name: orders_daily_by_region
dimensions:
@@ -266,10 +270,7 @@ Use `materialization_role` to define a fixed set of user attributes for material
materialization_role:
email: materialize@acme.com
attributes:
allowed_regions:
- EMEA
- APAC
- NA
region_access: emea
```
</Tab>
<Tab title="dbt v1.10+ and Fusion">
@@ -281,7 +282,6 @@ Use `materialization_role` to define a fixed set of user attributes for material
joins:
- join: customers
sql_on: ${customers.customer_id} = ${orders.customer_id}
sql_filter: "customers.region IN (${ld.attr.allowed_regions})"
pre_aggregates:
- name: orders_daily_by_region
dimensions:
@@ -293,16 +293,11 @@ Use `materialization_role` to define a fixed set of user attributes for material
materialization_role:
email: materialize@acme.com
attributes:
allowed_regions:
- EMEA
- APAC
- NA
region_access: emea
```
</Tab>
</Tabs>

With this setup, the materialized table always contains EMEA, APAC, and NA rows regardless of who triggered the materialization. When a viewer queries the pre-aggregate, Lightdash applies their own user attributes to the query, not the `materialization_role`.

## Complete example

Here's a full model definition with a pre-aggregate, including joins, scheduling, and row limits:
@@ -449,4 +444,5 @@ These queries would **not** match and would query the warehouse directly:
- Queries grouped by a dimension not in the pre-aggregate (for example, `customer_id`)
- Queries with hourly granularity (finer than the pre-aggregate's `day`)
- Queries without `status = completed` or with a broader `status` filter
- Queries with custom dimensions, custom metrics, or table calculations
- Queries with [Parameters](/references/lightdash-config-yml#parameters-configuration), [user attributes](/references/workspace/user-attributes) inside SQL, or [`sql_filter`](/references/tables#sql-filter-row-level-security)
- Queries with raw SQL table calculations
8 changes: 4 additions & 4 deletions references/pre-aggregates/monitoring.mdx
@@ -50,11 +50,11 @@ When a query doesn't match any pre-aggregate, Lightdash records the specific rea
| **Filter dimension not in pre-aggregate** | A filter references a dimension not in the pre-aggregate. | Add the filter dimension to the `dimensions` list — even if it's only used for filtering, not grouping. |
| **Pre-aggregate filter not satisfied** | The pre-aggregate definition includes a static filter, but the query is missing it or uses a broader/incompatible filter. | Add the matching filter to the query, narrow the query filter, or create another pre-aggregate for that query pattern. |
| **Non-additive metric** | The query includes a metric type that can't be re-aggregated (for example, `count_distinct` or `median`). | This metric type is not supported. See [supported metric types](/references/pre-aggregates/overview#supported-metric-types). |
| **Custom SQL metric** | A metric uses a custom SQL expression. | Custom SQL metrics are not supported by pre-aggregates. |
| **Custom SQL metric** | The query includes a non-reaggregatable custom SQL metric, such as a `number` metric. | Use a supported metric type, or let the query run against the warehouse. |
| **Granularity too fine** | The query requests a finer time granularity than the pre-aggregate provides (for example, `hour` on a `day` pre-aggregate). | Either lower the pre-aggregate's granularity or accept the warehouse query for this use case. |
| **Custom dimension present** | The query uses a custom dimension. | Custom dimensions are not supported by pre-aggregates. |
| **Custom metric present** | The query uses a custom metric. | Custom metrics are not supported by pre-aggregates. |
| **Table calculation present** | The query includes table calculations. | Table calculations are not supported by pre-aggregates. |
| **Custom dimension present** | The query uses a custom SQL dimension. | Custom SQL dimensions created in the UI are not supported. [Write them back](/guides/developer/dbt-write-back#write-back-dimensions-automatically-replacing-custom-dimensions) to the semantic layer. |
| **Custom metric present** | The query uses a custom metric. | Custom metrics defined in the Explorer are not supported. [Write them back](/guides/developer/dbt-write-back#write-back-dimensions-automatically-replacing-custom-dimensions) to the semantic layer. |
| **Table calculation present** | The query includes a raw SQL table calculation. | Use a formula table calculation instead, or let the query run against the warehouse. |
| **User bypass** | The user explicitly bypassed the pre-aggregate cache. | No action needed — this is intentional. |

## Dashboard pre-aggregate view
29 changes: 23 additions & 6 deletions references/pre-aggregates/overview.mdx
@@ -74,7 +74,8 @@ When a user runs a query, Lightdash automatically checks if a pre-aggregate can
- Every dimension used in **filters** is included in the pre-aggregate
- If the pre-aggregate itself defines `filters`, the query must include an equivalent or narrower filter
- All metrics use [supported metric types](#supported-metric-types)
- The query does not contain custom dimensions, custom metrics, or table calculations
- The query does not contain raw SQL table calculations
- The query does not use [Parameters](/references/lightdash-config-yml#parameters-configuration), [`sql_filter`](/references/tables#sql-filter-row-level-security), or [user attributes](/references/workspace/user-attributes) inside SQL
- If the query uses a time dimension, the requested granularity must be **equal to or coarser** than the pre-aggregate's granularity
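The granularity rule above can be sketched as a simple ordering check. This is only an illustration, not Lightdash's matcher: the names `GRANULARITY_ORDER` and `can_serve` are hypothetical, and `week` is deliberately left out because weeks do not nest cleanly inside months and this page does not say how that case is handled.

```python
# Granularities ordered from finest to coarsest.
# Assumption: `week` is omitted because its handling is not described here.
GRANULARITY_ORDER = ["hour", "day", "month", "quarter", "year"]


def can_serve(pre_aggregate_granularity: str, query_granularity: str) -> bool:
    """Return True when the query asks for a granularity that is
    equal to or coarser than the one stored in the pre-aggregate."""
    stored = GRANULARITY_ORDER.index(pre_aggregate_granularity)
    requested = GRANULARITY_ORDER.index(query_granularity)
    return requested >= stored


# A daily pre-aggregate can be rolled up to months, but not split into hours.
print(can_serve("day", "month"))
print(can_serve("day", "hour"))
```

Under these assumptions, a `day` pre-aggregate could answer `day`, `month`, `quarter`, and `year` queries, while an `hour` query would fall through to the warehouse.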

<Tip>
@@ -114,16 +115,33 @@ Pre-aggregates support metrics that can be re-aggregated from pre-computed resul
Pre-aggregates are **not compatible** with [personal warehouse connections](/references/workspace/personal-warehouse-connections). Materialization always runs under a single user's credentials, so warehouse-level access rules are not applied per viewer. If you rely on personal warehouse connections to enforce data access, use [results caching](/guides/developer/caching) instead.
</Warning>

### Current limitations
## Current limitations

Not all metrics work this way. Consider `count_distinct` with the same daily pre-aggregate from above. If a daily pre-aggregate stores "2 distinct customers on 2024-01-15" and "1 distinct customer on 2024-01-16", you can't sum those to get the monthly distinct count — Alice ordered on both days and would be counted twice:
Pre-aggregates support a narrower subset of the Lightdash semantic layer than regular warehouse queries.

### Not supported

Pre-aggregates do not support:

- [Parameters](/references/lightdash-config-yml#parameters-configuration)
- [`sql_filter`](/references/tables#sql-filter-row-level-security) and [`sql_where`](/references/tables#sql-filter-row-level-security)
- [User attributes](/references/workspace/user-attributes) when referenced from SQL. [`required_attributes`](/references/tables#required-attributes) and [`any_attributes`](/references/tables#any-attributes) are still supported through `materialization_role`.
- [Custom metrics](/guides/custom-fields#custom-metrics) created in the Explorer
- [Custom SQL dimensions](/guides/custom-fields#custom-sql) created in the Explorer ([Custom bin dimensions](/guides/custom-fields#bin) are supported)
- Raw SQL table calculations ([Formula table calculations](/guides/formula-table-calculations#formula-table-calculations) are supported)

### Metrics that can't be pre-aggregated

Pre-aggregates do not support metric types that cannot be re-aggregated from pre-computed results.

For example, consider `count_distinct` on a daily pre-aggregate. If the pre-aggregate stores "2 distinct customers on 2024-01-15" and "1 distinct customer on 2024-01-16", you cannot sum those daily values to get the monthly distinct count, because the same customer can appear on multiple days.

| order_date_day | status | distinct_customers |
|---|---|---|
| 2024-01-15 | shipped | 2 (Alice, Bob) |
| 2024-01-16 | shipped | 1 (Alice) |

Re-aggregating: 2 + 1 = **3**, but the correct monthly answer is **2** (Alice, Bob). The pre-aggregate lost track of *which* customers were counted.
Re-aggregating gives `2 + 1 = 3`, but the correct monthly answer is `2` (`Alice`, `Bob`). The pre-aggregate no longer knows which customers were counted.
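The difference between an additive metric and `count_distinct` can be reproduced in a few lines. This is a minimal sketch with made-up order rows that mirror the table above; the function `daily_pre_aggregate` and the field names are hypothetical and do not reflect how Lightdash stores materializations.

```python
from collections import defaultdict

# Hypothetical raw rows: (order_date, status, customer)
orders = [
    ("2024-01-15", "shipped", "Alice"),
    ("2024-01-15", "shipped", "Bob"),
    ("2024-01-16", "shipped", "Alice"),
]


def daily_pre_aggregate(rows):
    """Collapse raw rows to one row per day, keeping only the counts."""
    customers_by_day = defaultdict(set)
    orders_by_day = defaultdict(int)
    for day, _status, customer in rows:
        customers_by_day[day].add(customer)
        orders_by_day[day] += 1
    return {
        day: {
            "order_count": orders_by_day[day],
            # Only the number survives; the customer names are discarded.
            "distinct_customers": len(customers_by_day[day]),
        }
        for day in customers_by_day
    }


daily = daily_pre_aggregate(orders)

# `count` is additive: summing the daily counts gives the monthly count.
monthly_order_count = sum(row["order_count"] for row in daily.values())

# `count_distinct` is not: summing the daily values counts Alice twice.
reaggregated_distinct = sum(row["distinct_customers"] for row in daily.values())

# The correct answer needs the raw rows.
true_distinct = len({customer for _day, _status, customer in orders})

print(monthly_order_count, reaggregated_distinct, true_distinct)
```

Here the order count re-aggregates correctly from the daily rows, while the summed distinct count overshoots the value computed from the raw rows.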

We're investigating supporting `count_distinct` through approximation algorithms. [Follow this issue](https://github.com/lightdash/lightdash/issues/21536) for updates.

@@ -133,7 +151,6 @@ For similar reasons, the following metric types are also not supported:
- `median`, `percentile`
- `percent_of_total`, `percent_of_previous`
- `running_total`
- Custom SQL metrics — [Follow this issue](https://github.com/lightdash/lightdash/issues/21537)
- `number`, `string`, `date`, `timestamp`, `boolean`

For metrics that can't be pre-aggregated, consider using [caching](/guides/developer/caching) instead.
@@ -181,7 +198,7 @@ A single pre-aggregate can serve many different queries. A daily pre-aggregate w
**Use results caching when:**

- Query patterns are ad-hoc or unpredictable
- You need count_distinct, median, percentile, custom SQL metrics, table calculations, or custom dimensions/metrics
- You need unsupported features listed above, such as `count_distinct`, [Parameters](/references/lightdash-config-yml#parameters-configuration), [`sql_filter`](/references/tables#sql-filter-row-level-security), or raw SQL table calculations
- You're using the SQL runner
- You don't want upfront configuration work
