diff --git a/references/pre-aggregates/getting-started.mdx b/references/pre-aggregates/getting-started.mdx
index 71b3ad3e..7d251cae 100644
--- a/references/pre-aggregates/getting-started.mdx
+++ b/references/pre-aggregates/getting-started.mdx
@@ -80,7 +80,7 @@ If you're using Lightdash YAML instead of dbt model YAML, see the [Lightdash YAM
 | `granularity` | No | Time granularity for the `time_dimension`. Valid values: `hour`, `day`, `week`, `month`, `quarter`, `year`. Must be paired with `time_dimension`. |
 | `max_rows` | No | Maximum number of rows to store in the materialization. If the aggregation exceeds this limit, the result is truncated. Must be a positive integer. |
 | `refresh` | No | Schedule configuration for automatic re-materialization. See [Scheduling refreshes](#scheduling-refreshes). |
-| `materialization_role` | No | Fixed user attributes to use when materializing the pre-aggregate. See [Materialization role](#materialization-role). |
+| `materialization_role` | No | Fixed access context to use when materializing the pre-aggregate. This is useful when your model or joined tables use [`required_attributes`](/references/tables#required-attributes) or [`any_attributes`](/references/tables#any-attributes). See [Materialization role](#materialization-role). |
 
 If you specify `time_dimension`, you **must** also specify `granularity`, and vice versa.
@@ -239,11 +239,16 @@ You can set `max_rows` to cap the size of a materialization. If the aggregation
 
 ## Materialization role
 
-If your model uses user attributes in SQL, the user who triggers the materialization can affect what gets stored in the pre-aggregate.
+`materialization_role` is useful when access to the model depends on [`required_attributes`](/references/tables#required-attributes) or [`any_attributes`](/references/tables#any-attributes).
 
-For example, if the model has `sql_filter: "customers.region IN (${ld.attr.allowed_regions})"`, the materialized table will only contain the regions that the admin who triggered the materialization happens to have access to. If that admin has `allowed_regions: ["EMEA"]`, the pre-aggregate will only contain EMEA rows.
+For example, if a joined table is only available to users with `region_access: emea`, then materializing a pre-aggregate without a fixed access context could produce different results depending on who triggered the build.
 
-Use `materialization_role` to define a fixed set of user attributes for materializing the pre-aggregate. At query time, Lightdash ignores `materialization_role` and uses the real viewer user's attributes instead, so a user with `allowed_regions: ["EMEA"]` still only sees EMEA rows.
+Use `materialization_role` to make materialization run with a stable set of [user attributes](/references/workspace/user-attributes).
+
+This is intended for access control fields such as:
+
+- [`required_attributes`](/references/tables#required-attributes)
+- [`any_attributes`](/references/tables#any-attributes)
@@ -254,7 +259,6 @@ Use `materialization_role` to define a fixed set of user attributes for material
 joins:
   - join: customers
     sql_on: ${customers.customer_id} = ${orders.customer_id}
-    sql_filter: "customers.region IN (${ld.attr.allowed_regions})"
 pre_aggregates:
   - name: orders_daily_by_region
     dimensions:
@@ -266,10 +270,7 @@ Use `materialization_role` to define a fixed set of user attributes for material
     materialization_role:
       email: materialize@acme.com
       attributes:
-        allowed_regions:
-          - EMEA
-          - APAC
-          - NA
+        region_access: emea
 ```
@@ -281,7 +282,6 @@ Use `materialization_role` to define a fixed set of user attributes for material
 joins:
   - join: customers
     sql_on: ${customers.customer_id} = ${orders.customer_id}
-    sql_filter: "customers.region IN (${ld.attr.allowed_regions})"
 pre_aggregates:
   - name: orders_daily_by_region
     dimensions:
@@ -293,16 +293,11 @@ Use `materialization_role` to define a fixed set of user attributes for material
     materialization_role:
       email: materialize@acme.com
       attributes:
-        allowed_regions:
-          - EMEA
-          - APAC
-          - NA
+        region_access: emea
 ```
 
-With this setup, the materialized table always contains EMEA, APAC, and NA rows regardless of who triggered the materialization. When a viewer queries the pre-aggregate, Lightdash applies their own user attributes to the query, not the `materialization_role`.
-
 ## Complete example
 
 Here's a full model definition with a pre-aggregate, including joins, scheduling, and row limits:
@@ -449,4 +444,5 @@ These queries would **not** match and would query the warehouse directly:
 - Queries grouped by a dimension not in the pre-aggregate (for example, `customer_id`)
 - Queries with hourly granularity (finer than the pre-aggregate's `day`)
 - Queries without `status = completed` or with a broader `status` filter
-- Queries with custom dimensions, custom metrics, or table calculations
+- Queries with [Parameters](/references/lightdash-config-yml#parameters-configuration), [user attributes](/references/workspace/user-attributes) inside SQL, or [`sql_filter`](/references/tables#sql-filter-row-level-security)
+- Queries with raw SQL table calculations
diff --git a/references/pre-aggregates/monitoring.mdx b/references/pre-aggregates/monitoring.mdx
index 4756b655..6d2d12c8 100644
--- a/references/pre-aggregates/monitoring.mdx
+++ b/references/pre-aggregates/monitoring.mdx
@@ -50,11 +50,11 @@ When a query doesn't match any pre-aggregate, Lightdash records the specific rea
 | **Filter dimension not in pre-aggregate** | A filter references a dimension not in the pre-aggregate. | Add the filter dimension to the `dimensions` list — even if it's only used for filtering, not grouping. |
 | **Pre-aggregate filter not satisfied** | The pre-aggregate definition includes a static filter, but the query is missing it or uses a broader/incompatible filter. | Add the matching filter to the query, narrow the query filter, or create another pre-aggregate for that query pattern. |
 | **Non-additive metric** | The query includes a metric type that can't be re-aggregated (for example, `count_distinct` or `median`). | This metric type is not supported. See [supported metric types](/references/pre-aggregates/overview#supported-metric-types). |
-| **Custom SQL metric** | A metric uses a custom SQL expression. | Custom SQL metrics are not supported by pre-aggregates. |
+| **Custom SQL metric** | The query includes a non-reaggregatable custom SQL metric, such as a `number` metric. | Use a supported metric type, or let the query run against the warehouse. |
 | **Granularity too fine** | The query requests a finer time granularity than the pre-aggregate provides (for example, `hour` on a `day` pre-aggregate). | Either lower the pre-aggregate's granularity or accept the warehouse query for this use case. |
-| **Custom dimension present** | The query uses a custom dimension. | Custom dimensions are not supported by pre-aggregates. |
-| **Custom metric present** | The query uses a custom metric. | Custom metrics are not supported by pre-aggregates. |
-| **Table calculation present** | The query includes table calculations. | Table calculations are not supported by pre-aggregates. |
+| **Custom dimension present** | The query uses a custom SQL dimension. | Custom SQL dimensions created in the UI are not supported. [Write them back](/guides/developer/dbt-write-back#write-back-dimensions-automatically-replacing-custom-dimensions) to the semantic layer. |
+| **Custom metric present** | The query uses a custom metric. | Custom metrics defined in the Explorer are not supported. [Write them back](/guides/developer/dbt-write-back#write-back-dimensions-automatically-replacing-custom-dimensions) to the semantic layer. |
+| **Table calculation present** | The query includes a raw SQL table calculation. | Use a formula table calculation instead, or let the query run against the warehouse. |
 | **User bypass** | The user explicitly bypassed the pre-aggregate cache. | No action needed — this is intentional. |
 
 ## Dashboard pre-aggregate view
diff --git a/references/pre-aggregates/overview.mdx b/references/pre-aggregates/overview.mdx
index af83e850..32332edc 100644
--- a/references/pre-aggregates/overview.mdx
+++ b/references/pre-aggregates/overview.mdx
@@ -74,7 +74,8 @@ When a user runs a query, Lightdash automatically checks if a pre-aggregate can
 - Every dimension used in **filters** is included in the pre-aggregate
 - If the pre-aggregate itself defines `filters`, the query must include an equivalent or narrower filter
 - All metrics use [supported metric types](#supported-metric-types)
-- The query does not contain custom dimensions, custom metrics, or table calculations
+- The query does not contain raw SQL table calculations
+- The query does not use [Parameters](/references/lightdash-config-yml#parameters-configuration), [`sql_filter`](/references/tables#sql-filter-row-level-security), or [user attributes](/references/workspace/user-attributes) inside SQL
 - If the query uses a time dimension, the requested granularity must be **equal to or coarser** than the pre-aggregate's granularity
@@ -114,16 +115,33 @@ Pre-aggregates support metrics that can be re-aggregated from pre-computed resul
 Pre-aggregates are **not compatible** with [personal warehouse connections](/references/workspace/personal-warehouse-connections). Materialization always runs under a single user's credentials, so warehouse-level access rules are not applied per viewer. If you rely on personal warehouse connections to enforce data access, use [results caching](/guides/developer/caching) instead.
 
-### Current limitations
+## Current limitations
 
-Not all metrics work this way. Consider `count_distinct` with the same daily pre-aggregate from above. If a daily pre-aggregate stores "2 distinct customers on 2024-01-15" and "1 distinct customer on 2024-01-16", you can't sum those to get the monthly distinct count — Alice ordered on both days and would be counted twice:
+Pre-aggregates support a narrower subset of the Lightdash semantic layer than regular warehouse queries.
+
+### Not supported
+
+Pre-aggregates do not support:
+
+- [Parameters](/references/lightdash-config-yml#parameters-configuration)
+- [`sql_filter`](/references/tables#sql-filter-row-level-security) and [`sql_where`](/references/tables#sql-filter-row-level-security)
+- [User attributes](/references/workspace/user-attributes) when referenced from SQL. [`required_attributes`](/references/tables#required-attributes) and [`any_attributes`](/references/tables#any-attributes) are still supported through `materialization_role`.
+- [Custom metrics](/guides/custom-fields#custom-metrics) created in the Explorer
+- [Custom SQL dimensions](/guides/custom-fields#custom-sql) created in the Explorer ([Custom bin dimensions](/guides/custom-fields#bin) are supported)
+- SQL table calculations ([Formula table calculations](/guides/formula-table-calculations#formula-table-calculations) are supported)
+
+### Metrics that can't be pre-aggregated
+
+Pre-aggregates do not support metric types that cannot be re-aggregated from pre-computed results.
+
+For example, consider `count_distinct` on a daily pre-aggregate. If the pre-aggregate stores "2 distinct customers on 2024-01-15" and "1 distinct customer on 2024-01-16", you cannot sum those daily values to get the monthly distinct count, because the same customer can appear on multiple days.
 
 | order_date_day | status | distinct_customers |
 |---|---|---|
 | 2024-01-15 | shipped | 2 (Alice, Bob) |
 | 2024-01-16 | shipped | 1 (Alice) |
 
-Re-aggregating: 2 + 1 = **3**, but the correct monthly answer is **2** (Alice, Bob). The pre-aggregate lost track of *which* customers were counted.
+Re-aggregating gives `2 + 1 = 3`, but the correct monthly answer is `2` (`Alice`, `Bob`). The pre-aggregate no longer knows which customers were counted.
 
 We're investigating supporting `count_distinct` through approximation algorithms. [Follow this issue](https://github.com/lightdash/lightdash/issues/21536) for updates.
@@ -133,7 +151,6 @@ For similar reasons, the following metric types are also not supported:
 - `median`, `percentile`
 - `percent_of_total`, `percent_of_previous`
 - `running_total`
-- Custom SQL metrics — [Follow this issue](https://github.com/lightdash/lightdash/issues/21537)
 - `number`, `string`, `date`, `timestamp`, `boolean`
 
 For metrics that can't be pre-aggregated, consider using [caching](/guides/developer/caching) instead.
@@ -181,7 +198,7 @@ A single pre-aggregate can serve many different queries. A daily pre-aggregate w
 **Use results caching when:**
 - Query patterns are ad-hoc or unpredictable
-- You need count_distinct, median, percentile, custom SQL metrics, table calculations, or custom dimensions/metrics
+- You need unsupported features listed above, such as `count_distinct`, [Parameters](/references/lightdash-config-yml#parameters-configuration), [`sql_filter`](/references/tables#sql-filter-row-level-security), or raw SQL table calculations
 - You're using the SQL runner
 - You don't want upfront configuration work
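
The `count_distinct` limitation described in the `overview.mdx` changes can be checked with a short sketch. The Python below is illustrative only and does not use any Lightdash API: the customers mirror the Alice/Bob table from the diff, the order amounts are invented, and `daily_pre_aggregate` is a hypothetical helper. It assumes a plain sum as the example of a metric that re-aggregates safely.

```python
from collections import defaultdict

# Hypothetical raw orders: (order_date, customer, amount). The customers
# mirror the Alice/Bob table in the diff; the amounts are invented.
ORDERS = [
    ("2024-01-15", "Alice", 10),
    ("2024-01-15", "Bob", 20),
    ("2024-01-16", "Alice", 5),
]


def daily_pre_aggregate(orders):
    """Roll raw rows up to one row per day, as a daily pre-aggregate would."""
    customers = defaultdict(set)
    totals = defaultdict(int)
    for day, customer, amount in orders:
        customers[day].add(customer)
        totals[day] += amount
    # Only the per-day numbers are kept; the customer sets are discarded.
    return {
        day: {"sum_amount": totals[day], "distinct_customers": len(customers[day])}
        for day in totals
    }


daily = daily_pre_aggregate(ORDERS)

# Re-aggregate the daily rows into monthly figures.
monthly_sum = sum(row["sum_amount"] for row in daily.values())
monthly_distinct_naive = sum(row["distinct_customers"] for row in daily.values())

# Ground truth computed directly from the raw rows.
true_sum = sum(amount for _, _, amount in ORDERS)
true_distinct = len({customer for _, customer, _ in ORDERS})

print(monthly_sum, true_sum)                  # 35 35
print(monthly_distinct_naive, true_distinct)  # 3 2
```

The sum survives the roll-up because each raw row contributes to exactly one daily total. The distinct count does not, because Alice contributes to two daily rows and the daily table no longer records who was counted.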