fix(dbt): update README to reflect actual run.sh workflow
The local debugging section referenced run_dbt_tests.sh, which was
removed during the PR OpenLineage#211 cleanup. Replace with accurate instructions
using docker compose + run.sh directly.
Update the workflow description, test structure layout, and validation
scope (add dataQualityAssertions) to match the current architecture.
Signed-off-by: roller100 (BearingNode) <contact@bearingnode.com>
`producer/dbt/README.md` — 26 additions, 32 deletions
````diff
@@ -13,23 +13,21 @@ It is important to note that this is a **compatibility validation framework** us
 
 ## Test Architecture and Workflow
 
-The test is orchestrated by the `run_dbt_tests.sh` script and follows a clear, sequential workflow designed for reliability and ease of use. This structure ensures that each component of the integration is validated systematically.
+The test is orchestrated by the scenario's `test/run.sh` script and follows a clear, sequential workflow designed for reliability and ease of use. This structure ensures that each component of the integration is validated systematically.
 
 The end-to-end process is as follows:
 
-1. **Test Orchestration**: The `run_dbt_tests.sh` script serves as the main entry point. It sets up the environment and iterates over the scenarios folder to execute each test scenario.
+1. **Scenario Execution**: The `test/run.sh` script executes the dbt project defined in the `runner/` directory using `dbt-ol seed`, `dbt-ol run`, and `dbt-ol test`.
 
-2. **Scenario Execution**: The test runner executes the dbt project defined in the `runner/` directory. The specific dbt commands to be run (e.g., `dbt seed`, `dbt run`, `dbt test`) are defined in the test scenario's run script (`test/run.sh`).
-
-3. **Event Generation and Capture**: During the execution, the `dbt-ol` wrapper intercepts the dbt commands and emits OpenLineage events. The `test/openlineage.yml` configuration directs these events to be captured as a local file (`{directory_input_param}/events.jsonl`) using the `file` transport.
+2. **Event Generation and Capture**: During the execution, the `dbt-ol` wrapper intercepts the dbt commands and emits OpenLineage events. The `test/run.sh` script writes an `openlineage.yml` configuration that directs these events to be captured as a local file (`{output_dir}/events.jsonl`) using the `file` transport.
 
-4. **Extract events**: OpenLineage emits events reliably to one file (`append: true` causes overwrites and events to be lost), so it is required to extract them before validation.
+3. **Extract events**: OpenLineage emits all events to one file, so `run.sh` splits them into individual numbered files (`event-1.json`, `event-2.json`, …) before deleting the combined `.jsonl`.
 
-5. **Event Validation**: Once the dbt process is complete, the test framework performs a two-stage validation on the generated events:
-   * **Syntax Validation**: Each event is validated against the official OpenLineage JSON schema (e.g., version `1.40.1`) to ensure it is structurally correct.
-   * **Semantic Validation**: The content of the events is compared against expected templates. This deep comparison, powered by the `scripts/compare_events.py` utility, verifies the accuracy of job names, dataset identifiers, lineage relationships, and the presence and structure of key facets.
+4. **Event Validation**: Once the dbt process is complete, the shared framework (`scripts/validate_ol_events.py`) performs a two-stage validation on the generated events:
+   * **Syntax Validation**: Each event is validated against the official OpenLineage JSON schema to ensure it is structurally correct.
+   * **Semantic Validation**: The content of the events is compared against expected templates in `scenarios/csv_to_postgres/events/`. This comparison, powered by the `scripts/compare_events.py` utility, verifies the accuracy of job names, dataset identifiers, lineage relationships, and the presence and structure of key facets.
 
-6. **Reporting**: Upon completion, the test runner generates a standardized JSON report (`dbt_producer_report.json`) that details the results of each validation step. This report is designed to be consumed by higher-level aggregation scripts in a CI/CD environment.
+5. **Reporting**: Upon completion, the framework generates a standardised JSON report that details the results of each validation step for consumption by CI/CD aggregation scripts.
 
 ## Validation Scope
````
````diff
@@ -38,6 +36,7 @@ This test validates that the `openlineage-dbt` integration correctly generates O
 #### dbt Operations Covered:
 
 - `dbt seed`: To load initial data.
 - `dbt run`: To execute dbt models.
+- `dbt test`: To run data quality tests and capture `dataQualityAssertions` facets.
 
 #### Validation Checks:
 
 - **Event Generation**: Correctly creates `START` and `COMPLETE` events for jobs and runs.
@@ -50,6 +49,7 @@ This test validates that the `openlineage-dbt` integration correctly generates O
   - `schema`
   - `dataSource`
   - `columnLineage`
+  - `dataQualityAssertions`
 - **Specification Compliance**: Events are validated against the OpenLineage specification schema (version `2-0-2`).
````
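The semantic validation above matches each generated event against an expected template. The exact rules live in `scripts/compare_events.py` and are not reproduced here; a common approach, shown as a hedged sketch (function name and matching rules are illustrative, not the framework's API), is recursive subset matching that tolerates extra keys in the real event:

```python
def matches_template(template, actual) -> bool:
    """Return True when every key/value required by the template is present
    in the actual event; extra keys in the actual event are ignored."""
    if isinstance(template, dict):
        return isinstance(actual, dict) and all(
            key in actual and matches_template(value, actual[key])
            for key, value in template.items()
        )
    if isinstance(template, list):
        return (
            isinstance(actual, list)
            and len(actual) == len(template)
            and all(matches_template(t, a) for t, a in zip(template, actual))
        )
    return template == actual  # scalar leaf: exact match
```

Subset matching lets templates pin down job names, dataset identifiers, and facet structure without breaking whenever the producer adds a new optional field.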
````diff
@@ -58,16 +58,14 @@ The test is organized into the following key directories, each with a specific r
 
 ```
 producer/dbt/
-├── run_dbt_tests.sh    # Main test execution script
-├── scenarios/          # Defines the dbt commands and expected outcomes for each test case
-├── output/             # Default output directory for generated OpenLineage events (generated during execution)
+├── scenarios/          # Test scenarios; each defines expected events and a run script
 ├── runner/             # A self-contained dbt project used as the test target
-└── specs/              # Stores the OpenLineage specification fetched from the local repository (generated during execution)
+├── versions.json       # Supported component and OpenLineage version ranges
+└── maintainers.json    # Maintainer contact information
 ```
 
 - **`runner/`**: A self-contained dbt project with models, seeds, and configuration. This is the target of the `dbt-ol` command.
-- **`scenarios/`**: Defines the dbt commands to be executed and contains the expected event templates for validation.
-- **`output/`**: The default output directory for the generated `events.jsonl` file and extracted events.
+- **`scenarios/`**: Contains one directory per scenario. Each scenario has a `config.json` defining expected event templates, an `events/` directory of expected event JSON files, and a `test/` directory with `run.sh` and `compose.yml`.
 
 ## How to Run the Tests
````
````diff
@@ -106,34 +104,30 @@ The GitHub Actions workflow:
 
 If you need to debug event generation locally:
 
-1. **Start PostgreSQL (Optional)**:
+1. **Start PostgreSQL**:
 
    ```bash
-   cd producer/dbt/scenarios/csv_to_postgres/test
-   docker compose up
+   docker compose -f producer/dbt/scenarios/csv_to_postgres/test/compose.yml up -d
   ```
 
-2. **Install Python Dependencies**:
+2. **Install dbt and the OpenLineage wrapper** (use a virtual environment outside the repo):
 
   ```bash
-   # Activate virtual environment (recommended)
-   python -m venv venv
-   source venv/bin/activate  # On Windows: venv\Scripts\activate
 
-**Note**: Local debugging is entirely optional. All official validation happens in GitHub Actions with PostgreSQL service containers. The test runner (`test/run.sh`) is the same code used by CI/CD, ensuring consistency.
+**Note**: Local debugging is entirely optional. All official validation happens in GitHub Actions with PostgreSQL service containers. The `test/run.sh` script is the same code used by CI/CD, ensuring consistency.
````
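For local debugging, events land wherever your OpenLineage configuration points. A file-transport configuration along the lines of what `run.sh` generates looks like this (the output path is illustrative; the key names follow the OpenLineage Python client's file transport):

```yaml
# openlineage.yml — route all emitted events to a single JSONL file
transport:
  type: file
  log_file_path: ./output/events.jsonl   # illustrative path
  append: true                           # one event per line, appended
```

Exporting `OPENLINEAGE_CONFIG=./openlineage.yml` before running `dbt-ol` should reproduce the capture behaviour described in the workflow above.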