Commit 9816cb3

Author: roller100 (BearingNode)

fix(dbt): update README to reflect actual run.sh workflow

The local debugging section referenced run_dbt_tests.sh, which was removed during the PR OpenLineage#211 cleanup. Replace with accurate instructions using docker compose + run.sh directly. Update the workflow description, test structure layout, and validation scope (add dataQualityAssertions) to match the current architecture.

Signed-off-by: roller100 (BearingNode) <contact@bearingnode.com>
1 parent 4fdf5f1 commit 9816cb3

1 file changed

Lines changed: 26 additions & 32 deletions

producer/dbt/README.md

@@ -13,23 +13,21 @@ It is important to note that this is a **compatibility validation framework** us
 
 ## Test Architecture and Workflow
 
-The test is orchestrated by the `run_dbt_tests.sh` script and follows a clear, sequential workflow designed for reliability and ease of use. This structure ensures that each component of the integration is validated systematically.
+The test is orchestrated by the scenario's `test/run.sh` script and follows a clear, sequential workflow designed for reliability and ease of use. This structure ensures that each component of the integration is validated systematically.
 
 The end-to-end process is as follows:
 
-1. **Test Orchestration**: The `run_dbt_tests.sh` script serves as the main entry point. It sets up the environment and initiates over the scenarios folder to execute each test scenario.
+1. **Scenario Execution**: The `test/run.sh` script executes the dbt project defined in the `runner/` directory using `dbt-ol seed`, `dbt-ol run`, and `dbt-ol test`.
 
-2. **Scenario Execution**: The test runner executes the dbt project defined in the `runner/` directory. The specific dbt commands to be run (e.g., `dbt seed`, `dbt run`, `dbt test`) are defined in the test scenarios run script (`test/run.sh`).
-
-3. **Event Generation and Capture**: During the execution, the `dbt-ol` wrapper intercepts the dbt commands and emits OpenLineage events. The `test/openlineage.yml` configuration directs these events to be captured as a local file (`{directory_input_param}/events.jsonl`) using the `file` transport.
+2. **Event Generation and Capture**: During the execution, the `dbt-ol` wrapper intercepts the dbt commands and emits OpenLineage events. The `test/run.sh` script writes an `openlineage.yml` configuration that directs these events to be captured as a local file (`{output_dir}/events.jsonl`) using the `file` transport.
 
-4. **Extract events**: OpenLineage emits events reliable to one file ('append: true' causes overwrites and events to be lost) so it is required to extract the before validation.
+3. **Extract events**: OpenLineage emits all events to one file, so `run.sh` splits them into individual numbered files (`event-1.json`, `event-2.json`, …) before deleting the combined `.jsonl`.
 
-5. **Event Validation**: Once the dbt process is complete, the test framework performs a two-stage validation on the generated events:
-* **Syntax Validation**: Each event is validated against the official OpenLineage JSON schema (e.g., version `1.40.1`) to ensure it is structurally correct.
-* **Semantic Validation**: The content of the events is compared against expected templates. This deep comparison, powered by the `scripts/compare_events.py` utility, verifies the accuracy of job names, dataset identifiers, lineage relationships, and the presence and structure of key facets.
+4. **Event Validation**: Once the dbt process is complete, the shared framework (`scripts/validate_ol_events.py`) performs a two-stage validation on the generated events:
+* **Syntax Validation**: Each event is validated against the official OpenLineage JSON schema to ensure it is structurally correct.
+* **Semantic Validation**: The content of the events is compared against expected templates in `scenarios/csv_to_postgres/events/`. This comparison, powered by the `scripts/compare_events.py` utility, verifies the accuracy of job names, dataset identifiers, lineage relationships, and the presence and structure of key facets.
 
-6. **Reporting**: Upon completion, the test runner generates a standardized JSON report (`dbt_producer_report.json`) that details the results of each validation step. This report is designed to be consumed by higher-level aggregation scripts in a CI/CD environment.
+5. **Reporting**: Upon completion, the framework generates a standardised JSON report that details the results of each validation step for consumption by CI/CD aggregation scripts.
 
 ## Validation Scope

@@ -38,6 +36,7 @@ This test validates that the `openlineage-dbt` integration correctly generates O
 #### dbt Operations Covered:
 - `dbt seed`: To load initial data.
 - `dbt run`: To execute dbt models.
+- `dbt test`: To run data quality tests and capture `dataQualityAssertions` facets.
 
 #### Validation Checks:
 - **Event Generation**: Correctly creates `START` and `COMPLETE` events for jobs and runs.
@@ -50,6 +49,7 @@ This test validates that the `openlineage-dbt` integration correctly generates O
 - `schema`
 - `dataSource`
 - `columnLineage`
+- `dataQualityAssertions`
 - **Specification Compliance**: Events are validated against the OpenLineage specification schema (version `2-0-2`).
 
 ## Test Structure
@@ -58,16 +58,14 @@ The test is organized into the following key directories, each with a specific r
 
 ```
 producer/dbt/
-├── run_dbt_tests.sh # Main test execution script
-├── scenarios/ # Defines the dbt commands and expected outcomes for each test case
-├── output/ # Default output directory for generated OpenLineage events (generated during execution)
+├── scenarios/ # Test scenarios; each defines expected events and a run script
 ├── runner/ # A self-contained dbt project used as the test target
-└── specs/ # Stores OpenLineage spcification get from local repository (generated during execution)
+├── versions.json # Supported component and OpenLineage version ranges
+└── maintainers.json # Maintainer contact information
 ```
 
 - **`runner/`**: A self-contained dbt project with models, seeds, and configuration. This is the target of the `dbt-ol` command.
-- **`scenarios/`**: Defines the dbt commands to be executed and contains the expected event templates for validation.
-- **`output/`**: The default output directory for the generated `events.jsonl` file and extracted events.
+- **`scenarios/`**: Contains one directory per scenario. Each scenario has a `config.json` defining expected event templates, an `events/` directory of expected event JSON files, and a `test/` directory with `run.sh` and `compose.yml`.
 
 ## How to Run the Tests

@@ -106,34 +104,30 @@ The GitHub Actions workflow:
 
 If you need to debug event generation locally:
 
-1. **Start PostgreSQL (Optional)**:
+1. **Start PostgreSQL**:
 ```bash
-cd producer/dbt/scenarions/csv_to_postgres/test
-docker compose up
+docker compose -f producer/dbt/scenarios/csv_to_postgres/test/compose.yml up -d
 ```
 
-2. **Install Python Dependencies**:
+2. **Install dbt and the OpenLineage wrapper** (use a virtual environment outside the repo):
 ```bash
-# Activate virtual environment (recommended)
-python -m venv venv
-source venv/bin/activate # On Windows: venv\Scripts\activate
+python -m venv ~/.venvs/dbt-compat-test
+source ~/.venvs/dbt-compat-test/bin/activate
+pip install dbt-core==1.8.0 dbt-postgres openlineage-dbt==1.23.0
 ```
 
-3. **Run Test Scenario**:
+3. **Run the scenario**:
 ```bash
-./producer/dbt/run_dbt_tests.sh --openlineage-directory <open_lineage_directory>
+mkdir -p /tmp/dbt-events
+bash producer/dbt/scenarios/csv_to_postgres/test/run.sh /tmp/dbt-events
 ```
 
-4. **Inspect Generated Events**:
+4. **Inspect generated events**:
 ```bash
-# View events
-cat ./producer/dbt/output/csv_to_postgres/event-{id}.json | jq '.'
-
-# check report
-cat ./producer/dbt/dbt_producer_report.json | jq '.'
+cat /tmp/dbt-events/event-1.json | jq '.'
 ```
 
-**Note**: Local debugging is entirely optional. All official validation happens in GitHub Actions with PostgreSQL service containers. The test runner (`test/run.sh`) is the same code used by CI/CD, ensuring consistency.
+**Note**: Local debugging is entirely optional. All official validation happens in GitHub Actions with PostgreSQL service containers. The `test/run.sh` script is the same code used by CI/CD, ensuring consistency.
 
 ## Important dbt Integration Notes
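
The "Extract events" step in the updated README describes splitting the combined `events.jsonl` into numbered `event-N.json` files and then deleting the combined file. A minimal sketch of that idea in shell is given here for illustration only. The function name `split_events` and its argument order are hypothetical and are not taken from `run.sh`; the sketch assumes the file holds exactly one JSON event per line.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT the actual run.sh implementation.
# Splits a JSON Lines file into event-1.json, event-2.json, ... inside out_dir,
# removes the combined file, and prints the number of events written.
split_events() {
  local src="$1" out_dir="$2" n=0 line
  mkdir -p "$out_dir"
  # The '|| [ -n "$line" ]' clause keeps a final line with no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue   # skip blank lines
    n=$((n + 1))
    printf '%s\n' "$line" > "$out_dir/event-$n.json"
  done < "$src"
  rm -f "$src"
  echo "$n"
}
```

With the paths used in the local debugging steps, a call might look like `split_events /tmp/dbt-events/events.jsonl /tmp/dbt-events`, after which `event-1.json` can be inspected with `jq` as in step 4 of the debugging instructions.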
