fix(dbt): update README to reflect actual run.sh workflow
The local debugging section referenced run_dbt_tests.sh, which was
removed during the PR OpenLineage#211 cleanup. Replace with accurate instructions
using docker compose + run.sh directly.
Update the workflow description, test structure layout, and validation
scope (add dataQualityAssertions) to match the current architecture.
Signed-off-by: roller100 (BearingNode) <contact@bearingnode.com>
`producer/dbt/README.md` — 26 additions, 32 deletions
````diff
@@ -13,23 +13,21 @@ It is important to note that this is a **compatibility validation framework** us
 
 ## Test Architecture and Workflow
 
-The test is orchestrated by the `run_dbt_tests.sh` script and follows a clear, sequential workflow designed for reliability and ease of use. This structure ensures that each component of the integration is validated systematically.
+The test is orchestrated by the scenario's `test/run.sh` script and follows a clear, sequential workflow designed for reliability and ease of use. This structure ensures that each component of the integration is validated systematically.
 
 The end-to-end process is as follows:
 
-1. **Test Orchestration**: The `run_dbt_tests.sh` script serves as the main entry point. It sets up the environment and iterates over the scenarios folder to execute each test scenario.
+1. **Scenario Execution**: The `test/run.sh` script executes the dbt project defined in the `runner/` directory using `dbt-ol seed`, `dbt-ol run`, and `dbt-ol test`.
 
-2. **Scenario Execution**: The test runner executes the dbt project defined in the `runner/` directory. The specific dbt commands to be run (e.g., `dbt seed`, `dbt run`, `dbt test`) are defined in the test scenario's run script (`test/run.sh`).
-
-3. **Event Generation and Capture**: During the execution, the `dbt-ol` wrapper intercepts the dbt commands and emits OpenLineage events. The `test/openlineage.yml` configuration directs these events to be captured as a local file (`{directory_input_param}/events.jsonl`) using the `file` transport.
+2. **Event Generation and Capture**: During the execution, the `dbt-ol` wrapper intercepts the dbt commands and emits OpenLineage events. The `test/run.sh` script writes an `openlineage.yml` configuration that directs these events to be captured as a local file (`{output_dir}/events.jsonl`) using the `file` transport.
 
-4. **Extract events**: OpenLineage emits events reliably to one file (`append: true` causes overwrites and events to be lost), so it is required to extract them before validation.
+3. **Extract events**: OpenLineage emits all events to one file, so `run.sh` splits them into individual numbered files (`event-1.json`, `event-2.json`, …) before deleting the combined `.jsonl`.
 
-5. **Event Validation**: Once the dbt process is complete, the test framework performs a two-stage validation on the generated events:
-   * **Syntax Validation**: Each event is validated against the official OpenLineage JSON schema (e.g., version `1.40.1`) to ensure it is structurally correct.
-   * **Semantic Validation**: The content of the events is compared against expected templates. This deep comparison, powered by the `scripts/compare_events.py` utility, verifies the accuracy of job names, dataset identifiers, lineage relationships, and the presence and structure of key facets.
+4. **Event Validation**: Once the dbt process is complete, the shared framework (`scripts/validate_ol_events.py`) performs a two-stage validation on the generated events:
+   * **Syntax Validation**: Each event is validated against the official OpenLineage JSON schema to ensure it is structurally correct.
+   * **Semantic Validation**: The content of the events is compared against expected templates in `scenarios/csv_to_postgres/events/`. This comparison, powered by the `scripts/compare_events.py` utility, verifies the accuracy of job names, dataset identifiers, lineage relationships, and the presence and structure of key facets.
 
-6. **Reporting**: Upon completion, the test runner generates a standardized JSON report (`dbt_producer_report.json`) that details the results of each validation step. This report is designed to be consumed by higher-level aggregation scripts in a CI/CD environment.
+5. **Reporting**: Upon completion, the framework generates a standardised JSON report that details the results of each validation step for consumption by CI/CD aggregation scripts.
 
 ## Validation Scope
````
````diff
@@ -38,6 +36,7 @@ This test validates that the `openlineage-dbt` integration correctly generates O
 #### dbt Operations Covered:
 
 - `dbt seed`: To load initial data.
 - `dbt run`: To execute dbt models.
+- `dbt test`: To run data quality tests and capture `dataQualityAssertions` facets.
 
 #### Validation Checks:
 
 - **Event Generation**: Correctly creates `START` and `COMPLETE` events for jobs and runs.
@@ -50,6 +49,7 @@ This test validates that the `openlineage-dbt` integration correctly generates O
   - `schema`
   - `dataSource`
   - `columnLineage`
+  - `dataQualityAssertions`
 - **Specification Compliance**: Events are validated against the OpenLineage specification schema (version `2-0-2`).
````
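The semantic validation above matches each generated event against an expected template. The exact rules live in `scripts/compare_events.py` and are not reproduced here; a common approach, shown as a hedged sketch (function name and matching rules are illustrative, not the framework's API), is recursive subset matching that tolerates extra keys in the real event:

```python
def matches_template(template, actual) -> bool:
    """Return True when every key/value required by the template is present
    in the actual event; extra keys in the actual event are ignored."""
    if isinstance(template, dict):
        return isinstance(actual, dict) and all(
            key in actual and matches_template(value, actual[key])
            for key, value in template.items()
        )
    if isinstance(template, list):
        return (
            isinstance(actual, list)
            and len(actual) == len(template)
            and all(matches_template(t, a) for t, a in zip(template, actual))
        )
    return template == actual  # scalar leaf: exact match
```

Subset matching lets templates pin down job names, dataset identifiers, and facet structure without breaking whenever the producer adds a new optional field.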
````diff
@@ -58,16 +58,14 @@ The test is organized into the following key directories, each with a specific r
 
 ```
 producer/dbt/
-├── run_dbt_tests.sh    # Main test execution script
-├── scenarios/          # Defines the dbt commands and expected outcomes for each test case
-├── output/             # Default output directory for generated OpenLineage events (generated during execution)
+├── scenarios/          # Test scenarios; each defines expected events and a run script
 ├── runner/             # A self-contained dbt project used as the test target
-└── specs/              # Stores the OpenLineage specification fetched from the local repository (generated during execution)
+├── versions.json       # Supported component and OpenLineage version ranges
+└── maintainers.json    # Maintainer contact information
 ```
 
 - **`runner/`**: A self-contained dbt project with models, seeds, and configuration. This is the target of the `dbt-ol` command.
-- **`scenarios/`**: Defines the dbt commands to be executed and contains the expected event templates for validation.
-- **`output/`**: The default output directory for the generated `events.jsonl` file and extracted events.
+- **`scenarios/`**: Contains one directory per scenario. Each scenario has a `config.json` defining expected event templates, an `events/` directory of expected event JSON files, and a `test/` directory with `run.sh` and `compose.yml`.
 
 ## How to Run the Tests
````
````diff
@@ -106,34 +104,30 @@ The GitHub Actions workflow:
 
 If you need to debug event generation locally:
 
-1. **Start PostgreSQL (Optional)**:
+1. **Start PostgreSQL**:
 
    ```bash
-   cd producer/dbt/scenarios/csv_to_postgres/test
-   docker compose up
+   docker compose -f producer/dbt/scenarios/csv_to_postgres/test/compose.yml up -d
   ```
 
-2. **Install Python Dependencies**:
+2. **Install dbt and the OpenLineage wrapper** (use a virtual environment outside the repo):
 
   ```bash
-   # Activate virtual environment (recommended)
-   python -m venv venv
-   source venv/bin/activate  # On Windows: venv\Scripts\activate
 
-**Note**: Local debugging is entirely optional. All official validation happens in GitHub Actions with PostgreSQL service containers. The test runner (`test/run.sh`) is the same code used by CI/CD, ensuring consistency.
+**Note**: Local debugging is entirely optional. All official validation happens in GitHub Actions with PostgreSQL service containers. The `test/run.sh` script is the same code used by CI/CD, ensuring consistency.
````
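For local debugging, events land wherever your OpenLineage configuration points. A file-transport configuration along the lines of what `run.sh` generates looks like this (the output path is illustrative; the key names follow the OpenLineage Python client's file transport):

```yaml
# openlineage.yml — route all emitted events to a single JSONL file
transport:
  type: file
  log_file_path: ./output/events.jsonl   # illustrative path
  append: true                           # one event per line, appended
```

Exporting `OPENLINEAGE_CONFIG=./openlineage.yml` before running `dbt-ol` should reproduce the capture behaviour described in the workflow above.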