# job_conditional_execution

This example demonstrates a Lakeflow Job that uses conditional task execution based on data quality checks.

The Lakeflow Job consists of the following tasks:
1. Checks data quality and calculates the number of bad records
2. Evaluates whether the bad record count exceeds a threshold (100 records)
3. Routes to one of two processing paths based on the condition:
   - If bad records > 100: runs the `fix_path` task
   - If bad records ≤ 100: runs the `skip_path` task
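The notebook sources aren't reproduced in this README, but the flow above can be sketched in plain Python. The quality rule, the sample data, and the `bad_records` task-value key below are illustrative assumptions, not the project's actual logic; the task-values call is the mechanism a check like `src/check_quality.py` could use to hand its count to a downstream If/else task.

```python
# Minimal sketch of a quality check like src/check_quality.py.
# The rule (records must have a non-empty "id") and the sample data
# are illustrative assumptions, not the project's actual logic.

def count_bad_records(records):
    """Count records failing a simple quality rule: non-empty 'id'."""
    return sum(1 for record in records if not record.get("id"))

sample = [{"id": "a"}, {"id": ""}, {"id": None}, {"id": "b"}]
bad_records = count_bad_records(sample)
print(f"bad records: {bad_records}")  # prints "bad records: 2"

# On Databricks, publish the count so a downstream If/else task can
# read it as {{tasks.check_quality.values.bad_records}}. Guarded so
# the sketch also runs outside a Databricks job, where dbutils is absent.
try:
    dbutils.jobs.taskValues.set(key="bad_records", value=bad_records)  # noqa: F821
except NameError:
    pass
```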

* `src/`: Notebook source code for this project.
  * `src/check_quality.py`: Checks data quality and outputs the bad record count
  * `src/fix_path.py`: Handles cases where the bad record count exceeds the threshold
  * `src/skip_path.py`: Handles cases where the bad record count is within the threshold
* `resources/`: Resource configurations (jobs, pipelines, etc.)
  * `resources/conditional_execution.py`: Job definition with conditional tasks
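The Python job definition in `resources/conditional_execution.py` isn't shown here, but the conditional wiring it describes corresponds to the Jobs API's If/else (condition) task. As a hedged sketch of that shape in YAML, where the task keys, notebook paths, and the `bad_records` task value are assumed names rather than values taken from this project:

```yaml
# Hypothetical YAML equivalent of the conditional wiring; task keys,
# paths, and the task-value name are assumptions for illustration.
resources:
  jobs:
    conditional_execution_example:
      name: conditional_execution_example
      tasks:
        - task_key: check_quality
          notebook_task:
            notebook_path: ../src/check_quality.py
        - task_key: check_threshold
          depends_on:
            - task_key: check_quality
          condition_task:
            op: GREATER_THAN
            left: "{{tasks.check_quality.values.bad_records}}"
            right: "100"
        - task_key: fix_path
          depends_on:
            - task_key: check_threshold
              outcome: "true"
          notebook_task:
            notebook_path: ../src/fix_path.py
        - task_key: skip_path
          depends_on:
            - task_key: check_threshold
              outcome: "false"
          notebook_task:
            notebook_path: ../src/skip_path.py
```

The `outcome` field on `depends_on` is what routes execution: only the dependent task whose outcome matches the condition's result runs.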

## Documentation

For more information about conditional task execution, see:
- [Add branching logic to a job with the If/else task](https://docs.databricks.com/aws/en/jobs/if-else)

## Getting started

Choose how you want to work on this project:

(a) Directly in your Databricks workspace; see
    https://docs.databricks.com/dev-tools/bundles/workspace.

(b) Locally with an IDE like Cursor or VS Code; see
    https://docs.databricks.com/vscode-ext.

(c) With command-line tools; see https://docs.databricks.com/dev-tools/cli/databricks-cli.html

If you're developing with an IDE, dependencies for this project should be installed using uv:

* Make sure you have the uv package manager installed.
  It's an alternative to tools like pip: https://docs.astral.sh/uv/getting-started/installation/.
* Run `uv sync --dev` to install the project's dependencies.


## Using this project from the CLI

The Databricks workspace and IDE extensions provide a graphical interface for working
with this project. It's also possible to interact with it directly using the CLI:

1. Authenticate to your Databricks workspace, if you have not done so already:
   ```
   $ databricks configure
   ```

| 53 | +2. To deploy a development copy of this project, type: |
| 54 | + ``` |
| 55 | + $ databricks bundle deploy --target dev |
| 56 | + ``` |
| 57 | + (Note that "dev" is the default target, so the `--target` parameter |
| 58 | + is optional here.) |
| 59 | +
|
| 60 | + This deploys everything that's defined for this project. |
| 61 | + For example, this project will deploy a job called |
| 62 | + `[dev yourname] conditional_execution_example` to your workspace. |
| 63 | + You can find that resource by opening your workspace and clicking on **Jobs & Pipelines**. |
| 64 | +
|
3. To run the job, use the "run" command:
   ```
   $ databricks bundle run conditional_execution_example
   ```