Commit 1ee78f3

Add examples with Airflow patterns (#137)
DABs examples in Python: patterns for migrating common Airflow practices to Databricks Lakeflow Jobs:
- task values
- file arrival trigger
- for-each tasks
- programmatic generation & mutators

Co-authored-by: Zanita Rahimi <zanita.rahimi@databricks.com>
1 parent d7d25d5 commit 1ee78f3


48 files changed: +1336 −0 lines
Lines changed: 73 additions & 0 deletions
# job_backfill_data

This example demonstrates a Databricks Asset Bundle (DABs) job that runs a SQL task with a date parameter for backfilling data.

The job consists of one task:

1. **run_daily_sql**: a SQL task that runs `src/my_query.sql` with a `run_date` job parameter. The query inserts data from a source table into a target table filtered by `event_date = run_date`, so you can backfill or reprocess specific dates.

Project layout:

* `src/`: SQL and notebook source code for this project.
  * `src/my_query.sql`: daily insert query that uses the `:run_date` parameter to filter by event date.
* `resources/`: resource configurations (jobs, pipelines, etc.).
  * `resources/backfill_data.py`: job definition with a parameterized SQL task.

## Job parameters

| Parameter  | Default      | Description                                   |
|------------|--------------|-----------------------------------------------|
| `run_date` | `2024-01-01` | Date used to filter data (e.g. `event_date`). |

Before deploying, set `warehouse_id` in `resources/backfill_data.py` to your SQL warehouse ID, and adjust the catalog/schema/table names in `src/my_query.sql` to match your environment.

## Documentation

For more information about job backfills and parameters, see:
- [Create and run jobs](https://docs.databricks.com/en/jobs/index.html)
- [Backfill jobs](https://docs.databricks.com/aws/en/jobs/backfill-jobs)

## Getting started

Choose how you want to work on this project:

(a) Directly in your Databricks workspace: see https://docs.databricks.com/dev-tools/bundles/workspace.

(b) Locally with an IDE like Cursor or VS Code: see https://docs.databricks.com/vscode-ext.

(c) With command-line tools: see https://docs.databricks.com/dev-tools/cli/databricks-cli.html.

If you're developing with an IDE, install this project's dependencies with uv:

* Make sure you have the uv package manager installed. It's an alternative to tools like pip: https://docs.astral.sh/uv/getting-started/installation/.
* Run `uv sync --dev` to install the project's dependencies.

## Using this project with the CLI

The Databricks workspace and IDE extensions provide a graphical interface for working with this project. You can also use the CLI:

1. Authenticate to your Databricks workspace, if you have not done so already:
   ```
   $ databricks configure
   ```

2. To deploy a development copy of this project, run:
   ```
   $ databricks bundle deploy --target dev
   ```
   (Note: "dev" is the default target, so `--target` is optional.)

   This deploys everything defined for this project, including the job `[dev yourname] sql_backfill_example`. You can find it under **Workflows** (or **Jobs & Pipelines**) in your workspace.

3. To run the job with the default `run_date`:
   ```
   $ databricks bundle run sql_backfill_example
   ```

4. To run the job for a specific date (e.g. a backfill):
   ```
   $ databricks bundle run sql_backfill_example --parameters run_date=2024-02-01
   ```
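Backfills often cover a range of dates rather than a single one. A minimal sketch, not part of this example, that generates one run command per date (it assumes the job name and `--parameters` flag shown above):

```python
from datetime import date, timedelta


def backfill_commands(start: date, end: date, job: str = "sql_backfill_example"):
    """Yield one `databricks bundle run` command per date in [start, end]."""
    d = start
    while d <= end:
        yield f"databricks bundle run {job} --parameters run_date={d.isoformat()}"
        d += timedelta(days=1)


# Print the commands; pipe them to a shell or invoke each with subprocess.run.
for cmd in backfill_commands(date(2024, 2, 1), date(2024, 2, 3)):
    print(cmd)
```

Each command is an independent job run, so a failed date can be retried without re-running the whole range.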
Lines changed: 19 additions & 0 deletions
# This is a Databricks asset bundle definition for job_backfill_data.
# See https://docs.databricks.com/dev-tools/bundles/index.html for documentation.
bundle:
  name: job_backfill_data

python:
  venv_path: .venv
  # Functions called to load resources defined in Python. See resources/__init__.py.
  resources:
    - "resources:load_resources"

include:
  - resources/*.yml
  - resources/*/*.yml

targets:
  dev:
    mode: development
    default: true
Lines changed: 25 additions & 0 deletions
[project]
name = "job_backfill_data"
version = "0.0.1"
authors = [{ name = "Databricks Field Engineering" }]
requires-python = ">=3.10,<=3.13"
dependencies = [
  # Any dependencies for jobs and pipelines in this project can be added here.
  # See also https://docs.databricks.com/dev-tools/bundles/library-dependencies
  #
  # LIMITATION: for pipelines, dependencies are cached during development;
  # add dependencies to the 'environment' section of the pipeline.yml file instead.
]

[dependency-groups]
dev = [
  "pytest",
  "databricks-bundles==0.275.0",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.black]
line-length = 125
Lines changed: 16 additions & 0 deletions
from databricks.bundles.core import (
    Bundle,
    Resources,
    load_resources_from_current_package_module,
)


def load_resources(bundle: Bundle) -> Resources:
    """
    The 'load_resources' function is referenced in databricks.yml and is responsible
    for loading bundle resources defined in Python code. It is called by the
    Databricks CLI during bundle deployment. After deployment, this function is not used.
    """

    # The default implementation loads all Python files in the 'resources' directory.
    return load_resources_from_current_package_module()
Lines changed: 24 additions & 0 deletions
from databricks.bundles.jobs import (
    Job,
    Task,
    SqlTask,
    SqlTaskFile,
    JobParameterDefinition,
)

run_daily_sql = Task(
    task_key="run_daily_sql",
    sql_task=SqlTask(
        warehouse_id="<your_warehouse_id>",  # set to your SQL warehouse ID before deploying
        file=SqlTaskFile(path="src/my_query.sql"),
        # Forward the job-level parameter to the SQL task's :run_date parameter.
        parameters={"run_date": "{{job.parameters.run_date}}"},
    ),
)

job = Job(
    name="sql_backfill_example",
    tasks=[run_daily_sql],
    parameters=[
        JobParameterDefinition(name="run_date", default="2024-01-01"),
    ],
)
Lines changed: 5 additions & 0 deletions
-- Referenced by the sql_task in resources/backfill_data.py.
-- :run_date is supplied as a task parameter; date() casts the string to a DATE.
INSERT INTO catalog.schema.target_table
SELECT *
FROM catalog.schema.source_table
WHERE event_date = date(:run_date);
Lines changed: 68 additions & 0 deletions
# job_conditional_execution

This example demonstrates a Lakeflow Job that uses conditional task execution based on data quality checks.

The job consists of the following tasks:

1. Check data quality and count bad records.
2. Evaluate whether the bad-record count exceeds a threshold (100 records).
3. Route to a different processing path based on the condition:
   - If bad records > 100: run the `fix_path` task.
   - If bad records ≤ 100: run the `skip_path` task.
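The routing above can be sketched in plain Python (illustrative only; in the deployed job this comparison is expressed as an If/else condition task):

```python
def route(bad_records: int, threshold: int = 100) -> str:
    """Mirror the job's branching: which downstream task should run?"""
    return "fix_path" if bad_records > threshold else "skip_path"


print(route(250))  # high bad-record count -> fix_path
print(route(7))    # within threshold      -> skip_path
```

Note that the condition is strictly greater-than, so a count of exactly 100 still takes the skip path.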

Project layout:

* `src/`: notebook source code for this project.
  * `src/check_quality.py`: checks data quality and outputs the bad-record count.
  * `src/fix_path.py`: handles cases with a high bad-record count.
  * `src/skip_path.py`: the skip path.
* `resources/`: resource configurations (jobs, pipelines, etc.).
  * `resources/conditional_execution.py`: job definition with conditional tasks.
19+
## Documentation
20+
21+
For more information about conditional task execution, see:
22+
- [Add branching logic to a job with the If/else task](https://docs.databricks.com/aws/en/jobs/if-else)
23+
24+
## Getting started
25+
26+
Choose how you want to work on this project:
27+
28+
(a) Directly in your Databricks workspace, see
29+
https://docs.databricks.com/dev-tools/bundles/workspace.
30+
31+
(b) Locally with an IDE like Cursor or VS Code, see
32+
https://docs.databricks.com/vscode-ext.
33+
34+
(c) With command line tools, see https://docs.databricks.com/dev-tools/cli/databricks-cli.html
35+
36+
If you're developing with an IDE, dependencies for this project should be installed using uv:
37+
38+
* Make sure you have the UV package manager installed.
39+
It's an alternative to tools like pip: https://docs.astral.sh/uv/getting-started/installation/.
40+
* Run `uv sync --dev` to install the project's dependencies.
41+
42+
43+
# Using this project using the CLI
44+
45+
The Databricks workspace and IDE extensions provide a graphical interface for working
46+
with this project. It's also possible to interact with it directly using the CLI:
47+
48+
1. Authenticate to your Databricks workspace, if you have not done so already:
49+
```
50+
$ databricks configure
51+
```
52+
53+
2. To deploy a development copy of this project, type:
54+
```
55+
$ databricks bundle deploy --target dev
56+
```
57+
(Note that "dev" is the default target, so the `--target` parameter
58+
is optional here.)
59+
60+
This deploys everything that's defined for this project.
61+
For example, this project will deploy a job called
62+
`[dev yourname] conditional_execution_example` to your workspace.
63+
You can find that resource by opening your workspace and clicking on **Jobs & Pipelines**.
64+
65+
3. To run the job, use the "run" command:
66+
```
67+
$ databricks bundle run conditional_execution_example
68+
```
Lines changed: 19 additions & 0 deletions
# This is a Databricks asset bundle definition for job_conditional_execution.
# See https://docs.databricks.com/dev-tools/bundles/index.html for documentation.
bundle:
  name: job_conditional_execution

python:
  venv_path: .venv
  # Functions called to load resources defined in Python. See resources/__init__.py.
  resources:
    - "resources:load_resources"

include:
  - resources/*.yml
  - resources/*/*.yml

targets:
  dev:
    mode: development
    default: true
Lines changed: 25 additions & 0 deletions
[project]
name = "job_conditional_execution"
version = "0.0.1"
authors = [{ name = "Databricks Field Engineering" }]
requires-python = ">=3.10,<=3.13"
dependencies = [
  # Any dependencies for jobs and pipelines in this project can be added here.
  # See also https://docs.databricks.com/dev-tools/bundles/library-dependencies
  #
  # LIMITATION: for pipelines, dependencies are cached during development;
  # add dependencies to the 'environment' section of the pipeline.yml file instead.
]

[dependency-groups]
dev = [
  "pytest",
  "databricks-bundles==0.275.0",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.black]
line-length = 125
Lines changed: 16 additions & 0 deletions
from databricks.bundles.core import (
    Bundle,
    Resources,
    load_resources_from_current_package_module,
)


def load_resources(bundle: Bundle) -> Resources:
    """
    The 'load_resources' function is referenced in databricks.yml and is responsible
    for loading bundle resources defined in Python code. It is called by the
    Databricks CLI during bundle deployment. After deployment, this function is not used.
    """

    # The default implementation loads all Python files in the 'resources' directory.
    return load_resources_from_current_package_module()
