2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -64,7 +64,7 @@ endif()

FetchContent_Declare(miniexpr
GIT_REPOSITORY https://github.com/Blosc/miniexpr.git
- GIT_TAG 1bd8d0cfe92b63ad463cd28783e824b5e64afea8
+ GIT_TAG 92a5a222b034b148f29d8ae2d02c53f444afd36d
# SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR}/../miniexpr
)
FetchContent_MakeAvailable(miniexpr)
2 changes: 2 additions & 0 deletions doc/getting_started/overview.rst
@@ -31,6 +31,8 @@ and tools in the Python ecosystem, including:
* Excellent integration with Numba and Cython via
`User Defined
Functions <https://www.blosc.org/python-blosc2/getting_started/tutorials/03.lazyarray-udf.html>`_.
* DSL kernels for miniexpr-backed UDF authoring and validation (see
`this tutorial <https://www.blosc.org/python-blosc2/getting_started/tutorials/03.lazyarray-udf-kernels.html>`_).
* By making use of the simple and open
`C-Blosc2 format <https://github.com/Blosc/c-blosc2/blob/main/README_FORMAT.rst>`_
for storing compressed data, Python-Blosc2 facilitates seamless integration with many other
1 change: 1 addition & 0 deletions doc/getting_started/tutorials.rst
@@ -8,6 +8,7 @@ Tutorials
tutorials/01.ndarray-basics
tutorials/02.lazyarray-expressions
tutorials/03.lazyarray-udf
tutorials/03.lazyarray-udf-kernels
tutorials/04.reductions
tutorials/05.persistent-reductions
tutorials/06.remote_proxy
265 changes: 265 additions & 0 deletions doc/getting_started/tutorials/03.lazyarray-udf-kernels.ipynb
@@ -0,0 +1,265 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "c87d8acac9166018",
"metadata": {},
"source": [
"# LazyArray UDF DSL Kernels\n",
"\n",
"`@blosc2.dsl_kernel` lets you write kernels with Python function syntax while executing through the miniexpr DSL path.\n",
"\n",
"Use DSL kernels when you want:\n",
"\n",
"- A vectorized UDF model (operate over NDArray chunks/blocks, not Python scalar loops)\n",
"- Optional JIT compilation via miniexpr backends (for example `tcc`/`cc`) without requiring Numba\n",
"- Early syntax validation and actionable diagnostics for unsupported constructs\n",
"\n",
"This tutorial complements `03.lazyarray-udf.ipynb` (generic Python UDFs).\n",
"\n",
"For the canonical DSL syntax contract, see the miniexpr docs: `doc/dsl-syntax.md`.\n"
]
},
{
"cell_type": "code",
"id": "4743791e5436aa04",
"metadata": {
"ExecuteTime": {
"end_time": "2026-02-16T05:46:57.649941Z",
"start_time": "2026-02-16T05:46:57.358347Z"
}
},
"source": [
"import numpy as np\n",
"\n",
"import blosc2"
],
"outputs": [],
"execution_count": 1
},
{
"cell_type": "markdown",
"id": "c400c3d7e37cda03",
"metadata": {},
"source": [
"## 1. Define a DSL Kernel\n",
"\n",
"A valid DSL kernel can be used with `blosc2.lazyudf(...)` like a regular UDF."
]
},
{
"cell_type": "code",
"id": "8926a0c21237fef3",
"metadata": {
"ExecuteTime": {
"end_time": "2026-02-16T05:46:57.677192Z",
"start_time": "2026-02-16T05:46:57.660322Z"
}
},
"source": [
"@blosc2.dsl_kernel\n",
"def kernel_index_ramp(x):\n",
" # _i* and _n* are reserved DSL index/shape symbols, so disable linter warnings\n",
" return x + _i0 * _n1 + _i1 # noqa: F821"
],
"outputs": [],
"execution_count": 2
},
{
"cell_type": "code",
"id": "fbe9cb59a4515c9c",
"metadata": {
"ExecuteTime": {
"end_time": "2026-02-16T05:46:57.700393Z",
"start_time": "2026-02-16T05:46:57.678344Z"
}
},
"source": [
"shape = (5, 10)\n",
"x = blosc2.ones(shape, dtype=np.float32)\n",
"expr = blosc2.lazyudf(kernel_index_ramp, (x,), dtype=np.float32)\n",
"res = expr[:]\n",
"res"
],
"outputs": [
{
"data": {
"text/plain": [
"array([[ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.],\n",
" [11., 12., 13., 14., 15., 16., 17., 18., 19., 20.],\n",
" [21., 22., 23., 24., 25., 26., 27., 28., 29., 30.],\n",
" [31., 32., 33., 34., 35., 36., 37., 38., 39., 40.],\n",
" [41., 42., 43., 44., 45., 46., 47., 48., 49., 50.]], dtype=float32)"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 3
},
{
"cell_type": "code",
"id": "3bcf440eef3435f4",
"metadata": {
"ExecuteTime": {
"end_time": "2026-02-16T05:46:58.627822Z",
"start_time": "2026-02-16T05:46:58.610389Z"
}
},
"source": [
"# Optional: request miniexpr JIT backend for this DSL kernel\n",
"expr_jit = blosc2.lazyudf(\n",
" kernel_index_ramp,\n",
" (x,),\n",
" dtype=x.dtype,\n",
" jit=True,\n",
" jit_backend=\"tcc\",\n",
")\n",
"res_jit = expr_jit.compute()\n",
"res_jit[:2, :5]"
],
"outputs": [
{
"data": {
"text/plain": [
"array([[ 1., 2., 3., 4., 5.],\n",
" [11., 12., 13., 14., 15.]], dtype=float32)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 4
},
{
"cell_type": "markdown",
"id": "2539c7b3c5c828e3",
"metadata": {},
"source": [
"## 2. Preflight Validation (`validate_dsl`)\n",
"\n",
"You can validate a kernel and inspect diagnostics without executing it."
]
},
{
"cell_type": "code",
"id": "e408f3ced12bb48e",
"metadata": {
"ExecuteTime": {
"end_time": "2026-02-16T05:46:58.791626Z",
"start_time": "2026-02-16T05:46:58.683662Z"
}
},
"source": [
"report_ok = blosc2.validate_dsl(kernel_index_ramp)\n",
"report_ok"
],
"outputs": [
{
"data": {
"text/plain": [
"{'valid': True,\n",
" 'dsl_source': 'def kernel_index_ramp(x):\\n # _i* and _n* are reserved DSL index/shape symbols, so disable linter warnings\\n return x + _i0 * _n1 + _i1 # noqa: F821',\n",
" 'input_names': ['x'],\n",
" 'error': None}"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": 5
},
{
"cell_type": "markdown",
"id": "f62d5a74a417eb12",
"metadata": {},
"source": [
"## 3. Invalid Syntax Example\n",
"\n",
"Python ternary expressions are not part of the DSL subset.\n",
"`validate_dsl` reports the issue, and `lazyudf(...)` raises early with a detailed message."
]
},
{
"cell_type": "code",
"id": "2cfb6d28ee3cf2d8",
"metadata": {
"ExecuteTime": {
"end_time": "2026-02-16T05:46:58.840100Z",
"start_time": "2026-02-16T05:46:58.818859Z"
}
},
"source": [
"@blosc2.dsl_kernel\n",
"def kernel_invalid_ternary(x):\n",
" return 1 if x else 0\n",
"\n",
"\n",
"report_bad = blosc2.validate_dsl(kernel_invalid_ternary)\n",
"print(report_bad[\"valid\"])\n",
"print(report_bad[\"error\"])"
],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"False\n",
"Ternary expressions are not supported in DSL; use where(cond, a, b) at line 2, column 14\n",
"\n",
"DSL kernel source:\n",
"1 | def kernel_invalid_ternary(x):\n",
"2 | return 1 if x else 0\n",
" | ^\n",
"\n",
"See: https://github.com/Blosc/miniexpr/blob/main/doc/dsl-usage.md\n"
]
}
],
"execution_count": 6
},
{
"cell_type": "markdown",
"id": "d8c345f8091b1078",
"metadata": {},
"source": [
"## 4. Advanced Example: Mandelbrot DSL\n",
"\n",
"For a more advanced real-world DSL kernel, see:\n",
"\n",
"- `examples/ndarray/mandelbrot-dsl.ipynb`\n",
"\n",
"GitHub link:\n",
"\n",
"- https://github.com/Blosc/python-blosc2/blob/main/examples/ndarray/mandelbrot-dsl.ipynb"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
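As a cross-check on the first tutorial cell, the index-ramp kernel `x + _i0 * _n1 + _i1` can be reproduced in plain Python. This is only a reference sketch, not how blosc2 or miniexpr evaluates the kernel. It assumes that `_i0` and `_i1` are the global row and column indices and that `_n1` is the length of the second dimension, which is consistent with the 1 to 50 output shown in the notebook. The helper name `index_ramp_reference` is hypothetical.

```python
# Plain-Python reference for the DSL expression: x + _i0 * _n1 + _i1
# Assumption: _i0/_i1 are global row/column indices and _n1 is the
# length of dimension 1. This does not use blosc2 or miniexpr.


def index_ramp_reference(x):
    """Return x[i0][i1] + i0 * n1 + i1 for a 2-D list of lists."""
    n1 = len(x[0]) if x else 0
    return [
        [value + i0 * n1 + i1 for i1, value in enumerate(row)]
        for i0, row in enumerate(x)
    ]


# Same shape and fill value as the notebook example: ones((5, 10)).
ones = [[1.0] * 10 for _ in range(5)]
expected = index_ramp_reference(ones)

print(expected[0][:5])  # start of the first row
print(expected[4][-1])  # last element of the last row
```

Comparing `expected` against `expr[:]` from the notebook would be one way to confirm that the assumed meaning of the reserved symbols holds for a given shape.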
14 changes: 14 additions & 0 deletions doc/reference/lazyarray.rst
@@ -10,6 +10,7 @@ You can get an object following the LazyArray API in any of the following ways
* Any expression that involves one or more NDArray objects, e.g. ``a + b``, where ``a`` and ``b`` are NDArray objects (see `this tutorial <../getting_started/tutorials/02.lazyarray-expressions.html>`_).
* Using the ``lazyexpr`` constructor.
* Using the ``lazyudf`` constructor (see `a tutorial <../getting_started/tutorials/03.lazyarray-udf.html>`_).
* Using ``@dsl_kernel`` and ``lazyudf`` for miniexpr-backed DSL kernels (see `this tutorial <../getting_started/tutorials/03.lazyarray-udf-kernels.html>`_).

The LazyArray object is a thin wrapper around the expression or user-defined function that allows for lazy computation. This means that the expression is not computed until the ``compute`` or ``__getitem__`` method is called. The ``compute`` method returns a new NDArray object with the result of the expression evaluation. The ``__getitem__`` method returns a NumPy array instead.

@@ -53,3 +54,16 @@ For getting a LazyUDF object (which is LazyArray-compliant) from a user-defined
This object follows the `LazyArray`_ API for computation, although storage is not supported yet.

.. autofunction:: lazyudf

.. _DSLKernelReference:

DSL Kernels
-----------

For miniexpr-backed kernels, see `the dedicated tutorial <../getting_started/tutorials/03.lazyarray-udf-kernels.html>`_.

.. autofunction:: dsl_kernel

.. autofunction:: validate_dsl

.. autoclass:: DSLSyntaxError
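
The tutorial shows `validate_dsl` rejecting a Python ternary and suggesting `where(cond, a, b)`. The kind of preflight check involved can be illustrated with the standard-library `ast` module. This is a minimal sketch and makes no claim about how blosc2 actually implements `validate_dsl`. The helper `find_ternaries` and the kernel name `kernel_where` are hypothetical, and the column convention used here may differ from the one in blosc2's diagnostics.

```python
import ast


def find_ternaries(source):
    """Return (line, column) pairs, 1-based, for each `a if c else b` in source."""
    tree = ast.parse(source)
    return [
        (node.lineno, node.col_offset + 1)
        for node in ast.walk(tree)
        if isinstance(node, ast.IfExp)
    ]


# The kernel body rejected in the tutorial.
BAD = "def kernel_invalid_ternary(x):\n    return 1 if x else 0\n"
# The same logic written with where(), as the diagnostic suggests.
# ast.parse only checks syntax, so `where` need not be defined here.
GOOD = "def kernel_where(x):\n    return where(x, 1, 0)\n"

print(find_ternaries(BAD))   # one hit on line 2
print(find_ternaries(GOOD))  # no hits
```

Working on the parsed syntax tree rather than on raw text means that a ternary inside a comment or a string literal is not flagged, which is usually what a validator wants.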