Speed up Constant @ MatrixExpr
#1159
base: master
Conversation
Added numpy to build requirements in pyproject.toml and setup.py, ensuring numpy headers are included during compilation. Refactored matrix.pxi to improve matrix operation support, including custom __array_ufunc__ handling for dot/matmul, utility functions for type checking and array conversion, and a vectorized _core_dot implementation for efficient matrix multiplication.
Introduces a new parameterized test for matrix dot product performance and updates an assertion in test_matrix_matmul_return_type to expect Expr instead of MatrixExpr for 1D @ 1D operations.
Updated type checks throughout expr.pxi to use np.ndarray instead of MatrixExpr, improving compatibility with numpy arrays. Also adjusted matrix.pxi to ensure ufunc results are returned as MatrixExpr views when appropriate.
Deleted unnecessary conversion of the 'out' keyword argument to an array in MatrixExpr.__array_ufunc__, as it is not required for correct operation.
Deleted the overridden __matmul__ method in MatrixExpr, reverting to the default numpy ndarray behavior for matrix multiplication. This simplifies the class and avoids unnecessary type casting.
Joao-Dionisio
left a comment
There are some things I need clarification on.
Also, changes should be as minimal as possible, in the sense that they should only fix the one issue in the PR; otherwise it gets hard to follow. I'm not saying you're not doing this, just that there is a bunch of code I don't understand :)
```diff
 elif _is_number(other):
     return ExprCons(self, rhs=float(other))
-elif isinstance(other, MatrixExpr):
+elif isinstance(other, np.ndarray):
```
why does the type need to change?
matrix.pxi::_ensure_array will unbox any input that is a subclass of np.ndarray down to a plain np.ndarray. This prevents an endless loop: call a ufunc → an input is a MatrixExpr → it has __array_ufunc__ → numpy dispatches to it again → the input is still a MatrixExpr → ...
Unboxing MatrixExpr to np.ndarray breaks this cycle.
So the input to the lower-level Expr operation is an np.ndarray.
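A minimal illustration of this unbox-and-redispatch pattern (MatrixExprDemo is a hypothetical stand-in, not the real MatrixExpr class):

```python
import numpy as np

class MatrixExprDemo(np.ndarray):
    # Hypothetical stand-in for MatrixExpr, for illustration only.
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method != "__call__":
            return NotImplemented
        # Unbox subclass instances to plain np.ndarray before
        # re-dispatching; otherwise numpy would hand the MatrixExprDemo
        # inputs straight back to this method, looping forever.
        unboxed = [x.view(np.ndarray) if isinstance(x, MatrixExprDemo) else x
                   for x in inputs]
        result = getattr(ufunc, method)(*unboxed, **kwargs)
        # Re-wrap plain ndarray results as subclass views.
        return result.view(MatrixExprDemo) if isinstance(result, np.ndarray) else result

a = np.arange(4.0).view(MatrixExprDemo)
out = a + 1.0  # dispatches through __array_ufunc__ exactly once
```

Because the inner call only ever sees plain ndarrays, numpy's default machinery runs and the override is not re-entered.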
```diff
+cimport numpy as cnp
+cnp.import_array()
```
what does this do?
cimport numpy makes the cnp.ndarray C type available, and cnp.import_array() initializes numpy's C API, which is required before that type can be used.
It's needed for declarations like cdef cnp.ndarray[object, ndim=2] res = np.zeros((m, k), dtype=object)
```python
class MatrixExpr(np.ndarray):
    ...
    def __array_ufunc__(self, ufunc, method, *args, **kwargs):
        if method == "__call__":
```
can you please document what this does? it's very cryptic to me.
numpy ufuncs have a set of methods for invoking them, like "reduce" (np.add.reduce), "accumulate" (np.cumsum), "reduceat", "outer", or "at".
"__call__" means the ufunc was invoked directly.
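A small sketch of how these method names arrive in __array_ufunc__ (Probe is a hypothetical recording subclass, not part of the PR):

```python
import numpy as np

class Probe(np.ndarray):
    # Hypothetical subclass that records which ufunc "method" numpy
    # used when dispatching to __array_ufunc__ (illustration only).
    seen = []

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        Probe.seen.append(method)
        # Unbox to plain ndarrays and forward to the real ufunc method.
        unboxed = [x.view(np.ndarray) if isinstance(x, Probe) else x
                   for x in inputs]
        return getattr(ufunc, method)(*unboxed, **kwargs)

p = np.arange(4.0).view(Probe)
np.add(p, 1)               # direct invocation -> method == "__call__"
total = np.add.reduce(p)   # np.sum-style      -> method == "reduce"
np.add.accumulate(p)       # np.cumsum-style   -> method == "accumulate"
```

The `method == "__call__"` guard in the PR therefore means: only handle the direct-invocation case, and let the other ufunc methods fall through.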
Introduces a new _core_dot function to handle matrix multiplication between constant arrays and arrays of Expr objects, supporting both 1-D and N-D cases. The original _core_dot is renamed to _core_dot_2d and improved for clarity and efficiency. Updates __array_ufunc__ to use the new logic, ensuring correct handling of mixed-type matrix operations.
Adjusted tests in test_matrix_matmul_return_type to check the return type when performing matrix multiplication with numpy arrays and matrix variables, including new cases for ND arrays. Ensures correct type inference for various matrix multiplication scenarios.
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##           master    #1159      +/-   ##
==========================================
+ Coverage   55.07%   55.28%    +0.21%
==========================================
  Files          24       24
  Lines        5420     5484       +64
==========================================
+ Hits         2985     3032       +47
- Misses       2435     2452       +17
```
Replaces the _is_num_dt helper function with direct dtype.kind checks for numeric types in _core_dot. This simplifies the code and removes an unnecessary inline function.
```cython
@np.vectorize(otypes=[object], signature="(m,n),(n,p)->(m,p)")
def _core_dot_2d(cnp.ndarray a, cnp.ndarray x) -> np.ndarray:
```
np.vectorize(signature="(m,n),(n,p)->(m,p)") forces 2-D core inputs, and it acts as an iteration helper for the N-D case.
I tried a custom Cython loop, but its performance was equal to np.vectorize:
```cython
cdef cnp.ndarray _core_dot_nd(cnp.ndarray a, cnp.ndarray b):
    a_2d = a[None, :] if a.ndim == 1 else a
    b_2d = b[:, None] if b.ndim == 1 else b
    cdef tuple shape = np.broadcast_shapes(a_2d.shape[:-2], b_2d.shape[:-2])
    cdef int m = a_2d.shape[-2]
    cdef int k = a_2d.shape[-1]
    cdef int p = b_2d.shape[-1]
    cdef int batch_size = 1
    cdef int i
    for i in shape: batch_size *= i
    cdef cnp.ndarray[object, ndim=1] res_batches = np.empty(batch_size, dtype=object)
    cdef cnp.ndarray a_flat = np.broadcast_to(a_2d, shape + (m, k)).reshape(-1, m, k)
    cdef cnp.ndarray b_flat = np.broadcast_to(b_2d, shape + (k, p)).reshape(-1, k, p)
    for i in range(batch_size): res_batches[i] = _core_dot_2d(a_flat[i], b_flat[i])
    cdef cnp.ndarray res = np.stack(res_batches).reshape(shape + (m, p))
    if a.ndim == 1: res = res.squeeze(-2)
    if b.ndim == 1: res = res.squeeze(-1)
    return res
```
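For comparison, here is a numpy-only sketch of what the gufunc-style signature buys: the wrapped function only ever sees 2-D core blocks, while np.vectorize broadcasts and iterates over any leading batch dimensions (dot_2d is a hypothetical stand-in for _core_dot_2d):

```python
import numpy as np

def _dot_2d(a, b):
    # The kernel itself only handles the 2-D case.
    return a @ b

# np.vectorize with a gufunc signature: the kernel is always handed
# core blocks of shape (m,n) and (n,p); leading batch dimensions are
# broadcast and looped over by numpy.
dot_2d = np.vectorize(_dot_2d, signature="(m,n),(n,p)->(m,p)")

a = np.arange(12.0).reshape(2, 2, 3)   # batch of two (2, 3) matrices
b = np.arange(12.0).reshape(3, 4)      # single (3, 4) matrix
out = dot_2d(a, b)                     # b is broadcast over the batch
```

Passing a 1-D array to dot_2d raises a ValueError, since the signature demands at least two core dimensions; that is presumably why 1-D inputs are promoted (and later squeezed) before dispatching.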
```cython
cdef cnp.ndarray _core_dot_2d(cnp.ndarray a, cnp.ndarray x):
    if not a.flags.c_contiguous or a.dtype != np.float64:
        a = np.ascontiguousarray(a, dtype=np.float64)
    cdef const double[:, :] a_2d = a
    cdef int m = a.shape[0], n = a.shape[1]
    cdef int k = x.shape[1] if x.ndim > 1 else 1
    cdef cnp.ndarray x_2d = x.reshape(n, k)
    cdef cnp.ndarray[object, ndim=2] res = np.zeros((m, k), dtype=object)
    cdef Py_ssize_t[:] nz_idx = np.empty(n, dtype=np.intp)
    cdef double[:] nz_val = np.empty(n, dtype=np.float64)
    cdef int i, j, l, nz_count
    cdef list exprs
    for i in range(m):
        nz_count = 0
        for l in range(n):
            if a_2d[i, l] != 0:
                nz_idx[nz_count] = l
                nz_val[nz_count] = a_2d[i, l]
                nz_count += 1
        if nz_count == 0:
            continue
        for j in range(k):
            if nz_count == 1:
                res[i, j] = nz_val[0] * x_2d[nz_idx[0], j]
            else:
                exprs = []
                for l in range(nz_count):
                    exprs.append(nz_val[l] * x_2d[nz_idx[l], j])
                res[i, j] = quicksum(exprs)
    return res
```
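A pure-Python sketch of the sparse-row technique used in _core_dot_2d: collect each row's nonzero coefficients once, then build every result entry from only those terms. The function name is illustrative, and plain `sum` stands in for the PR's `quicksum`:

```python
import numpy as np

def core_dot_2d_sketch(a, x):
    # Multiply a constant float matrix `a` by an object-dtype matrix `x`,
    # skipping zero coefficients so each entry sums only nonzero terms.
    a = np.ascontiguousarray(a, dtype=np.float64)
    m, n = a.shape
    x_2d = x.reshape(n, -1)
    k = x_2d.shape[1]
    res = np.zeros((m, k), dtype=object)
    for i in range(m):
        # Gather this row's nonzero (column, coefficient) pairs once.
        nz = [(l, a[i, l]) for l in range(n) if a[i, l] != 0]
        if not nz:
            continue  # all-zero row: keep the 0 placeholders
        for j in range(k):
            # `sum` stands in for quicksum over the nonzero terms.
            res[i, j] = sum(val * x_2d[l, j] for l, val in nz)
    return res

a = np.array([[1.0, 0.0], [0.0, 2.0]])
x = np.array([[10, 20], [30, 40]], dtype=object)
out = core_dot_2d_sketch(a, x)
```

For sparse constant matrices this avoids building expression terms for zero coefficients, which is where the speedup over a dense element-wise product comes from.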
Closes #1156