fix(python): cast uint64 row IDs to int64 before torch.from_numpy #6491

Open
fightBoxing wants to merge 2 commits into lance-format:main from fightBoxing:fix-torch-uint64-rowid

Conversation

@fightBoxing

Summary

Fix TypeError: can't convert np.ndarray of type numpy.uint64 when using LanceDataset with with_row_id=True.

Closes #2803

Root Cause

The _rowid column produced by the Lance scanner has type pa.uint64(). In _to_tensor(), Arrow arrays are converted to PyTorch tensors via:

tensor = torch.from_numpy(arr.to_numpy(zero_copy_only=False))
if uint64_as_int64 and tensor.dtype == torch.uint64:
    tensor = tensor.to(torch.int64)

On PyTorch versions that don't support torch.uint64 (< 2.1), torch.from_numpy() itself raises TypeError, so the uint64 -> int64 cast on the following lines is never reached.

Fix

Cast uint64 to int64 in numpy space before calling torch.from_numpy():

nparr = arr.to_numpy(zero_copy_only=False)
if uint64_as_int64 and nparr.dtype == np.uint64:
    nparr = nparr.astype(np.int64)
tensor = torch.from_numpy(nparr)

This works on all supported PyTorch versions (>= 2.0).
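
As a rough illustration of what the NumPy-side cast does, here is a minimal sketch that uses only NumPy. The helper name cast_uint64_to_int64 is hypothetical and not part of the Lance code base; it only mirrors the condition shown in the fix. One behavior worth keeping in mind: astype(np.int64) reinterprets values of 2**63 and above as negative numbers, and the original unsigned value can be recovered by viewing the result as uint64 again.

```python
import numpy as np


def cast_uint64_to_int64(nparr: np.ndarray, uint64_as_int64: bool = True) -> np.ndarray:
    """Hypothetical helper mirroring the fix: cast in NumPy space.

    The result has a dtype (int64) that torch.from_numpy accepts,
    so no uint64 array is ever handed to PyTorch.
    """
    if uint64_as_int64 and nparr.dtype == np.uint64:
        return nparr.astype(np.int64)
    return nparr


# Values below 2**63 are preserved exactly.
small = np.array([0, 1, 42], dtype=np.uint64)
converted = cast_uint64_to_int64(small)

# Values of 2**63 and above wrap around to negative int64 values.
large = np.array([2**63], dtype=np.uint64)
wrapped = cast_uint64_to_int64(large)

# Viewing the int64 buffer as uint64 recovers the original value.
restored = wrapped.view(np.uint64)
```

If callers need the original unsigned value, they can reinterpret the int64 result as uint64 on the NumPy side, as the last line does.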

Testing

Added test_row_id_uint64_converts_to_int64 to verify _rowid column converts to torch.int64 when with_row_id=True.

PyTorch versions before 2.1 do not support torch.from_numpy with
numpy.uint64 arrays, raising TypeError. The existing code tried to
handle this by checking tensor.dtype after conversion, but the
conversion itself fails before reaching that check.

Move the uint64 -> int64 cast into numpy space (before
torch.from_numpy) so it works on all supported PyTorch versions.

Closes lance-format#2803
@github-actions bot added the bug and python labels on Apr 13, 2026
Add README_zh_CN.md with full Chinese translation and add
English/Chinese language switch links to both README files.
Labels

bug Something isn't working python

Development

Successfully merging this pull request may close these issues.

lance.torch.data.LanceDataset row ids broken on main