fix(python): cast uint64 row IDs to int64 before torch.from_numpy #6491
Open
fightBoxing wants to merge 2 commits into lance-format:main from
PyTorch versions before 2.1 do not support torch.from_numpy with numpy.uint64 arrays, raising TypeError. The existing code tried to handle this by checking tensor.dtype after conversion, but the conversion itself fails before reaching that check. Move the uint64 -> int64 cast into numpy space (before torch.from_numpy) so it works on all supported PyTorch versions. Closes lance-format#2803
Add README_zh_CN.md with full Chinese translation and add English/Chinese language switch links to both README files.
Summary
Fix TypeError: can't convert np.ndarray of type numpy.uint64 when using LanceDataset with with_row_id=True.
Closes #2803
Root Cause
The _rowid column from the Lance scanner is pa.uint64(). In _to_tensor(), the code converts Arrow arrays to PyTorch tensors with torch.from_numpy() and only afterwards checks the tensor dtype to cast it. On PyTorch versions that don't support torch.uint64 (< 2.1), torch.from_numpy() raises TypeError before reaching the cast logic.
Fix
Cast uint64 to int64 in numpy space before calling torch.from_numpy(). This works on all supported PyTorch versions (>= 2.0).
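The cast-before-conversion idea can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper name uint64_to_int64 is hypothetical, and the sketch assumes that a plain wrapping cast is acceptable for row IDs. The torch import is made optional only so the snippet runs in environments without PyTorch.

```python
import numpy as np


def uint64_to_int64(arr: np.ndarray) -> np.ndarray:
    """Cast uint64 arrays to int64; return every other dtype unchanged.

    Hypothetical helper for illustration; not part of the Lance API.
    """
    if arr.dtype == np.uint64:
        # Plain cast: values above 2**63 - 1 would wrap to negative numbers.
        # Treating that as acceptable for row IDs is an assumption of this sketch.
        return arr.astype(np.int64)
    return arr


row_ids = np.array([0, 1, 2], dtype=np.uint64)
safe = uint64_to_int64(row_ids)
print(safe.dtype)  # int64

try:
    import torch  # optional here so the sketch also runs without PyTorch
except ImportError:
    torch = None

if torch is not None:
    # The array is already int64, so no uint64 ever reaches torch.from_numpy.
    tensor = torch.from_numpy(safe)
    print(tensor.dtype)  # torch.int64
```

Doing the cast on the numpy side means the conversion never depends on whether the installed PyTorch build has a uint64 dtype.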
Testing
Added test_row_id_uint64_converts_to_int64 to verify that the _rowid column converts to torch.int64 when with_row_id=True.