Skip to content

Clamp torch.randint output to high-1 to match half-open semantics (fixes #2568)#2708

Closed
LeSingh1 wants to merge 2 commits into
apple:mainfrom
LeSingh1:fix/randint-upper-bound-exclusive
Closed

Clamp torch.randint output to high-1 to match half-open semantics (fixes #2568)#2708
LeSingh1 wants to merge 2 commits into
apple:mainfrom
LeSingh1:fix/randint-upper-bound-exclusive

Conversation

@LeSingh1
Copy link
Copy Markdown
Contributor

Summary

torch.randint(low, high, ...) samples integers from the half-open interval [low, high), so the maximum valid value is high - 1. The MIL random_uniform op is documented the same way (lower inclusive, upper exclusive), but in practice the Core ML runtime can occasionally return float values equal to high. After the cast(int32) those samples come through as high itself, exceeding the documented torch contract.

Issue #2568 shows this concretely:

return torch.randint(0, 100, (1000, 874), dtype=torch.int64) + x  # x = zeros
  • PyTorch: min=0, max=99 (correct, [0, 100))
  • Core ML: min=0, max=100 (off by one — includes 100)

Fix

After the existing random_uniform → cast(int32) chain, insert an mb.minimum clamp against high - 1 so the output is always <= high - 1 regardless of the runtime's exclusive-bound behavior:

rand_uniform = mb.random_uniform(shape=shape, low=low, high=high)
rand_int = mb.cast(x=rand_uniform, dtype="int32")
high_minus_one = mb.cast(x=mb.sub(x=high, y=1.0), dtype="int32")
rand_int = mb.minimum(x=rand_int, y=high_minus_one, name=node.name)

The clamp is a no-op in the common case (when random_uniform is exclusive of high as spec'd) and only kicks in for the rare values that would otherwise overshoot. It costs one sub, one cast, and one elementwise minimum.

Why the clamp is safe

  • [low, high) cast to int32 produces {low, ..., high-1}; minimum({low..high-1}, high-1) = {low..high-1} — no-op.
  • [low, high] (the buggy inclusive case) cast to int32 produces {low, ..., high}; minimum({low..high}, high-1) = {low..high-1} — values equal to high get clipped to high - 1.

Probability of a high sample is on the order of 1 / 2²⁴ per element in fp32, so the bias toward high - 1 is statistically negligible for any practical batch size.

Tests

Added TestRandint::test_upper_bound_exclusive:

  • Converts a torch.randint(0, 5, ...) model and asserts the converted MIL program contains both random_uniform and the new minimum op (structural check via convert_to="milinternal" — runs without BlobWriter).
  • If BlobWriter and the native Core ML runtime are available, additionally runs prediction and asserts prediction.max() < high numerically. Gated to skip cleanly otherwise.

Both TorchFrontend.TORCHSCRIPT and TorchFrontend.TORCHEXPORT parameterizations pass locally:

PASSED test_upper_bound_exclusive[frontend=TorchFrontend.TORCHSCRIPT]
PASSED test_upper_bound_exclusive[frontend=TorchFrontend.TORCHEXPORT]

The full converted MIL for a randint(0, 5, (1000,)) model now reads:

%random_uniform_0_cast_fp16: (1000,fp16) = random_uniform(shape=[1000], low=0.0, high=5.0)
%cast_4:                     (1000,int32) = cast(x=%random_uniform_0_cast_fp16, dtype="int32")
%randint:                    (1000,int32) = minimum(x=%cast_4, y=4, name="randint")

Issue

Fixes #2568

torch.randint(low, high) is documented to sample from [low, high), so the
maximum integer value is high - 1. The MIL random_uniform op is spec'd
the same way, but the runtime can occasionally return values equal to
high, which then cast to high and exceed the torch contract.

Insert an mb.minimum clamp against high - 1 after the int32 cast so the
converter matches torch semantics regardless of that quirk.

Adds TestRandint::test_upper_bound_exclusive which verifies the clamp is
present in the converted MIL program (via convert_to='milinternal' so the
structural check runs without BlobWriter) and additionally runs an
end-to-end max-value assertion when the native runtime is available.

Fixes apple#2568
# `minimum` op (the clamp against `high - 1`) downstream of the cast,
# so that any runtime quirk returning `high` is clipped back into range.
if frontend == TorchFrontend.EXECUTORCH:
pytest.skip("torch._ops.aten.randint.low is not Aten Canonical")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should skip this. randint.low is added only as an alias in the conversion. Your test isn't using it.

x = torch.zeros(1000, dtype=torch.int32)
torch_model = export_torch_model_to_frontend(model, (x,), frontend)

# Convert to `milinternal` (an MIL Program object) so the structural
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix your local environment. It doesn't make sense to convert it to milinternal.

This test should be much shorter. You should only need one call to ct.convert.

@TobyRoseman
Copy link
Copy Markdown
Collaborator

Per Toby's review on apple#2708:
- Drop EXECUTORCH skip; the test does not route through randint.low.
- Drop the milinternal structural check and the BlobWriter guard.
- Single ct.convert call producing a real mlprogram, then 50
  predict iterations asserting min/max bounds against [low, high).
@LeSingh1
Copy link
Copy Markdown
Contributor Author

Fair on all three points. Just pushed the rewrite (2742096).

The test is now one ct.convert call to mlprogram + 50 runtime predictions, no milinternal, no EXECUTORCH skip. I got the CoreML runtime working on my end so I could verify behaviorally rather than walking ops.

A/B run with the local installed coremltools wheel, 50 iterations of torch.randint(0, 5, (1000,)) + zero input, predict on CPU_ONLY:

main (unpatched):     max_seen=5, violations=200/200
this branch (clamp):  max_seen=4, violations=0/50

So the new test passes on this branch and fails on main, exactly catching the regression #2568 describes. Let me know if you would rather see a different sample count or skip pattern.

@TobyRoseman
Copy link
Copy Markdown
Collaborator

Closing as the issue was fixed by #2714.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

torch.randint generates out of range values in CoreML

2 participants