Skip to content

Remove the modulo operations in spsc#652

Open
sgued wants to merge 1 commit intorust-embedded:mainfrom
sgued:rem-perf
Open

Remove the modulo operations in spsc#652
sgued wants to merge 1 commit intorust-embedded:mainfrom
sgued:rem-perf

Conversation

@sgued
Copy link
Copy Markdown
Contributor

@sgued sgued commented Apr 2, 2026

These modulo operations used to be well optimized when N was a power of 2 However, the consumer and producer now use view-types that make N runtime dependant, preventing the compiler from optimizing these modulo operations even when N is always a power of 2.

This patch leverages the fact that head and tail are always kept lower than N to replace the modulo operations with a simple if, which gets optimized pretty well by the compiler and no branch is left.

Closes #650

@sgued
Copy link
Copy Markdown
Contributor Author

sgued commented Apr 2, 2026

I wish we had some benchmarking infra in place. This would help me make sure this actually solves the problem and would have helped detect it in the first place.

@sgued
Copy link
Copy Markdown
Contributor Author

sgued commented Apr 2, 2026

Clippy lint fixes are already included in #644

@sgued
Copy link
Copy Markdown
Contributor Author

sgued commented Apr 2, 2026

@823984418 does this fix your performance issues?

These modulo operations used to be well optimized when N was a power of 2
However, the consumer and producer now use view-types that make N runtime
dependant, preventing the compiler from optimizing these modulo operations
even when N is always a power of 2.

This patch leverages the fact that `head` and `tail` are always kept lower
than N to replace the modulo operations with a simple if, which gets optimized
pretty well by the compiler and no branch is left.

Closes rust-embedded#650
let head = self.rb.head.load(Ordering::Relaxed);

let i = (head + self.index) % self.rb.n();
let i = head + self.index;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this (theoretically) overflow? If n is larger than half of the usize::MAX and the queue already wrapped around so head is close to the end of the underlying array, and index is large?

(Very unlikely in practice, and I only had this idea because I wondered why QueueInner::len uses wrapping_add/wrapping_sub).

Copy link
Copy Markdown
Contributor Author

@sgued sgued Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

This would be a bug even with the original implementation if n is not a divisor of usize::MAX.

I don't think this is the same as for len. in len we're doing wrapping operations because we "know" it's going to go negative (thus wrap) since current_head > current_tail. We could in theory remove the wrapping operation by changing the order (but then it's sensible to the same bug, where it could in theory overflow if N is too close to usize::MAX).

I'll take a look at fixing this.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could in theory remove the wrapping operation by changing the order (but then it's sensible to the same bug, where it could in theory overflow if N is too close to usize::MAX)

Yes, sorry, my comment about len was a bit short. Of course, as it's written currently, the wrapping_sub/add calls are required. I was thinking about changing that by changing the order. But then:
a) as you noticed, it would have the same potential overflow issue
b) wrapping operations might be more efficient, because they don't require any overflow checks (if they are enabled, e.g. in debug mode or by setting overflow-checks = true in [profile.release])

So I decided to not suggest removing the wrapping operations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

The %(rem) operation seems to cause unexpected overhead.

2 participants