cpubits: initial crate #826
@tarcieri
As discussed in #824, this adds a crate intended to provide heuristics for selecting the optimal word size for a particular target CPU, which may differ from its address size. It's implemented as `macro_rules` to avoid a build script. Currently there's no way to override the selection, though we could consider adding something like `cfg(cpubits = "64")`.
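To make the idea concrete, here is a minimal sketch of how a `macro_rules` macro could pick a word type purely from `cfg` predicates, with no build script. The macro name `select_word!`, the `Word` alias, and the `words_for_bits` helper are all hypothetical and are not the API of the crate in this PR; the only heuristic modeled is treating the WASM target family as 64-bit.

```rust
// Hypothetical sketch, NOT the actual `cpubits` API: a `macro_rules` macro
// that expands to `cfg`-gated definitions of a word type.
macro_rules! select_word {
    () => {
        /// 64-bit words on 64-bit targets, and on WASM by heuristic.
        #[cfg(any(target_pointer_width = "64", target_family = "wasm"))]
        pub type Word = u64;

        /// 32-bit words everywhere else (16-bit and 32-bit targets grouped).
        #[cfg(not(any(target_pointer_width = "64", target_family = "wasm")))]
        pub type Word = u32;
    };
}

select_word!();

/// Width of the selected word type in bits.
pub const WORD_BITS: u32 = Word::BITS;

/// Number of words needed to store an integer of `bits` bits.
pub const fn words_for_bits(bits: u32) -> usize {
    ((bits + WORD_BITS - 1) / WORD_BITS) as usize
}
```

With this shape, a 256-bit integer would occupy `words_for_bits(256)` words: four on targets where `Word` is `u64`, eight where it is `u32`.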
Alright, finally got around to this after so many years. I decided against going the build script route, at least for now, and implemented it entirely in terms of It has support for deciding between 16-bit, 32-bit, and 64-bit word sizes, with the ability to group 16-bit and 32-bit together, which is how e.g. It has support for specifying a
```rust
#[enable_64bit(
    // `cfg` selector for 64-bit targets (implicitly `any`)
    target_family = "wasm",
)]
```
Here's the "heuristic" part. This just includes the WASM target family for now, but I know we've had others requested in the past (e.g. ARMv7). See also: RustCrypto/crypto-bigint#973
Ideally I think we'd use a benchmark-driven approach to decide which targets go here. The somewhat annoying part is adding anything here is effectively a breaking change.
@jrose-signal random ping but long ago I think you had suggested trying 64-bit codegen on ARMv7. Am I remembering correctly?
Yes, we've been using 64-bit codegen for 32-bit Android for years, it performed better for curve25519-dalek on…well, at least one phone, I'm sure it's configuration-dependent in practice.
Opened a PR here, and hopefully gave a decent enough rationale. Thanks for confirming!
ARMv7 is one of the main architectures for which we've received requests for 64-bit overrides in the past (see discussion on #826). Though natively 32-bit, ARMv7 supports certain "doubleword" instructions which model 64-bit values as a pair of 32-bit registers, e.g. `ADDS`/`ADC` and `SUBS`/`SBC` for 2x32-bit addition/subtraction, as well as `UMULL`/`SMULL` for widening multiplication with 64-bit outputs. Many ARMv7 CPUs internally fetch 64 bits of instructions at once and can move 64 bits of data via `LDRD`/`STRD` in one cycle on optimized paths. Some high-performance ARMv7 CPUs internally combine the barrel shifter and ALU to speed up multi-word shifts. If we use 64-bit implementations when targeting ARMv7, codegen is able to leverage these optimizations.
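As an illustration of the "pair of 32-bit registers" model described above, the following sketch spells out in portable Rust what an `ADDS`/`ADC` pair and a `UMULL` compute. The function names are hypothetical and exist only for this example; whether a compiler actually emits those instructions for plain `u64` arithmetic on a given ARMv7 target is an assumption that should be checked against the generated assembly.

```rust
/// Splits a 64-bit value into (low, high) 32-bit halves.
pub fn split(x: u64) -> (u32, u32) {
    (x as u32, (x >> 32) as u32)
}

/// Joins (low, high) 32-bit halves back into a 64-bit value.
pub fn join(x: (u32, u32)) -> u64 {
    (x.0 as u64) | ((x.1 as u64) << 32)
}

/// 64-bit addition on (low, high) pairs, returning the sum and carry-out.
/// The low-half add plays the role of `ADDS` (add and set the carry flag);
/// the high-half add plays the role of `ADC` (add with carry-in).
pub fn add_2x32(a: (u32, u32), b: (u32, u32)) -> ((u32, u32), bool) {
    let (lo, carry_lo) = a.0.overflowing_add(b.0);
    let (hi, carry_a) = a.1.overflowing_add(b.1);
    let (hi, carry_b) = hi.overflowing_add(carry_lo as u32);
    ((lo, hi), carry_a | carry_b)
}

/// 32x32 -> 64-bit widening multiply returned as a (low, high) pair,
/// which is the result shape `UMULL` writes to its two output registers.
pub fn umull(a: u32, b: u32) -> (u32, u32) {
    split((a as u64) * (b as u64))
}
```

Writing the same operations directly on `u64` (for example `a.overflowing_add(b)`) gives the backend the opportunity to select the doubleword forms itself, which is the motivation for using the 64-bit implementations on this target.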
As discussed in #824, this adds a crate intended to provide heuristics for selecting whether 32-bit or 64-bit backends have optimal codegen for a given target, with optional overrides.
The intended use of this crate is in `build-dependencies`, where it can emit a `cfg` attribute (e.g. `--cfg cpubits="64"`) if one hasn't been explicitly specified already, and all gating on 32-bit vs 64-bit backends can simply use the `cfg` attribute.
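The build-script variant described above could look roughly like the sketch below. Both helper functions are hypothetical and not part of the crate in this PR. A real `build.rs` would read the target's pointer width and family from the `CARGO_CFG_TARGET_POINTER_WIDTH` and `CARGO_CFG_TARGET_FAMILY` environment variables that Cargo provides to build scripts, and print the resulting directive; how an explicit user override is detected is left as an assumption here and passed in as a parameter.

```rust
/// Hypothetical helper: chooses "32" or "64" for a target.
/// `explicit` is a user-supplied override, if any; it always wins.
pub fn select_cpubits(explicit: Option<&str>, pointer_width: &str, family: &str) -> String {
    if let Some(bits) = explicit {
        return bits.to_string();
    }
    // A target may belong to several families, listed comma-separated.
    let is_wasm = family.split(',').any(|f| f == "wasm");
    if pointer_width == "64" || is_wasm {
        "64".to_string()
    } else {
        "32".to_string()
    }
}

/// Formats the line a build script prints so that Cargo passes
/// `--cfg cpubits="<bits>"` to rustc.
pub fn cfg_directive(bits: &str) -> String {
    format!("cargo:rustc-cfg=cpubits=\"{bits}\"")
}
```

Downstream code would then gate its backends with `#[cfg(cpubits = "64")]` and `#[cfg(cpubits = "32")]` rather than repeating target predicates in every crate.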