feat(ops): add LinearBitNet — ternary weight GEMV with zero-skip#477
Open
eriirfos-eng wants to merge 1 commit into
Adds LinearBitNet alongside the existing Linear struct in ops.rs.
Weights are stored as i8 in {-1, 0, +1}, quantized from f32 at load
time using an absolute threshold. The forward pass skips every
multiply-accumulate whose weight is zero. The skip is exact relative to
the quantized weights, not an approximation. At typical ternary sparsity
levels (50-70% zeros in BitNet b1.58 and similar schemes) this removes
50-70% of the MACs without changing the layer's output.
- from_f32(): quantize an f32 matrix at a given threshold
- forward(): sparse GEMV, zero-weight skipping in inner loop
- sparsity(): reports fraction of zero weights (useful for benchmarking)
Three tests added alongside the existing ops tests.
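The three methods listed above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the field names, the row-major layout, the parameter lists, and the sign rule for weights at or above the threshold are all assumptions.

```rust
/// Hypothetical sketch of a ternary linear layer (not the PR's actual code).
/// Assumes a row-major weight matrix of shape out_features x in_features.
pub struct LinearBitNet {
    weights: Vec<i8>, // every entry is -1, 0, or +1
    in_features: usize,
    out_features: usize,
}

impl LinearBitNet {
    /// Quantize an f32 matrix: |w| < threshold becomes 0, anything else
    /// becomes the sign of w. (Assumed rule; the PR may differ in detail.)
    pub fn from_f32(w: &[f32], in_features: usize, out_features: usize, threshold: f32) -> Self {
        assert!(in_features > 0, "in_features must be non-zero");
        assert_eq!(w.len(), in_features * out_features, "shape mismatch");
        let weights: Vec<i8> = w
            .iter()
            .map(|&x| {
                if x.abs() < threshold {
                    0
                } else if x > 0.0 {
                    1
                } else {
                    -1
                }
            })
            .collect();
        Self { weights, in_features, out_features }
    }

    /// GEMV y = W x that does no work for zero weights.
    /// A weight of +1 or -1 reduces the multiply to an add or subtract.
    pub fn forward(&self, x: &[f32]) -> Vec<f32> {
        assert_eq!(x.len(), self.in_features, "input length mismatch");
        let mut y = Vec::with_capacity(self.out_features);
        for row in self.weights.chunks_exact(self.in_features) {
            let mut acc = 0.0f32;
            for (&w, &xi) in row.iter().zip(x) {
                match w {
                    0 => {} // zero weight: skip the accumulate entirely
                    1 => acc += xi,
                    _ => acc -= xi,
                }
            }
            y.push(acc);
        }
        y
    }

    /// Fraction of weights that are exactly zero.
    pub fn sparsity(&self) -> f32 {
        if self.weights.is_empty() {
            return 0.0;
        }
        let zeros = self.weights.iter().filter(|&&w| w == 0).count();
        zeros as f32 / self.weights.len() as f32
    }
}
```

In this sketch, skipping a zero weight gives the same result as multiplying by it for any finite input, which is why the skip is exact with respect to the quantized weights. The branch in the inner loop is the simplest way to express the idea; a production kernel would likely use a packed or index-based layout instead.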
What this adds
A `LinearBitNet` struct in `crates/ruvector-sparse-inference/src/ops.rs`, alongside the existing `Linear`.

The difference: weights are stored as `i8` in `{−1, 0, +1}` rather than `f32`. The forward pass skips every multiply-accumulate where the weight is zero, with no approximation and no special hardware. At the sparsity levels typical of BitNet b1.58 and similar ternary-quantized models (50–70% zeros), this removes 50–70% of the MACs in the linear layer.

Why it belongs here
`ruvector-sparse-inference` already has a `SparseFfn` that exploits activation sparsity (skipping neurons that activate to zero). `LinearBitNet` is the complementary case: weight sparsity, where the zeros are baked into the model at quantization time rather than determined at runtime.

API
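`threshold` is the absolute value below which a weight becomes 0. The BitNet b1.58 paper derives its scale from the mean absolute value of the weight matrix (weights are divided by that mean, then rounded and clipped to {−1, 0, +1}), so a threshold tied to the mean absolute value works well in practice.

Changes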
`crates/ruvector-sparse-inference/src/ops.rs`: `LinearBitNet` struct with `from_f32`, `forward`, `sparsity`

What it does not change
Existing `Linear`, `Embedding`, `RMSNorm`, `LayerNorm` and all existing tests are untouched. All 91 existing tests pass.
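For the threshold choice discussed under API, one way to see where a candidate threshold lands is to compute the mean absolute value of a weight matrix and the fraction of weights that would be zeroed. The helper names `mean_abs` and `zero_fraction` below are hypothetical and not part of the PR; this is only a sketch for checking a threshold before quantizing.

```rust
/// Mean absolute value of a weight matrix (hypothetical helper, not in the PR).
pub fn mean_abs(weights: &[f32]) -> f32 {
    if weights.is_empty() {
        return 0.0;
    }
    weights.iter().map(|w| w.abs()).sum::<f32>() / weights.len() as f32
}

/// Fraction of weights whose magnitude falls strictly below `threshold`,
/// i.e. the sparsity an absolute-threshold quantizer would produce.
pub fn zero_fraction(weights: &[f32], threshold: f32) -> f32 {
    if weights.is_empty() {
        return 0.0;
    }
    let zeros = weights.iter().filter(|w| w.abs() < threshold).count();
    zeros as f32 / weights.len() as f32
}
```

Calling `zero_fraction(&w, mean_abs(&w))`, or the same with a scaled multiple of the mean, gives a quick estimate of how much work the zero-skip would save for a given matrix.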