Pass irep_idt by value (C++ Core Guidelines F.16) + CI checker#9014
Open
tautschnig wants to merge 24 commits into
Open
Pass irep_idt by value (C++ Core Guidelines F.16) + CI checker#9014tautschnig wants to merge 24 commits into
tautschnig wants to merge 24 commits into
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## develop #9014 +/- ##
========================================
Coverage 80.63% 80.64%
========================================
Files 1713 1713
Lines 189403 189478 +75
Branches 73 73
========================================
+ Hits 152732 152796 +64
- Misses 36671 36682 +11 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Collaborator
|
Does this actually yield a speed-up? |
61d5136 to
8d5bf63
Compare
Collaborator
Author
Now confirmed by measurement (and this is now part of the changes): there is no speed-up, so we have several options:
|
After the F.16 cleanup, several functions now take irep_idt by value
instead of by const reference. gcc-14's -Wdangling-reference heuristic
then warns at call sites where the result is bound to a const
reference, because it can't see that those functions return a
reference into long-lived storage (a static array, the symbol table,
or a parent expression in the namespace) regardless of the by-value
arguments.
Bind the result to a value (or, where appropriate, restructure the
expression to avoid a named reference binding altogether):
- get_sort_and_join_table_entry: returns into a static table.
saj_tablet is 11 irep_idts (44 bytes); the copy is cheap.
- get_mode_from_identifier: returns into a symbol or to ID_unknown.
- get_field_init_expr / get_field_init_type: return into the
shadow_memory state. The intermediate binding in
get_field_init_type is removed entirely by chaining.
- dfcc_utilst::get_function_symbol: returns into the symbol table.
- get_contract: returns into the symbol table; the range-based
`for(auto &target : get_contract(...).c_assigns())` is split
into a named contract value to keep the iterator's referent alive
(and silence the heuristic).
- to_history_expr(...).expression(): returns into the parent expr.
- require_goto_statements::require_declaration_of_name and
get_unique_non_null_expression_assigned_to_symbol (jbmc unit
helpers): return into the entry-point instructions vector.
The value-binding form was chosen over the alternatives -- a localised
`#pragma GCC diagnostic ignored "-Wdangling-reference"`, a project-wide
`-Wno-dangling-reference`, or a `[[gnu::no_dangling]]` attribute on the
returning functions -- to keep the change source-local and free of
compiler-specific machinery. The cost is negligible: irept-based types
(symbolt, exprt, code_with_contract_typet, ...) share their internal
storage by reference count, so each "copy" introduced here is a small,
fixed handful of refcount bumps rather than a deep copy.
There is no semantic change.
Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Adds scripts/check_pass_by_value.cpp, a Clang LibTooling AST-matcher
program that flags function parameters of cheap-to-copy types declared
as 'const T &'. This is a violation of C++ Core Guidelines F.16
("for in parameters, pass cheaply-copied types by value and others by
reference to const").
The set of cheap types is configured in CHEAP_TYPES at the top of the
file. It currently lists dstringt (the canonical name for irep_idt,
which holds a single unsigned table index, so passing it by value is
strictly cheaper than passing by const reference).
Skips:
* parameters of copy/move constructors and copy/move assignment
operators (the language requires their signature)
* parameters of function-template instantiations
* member functions of class-template specialisations
* parameters whose source-as-written type is a (substituted)
template type parameter or otherwise dependent
Build with the same clang-18 flags used by check_irep_moves.cpp; the
resulting scripts/check-pass-by-value binary is .gitignored.
Add --fix mode to the F.16 pass-by-value checker
With --fix, the tool runs as a clang::tooling::RefactoringTool and
emits a Replacement for each finding that rewrites the parameter's
type from `const T &` to `T `, preserving the author's spelling
(`irep_idt` vs. `dstringt`). The replacement range spans from the
parameter declaration's BeginLoc (start of `const`) to the parameter
name's Location, which both covers the `const` keyword and absorbs
any whitespace between `&` and the name.
Limitations: only ParmVarDecls are touched. Function-pointer
typedefs, std::function template arguments, and friend declarations
that mention `const irep_idt &` are not rewritten and remain in the
baseline as a follow-up.
Improve --fix robustness: --path-filter, manual save, own-param check
Three fixes uncovered while applying the rewriter to src/util:
* Add --path-filter so we can run on the full TU set (needed to ensure
every header gets visited) but only modify files whose path contains
the given substring. Headers are sometimes only included from .cpps
outside their owning module.
* Apply replacements manually rather than via runAndSave: the latter
bails out without saving when any TU has a parse error, which we hit
for optional solver back-ends with missing third-party headers.
* Skip ParmVarDecls that are not in their enclosing FunctionDecl's
getParamDecl() list. These show up for parameter names inside
function-type template arguments such as
`std::function<void(const irep_idt &k)>`, and rewriting them would
desynchronise the in-class declaration from its out-of-class
definition.
F.16 checker: warn on non-fixable parameter sites
The earlier --fix mode was deliberately strict about which parameters
to rewrite, returning early on parameters that are not directly owned
by their enclosing FunctionDecl (e.g. parameter names inside a
function-type template argument like
\`std::function<void(const irep_idt &k)>\`). That early-return also
suppressed the warning, which means CI would not catch regressions
where someone added a new such site.
Split the logic so:
* The "is this fixable in place" decision (parameter is in FD's own
getParamDecl() list) only suppresses the Replacement, not the
warning.
* All other guards (system header, copy/move ctor or assignment,
function-template instantiation, member of class-template
specialisation) continue to suppress the warning entirely.
* The dependent-type filter now walks the type-sugar chain rather
than only the top-level type, so e.g. \`const value_type &\` --
where value_type is a typedef inside a class template instantiated
with a cheap key type -- is still recognised as template-derived
and skipped.
Also documents one remaining limitation: anonymous parameters in
non-instantiated typedefs of std::function (e.g.
\`using cb = std::function<void(const irep_idt &)>\` at namespace
scope) don't produce a ParmVarDecl in Clang's AST and so are not
caught. A future TypeLoc-based walker could close this gap.
Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Records the 2049 currently-known sites where parameters of cheap-to-copy
types (dstringt / irep_idt) are passed by const reference. These are
all real F.16 violations to be fixed incrementally; the baseline
mechanism allows CI to gate against introducing new ones without
requiring the wholesale fix in a single change.
Regenerate after a clean-up commit with:
scripts/run_pass_by_value_check.sh build --regenerate-baseline
Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Adds scripts/run_pass_by_value_check.sh, which:
* (re)builds scripts/check-pass-by-value if needed (mirrors the build
recipe used for scripts/check-irep-moves on origin/irept-copies),
* runs the tool over src/ and jbmc/src/ in parallel via xargs, with
each worker writing to its own per-pid temp file to avoid stdout
interleaving across workers,
* normalises absolute paths in the output to repo-relative paths so
that the baseline is portable,
* diffs the result against scripts/pass_by_value_baseline.txt:
* new entries cause an exit code of 1 with a clear error message,
* removed entries are reported as a notice (no failure) with the
regenerate command,
* supports --regenerate-baseline as the second positional argument.
Wall-clock cost on a 36-core box is ~55 s.
CI: gate PRs on F.16 pass-by-value regressions
Adds a check-pass-by-value job to .github/workflows/syntax-checks.yaml
that builds scripts/check-pass-by-value and runs the runner script on
every pull request to develop. PRs introducing new F.16 violations
(parameters of cheap-to-copy types passed by const reference) will
fail this job; PRs that reduce the number of violations will pass and
emit a notice inviting the contributor to regenerate the baseline.
Mirrors the existing check-irep-copies job in shape and runtime budget.
Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Document, reproducibly, the performance question raised on the F.16 cleanup: does passing irep_idt by value rather than by const reference yield a speed-up? The benchmark calls two otherwise-identical noinline functions -- one taking irep_idt by value, one by const reference -- ~3e8 times over an array of real irep_idts and times each loop. noinline forces the calling convention to be honoured (by value passes the 4-byte index in a register; by const reference passes a pointer the callee dereferences) rather than inlining the difference away. Result (ubuntu-24.04, g++ -O2, 5 runs, stable): by-value: ~443 ms by-ref: ~443 ms (difference < 0.3 %, within run-to-run noise) So the change is performance-neutral at the calling-convention level; the justification is correctness (it exposed and fixed a latent dangling-reference bug) and C++ Core Guidelines F.16 alignment, not a measurable speed-up. The file header records the methodology and the build/run command. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Mechanical conversion of `const irep_idt &` parameters to pass-by-value via scripts/check_pass_by_value.cpp --fix. Aligns with C++ Core Guidelines F.16 (cheap-to-copy types should be passed by value). The author's chosen spelling (irep_idt vs. dstringt) is preserved. Formatting was applied via git-clang-format-15 so unrelated lines remain untouched. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Mechanical conversion of `const irep_idt &` parameters to pass-by-value via scripts/check_pass_by_value.cpp --fix. Aligns with C++ Core Guidelines F.16 (cheap-to-copy types should be passed by value). The author's chosen spelling (irep_idt vs. dstringt) is preserved. Formatting was applied via git-clang-format-15 so unrelated lines remain untouched. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Mechanical conversion of `const irep_idt &` parameters to pass-by-value via scripts/check_pass_by_value.cpp --fix. Aligns with C++ Core Guidelines F.16 (cheap-to-copy types should be passed by value). The author's chosen spelling (irep_idt vs. dstringt) is preserved. Formatting was applied via git-clang-format-15 so unrelated lines remain untouched. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Mechanical conversion of `const irep_idt &` parameters to pass-by-value via scripts/check_pass_by_value.cpp --fix. Aligns with C++ Core Guidelines F.16 (cheap-to-copy types should be passed by value). The author's chosen spelling (irep_idt vs. dstringt) is preserved. Formatting was applied via git-clang-format-15 so unrelated lines remain untouched. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Mechanical conversion of `const irep_idt &` parameters to pass-by-value via scripts/check_pass_by_value.cpp --fix. Aligns with C++ Core Guidelines F.16 (cheap-to-copy types should be passed by value). The author's chosen spelling (irep_idt vs. dstringt) is preserved. Formatting was applied via git-clang-format-15 so unrelated lines remain untouched. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Mechanical conversion of `const irep_idt &` parameters to pass-by-value via scripts/check_pass_by_value.cpp --fix. Aligns with C++ Core Guidelines F.16 (cheap-to-copy types should be passed by value). The author's chosen spelling (irep_idt vs. dstringt) is preserved. Formatting was applied via git-clang-format-15 so unrelated lines remain untouched. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Mechanical conversion of `const irep_idt &` parameters to pass-by-value via scripts/check_pass_by_value.cpp --fix. Aligns with C++ Core Guidelines F.16 (cheap-to-copy types should be passed by value). The author's chosen spelling (irep_idt vs. dstringt) is preserved. Formatting was applied via git-clang-format-15 so unrelated lines remain untouched. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Mechanical conversion of `const irep_idt &` parameters to pass-by-value via scripts/check_pass_by_value.cpp --fix. Aligns with C++ Core Guidelines F.16 (cheap-to-copy types should be passed by value). The author's chosen spelling (irep_idt vs. dstringt) is preserved. Formatting was applied via git-clang-format-15 so unrelated lines remain untouched. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Mechanical conversion of `const irep_idt &` parameters to pass-by-value via scripts/check_pass_by_value.cpp --fix. Aligns with C++ Core Guidelines F.16 (cheap-to-copy types should be passed by value). The author's chosen spelling (irep_idt vs. dstringt) is preserved. Formatting was applied via git-clang-format-15 so unrelated lines remain untouched. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Mechanical conversion of `const irep_idt &` parameters to pass-by-value via scripts/check_pass_by_value.cpp --fix. Aligns with C++ Core Guidelines F.16 (cheap-to-copy types should be passed by value). The author's chosen spelling (irep_idt vs. dstringt) is preserved. Formatting was applied via git-clang-format-15 so unrelated lines remain untouched. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Mechanical conversion of `const irep_idt &` parameters to pass-by-value via scripts/check_pass_by_value.cpp --fix. Aligns with C++ Core Guidelines F.16 (cheap-to-copy types should be passed by value). The author's chosen spelling (irep_idt vs. dstringt) is preserved. Formatting was applied via git-clang-format-15 so unrelated lines remain untouched. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Mechanical conversion of `const irep_idt &` parameters to pass-by-value via scripts/check_pass_by_value.cpp --fix. Aligns with C++ Core Guidelines F.16 (cheap-to-copy types should be passed by value). The author's chosen spelling (irep_idt vs. dstringt) is preserved. Formatting was applied via git-clang-format-15 so unrelated lines remain untouched. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Mechanical conversion of `const irep_idt &` parameters to pass-by-value via scripts/check_pass_by_value.cpp --fix. Aligns with C++ Core Guidelines F.16 (cheap-to-copy types should be passed by value). The author's chosen spelling (irep_idt vs. dstringt) is preserved. Formatting was applied via git-clang-format-15 so unrelated lines remain untouched. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Mechanical conversion of `const irep_idt &` parameters to pass-by-value via scripts/check_pass_by_value.cpp --fix. Aligns with C++ Core Guidelines F.16 (cheap-to-copy types should be passed by value). The author's chosen spelling (irep_idt vs. dstringt) is preserved. Formatting was applied via git-clang-format-15 so unrelated lines remain untouched. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Mechanical conversion of `const irep_idt &` parameters to pass-by-value via scripts/check_pass_by_value.cpp --fix. Aligns with C++ Core Guidelines F.16 (cheap-to-copy types should be passed by value). The author's chosen spelling (irep_idt vs. dstringt) is preserved. Formatting was applied via git-clang-format-15 so unrelated lines remain untouched. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Mechanical conversion of `const irep_idt &` parameters to pass-by-value via scripts/check_pass_by_value.cpp --fix. Aligns with C++ Core Guidelines F.16 (cheap-to-copy types should be passed by value). The author's chosen spelling (irep_idt vs. dstringt) is preserved. Formatting was applied via git-clang-format-15 so unrelated lines remain untouched. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Mechanical conversion of `const irep_idt &` parameters to pass-by-value via scripts/check_pass_by_value.cpp --fix. Aligns with C++ Core Guidelines F.16 (cheap-to-copy types should be passed by value). The author's chosen spelling (irep_idt vs. dstringt) is preserved. Formatting was applied via git-clang-format-15 so unrelated lines remain untouched. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Mechanical conversion of `const irep_idt &` parameters to pass-by-value via scripts/check_pass_by_value.cpp --fix. Aligns with C++ Core Guidelines F.16 (cheap-to-copy types should be passed by value). The author's chosen spelling (irep_idt vs. dstringt) is preserved. Formatting was applied via git-clang-format-15 so unrelated lines remain untouched. Manual fix folded in: jbmc/unit/java-testing-utils/require_goto_statements.cpp helpers `get_unique_non_null_expression_assigned_to_symbol` and `try_get_unique_symbol_assigned_to_symbol` previously returned `const exprt &` and `const symbol_exprt *` respectively, both pointing into a temporary `pointer_assignment_locationt` returned by value from `find_pointer_assignments`. On `develop` this was already UB but happened to point into memory that survived long enough for the caller `get_ultimate_source_symbol` to read the identifier. After the F.16 cleanup, the caller's `const auto &expr =` was changed to `const auto expr =` (correctly forcing a copy of the dangling reference), but the returned `const symbol_exprt *` then pointed into a function-local variable, dangling immediately on return and causing a SIGSEGV under java-unit on the `generic_bases_test.cpp:48` scenario in the coverage CI runner. Switch the helper to return `std::optional<irep_idt>` (the only piece of data the caller uses) and have `get_unique_non_null_expression_assigned_to_symbol` return `exprt` by value, eliminating the chain of dangling references. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
After the cleanup commits, scripts/check-pass-by-value reports zero violations across src/, jbmc/src/, unit/ and jbmc/unit/. The baseline is regenerated to an empty file, so the CI check effectively becomes "fail on any new pass-by-value violation of a cheap-to-copy type". Also: the runner no longer treats an empty tool output as an error. That state was previously unreachable (the baseline always had ~2,000 entries) and now is the steady state. The baseline mechanism itself is retained for forward compatibility: when CHEAP_TYPES is extended (for example to add goto_programt::const_targett), the new findings can be temporarily baselined while the corresponding cleanup PR is staged. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
irep_idtis adstringt, which holds a singleunsignedtable index. Passing it byconst irep_idt &was historically defensible becauseirep_idtcould be aliased tostd::stringviaUSE_STD_STRING-- but that compatibility was removed in4d935bf14f("Remove support for irep_idt-as-std::string"). Withdstringtas the only target, by-value aligns with [C++ Core Guidelines F.16].Why this is worth doing (correctness, not speed)
@kroening asked whether this yields a speed-up. Honestly: no measurable one, and that's fine -- the motivation is correctness and style, not performance.
require_goto_statements(a jbmc unit-test helper) that happened to work by accident ondevelop: a helper returned a reference into the entry-point instructions vector, which a caller bound to aconst &whose lifetime no longer covered the use once the initialising parameter became by-value. This is fixed here (folded into36dd78bc33), and is the strongest concrete argument for the change. The same class of latent issue is what gcc-14's-Wdangling-referenceflags (see5ef06bd40a).const irep_idt &advertises an indirection that no longer exists.Microbenchmark
Passing
irep_idtto anoinlinefunction ~3×10⁸ times, comparing the two calling conventions on otherwise-identical work (return a == b;):So at the call-convention level the change is performance-neutral; the cost of the by-reference indirection is lost in the noise (and is dominated by anything the callee actually does, e.g. a
symbol_tablelookup). The case rests on the correctness fix above and on Core-Guideline alignment.What's in this PR
scripts/check_pass_by_value.cpp+scripts/run_pass_by_value_check.sh+.github/workflows/syntax-checks.yaml, modelled on the existingcheck-irep-copiesjob) that flags any newparmVarDeclofconst T &whereTis in a configurable cheap-types list (dstringtfor now). Matching is on the fully-qualified record name (getQualifiedNameAsString()), so it won't over-match unrelated types sharing a simple name.--fixrewrites findings in place.regression/cbmc-pass-by-value/(positive, by-value, non-cheap-by-reference, a namespaced-dstringtnegative that locks in the qualified-name behaviour, and a--fixrewrite test), run from the same CI job.unit/andjbmc/unit/; a liveness canary (scripts/pass_by_value_canary.cpp) asserts the freshly built tool finds a known violation before scanning (so a broken toolchain can't pass silently against the empty baseline); and per-TU parse errors are counted and surfaced rather than discarded. (Note: a handful of generated/amalgamated parser TUs underunit/don't parse standalone under clang and are reported as parse errors; they contribute no findings, which the surfaced count makes visible.)--fixand formatted withgit-clang-format-15so unrelated lines stay untouched.Stats
527 files, +2,996/−2,694. Findings: 2,049 → 0.
Testing
unit(551 cases),java-unit(110 cases), andctest -L CORE(96 tests) all pass.git-clang-format-15clean. The checker self-test corpus passes, and the checker reports 0 findings acrosssrc/,jbmc/src/,unit/andjbmc/unit/.Notes for reviewers
const irep_idt &member declarations that would have become dangling references after their initialising parameter switched to by-value, and therequire_goto_statementshelper described above.std::function<…(const irep_idt &…)…>typedefs were updated tostd::function<…(irep_idt …)…>in lockstep with the lambdas/callables bound to them.-Wdangling-referenceworkaround (5ef06bd40a) binds results to values rather than using a pragma or-Wno-dangling-reference; the commit message now documents why (source-local, compiler-agnostic, copies are cheap refcount bumps).Limitations / follow-ups
std::function(Clang doesn't synthesise aParmVarDeclfor them). None exist ondevelop; aFunctionProtoTypeLocwalker would close the gap if any reappear.CHEAP_TYPESto nested STL types (e.g.goto_programt::const_targett) requires the fully-qualified, libstdc++-specific record name, not just the simple name -- documented in the source. A fuller positive/negative fixture corpus per category is a reasonable follow-up.ABI
Source-compatible for callers; out-of-tree consumers of
libcbmcneed a recompile (the mangled names off(irep_idt)andf(const irep_idt &)differ).