Skip to content

Escape object/data names in Yul serializer#16517

Merged
msooseth merged 9 commits intodevelopfrom
fix-reparse-bug
Mar 18, 2026
Merged

Escape object/data names in Yul serializer#16517
msooseth merged 9 commits intodevelopfrom
fix-reparse-bug

Conversation

@msooseth
Copy link
Copy Markdown
Contributor

@msooseth msooseth commented Mar 12, 2026

Fix: Unescaped object/data names in Yul serializer cause crash on reparse

Fixes: #16516

Problem

Object::toString() and Data::toString() concatenated the name field directly into the serialized output:

"object \"" + name + "\" {\n"

The name is stored as the raw unescaped string value from the scanner, so a name containing " (e.g. parsed from object "foo\"bar" { ... }) would produce malformed Yul like:

object "foo"bar" {

After optimization, YulStack::reparse() serializes the AST via print() and re-parses it. The malformed output caused a fatal parser error: "Expected end of source but got identifier".

Fix

Replace manual quoting with util::escapeAndQuoteString(name) in both Object::toString() and Data::toString(). This escapes ", \, and non-printable characters before re-serialization, making the output round-trip correctly through the parser.

formatUseSrcComment() in the same file already used escapeAndQuoteString correctly for source names — this applies the same pattern to object and data names.

Output on crashing input

$ ./solc/solc --strict-assembly --optimize-yul crash-9bba37f2d0573d9282916d92152e7d9a23571560

======= crash-9bba37f2d0573d9282916d92152e7d9a23571560 (EVM) =======

Pretty printed source:
object "3\"{code{}}e0003" {
    code { { } }
}


Binary representation:
00

Text representation:
    /* "crash-9bba37f2d0573d9282916d92152e7d9a23571560":37:44   */
  stop

Disclosures

Code generated by Caude Claude. Reviewed by myself.

@msooseth msooseth changed the title Fix reparse bug Fix name escape bug Mar 12, 2026
@msooseth msooseth requested review from clonker and r0qs March 12, 2026 13:15
Comment thread libyul/Object.cpp
@cameel cameel changed the title Fix name escape bug Escape object/data names in Yul serializer Mar 12, 2026
Comment thread Changelog.md Outdated
// to prevent a reparse failure during optimization (exercises Data::toString).
object "a" {
code { {} }
data "my\"data" hex"deadbeef"
Copy link
Copy Markdown
Collaborator

@cameel cameel Mar 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add datasize("my\"data") in there to prove that you can successfully refer to such data, not just declare it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it seems like we can, I updated the test :) Yay. Let me know what you think!

@@ -0,0 +1,13 @@
// Object name containing a backslash must be escaped in serialization
// to prevent a reparse failure during optimization.
object "foo\\bar" {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just use a name that has all the possible escapes? There's only a handful of them:

if (c == '\\')
out += "\\\\";
else if (c == '"')
out += "\\\"";
else if (c == '\n')
out += "\\n";
else if (c == '\r')
out += "\\r";
else if (c == '\t')
out += "\\t";
else if (!isPrint(c))
{
std::ostringstream o;
o << "\\x" << std::hex << std::setfill('0') << std::setw(2) << (unsigned)(unsigned char)(c);
out += o.str();
}

And please use datasize() here as well.

Copy link
Copy Markdown
Contributor Author

@msooseth msooseth Mar 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New test case has them all: test/libyul/yulSyntaxTests/objects/object_name_with_all_special_chars.yul :) Let me know what you think!

Copy link
Copy Markdown
Collaborator

@cameel cameel Mar 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure seeing assembly outputs adds much to these tests. It shows that they compile, but we only expect issues in the parser anyway. Maybe these would be better off in test/libyul/yulSyntaxTests/objects/ instead?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would also allow us to add some coverage for invalid cases. For example I think we should have tests covering names that include an unescaped newline, tab, or unicode character. For example something like this should produce an error:

object "A" {
    code {
        sstore(0, datasize("B\nC"))
    }
    data "B
C" hex"1234"
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added test test/libyul/yulSyntaxTests/objects/data_name_with_literal_newline.yul to address this concern.

I also moved all the files into test/libyul/yulSyntaxTests/objects/ you are right, it makes a ton more sense. Sorry, I am not very familiar with the testing setup in Solidity :S

Copy link
Copy Markdown
Member

@clonker clonker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commits need squashing and the branch needs a rebase. We try avoiding merge commits in the history and an up-to-date rebase ensures that the changes integrate well with the current develop as well as keeps the history neat and linear.
Changes and test coverage are good!

@msooseth msooseth enabled auto-merge (squash) March 18, 2026 10:03
@msooseth msooseth merged commit 4e281a7 into develop Mar 18, 2026
83 checks passed
@msooseth msooseth deleted the fix-reparse-bug branch March 18, 2026 10:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unescaped object/data names in Yul serializer cause crash on reparse

3 participants