
Add warning about backslash corruption when building CQL with string formatting in the doc #1292

Open
howiezhao wants to merge 1 commit into apache:trunk from howiezhao:patch-1



@howiezhao

Update Getting Started documentation.


Copilot AI left a comment


Pull request overview

This PR updates the Getting Started guide to warn users against building CQL by interpolating Python values directly into query strings, and to steer them toward prepared statements. It fits the codebase by documenting a driver-specific correctness pitfall around collection string representations and CQL literal handling.

Changes:

  • Adds a warning explaining why Python string formatting should not be used to construct CQL from query results.
  • Adds an example showing how backslashes can be corrupted when collection values are interpolated into CQL text.
  • Adds a prepared-statement example showing the safe binary-protocol path.


Comment thread on docs/getting_started.rst (outdated)
@bschoening
Contributor

@howiezhao Does the problem occur when you round-trip data? Read from the database, and then insert it back into the database? For the getting started docs, that might be a bit of a corner case.

@howiezhao
Author


Hi @bschoening, yes. If the map data contains backslashes, reading it and then writing it back with Python string formatting adds extra backslashes, as shown in the example, so I recommend using a prepared statement.
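A minimal, driver-free sketch of the effect described above. A plain Python `dict` stands in for the map returned by the driver; the assumption (consistent with this discussion) is that both render their keys and values with `repr()` when interpolated into a string. The table `t` and column `my_map` are placeholders only.

```python
# Assumption: a plain dict behaves like the driver's map type here, i.e.
# converting it to text formats each key and value with repr().
row_map = {"path": "C:\\temp"}  # the stored value contains ONE backslash

# Building CQL with an f-string embeds repr() of each value, so the
# single backslash is written out as two backslash characters.
query = f"UPDATE t SET my_map={row_map} WHERE id=1"
print(query)
```

The printed query contains `'C:\\temp'`, i.e. two backslashes, and since CQL does not treat the backslash as an escape, both would be stored.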

@bschoening
Contributor

Would prefer something more concise, maybe something like...

.. warning::

   Avoid "round-tripping" data with string formatting.

   Never use f-strings or the ``%`` operator to insert data back into a CQL query, especially collections (maps, sets, lists) returned by the driver.

   The Gotcha: the driver's collection objects format the values they contain with Python's ``repr()``, which doubles every backslash (``\`` becomes ``\\``). Because CQL does not treat backslashes as escape characters, Cassandra stores those extra backslashes literally, corrupting your data.

   The Fix: always use prepared statements. They transmit values in a binary format that bypasses Python's string formatting entirely.

   .. code-block:: python

      # BAD: the f-string embeds repr() output, so backslashes are doubled
      session.execute(f"UPDATE t SET my_map={row.my_map} WHERE id=1")

      # GOOD: a prepared statement preserves the data exactly
      stmt = session.prepare("UPDATE t SET my_map=? WHERE id=1")
      session.execute(stmt, [row.my_map])
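To illustrate why the corruption compounds, here is a rough simulation of repeated f-string round trips. It assumes the rule stated in the warning: inside a single-quoted CQL string literal the backslash is an ordinary character, and the only escape is a doubled single quote. `read_cql_string_literal` and `round_trip_with_fstring` are hypothetical helpers that mimic how the literal would be read; no cluster or driver is involved.

```python
def read_cql_string_literal(literal: str) -> str:
    """Hypothetical helper: decode a single-quoted CQL string literal.

    Assumption from the warning: backslash is NOT an escape in CQL;
    the only escape inside '...' is a doubled single quote ('').
    """
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("expected a single-quoted CQL string literal")
    return literal[1:-1].replace("''", "'")


def round_trip_with_fstring(value: str) -> str:
    """Simulate reading a value, embedding repr(value) in CQL text, and storing it."""
    return read_cql_string_literal(repr(value))


original = "C:\\temp"                     # one backslash
once = round_trip_with_fstring(original)  # two backslashes after one round trip
twice = round_trip_with_fstring(once)     # four backslashes after two
print(original.count("\\"), once.count("\\"), twice.count("\\"))
```

Each round trip doubles the backslash count (1, 2, 4, ...). With a prepared statement, the value is sent as a bound parameter and never passes through `repr()`, so it is stored unchanged.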

@howiezhao
Author


Hi @bschoening, good suggestion. Updated.
