FEAT: Introduce IdentifierFilters to allow generic DB queries on identifier… by behnam-o · Pull Request #1557 · microsoft/PyRIT

behnam-o · 2026-04-01T20:49:43Z

Introduce IdentifierFilters which allow us to query memory entities by matching properties of their referenced identifiers
Replace some existing identifier-based queries with the new generic functions
Add tests
[stretch but along the same lines] Introduces a few similar internal methods that operate on identifiers, abstracting entity-specific details away from the DB interface implementations. For example, the interface has a method that gets the unique values out of an array of converter classes found in an attack identifier property. Today, each memory implementation (AzureSQL/SQLite) has a hard-coded query that does only that by explicitly referencing the attack result table, doing some JSON magic, and extracting the values. With these new methods, the interface requires the implementations to provide mechanisms for extracting unique values out of any array in a JSON column, so 1) they don't need to hard-code queries on specific component types and 2) the methods can be reused on any desired property/identifier without having to update the memory implementations.
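A minimal sketch of the "unique values out of any array in a JSON column" idea, for SQLite only. The function name, table, column, and JSON layout below are assumptions made for illustration and are not the PR's actual implementation; it also assumes the SQLite build ships the JSON functions (`json_each`, `json_extract`).

```python
import json
import sqlite3


def unique_json_array_values(conn, *, table, json_column, array_path, sub_path=None):
    """Return the sorted distinct values of a JSON array stored in `json_column`.

    `table` and `json_column` are interpolated into the SQL, so they must come
    from trusted code, never from user input. The JSON paths are bound as
    parameters.
    """
    # Each array element is either a scalar (used directly) or an object from
    # which `sub_path` selects one field.
    value_expr = "json_extract(item.value, :sub_path)" if sub_path else "item.value"
    sql = (
        f"SELECT DISTINCT {value_expr} "
        f"FROM {table}, json_each({table}.{json_column}, :array_path) AS item "
        f"WHERE {value_expr} IS NOT NULL "
        "ORDER BY 1"
    )
    params = {"array_path": array_path}
    if sub_path:
        params["sub_path"] = sub_path
    return [row[0] for row in conn.execute(sql, params)]


# Hypothetical table and identifier layout, only to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attack_results (id INTEGER PRIMARY KEY, identifier TEXT)")
identifiers = [
    {"converters": [{"class_name": "Base64Converter"}, {"class_name": "ROT13Converter"}]},
    {"converters": [{"class_name": "Base64Converter"}]},
    {"class_name": "PromptSendingAttack"},  # no converter array at all
]
conn.executemany(
    "INSERT INTO attack_results (identifier) VALUES (?)",
    [(json.dumps(identifier),) for identifier in identifiers],
)

converter_names = unique_json_array_values(
    conn,
    table="attack_results",
    json_column="identifier",
    array_path="$.converters",
    sub_path="$.class_name",
)
```

Because the table, column, and paths are all arguments, the same helper could serve any identifier column without a new hard-coded query per component type.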


pyrit/memory/azure_sql_memory.py

pyrit/memory/identifier_filters.py

pyrit/memory/memory_interface.py

bashirpartovi · 2026-04-02T15:31:55Z

pyrit/memory/identifier_filters.py

+class AttackIdentifierFilter(IdentifierFilter[AttackIdentifierProperty]):
+    """
+    Immutable filter definition for matching JSON-backed attack identifier properties.
+
+    Args:
+        property_path: The JSON path of the property to filter on.
+        value_to_match: The value to match against the property.
+        partial_match: Whether to allow partial matches (default: False).
+    """
+
+
+@dataclass(frozen=True)
+class TargetIdentifierFilter(IdentifierFilter[TargetIdentifierProperty]):
+    """Immutable filter definition for matching JSON-backed target identifier properties."""
+
+
+@dataclass(frozen=True)
+class ConverterIdentifierFilter(IdentifierFilter[ConverterIdentifierProperty]):
+    """Immutable filter definition for matching JSON-backed converter identifier properties."""
+
+
+@dataclass(frozen=True)
+class ScorerIdentifierFilter(IdentifierFilter[ScorerIdentifierProperty]):
+    """Immutable filter definition for matching JSON-backed scorer identifier properties."""


Here you have a _StrEnum base, a Generic[T] ABC IdentifierFilter, then 4 *Property enums, and 4 *Filter subclasses, but every filter subclass has an empty body and adds zero behavior. I think the IdentifierFilter type hierarchy is unnecessary.

The Generic[T] bound gives you type-check-time safety on which *Property enum you pass, but at runtime __post_init__ immediately calls str(self.property_path), erasing the enum type entirely. So a caller can pass any string and it works fine.

Four empty subclasses that differ only in the type parameter are a little over-engineered, I think. You're essentially encoding which JSON column to query in the type of the filter, but the actual column is still chosen by the caller at the call site (e.g. json_column=ScoreEntry.scorer_class_identifier). The type hierarchy doesn't prevent misuse either: nothing stops you from passing a ScorerIdentifierFilter with json_column=AttackResultEntry.atomic_attack_identifier.

I think a single IdentifierFilter dataclass with a flat property_path: str would be simpler, equally extensible, and more transparent about what the runtime actually does.

Here is what I propose:

@dataclass(frozen=True)
class IdentifierFilter:
    property_path: str
    value_to_match: str
    partial_match: bool = False

That's it. The *Property enums are fine to keep as constants (or even a flat module-level class IdentifierPaths namespace), but there's no need for them to constrain the filter type generically. The column binding already happens at the call site in memory_interface.py, so the filter is purely about what path, what value, plus the exact or partial condition.

This removes most of the classes you proposed here and is equally type safe, because the real safety comes from which json_column you pass, not the filter type. It is also extensible, since new properties are just new enum values and no new filter class is needed.
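To make the point concrete, here is a small sketch of the flat filter next to enum constants. The enum value `"$.class_name"` is a placeholder chosen for illustration, not the path the PR actually uses.

```python
from dataclasses import FrozenInstanceError, dataclass
from enum import Enum


class ScorerIdentifierProperty(str, Enum):
    """Path constants kept purely as a convenience (placeholder value)."""

    CLASS_NAME = "$.class_name"


@dataclass(frozen=True)
class IdentifierFilter:
    property_path: str
    value_to_match: str
    partial_match: bool = False


# A constant and a raw string describe the same filter: the enum member is a
# str subclass, so it compares equal to its value.
from_constant = IdentifierFilter(
    property_path=ScorerIdentifierProperty.CLASS_NAME,
    value_to_match="SelfAskLikertScorer",
)
from_string = IdentifierFilter(
    property_path="$.class_name",
    value_to_match="SelfAskLikertScorer",
)

# frozen=True still rejects mutation after construction.
try:
    from_string.partial_match = True
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```

Nothing here depends on a per-identifier filter subclass; adding a property means adding one enum member.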

And I think for _get_condition_json_array_match, you should add a sub_path: str | None = None parameter (that mimics _get_unique_json_array_values), and use it instead of hardcoding '$.class_name'.
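A rough sketch of what an array-match condition with an optional sub_path could look like on SQLite. The function name, the `(sql, params)` return shape, and the table layout are assumptions for illustration; the real method lives on the memory classes and may differ. Matching is case-insensitive here, and values are bound as parameters rather than interpolated.

```python
import json
import sqlite3


def json_array_match_condition(*, json_column, array_path, value_to_match, sub_path=None):
    """Build a (sql, params) pair that is true when any array element matches.

    `json_column` is interpolated and must come from trusted code.
    """
    # Without sub_path the array holds scalars; with it, each element is an
    # object and sub_path selects the field to compare.
    element = "json_extract(item.value, :sub_path)" if sub_path else "item.value"
    sql = (
        f"EXISTS (SELECT 1 FROM json_each({json_column}, :array_path) AS item "
        f"WHERE LOWER({element}) = LOWER(:value_to_match))"
    )
    params = {"array_path": array_path, "value_to_match": value_to_match}
    if sub_path:
        params["sub_path"] = sub_path
    return sql, params


# Hypothetical data: one array of objects ("converters"), one of scalars ("tags").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attack_results (id INTEGER PRIMARY KEY, identifier TEXT)")
identifiers = [
    {"converters": [{"class_name": "Base64Converter"}, {"class_name": "ROT13Converter"}],
     "tags": ["Jailbreak"]},
    {"converters": [{"class_name": "Base64Converter"}], "tags": []},
    {"tags": ["jailbreak"]},
]
conn.executemany(
    "INSERT INTO attack_results (identifier) VALUES (?)",
    [(json.dumps(identifier),) for identifier in identifiers],
)


def matching_ids(**condition_kwargs):
    condition, params = json_array_match_condition(
        json_column="attack_results.identifier", **condition_kwargs
    )
    query = f"SELECT id FROM attack_results WHERE {condition} ORDER BY id"
    return [row[0] for row in conn.execute(query, params)]


by_converter = matching_ids(
    array_path="$.converters", sub_path="$.class_name", value_to_match="base64converter"
)
by_tag = matching_ids(array_path="$.tags", value_to_match="JAILBREAK")
```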

Here is an example of how the single IdentifierFilter would be used:

results = memory.get_attack_results(
    identifier_filter=IdentifierFilter(
        property_path=AttackIdentifierProperty.ATTACK_CLASS_NAME,
        value_to_match="Crescendo",
        partial_match=True,
    ),
)

results = memory.get_scores(
    identifier_filter=IdentifierFilter(
        property_path=ScorerIdentifierProperty.CLASS_NAME,
        value_to_match="SelfAskLikertScorer",
    ),
)

Then inside memory_interface.py, the method signature pins which column the filter applies to:

def get_scores(
    self,
    *,
    scorer_identifier_filter: IdentifierFilter | None = None,
    ...
) -> Sequence[Score]:
    if scorer_identifier_filter:
        conditions.append(
            self._get_condition_json_property_match(
                # this is where the column is bound ---
                json_column=ScoreEntry.scorer_class_identifier,
                # ---
                property_path=scorer_identifier_filter.property_path,
                value_to_match=scorer_identifier_filter.value_to_match,
                partial_match=scorer_identifier_filter.partial_match,
            )
        )
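For reference, a sketch of how an exact/partial property match might be built on SQLite. The function name, the `(sql, params)` return shape, the `score_entries` table name, and the JSON layout are assumptions for illustration and not the PR's code. It lowercases both sides so matches are case-insensitive and escapes LIKE wildcards in the partial case.

```python
import json
import sqlite3


def json_property_match_condition(*, json_column, property_path, value_to_match, partial_match=False):
    """Build a (sql, params) pair comparing one JSON property to a value.

    `json_column` is interpolated and must come from trusted code; the path
    and the value are bound as parameters.
    """
    extracted = f"LOWER(json_extract({json_column}, :property_path))"
    params = {"property_path": property_path}
    if partial_match:
        # Escape LIKE wildcards so "%" and "_" in the input match literally.
        escaped = (
            value_to_match.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
        )
        params["pattern"] = f"%{escaped}%"
        return f"{extracted} LIKE LOWER(:pattern) ESCAPE '\\'", params
    params["value_to_match"] = value_to_match
    return f"{extracted} = LOWER(:value_to_match)", params


# Hypothetical table standing in for the score entries.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE score_entries (id INTEGER PRIMARY KEY, scorer_class_identifier TEXT)"
)
scorers = ["SelfAskLikertScorer", "SelfAskRefusalScorer", "SubStringScorer"]
conn.executemany(
    "INSERT INTO score_entries (scorer_class_identifier) VALUES (?)",
    [(json.dumps({"class_name": name}),) for name in scorers],
)


def find_score_ids(**filter_kwargs):
    condition, params = json_property_match_condition(
        json_column="score_entries.scorer_class_identifier", **filter_kwargs
    )
    query = f"SELECT id FROM score_entries WHERE {condition} ORDER BY id"
    return [row[0] for row in conn.execute(query, params)]


exact_ids = find_score_ids(property_path="$.class_name", value_to_match="selfasklikertscorer")
partial_ids = find_score_ids(
    property_path="$.class_name", value_to_match="SELFASK", partial_match=True
)
```

The column is fixed by the caller of the helper, which mirrors the point above: the filter only carries the path, the value, and the exact-or-partial flag.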

The main reason I introduced all those ***IdentifierProperty.XYZ enums was to limit which paths on an identifier can be queried. I agree it's not a bad idea to keep it free-form, especially since our identifiers are constructed in a somewhat free-form manner where keys are arbitrary strings.

Maybe down the road we'll want our identifiers statically typed, and then it might make sense to have the filters enforce that too.

For now, I changed property_path to accept a free-form string.

@bashirpartovi @hannahwestra25 @ValbuenaVC Thanks for your comments. I think you all touched on this free-form vs. restricted property_path pattern, and I agree the restricted version is a bit of over-engineering with no "real" benefit. Please let me know if we should have any follow-ups on this.


ValbuenaVC

This PR (#1451) might be a useful reference since it also handled filtering concerns, although for datasets.


hannahwestra25 · 2026-04-06T15:11:03Z

pyrit/memory/memory_interface.py

        not_data_type: Optional[str] = None,
        converted_value_sha256: Optional[Sequence[str]] = None,
+        attack_identifier_filter: Optional[IdentifierFilter] = None,
+        prompt_target_identifier_filter: Optional[IdentifierFilter] = None,


I think we would also want to filter by converter identifier.

Right now every identifier-bearing field needs its own dedicated parameter on the MemoryInterface query methods, which means adding a new stored identifier later requires changing public method signatures again. So this method, get_attack_results, and get_scenario_results could potentially end up with an unbounded number of filter parameters (not really, but you see my point).

Could we switch from one parameter per identifier field to a generic identifier_filters collection per query method, and map logical filter targets to concrete columns internally? That would make adding new identifier-bearing fields mostly a matter of extending a field map instead of changing every public API again.
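Roughly what I have in mind, as a sketch only. `IdentifierTarget`, the `target` field, the column names in the map, and `resolve_filters` are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from enum import Enum


class IdentifierTarget(Enum):
    """Logical filter targets (hypothetical names)."""

    ATTACK = "attack"
    PROMPT_TARGET = "prompt_target"
    CONVERTER = "converter"


@dataclass(frozen=True)
class IdentifierFilter:
    target: IdentifierTarget
    property_path: str
    value_to_match: str
    partial_match: bool = False


# One map per query method: logical target -> concrete JSON column.
# The column names are placeholders, not the real schema.
MESSAGE_PIECE_COLUMNS = {
    IdentifierTarget.ATTACK: "attack_identifier",
    IdentifierTarget.PROMPT_TARGET: "prompt_target_identifier",
    IdentifierTarget.CONVERTER: "converter_identifiers",
}


def resolve_filters(filters, column_map):
    """Pair each filter with its column, rejecting targets the query lacks."""
    resolved = []
    for identifier_filter in filters:
        if identifier_filter.target not in column_map:
            raise ValueError(
                f"Unsupported filter target for this query: {identifier_filter.target.name}"
            )
        resolved.append((column_map[identifier_filter.target], identifier_filter))
    return resolved


filters = [
    IdentifierFilter(IdentifierTarget.ATTACK, "$.class_name", "Crescendo", partial_match=True),
    IdentifierFilter(IdentifierTarget.CONVERTER, "$.class_name", "Base64Converter"),
]
resolved = resolve_filters(filters, MESSAGE_PIECE_COLUMNS)
```

With this shape, supporting a new identifier-bearing field is one new enum member plus one map entry, and the public method signature stays the same.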

hannahwestra25 · 2026-04-06T15:17:05Z

pyrit/memory/azure_sql_memory.py

        """
        return self._get_metadata_conditions(prompt_metadata=metadata)[0]

+    def _get_condition_json_property_match(


Could you add docstrings to explain the function and its parameters?

hannahwestra25 · 2026-04-06T15:17:43Z

pyrit/memory/azure_sql_memory.py

-
-        Args:
-            endpoint (str): The endpoint URL substring to filter by (case-insensitive).
+        Insert a list of message pieces into the memory storage.


add args here

and returns

hannahwestra25 · 2026-04-06T15:21:36Z

tests/unit/memory/memory_interface/test_interface_attack_results.py

    assert len(results) == 0


-def test_get_attack_results_by_attack_class_case_sensitive(sqlite_instance: MemoryInterface):


Does this fail? Why are we removing it? I want to make sure we keep backward compatibility, so this shouldn't fail, and it looks like it's not replaced.

Behnam Ousat added 4 commits April 1, 2026 13:47

Introduce IdentifierFilters to allow generic DB queries on identifier…

6ec9e80

… properties

forgot formatting

01aaa15

return str

e77b43c

fix method name

a06b506

hannahwestra25 reviewed Apr 2, 2026


add back public methods

9d3cb5f

bashirpartovi reviewed Apr 2, 2026


Behnam Ousat added 2 commits April 2, 2026 11:42

custom subpath for array match and make all matches case insensitive

5389a9f

format

3fa0713

ValbuenaVC reviewed Apr 2, 2026


Behnam Ousat added 6 commits April 2, 2026 14:26

allow free-form paths in identifier filters

24f61d1

unncecessary post-init

39361af

fix exact match in azsql

d2191a2

use bind_param in new methods to avoid sql injection

fd22ab8

prevent text collisions using a uuid for bind_params

227e7e5

format

7b3b5c1

hannahwestra25 reviewed Apr 6, 2026

