FEAT: Conversation Scorer and Psychosocial Harms Red Teaming Automation#1138

Merged
jbolor21 merged 57 commits into microsoft:main from jbolor21:users/bjagdagdorj/psychosocial on Dec 12, 2025

@jbolor21 (Contributor) commented Oct 17, 2025

Description

Adds a notebook for red teaming psychosocial harms, using a multi-step approach that models user behaviors, contexts, and evaluations.

  • Created a new conversation scorer that scores the entire conversation
  • Refactored the lookbackscorer notebook to use this conversation scorer instead of the separate lookbackscorer
  • Added a toy dataset with sample multi-turn conversations
  • Added a sample attack strategy YAML file modeling a user's escalation toward crisis
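
To illustrate the idea behind scoring a whole conversation rather than a single turn, here is a minimal, self-contained sketch. It does not use PyRIT and is not the implementation in `pyrit/score/conversation_scorer.py`. The names `Turn`, `flatten_conversation`, `score_conversation`, and `keyword_scorer` are hypothetical, and the keyword scorer is a toy stand-in for a real scorer.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    """One message in a multi-turn conversation (hypothetical type)."""
    role: str  # e.g. "user" or "assistant"
    text: str


def flatten_conversation(turns: List[Turn]) -> str:
    """Join every turn into a single role-labelled transcript."""
    return "\n".join(f"{t.role}: {t.text}" for t in turns)


def score_conversation(turns: List[Turn], score_text: Callable[[str], float]) -> float:
    """Apply a text scorer to the full transcript instead of one message."""
    return score_text(flatten_conversation(turns))


def keyword_scorer(text: str) -> float:
    """Toy scorer (an assumption): fraction of flagged phrases found in the text."""
    flagged = ("hopeless", "no way out")
    lowered = text.lower()
    hits = sum(1 for phrase in flagged if phrase in lowered)
    return hits / len(flagged)


conversation = [
    Turn("user", "I have been feeling hopeless lately."),
    Turn("assistant", "I'm sorry to hear that. Can you tell me more?"),
    Turn("user", "It feels like there is no way out."),
]

# Scoring the whole transcript sees signals spread across turns.
conversation_score = score_conversation(conversation, keyword_scorer)
# Scoring only the final message misses the earlier signal.
last_turn_score = keyword_scorer(conversation[-1].text)
```

In this toy example the final message alone contains only one of the two flagged phrases, while the full transcript contains both, which is the kind of gradual escalation a conversation-level score is meant to surface.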

Tests and Documentation

Ran notebooks, added unit tests

@jbolor21 marked this pull request as draft on October 17, 2025 17:57
@jbolor21 changed the title from "[DRAFT] Psychosocial Harms Red Teaming Automation" to "FEAT: Psychosocial Harms Red Teaming Automation" on Oct 20, 2025
@jbolor21 marked this pull request as ready for review on October 20, 2025 21:25
Comment thread pyrit/score/conversation_scorer.py
Comment thread pyrit/datasets/executors/crescendo/escalation_crisis.yaml
@rlundeen2 self-assigned this on Oct 24, 2025
Comment thread pyrit/score/float_scale/conversation_scorer.py Outdated
Comment thread pyrit/score/float_scale/conversation_scorer.py Outdated
Comment thread pyrit/score/float_scale/conversation_scorer.py Outdated
Comment thread pyrit/score/float_scale/conversation_scorer.py Outdated
Comment thread pyrit/score/float_scale/conversation_scorer.py Outdated
Comment thread pyrit/score/float_scale/conversation_scorer.py Outdated
Comment thread pyrit/score/float_scale/conversation_scorer.py Outdated
Comment thread pyrit/score/true_false/float_scale_threshold_scorer.py Outdated
Comment thread pyrit/score/float_scale/conversation_scorer.py Outdated
@jbolor21 changed the title from "FEAT: Psychosocial Harms Red Teaming Automation" to "FEAT: Conversation Scorer and Psychosocial Harms Red Teaming Automation" on Dec 9, 2025
@jbolor21 force-pushed the users/bjagdagdorj/psychosocial branch 2 times, most recently from 8f97794 to e50bc05, on December 9, 2025 23:51
@jbolor21 merged commit 2337fce into microsoft:main on Dec 12, 2025
19 checks passed
@jbolor21 deleted the users/bjagdagdorj/psychosocial branch on December 12, 2025 21:45

4 participants