Docs/split phase1 phase2 readmes#13

Open
sofasogood wants to merge 71 commits into RDI-Foundation:main from LambdaLabsML:docs/split-phase1-phase2-readmes
Conversation

@sofasogood
No description provided.

sofasogood and others added 30 commits November 19, 2025 03:23
- Plugin-based framework for dynamic adversarial security testing
- Generic attacker/defender agents with context injection
- Three example scenarios: PortfolioIQ, Thingularity, Medical Records
- Comprehensive competition documentation
The attacker and defender agents require the openai package
to communicate with OpenAI's API.
- Change all Track references to Phase terminology
- Add detailed Phase 1 (Nov 21 - Dec 19) and Phase 2 (Jan 12 - Feb 23) info
- Add model constraint: gpt-oss-20b required for all agents
- Add deliverables, example flow, and submission guidelines
- Update orchestrator to auto-generate evidence files (baseline_passed.json, attack_succeeded.json)
- Create example submission at submissions/example_team/example_scenario/
- Fix model inconsistency in SCENARIO_SPECIFICATIONS.md
- Delete redundant COMPETITION_GUIDE.md
- Add OPENAI_BASE_URL env var support to both agents
- Allows all agents to share a single local vLLM/Ollama server
- Defaults to OpenAI API when not set (backward compatible)
- Document local model serving setup in README
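The `OPENAI_BASE_URL` fallback described above can be sketched roughly as follows (`resolve_base_url` is a hypothetical helper; the actual agent code may wire this into the OpenAI client differently):

```python
import os

def resolve_base_url(default: str = "https://api.openai.com/v1") -> str:
    """Return OPENAI_BASE_URL when set, else the default OpenAI endpoint."""
    return os.environ.get("OPENAI_BASE_URL") or default

# Point every agent at one shared local vLLM/Ollama server:
os.environ["OPENAI_BASE_URL"] = "http://localhost:8000/v1"
print(resolve_base_url())  # prints http://localhost:8000/v1

# Unset -> backward-compatible default:
del os.environ["OPENAI_BASE_URL"]
print(resolve_base_url())  # prints https://api.openai.com/v1
```

Because only the base URL changes, the same agent code talks to OpenAI's hosted API or a local server without modification.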
- Add step-by-step setup instructions (fork, install, env vars)
- Document how to run Lambda scenarios (PortfolioIQ, Thingularity)
- Document how to run example submission
- Add PR submission process (fork, branch, PR title format)
- Note Python >= 3.11 requirement
- Add team_name field to ArenaResult model
- Update _save_results() to create results/{team_name}/{scenario_type}/{timestamp}/ path
- Save evidence files to both central results and submission test_results folders
- Include team_name in baseline_passed.json and attack_succeeded.json
- Update example submission TOML and README with team_name documentation
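The results layout from these commits can be sketched like this (`results_path` is a hypothetical helper; `_save_results()` in the orchestrator may construct the path differently):

```python
from datetime import datetime, timezone
from pathlib import Path

def results_path(team_name: str, scenario_type: str, root: str = "results") -> Path:
    """Build the results/{team_name}/{scenario_type}/{timestamp}/ layout."""
    ts = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return Path(root) / team_name / scenario_type / ts

p = results_path("example_team", "example_scenario")
print(p)  # e.g. results/example_team/example_scenario/20260206-184700
```

Evidence files such as baseline_passed.json and attack_succeeded.json would then be written both under this central path and into the submission's test_results folder.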
- Update README.md with explicit 100 + 20 bonus point structure
- Align judging criteria: Functionality (40), Target Sensitivity (30), Creativity (30), Attack Success (+20 bonus)
- Add result.json to test_results in SCENARIO_SPECIFICATIONS.md
- Clarify baseline definition and 0-point policy for trivial targets
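The 100 + 20 bonus structure above reduces to a simple sum (a sketch; the judges presumably score each axis by rubric rather than calling a function like this):

```python
def total_score(functionality: int, sensitivity: int, creativity: int,
                attack_succeeded: bool) -> int:
    """100-point base (40 + 30 + 30) plus a 20-point attack-success bonus."""
    assert 0 <= functionality <= 40
    assert 0 <= sensitivity <= 30
    assert 0 <= creativity <= 30
    return functionality + sensitivity + creativity + (20 if attack_succeeded else 0)

print(total_score(40, 30, 30, True))   # 120 (maximum)
print(total_score(25, 20, 15, False))  # 60
```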
Explicitly set model to openai/gpt-oss-20b in toml
- Update competition dates: Phase 1 (Nov 24 - Jan 5), Phase 2 (Jan 19 - Feb 23)
- Add Lambda × Berkeley branding and track description
- Standardize model flag (--model openai/gpt-oss-20b) across all TOML files
- Add support section with Discord/GitHub contact info
- Update Phase 2 evaluation to focus on public/private leaderboards
- Remove specific metric references (ASR, defense rate percentages)
- Standardize scenario count to 450+ specifications
- Remove evaluation metrics table from README
- Stateless defender (fresh each round, no memory)
- Stateful attacker (learns across rounds with history)
- Simplified defender prompt (generic professional principles)
- Enhanced attacker prompt (strategic planning, balanced examples)
- Smarter thingularity detection (excludes denials from leaks)
- Gandalf-style thingularity (attacker knows categories, not exact names)
- Early termination on attacker success
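The round loop implied by these bullets might look roughly like this (all class and function names are hypothetical, not the arena's actual API):

```python
class Attacker:
    """Stateful: accumulates defender replies across rounds."""
    def __init__(self):
        self.history = []

    def next_prompt(self) -> str:
        return f"attack attempt #{len(self.history) + 1}"

    def observe(self, defender_reply: str) -> None:
        self.history.append(defender_reply)

def run_battle(make_defender, attacker, leaked, max_rounds: int = 5) -> bool:
    """Fresh (stateless) defender each round; stop early on attacker success."""
    for _ in range(max_rounds):
        defender = make_defender()          # no memory carries over between rounds
        reply = defender(attacker.next_prompt())
        attacker.observe(reply)
        if leaked(reply):                   # early termination on success
            return True
    return False

# Toy usage: this stub defender leaks on the third attempt.
result = run_battle(
    make_defender=lambda: (lambda p: "SECRET" if p.endswith("#3") else "denied"),
    attacker=Attacker(),
    leaked=lambda r: "SECRET" in r,
)
print(result)  # True
```

Recreating the defender each round is what makes it stateless, while the attacker's growing `history` is what lets it adapt.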
- Add missing __init__.py files to submissions folder hierarchy
- Update import paths in README to use relative imports (from ..submissions...)
- Register example_scenario plugin in registry.py

This fixes the ModuleNotFoundError and circular import issues when
following the tutorial to register custom scenario plugins.
da-h and others added 30 commits February 6, 2026 18:47
- gymjailbreak: Add 'capped from' pattern for safety caveats
- New _format_battle_log_html() generates self-contained HTML report
- Dark theme with color-coded attacker/defender sections
- Visual status indicators for each round
- Saves battle_log.html alongside battle_log.md
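A rough sketch of such a self-contained dark-theme report; the real `_format_battle_log_html()` almost certainly differs in structure and styling:

```python
import html

def format_battle_log_html(rounds) -> str:
    """Render rounds as a single self-contained dark-theme HTML document."""
    sections = []
    for r in rounds:
        status = "&#10003;" if r["attack_succeeded"] else "&#10007;"  # status indicator
        sections.append(
            f'<section><h2>Round {r["round"]} {status}</h2>'
            f'<p class="attacker">{html.escape(r["attacker"])}</p>'
            f'<p class="defender">{html.escape(r["defender"])}</p></section>'
        )
    return ('<!DOCTYPE html><html><body style="background:#111;color:#ddd">'
            + "".join(sections) + "</body></html>")

doc = format_battle_log_html(
    [{"round": 1, "attack_succeeded": False,
      "attacker": "<script>alert(1)</script>", "defender": "denied"}]
)
```

Escaping the raw model output with `html.escape` matters here: attacker transcripts are adversarial by construction and must not be able to inject markup into the report.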
- Align resume topics with the actual job posting (Senior Software
  Engineer: Python, Cloud, Leadership) instead of unrelated roles
  (data analyst, project manager)
- Fix expected_concepts to validate the defender's JSON ranking output
  (rank, score, rationale, candidates) instead of resume input content
- Add variety in seniority: strong mid (7yr), junior (1-2yr), senior (10yr)
- Tested: 3/3 topics pass with gpt-oss-20b on vLLM
The full defender response is now saved in the transcript JSON,
making it easier to review and debug normal user test results.
Plugins:
- thingularity: add tool descriptions, show hidden tools in defender task description
- example_medical: use random SSNs with patient names/DOBs, move SSNs to defender
  task description with realistic use case, add missing format_input()
- moneymatemayhem: add MoneyMate product details (fees, times, security) to defender
- flightapifake: add legitimate package details to defender task description

Normal user topics:
- thingularity: test public tool visibility (search, details, stock)
- flightapifake: replace comparison/setup topics with recommendation topics
  compatible with JSON output format
- legaleagleeye: more specific expected_concepts for force majeure analysis
- encodedpayload: replace duplicate topic with workplace accommodation
- gymjailbreak: fix expected_concept "heart rate" -> "workout"
…uator

The evaluator marked correct defender responses as "not helpful" because
it didn't know the required output format was a brief JSON. Now the
orchestrator passes output_format to the evaluation context, and the
evaluate template instructs the LLM not to penalize brevity when the
format only requires a few fields.
Replace the monolithic scenarios/security_arena/README.md with a short
landing page that links to docs/phase1.md and docs/phase2.md (already
on main). Add phase doc links to root README.