Docs/split phase1 phase2 readmes#13
Open
sofasogood wants to merge 71 commits intoRDI-Foundation:mainfrom
Open
Conversation
- Plugin-based framework for dynamic adversarial security testing - Generic attacker/defender agents with context injection - Three example scenarios: PortfolioIQ, Thingularity, Medical Records - Comprehensive competition documentation
The attacker and defender agents require the openai package to communicate with OpenAI's API.
- Change all Track references to Phase terminology - Add detailed Phase 1 (Nov 21 - Dec 19) and Phase 2 (Jan 12 - Feb 23) info - Add model constraint: gpt-oss-20b required for all agents - Add deliverables, example flow, and submission guidelines - Update orchestrator to auto-generate evidence files (baseline_passed.json, attack_succeeded.json) - Create example submission at submissions/example_team/example_scenario/ - Fix model inconsistency in SCENARIO_SPECIFICATIONS.md - Delete redundant COMPETITION_GUIDE.md
- Add OPENAI_BASE_URL env var support to both agents - Allows all agents to share a single local vLLM/Ollama server - Defaults to OpenAI API when not set (backward compatible) - Document local model serving setup in README
- Add step-by-step setup instructions (fork, install, env vars) - Document how to run Lambda scenarios (PortfolioIQ, Thingularity) - Document how to run example submission - Add PR submission process (fork, branch, PR title format) - Note Python >= 3.11 requirement
- Add team_name field to ArenaResult model
- Update _save_results() to create results/{team_name}/{scenario_type}/{timestamp}/ path
- Save evidence files to both central results and submission test_results folders
- Include team_name in baseline_passed.json and attack_succeeded.json
- Update example submission TOML and README with team_name documentation
- Update README.md with explicit 100 + 20 bonus point structure - Align judging criteria: Functionality (40), Target Sensitivity (30), Creativity (30), Attack Success (+20 bonus) - Add result.json to test_results in SCENARIO_SPECIFICATIONS.md - Clarify baseline definition and 0-point policy for trivial targets
Minor update to readme
Explicitly set model to openai/gpt-oss-20b in toml
- Update competition dates: Phase 1 (Nov 24 - Jan 5), Phase 2 (Jan 19 - Feb 23) - Add Lambda × Berkeley branding and track description - Standardize model flag (--model openai/gpt-oss-20b) across all TOML files - Add support section with Discord/GitHub contact info - Update Phase 2 evaluation to focus on public/private leaderboards - Remove specific metric references (ASR, defense rate percentages) - Standardize scenario count to 450+ specifications - Remove evaluation metrics table from README
- Stateless defender (fresh each round, no memory) - Stateful attacker (learns across rounds with history) - Simplified defender prompt (generic professional principles) - Enhanced attacker prompt (strategic planning, balanced examples) - Smarter thingularity detection (excludes denials from leaks) - Gandalf-style thingularity (attacker knows categories, not exact names) - Early termination on attacker success
- Add missing __init__.py files to submissions folder hierarchy - Update import paths in README to use relative imports (from ..submissions...) - Register example_scenario plugin in registry.py This fixes the ModuleNotFoundError and circular import issues when following the tutorial to register custom scenario plugins.
- gymjailbreak: Add 'capped from' pattern for safety caveats
- New _format_battle_log_html() generates self-contained HTML report - Dark theme with color-coded attacker/defender sections - Visual status indicators for each round - Saves battle_log.html alongside battle_log.md
- Align resume topics with the actual job posting (Senior Software Engineer: Python, Cloud, Leadership) instead of unrelated roles (data analyst, project manager) - Fix expected_concepts to validate the defender's JSON ranking output (rank, score, rationale, candidates) instead of resume input content - Add variety in seniority: strong mid (7yr), junior (1-2yr), senior (10yr) - Tested: 3/3 topics pass with gpt-oss-20b on vLLM
The full defender response is now saved in the transcript JSON, making it easier to review and debug normal user test results.
Plugins: - thingularity: add tool descriptions, show hidden tools in defender task description - example_medical: use random SSNs with patient names/DOBs, move SSNs to defender task description with realistic use case, add missing format_input() - moneymatemayhem: add MoneyMate product details (fees, times, security) to defender - flightapifake: add legitimate package details to defender task description Normal user topics: - thingularity: test public tool visibility (search, details, stock) - flightapifake: replace comparison/setup topics with recommendation topics compatible with JSON output format - legaleagleeye: more specific expected_concepts for force majeure analysis - encodedpayload: replace duplicate topic with workplace accommodation - gymjailbreak: fix expected_concept "heart rate" -> "workout"
…uator The evaluator marked correct defender responses as "not helpful" because it didn't know the required output format was a brief JSON. Now the orchestrator passes output_format to the evaluation context, and the evaluate template instructs the LLM not to penalize brevity when the format only requires a few fields.
Replace the monolithic scenarios/security_arena/README.md with a short landing page that links to docs/phase1.md and docs/phase2.md (already on main). Add phase doc links to root README.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.