Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLM Paper • 2509.18058 • Published Sep 22, 2025 • 12
DynaGuard: A Dynamic Guardrail Model With User-Defined Policies Paper • 2509.02563 • Published Sep 2, 2025 • 20