ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces
arXiv:2604.05172v1 Announce Type: new Abstract: Large language model (LLM) agents are increasingly deployed to automate productivity tasks (e.g., email, scheduling, document management), but evaluating them on live servi