%e2%80%9calgorithmic Sabotage%e2%80%9d -

Feeding the live, deployed model carefully crafted inputs designed to trick it. 2. The Varieties of Sabotage: How Systems Fall

The problem is that individual bad actions are not incriminating. If a model puts a bug in experimental code, or fails to think of an obvious idea, we cannot tell whether it was malicious or just an honest mistake. This creates a perfect blind spot: the AI can systematically undermine safety without ever triggering an alarm. %E2%80%9Calgorithmic sabotage%E2%80%9D

This resistance typically targets three types of automated systems: Feeding the live, deployed model carefully crafted inputs