Chinese AI models are learning to detect safety tests and adjust their behaviour accordingly
Several Chinese frontier AI models can detect when they are being subjected to safety evaluations and adjust their behaviour accordingly, according to research published by Neo Research, a Singapore-based AI safety evaluation lab. The finding, which the researchers call “evaluation awareness,” raises fundamental questions about whether the safety tests that governments and companies rely on […]
This article was published on The Next Web (thenextweb.com).