Apr 24, 2024
OpenAI’s new ‘instruction hierarchy’ could make AI models harder to fool
Posted by Dan Kummer in category: robotics/AI
1/ OpenAI researchers have proposed a new instruction hierarchy approach to reduce the vulnerability of large language models (LLMs) to prompt injection attacks and jailbreaks.
OpenAI researchers propose an instruction hierarchy for AI language models. It is intended to reduce vulnerability to prompt injection attacks and jailbreaks. Initial results are promising.
Language models (LLMs) are vulnerable to prompt injection attacks and jailbreaks, where attackers replace the model’s original instructions with their own malicious prompts.
Continue reading “OpenAI’s new ‘instruction hierarchy’ could make AI models harder to fool” »