Prompt Leaking Attack
Jump to navigation
Jump to search
A Prompt Leaking Attack is an LLM security attack in which an LLM attacker extracts LLM hidden system instructions during or after LLM interaction.
- AKA: Prompt Extraction.
- Context:
- It can (typically) occur in conversational AI systems, where the underlying prompt or instruction set for the model is unintentionally revealed to the user.
- It can (often) expose proprietary prompts, user-specific instructions, or safety mechanisms embedded in the model.
- ...
- Example(s):
- Counter-Example(s):
- LLM Jailbreaking, which focuses on bypassing content restrictions rather than leaking instructions.
- See: Prompt Injection, Adversarial Attack on AI Models, Model Robustness