Prompt Leaking Attack: Difference between revisions
Jump to navigation
Jump to search
(ContinuousReplacement) Tag: continuous replacement |
m (Text replacement - "]]↵----" to "]]. ----") |
||
Line 10: | Line 10: | ||
* <B>Counter-Example(s):</B> | * <B>Counter-Example(s):</B> | ||
** [[LLM Jailbreaking]], which focuses on bypassing content restrictions rather than leaking instructions. | ** [[LLM Jailbreaking]], which focuses on bypassing content restrictions rather than leaking instructions. | ||
* <B>See:</B> [[Prompt Injection]], [[Adversarial Attack on AI Models]], [[Model Robustness]] | * <B>See:</B> [[Prompt Injection]], [[Adversarial Attack on AI Models]], [[Model Robustness]]. | ||
---- | ---- | ||
---- | ---- |
Latest revision as of 08:23, 12 November 2024
A Prompt Leaking Attack is an LLM security attack in which an LLM attacker extracts LLM hidden system instructions during or after LLM interaction.
- AKA: Prompt Extraction.
- Context:
- It can (typically) occur in conversational AI systems, where the underlying prompt or instruction set for the model is unintentionally revealed to the user.
- It can (often) expose proprietary prompts, user-specific instructions, or safety mechanisms embedded in the model.
- ...
- Example(s):
- Counter-Example(s):
- LLM Jailbreaking, which focuses on bypassing content restrictions rather than leaking instructions.
- See: Prompt Injection, Adversarial Attack on AI Models, Model Robustness.