Prompt Leaking Attack: Difference between revisions

Latest revision as of 08:23, 12 November 2024

AKA: Prompt Extraction.
Context:
- It can (typically) occur in conversational AI systems, where the underlying prompt or instruction set for the model is unintentionally revealed to the user.
- It can (often) expose proprietary prompts, user-specific instructions, or safety mechanisms embedded in the model.
- ...
Example(s):
- Ignore previous instruction attacks or Reveal your training data attacks.
- ...
Counter-Example(s):
- LLM Jailbreaking, which focuses on bypassing content restrictions rather than leaking instructions.
See: Prompt Injection, Adversarial Attack on AI Models, Model Robustness.

@@ Line 10: / Line 10: @@
 * <B>Counter-Example(s):</B>
 ** [[LLM Jailbreaking]], which focuses on bypassing content restrictions rather than leaking instructions.
-* <B>See:</B> [[Prompt Injection]], [[Adversarial Attack on AI Models]], [[Model Robustness]]
+* <B>See:</B> [[Prompt Injection]], [[Adversarial Attack on AI Models]], [[Model Robustness]].
 ----
 ----