LLM-based System Security Attack

Context:
- It can (often) involve Prompt Injection to alter the behavior of the LLM by inserting malicious instructions.
- It can (often) target the exposure of sensitive information via attacks like Prompt Leaking, where hidden system instructions are revealed.
- ...
- It can range from being an external threat, such as an Adversarial Attack on AI Models, to internal vulnerabilities, such as flawed LLM Training Data.
- ...
- It can affect the integrity and trustworthiness of outputs.
- It can compromise data privacy by extracting personal or proprietary information from the LLM's output.
- ...
Example(s):
- a Prompt Injection attack where a user manipulates the model into generating harmful or false outputs.
- a Data Poisoning Attack where malicious data is introduced during the training phase of the LLM, leading to corrupted model outputs.
- a Model Inversion Attack where an attacker reconstructs sensitive training data from the model’s responses.
- ...
Counter-Example(s):
- LLM Tuning Issues related to model performance, which are due to poor optimization rather than active malicious interference.
- LLM Misalignment, where the model’s behavior diverges from expected outputs but is not the result of an attack.
See: Prompt Injection, Adversarial Attack on AI Models, Data Privacy in AI

References