LLM-based System Security Attack

From GM-RKB
Jump to navigation Jump to search

An LLM-based System Security Attack is an adversarial software system attack designed to exploit LLM vulnerabilities.

  • Context:
    • It can (often) involve Prompt Injection to alter the behavior of the LLM by inserting malicious instructions.
    • It can (often) target the exposure of sensitive information via attacks like Prompt Leaking, where hidden system instructions are revealed.
    • ...
    • It can range from being an external threat, such as an Adversarial Attack on AI Models, to internal vulnerabilities, such as flawed LLM Training Data.
    • ...
    • It can affect the integrity and trustworthiness of outputs.
    • It can compromise data privacy by extracting personal or proprietary information from the LLM's output.
    • ...
  • Example(s):
    • a Prompt Injection attack where a user manipulates the model into generating harmful or false outputs.
    • a Data Poisoning Attack where malicious data is introduced during the training phase of the LLM, leading to corrupted model outputs.
    • a Model Inversion Attack where an attacker reconstructs sensitive training data from the model’s responses.
    • ...
  • Counter-Example(s):
    • LLM Tuning Issues related to model performance, which are due to poor optimization rather than active malicious interference.
    • LLM Misalignment, where the model’s behavior diverges from expected outputs but is not the result of an attack.
  • See: Prompt Injection, Adversarial Attack on AI Models, Data Privacy in AI


References