Prompt Injection Detector
Our Prompt Injection Detector provides robust defense against adversarial input manipulations aimed at large language models (LLMs). By promptly identifying and neutralizing these attacks, our detector ensures the LLM operates securely, preventing it from falling victim to injection attacks.
Vulnerability
Injection attacks, particularly within LLM contexts, can prompt the model to execute unintended actions. Attackers typically exploit these vulnerabilities in two main ways:
-
Direct Injection: Involves overwriting system prompts directly.
-
Indirect Injection: Involves altering inputs sourced from external channels.
Info
As specified by the OWASP Top 10 LLM attacks
, this vulnerability is categorized under:
LLM01: Prompt Injection - Manipulating LLMs via crafted inputs can lead to unauthorized access, data breaches, and compromised decision-making.
Configuration
from pegasi.shield.input_detectors import PromptInjection
pegasi = Shield()
input_detectors = [PromptInjection(threshold=0.7)]
sanitized_prompt, valid_results, risk_score = pegasi.scan_input(prompt, input_detectors)