feat: add ai-lakera-guard plugin for prompt injection and PII detection #13291

@nic-6443

Description


Add an ai-lakera-guard plugin that integrates with the Lakera Guard API for LLM request/response security scanning — covering prompt injection, jailbreak detection, PII leakage, and content moderation.

Background:

APISIX currently has ai-prompt-guard (regex-based allow/deny), ai-aws-content-moderation (AWS Comprehend), and ai-aliyun-content-moderation (Aliyun SaaS). However, there is no integration with Lakera Guard, one of the most widely adopted commercial prompt security services. Kong already ships an ai-lakera-guard plugin; LiteLLM also supports Lakera as a guardrail backend.

Lakera Guard provides ML-based detection that goes beyond regex patterns — it catches semantic prompt injection, jailbreak attempts, PII entities, profanity, and unknown/malicious links through a single API call.
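For a rough feel of what "a single API call" means here, the sketch below builds a Guard request body and extracts flagged categories from a response, with no network call. The response shape (a `results` list with `detector_type` and `flagged` fields) is an assumption for illustration and would need to be checked against Lakera's API documentation.

```python
import json

# Hypothetical sketch: build a Lakera Guard v2 request body for a chat prompt
# and pull the flagged detector types out of a response. The response shape
# used here is an assumption, not confirmed against Lakera's docs.

def build_guard_payload(messages):
    """Wrap chat messages in the body POSTed to https://api.lakera.ai/v2/guard."""
    return json.dumps({"messages": messages})

def flagged_detectors(response_body):
    """Return the set of detector types the Guard response flagged."""
    data = json.loads(response_body)
    return {r["detector_type"] for r in data.get("results", []) if r.get("flagged")}

# Example with a mocked response (no network call):
mock_response = json.dumps({
    "results": [
        {"detector_type": "prompt_injection", "flagged": True},
        {"detector_type": "pii", "flagged": False},
    ]
})
print(flagged_detectors(mock_response))  # {'prompt_injection'}
```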

Proposed design (for reference, open to adjustment during implementation):

plugins:
  ai-lakera-guard:
    direction: both              # input / output / both
    action: block                # block / alert (log only)

    endpoint:
      url: "https://api.lakera.ai/v2/guard"
      api_key_ref: "$secret://vault/kv/lakera/..."
      timeout_ms: 1000

    categories:
      - prompt_injection
      - jailbreak
      - pii
      - profanity
      - unknown_links

    reveal_failure_categories: true   # include matched categories in error response (for debugging)

    fail_open: false             # reject requests if Lakera API is unavailable (safety-first default)

    on_block:
      status: 400
      body: "Request blocked by security guard"
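To make the intended action / fail_open semantics concrete, here is a minimal decision-flow sketch. This is plugin-agnostic Python, not the actual Lua implementation; the function, field names, and the 503 fallback status are illustrative choices, open to adjustment like the rest of the design.

```python
# Illustrative decision logic for the proposed action / fail_open semantics.
# scan_result is None when the Lakera API was unreachable or timed out;
# otherwise it is the set of matched categories (empty set = clean).

def decide(scan_result, action="block", fail_open=False,
           reveal_failure_categories=True, status=400,
           body="Request blocked by security guard"):
    """Return (http_status, response_body) to short-circuit with, or None to pass."""
    if scan_result is None:                       # Guard API unavailable
        if fail_open:
            return None                           # availability-first: let it through
        return 503, "Security guard unavailable"  # safety-first default: reject
    if not scan_result:                           # nothing matched
        return None
    if action == "alert":                         # log-only mode: never block
        return None
    if reveal_failure_categories:                 # debugging aid; off in prod
        return status, body + " (" + ", ".join(sorted(scan_result)) + ")"
    return status, body

print(decide({"prompt_injection"}))
# (400, 'Request blocked by security guard (prompt_injection)')
```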

Key design points:

  1. Follows APISIX's existing pattern of one plugin per external service (same as ai-aws-content-moderation, ai-aliyun-content-moderation)
  2. Supports bidirectional scanning (both input prompts and LLM responses)
  3. Users bring their own Lakera account and API key — the plugin is just the connector
  4. fail_open: false as the safety-first default; users can override for availability-first scenarios
  5. reveal_failure_categories helps debugging in dev; can be disabled in production
  6. Category selection lets users enable only the checks they need
  7. Integrates with APISIX secret management ($secret://) for API key storage
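Point 2 (bidirectional scanning) maps naturally onto scanning in APISIX's request and response phases. A hypothetical gating helper, with phase names purely illustrative:

```python
# Hypothetical mapping of the `direction` option onto proxy phases:
# input scanning runs while handling the request, output scanning while
# handling the LLM response. Phase names here are illustrative only.

def should_scan(direction, phase):
    """Decide whether this phase needs a Lakera Guard call."""
    if direction == "both":
        return True
    if direction == "input":
        return phase == "request"
    if direction == "output":
        return phase == "response"
    raise ValueError("direction must be input, output, or both")

print(should_scan("both", "response"))   # True
print(should_scan("input", "response"))  # False
```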

This plugin is part of a broader effort to build out APISIX's AI security plugin ecosystem. Other plugins in the family (like a regex-based PII sanitizer, a Presidio integration, or a Llama Guard integration) can be proposed separately. This issue focuses specifically on the Lakera Guard integration.

Happy to submit a PR if this direction makes sense.
