Guardrails

Content safety filters for agent inputs and outputs.

Decorators

`guardrail`

Mark a class as a DI-injectable guardrail and register it as a provider.

Applying @guardrail() to a class does two things:

Sets GUARDRAIL_CLASS_META on the class (a GuardrailClassMeta instance) so the framework knows it is a guardrail implementation.
Calls @injectable(scope=scope) from the Lauren framework, registering the class as a DI singleton (or the requested scope) so it can be injected into other components via the DI container.

Must be called with parentheses. Bare @guardrail raises DecoratorUsageError.

Example — a custom DI-injectable input guardrail:

python

from lauren_ai import guardrail, GuardrailDecision, GuardrailContext

@guardrail(kind="input")
class ProfanityFilter:
    """Block messages containing profanity."""

    async def check(
        self, message: str, context: GuardrailContext
    ) -> GuardrailDecision:
        if any(w in message.lower() for w in ("badword",)):
            return GuardrailDecision(
                action="block",
                violation="Profanity detected.",
                guardrail_name="ProfanityFilter",
            )
        return GuardrailDecision(
            action="pass", guardrail_name="ProfanityFilter"
        )

The class is now resolvable from the Lauren DI container and can be injected by type into other providers or wiring classes:

python

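# Assumed import paths (not confirmed by this reference); adjust as needed.
from lauren import Scope, injectable
from lauren_ai import USE_GUARDRAILS_META

@injectable(scope=Scope.SINGLETON)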
class GuardrailWiring:
    def __init__(
        self,
        profanity_filter: ProfanityFilter,
        my_agent: MyAgent,
    ) -> None:
        # Attach the DI-resolved filter instance to the agent at startup
        meta = getattr(my_agent, USE_GUARDRAILS_META, None)
        if meta:
            meta.input_guardrails.append(profanity_filter)

Parameters:

Name	Type	Description
`kind`	`Literal['input', 'output', 'any']`	Hint for the position this guardrail is intended for: `"input"` (runs before the model call), `"output"` (runs after), or `"any"` (either position). Does not affect runtime behaviour; used for documentation and static analysis only.
`scope`	`Any`	The DI scope to register the class under. Defaults to `Scope.SINGLETON`, which is resolved lazily from `lauren.Scope` to avoid a hard import at module load time.

Raises:

Exception	Description
`DecoratorUsageError`	When called without parentheses (bare `@guardrail`).
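
The parentheses requirement can be enforced by rejecting positional arguments in the decorator factory. The sketch below is illustrative only and is not the library's implementation: `guardrail_sketch`, the local `DecoratorUsageError` class, and the `_guardrail_kind` attribute are all hypothetical stand-ins.

```python
from typing import Any, Callable, Literal


class DecoratorUsageError(TypeError):
    """Local stand-in for the library's error type (hypothetical definition)."""


def guardrail_sketch(
    *args: Any,
    kind: Literal["input", "output", "any"] = "any",
) -> Callable[[type], type]:
    # Bare usage (@guardrail_sketch) passes the decorated class positionally,
    # so any positional argument means the parentheses were omitted.
    if args:
        raise DecoratorUsageError(
            "Use @guardrail_sketch(...) with parentheses, not bare @guardrail_sketch."
        )

    def decorate(cls: type) -> type:
        # Record the kind hint on the class; the attribute name is illustrative.
        setattr(cls, "_guardrail_kind", kind)
        return cls

    return decorate


@guardrail_sketch(kind="input")
class Ok:
    pass


bare_error = None
try:
    @guardrail_sketch
    class Bad:
        pass
except DecoratorUsageError as exc:
    bare_error = str(exc)
```

Making every option keyword-only is what lets the factory tell the two call styles apart without inspecting argument types.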

`use_guardrails`

Attach input/output guardrail instances to an @agent()-decorated class.

Analogous to @use_guards() in the Lauren framework — attaches pre-built guardrail objects to the agent so the runner can execute them before and after each LLM call.

Must be applied below @agent() (closer to the class body):

python

@agent(model="claude-haiku-4-5")
@use_guardrails(
    input=[TopicFilter(allowed_topics=["cooking"])],
    output=[PIIRedactor(entities=["EMAIL"])],
)
class CookingAssistant: ...

None entries are silently dropped, enabling conditional selection:

python

@agent(model="claude-opus-4-6")
@use_guardrails(
    input=[
        PromptInjectionFilter(),
        TopicFilter(allowed_topics=topics) if topics else None,
    ],
)
class DynamicAgent: ...

Input guardrails run before each LLM call. If any guardrail returns action="block", the model is never called and the violation message is returned to the caller. A "modify" decision replaces the user message before it is sent to the model.

Output guardrails run after the LLM response. A "block" decision raises GuardrailViolated. A "modify" decision replaces the response content before it reaches the caller.
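
The block/modify/pass flow described above can be sketched as a small runner loop. This is a minimal illustration under stated assumptions, not the framework's runner: `Decision`, `Violated`, `run_turn`, and `fake_model` are hypothetical stand-ins, and the context argument is reduced to a plain dict.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    """Stand-in for GuardrailDecision, using field names shown in this reference."""
    action: str  # "pass", "block", or "modify"
    violation: Optional[str] = None
    modified_content: Optional[str] = None


class Violated(Exception):
    """Stand-in for GuardrailViolated."""


async def run_turn(message, input_guardrails, output_guardrails, call_model):
    context = {}  # hypothetical placeholder for the per-call context
    # Input phase: a block returns the violation without calling the model.
    for g in input_guardrails:
        d = await g.check(message, context)
        if d.action == "block":
            return d.violation
        if d.action == "modify":
            message = d.modified_content
    response = await call_model(message)
    # Output phase: a block raises, a modify replaces the response.
    for g in output_guardrails:
        d = await g.check(response, context)
        if d.action == "block":
            raise Violated(d.violation)
        if d.action == "modify":
            response = d.modified_content
    return response


class BlockWord:
    async def check(self, text, context):
        if "forbidden" in text.lower():
            return Decision(action="block", violation="Blocked by input guardrail.")
        return Decision(action="pass")


class Shout:
    async def check(self, text, context):
        return Decision(action="modify", modified_content=text.upper())


calls = []


async def fake_model(message):
    calls.append(message)  # track whether the model was actually invoked
    return f"echo: {message}"


result = asyncio.run(run_turn("hello", [BlockWord()], [Shout()], fake_model))
```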

Must be called with parentheses. Bare @use_guardrails raises DecoratorUsageError.

Parameters:

Name	Type	Description
`input`	`list[Any] \| None`	List of `InputGuardrail` instances (or `None` entries which are silently dropped) to run before each LLM call.
`output`	`list[Any] \| None`	List of `OutputGuardrail` instances (or `None` entries which are silently dropped) to run after each LLM call.

Raises:

Exception	Description
`DecoratorUsageError`	When called without parentheses (bare `@use_guardrails`).

Decision types

`GuardrailDecision`

Result of a guardrail check.

`GuardrailContext`

Per-call context passed to each guardrail check.

`GuardrailViolated`

Signal emitted when a guardrail fires.

`InputGuardrail`

Protocol for input guardrails -- check messages before LLM call.

`OutputGuardrail`

Protocol for output guardrails -- check/modify LLM responses.
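
Because the guardrail types are protocols, a class can satisfy them structurally without inheriting from anything. The sketch below shows the idea with `typing.Protocol`; `InputGuardrailLike` and `Decision` are hypothetical stand-ins, and the `check(message, context)` signature is taken from the `ProfanityFilter` example rather than from the real protocol definition.

```python
from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


@dataclass
class Decision:
    """Stand-in for GuardrailDecision."""
    action: str


@runtime_checkable
class InputGuardrailLike(Protocol):
    """Hypothetical mirror of an input guardrail protocol."""

    async def check(self, message: str, context: Any) -> Decision: ...


class AlwaysPass:
    # No base class: having a matching `check` method is enough.
    async def check(self, message: str, context: Any) -> Decision:
        return Decision(action="pass")


class NotAGuardrail:
    pass


# isinstance() on a runtime-checkable protocol only verifies that the
# method exists; it does not validate the signature or return type.
is_guardrail = isinstance(AlwaysPass(), InputGuardrailLike)
```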

Built-in guardrails

`TopicFilter`

Block messages not related to allowed topics using keyword/pattern matching.

For production use, pass embed_fn to enable embedding-based similarity. Without embed_fn, the filter falls back to simple keyword matching.

Usage:

python

guard = TopicFilter(
    allowed_topics=["cooking", "recipes", "food"],
    violation_message="I only discuss cooking.",
)
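
To give a feel for what simple keyword matching means in practice, here is a minimal sketch. It is an assumption about the general technique, not `TopicFilter`'s actual matching rules, and `on_topic` is a hypothetical helper.

```python
import re


def on_topic(message: str, allowed_topics: list[str]) -> bool:
    # Tokenize on word characters so "Recipes?" still matches "recipes".
    words = set(re.findall(r"\w+", message.lower()))
    return any(topic.lower() in words for topic in allowed_topics)


topics = ["cooking", "recipes", "food"]
allowed = on_topic("Any good recipes for pasta?", topics)
rejected = on_topic("What is the stock price today?", topics)
```

Exact-word matching like this misses paraphrases (for example "how do I bake bread"), which is the gap an embedding-based check is meant to close.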

`PIIRedactor`

Redact PII patterns from LLM outputs.

Uses regex patterns for EMAIL, PHONE, SSN, and CREDIT_CARD.

Usage:

python

guard = PIIRedactor(entities=["EMAIL", "PHONE"], replacement="[REDACTED]")
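
Regex-based redaction of this kind can be sketched as follows. The patterns are deliberately simplified illustrations and are not the ones `PIIRedactor` ships with; `PATTERNS` and `redact` are hypothetical names.

```python
import re

# Simplified, illustrative patterns (assumptions, not the library's own).
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str, entities: list[str], replacement: str = "[REDACTED]") -> str:
    # Apply only the patterns for the requested entity types.
    for name in entities:
        text = PATTERNS[name].sub(replacement, text)
    return text


sample = "Mail jane@example.com, SSN 123-45-6789."
email_only = redact(sample, ["EMAIL"])
both = redact(sample, ["EMAIL", "SSN"])
```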

`LengthFilter`

Block messages outside min/max length limits.

Usage:

python

guard = LengthFilter(min_chars=1, max_chars=2000)

`PromptInjectionFilter`

Detect common prompt injection patterns in user input.

Usage:

python

guard = PromptInjectionFilter(violation_message="Prompt injection detected.")

`LLMGuardrail`

Use a secondary LLM call to judge whether content is safe.

The prompt must contain {content} which will be replaced with the text being evaluated.

Parameters:

Name	Type	Description
`llm`	`Any`	An `LLMService` (or any object with a compatible `.complete()` method) used to run the judgment call.
`prompt`	`str`	Judgment prompt; must contain the `{content}` placeholder.
`block_if`	`str`	String that, when found in the LLM's response (case-insensitive), triggers the guardrail action.
`violation_message`	`str`	Text returned to the caller on a trigger. When `action="modify"` this becomes the replacement content.
`action`	`Literal['block', 'modify']`	What to do when the guardrail triggers. `"block"` (default) returns a `GuardrailDecision(action="block", ...)`, which causes the runner to raise `GuardrailViolated`. `"modify"` returns a `GuardrailDecision(action="modify", modified_content=violation_message, ...)`, which replaces the agent's response without raising; useful for graceful redirects.
`system`	`str \| None`	Optional system prompt passed to the judgment call. Use this to set concise instructions such as `"Answer YES or NO only."` without baking them into the main prompt template.
`max_tokens`	`int \| None`	Maximum tokens for the judgment response. Set to a small value (e.g. `5`) when you only need a YES/NO answer; this significantly reduces cost and latency.
`temperature`	`float \| None`	Sampling temperature for the judgment call. `0.0` produces deterministic YES/NO answers.
`guardrail_name`	`str`	Label attached to every `GuardrailDecision` emitted by this instance. Defaults to `"LLMGuardrail"` (previously `type(self).__name__`).

Example:

python

guard = LLMGuardrail(
    llm=llm_service,
    prompt="Is this response off-topic?\n\n{content}\n\nAnswer YES or NO.",
    block_if="YES",
    action="modify",
    violation_message="I can't help with that. Let me redirect you.",
    system="Answer with YES or NO only.",
    max_tokens=5,
    temperature=0.0,
    guardrail_name="OffTopicGuard",
)
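
The judgment step described in the parameter table (fill `{content}`, call the LLM, look for `block_if` case-insensitively) can be sketched as below. This is a synchronous simplification under stated assumptions: `FakeLLM`, its one-argument `.complete()` signature, `Decision`, and `judge` are hypothetical stand-ins, and the real `LLMService` interface may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    """Stand-in for GuardrailDecision."""
    action: str
    violation: Optional[str] = None
    modified_content: Optional[str] = None


class FakeLLM:
    """Hypothetical stand-in that returns a canned answer."""

    def __init__(self, answer: str) -> None:
        self.answer = answer
        self.prompts: list[str] = []

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return self.answer


def judge(llm, prompt, content, block_if, violation_message, action="block"):
    if "{content}" not in prompt:
        raise ValueError("prompt must contain the {content} placeholder")
    # str.replace avoids str.format errors when the content contains braces.
    verdict = llm.complete(prompt.replace("{content}", content))
    if block_if.lower() in verdict.lower():
        if action == "modify":
            return Decision(action="modify", modified_content=violation_message)
        return Decision(action="block", violation=violation_message)
    return Decision(action="pass")


template = "Is this response off-topic?\n\n{content}\n\nAnswer YES or NO."
llm_yes = FakeLLM("Yes.")
triggered = judge(llm_yes, template, "Buy {crypto} now", "YES", "Let me redirect you.", "modify")
passed = judge(FakeLLM("NO"), template, "Here is a recipe.", "YES", "Let me redirect you.")
```

Because the trigger is a substring match, a short `block_if` such as `"NO"` would also match inside words like "know"; constraining the judge's answer with `system` and a small `max_tokens` helps keep the match unambiguous.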