🤖lauren-ai

Configuration

Frozen dataclasses that configure the LLM provider and agent behaviour.

LLMConfig

python
class LLMConfig(provider: Literal['anthropic', 'openai', 'ollama', 'litellm'], model: str, api_key: str | None = None, base_url: str | None = None, max_tokens: int = 4096, temperature: float = 1.0, timeout: float = 60.0, max_retries: int = 3, cache_system_prompt: bool = False, cache_tools: bool = False, embed_model: str | None = None, embed_dimensions: int | None = None)

Immutable configuration for an LLM provider connection.

Parameters:

| Name | Type | Description |
|---|---|---|
| `provider` | `Literal['anthropic', 'openai', 'ollama', 'litellm']` | The backend provider. One of "anthropic", "openai", "ollama", or "litellm". |
| `model` | `str` | The model identifier, e.g. "claude-opus-4-6" or "gpt-4o". |
| `api_key` | `str \| None` | Provider API key. When None, the provider-specific environment variable is used (e.g. ANTHROPIC_API_KEY). |
| `base_url` | `str \| None` | Override the provider's default base URL. Useful for proxies, self-hosted deployments, or Ollama. |
| `max_tokens` | `int` | Maximum tokens to generate per completion call. |
| `temperature` | `float` | Sampling temperature (0.0–2.0 for most providers). |
| `timeout` | `float` | HTTP request timeout in seconds. |
| `max_retries` | `int` | Maximum number of automatic retries on transient errors. |
| `cache_system_prompt` | `bool` | Enable Anthropic prompt caching for the system prompt. No-op on other providers. |
| `cache_tools` | `bool` | Enable Anthropic prompt caching for the tool definitions. No-op on other providers. |
| `embed_model` | `str \| None` | Model to use for embedding calls. Defaults to `model` when None. |
| `embed_dimensions` | `int \| None` | Desired embedding dimensionality. Passed to providers that support truncated embeddings. |
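Because the configuration is described as a frozen dataclass, a variant is normally derived by copying rather than by mutating. The sketch below uses a hypothetical stand-in, `LLMConfigSketch`, that mirrors only a subset of the documented fields (the library's import path is not given on this page). Assuming the real class is a standard `dataclasses` frozen dataclass, `dataclasses.replace` should behave the same way on it.

```python
from __future__ import annotations

import dataclasses
from dataclasses import dataclass
from typing import Literal


# Hypothetical stand-in mirroring a subset of the documented LLMConfig fields.
@dataclass(frozen=True)
class LLMConfigSketch:
    provider: Literal["anthropic", "openai", "ollama", "litellm"]
    model: str
    api_key: str | None = None
    max_tokens: int = 4096
    temperature: float = 1.0


base = LLMConfigSketch(provider="openai", model="gpt-4o")

# Frozen dataclasses reject attribute assignment.
try:
    base.temperature = 0.2  # type: ignore[misc]
    mutated = True
except dataclasses.FrozenInstanceError:
    mutated = False

# Derive a modified copy instead of mutating in place.
precise = dataclasses.replace(base, temperature=0.2, max_tokens=1024)
```

The original `base` object is left unchanged, so one config can be shared safely between agents while each derives its own variant.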

LLMConfig.for_anthropic

python
def for_anthropic(cls, model: str = 'claude-opus-4-6', api_key: str | None = None, **kwargs: Any) -> LLMConfig

Create a config pre-wired for Anthropic.

The API key is read from the ANTHROPIC_API_KEY environment variable when api_key is None.

Parameters:

| Name | Type | Description |
|---|---|---|
| `model` | `str` | Anthropic model identifier. Defaults to "claude-opus-4-6". |
| `api_key` | `str \| None` | Anthropic API key. Falls back to `os.environ["ANTHROPIC_API_KEY"]`. |
| `kwargs` | `Any` | Additional keyword arguments forwarded verbatim to the LLMConfig constructor. |

Returns: LLMConfig — A fully-initialised LLMConfig for Anthropic.
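The key-resolution rule described above (an explicit argument wins, otherwise the environment variable is consulted) can be sketched as follows. `resolve_api_key` is a hypothetical helper, not part of the library, and returning None when the variable is unset is an assumption: this page does not say what happens in that case.

```python
from __future__ import annotations

import os


def resolve_api_key(api_key: str | None, env_var: str) -> str | None:
    """Return the explicit key if given, else the value of env_var (or None)."""
    if api_key is not None:
        return api_key
    # Assumption: a missing variable yields None rather than raising.
    return os.environ.get(env_var)


os.environ["ANTHROPIC_API_KEY"] = "sk-from-env"  # placeholder value for the demo
explicit = resolve_api_key("sk-explicit", "ANTHROPIC_API_KEY")
fallback = resolve_api_key(None, "ANTHROPIC_API_KEY")
```

Here `explicit` holds the value passed in, while `fallback` holds the value read from the environment.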

LLMConfig.for_openai

python
def for_openai(cls, model: str = 'gpt-4o', api_key: str | None = None, **kwargs: Any) -> LLMConfig

Create a config pre-wired for OpenAI.

The API key is read from the OPENAI_API_KEY environment variable when api_key is None.

Parameters:

| Name | Type | Description |
|---|---|---|
| `model` | `str` | OpenAI model identifier. Defaults to "gpt-4o". |
| `api_key` | `str \| None` | OpenAI API key. Falls back to `os.environ["OPENAI_API_KEY"]`. |
| `kwargs` | `Any` | Additional keyword arguments forwarded verbatim to the LLMConfig constructor. |

Returns: LLMConfig — A fully-initialised LLMConfig for OpenAI.

LLMConfig.for_ollama

python
def for_ollama(cls, model: str = 'llama3.2', base_url: str = 'http://localhost:11434', **kwargs: Any) -> LLMConfig

Create a config pre-wired for a local Ollama server.

No API key is required. The default base_url points to a locally-running Ollama instance.

Parameters:

| Name | Type | Description |
|---|---|---|
| `model` | `str` | Ollama model tag, e.g. "llama3.2" or "mistral". Defaults to "llama3.2". |
| `base_url` | `str` | Ollama server URL. Defaults to "http://localhost:11434". |
| `kwargs` | `Any` | Additional keyword arguments forwarded verbatim to the LLMConfig constructor. |

Returns: LLMConfig — A fully-initialised LLMConfig for Ollama.
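The factory pattern used by these constructors (pin the provider, supply defaults, forward everything else) might look roughly like the sketch below. `LLMConfigSketch` is a hypothetical, reduced stand-in for the real class, and the host name `gpu-box` is made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class LLMConfigSketch:  # hypothetical stand-in, not the library class
    provider: str
    model: str
    api_key: str | None = None
    base_url: str | None = None
    timeout: float = 60.0

    @classmethod
    def for_ollama(
        cls,
        model: str = "llama3.2",
        base_url: str = "http://localhost:11434",
        **kwargs: Any,
    ) -> LLMConfigSketch:
        # Pin the provider, then forward remaining keyword arguments unchanged.
        return cls(provider="ollama", model=model, base_url=base_url, **kwargs)


# Defaults target a local server and need no API key.
local = LLMConfigSketch.for_ollama()

# Extra keyword arguments reach the constructor as-is.
remote = LLMConfigSketch.for_ollama(
    "mistral", base_url="http://gpu-box:11434", timeout=120.0
)
```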

LLMConfig.for_testing

python
def for_testing(cls) -> tuple[LLMConfig, MockTransport]

Create a test config paired with a MockTransport.

No network calls will ever be made. Queue deterministic responses on the returned MockTransport instance before running your code under test.

Returns: `tuple[LLMConfig, MockTransport]`, a 2-tuple of `(LLMConfig, MockTransport)`.

Example:

python
cfg, mock = LLMConfig.for_testing()
mock.queue_response(
    Completion(
        id="test-1",
        model="mock-model",
        content="Hello!",
        tool_calls=[],
        stop_reason="end_turn",
        usage=TokenUsage(input_tokens=10, output_tokens=5),
    )
)
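To illustrate the idea of queuing canned responses, here is a minimal sketch of a queue-backed mock. Everything in it is a hypothetical stand-in: `CompletionSketch` and `MockTransportSketch` are not the library's classes, `next_response` is an invented consumer method, and first-in-first-out delivery is an assumption, since this page documents only `queue_response`.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class CompletionSketch:  # hypothetical, reduced stand-in for Completion
    id: str
    content: str


class MockTransportSketch:  # hypothetical stand-in for MockTransport
    def __init__(self) -> None:
        self._queue: deque[CompletionSketch] = deque()

    def queue_response(self, completion: CompletionSketch) -> None:
        """Store a canned completion to be handed out later."""
        self._queue.append(completion)

    def next_response(self) -> CompletionSketch:
        """Hypothetical consumer: pop the oldest queued completion."""
        if not self._queue:
            raise RuntimeError("no queued responses left")
        return self._queue.popleft()


mock = MockTransportSketch()
mock.queue_response(CompletionSketch(id="test-1", content="Hello!"))
mock.queue_response(CompletionSketch(id="test-2", content="Goodbye!"))

first = mock.next_response()
second = mock.next_response()
```

Raising on an empty queue, as this sketch does, makes a test fail loudly when the code under test issues more calls than expected.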

AgentConfig

python
class AgentConfig(system_prompt: str = 'You are a helpful assistant.', max_turns: int = 10, max_tokens_per_turn: int = 4096, temperature: float = 1.0, memory_window_tokens: int = 40000, max_cost_usd: float | None = None, parallel_tool_calls: bool = False, tool_error_policy: Literal['raise', 'return_error', 'skip'] = 'return_error', thinking: bool = False, thinking_budget_tokens: int = 8000, reasoning_effort: Literal['low', 'medium', 'high'] | None = None, include_reasoning_in_response: bool = False, summarize_at: float | None = None, summary_model: str | None = None)

Immutable configuration for an agent's runtime behaviour.

Parameters:

| Name | Type | Description |
|---|---|---|
| `system_prompt` | `str` | The system prompt sent to the LLM at the start of every turn. |
| `max_turns` | `int` | Maximum number of agentic loop iterations before AgentMaxTurnsError is raised. |
| `max_tokens_per_turn` | `int` | Maximum output tokens requested per turn. |
| `temperature` | `float` | Sampling temperature for this agent. Overrides the LLMConfig temperature when set. |
| `memory_window_tokens` | `int` | Sliding-window size in tokens for conversation history passed to the model. |
| `max_cost_usd` | `float \| None` | Hard cost budget in USD. The runner checks after each turn and raises AgentBudgetExceededError when exceeded. None means unlimited. |
| `parallel_tool_calls` | `bool` | When True, all tool calls in a single model turn are executed concurrently. Defaults to False to preserve deterministic ordering guarantees. |
| `tool_error_policy` | `Literal['raise', 'return_error', 'skip']` | How to handle a tool execution error: "raise" re-raises the exception immediately; "return_error" sends the error message back to the model as a tool result so it can decide how to proceed; "skip" silently omits the failing tool result. |
| `thinking` | `bool` | Enable extended thinking (Anthropic only). |
| `thinking_budget_tokens` | `int` | Token budget for the thinking phase when thinking=True. |
| `reasoning_effort` | `Literal['low', 'medium', 'high'] \| None` | OpenAI reasoning effort for o1/o3 models ("low", "medium", or "high"). None means the provider default. |
| `include_reasoning_in_response` | `bool` | When True, thinking blocks are included in the Completion response. |
| `summarize_at` | `float \| None` | Fraction of memory_window_tokens at which the runner triggers an automatic context-window summarisation. For example, 0.8 means "summarise when 80% of the token budget is consumed". Older turns are compressed into a single block that is prepended to the system prompt so it is always preserved. None (the default) disables summarisation, in which case older messages are silently dropped. |
| `summary_model` | `str \| None` | Model identifier to use for the summarisation LLM call. Set this to a cheaper or faster model (e.g. "claude-haiku-4-5") to reduce cost while keeping the agent's main model for reasoning. None (the default) reuses the same model as the agent. |
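The three `tool_error_policy` values and the `summarize_at` threshold can be pictured with the sketch below. It is not the runner's actual code: `run_tool` and `should_summarize` are hypothetical helpers, the error-message format is invented, and treating the threshold as inclusive (`>=`) is an assumption.

```python
from __future__ import annotations

from typing import Callable, Literal

ToolErrorPolicy = Literal["raise", "return_error", "skip"]


def run_tool(tool: Callable[[], str], policy: ToolErrorPolicy) -> str | None:
    """Run a tool and apply the policy; None means 'omit this tool result'."""
    try:
        return tool()
    except Exception as exc:
        if policy == "raise":
            raise  # propagate the original exception
        if policy == "return_error":
            return f"Tool error: {exc}"  # hypothetical message format
        return None  # "skip": drop the failing result


def should_summarize(
    used_tokens: int, memory_window_tokens: int, summarize_at: float | None
) -> bool:
    """True once used_tokens reaches the summarize_at fraction of the window."""
    if summarize_at is None:
        return False  # summarisation disabled
    return used_tokens >= summarize_at * memory_window_tokens


def failing_tool() -> str:
    raise ValueError("disk full")


as_message = run_tool(failing_tool, "return_error")
skipped = run_tool(failing_tool, "skip")
```

With the default `memory_window_tokens` of 40000 and `summarize_at=0.8`, the threshold works out to 32000 tokens.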