🤖lauren-ai

Cost & Rate Tracking

Token budgets, cost estimation, and rate limiting.

Pricing

ModelPricing

Per-token pricing for a specific model.

All per_1k fields express USD cost per 1,000 tokens. The per_m aliases expose the same values scaled to per-million tokens for backward compatibility.

Parameters:

| Name | Type | Description |
| --- | --- | --- |
| input_per_1k | float | USD cost per 1,000 input tokens. |
| output_per_1k | float | USD cost per 1,000 output tokens. |
| cache_read_per_1k | float | USD cost per 1,000 prompt-cache read tokens. |
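
The per_1k fields and the per_m aliases differ only by a factor of 1,000. The sketch below illustrates that conversion with a hypothetical stand-in class (PricingSketch is not part of lauren-ai, and the real ModelPricing may implement its aliases differently).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PricingSketch:
    """Hypothetical stand-in for ModelPricing; not the library's class."""

    input_per_1k: float
    output_per_1k: float

    @property
    def input_per_m(self) -> float:
        # 1,000,000 tokens is 1,000 blocks of 1,000 tokens.
        return self.input_per_1k * 1000

    @property
    def output_per_m(self) -> float:
        return self.output_per_1k * 1000


# $0.80 per million input tokens corresponds to $0.0008 per 1,000 tokens.
pricing = PricingSketch(input_per_1k=0.0008, output_per_1k=0.004)
```

Reading the prices back through the per-million properties gives approximately 0.80 and 4.00, up to floating-point rounding.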

CostEstimate

USD cost breakdown for a set of token usages.

Parameters:

| Name | Type | Description |
| --- | --- | --- |
| input_usd | float | USD cost for input tokens. |
| output_usd | float | USD cost for output tokens. |
| cache_read_usd | float | USD cost for prompt-cache read tokens. |
| cache_write_usd | float | USD cost for prompt-cache write tokens. |
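
As a worked example of how such a breakdown could be derived, the sketch below multiplies token counts by per-1,000-token prices. EstimateSketch and estimate_cost are hypothetical names used for illustration, and treating total_usd as the plain sum of the four fields is an assumption rather than documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EstimateSketch:
    """Hypothetical mirror of the CostEstimate fields; not the library's class."""

    input_usd: float
    output_usd: float
    cache_read_usd: float = 0.0
    cache_write_usd: float = 0.0

    @property
    def total_usd(self) -> float:
        # Assumption: the total is the sum of the four components.
        return (
            self.input_usd
            + self.output_usd
            + self.cache_read_usd
            + self.cache_write_usd
        )


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_per_1k: float,
    output_per_1k: float,
) -> EstimateSketch:
    # cost = (tokens / 1,000) * price per 1,000 tokens
    return EstimateSketch(
        input_usd=input_tokens / 1000 * input_per_1k,
        output_usd=output_tokens / 1000 * output_per_1k,
    )


# 12,000 input tokens and 3,000 output tokens at the prices used above.
estimate = estimate_cost(12_000, 3_000, input_per_1k=0.0008, output_per_1k=0.004)
```

With these inputs the input share is about $0.0096 and the output share about $0.012, for a total of roughly $0.0216.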

PricingTable

Mapping of model name to ModelPricing for cost estimation.

Usage:

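```python
# Prices are passed through the per-million aliases (USD per 1,000,000 tokens).
table = PricingTable(models={
    "claude-haiku-4-5": ModelPricing(input_per_m=0.80, output_per_m=4.00),
})
# `usage` is the token usage to be priced; it is not defined in this snippet.
estimate = table.estimate("claude-haiku-4-5", usage)
```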

Parameters:

| Name | Type | Description |
| --- | --- | --- |
| models | dict[str, ModelPricing] \| None | Mapping of model identifier to ModelPricing. |

default_pricing_table

Return the built-in pricing table with current model prices.

Cost tracker

CostTracker

Injectable service that accumulates token usage from ModelCallComplete signals.

Usage:

```python
# Register in module
@module(providers=[use_class(CostTracker, scope=Scope.SINGLETON)])
class AppModule: ...

# In a controller
async with self.cost.session(conversation_id=cid, user_id=uid) as session:
    result = await self.runner.run(agent, message)
    print(f"Cost: ${session.total_estimate.total_usd:.6f}")
```

CostSession

Object yielded by the CostTracker.session() context manager.

CostReport

Aggregated cost report for a user or conversation.
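
To make the session pattern concrete, the sketch below shows one way an async context manager can accumulate usage per conversation. TrackerSketch, SessionSketch, and the record method are hypothetical; in lauren-ai the accumulation is driven by ModelCallComplete signals, which this example replaces with explicit calls.

```python
import asyncio
from contextlib import asynccontextmanager
from dataclasses import dataclass


@dataclass
class SessionSketch:
    """Hypothetical per-session accumulator; not the library's CostSession."""

    conversation_id: str
    user_id: str
    input_tokens: int = 0
    output_tokens: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        # Stand-in for handling one ModelCallComplete signal.
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens


class TrackerSketch:
    """Hypothetical tracker that keeps every finished session."""

    def __init__(self) -> None:
        self.sessions = []

    @asynccontextmanager
    async def session(self, conversation_id: str, user_id: str):
        current = SessionSketch(conversation_id, user_id)
        try:
            yield current
        finally:
            # Keep the totals even if the wrapped call raised.
            self.sessions.append(current)


async def demo() -> SessionSketch:
    tracker = TrackerSketch()
    async with tracker.session(conversation_id="c1", user_id="u1") as session:
        session.record(input_tokens=1_200, output_tokens=300)
        session.record(input_tokens=800, output_tokens=150)
    return session


finished = asyncio.run(demo())
```

Storing the session in a finally block means a failed model call still leaves its partial usage available for reporting, which is one reasonable design for cost accounting.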

Budgets & limits

TokenBudget

Per-conversation and per-user token/cost budget limits.

The budget is checked before each LLM call. If the estimated usage of the next call would push the conversation or user past the limit, BudgetExceededError is raised and the call is not made.

Usage:

```python
budget = TokenBudget(
    max_tokens_per_conversation=50_000,
    max_usd_per_conversation=0.50,
)
config = LLMConfig(..., budget=budget)
```

BudgetExceededError

Raised before an LLM call that would exceed the configured budget.

Parameters:

| Name | Type | Description |
| --- | --- | --- |
| message | str | Human-readable description of the exceeded limit. |
| limit_type | str | Category of limit (e.g. "tokens_per_conversation"). |
| limit | float | The configured budget ceiling. |
| current | float | The actual usage at the point the budget was exceeded. Also available as the attribute used, for API compatibility. |
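
The pre-call check can be pictured roughly as follows. This is a minimal sketch under stated assumptions: BudgetErrorSketch and check_budget are hypothetical, and how lauren-ai actually estimates the size of the next call is not described here.

```python
class BudgetErrorSketch(Exception):
    """Hypothetical error carrying the fields documented above."""

    def __init__(
        self, message: str, limit_type: str, limit: float, current: float
    ) -> None:
        super().__init__(message)
        self.limit_type = limit_type
        self.limit = limit
        self.current = current


def check_budget(used_tokens: int, next_call_estimate: int, max_tokens: int) -> None:
    # Runs before the call: refuse if the projected total passes the ceiling.
    projected = used_tokens + next_call_estimate
    if projected > max_tokens:
        raise BudgetErrorSketch(
            f"projected {projected} tokens exceeds limit of {max_tokens}",
            limit_type="tokens_per_conversation",
            limit=max_tokens,
            current=used_tokens,
        )


# Within budget: 45,000 projected against a 50,000 ceiling, so nothing is raised.
check_budget(used_tokens=40_000, next_call_estimate=5_000, max_tokens=50_000)

caught = None
try:
    # Over budget: 53,000 projected against a 50,000 ceiling.
    check_budget(used_tokens=48_000, next_call_estimate=5_000, max_tokens=50_000)
except BudgetErrorSketch as exc:
    caught = exc
```

Catching the error at the controller level lets an application return a clear "budget reached" response instead of issuing a call it cannot afford.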

RateLimiter

Token-bucket rate limiter with automatic retry on HTTP 429.

Usage:

```python
config = LLMConfig(
    model="claude-haiku-4-5",
    rate_limiter=RateLimiter(
        requests_per_minute=60,
        tokens_per_minute=100_000,
        max_retries=5,
    ),
)
```
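
For readers unfamiliar with the token-bucket technique, here is a minimal sketch of the idea: a bucket holds up to one minute's allowance, refills continuously, and refuses work when empty. TokenBucketSketch and its injectable clock are hypothetical and do not reflect RateLimiter's internals.

```python
import time


class TokenBucketSketch:
    """Hypothetical token bucket that refills continuously over one minute."""

    def __init__(self, per_minute: float, clock=time.monotonic) -> None:
        self.capacity = per_minute
        self.refill_per_second = per_minute / 60.0
        self.available = per_minute  # start with a full bucket
        self.clock = clock
        self.last_refill = clock()

    def try_acquire(self, amount: float = 1.0) -> bool:
        # Add whatever has refilled since the last call, capped at capacity.
        now = self.clock()
        elapsed = now - self.last_refill
        self.last_refill = now
        self.available = min(
            self.capacity, self.available + elapsed * self.refill_per_second
        )
        if amount <= self.available:
            self.available -= amount
            return True
        return False


# A fake clock keeps the demonstration deterministic.
fake_now = [0.0]
bucket = TokenBucketSketch(per_minute=60, clock=lambda: fake_now[0])

burst = [bucket.try_acquire() for _ in range(61)]  # the 61st request is refused
fake_now[0] += 1.0  # one second later, one request's worth has refilled
after_wait = bucket.try_acquire()
```

The same structure works for a tokens-per-minute limit by acquiring the estimated token count of a request instead of 1.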

RateLimitExhaustedError

Raised when the rate limiter's max_retries is exhausted.

Parameters:

| Name | Type | Description |
| --- | --- | --- |
| message | str | Human-readable description of the exhaustion. |
| limit | int | The configured requests-per-minute limit (0 when no per-minute limit is configured). |
| retry_after | float | Suggested number of seconds to wait before retrying, if known (0.0 otherwise). |
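
The retry-until-exhausted behavior might look roughly like the loop below. All names here (RateLimitedSketch, ExhaustedSketch, call_with_retry) are hypothetical, and the exponential backoff used when no retry hint is available is an assumption, not documented lauren-ai behavior.

```python
import asyncio


class RateLimitedSketch(Exception):
    """Hypothetical stand-in for an HTTP 429 response from the provider."""

    def __init__(self, retry_after: float = 0.0) -> None:
        super().__init__("HTTP 429")
        self.retry_after = retry_after


class ExhaustedSketch(Exception):
    """Hypothetical error mirroring the fields documented above."""

    def __init__(self, message: str, limit: int = 0, retry_after: float = 0.0) -> None:
        super().__init__(message)
        self.limit = limit
        self.retry_after = retry_after


async def call_with_retry(call, max_retries: int, limit: int = 0):
    retry_after = 0.0
    for attempt in range(max_retries + 1):
        try:
            return await call()
        except RateLimitedSketch as exc:
            retry_after = exc.retry_after
            if attempt < max_retries:
                # Honor the server hint if present, else back off exponentially.
                await asyncio.sleep(retry_after or 0.01 * 2 ** attempt)
    raise ExhaustedSketch(f"gave up after {max_retries} retries", limit, retry_after)


attempts = {"count": 0}


async def flaky():
    # Fails twice with a 429, then succeeds.
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise RateLimitedSketch()
    return "ok"


async def always_limited():
    raise RateLimitedSketch()


async def demo():
    result = await call_with_retry(flaky, max_retries=5)
    try:
        await call_with_retry(always_limited, max_retries=1, limit=60)
    except ExhaustedSketch as exc:
        return result, exc
    return result, None


result, error = asyncio.run(demo())
```

Callers can use the retry_after value on the final error to schedule a later attempt rather than retrying immediately.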