Rate limits

Mneva applies a soft per-tenant cap instead of rejecting requests outright: the brain stays open, and the agent is told when it is pushing the limits.

Soft cap

The reserve calculation in the state tool is anchored to 60 requests per minute per tenant. That is the soft cap. At 60 we do not return HTTP 429 (Too Many Requests); we move the allostatic state to overloaded and let the agent decide whether to slow down.

In practice nothing prevents you from going above 60 sustained. We log heavy use; we may follow up with high-volume customers about scale plans or self-hosting.
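If you want to know on the client side when you are near the soft cap, a sliding one-minute counter is enough. The sketch below is a hypothetical helper, not part of Mneva's API: the class name, the sliding-window method, and the choice to flag at exactly 60 are all assumptions, and the server may count differently.

```typescript
// Hypothetical client-side helper; Mneva does not ship or require this.
const SOFT_CAP_PER_MINUTE = 60; // soft cap stated in this section
const WINDOW_MS = 60_000;

class RequestWindow {
  private timestamps: number[] = [];

  // Record one request sent at `nowMs` and return the count in the last minute.
  record(nowMs: number): number {
    this.timestamps.push(nowMs);
    return this.countAt(nowMs);
  }

  // Count requests newer than one minute before `nowMs`, dropping older ones.
  countAt(nowMs: number): number {
    const cutoff = nowMs - WINDOW_MS;
    this.timestamps = this.timestamps.filter((t) => t > cutoff);
    return this.timestamps.length;
  }

  // Assumption: treat "60 or more in the last minute" as being at the soft cap.
  atSoftCap(nowMs: number): boolean {
    return this.countAt(nowMs) >= SOFT_CAP_PER_MINUTE;
  }
}
```

Call `record(Date.now())` before each tool call and check `atSoftCap(Date.now())` when deciding whether to start a large batch. The authoritative signal is still the state tool; this only gives a local estimate.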

The harder cap is per-endpoint, not per-tenant: any single tool call that runs longer than 30 seconds times out at the engine layer and returns an error. This is a circuit breaker for stuck calls, not a throttle.

Allostatic feedback

The state tool (state) is the canonical place to see your current load. The intended pattern:

The agent calls state() periodically (at the start of a session, before a heavy operation).
If state === "overloaded", the agent batches less, verifies more, and defers low-priority work.
If reserve approaches zero, the agent either steps back or accepts being rate-pressured.

Mneva does not enforce this — it surfaces the signal and the agent has the autonomy to respect it. A power user can blow through the soft cap; that's a choice, not a bug.
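The pattern above can be sketched as a small planning function on the agent side. Everything here is illustrative: the `StateSnapshot` shape, the idea that reserve is a number between 0 and 1, the 0.1 threshold, and the `Task` and `Plan` types are assumptions made for the example, not the documented output of the state tool.

```typescript
// Assumed shape of the state tool's result; field names are hypothetical.
interface StateSnapshot {
  state: string;   // for example "overloaded"
  reserve: number; // assumed range 0..1, where 0 means no headroom left
}

interface Task {
  name: string;
  priority: "high" | "low";
}

interface Plan {
  batchSize: number;
  verifyEachStep: boolean;
  runNow: Task[];
  deferred: Task[];
}

// Arbitrary threshold for "reserve approaches zero" (an assumption).
const LOW_RESERVE = 0.1;

function planWork(snapshot: StateSnapshot, tasks: Task[], normalBatchSize = 10): Plan {
  const pressured = snapshot.state === "overloaded" || snapshot.reserve <= LOW_RESERVE;
  if (!pressured) {
    // No pressure signal: run everything at the normal batch size.
    return { batchSize: normalBatchSize, verifyEachStep: false, runNow: tasks, deferred: [] };
  }
  // Under pressure: batch less, verify more, defer low-priority work.
  return {
    batchSize: Math.max(1, Math.floor(normalBatchSize / 2)),
    verifyEachStep: true,
    runNow: tasks.filter((t) => t.priority === "high"),
    deferred: tasks.filter((t) => t.priority === "low"),
  };
}
```

Because the signal is advisory, an agent that ignores the plan still works. The function only makes the choice to slow down explicit.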

Wander cost

Nightly wander runs at most once per tenant per day. The cost is bounded:

~1 Anthropic Haiku call per tenant per night (~$0.0005)
~1 local Ollama embedding per night (free)
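For a back-of-envelope estimate, the bound scales linearly with tenants and nights. The function below is a hypothetical helper that uses only the approximate per-night figure listed above and treats the local embedding as free.

```typescript
// Approximate cost of one Haiku call per tenant per night, from the list above.
const HAIKU_COST_PER_NIGHT_USD = 0.0005;

// Upper-bound estimate: at most one wander run per tenant per night.
function wanderCostUsd(tenants: number, nights: number): number {
  return tenants * nights * HAIKU_COST_PER_NIGHT_USD;
}
```

Under these figures, 1,000 tenants over 30 nights comes to roughly $15.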

There is no charge per tool call. Pricing is per-tier (see Plans).

If you self-host the engine (see Self-hosting), you choose your own LLM cost model: set OLLAMA_BASE and ANTHROPIC_API_KEY to whatever you like.
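A self-hosted deployment might check both variables before starting. This is a minimal sketch, assuming the two variables are read from the process environment and that both are required. The function name and the `LlmConfig` type are hypothetical, and the engine's real startup behavior may differ.

```typescript
// Hypothetical preflight check; not part of the engine.
interface LlmConfig {
  ollamaBase: string;
  anthropicApiKey: string;
}

function readLlmConfig(env: Record<string, string | undefined>): LlmConfig {
  const ollamaBase = env["OLLAMA_BASE"];
  const anthropicApiKey = env["ANTHROPIC_API_KEY"];

  // Collect every missing name so the error reports them all at once.
  const missing: string[] = [];
  if (!ollamaBase) missing.push("OLLAMA_BASE");
  if (!anthropicApiKey) missing.push("ANTHROPIC_API_KEY");

  if (!ollamaBase || !anthropicApiKey) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return { ollamaBase, anthropicApiKey };
}
```

In Node.js you would pass `process.env` as the argument.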