AgentUQ Docs
Runtime reliability gates for LLM agents
AgentUQ turns provider-native token logprobs into localized runtime actions for agent steps. It does not claim to know whether an output is true. It tells you where a generation looked brittle or ambiguous and whether the workflow should continue, annotate the trace, regenerate a risky span, retry the step, dry-run verify, ask for confirmation, or block execution.
The documentation on this site is the web-first presentation layer. Canonical docs stay in this repo under docs/, while the Docusaurus app lives in website/.
AgentUQ is open source under the MIT License.
Localized risk, not one opaque score. Catch brittle SQL clauses, tool arguments, selectors, URLs, paths, shell flags, and JSON leaves before execution.
Probability honesty over fake certainty. Use canonical scoring only for genuinely greedy runs, and realized scoring for the path that was actually sampled.
Cheap first-pass gating. Put a lightweight reliability layer in front of slower systems such as retrieval, semantic verification, sandboxing, or human review.
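The gating idea above can be sketched in a few lines. This is a minimal illustration, not AgentUQ's API: the helper names (cheap_check, expensive_verify, gate) and the scores are hypothetical stand-ins for the logprob-based first pass and the slower downstream verifier.

```python
# Hypothetical sketch of cheap first-pass gating: run the lightweight
# check first, and invoke the expensive verifier only when it fails.
calls = {"expensive": 0}

def cheap_check(score: float) -> bool:
    # Stand-in for a fast logprob-based reliability check.
    return score >= 0.9

def expensive_verify(score: float) -> bool:
    # Stand-in for a slow path: retrieval, sandboxing, or human review.
    calls["expensive"] += 1
    return score >= 0.5

def gate(score: float) -> bool:
    # Short-circuit: the expensive path runs only for risky outputs.
    return cheap_check(score) or expensive_verify(score)

results = [gate(s) for s in (0.95, 0.7, 0.3)]
# Only the two low-confidence outputs reached the expensive verifier.
```

The value of the layering is the short-circuit: confident generations never pay the cost of the slow system.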
Where to start
Install and orient. Set up the package, understand the repo layout, and internalize the baseline runtime contract quickly.
Run the minimal loop. Capture a model response, analyze it, and branch on result.decision.action.
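A minimal sketch of that loop, assuming a simplified stand-in analyzer: the analyze function, its threshold, and the Result/Decision dataclasses here are illustrative, not AgentUQ's real API. Only the result.decision.action shape comes from the docs above.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Decision:
    action: str

@dataclass
class Result:
    decision: Decision
    risky_spans: list = field(default_factory=list)

def analyze(tokens, logprobs, threshold=0.5):
    # Stand-in analyzer: flag tokens whose realized probability is low,
    # then escalate the decision based on how much of the output is risky.
    risky = [t for t, lp in zip(tokens, logprobs) if math.exp(lp) < threshold]
    if not risky:
        return Result(Decision("continue"))
    if len(risky) <= 2:
        return Result(Decision("regenerate_segment"), risky)
    return Result(Decision("retry_step"), risky)

# Capture a response (tokens + logprobs), analyze, branch on the action.
result = analyze(["DELETE", "FROM", "users"], [-0.05, -0.02, -1.4])
if result.decision.action == "continue":
    print("executing")
else:
    print(result.decision.action, result.risky_spans)
```

Here the low-probability final token trips the threshold, so the loop routes to a localized repair instead of executing.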
Pick your surface. Go straight to OpenAI, OpenRouter, LiteLLM, Gemini, Fireworks, Together, LangChain, LangGraph, or OpenAI Agents. OpenAI Responses is stable; the other integration surfaces are in preview.
What decisions AgentUQ can trigger
continue: proceed normally.
continue_with_annotation: proceed, but attach the result to logs, traces, or monitoring.
regenerate_segment: retry only the risky leaf or clause when structured repair is available.
retry_step / retry_step_with_constraints: rerun the whole model step.
dry_run_verify: run a safe validator before execution.
ask_user_confirmation: pause before a side effect.
block_execution: fail closed before anything external happens.
custom, emit_webhook, and escalate_to_human: dispatch through your own policy layer.
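One common way to consume these actions is a dispatch table in your own policy layer. The handler names and return values below are illustrative assumptions, not part of AgentUQ; the action strings are the ones listed above.

```python
# Hypothetical routing table: map each decision action to a handler in
# your policy layer; unknown actions fall back to human escalation.
HANDLERS = {
    "continue": lambda result: "execute",
    "continue_with_annotation": lambda result: "execute+annotate",
    "dry_run_verify": lambda result: "validate",
    "ask_user_confirmation": lambda result: "prompt",
    "block_execution": lambda result: "abort",
}

def route(action, result=None):
    # Fail safe: anything unrecognized goes to a human.
    handler = HANDLERS.get(action, lambda result: "escalate")
    return handler(result)

print(route("block_execution"))  # prints "abort"
```

Keeping the table explicit makes the fail-closed paths (block_execution, escalation) easy to audit.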
For concrete routing patterns, see Acting on decisions. For the statistical framing, read Research grounding.
Documentation map
The rendered docs optimize for scanning, navigation, and professionalism on the web. The underlying Markdown files in the repo remain canonical, auditable, and versionable.