AgentUQ Docs
Runtime reliability gates for LLM agents
AgentUQ turns provider-native token logprobs into localized runtime actions for agent steps. It does not claim to know whether an output is true. It tells you where a generation looked brittle or ambiguous and whether the workflow should continue, annotate the trace, regenerate a risky span, retry the step, dry-run verify, ask for confirmation, or block execution.
The documentation on this site is the web-first presentation layer. Canonical docs stay in this repo under docs/, while the Docusaurus app lives in website/.
AgentUQ is open source under the MIT License.
Localized risk, not one opaque score. Catch brittle SQL clauses, tool arguments, selectors, URLs, paths, shell flags, and JSON leaves before execution.
Probability honesty over fake certainty. Use canonical scoring only for genuinely greedy runs, and realized scoring for the path that was actually sampled.
Cheap first-pass gating. Put a lightweight reliability layer in front of slower systems such as retrieval, semantic verification, sandboxing, or human review.
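The gating idea above can be sketched in a few lines. This is a minimal illustration, not AgentUQ's API: the helper names (cheap_check, expensive_verify, gate) and the scores are hypothetical stand-ins for the logprob-based first pass and the slower downstream verifier.

```python
# Hypothetical sketch of cheap first-pass gating: run the lightweight
# check first, and invoke the expensive verifier only when it fails.
calls = {"expensive": 0}

def cheap_check(score: float) -> bool:
    # Stand-in for a fast logprob-based reliability check.
    return score >= 0.9

def expensive_verify(score: float) -> bool:
    # Stand-in for a slow path: retrieval, sandboxing, or human review.
    calls["expensive"] += 1
    return score >= 0.5

def gate(score: float) -> bool:
    # Short-circuit: the expensive path runs only for risky outputs.
    return cheap_check(score) or expensive_verify(score)

results = [gate(s) for s in (0.95, 0.7, 0.3)]
# Only the two low-confidence outputs reached the expensive verifier.
```

The value of the layering is the short-circuit: confident generations never pay the cost of the slow system.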
Where to start
Install and orient. Set up the package, understand the repo layout, and internalize the baseline runtime contract quickly.
Run the minimal loop. Capture a model response, analyze it, and branch on result.decision.action.
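A minimal sketch of that loop, assuming a simplified stand-in analyzer: the analyze function, its threshold, and the Result/Decision dataclasses here are illustrative, not AgentUQ's real API. Only the result.decision.action shape comes from the docs above.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Decision:
    action: str

@dataclass
class Result:
    decision: Decision
    risky_spans: list = field(default_factory=list)

def analyze(tokens, logprobs, threshold=0.5):
    # Stand-in analyzer: flag tokens whose realized probability is low,
    # then escalate the decision based on how much of the output is risky.
    risky = [t for t, lp in zip(tokens, logprobs) if math.exp(lp) < threshold]
    if not risky:
        return Result(Decision("continue"))
    if len(risky) <= 2:
        return Result(Decision("regenerate_segment"), risky)
    return Result(Decision("retry_step"), risky)

# Capture a response (tokens + logprobs), analyze, branch on the action.
result = analyze(["DELETE", "FROM", "users"], [-0.05, -0.02, -1.4])
if result.decision.action == "continue":
    print("executing")
else:
    print(result.decision.action, result.risky_spans)
```

Here the low-probability final token trips the threshold, so the loop routes to a localized repair instead of executing.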
Pick your surface. Go straight to OpenAI, OpenRouter, LiteLLM, Gemini, Fireworks, Together, LangChain, LangGraph, or OpenAI Agents. OpenAI Responses is stable; the other integration surfaces are in preview.
What decisions AgentUQ can trigger
continue: proceed normally.
continue_with_annotation: proceed, but attach the result to logs, traces, or monitoring.
regenerate_segment: retry only the risky leaf or clause when structured repair is available.
retry_step / retry_step_with_constraints: rerun the whole model step.
dry_run_verify: run a safe validator before execution.
ask_user_confirmation: pause before a side effect.
block_execution: fail closed before anything external happens.
custom, emit_webhook, and escalate_to_human: dispatch through your own policy layer.
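One common way to consume these actions is a dispatch table in your own policy layer. The handler names and return values below are illustrative assumptions, not part of AgentUQ; the action strings are the ones listed above.

```python
# Hypothetical routing table: map each decision action to a handler in
# your policy layer; unknown actions fall back to human escalation.
HANDLERS = {
    "continue": lambda result: "execute",
    "continue_with_annotation": lambda result: "execute+annotate",
    "dry_run_verify": lambda result: "validate",
    "ask_user_confirmation": lambda result: "prompt",
    "block_execution": lambda result: "abort",
}

def route(action, result=None):
    # Fail safe: anything unrecognized goes to a human.
    handler = HANDLERS.get(action, lambda result: "escalate")
    return handler(result)

print(route("block_execution"))  # prints "abort"
```

Keeping the table explicit makes the fail-closed paths (block_execution, escalation) easy to audit.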
For concrete routing patterns, see Acting on decisions. For the statistical framing, read Research grounding.
Documentation map
The rendered docs optimize for scanning, navigation, and professionalism on the web. The underlying Markdown files in the repo remain canonical, auditable, and versionable.