Plan API
The plan endpoint provides multi-model orchestrated reasoning. It coordinates multiple LLM calls using one of several strategies, with the aim of producing higher-quality output than a single model call.
Endpoint: POST /v1/plan
Request Format
{
  "request": {
    "messages": [
      {"role": "user", "content": "Design a REST API for a task management app"}
    ]
  },
  "orchestration": {
    "mode": "adversarial",
    "iterations": 2,
    "primary_model_id": "claude-opus",
    "review_model_id": "gpt-4",
    "primary_min_weight": 5,
    "review_min_weight": 8,
    "return_plan_only": false,
    "output_schema": "{\"type\":\"object\"}"
  }
}
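As a rough illustration, the request body above could be assembled in Python along these lines. The helper name build_plan_request is hypothetical and not part of any TokenHub client; only the field names are taken from the example. Note that output_schema is a string, so the schema is JSON-encoded before it is embedded.

```python
import json

def build_plan_request(prompt, mode="adversarial", iterations=2, output_schema=None):
    """Assemble a POST /v1/plan body (hypothetical helper, not an official client)."""
    orchestration = {"mode": mode, "iterations": iterations}
    if output_schema is not None:
        # output_schema is documented as a string, so encode the schema dict as JSON text.
        orchestration["output_schema"] = json.dumps(output_schema, separators=(",", ":"))
    return {
        "request": {"messages": [{"role": "user", "content": prompt}]},
        "orchestration": orchestration,
    }

body = build_plan_request(
    "Design a REST API for a task management app",
    output_schema={"type": "object"},
)
# Serialize and send as the POST body with whichever HTTP client you use.
payload = json.dumps(body)
```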
Orchestration Modes
Adversarial
A three-phase plan-critique-refine loop:
- Plan: Primary model generates an initial plan
- Critique: Review model analyzes the plan and provides feedback
- Refine: Primary model improves the plan based on the critique
The critique-refine cycle repeats for the configured number of iterations.
{
  "orchestration": {
    "mode": "adversarial",
    "iterations": 2
  }
}
Response:
{
  "negotiated_model": "claude-opus",
  "estimated_cost_usd": 0.15,
  "routing_reason": "adversarial-orchestration",
  "response": {
    "initial_plan": "Here is the initial API design...",
    "critique": "The design has these issues: ...",
    "refined_plan": "Here is the improved design addressing the feedback..."
  }
}
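The control flow of the plan-critique-refine loop can be sketched as below. This is a conceptual model only, not the server's implementation: the primary and reviewer callables are stand-ins for LLM calls, the prompt wording is invented, and returning the last critique in the result is an assumption.

```python
def adversarial(prompt, primary, reviewer, iterations=2):
    """Plan once, then repeat critique -> refine `iterations` times."""
    initial_plan = primary(f"Plan: {prompt}")
    plan, critique = initial_plan, None
    for _ in range(iterations):
        critique = reviewer(f"Critique this plan:\n{plan}")
        plan = primary(f"Refine the plan.\nPlan:\n{plan}\nCritique:\n{critique}")
    # Assumption: the response reports the most recent critique.
    return {"initial_plan": initial_plan, "critique": critique, "refined_plan": plan}

# Stub models that only record which role was called, standing in for real LLM calls.
calls = []

def stub(name):
    def model(text):
        calls.append(name)
        return f"{name} output {len(calls)}"
    return model

result = adversarial("Design a REST API", stub("primary"), stub("review"), iterations=2)
```

With two iterations the stubs are invoked five times in total (one plan, then two critique and refine pairs), which matches the call count listed under Cost Considerations.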
Vote
Multiple models respond independently, and a judge model then selects the best response:
- N models (voters) each generate a response to the same prompt
- A judge model reviews all responses and selects the best one
{
  "orchestration": {
    "mode": "vote"
  }
}
Response:
{
  "negotiated_model": "gpt-4",
  "estimated_cost_usd": 0.08,
  "routing_reason": "vote-orchestration",
  "response": {
    "responses": [
      {"model": "gpt-4", "content": "Response A...", "selected": true},
      {"model": "claude-sonnet", "content": "Response B...", "selected": false},
      {"model": "gpt-3.5-turbo", "content": "Response C...", "selected": false}
    ],
    "selected": 0,
    "judge": "claude-opus"
  }
}
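A client will usually want only the winning entry. The sketch below assumes that the top-level selected field is a zero-based index into responses, which is consistent with the example above (index 0 is the entry flagged as selected) but is not stated explicitly. The function name is hypothetical.

```python
def selected_response(plan_result):
    """Return the winning entry from a vote-mode response body."""
    vote = plan_result["response"]
    # Assumption: "selected" is a zero-based index into "responses".
    return vote["responses"][vote["selected"]]

example = {
    "negotiated_model": "gpt-4",
    "response": {
        "responses": [
            {"model": "gpt-4", "content": "Response A...", "selected": True},
            {"model": "claude-sonnet", "content": "Response B...", "selected": False},
        ],
        "selected": 0,
        "judge": "claude-opus",
    },
}
winner = selected_response(example)
```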
Refine
A single model iteratively improves its own response:
- Model generates an initial response
- Model reviews and refines its own response (repeats for N iterations)
{
  "orchestration": {
    "mode": "refine",
    "iterations": 3
  }
}
Response:
{
  "negotiated_model": "claude-opus",
  "estimated_cost_usd": 0.12,
  "routing_reason": "refine-orchestration",
  "response": {
    "refined_response": "Final refined response...",
    "iterations": 3,
    "model": "claude-opus"
  }
}
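Conceptually, refine mode is a self-review loop around one model. The following is a minimal sketch under that reading; the model callable and the review prompt are placeholders, not the prompts the service actually uses.

```python
def self_refine(prompt, model, iterations=3):
    """Generate once, then let the same model revise its answer `iterations` times."""
    response = model(prompt)
    for _ in range(iterations):
        response = model(f"Review and improve this response:\n{response}")
    return {"refined_response": response, "iterations": iterations}

# Counting stub in place of a real LLM call.
call_count = 0

def stub_model(text):
    global call_count
    call_count += 1
    return f"draft {call_count}"

refined = self_refine("Summarize the design", stub_model, iterations=3)
```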
Planning
A single routed call that uses the planning weight profile, which prioritizes more capable models:
{
  "orchestration": {
    "mode": "planning"
  }
}
Orchestration Fields
| Field | Type | Default | Range | Description |
|---|---|---|---|---|
| mode | string | planning | See above | Orchestration strategy |
| iterations | int | 1-2 | 0-10 | Number of refinement iterations |
| primary_model_id | string | — | — | Explicit model for primary phase |
| review_model_id | string | — | — | Explicit model for review/judge phase |
| primary_min_weight | int | 0 | 0-10 | Minimum weight for primary model |
| review_min_weight | int | 0 | 0-10 | Minimum weight for review model |
| return_plan_only | bool | false | — | Return plan without executing refinement |
| output_schema | string | — | — | JSON Schema for structured output validation |
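The ranges in the table can be checked on the client before a request is sent. The sketch below is one possible pre-flight check; the function name is hypothetical, and the message for out-of-range weights is invented for illustration (the other two strings mirror the documented 400 responses).

```python
KNOWN_MODES = {"adversarial", "vote", "refine", "planning"}

def validate_orchestration(orch):
    """Return a list of problems found in an orchestration block (empty if none)."""
    errors = []
    if orch.get("mode", "planning") not in KNOWN_MODES:
        errors.append("unknown orchestration mode")
    iterations = orch.get("iterations")
    if iterations is not None and not (0 <= iterations <= 10):
        errors.append("iterations must be between 0 and 10")
    for field in ("primary_min_weight", "review_min_weight"):
        if not (0 <= orch.get(field, 0) <= 10):
            # Hypothetical message; the API's wording for this case is not documented here.
            errors.append(f"{field} must be between 0 and 10")
    return errors

ok = validate_orchestration({"mode": "adversarial", "iterations": 2})
bad = validate_orchestration({"mode": "debate", "iterations": 11, "review_min_weight": 12})
```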
Explicit Model Selection
By default, TokenHub selects models using its routing engine. You can override this with explicit model IDs:
{
  "orchestration": {
    "mode": "adversarial",
    "primary_model_id": "claude-opus",
    "review_model_id": "gpt-4"
  }
}
Alternatively, use primary_min_weight and review_min_weight to set capability floors without specifying exact models:
{
  "orchestration": {
    "mode": "adversarial",
    "primary_min_weight": 7,
    "review_min_weight": 9
  }
}
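To illustrate what a capability floor does, the sketch below filters a made-up catalog by minimum weight. The model names and weights are placeholders invented for this example; actual weights are assigned by TokenHub, and how the routing engine chooses among the eligible models is not covered here.

```python
# Hypothetical catalog: placeholder names and weights, for illustration only.
CATALOG = {"model-a": 4, "model-b": 7, "model-c": 9}

def eligible_models(catalog, min_weight):
    """Names of models whose weight meets the floor, in alphabetical order."""
    return sorted(name for name, weight in catalog.items() if weight >= min_weight)

primary_candidates = eligible_models(CATALOG, 7)  # floor from primary_min_weight
review_candidates = eligible_models(CATALOG, 9)   # floor from review_min_weight
```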
Error Responses
| Status | Body | Cause |
|---|---|---|
| 400 | "messages required" | Empty messages array |
| 400 | "iterations must be between 0 and 10" | Invalid iteration count |
| 400 | "unknown orchestration mode" | Unrecognized mode value |
| 401 | "missing or invalid api key" | Authentication failure |
| 403 | "scope not allowed" | API key lacks plan scope |
| 502 | Error message | Orchestration failed (all models failed) |
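A client might map these statuses to a next step roughly as follows. The action labels are invented, and treating 502 as retryable is an assumption based on its cause (all models failed), not documented behavior.

```python
def classify_plan_error(status):
    """Map an HTTP status from POST /v1/plan to a suggested client action."""
    if status == 400:
        return "fix_request"        # empty messages, bad iterations, or unknown mode
    if status in (401, 403):
        return "fix_credentials"    # missing or invalid key, or key lacks plan scope
    if status == 502:
        return "retry_later"        # assumption: orchestration failures may be transient
    return "unknown"

action = classify_plan_error(502)
```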
Cost Considerations
Orchestration modes make multiple LLM calls. Approximate cost multipliers:
| Mode | Calls per Request | Typical Cost Multiplier |
|---|---|---|
| Planning | 1 | 1x |
| Adversarial (2 iter) | 5 (plan + 2x(critique + refine)) | 5x |
| Vote (3 voters) | 4 (3 voters + 1 judge) | 4x |
| Refine (3 iter) | 4 (initial + 3 refinements) | 4x |
Budget accordingly when setting max_budget_usd in your policy.
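The call counts in the table follow simple formulas, sketched below for budgeting purposes. The function names are hypothetical, and the cost estimate assumes every call costs about the same, which will not hold when the primary and review models are priced differently.

```python
def calls_per_request(mode, iterations=0, voters=3):
    """Number of LLM calls one plan request makes, per the table above."""
    if mode == "planning":
        return 1
    if mode == "adversarial":
        return 1 + 2 * iterations   # plan + iterations x (critique + refine)
    if mode == "vote":
        return voters + 1           # voters + one judge
    if mode == "refine":
        return 1 + iterations       # initial response + refinements
    raise ValueError(f"unknown orchestration mode: {mode}")

def estimated_cost(single_call_usd, mode, **kwargs):
    """Rough cost, assuming each call costs roughly single_call_usd."""
    return single_call_usd * calls_per_request(mode, **kwargs)

adversarial_calls = calls_per_request("adversarial", iterations=2)
```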