Model Management
Models are the LLM definitions that TokenHub uses for routing decisions. Each model belongs to a provider and has properties that affect routing: capability weight, context window size, and pricing.
Default Models
TokenHub registers these default models at startup:
| Model ID | Provider | Weight | Context | Input $/1K | Output $/1K |
|---|---|---|---|---|---|
| gpt-4 | openai | 8 | 128,000 | $0.010 | $0.030 |
| gpt-3.5-turbo | openai | 3 | 16,385 | $0.0005 | $0.0015 |
| claude-opus | anthropic | 10 | 200,000 | $0.015 | $0.075 |
| claude-sonnet | anthropic | 7 | 200,000 | $0.003 | $0.015 |
Defaults are overridden if persisted models exist in the database or are registered via the credentials file.
API Operations
Create or Update a Model
curl -X POST http://localhost:8080/admin/v1/models \
-H "Content-Type: application/json" \
-d '{
"id": "gpt-4-turbo",
"provider_id": "openai",
"weight": 7,
"max_context_tokens": 128000,
"input_per_1k": 0.01,
"output_per_1k": 0.03,
"enabled": true
}'
Or with tokenhubctl:
tokenhubctl model add '{"id":"gpt-4-turbo","provider_id":"openai","weight":7,"max_context_tokens":128000,"input_per_1k":0.01,"output_per_1k":0.03,"enabled":true}'
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Model identifier (must match the provider's model name) |
| provider_id | string | Yes | ID of the registered provider |
| weight | int | Yes | Capability weight (0-10); higher = more capable |
| max_context_tokens | int | Yes | Maximum context window in tokens |
| input_per_1k | float | Yes | Cost per 1,000 input tokens in USD |
| output_per_1k | float | Yes | Cost per 1,000 output tokens in USD |
| enabled | bool | Yes | Whether the model is available for routing |
Model IDs can contain slashes (e.g., Qwen/Qwen2.5-Coder-32B-Instruct, nvidia/openai/gpt-oss-20b). The API handles them correctly.
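If your HTTP client or shell does not pass a slash in the model ID through as-is, you can percent-encode the ID before building the admin path. This is a minimal sketch using Python's standard library; whether your client requires the encoding depends on the client, not on TokenHub.

```python
from urllib.parse import quote

# A model ID containing a slash, as in the examples above.
model_id = "Qwen/Qwen2.5-Coder-32B-Instruct"

# safe="" forces "/" to be encoded as %2F instead of being left alone.
path = "/admin/v1/models/" + quote(model_id, safe="")
print(path)  # /admin/v1/models/Qwen%2FQwen2.5-Coder-32B-Instruct
```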
List Models
curl http://localhost:8080/admin/v1/models
tokenhubctl model list
The tokenhubctl model list command merges models from both the persistent store and the runtime engine, so models registered via environment variables or the credentials file are also shown.
Patch a Model
Update individual fields without resending the full configuration:
curl -X PATCH http://localhost:8080/admin/v1/models/gpt-4o \
-H "Content-Type: application/json" \
-d '{
"weight": 9,
"enabled": true,
"input_per_1k": 0.012
}'
Or:
tokenhubctl model edit gpt-4o '{"weight":9}'
Patchable fields: weight, enabled, input_per_1k, output_per_1k, max_context_tokens.
Runtime-only models (those registered via env vars or credentials file but not in the store) can also be patched. The first patch creates a store record seeded from the engine's runtime data.
Enable / Disable a Model
Quick shortcuts via tokenhubctl:
tokenhubctl model enable gpt-4o
tokenhubctl model disable gpt-4o-legacy
Delete a Model
curl -X DELETE http://localhost:8080/admin/v1/models/gpt-4-legacy
tokenhubctl model delete gpt-4-legacy
Weight Guidelines
The model weight is the primary indicator of model capability used in routing decisions:
| Weight | Intended For |
|---|---|
| 1-3 | Simple tasks, low cost (e.g., GPT-3.5 Turbo) |
| 4-6 | General purpose (e.g., GPT-4 Turbo, Claude Sonnet) |
| 7-8 | Complex reasoning (e.g., GPT-4, Claude Opus) |
| 9-10 | Highest capability (e.g., next-gen frontier models) |
Different routing modes weight the capability score differently:
- cheap mode barely considers weight (0.1 factor)
- high_confidence and planning modes heavily favor higher weights (0.6-0.7 factor)
- normal mode balances weight equally with cost, latency, and reliability (0.25 each)
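The per-mode capability factors can be sketched as a composite score. The factor values come from this section; the normalization (weight scaled to 0-1, remaining factor split evenly across cost, latency, and reliability) is an assumption for illustration, not TokenHub's actual formula.

```python
# Capability factor per routing mode (values from the docs above).
CAPABILITY_FACTOR = {
    "cheap": 0.1,
    "high_confidence": 0.7,
    "planning": 0.6,
    "normal": 0.25,
}

def composite_score(mode, weight, cost_score, latency_score, reliability_score):
    """Hypothetical scoring sketch. Non-weight inputs are assumed 0..1."""
    w = CAPABILITY_FACTOR[mode]
    capability = weight / 10.0  # model weights range 0-10
    if mode == "normal":
        # normal mode weighs all four signals equally (0.25 each)
        return 0.25 * (capability + cost_score + latency_score + reliability_score)
    # Other modes: capability gets its factor, the remainder is split
    # evenly across the other three signals (an assumption).
    rest = (1.0 - w) / 3.0
    return w * capability + rest * (cost_score + latency_score + reliability_score)
```

For example, in cheap mode a maximally capable model gains very little over a weight-0 model when the other signals are equal.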
Context Window
The max_context_tokens field tells the router whether a model can handle a given request size. The router applies a 15% headroom buffer — a model with a 128,000-token window can handle requests estimated at up to about 108,800 tokens (128,000 × 0.85).
Token estimation uses estimated_input_tokens from the request if provided, otherwise falls back to a characters / 4 heuristic.
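The fit check described above can be sketched as follows. The 15% headroom and the characters / 4 fallback come from this section; the function names are illustrative.

```python
def estimate_input_tokens(request_chars, estimated_input_tokens=None):
    # Prefer the caller-supplied estimate; otherwise fall back to chars / 4.
    if estimated_input_tokens is not None:
        return estimated_input_tokens
    return request_chars // 4

def fits_context(max_context_tokens, estimated_tokens, headroom=0.15):
    # The router reserves 15% of the window as headroom.
    return estimated_tokens <= max_context_tokens * (1 - headroom)
```

With a 128,000-token model, a request estimated at 108,000 tokens fits, while 110,000 does not.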
Pricing
Model pricing is used for:
- Cost estimation: returned in the response as estimated_cost_usd
- Budget filtering: models exceeding the request's max_budget_usd are excluded
- Cost scoring: in routing modes that consider cost (especially cheap mode)
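Using the per-1K prices from a model record, a cost estimate works out as below. This is a minimal sketch; the field names match the model schema above, and the formula is the standard per-1K-token calculation rather than TokenHub's verified internals.

```python
def estimated_cost_usd(input_tokens, output_tokens, input_per_1k, output_per_1k):
    # Per-1,000-token pricing: scale each side and sum.
    return (input_tokens / 1000) * input_per_1k + (output_tokens / 1000) * output_per_1k

# With the default gpt-4 prices ($0.010 in / $0.030 out per 1K tokens):
cost = estimated_cost_usd(1000, 500, 0.010, 0.030)
print(cost)  # 0.025
```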
Keep pricing up to date as providers change their rates.
Audit Trail
Model mutations are logged:
- model.upsert: model created or updated
- model.patch: model partially updated
- model.delete: model removed