Model Management

Models are the LLM definitions that TokenHub uses for routing decisions. Each model belongs to a provider and has three properties that affect routing: capability weight, context window size, and pricing.

Default Models

TokenHub registers these default models at startup:

Model ID	Provider	Weight	Context (tokens)	Input $/1K tokens	Output $/1K tokens
`gpt-4`	openai	8	128,000	$0.010	$0.030
`gpt-3.5-turbo`	openai	3	16,385	$0.0005	$0.0015
`claude-opus`	anthropic	10	200,000	$0.015	$0.075
`claude-sonnet`	anthropic	7	200,000	$0.003	$0.015

These defaults are overridden by models persisted in the database or registered via the credentials file.

API Operations

Create or Update a Model

curl -X POST http://localhost:8080/admin/v1/models \
  -H "Content-Type: application/json" \
  -d '{
    "id": "gpt-4-turbo",
    "provider_id": "openai",
    "weight": 7,
    "max_context_tokens": 128000,
    "input_per_1k": 0.01,
    "output_per_1k": 0.03,
    "enabled": true
  }'

Or with tokenhubctl:

tokenhubctl model add '{"id":"gpt-4-turbo","provider_id":"openai","weight":7,"max_context_tokens":128000,"input_per_1k":0.01,"output_per_1k":0.03,"enabled":true}'

Field	Type	Required	Description
`id`	string	Yes	Model identifier (must match provider's model name)
`provider_id`	string	Yes	ID of the registered provider
`weight`	int	Yes	Capability weight (0-10); higher = more capable
`max_context_tokens`	int	Yes	Maximum context window in tokens
`input_per_1k`	float	Yes	Cost per 1,000 input tokens in USD
`output_per_1k`	float	Yes	Cost per 1,000 output tokens in USD
`enabled`	bool	Yes	Whether the model is available for routing

Model IDs can contain slashes (e.g., Qwen/Qwen2.5-Coder-32B-Instruct, nvidia/openai/gpt-oss-20b). The API handles them correctly.
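The field table above can be turned into a client-side check that runs before a definition is sent to the API. The following is a minimal sketch, not TokenHub's own validation: the function name `validate_model` and the error messages are hypothetical, and the server may apply different or additional rules.

```python
from typing import Any

# Required fields and accepted Python types, taken from the field table.
# Integers are accepted for the float fields (e.g. a price of 0).
REQUIRED_FIELDS: dict[str, tuple[type, ...]] = {
    "id": (str,),
    "provider_id": (str,),
    "weight": (int,),
    "max_context_tokens": (int,),
    "input_per_1k": (int, float),
    "output_per_1k": (int, float),
    "enabled": (bool,),
}


def validate_model(model: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a model definition (empty if none)."""
    problems = []
    for name, types in REQUIRED_FIELDS.items():
        if name not in model:
            problems.append(f"missing required field: {name}")
            continue
        value = model[name]
        # bool is a subclass of int in Python, so reject it explicitly
        # wherever a number or string is expected.
        if (isinstance(value, bool) and bool not in types) or not isinstance(value, types):
            problems.append(f"{name}: unexpected type {type(value).__name__}")
    weight = model.get("weight")
    if isinstance(weight, int) and not isinstance(weight, bool) and not 0 <= weight <= 10:
        problems.append("weight: must be between 0 and 10")
    return problems
```

An empty list means the definition matches the table. IDs containing slashes pass unchanged because the check only requires `id` to be a string.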

List Models

curl http://localhost:8080/admin/v1/models
tokenhubctl model list

The tokenhubctl model list command merges models from both the persistent store and the runtime engine, so models registered via environment variables or the credentials file are also shown.

Patch a Model

Update individual fields without resending the full configuration:

curl -X PATCH http://localhost:8080/admin/v1/models/gpt-4o \
  -H "Content-Type: application/json" \
  -d '{
    "weight": 9,
    "enabled": true,
    "input_per_1k": 0.012
  }'

Or with tokenhubctl:

tokenhubctl model edit gpt-4o '{"weight":9}'

Patchable fields: weight, enabled, input_per_1k, output_per_1k, max_context_tokens.

Runtime-only models (those registered via environment variables or the credentials file but not yet present in the store) can also be patched. The first patch creates a store record seeded from the engine's runtime data.
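The partial-update behavior can be pictured as a merge restricted to the patchable fields. This is an illustrative sketch only: `apply_patch` is a hypothetical helper, and rejecting non-patchable fields is an assumption (the real API might ignore them instead).

```python
from typing import Any

# Patchable fields as listed in this section.
PATCHABLE_FIELDS = {
    "weight",
    "enabled",
    "input_per_1k",
    "output_per_1k",
    "max_context_tokens",
}


def apply_patch(current: dict[str, Any], patch: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `current` with the fields from `patch` applied."""
    unknown = set(patch) - PATCHABLE_FIELDS
    if unknown:
        # Assumption: non-patchable fields (such as id) cause an error.
        raise ValueError(f"fields cannot be patched: {sorted(unknown)}")
    # Fields absent from the patch keep their current values.
    return {**current, **patch}
```

For a runtime-only model, `current` would correspond to the engine's runtime data that seeds the new store record.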

Enable / Disable a Model

Quick shortcuts via tokenhubctl:

tokenhubctl model enable gpt-4o
tokenhubctl model disable gpt-4o-legacy

Delete a Model

curl -X DELETE http://localhost:8080/admin/v1/models/gpt-4-legacy
tokenhubctl model delete gpt-4-legacy

Weight Guidelines

The model weight is the primary indicator of model capability used in routing decisions:

Weight	Intended For
1-3	Simple tasks, low cost (e.g., GPT-3.5 Turbo)
4-6	General purpose
7-8	Complex reasoning (e.g., GPT-4, GPT-4 Turbo, Claude Sonnet)
9-10	Highest capability (e.g., Claude Opus, next-gen frontier models)

Different routing modes weight the capability score differently:

cheap mode barely considers weight (0.1 factor)
high_confidence and planning modes heavily favor higher weights (0.6-0.7 factor)
normal mode balances weight equally with cost, latency, and reliability (0.25 each)
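One way to read these factors is as coefficients in a weighted sum over normalized signals. The sketch below assumes that form; it is not taken from TokenHub's router. Only the normal-mode split (0.25 each) and the 0.1 capability factor for cheap mode come from the text. The remaining cheap-mode coefficients, the `Candidate` type, and the idea that cost, latency, and reliability arrive as scores in [0, 1] where higher is better are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    model_id: str
    weight: int         # capability weight, 0-10
    cost: float         # assumed normalized to [0, 1], higher = cheaper
    latency: float      # assumed normalized to [0, 1], higher = faster
    reliability: float  # assumed normalized to [0, 1], higher = more reliable


# Factors stated in the text for normal mode.
NORMAL = {"capability": 0.25, "cost": 0.25, "latency": 0.25, "reliability": 0.25}
# Only the 0.1 capability factor is from the text; the rest is made up.
CHEAP_ASSUMED = {"capability": 0.1, "cost": 0.7, "latency": 0.1, "reliability": 0.1}


def score(c: Candidate, factors: dict[str, float]) -> float:
    """Weighted sum of normalized signals; higher is better."""
    return (
        factors["capability"] * (c.weight / 10)
        + factors["cost"] * c.cost
        + factors["latency"] * c.latency
        + factors["reliability"] * c.reliability
    )


def pick(candidates: list[Candidate], factors: dict[str, float]) -> Candidate:
    """Return the highest-scoring candidate under the given factors."""
    return max(candidates, key=lambda c: score(c, factors))


strong = Candidate("claude-opus", 10, cost=0.3, latency=0.7, reliability=0.95)
light = Candidate("gpt-3.5-turbo", 3, cost=0.95, latency=0.7, reliability=0.9)
```

With these made-up inputs, `pick([strong, light], NORMAL)` selects the higher-weight model, while `pick([strong, light], CHEAP_ASSUMED)` selects the cheaper one, because capability contributes only a tenth of the score.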

Context Window

The max_context_tokens field tells the router whether a model can handle a given request size. The router applies a 15% headroom buffer, so a model with a 128,000-token window can handle requests estimated at up to about 108,800 tokens (128,000 × 0.85).

Token estimation uses estimated_input_tokens from the request if it is provided; otherwise it falls back to a heuristic of one token per four characters (characters / 4).
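The headroom check and the fallback estimate amount to a few lines of arithmetic. The sketch below is an approximation under stated assumptions: the function names are hypothetical, the characters / 4 result is rounded down, and a request exactly at the limit is treated as fitting. The router's actual rounding and boundary behavior are not documented here.

```python
from typing import Optional

HEADROOM_PERCENT = 15  # share of the context window held in reserve


def estimate_tokens(prompt: str, estimated_input_tokens: Optional[int] = None) -> int:
    """Use the caller's estimate if given, else fall back to characters / 4."""
    if estimated_input_tokens is not None:
        return estimated_input_tokens
    return len(prompt) // 4  # assumption: round down


def fits_context(max_context_tokens: int, tokens: int) -> bool:
    """True if the request fits in the window after reserving headroom."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return tokens * 100 <= max_context_tokens * (100 - HEADROOM_PERCENT)
```

For a 128,000-token model, `fits_context(128000, 108800)` is true and `fits_context(128000, 108801)` is false.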

Pricing

Model pricing is used for:

Cost estimation: Returned in the response as estimated_cost_usd
Budget filtering: Models exceeding the request's max_budget_usd are excluded
Cost scoring: In routing modes that consider cost (especially cheap mode)

Keep pricing up to date as providers change their rates.
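Cost estimation and budget filtering follow directly from the per-1K prices. The following is a rough sketch using the default prices from the table at the top of this page. The helper names are hypothetical, and how the router estimates output tokens is not described here, so the output token count is simply passed in as an assumption.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_per_1k: float,
    output_per_1k: float,
) -> float:
    """Cost in USD for a request, given per-1,000-token prices."""
    return input_tokens / 1000 * input_per_1k + output_tokens / 1000 * output_per_1k


# (input_per_1k, output_per_1k) for the default models.
DEFAULT_PRICES = {
    "gpt-4": (0.010, 0.030),
    "gpt-3.5-turbo": (0.0005, 0.0015),
    "claude-opus": (0.015, 0.075),
    "claude-sonnet": (0.003, 0.015),
}


def affordable(
    prices: dict[str, tuple[float, float]],
    input_tokens: int,
    output_tokens: int,
    max_budget_usd: float,
) -> list[str]:
    """Model IDs whose estimated cost does not exceed the budget."""
    return [
        model_id
        for model_id, (inp, out) in prices.items()
        if estimate_cost_usd(input_tokens, output_tokens, inp, out) <= max_budget_usd
    ]
```

For a request with 2,000 input tokens and an assumed 500 output tokens, gpt-4 is estimated at $0.035 and claude-sonnet at $0.0135, so a max_budget_usd of 0.02 keeps only gpt-3.5-turbo and claude-sonnet.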

Audit Trail

Model mutations are logged:

model.upsert — Model created or updated
model.patch — Model partially updated
model.delete — Model removed
