# Prometheus Metrics

TokenHub exports Prometheus metrics at the `/metrics` endpoint.

## Available Metrics

### `tokenhub_requests_total`

**Type:** Counter

Total number of requests processed.

**Labels:**

| Label | Values | Description |
|---|---|---|
| `mode` | `cheap`, `normal`, `high_confidence`, `planning`, `adversarial`, `thompson` | Routing mode used |
| `model` | `gpt-4`, `claude-opus`, etc. | Model that handled the request |
| `provider` | `openai`, `anthropic`, `vllm` | Provider adapter |
| `status` | `ok`, `error` | Request outcome |

**Examples:**

```promql
# Total successful requests
tokenhub_requests_total{status="ok"}

# Request rate by provider
sum by (provider) (rate(tokenhub_requests_total[5m]))

# Error rate
sum(rate(tokenhub_requests_total{status="error"}[5m]))
  /
sum(rate(tokenhub_requests_total[5m]))
```
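Outside of PromQL, the same counter can be aggregated directly from the text exposition format a scrape returns. A minimal sketch of that — the sample values below are made up for illustration, and the helper name `sum_by_status` is hypothetical:

```python
import re

# Illustrative /metrics output; labels follow the table above,
# but the numeric values are invented for this example.
SCRAPE = """\
tokenhub_requests_total{mode="cheap",model="gpt-4",provider="openai",status="ok"} 120
tokenhub_requests_total{mode="normal",model="claude-opus",provider="anthropic",status="ok"} 80
tokenhub_requests_total{mode="normal",model="gpt-4",provider="openai",status="error"} 5
"""

def sum_by_status(text: str) -> dict[str, float]:
    """Aggregate tokenhub_requests_total samples by their status label."""
    totals: dict[str, float] = {}
    pattern = re.compile(r'tokenhub_requests_total\{([^}]*)\}\s+([\d.]+)')
    for labels, value in pattern.findall(text):
        status = re.search(r'status="([^"]*)"', labels).group(1)
        totals[status] = totals.get(status, 0.0) + float(value)
    return totals

print(sum_by_status(SCRAPE))  # {'ok': 200.0, 'error': 5.0}
```

In production, prefer PromQL (or an exposition-format parser library) over ad-hoc regexes; this only illustrates what the scraped samples look like.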

### `tokenhub_request_latency_ms`

**Type:** Histogram

Request latency distribution in milliseconds.

**Labels:**

| Label | Values | Description |
|---|---|---|
| `mode` | `cheap`, `normal`, etc. | Routing mode |
| `model` | `gpt-4`, etc. | Model ID |
| `provider` | `openai`, etc. | Provider ID |

**Buckets:** 10, 20, 40, 80, 160, 320, 640, 1280, 2560, 5120 ms (exponential, base 2)
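The bucket boundaries follow directly from the exponential base-2 scheme starting at 10 ms; a quick check of the sequence (Prometheus also adds an implicit `+Inf` bucket):

```python
# Reproduce the latency histogram buckets: 10 ms * 2^i for i = 0..9.
buckets = [10 * 2 ** i for i in range(10)]
print(buckets)  # [10, 20, 40, 80, 160, 320, 640, 1280, 2560, 5120]
```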

**Examples:**

```promql
# Median latency
histogram_quantile(0.5, rate(tokenhub_request_latency_ms_bucket[5m]))

# P95 latency
histogram_quantile(0.95, rate(tokenhub_request_latency_ms_bucket[5m]))

# P99 latency by model
histogram_quantile(0.99, sum(rate(tokenhub_request_latency_ms_bucket[5m])) by (model, le))

# Average latency
rate(tokenhub_request_latency_ms_sum[5m]) / rate(tokenhub_request_latency_ms_count[5m])
```
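The P95 query also works as an alerting expression. A sketch of a Prometheus rules file — the rule name, threshold, and `for` duration are illustrative assumptions, not TokenHub defaults:

```yaml
# rules.yml (illustrative names and threshold)
groups:
  - name: tokenhub-latency
    rules:
      - alert: TokenHubHighP95Latency
        expr: >
          histogram_quantile(0.95,
            sum(rate(tokenhub_request_latency_ms_bucket[5m])) by (le)) > 2560
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "TokenHub P95 latency above 2560 ms for 10 minutes"
```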

### `tokenhub_cost_usd_total`

**Type:** Counter

Cumulative estimated cost in USD.

**Labels:**

| Label | Values | Description |
|---|---|---|
| `model` | `gpt-4`, etc. | Model ID |
| `provider` | `openai`, etc. | Provider ID |

**Examples:**

```promql
# Total cost in the last hour
increase(tokenhub_cost_usd_total[1h])

# Cost rate (USD per second)
rate(tokenhub_cost_usd_total[5m])

# Cost per hour by model
rate(tokenhub_cost_usd_total[1h]) * 3600

# Most expensive model
topk(3, sum(rate(tokenhub_cost_usd_total[1h])) by (model))
```
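A cost counter lends itself to a daily budget alert. A sketch, with an assumed $100/day budget and illustrative rule names:

```yaml
# rules.yml (budget value and names are assumptions)
groups:
  - name: tokenhub-cost
    rules:
      - alert: TokenHubDailyCostBudget
        expr: increase(tokenhub_cost_usd_total[24h]) > 100
        labels:
          severity: warning
        annotations:
          summary: "TokenHub spend exceeded $100 in the last 24 hours"
```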

## Grafana Dashboard

### Suggested Panels

| Panel | Query | Visualization |
|---|---|---|
| Request Rate | `sum(rate(tokenhub_requests_total[5m]))` | Time series |
| Error Rate | Error-rate ratio above | Gauge (0–100%) |
| P95 Latency | P95 query above | Time series |
| Cost per Hour | `rate(tokenhub_cost_usd_total[1h]) * 3600` | Stat |
| Requests by Model | `sum by (model) (rate(tokenhub_requests_total[5m]))` | Pie chart |
| Latency Heatmap | `tokenhub_request_latency_ms_bucket` | Heatmap |

## Scrape Configuration

```yaml
# prometheus.yml
scrape_configs:
  - job_name: tokenhub
    scrape_interval: 15s
    metrics_path: /metrics
    static_configs:
      - targets: ['tokenhub:8080']
```

For Docker Compose, use the service name as the target.
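For reference, a minimal Docker Compose sketch where the service name resolves as the scrape target; the image names, port, and file layout here are assumptions:

```yaml
# docker-compose.yml (sketch; image names and ports are assumptions)
services:
  tokenhub:
    image: tokenhub:latest
    ports:
      - "8080:8080"
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
```

Because both services share the default Compose network, Prometheus can reach the target at `tokenhub:8080` without any extra configuration.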