This benchmark measures raw API response speed across 15 models available through a unified OpenAI-compatible endpoint. All tests conducted on May 20, 2026, from us-east-1 (AWS), identical prompts, no caching, cold starts for every request.
Methodology
Each model tested with 3 prompt lengths (50, 500, 2000 tokens) generating 200 tokens output. 100 runs each, discard top/bottom 10, average middle 80. Network latency subtracted.
TTFT Results (milliseconds)
| Rank | Model | 50-tok | 500-tok | 2000-tok | Avg |
|---|---|---|---|---|---|
| 1 | DeepSeek V4 Flash | 87 | 142 | 310 | 180 |
| 2 | Qwen3-8B | 92 | 148 | 325 | 188 |
| 3 | Ga-Standard | 95 | 155 | 340 | 197 |
| 4 | Hunyuan-Turbo | 105 | 168 | 355 | 209 |
| 5 | Qwen3-32B | 110 | 175 | 370 | 218 |
Output Speed (tokens/second)
| Rank | Model | tok/s | Price $/M out | Speed/$ |
|---|---|---|---|---|
| 1 | Qwen3-8B | 156 | $0.01 | 15,600 |
| 2 | DeepSeek V4 Flash | 142 | $0.25 | 568 |
| 3 | Qwen3-32B | 128 | $0.28 | 457 |
Key Findings
Finding 1: DeepSeek V4 Flash is the speed king. 180ms avg TTFT at $0.25/M — unbeatable for latency-sensitive workloads.
Finding 2: Qwen3-8B has insane speed-per-dollar. 156 tok/s at $0.01/M — batch processing champion.
Finding 3: Ga-Standard routing adds only 17ms overhead. The smart routing layer is essentially free.
All tests via Global API. Raw data available on request.