Inference Provider Benchmark
SambaNova SN40L vs Major Inference Providers
A focused comparison across MiniMax M2.5, MiniMax M2.7, DeepSeek V3.1, and gpt-oss-120b, highlighting throughput, latency, and agent-readiness for production open-weight inference.
Built for the inference bottleneck: moving large models fast.
SambaNova's SN40L RDU uses dataflow execution and a three-tier memory design to improve data locality and reduce repeated model-load overhead. The result is a platform that is especially strong when agentic workloads need sustained tokens, long context, model switching, and OpenAI-compatible deployment paths.
424.8 output tokens/s in the latest Artificial Analysis provider snapshot.
691.7 output tokens/s and top-five first-answer latency among 22 providers.
187.8 output tokens/s on non-reasoning mode, within striking distance of first place.
Run planner and executor models behind a single OpenAI-compatible SambaCloud interface.
Output Throughput
Tokens per second measures generation speed after the model starts responding. Higher is better for coding agents, long answers, and multi-step workflows.
MiniMax M2.7
gpt-oss-120b (high)
DeepSeek V3.1 non-reasoning
First Answer Latency
Time to first answer token matters when a user or agent is waiting for the next step. Lower is better.
MiniMax M2.7
8.47sSambaNova is second only to Together.ai on first-answer latency while taking the top throughput position.
gpt-oss-120b (high)
4.17sSambaNova lands in the top-five latency group while also sitting in the top speed tier.
DeepSeek V3.1
21.02sFor reasoning mode, SambaNova is #2 for first-answer latency and #2 for output speed.
Agent Readiness
For production agents, speed is only one part of the stack. Context length, tool use, JSON mode, and model routing determine how useful the platform is.
MiniMax M2.7
192k context on SambaCloud. Strong fit for high-volume code edits, repo traversal, and tool-heavy agent execution.
MiniMax M2.5
Supported by SambaNova's Responses API and useful where existing evaluations or integrations already depend on M2.5.
DeepSeek V3.1
128k context, hybrid thinking and non-thinking modes, and strong coding-agent positioning for planning and reasoning tasks.
gpt-oss-120b
128k context, open weights, function calling support, and a familiar OpenAI-compatible path for agent deployments.
| Model | Throughput | Latency | SN40L Positioning | Best Fit |
|---|---|---|---|---|
| MiniMax-M2.7 | #1 speed 424.8 t/s, ahead of Together.ai and far ahead of Fireworks in the current snapshot. |
#2 first-answer latency among measured providers. | Best proof point for SN40L: high sustained generation on an agent-native open-weight model. | Coding agents, execution loops, refactors, migrations, and high-volume tool use. |
| MiniMax-M2.5 | SambaNova has publicly positioned M2.5 at over 300 t/s on SambaCloud. | Use as a compatibility option rather than the headline benchmark model. | Shows continuity across MiniMax generations while M2.7 becomes the stronger speed story. | Existing M2.5 workflows, pilots, and migration paths into M2.7. |
| DeepSeek-V3.1 | #2 speed 187.8 t/s non-reasoning, close to Baseten FP8 at 191.7 t/s. |
#2 first-answer latency on reasoning mode in the benchmarked provider set. | Demonstrates SN40L strength on very large MoE models where memory movement is the bottleneck. | Planner models, coding reasoning, maths, and mixed planner/executor agent stacks. |
| gpt-oss-120b | Top 5 of 22 691.7 t/s on the high reasoning setting. Cerebras leads raw speed, but SambaNova sits in the elite provider group. |
4.17s first-answer latency, top-five among 22 benchmarked providers. | Strong enterprise alternative for open-weight reasoning with OpenAI-compatible integration. | Reasoning agents, structured tool use, and teams standardising on open-weight models. |
Benchmark snapshot prepared May 27, 2026. Public speeds vary by prompt length, reasoning mode, batching, region, and provider load. Sources: Artificial Analysis MiniMax M2.7, Artificial Analysis gpt-oss-120b, Artificial Analysis DeepSeek V3.1, SambaCloud model docs, SambaNova MiniMax M2.5 note, SambaNova Responses API note, and SambaNova SN40L RDU paper.