Inference Provider Benchmark

SambaNova SN40L vs Major Inference Providers

A focused comparison across MiniMax M2.5, MiniMax M2.7, DeepSeek V3.1, and gpt-oss-120b, highlighting throughput, latency, and agent-readiness for production open-weight inference.

Built for the inference bottleneck: moving large models fast.

SambaNova's SN40L RDU uses dataflow execution and a three-tier memory design to improve data locality and reduce repeated model-load overhead. The result is a platform that is especially strong when agentic workloads need sustained tokens, long context, model switching, and OpenAI-compatible deployment paths.

MiniMax M2.7 #1 speed

424.8 output tokens/s in the latest Artificial Analysis provider snapshot.

gpt-oss-120b Top 5

691.7 output tokens/s and top-five first-answer latency among 22 providers.

DeepSeek V3.1 #2 speed

187.8 output tokens/s on non-reasoning mode, within striking distance of first place.

Agent Stack One API

Run planner and executor models behind a single OpenAI-compatible SambaCloud interface.

Output Throughput

Tokens per second measures generation speed after the model starts responding. Higher is better for coding agents, long answers, and multi-step workflows.

MiniMax M2.7

SambaNova424.8 t/s
Together.ai406.6 t/s
Fireworks64.6 t/s

gpt-oss-120b (high)

Cerebras1,979.6 t/s
Fireworks768.1 t/s
Nebius Fast701.5 t/s
SambaNova691.7 t/s

DeepSeek V3.1 non-reasoning

Baseten FP8191.7 t/s
SambaNova187.8 t/s
Amazon181.8 t/s

First Answer Latency

Time to first answer token matters when a user or agent is waiting for the next step. Lower is better.

#2 of 6

MiniMax M2.7

8.47s

SambaNova is second only to Together.ai on first-answer latency while taking the top throughput position.

#5 of 22

gpt-oss-120b (high)

4.17s

SambaNova lands in the top-five latency group while also sitting in the top speed tier.

#2 reasoning

DeepSeek V3.1

21.02s

For reasoning mode, SambaNova is #2 for first-answer latency and #2 for output speed.

Agent Readiness

For production agents, speed is only one part of the stack. Context length, tool use, JSON mode, and model routing determine how useful the platform is.

Executor

MiniMax M2.7

192k context on SambaCloud. Strong fit for high-volume code edits, repo traversal, and tool-heavy agent execution.

Legacy executor

MiniMax M2.5

Supported by SambaNova's Responses API and useful where existing evaluations or integrations already depend on M2.5.

Planner

DeepSeek V3.1

128k context, hybrid thinking and non-thinking modes, and strong coding-agent positioning for planning and reasoning tasks.

Reasoning

gpt-oss-120b

128k context, open weights, function calling support, and a familiar OpenAI-compatible path for agent deployments.

Model Throughput Latency SN40L Positioning Best Fit
MiniMax-M2.7 #1 speed
424.8 t/s, ahead of Together.ai and far ahead of Fireworks in the current snapshot.
#2 first-answer latency among measured providers. Best proof point for SN40L: high sustained generation on an agent-native open-weight model. Coding agents, execution loops, refactors, migrations, and high-volume tool use.
MiniMax-M2.5 SambaNova has publicly positioned M2.5 at over 300 t/s on SambaCloud. Use as a compatibility option rather than the headline benchmark model. Shows continuity across MiniMax generations while M2.7 becomes the stronger speed story. Existing M2.5 workflows, pilots, and migration paths into M2.7.
DeepSeek-V3.1 #2 speed
187.8 t/s non-reasoning, close to Baseten FP8 at 191.7 t/s.
#2 first-answer latency on reasoning mode in the benchmarked provider set. Demonstrates SN40L strength on very large MoE models where memory movement is the bottleneck. Planner models, coding reasoning, maths, and mixed planner/executor agent stacks.
gpt-oss-120b Top 5 of 22
691.7 t/s on the high reasoning setting. Cerebras leads raw speed, but SambaNova sits in the elite provider group.
4.17s first-answer latency, top-five among 22 benchmarked providers. Strong enterprise alternative for open-weight reasoning with OpenAI-compatible integration. Reasoning agents, structured tool use, and teams standardising on open-weight models.

Benchmark snapshot prepared May 27, 2026. Public speeds vary by prompt length, reasoning mode, batching, region, and provider load. Sources: Artificial Analysis MiniMax M2.7, Artificial Analysis gpt-oss-120b, Artificial Analysis DeepSeek V3.1, SambaCloud model docs, SambaNova MiniMax M2.5 note, SambaNova Responses API note, and SambaNova SN40L RDU paper.