Qwen3-coder

Qwen3-coder is a large language model optimized for code generation, completion, and analysis. It supports multiple programming languages and excels at tool calling, agentic workflows, and structured output.

Parameters: 32B
Architecture: Transformer (decoder-only)
License: Apache 2.0

Hardware Configurations

GPU	Framework	Quantization	Throughput	Latency
RTX 4090	vLLM	fp16	3.8k tok/s	12 ms
A100	vLLM	fp8	6.2k tok/s	8 ms
H100	TGI	fp8	7.1k tok/s	6 ms
GB10	vLLM	nvfp4	8.4k tok/s	5 ms
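
As a rough illustration of how the table above might be used, the sketch below picks the highest-throughput configuration that fits a latency budget. The `HardwareConfig` class and `pick_config` helper are hypothetical names introduced here, not part of any Qwen3-coder, vLLM, or TGI API. The numbers are copied from the table as listed and are not independently measured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HardwareConfig:
    """One row of the hardware table (hypothetical helper type)."""
    gpu: str
    framework: str
    quantization: str
    throughput_tok_s: float  # tokens per second, as listed in the table
    latency_ms: float        # milliseconds, as listed in the table


# Rows copied from the table above.
CONFIGS = [
    HardwareConfig("RTX 4090", "vLLM", "fp16", 3800, 12),
    HardwareConfig("A100", "vLLM", "fp8", 6200, 8),
    HardwareConfig("H100", "TGI", "fp8", 7100, 6),
    HardwareConfig("GB10", "vLLM", "nvfp4", 8400, 5),
]


def pick_config(max_latency_ms: float,
                framework: Optional[str] = None) -> Optional[HardwareConfig]:
    """Return the highest-throughput row within the latency budget.

    Optionally restrict the search to one serving framework.
    Returns None when no row qualifies.
    """
    candidates = [
        c for c in CONFIGS
        if c.latency_ms <= max_latency_ms
        and (framework is None or c.framework == framework)
    ]
    return max(candidates, key=lambda c: c.throughput_tok_s, default=None)


if __name__ == "__main__":
    best = pick_config(max_latency_ms=8)
    print(best.gpu if best else "no match")
```

Filtering first and then maximizing keeps the selection rule explicit; a real deployment would likely also weigh memory, cost, and availability, none of which the table lists.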

