Qwen3-coder

Qwen3-coder is a large language model optimized for code generation, completion, and analysis. It supports multiple programming languages and excels at tool calling, agentic workflows, and structured output.

Parameters: 32B
Architecture: Transformer (decoder-only)
License: Apache 2.0
Configs: 4

Hardware Configurations

RTX 4090 · vLLM · fp16 · Throughput: 3.8k tok/s · Latency: 12 ms

A100 · vLLM · fp8 · Throughput: 6.2k tok/s · Latency: 8 ms

H100 · TGI · fp8 · Throughput: 7.1k tok/s · Latency: 6 ms

GB10 · vLLM · nvfp4 · Throughput: 8.4k tok/s · Latency: 5 ms
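
To compare the four configurations side by side, the listed figures can be put into a small data structure. The following Python is a minimal sketch, not part of the original card: the names HardwareConfig, relative_throughput, and estimate_generation_time_s are hypothetical, and the numbers are copied from the listings above. The card does not say whether latency is per token or time to first token, so the estimate treats it as a one-off startup delay, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareConfig:
    """One hardware listing from the card (hypothetical container type)."""
    gpu: str
    engine: str
    precision: str
    throughput_tok_s: float  # tokens per second, as listed
    latency_ms: float        # milliseconds, as listed (meaning not specified)


# Figures copied from the four listings on the card.
CONFIGS = [
    HardwareConfig("RTX 4090", "vLLM", "fp16", 3800, 12),
    HardwareConfig("A100", "vLLM", "fp8", 6200, 8),
    HardwareConfig("H100", "TGI", "fp8", 7100, 6),
    HardwareConfig("GB10", "vLLM", "nvfp4", 8400, 5),
]


def relative_throughput(configs, baseline_gpu="RTX 4090"):
    """Return each GPU's listed throughput divided by the baseline's."""
    base = next(c for c in configs if c.gpu == baseline_gpu)
    return {c.gpu: c.throughput_tok_s / base.throughput_tok_s for c in configs}


def estimate_generation_time_s(config, num_tokens):
    """Rough time to produce num_tokens at the listed throughput.

    Assumption: the listed latency is counted once as a startup delay.
    """
    return config.latency_ms / 1000 + num_tokens / config.throughput_tok_s


# Print a quick comparison against the RTX 4090 baseline.
for gpu, ratio in relative_throughput(CONFIGS).items():
    print(f"{gpu}: {ratio:.2f}x the RTX 4090 throughput")
```

These ratios only compare the listed numbers. The configurations differ in engine and precision as well as GPU, so the ratios do not isolate the effect of the hardware.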

Fork This Config

Create your own optimized configuration based on these community-verified settings.
