Qwen3-Coder
Qwen3-Coder is a large language model optimized for code generation, completion, and analysis. It supports multiple programming languages and targets tool calling, agentic workflows, and structured output.
Parameters: 32B
Architecture: Transformer (decoder-only)
License: Apache 2.0
Configs: 4
Hardware Configurations
| GPU | Framework | Quantization | Throughput | Latency |
|---|---|---|---|---|
| RTX 4090 | vLLM | fp16 | 3.8k tok/s | 12 ms |
| A100 | vLLM | fp8 | 6.2k tok/s | 8 ms |
| H100 | TGI | fp8 | 7.1k tok/s | 6 ms |
| GB10 | vLLM | nvfp4 | 8.4k tok/s | 5 ms |
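As a rough sanity check on the quantization column, the sketch below estimates how much memory the weights of a 32B-parameter model occupy in each listed format. It assumes 2 bytes per parameter for fp16, 1 byte for fp8, and about 4.5 bits for nvfp4 (4-bit values plus one 8-bit scale per 16-value block, ignoring per-tensor scales). The function name `weight_memory_gb` is illustrative, and KV cache, activations, and framework overhead are not counted.

```python
# Rough weight-memory estimate for a 32B-parameter model at the
# quantization formats listed in the hardware table.
# Weights only: KV cache, activations, and runtime overhead are ignored.

PARAMS = 32e9  # parameter count from the model card

# Approximate storage cost per parameter, in bytes.
# nvfp4 is an assumption: 4-bit values plus one 8-bit scale per
# 16-value block, i.e. 4.5 bits per parameter (per-tensor scales ignored).
BYTES_PER_PARAM = {
    "fp16": 2.0,
    "fp8": 1.0,
    "nvfp4": 4.5 / 8,
}


def weight_memory_gb(params: float, fmt: str) -> float:
    """Return approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[fmt] / 1e9


if __name__ == "__main__":
    for fmt in BYTES_PER_PARAM:
        print(f"{fmt:>6}: {weight_memory_gb(PARAMS, fmt):5.1f} GB")
```

At fp16 the weights alone come to about 64 GB, which is more than the 24 GB on a single RTX 4090; the table does not state how many GPUs each row uses. Under these assumptions, fp8 halves that to about 32 GB and nvfp4 brings it to about 18 GB.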