Quantization

This model was quantized with NVIDIA Model Optimizer: weights to NVFP4 and the KV cache to FP8.
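To give a sense of the storage savings, here is a back-of-envelope sketch of the NVFP4 format's effective bit width, assuming the commonly described layout of 4-bit (E2M1) weights with one FP8 (E4M3) scale per 16-element block. The block size and scale width are assumptions about the format, not figures taken from this model card.

```python
# Back-of-envelope NVFP4 storage math.
# Assumption: 4-bit weights + one 8-bit scale per 16-element block.

def nvfp4_bits_per_weight(block_size: int = 16, scale_bits: int = 8) -> float:
    """Effective bits per weight: 4 payload bits plus amortized block scale."""
    return 4 + scale_bits / block_size

def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Total weight storage in bytes for a given parameter count."""
    return n_params * bits_per_weight / 8

params = 30e9  # total parameter count implied by the model name (30B)
nvfp4_gb = weight_bytes(params, nvfp4_bits_per_weight()) / 1e9
bf16_gb = weight_bytes(params, 16) / 1e9
print(f"NVFP4: ~{nvfp4_gb:.1f} GB vs BF16: {bf16_gb:.1f} GB")
```

Under these assumptions the effective width is 4.5 bits per weight, roughly a 3.5x reduction versus BF16 for the weights alone (the FP8 KV cache saves memory separately, at runtime).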

Serving with vLLM

Use vLLM version 0.12.0 or later, and set the following environment variables before serving the model:

export VLLM_USE_FLASHINFER_MOE_FP4=1
export VLLM_FLASHINFER_MOE_BACKEND="throughput"
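
With the variables above exported, a minimal launch sketch using vLLM's OpenAI-compatible server entry point looks like this. The `--kv-cache-dtype fp8` flag is an assumption based on the FP8 KV cache noted above; omit it if the checkpoint's config already selects the KV cache dtype.

```shell
# Launch an OpenAI-compatible server for this checkpoint.
# --kv-cache-dtype fp8 is an assumption, not confirmed by this card.
vllm serve rahtml/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 \
    --kv-cache-dtype fp8
```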
Safetensors

Model size: 16B params
Tensor types: F32, BF16, F8_E4M3, U8