Quantization
This model was quantized from nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 using NVIDIA Model Optimizer: weights are quantized to NVFP4 and the KV cache to FP8.
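For context, the sketch below shows the general shape of post-training quantization with Model Optimizer's mtq.quantize. It is a minimal illustration, not this repository's actual recipe: the config name NVFP4_DEFAULT_CFG, the calibration texts, and the loading details are assumptions; consult the modelopt documentation for the exact API of your installed version.

# Hedged sketch of NVFP4 post-training quantization with nvidia-modelopt.
# NVFP4_DEFAULT_CFG and the calibration data are assumptions, not the
# recipe actually used for this checkpoint.
import torch
import modelopt.torch.quantization as mtq
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16"
# Device placement / dtype handling for a 30B model omitted for brevity.
model = AutoModelForCausalLM.from_pretrained(base)
tok = AutoTokenizer.from_pretrained(base)

# Tiny placeholder calibration set; a real run uses representative data.
calib_batches = [tok(t, return_tensors="pt") for t in
                 ["The quick brown fox jumps over the lazy dog."]]

def forward_loop(model):
    # Run calibration data through the model so Model Optimizer can
    # collect the activation statistics it needs to choose scales.
    with torch.no_grad():
        for batch in calib_batches:
            model(**batch)

# Quantize weights to NVFP4. FP8 KV-cache quantization is configured
# separately (in the quantization config or at serve time).
model = mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)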
Serving with vLLM
Use vLLM version 0.12.0 or later and set the following environment variables before serving the model.
export VLLM_USE_FLASHINFER_MOE_FP4=1
export VLLM_FLASHINFER_MOE_BACKEND="throughput"
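For a quick sanity check, the same configuration can be exercised from vLLM's offline Python API; the prompt below is illustrative, and the environment variables must be set before vllm is imported.

# Minimal offline-inference sketch with vLLM's Python API (>= 0.12.0).
# Setting the variables in-process mirrors the exports above.
import os
os.environ["VLLM_USE_FLASHINFER_MOE_FP4"] = "1"
os.environ["VLLM_FLASHINFER_MOE_BACKEND"] = "throughput"

from vllm import LLM, SamplingParams

llm = LLM(model="rahtml/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4")
out = llm.generate(["Explain NVFP4 quantization in one sentence."],
                   SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)

For an OpenAI-compatible server, running vllm serve rahtml/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 in a shell where the two variables above are exported works equivalently.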