Throughput NVFP4 on Dual 6000 Blackwells
#2, opened by zenmagnets
For this model on your dual RTX 6000 Pro machine, you got ~73 output tok/s.
But for a similar system (also dual RTX 6000 Pro), you mention that there's "something very wrong" with the setup if it's getting only 91 tok/s on MiniMax M2.5 NVFP4, which has a similar architecture: https://huggingface.co/lukealonso/MiniMax-M2.5-NVFP4/discussions/1#699539896b63e3e83c1eb6ed
Could you please elaborate on what you think I'm doing wrong? I would love to get more than 91 tok/s.