WARNING 02-08 00:50:40 [vllm.py:1500] `torch.compile` is turned on, but the model zai-org/GLM-4.7-Flash does not support it. Please open an issue on GitHub if you want it to be supported.
Loading safetensors checkpoint shards: 100% Completed | 48/48 [00:28<00:00, 1.71it/s]
glm-4-7-flash | (EngineCore_DP0 pid=321)
glm-4-7-flash | (EngineCore_DP0 pid=321) INFO 02-08 00:50:40 [default_loader.py:291] Loading weights took 28.57 seconds
glm-4-7-flash | (EngineCore_DP0 pid=321) INFO 02-08 00:50:40 [gpu_model_runner.py:4139] Loading drafter model...
glm-4-7-flash | (EngineCore_DP0 pid=321) WARNING 02-08 00:50:40 [vllm.py:1500] torch.compile is turned on, but the model zai-org/GLM-4.7-Flash does not support it. Please open an issue on GitHub if you want it to be supported.
Loading safetensors checkpoint shards: 100% Completed | 48/48 [00:01<00:00, 27.21it/s]