llama.cpp for Snapdragon on Windows ARM64

#8
by Berowne - opened

Hi all. I have LM Studio installed and can run many SLMs in CPU mode, including Llama-v2-7B-Chat, Phi-3, and even a 20B model... BUT the models don't use the GPU or NPU.
The closest I've come to finding a working llama.cpp build for Snapdragon on Windows is https://github.com/ggerganov/llama.cpp/discussions/8336#discussioncomment-10472433

Then I joined Qualcomm tech program and found this document...
https://docs.qualcomm.com/bundle/publicresource/topics/80-62010-1/genai-llama-cpp.html
but there is currently no pre-compiled LLVM ARM64 version.

I would rather develop my agents and compare SLMs than build these fundamental tools in C++, but it looks like I might have to. If there is a Snapdragon llama.cpp download for Windows (native or WSL), I'm keen to try it.
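For anyone who does end up building it themselves: recent llama.cpp checkouts ship CMake presets for Windows on ARM with the LLVM/Clang toolchain. A minimal sketch of the build, assuming git, CMake, and an ARM64-capable LLVM/Clang are installed and on PATH (preset names may differ across llama.cpp versions, so check CMakePresets.json in your checkout):

```shell
# Sketch only: assumes a recent llama.cpp with an arm64-windows-llvm preset.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# Configure using the ARM64 Windows LLVM preset from CMakePresets.json
cmake --preset arm64-windows-llvm-release

# Build; binaries land under build-arm64-windows-llvm-release/bin
cmake --build build-arm64-windows-llvm-release --config Release
```

Note that this still produces a CPU build (with Arm-optimized kernels); NPU offload is a separate path that goes through Qualcomm's tooling described in the linked document.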

Qualcomm org

We are working on an example app that will demonstrate how to deploy Llama on Windows. Stay tuned!

@gustlars , any update on the example app?

Qualcomm org
meghan3 changed discussion status to closed

It is now 2026 and my LM Studio still doesn't show NPU or GPU support on my ASUS A14 (Qualcomm X Elite 78), and even the Vulkan build of llama.cpp does not work on ARM64 CPUs.
The only thing that works is CPU inference, but that's slow... very slow, and it makes the laptop heat up a lot.
On my Intel CPU (Meteor Lake, from 2024) it runs twice as fast because it uses the GPU.

Everything I find about the Qualcomm AI Hub requires advanced skills, or the links are broken. There is nothing simple.
Can't Qualcomm just work with the LM Studio team to build native support?

Is there any news?

Qualcomm org

@sebastienbo I completely agree with you. This is a high-priority item for us and we are actively working towards this. Sorry that it is taking some time. Stay tuned.
