# BitNet-TRNQ

A ternary-quantized version of [microsoft/bitnet-b1.58-2B-4T-bf16](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-bf16), packaged for the Trillim DarkNet inference engine.

This model runs entirely on CPU — no GPU required.

## Model Details

| Property | Value |
|---|---|
| Architecture | BitNet (`BitNetForCausalLM`) |
| Parameters | ~2B |
| Hidden size | 2560 |
| Layers | 30 |
| Attention heads | 20 (5 KV heads) |
| Context length | 4096 |
| Quantization | Ternary ({-1, 0, 1}) |
| Source model | microsoft/bitnet-b1.58-2B-4T-bf16 |
| License | MIT |
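The {-1, 0, 1} weight values listed above correspond to BitNet b1.58-style ternary quantization. As a rough illustration of the idea (a sketch of the absmean scheme from the BitNet b1.58 paper — not necessarily the exact rounding or packing Trillim uses on disk):

```python
def ternary_quantize(weights):
    """Quantize a list of float weights to {-1, 0, 1} plus a scale.

    Sketch of BitNet b1.58's absmean scheme: scale by the mean absolute
    weight, round to the nearest integer, clip to [-1, 1]. Illustrative
    only; the real engine works on packed tensors, not Python lists.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    quantized = [max(-1, min(1, round(w / (scale + 1e-8)))) for w in weights]
    return quantized, scale

# Dequantization is just `q * scale` per weight, which is why ternary
# matmuls reduce to additions and subtractions (no multiplies by q).
q, scale = ternary_quantize([0.9, -0.05, 0.5, -1.2])
```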

## Usage

```shell
pip install trillim
trillim pull Trillim/BitNet-TRNQ
trillim serve Trillim/BitNet-TRNQ
```

This starts an OpenAI-compatible API server at `http://127.0.0.1:8000`.
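Once the server is up, any OpenAI-compatible client can talk to it. A minimal stdlib-only sketch (assumptions: the standard `/v1/chat/completions` route and the repo id as the model name — check `trillim serve` output for the exact values):

```python
import json
import urllib.request

# Assumed request shape for an OpenAI-compatible endpoint; the server
# must already be running via `trillim serve Trillim/BitNet-TRNQ`.
payload = {
    "model": "Trillim/BitNet-TRNQ",   # assumed model id
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
}

def chat(payload, base_url="http://127.0.0.1:8000"):
    """POST a chat-completion request and return the parsed JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# With the server running:
# reply = chat(payload)
# print(reply["choices"][0]["message"]["content"])
```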

For interactive CLI chat:

```shell
trillim chat Trillim/BitNet-TRNQ
```

## What's in this repo

| File | Description |
|---|---|
| `qmodel.tensors` | Ternary-quantized weights in Trillim format |
| `rope.cache` | Precomputed RoPE embeddings |
| `config.json` | Model configuration |
| `tokenizer.json` | Tokenizer |
| `tokenizer_config.json` | Tokenizer configuration |
| `trillim_config.json` | Trillim metadata |
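For intuition, `rope.cache` holds precomputed rotary position embeddings. A sketch of the kind of cos/sin table such a cache typically contains (assuming standard RoPE with base 10000; the actual file layout is Trillim-specific and not documented here):

```python
import math

def rope_cache(seq_len, head_dim, base=10000.0):
    """Precompute RoPE cos/sin tables for all positions.

    Standard rotary embeddings: frequency i is base^(-2i/head_dim),
    and position p gets angle p * freq_i. Returned as two
    seq_len x (head_dim // 2) tables.
    """
    half = head_dim // 2
    inv_freq = [base ** (-2 * i / head_dim) for i in range(half)]
    cos = [[math.cos(p * f) for f in inv_freq] for p in range(seq_len)]
    sin = [[math.sin(p * f) for f in inv_freq] for p in range(seq_len)]
    return cos, sin
```

Precomputing these once (as the repo's cache file does) avoids recomputing trigonometric functions on every forward pass.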

## License

This model is released under the MIT License, following the license of the source model.
