---
license: apache-2.0
base_model:
- HuggingFaceTB/SmolLM2-1.7B
---

# Mamba2 Distilled Model

**Version:** 1.0

**Architecture:** [MOHAWK LMHead](https://github.com/schwartz-lab-NLP/SSM-Pruner/blob/d8019fb81a1081dd5df63b606904c3b0f7e653d4/phi_mamba/modules/lm_head.py)

---

## Overview

This model is a distilled version of SmolLM2-1.7B, converted to the SSM-based Mamba2 architecture with the MOHAWK method: the attention layers are replaced with Mamba2 layers, while the MLP layers are kept as is. It was developed for [On Pruning State-Space LLMs](https://arxiv.org/abs/2502.18886).

---

## Evaluation

The model has been benchmarked on several tasks:

| **Task**              | **Metric**    | **Value**  | **Stderr** |
|-----------------------|--------------:|-----------:|-----------:|
| **ARC Challenge**     | acc           | 0.4164     | ±0.0144    |
| **ARC Easy**          | acc           | 0.7492     | ±0.0089    |
| **Hellaswag**         | acc           | 0.4988     | ±0.0050    |
| **Lambada (OpenAI)**  | acc           | 0.5707     | ±0.0069    |
|                       | perplexity    | 7.0794     | ±0.1761    |
| **PIQA**              | acc           | 0.7661     | ±0.0099    |
| **Winogrande**        | acc           | 0.6283     | ±0.0136    |

> **Note:**
> - For accuracy metrics, higher values are better.
> - For perplexity, lower values are better.

---

## Intended Use

- **General NLP Tasks:** Suitable for a range of language understanding and reasoning tasks.
- **Research & Prototyping:** Ideal for lightweight experiments and efficiency-focused deployments.

---

## Citation

If you use this model, please cite:

```bibtex
@misc{ghattas2025pruningstatespacellms,
      title={On Pruning State-Space LLMs},
      author={Tamer Ghattas and Michael Hassid and Roy Schwartz},
      year={2025},
      eprint={2502.18886},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.18886},
}
```

---

*Model Card Last Updated: February 16, 2025*
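
---

## Usage (sketch)

The card does not ship loading code, so the snippet below is only a minimal sketch, not the authors' reference implementation. It assumes the checkpoint is published on the Hugging Face Hub with custom modeling code (hence `trust_remote_code=True`); the repository id is a placeholder, and loading may instead require the `LMHeadModel` class from the linked SSM-Pruner repository.

```python
# Minimal usage sketch. Assumptions (not confirmed by the card):
# - the checkpoint is on the Hugging Face Hub with custom modeling code,
# - the repo id below is a placeholder, not the actual published path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/smollm2-1.7b-mamba2-distilled"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # needed if the Mamba2 modules ship as custom code
).to("cuda" if torch.cuda.is_available() else "cpu")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```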