Post 4404 OpenAI is now open again! Check out OpenAI’s brand new gpt-oss-20b model hosted on ZeroGPU 🤗 merterbak/gpt-oss-20b-demo
Post 4693 Qwen 3 technical report released 🚀 Report: https://github.com/QwenLM/Qwen3/blob/main/Qwen3_Technical_Report.pdf
Papers
Attention Is All You Need • Paper • 1706.03762 • Published Jun 12, 2017 • 110
LoRA Learns Less and Forgets Less • Paper • 2405.09673 • Published May 15, 2024 • 91
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism • Paper • 2401.02954 • Published Jan 5, 2024 • 51
RAFT: Adapting Language Model to Domain Specific RAG • Paper • 2403.10131 • Published Mar 15, 2024 • 72
Qwen 3
Alibaba's Qwen 3 models
Qwen/Qwen3-0.6B • Text Generation • 0.8B • Updated Jul 26, 2025 • 8.34M • 979
Qwen/Qwen3-1.7B • Text Generation • 2B • Updated Jul 26, 2025 • 3.42M • 385
Qwen/Qwen3-4B • Text Generation • 4B • Updated Jul 26, 2025 • 3.72M • 524
Qwen/Qwen3-8B • Text Generation • 8B • Updated Jul 26, 2025 • 3.91M • 859
merterbak/Mistral-Small-3.1-24B-Instruct-2503-GGUF Text Generation • 24B • Updated Apr 27, 2025 • 73 • 1