admarcosai's Collections: Efficient Training
Rethinking Optimization and Architecture for Tiny Language Models
Paper
• 2402.02791
• Published • 13
Specialized Language Models with Cheap Inference from Limited Domain Data
Paper
• 2402.01093
• Published • 47
Scavenging Hyena: Distilling Transformers into Long Convolution Models
Paper
• 2401.17574
• Published • 17
Understanding LLMs: A Comprehensive Overview from Training to Inference
Paper
• 2401.02038
• Published • 65
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
Paper
• 2312.00678
• Published • 2
TinyLlama: An Open-Source Small Language Model
Paper
• 2401.02385
• Published • 95
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper
• 2401.02954
• Published • 53
Ziya2: Data-centric Learning is All LLMs Need
Paper
• 2311.03301
• Published • 20
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
Paper
• 2401.16380
• Published • 51
Towards Optimal Learning of Language Models
Paper
• 2402.17759
• Published • 18
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper
• 2403.03507
• Published • 189
Beyond Language Models: Byte Models are Digital World Simulators
Paper
• 2402.19155
• Published • 53