Papers - Fine-tuning
• Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning (arXiv:2310.20587)
• SELF: Language-Driven Self-Evolution for Large Language Model (arXiv:2310.00533)
• QLoRA: Efficient Finetuning of Quantized LLMs (arXiv:2305.14314)
• QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models (arXiv:2309.14717)
• Table-GPT: Table-tuned GPT for Diverse Table Tasks (arXiv:2310.09263)
• Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models (arXiv:2401.01335)
• LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement (arXiv:2403.15042)
• Toolformer: Language Models Can Teach Themselves to Use Tools (arXiv:2302.04761)
• The Unreasonable Ineffectiveness of the Deeper Layers (arXiv:2403.17887)
• InternLM2 Technical Report (arXiv:2403.17297)
• LIMA: Less Is More for Alignment (arXiv:2305.11206)
• Direct Preference Optimization: Your Language Model is Secretly a Reward Model (arXiv:2305.18290)
• sDPO: Don't Use Your Data All at Once (arXiv:2403.19270)
• Deep reinforcement learning from human preferences (arXiv:1706.03741)
• Fine-tuning Language Models for Factuality (arXiv:2311.08401)
• An Emulator for Fine-Tuning Large Language Models using Small Language Models (arXiv:2310.12962)
• Gecko: Versatile Text Embeddings Distilled from Large Language Models (arXiv:2403.20327)
• Model Stock: All we need is just a few fine-tuned models (arXiv:2403.19522)
• ReFT: Representation Finetuning for Language Models (arXiv:2404.03592)
• UltraFeedback: Boosting Language Models with High-quality Feedback (arXiv:2310.01377)
• RL for Consistency Models: Faster Reward Guided Text-to-Image Generation (arXiv:2404.03673)
• Stream of Search (SoS): Learning to Search in Language (arXiv:2404.03683)
• CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues (arXiv:2404.03820)
• ORPO: Monolithic Preference Optimization without Reference Model (arXiv:2403.07691)
• Learn Your Reference Model for Real Good Alignment (arXiv:2404.09656)
• Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking (arXiv:2402.14811)
• Comprehensive Survey of Model Compression and Speed up for Vision Transformers (arXiv:2404.10407)
• OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data (arXiv:2404.12195)
• Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning (arXiv:2303.15647)
• Hyper-X: A Unified Hypernetwork for Multi-Task Multilingual Transfer (arXiv:2205.12148)
• Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models (arXiv:2406.15718)
• In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering (arXiv:2311.06668)
• SpreadsheetLLM: Encoding Spreadsheets for Large Language Models (arXiv:2407.09025)
• LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models (arXiv:2403.13372)
• Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation (arXiv:2411.00412)
• CLEAR: Character Unlearning in Textual and Visual Modalities (arXiv:2410.18057)
• LoRA vs Full Fine-tuning: An Illusion of Equivalence (arXiv:2410.21228)
• Cut Your Losses in Large-Vocabulary Language Models (arXiv:2411.09009)
• LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models (arXiv:2411.09595)
• No More Adam: Learning Rate Scaling at Initialization is All You Need (arXiv:2412.11768)
• Group Robust Preference Optimization in Reward-free RLHF (arXiv:2405.20304)