Paper: Self-Exploring Language Models: Active Preference Elicitation for Online Alignment (arXiv:2405.19332)
Quantization made by Richard Erkhov.
SELM-Llama-3-8B-Instruct-iter-1 - GGUF
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment.
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct, trained on synthetic data based on the HuggingFaceH4/ultrafeedback_binarized dataset.
| Model | AlpacaEval 2.0 (LC WR) | MT-Bench (Average) |
|---|---|---|
| SELM-Llama-3-8B-Instruct-iter-3 | 33.47 | 8.29 |
| SELM-Llama-3-8B-Instruct-iter-2 | 35.65 | 8.09 |
| SELM-Llama-3-8B-Instruct-iter-1 | 32.02 | 7.92 |
| Meta-Llama-3-8B-Instruct | 24.31 | 7.93 |
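To make the comparison in the table easier to read, the following minimal sketch computes each SELM iteration's score difference relative to the Meta-Llama-3-8B-Instruct baseline. The scores are copied from the table; the helper name `deltas_vs_baseline` and the dictionary layout are illustrative assumptions, not part of the model release.

```python
# Scores copied from the table above: (AlpacaEval 2.0 LC WR, MT-Bench average).
scores = {
    "SELM-Llama-3-8B-Instruct-iter-3": (33.47, 8.29),
    "SELM-Llama-3-8B-Instruct-iter-2": (35.65, 8.09),
    "SELM-Llama-3-8B-Instruct-iter-1": (32.02, 7.92),
    "Meta-Llama-3-8B-Instruct": (24.31, 7.93),
}
BASELINE = "Meta-Llama-3-8B-Instruct"


def deltas_vs_baseline(scores, baseline):
    """Return (LC WR delta, MT-Bench delta) for every non-baseline model.

    Hypothetical helper for illustration only.
    """
    base_lc, base_mt = scores[baseline]
    return {
        name: (round(lc - base_lc, 2), round(mt - base_mt, 2))
        for name, (lc, mt) in scores.items()
        if name != baseline
    }


deltas = deltas_vs_baseline(scores, BASELINE)
for name, (d_lc, d_mt) in deltas.items():
    print(f"{name}: LC WR {d_lc:+.2f}, MT-Bench {d_mt:+.2f}")
```

For iter-1, the model quantized here, this gives a 7.71-point gain in AlpacaEval 2.0 LC win rate, while the MT-Bench average is essentially unchanged (a difference of -0.01).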
The following hyperparameters were used during training:
GGUF quantization bit widths: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit.