Jenny Kunz
jekunz
AI & ML interests
Explainability and interpretability of NLP models, language adaptation, PEFT methods
Recent Activity
Authored a paper 7 days ago: Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures
Authored a paper 7 days ago: A Dataset for Probing Translationese Preferences in English-to-Swedish Translation
Updated a dataset 7 days ago: liu-nlp/translationese-opensubtitles
Adaptation of SmolLM to Faroese
All datasets and models created for the paper "Family Matters: Language Transfer and Merging for Adapting Small LLMs to Faroese".
SmolLM CPT LoRA
Idiomatic Language Acquisition
Models associated with the paper "Preferences for Idiomatic Language are Acquired Slowly -- and Forgotten Quickly: A Case Study on Swedish".
jekunz/smollm-135m-cpt-fineweb-swedish-smol-smoltalk
Text Generation • 0.1B • Updated • 10
jekunz/smollm-135m-fineweb-swedish-from-scratch-smol-smoltalk
Text Generation • 0.1B • Updated • 10
jekunz/smollm-135m-cpt-fineweb-swedish
Text Generation • 0.1B • Updated • 11
jekunz/smollm-135m-fineweb-swedish-from-scratch
Text Generation • 0.1B • Updated • 12
SmolLM baselines trained from scratch
SmolLM CPT
Continued Pre-Training of SmolLM models on the Fineweb-2 portions of Scandinavian languages.
jekunz/smollm-135m-cpt-fineweb-faroese
Text Generation • 0.1B • Updated • 1
jekunz/smollm-135m-cpt-fineweb-icelandic
Text Generation • 0.1B • Updated • 2
jekunz/smollm-135m-cpt-fineweb-swedish
Text Generation • 0.1B • Updated • 11
jekunz/smollm-135m-cpt-fineweb-faroese-transfer-from-icelandic
Text Generation • 0.1B • Updated • 3
Translationese English Models
Translationese English Goldfish-style models, with training data machine-translated from different source languages