Active filters: grpo
snap-stanford/humanlm-opinion • Text Generation • 8B • 22 • 6
lightx2v/Wan2.1-T2V-1.3B-longcat-step1500 • Text-to-Video • 40 • 5
lightx2v/Wan2.1-T2V-1.3B-longcat-step500 • Text-to-Video • 75 • 4
lightx2v/Wan2.1-T2V-1.3B-longcat-step1000 • Text-to-Video • 13 • 3
LightningRodLabs/Golf-Forecaster • Text Generation • 19 • 3
MING-ZCH/MetaphorStar-32B • Image-Text-to-Text • 33B • 17 • 2
LightningRodLabs/Trump-Forecaster • Text Generation • 97 • 2
ericrisco/salamandra-7b-r1 • 8B • 21 • 2
almaghrabima/ALLaM-Thinking • 7B • 41 • 5
Jeremmmyyyyy/gemma-3-1b-Math • Text Generation • 1.0B • 3 • 1
Makrrr/Qwen3-1.7B-GSM8K-GRPO-verl • Reinforcement Learning • 2B • 27 • 3
Image-Text-to-Text • 65 • 4
Paulescu/LFM2-350M-browsergym-20251224-013119 • Text Generation • 0.4B • 39 • 2
Image-Text-to-Text • 4B • 16 • 1
Image-Text-to-Text • 8B • 14 • 1
Jarrodbarnes/opensec-gdpo-4b • Text Generation • 4B • 81 • 1
ragtag1/qwen3-8b-historical-final • Text Generation • 13 • 1
Text Generation • 0.1B • 4
8B
sergiopaniego/Qwen2-0.5B-GRPO-test
Novaciano/ESP-NSFW-GRPO-1B-Sin_Censura-GGUF • 1B • 69 • 4
nbd22/Llama-3.1-8B-Instruct-GRPO-gsm8k-ft-lora
sergiopaniego/Qwen2-0.5B-GRPO
philschmid/qwen-2.5-3b-r1-countdown • Text Generation • 3B • 7 • 8
spinech/qwen-2.5-3b-r1-countdown • Text Generation • 3B • 4
Dongwei/Qwen2.5-1.5B-Open-R1-GRPO • Text Generation • 2B • 1 • 1
spinech/qwen2.5-3b-r1-rearc-stage1 • Text Generation • 3B • 9