UncheatableEval: Compare and analyze AI model compression performance across different sizes and metrics