Appear2Meaning: A Cross-Cultural Benchmark for Structured Cultural Metadata Inference from Images Paper • 2604.07338 • Published 9 days ago • 5
MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation Paper • 2506.14028 • Published Jun 16, 2025 • 94
All That Glisters Is Not Gold: A Benchmark for Reference-Free Counterfactual Financial Misinformation Detection Paper • 2601.04160 • Published Jan 7 • 4
Same Claim, Different Judgment: Benchmarking Scenario-Induced Bias in Multilingual Financial Misinformation Detection Paper • 2601.05403 • Published Jan 8 • 11
The FinBen: An Holistic Financial Benchmark for Large Language Models Paper • 2402.12659 • Published Feb 20, 2024 • 24
FinMem: A Performance-Enhanced LLM Trading Agent with Layered Memory and Character Design Paper • 2311.13743 • Published Nov 23, 2023 • 2
Can LLM Agents Be CFOs? A Benchmark for Resource Allocation in Dynamic Enterprise Environments Paper • 2603.23638 • Published 23 days ago • 11
Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation Paper • 2602.16990 • Published Feb 19 • 11