

Weida Wang

weidawang


https://davidweidawang.github.io/

davidweida

AI & ML interests

None yet

Recent Activity

authored a paper about 18 hours ago

PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks




upvoted a paper about 23 hours ago

PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks

Paper • 2602.06663 • Published 4 days ago • 5

upvoted a paper 11 days ago

AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security

Paper • 2601.18491 • Published 15 days ago • 122

upvoted a paper about 1 month ago

SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence

Paper • 2512.22334 • Published Dec 26, 2025 • 35

upvoted 2 papers about 2 months ago

Memory in the Age of AI Agents

Paper • 2512.13564 • Published Dec 15, 2025 • 151

Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics

Paper • 2512.12602 • Published Dec 14, 2025 • 44

upvoted 3 papers 3 months ago

P1: Mastering Physics Olympiads with Reinforcement Learning

Paper • 2511.13612 • Published Nov 17, 2025 • 134

Scaling Agent Learning via Experience Synthesis

Paper • 2511.03773 • Published Nov 5, 2025 • 82

Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning

Paper • 2510.25992 • Published Oct 29, 2025 • 48

upvoted 7 papers 4 months ago

Chem-R: Learning to Reason as a Chemist

Paper • 2510.16880 • Published Oct 19, 2025 • 53

Glyph: Scaling Context Windows via Visual-Text Compression

Paper • 2510.17800 • Published Oct 20, 2025 • 68

DeepAnalyze: Agentic Large Language Models for Autonomous Data Science

Paper • 2510.16872 • Published Oct 19, 2025 • 109

A Survey of Reinforcement Learning for Large Reasoning Models

Paper • 2509.08827 • Published Sep 10, 2025 • 190

From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning

Paper • 2509.23768 • Published Sep 28, 2025 • 49

OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always!

Paper • 2509.26495 • Published Sep 30, 2025 • 12

MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use

Paper • 2509.24002 • Published Sep 28, 2025 • 175

upvoted 2 papers 5 months ago

MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources

Paper • 2509.21268 • Published Sep 25, 2025 • 104

HiPhO: How Far Are (M)LLMs from Humans in the Latest High School Physics Olympiad Benchmark?

Paper • 2509.07894 • Published Sep 9, 2025 • 31

upvoted 3 papers 6 months ago

CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics

Paper • 2508.18124 • Published Aug 25, 2025 • 49

SSRL: Self-Search Reinforcement Learning

Paper • 2508.10874 • Published Aug 14, 2025 • 97

IAG: Input-aware Backdoor Attack on VLMs for Visual Grounding

Paper • 2508.09456 • Published Aug 13, 2025 • 8