The rapid advancement of Large Language Models (LLMs) in software engineering has revealed critical limitations in existing benchmarks, particularly the widely used SWE-bench dataset. Recent studies have uncovered severe data contamination issues; for example, SWE-bench~\cite{jimenez2023swe} reports 32.67\%...