This paper introduces \textbf{DRAGOn}, a method to design a RAG benchmark on a regularly updated corpus. It features recent reference datasets, a question generation framework, an automatic evaluation pipeline, and a public leaderboard. Specified reference datasets allow for uniform comparison of RAG ...