Benchmark Your LLM Against Korea’s Most Challenging Exam!

Are you ready to put your LLM to the ultimate test? The Korean SAT, Korea’s notoriously tough college entrance exam, now has a leaderboard where you can compare your model’s performance against real student scores, using the same grading system applied to human test-takers!

Also, OpenAI’s o1-preview achieved Grade 1, the top grade band, on the Korean SAT! (Top 4%!!)

🤷 What makes this leaderboard special?

  • It uses the same evaluation method that the Korean SAT grading system applies to human test-takers.
  • You’ll get a real sense of how your LLM stands up against the challenges that Korean students face.
  • Compare your model's score to the top-performing students aiming for Korea’s most prestigious universities!
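
To make the grading idea concrete, here is a minimal sketch of how a "top X%" standing maps to a Korean SAT grade band. It assumes the standard nine-band relative grading (cumulative cutoffs at the top 4%, 11%, 23%, 40%, 60%, 77%, 89%, 96%, and 100%). The function name `grade_from_top_percent` is made up for illustration, and the leaderboard's real implementation may work differently, for example from per-year score cutoffs.

```python
from bisect import bisect_left

# Cumulative "top X%" cutoffs for grades 1..9 (assumed standard nine-band scheme).
GRADE_CUTOFFS = [4, 11, 23, 40, 60, 77, 89, 96, 100]


def grade_from_top_percent(top_percent: float) -> int:
    """Return the grade band (1 = best, 9 = worst) for a 'top X%' standing.

    Hypothetical helper: a standing exactly on a cutoff (e.g. top 4%)
    is counted in the better band.
    """
    if not 0 <= top_percent <= 100:
        raise ValueError("top_percent must be between 0 and 100")
    # bisect_left finds the first cutoff that is >= top_percent.
    return bisect_left(GRADE_CUTOFFS, top_percent) + 1


if __name__ == "__main__":
    for standing in (3.5, 4, 10, 50, 99):
        print(f"top {standing}% -> grade {grade_from_top_percent(standing)}")
```

Under these assumptions, a model in the top 4% lands in Grade 1, while one in the top 10% would be Grade 2.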

😆 Why is this exciting?

  • You’ll be able to see where your model ranks and even compare it to human performance!
  • From an LLM benchmarking perspective, the diverse range of fields and genres in this dataset provides a comprehensive evaluation of the model's ability to understand, reason, and critically assess information across multiple domains.

Join the challenge! Submit your LLM, see how it scores, and compare it to the results of real students. Can your model get into a top Korean university?

https://github.com/minsing-jin/Korean-SAT-LLM-Leaderboard

Note:

This Korean SAT benchmarking system is powered by [AutoRAG](https://github.com/Marker-Inc-Korea/AutoRAG). (AutoRAG is an automatic RAG optimization tool that can also be used for LLM performance comparison and prompt engineering.)

https://preview.redd.it/8dt551nbg2vd1.png?width=2138&format=png&auto=webp&s=16fac79a214342d06fa05bea82eda6afa15c89ec

https://preview.redd.it/d5ysyhkfg2vd1.png?width=2028&format=png&auto=webp&s=93e777146bb194c6c58c2d2565825bdbc57c4b21