LLMeBench -- Accelerating LLM Benchmarking Across Modalities and Languages
We are excited to announce a major update to LLMeBench (http://llmebench.qcri.org/), our flexible and scalable framework for benchmarking LLMs across diverse tasks, modalities, and languages. This release expands support for multimodal tasks, adds new model providers, and broadens the range of NLP evaluations covered by the framework.
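To give a feel for how a benchmark is defined, below is a minimal sketch of an LLMeBench-style asset following the config/prompt/post_process layout described in our papers. The dataset, task, and model class names used here are placeholders chosen for illustration, not actual LLMeBench identifiers; please refer to the repository for the real modules and runner.

```python
# Hypothetical sketch of an LLMeBench benchmarking asset (illustrative only).
# The dataset, task, and model-provider names below are placeholders, not
# actual LLMeBench classes; see the LLMeBench repository for real assets.

def config():
    # An asset declares which dataset, task, and model provider to combine.
    return {
        "dataset": {"dataset": "ExampleSentimentDataset", "args": {}},  # placeholder name
        "task": {"task": "ExampleClassificationTask", "args": {}},      # placeholder name
        "model": {
            "model": "ExampleOpenAIModel",                              # placeholder provider
            "args": {"max_tries": 3},
        },
    }

def prompt(input_sample):
    # Turn one dataset sample into a chat-style prompt for the model.
    return [
        {"role": "system", "content": "You are a sentiment classifier."},
        {"role": "user", "content": f"Classify the sentiment of: {input_sample}"},
    ]

def post_process(response):
    # Map the raw model response back to a label the evaluator expects.
    # Assumes an OpenAI-style chat response; adjust for other providers.
    label = response["choices"][0]["message"]["content"].strip().lower()
    return label if label in {"positive", "negative", "neutral"} else None
```

Roughly speaking, assets like this are picked up by the framework's benchmark runner, which pairs them with the declared dataset and evaluation metric, so adding a new task or provider comes down to writing a small file of this shape.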
The research community has been actively leveraging LLMeBench to assess LLM performance across different settings, and your engagement has been invaluable in shaping its development.
🔗 Read more in our papers:
📄 EACL 2024 Long Paper: https://aclanthology.org/2024.eacl-long.30/
📄 EACL 2024 Demo Paper: https://aclanthology.org/2024.eacl-demo.23/