
Vector Institute Unveils Comprehensive Evaluation of Leading AI Models

At a glance:

  • Canada’s Vector Institute has assessed 11 leading AI models from around the world, using 16 performance benchmarks, including those pioneered by Vector researchers.
  • The State of Evaluation study marks the first time that both open- and closed-source models have been evaluated against an expanded suite of benchmarks, revealing leaders and laggards in model performance.
  • The independent results can help organizations develop, deploy, and apply AI safely and responsibly.
  • In a first for this kind of research, Vector has shared the benchmarks, underlying code, and results as open source to foster the accountability, transparency, and collaboration that build trust in AI.

TORONTO, April 10, 2025 (GLOBE NEWSWIRE) -- Canada’s Vector Institute has unveiled the results of its independent evaluation of leading large language models (LLMs), offering an objective look at how prominent frontier AI models perform against a comprehensive suite of benchmarks. The study, summarized in a new article on its website, assesses capabilities in increasingly complex tests of general knowledge, coding, cyber-safety, and other critical areas, providing key insights into the strengths and limitations of top AI models.

AI companies are releasing new and more powerful LLMs at an unprecedented pace, with each new model promising greater capabilities, from more human-like text generation to advanced problem-solving and decision-making. Developing widely used and trusted benchmarks advances AI safety; it helps researchers, developers, and users understand how these models perform in terms of accuracy, reliability, and fairness, enabling their responsible deployment.

In its State of Evaluation study, Vector’s AI Engineering team assessed 11 leading LLMs from around the world, including both publicly available (‘open’) models such as DeepSeek-R1 and Cohere’s Command R+, and commercial (‘closed’) models such as OpenAI’s GPT-4o and Gemini 1.5 from Google. Each model was tested against 16 performance benchmarks, making this one of the most comprehensive independent evaluations conducted to date.

“Independent, objective evaluation of this kind is vital to understanding how models perform in terms of accuracy, reliability, and fairness,” explains Deval Pandya, Vector’s Vice President of AI Engineering. “Robust benchmarks and accessible evaluations enable researchers, organizations, and policymakers to better understand the strengths, weaknesses, and real-world impact of these rapidly evolving, highly capable AI models and systems, and ultimately to foster trust in AI.”

In a first for this kind of research, Vector has shared the results of the study, the benchmarks, and the underlying code in an open-sourced, interactive leaderboard to promote transparency and foster advances in AI innovation. “Researchers, developers, regulators, and end-users can independently verify results, compare model performance, and build out their own benchmarks and evaluations to drive improvements and accountability,” says John Willes, Vector's AI Infrastructure and Research Engineering Manager, who led the project.
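To illustrate the kind of comparison an open leaderboard of this sort enables, here is a minimal Python sketch that ranks models by their mean score across benchmarks. All model names and scores below are hypothetical placeholders, not figures from Vector's study, and the averaging scheme is a simplification for illustration only:

```python
# Minimal sketch: ranking models by mean score across benchmarks.
# All model names and scores are hypothetical placeholders, not
# results from Vector's State of Evaluation study.

from statistics import mean

# scores[model][benchmark] = accuracy in [0, 1] (hypothetical values)
scores = {
    "model-a": {"general_knowledge": 0.72, "coding": 0.65, "cyber_safety": 0.80},
    "model-b": {"general_knowledge": 0.68, "coding": 0.71, "cyber_safety": 0.77},
    "model-c": {"general_knowledge": 0.60, "coding": 0.58, "cyber_safety": 0.85},
}

def leaderboard(scores):
    """Return (model, mean_score) pairs, sorted best-first."""
    ranked = [(model, mean(per_bench.values()))
              for model, per_bench in scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

for model, avg in leaderboard(scores):
    print(f"{model}: {avg:.3f}")
```

Real evaluation suites typically report per-benchmark scores rather than a single average, since a model strong in coding may lag in safety tests; publishing the underlying code, as Vector has done, lets users apply whatever aggregation suits their needs.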

The project is a natural extension of Vector’s leadership in developing the benchmarks now used widely across the global AI safety community, including MMLU-Pro, MMMU, and OS-World, which were developed by Vector Institute Faculty Members and Canada CIFAR AI Chairs Wenhu Chen and Victor Zhong. It also builds on recent work by Vector’s AI Engineering team to develop Inspect Evals — an open-source AI safety testing platform created in collaboration with the UK AI Security Institute to standardize global safety evaluations and facilitate collaboration among researchers and developers.

“As organizations seek to unlock the transformative benefits of AI, Vector is in a unique position to provide independent, trusted expertise that enables them to do so safely and responsibly,” explains Pandya, citing the institute’s programs in which its industry partners collaborate with expert researchers at the forefront of AI safety and application. “Whether they’re in financial services, technology innovation, health, or other sectors, our industry partners have access to Vector’s unparalleled sandbox environment, where they can experiment and test models and techniques to help address their specific AI-related business challenges.”

  • Read more about Vector Institute’s “State of Evaluation” here.
  • Explore the interactive leaderboard here.

About Vector Institute: The Vector Institute is an independent, not-for-profit corporation dedicated to advancing artificial intelligence, excelling in machine learning and deep learning. Our vision is to drive excellence and leadership in Canada’s knowledge, creation, and use of AI to foster economic growth and improve the lives of Canadians. The Vector Institute is funded by the Province of Ontario, the Government of Canada through the CIFAR Pan-Canadian AI Strategy, and industry sponsors across Canada.

For further information or media enquiries, please contact: media@vectorinstitute.ai


