Large Language Models Benchmarks

Logical Intelligence Achieves 76 Percent on Putnam Benchmark, Highlighting Shift Beyond Large Language Models to Language-free, Mathematically Grounded Models

Over the last decade, artificial ...

Z.ai Open-Sources GLM-4.7, a New Generation Large Language Model Built for Real Development Workflows

Z.ai released GLM-4.7 ahead of Christmas, marking the latest iteration of its GLM large language model family. As open-source models move beyond chat-based applications and into production ...

How 2025 Recalibrated AI Models Race

In 2025, large language models moved beyond benchmarks to efficiency, reliability, and integration, reshaping how AI is ...

ZDNet

With AI models clobbering every benchmark, it's time for human evaluation

Artificial intelligence has traditionally advanced through automatic accuracy tests in tasks meant to approximate human knowledge. Carefully crafted benchmark tests such as The General Language ...

The Brighterside of News on MSN

New memory structure helps AI models think longer and faster without using more power

Researchers from the University of Edinburgh and NVIDIA have introduced a new method that helps large language models reason ...

EurekAlert!

MathEval: a comprehensive benchmark for evaluating large language models on mathematical reasoning capabilities

This study introduces MathEval, a comprehensive benchmarking framework designed to systematically evaluate the mathematical reasoning capabilities of large language models (LLMs). Addressing key ...

Hosted on MSN

Meta Platforms’ (META) New Llama 3.3 Language Model Outperforms Competitors in Industry Benchmarks

We recently compiled a list of the 11 Trending AI Stocks on Latest News and Ratings. In this article, we are going to take a look at where Meta Platforms, Inc. (NASDAQ:META) stands against the other ...

Xiaomi Unveils Fast, Low-Cost AI Model As 'Genius Girl' Researcher Outlines Next Phase Of Agent Intelligence

Since April, Xiaomi has released a series of open-source foundation models covering language, multimodal and voice ...
