OpenAI releases GPT-5.2 after “code red” Google threat alert

Metro Loud
In trying to keep up with (or ahead of) the competition, model releases continue at a steady clip: GPT-5.2 represents OpenAI’s third major model release since August. GPT-5 launched that month with a new routing system that toggles between instant-response and simulated reasoning modes, though users complained about responses that felt cold and clinical. November’s GPT-5.1 update added eight preset “persona” options and focused on making the system more conversational.

Numbers go up

Oddly, although the GPT-5.2 model launch is ostensibly a response to Gemini 3’s performance, OpenAI chose not to list any benchmarks on its promotional website comparing the two models. Instead, the official blog post focuses on GPT-5.2’s improvements over its predecessors and its performance on OpenAI’s new GDPval benchmark, which attempts to measure professional knowledge work tasks across 44 occupations.

During the press briefing, OpenAI did share some competitive comparison benchmarks that included Gemini 3 Pro and Claude Opus 4.5 but pushed back on the narrative that GPT-5.2 was rushed to market in response to Google. “It is important to note this has been in the works for many, many months,” Simo told reporters, although choosing when to launch it, we’ll note, is a strategic decision.

According to the shared numbers, GPT-5.2 Thinking scored 55.6 percent on SWE-Bench Pro, a software engineering benchmark, compared to 43.3 percent for Gemini 3 Pro and 52.0 percent for Claude Opus 4.5. On GPQA Diamond, a graduate-level science benchmark, GPT-5.2 scored 92.4 percent versus Gemini 3 Pro’s 91.9 percent.

GPT-5.2 benchmarks that OpenAI shared with the press. Credit: OpenAI / VentureBeat

OpenAI says GPT-5.2 Thinking beats or ties “human professionals” on 70.9 percent of tasks in the GDPval benchmark (compared to 53.3 percent for Gemini 3 Pro). The company also claims the model completes these tasks at more than 11 times the speed and less than 1 percent of the cost of human experts.

GPT-5.2 Thinking also reportedly generates responses with 38 percent fewer confabulations than GPT-5.1, according to Max Schwarzer, OpenAI’s post-training lead, who told VentureBeat that the model “hallucinates significantly less” than its predecessor.

However, we always take benchmarks with a grain of salt because it’s easy to present them in a way that’s favorable to a company, especially when the science of measuring AI performance objectively hasn’t quite caught up with corporate sales pitches for humanlike AI capabilities.

Independent benchmark results from researchers outside OpenAI will take time to arrive. In the meantime, if you use ChatGPT for work tasks, expect competent models with incremental improvements and some better coding performance thrown in for good measure.