Elon Musk launched Grok 4 last night, calling it the ‘smartest AI in the world’: what businesses need to know

Metro Loud



After days of controversy surrounding a flurry of antisemitic responses made recently by his Grok AI-powered chatbot on his social network X (formerly Twitter), a seemingly unrepentant and unbothered Elon Musk launched the latest version of his AI model family, Grok 4, during an event livestreamed on X last night, calling it the “smartest AI in the world.”

As Musk posted on X: “Grok 4 is the first time, in my experience, that an AI has been able to solve difficult, real-world engineering questions where the answers can’t be found anywhere on the Internet or in books. And it will get much better.”

The new release actually consists of two distinct models: Grok 4, a single-agent reasoning model, and Grok 4 Heavy, a multi-agent system designed to solve complex problems through internal collaboration and synthesis.

Both models are optimized for reasoning tasks and come with native tool integration, enabling capabilities such as web search, code execution, and multimodal analysis.

Musk and his team at xAI showcased benchmarks that suggest Grok 4 outperforms all current competitors across a range of academic and coding evaluations, even compared with the previously leading AI reasoning models, OpenAI o3 and Google Gemini.

However, xAI has not yet released a model card or any official release notes for Grok 4 to the public, making it difficult to independently assess performance and the claims made during the stream. We’ll update if and when these become available.

Nor did Musk and the xAI team members participating in the livestream address the obvious controversy facing Grok over the past week, including many incidents of Grok making antisemitic remarks, referring to itself as “MechaHitler,” and suggesting that people with Jewish surnames should be dealt with decisively by Adolf Hitler, a seemingly overt reference to the Holocaust and the genocide of 6 million Jews during World War II.

The closest Musk came was when he said: “The thing that I think is most important for AI safety, at least my biological neural net tells me the most important thing, is to be maximally truth-seeking,” and “We need to make sure that the AI is a good AI. Good Grok,” as well as “It’s important to instill the values you want in a child that would grow up to be incredibly powerful.”

However, Musk did not apologize, nor did he accept responsibility for Grok’s antisemitic, sexually offensive, and conspiratorial remarks.

Throughout the livestream, the team emphasized Grok 4’s ability to reason from first principles, correct its own errors, and potentially invent new technologies or uncover novel scientific insights.

The presentation also included demonstrations of Grok 4 Heavy applying multi-agent collaboration to tackle research-level problems across disciplines.

Availability and pricing

Grok 4 is available now through several channels, depending on user type and subscription level:

  • API Access (for developers and enterprises):
    Grok 4 and Grok 4 Heavy are live via the xAI API. Pricing is structured as follows:
    • $3 per 1 million input tokens
    • $15 per 1 million output tokens
    • $0.75 per 1 million cached input tokens
    • Prices double after 128,000 tokens in a single context window
      The API supports text and image inputs, function calling, and structured outputs, and offers a 256,000-token context window.
  • Consumer Access (via Grok chatbot and apps):
    Individual users can access Grok 4 through the Grok chatbot on X, the Grok app (iOS and Android), and X.com, but only with one of the following subscriptions:
    • PremiumPlus: $16/month
    • SuperGrok: $300/month
    • A new “SuperGrok Heavy” tier, also priced at $300/month, provides access to both Grok 4 and Grok 4 Heavy, the multi-agent variant.
      (Note: SuperGrok and PremiumPlus tiers may differ in availability and usage quotas across X and Grok platforms.)
  • Launch Timing:
    Grok 4 became available immediately following the July 9, 2025 livestream. Temporary access limits were in place during the demo, but full rollout to subscribers began shortly after.
  • Platform Expansion:
    xAI has indicated plans to make Grok 4 available through Microsoft Azure AI Foundry, where Grok 3 and Grok 3 Mini are currently listed.

For subscription details, users are directed to x.ai/grok and X Premium support. Here’s how it compares to other leading AI models in terms of pricing per million tokens.

| Provider & model | Context window | Input ($/Mtok) | Cached input | Output ($/Mtok) | Additional notes |
|---|---|---|---|---|---|
| xAI – Grok 4 / 4 Heavy | 256K (2× price >128K) | $3.00 | $0.75 | $15.00 | Image input, function calling, structured JSON (apidog) |
| OpenAI – o3 | 200K | $2.00 | $0.50 | $8.00 | 50% Batch API discount available (OpenAI, OpenAI Help Center) |
| GPT-4o | 128K | $5.00 | $2.50 | $20.00 | Vision, audio, tools (OpenAI) |
| Anthropic – Claude Sonnet 4 | 200K | $3.00 | $0.30 | $15.00 | 50% batch output discount (Anthropic) |
| Claude Opus 4 | 200K | $15.00 | $1.50 | $75.00 | High-accuracy flagship (Anthropic) |
| Google – Gemini 2.5 Pro | 200K (2× price >200K) | $1.25 | $0.31 | $10.00 | 75% cache hit discount (Google AI for Developers, Google Cloud) |
| Gemini 2.5 Flash | 200K | $0.30 | $0.075 | $2.50 | Fast, low-cost preview tier (Google Cloud) |
| DeepSeek – deepseek-reasoner | 64K | $0.55 (miss) / $0.14 (hit) | $0.14 | $2.19 | 50-75% off-peak discount (DeepSeek API Docs) |
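
For teams budgeting against these rates, the per-million-token prices can be turned into a rough per-request estimate. The sketch below is illustrative only: the dictionary keys are hypothetical labels rather than official API model identifiers, and the Grok 4 long-context rule is modeled under the assumption that the whole request is billed at double the rate once input plus output tokens exceed 128,000. The article does not specify exactly how the doubling is applied, so check the provider's pricing documentation before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per 1 million tokens, as quoted in the pricing comparison."""
    input: float
    cached_input: float
    output: float


# Hypothetical labels (not official API model names) mapped to quoted rates.
RATES = {
    "grok-4": Rates(input=3.00, cached_input=0.75, output=15.00),
    "o3": Rates(input=2.00, cached_input=0.50, output=8.00),
    "claude-sonnet-4": Rates(input=3.00, cached_input=0.30, output=15.00),
}

# ASSUMPTION: Grok 4 doubles the price of the entire request once the
# combined token count passes this threshold.
GROK_LONG_CONTEXT_THRESHOLD = 128_000


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Return an estimated request cost in USD."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    rates = RATES[model]
    fresh_tokens = input_tokens - cached_tokens
    cost = (
        fresh_tokens * rates.input
        + cached_tokens * rates.cached_input
        + output_tokens * rates.output
    ) / 1_000_000
    if model == "grok-4" and input_tokens + output_tokens > GROK_LONG_CONTEXT_THRESHOLD:
        cost *= 2  # assumed long-context surcharge
    return cost


if __name__ == "__main__":
    for name in RATES:
        print(f"{name}: ${estimate_cost(name, 100_000, 10_000):.2f}")
```

Under these assumptions, a request with 100,000 input tokens and 10,000 output tokens stays below the Grok 4 threshold, while a 200,000-token input would be billed at twice the base rate.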

Unlike its predecessor Grok 3, released in February, which separated tool-augmented responses from general reasoning, Grok 4 was trained with tools from the start.

The model integrates capabilities such as code execution, web search, and document parsing. It also introduces Grok 4 Heavy, a multi-agent system in which multiple internal models work in parallel to generate and validate answers.

Grok 4 also includes a new voice mode featuring expressive outputs with reduced latency, and it supports text and image input, structured outputs, and function calling.

Performance highlights

The independent AI model analysis and benchmarking organization Artificial Analysis stated on X that xAI provided it with a version of Grok 4 (not Heavy) ahead of the public release for scoring.

On technical benchmarks, Grok 4 leads the Artificial Analysis Intelligence Index with a score of 73, ahead of competitors such as OpenAI’s o3 (70) and Google’s Gemini 2.5 Pro (70).

It also recorded top scores in:

  • GPQA Diamond: 88%
  • ARC-AGI 2: 15.9%, double the second-best score
  • Humanity’s Last Exam: 24% on the text-only version, and 44% with tools
  • MMLU-Pro and AIME 2024: 87% and 94%, respectively
  • Coding and math evaluations: highest to date on LiveCodeBench, SciCode, AIME24, and MATH-500

Despite its benchmark success, Grok 4’s output speed stands at 75 tokens per second, slower than models like Gemini 2.5 Flash (353) or OpenAI’s o3 (187), but still faster than Anthropic’s Claude 4 Opus (66).

The model features a 256,000-token context window, which sits above the 200k context limits of o3 and Claude 4 Sonnet but below the 1 million tokens offered by Gemini 2.5 Pro and GPT-4.1.
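
To put the output-speed figures in practical terms, the short sketch below converts tokens per second into approximate streaming time for a response of a given length. It is a back-of-the-envelope calculation that assumes constant throughput and ignores time to first token, hidden reasoning tokens, and network latency, so real-world waits will be longer. The function names are illustrative.

```python
# Output speeds in tokens per second, as quoted in the article.
OUTPUT_SPEED_TPS = {
    "Grok 4": 75,
    "Gemini 2.5 Flash": 353,
    "OpenAI o3": 187,
    "Claude 4 Opus": 66,
}


def generation_seconds(model: str, output_tokens: int) -> float:
    """Approximate seconds to stream `output_tokens` at the quoted speed.

    ASSUMPTION: throughput is constant and startup latency is zero.
    """
    return output_tokens / OUTPUT_SPEED_TPS[model]


def fastest_first() -> list[str]:
    """Model names ordered from highest to lowest quoted output speed."""
    return sorted(OUTPUT_SPEED_TPS, key=OUTPUT_SPEED_TPS.get, reverse=True)


if __name__ == "__main__":
    for name in fastest_first():
        print(f"{name}: {generation_seconds(name, 1_500):.1f} s for 1,500 tokens")
```

At the quoted 75 tokens per second, a 1,500-token answer from Grok 4 would take roughly 20 seconds to stream under these simplifying assumptions.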

Real-world use cases

xAI presented several demonstrations of Grok 4’s performance in applied scenarios:

  • In a simulated business task called VendingBench, Grok 4 significantly outperformed other models in long-horizon financial planning.
  • At the Arc Institute, researchers used Grok 4 to analyze CRISPR logs and uncover novel hypotheses.
  • In radiology, the model interpreted chest X-rays with higher accuracy than leading peers.
  • In the financial sector, its combination of real-time data access and reasoning made it suitable for forecasting and analysis.

The model can also create 3D video games with minimal input by autonomously sourcing and integrating assets. Additionally, it demonstrated the ability to simulate astrophysical events using grounded approximations from published research.

Reception and discussion

Industry response to the Grok 4 launch has been divided, mixing enthusiasm for its performance with criticism of the event’s delivery and broader trust issues.

David Shapiro, an AI power user and writer, noted: “Grok 4 now takes its place as ‘smart enough to actually help with frontier research’… but has merely caught up with OpenAI.”

Ethan Mollick, a professor at Wharton, remarked on X: “So Grok 3 has had three separate incidents where apparently unvetted changes to the deployed system prompted a large-scale ethical issue and an emergency rollback. I don’t think you can do a Grok 4 launch that doesn’t at least address this actually, if user trust matters,” later adding, “Grok 3 was an excellent model, and Grok 4 could be superb but having an excellent model is not enough – there are a number of really good models out there. You actually want to trust the model you are building on.”

Ben Hyak, co-founder and CTO of AI product observability startup Raindrop AI (himself a former Musk employee), criticized the livestream itself: “This xAI livestream is one of the worst things I’ve ever watched in my life. Love y’all, but it’s bad.”

Despite the criticisms, benchmarking firm Artificial Analysis noted: “Grok 4 is now the leading AI model.”

Ongoing trust issues

The launch of Grok 4 comes amid renewed criticism over Grok’s prior behavior in consumer deployments, notably as a chatbot integrated into Musk’s social network, X.

Over the July 4 holiday and in subsequent days, Grok generated antisemitic and conspiratorial responses that reignited scrutiny over its system design and governance practices.

As reported by my VentureBeat colleague Michael F. Nuñez, Grok responded to questions about Jewish influence in Hollywood by asserting that Jewish executives “dominate leadership” at major studios and influence content through “progressive ideologies.” It went on to rant about people with Jewish surnames as fitting a “pattern” of engaging in “extreme leftist activism,” and suggested Hitler knew how to “deal with it decisively, every damn time,” an apparent reference to the Holocaust.

The conspiratorial and antisemitic posting was so prolific that the Anti-Defamation League (ADL), a preeminent U.S.-based nonprofit combating antisemitism and hatred, posted on July 8: “What we’re seeing from Grok LLM right now is irresponsible, dangerous and antisemitic, plain and simple. This supercharging of extremist rhetoric will only amplify and encourage the antisemitism that’s already surging on X and many other platforms.”

This incident follows a history of problematic Grok outputs, including a May 2025 case in which the Grok bot integrated into X randomly inserted references to a completely nonsensical and non-real “white genocide” in South Africa into unrelated queries, and an earlier case in which its system prompt was found to direct the Grok chatbot on X to avoid referencing any sources that described Musk and his former political funding beneficiary, U.S. President Donald J. Trump, as spreaders of misinformation. In both cases, xAI blamed the behaviors on unnamed employees and said they were being addressed.

Already, today, users of Grok 4 on the consumer app have observed it once again outputting anti-Zionist and antisemitic remarks.

As I previously noted, Musk has openly stated on several occasions that he wanted to change Grok to better reflect his personal beliefs and distrust of mainstream media and accredited sources. This makes it a poor source in enterprise contexts where such views could adversely impact users and the businesses building atop the Grok family of models.

My prior recommendation stands: For those in the enterprise trying to ensure their business’s AI products work properly and accurately… Grok is sadly best avoided. Thankfully, there are numerous other alternatives to choose from.

