Be part of the occasion trusted by enterprise leaders for almost 20 years. VB Rework brings collectively the folks constructing actual enterprise AI technique. Study extra
Chinese language AI startup MiniMax, maybe finest recognized within the West for its hit sensible AI video mannequin Hailuo, has launched its newest massive language mannequin, MiniMax-M1 — and in nice information for enterprises and builders, it’s utterly open supply beneath an Apache 2.0 license, that means companies can take it and use it for industrial functions and modify it to their liking with out restriction or fee.
M1 is an open-weight providing that units new requirements in long-context reasoning, agentic instrument use, and environment friendly compute efficiency. It’s obtainable right now on the AI code sharing neighborhood Hugging Face and Microsoft’s rival code sharing neighborhood GitHub, the primary launch of what the corporate dubbed as “MiniMaxWeek” from its social account on X — with additional product bulletins anticipated.
MiniMax-M1 distinguishes itself with a context window of 1 million enter tokens and as much as 80,000 tokens in output, positioning it as probably the most expansive fashions obtainable for long-context reasoning duties.
The “context window” in massive language fashions (LLMs) refers back to the most variety of tokens the mannequin can course of at one time — together with each enter and output. Tokens are the essential models of textual content, which can embody complete phrases, components of phrases, punctuation marks, or code symbols. These tokens are transformed into numerical vectors that the mannequin makes use of to characterize and manipulate that means by its parameters (weights and biases). They’re, in essence, the LLM’s native language.
For comparability, OpenAI’s GPT-4o has a context window of solely 128,000 tokens — sufficient to trade a few novel’s value of knowledge between the consumer and the mannequin in a single forwards and backwards interplay. At 1 million tokens, MiniMax-M1 may trade a small assortment or e book collection’ value of knowledge. Google Gemini 2.5 Professional provides a token context higher restrict of 1 million, as effectively, with a reported 2 million window within the works.
However M1 has one other trick up its sleeve: it’s been educated utilizing reinforcement studying in an modern, resourceful, extremely environment friendly approach. The mannequin is educated utilizing a hybrid Combination-of-Consultants (MoE) structure with a lightning consideration mechanism designed to cut back inference prices.
In line with the technical report, MiniMax-M1 consumes solely 25% of the floating level operations (FLOPs) required by DeepSeek R1 at a era size of 100,000 tokens.
Structure and variants
The mannequin is available in two variants—MiniMax-M1-40k and MiniMax-M1-80k—referring to their “pondering budgets” or output lengths.
The structure is constructed on the corporate’s earlier MiniMax-Textual content-01 basis and consists of 456 billion parameters, with 45.9 billion activated per token.
A standout characteristic of the discharge is the mannequin’s coaching price. MiniMax stories that the M1 mannequin was educated utilizing large-scale reinforcement studying (RL) at an effectivity hardly ever seen on this area, with a complete price of $534,700.
This effectivity is credited to a customized RL algorithm known as CISPO, which clips significance sampling weights relatively than token updates, and to the hybrid consideration design that helps streamline scaling.
That’s an astonishingly “low cost” quantity for a frontier LLM, as DeepSeek educated its hit R1 reasoning mannequin at a reported price of $5-$6 million, whereas the coaching price of OpenAIs’ GPT-4 — a greater than two-year-old mannequin now — was mentioned to exceed $100 million. This price comes from each the value of graphics processing models (GPUs), the massively parallel computing {hardware} primarily manufactured by corporations like Nvidia, which might price $20,000–$30,000 or extra per module, and from the vitality required to run these chips repeatedly in large-scale knowledge facilities.
Benchmark efficiency
MiniMax-M1 has been evaluated throughout a collection of established benchmarks that check superior reasoning, software program engineering, and tool-use capabilities.
On AIME 2024, a arithmetic competitors benchmark, the M1-80k mannequin scores 86.0% accuracy. It additionally delivers robust efficiency in coding and long-context duties, attaining:
- 65.0% on LiveCodeBench
- 56.0% on SWE-bench Verified
- 62.8% on TAU-bench
- 73.4% on OpenAI MRCR (4-needle model)

These outcomes place MiniMax-M1 forward of different open-weight rivals corresponding to DeepSeek-R1 and Qwen3-235B-A22B on a number of complicated duties.
Whereas closed-weight fashions like OpenAI’s o3 and Gemini 2.5 Professional nonetheless high some benchmarks, MiniMax-M1 narrows the efficiency hole significantly whereas remaining freely accessible beneath an Apache-2.0 license.
For deployment, MiniMax recommends vLLM because the serving backend, citing its optimization for big mannequin workloads, reminiscence effectivity, and batch request dealing with. The corporate additionally supplies deployment choices utilizing the Transformers library.
MiniMax-M1 consists of structured perform calling capabilities and is packaged with a chatbot API that includes on-line search, video and picture era, speech synthesis, and voice cloning instruments. These options purpose to assist broader agentic habits in real-world functions.
Implications for technical decision-makers and enterprise patrons
MiniMax-M1’s open entry, long-context capabilities, and compute effectivity deal with a number of recurring challenges for technical professionals accountable for managing AI methods at scale.
For engineering leads accountable for the complete lifecycle of LLMs — corresponding to optimizing mannequin efficiency and deploying beneath tight timelines — MiniMax-M1 provides a decrease operational price profile whereas supporting superior reasoning duties. Its lengthy context window may considerably scale back preprocessing efforts for enterprise paperwork or log knowledge that span tens or lots of of 1000’s of tokens.
For these managing AI orchestration pipelines, the flexibility to fine-tune and deploy MiniMax-M1 utilizing established instruments like vLLM or Transformers helps simpler integration into present infrastructure. The hybrid-attention structure might assist simplify scaling methods, and the mannequin’s aggressive efficiency on multi-step reasoning and software program engineering benchmarks provides a high-capability base for inside copilots or agent-based methods.
From a knowledge platform perspective, groups accountable for sustaining environment friendly, scalable infrastructure can profit from M1’s assist for structured perform calling and its compatibility with automated pipelines. Its open-source nature permits groups to tailor efficiency to their stack with out vendor lock-in.
Safety leads may discover worth in evaluating M1’s potential for safe, on-premises deployment of a high-capability mannequin that doesn’t depend on transmitting delicate knowledge to third-party endpoints.
Taken collectively, MiniMax-M1 presents a versatile possibility for organizations trying to experiment with or scale up superior AI capabilities whereas managing prices, staying inside operational limits, and avoiding proprietary constraints.
The discharge indicators MiniMax’s continued concentrate on sensible, scalable AI fashions. By combining open entry with superior structure and compute effectivity, MiniMax-M1 might function a foundational mannequin for builders constructing next-generation functions that require each reasoning depth and long-range enter understanding.
We’ll be monitoring MiniMax’s different releases all through the week. Keep tuned!