Positron believes it has found the secret to take on Nvidia in AI inference chips: here’s how it could benefit enterprises

As demand for large-scale AI deployment skyrockets, the lesser-known private chip startup Positron is positioning itself as a direct challenger to market leader Nvidia by offering dedicated, energy-efficient, memory-optimized inference chips aimed at relieving the industry’s mounting cost, power, and availability bottlenecks.

“A key differentiator is our ability to run frontier AI models with greater efficiency, achieving 2x to 5x performance per watt and per dollar compared to Nvidia,” said Thomas Sohmers, Positron co-founder and CTO, in a recent video call interview with VentureBeat.

Clearly, that’s good news for large AI model providers, but Positron’s leadership contends the chips are valuable for many more enterprises beyond them, including companies that use AI models in their own workflows rather than offering them as services to customers.

“We build chips that can be deployed in hundreds of existing data centers because they don’t require liquid cooling or extreme power densities,” pointed out Mitesh Agrawal, Positron’s CEO and the former chief operating officer of AI cloud inference provider Lambda, in the same video call interview with VentureBeat.


Venture capitalists and early customers seem to agree.

Positron yesterday announced an oversubscribed $51.6 million Series A funding round led by Valor Equity Partners, Atreides Management and DFJ Growth, with support from Flume Ventures, Resilience Reserve, 1517 Fund and Unless.

As for Positron’s early customer base, it includes both name-brand enterprises and companies operating in inference-heavy sectors. Confirmed deployments include the major security and cloud content networking provider Cloudflare, which uses Positron’s Atlas hardware in its globally distributed, power-constrained data centers, and Parasail, via its AI-native data infrastructure platform SnapServe.

Beyond these, Positron reports adoption across several key verticals where efficient inference is critical, such as networking, gaming, content moderation, content delivery networks (CDNs), and token-as-a-service providers.

These early users are reportedly drawn in by Atlas’s ability to deliver high throughput at lower power consumption without requiring specialized cooling or reworked infrastructure, making it an attractive drop-in option for AI workloads across enterprise environments.

Entering a challenging market of shrinking AI model sizes and rising efficiency

But Positron is also entering a challenging market. The Information just reported that rival buzzy AI inference chip startup Groq, where Sohmers previously worked as Director of Technology Strategy, has cut its 2025 revenue projection from more than $2 billion to $500 million, highlighting just how volatile the AI hardware space can be.

Even well-funded firms face headwinds as they compete for data center capacity and enterprise mindshare against entrenched GPU providers like Nvidia, not to mention the elephant in the room: the rise of more efficient, smaller large language models (LLMs) and specialized small language models (SLMs) that can even run on devices as small and low-powered as smartphones.

Yet Positron’s leadership is, for now, embracing the trend and shrugging off any potential impact on its growth trajectory.

“There’s always been this duality: lightweight applications on local devices and heavyweight processing in centralized infrastructure,” said Agrawal. “We believe both will keep growing.”

Sohmers agreed, stating: “We see a future where every person might have a capable model on their phone, but those will still rely on large models in data centers to generate deeper insights.”

Atlas is an inference-first AI chip

While Nvidia GPUs helped catalyze the deep learning boom by accelerating model training, Positron argues that inference, the stage where models generate output in production, is now the true bottleneck.

Its founders call it the most under-optimized part of the “AI stack,” especially for generative AI workloads that depend on fast, efficient model serving.

Positron’s answer is Atlas, its first-generation inference accelerator built specifically to handle large transformer models.

Unlike general-purpose GPUs, Atlas is optimized for the distinct memory and throughput needs of modern inference tasks.

The company claims Atlas delivers 3.5x better performance per dollar and up to 66% lower power usage than Nvidia’s H100, while also achieving 93% memory bandwidth utilization, far above the typical 10–30% range seen in GPUs.

From Atlas to Titan, supporting multi-trillion-parameter models

Launched just 15 months after the company’s founding, and with only $12.5 million in seed capital, Atlas is already shipping and in production.

The system supports models of up to 0.5 trillion parameters in a single 2kW server and is compatible with Hugging Face transformer models via an OpenAI API-compatible endpoint.
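
For enterprises evaluating that kind of drop-in compatibility, an OpenAI API-compatible endpoint generally means existing client code keeps working once the base URL is swapped. The sketch below is a hypothetical illustration using the standard openai Python client; the endpoint URL, API key, and model name are placeholders, not published Positron values.

```python
# Minimal sketch of calling an OpenAI API-compatible inference endpoint.
# The base_url, api_key, and model name are hypothetical placeholders;
# consult the vendor's documentation for real values.
from openai import OpenAI

client = OpenAI(
    base_url="https://atlas.example.internal/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",                        # placeholder credential
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",     # example Hugging Face model ID
    messages=[{"role": "user", "content": "Summarize our Q2 support tickets."}],
    max_tokens=256,
)

print(response.choices[0].message.content)
```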

Positron is now preparing to launch its next-generation platform, Titan, in 2026.

Built on custom-designed “Asimov” silicon, Titan will feature up to two terabytes of high-speed memory per accelerator and support models of up to 16 trillion parameters.

Today’s frontier models sit in the hundreds of billions to low single-digit trillions of parameters, though newer models like OpenAI’s GPT-5 are presumed to be in the multi-trillions. Larger models are currently thought to be required to reach artificial general intelligence (AGI), AI that outperforms humans at most economically valuable work, and superintelligence, AI that exceeds humans’ ability to understand and control it.

Crucially, Titan is designed to operate with standard air cooling in typical data center environments, avoiding the high-density, liquid-cooled configurations that next-generation GPUs increasingly require.

Engineering for efficiency and compatibility

From the start, Positron designed its system to be a drop-in replacement, allowing customers to use existing model binaries without code rewrites.

“If a customer had to change their behavior or their actions in any way, shape or form, that was a barrier,” said Sohmers.

Sohmers explained that instead of building a complex compiler stack or rearchitecting software ecosystems, Positron focused narrowly on inference, designing hardware that ingests Nvidia-trained models directly.

“The CUDA moat isn’t something to fight,” said Agrawal. “It’s an ecosystem to participate in.”

This pragmatic approach helped the company ship its first product quickly, validate performance with real enterprise users, and secure significant follow-on funding. In addition, its focus on air cooling rather than liquid cooling makes its Atlas chips the only option for some deployments.

“We’re focused primarily on purely air-cooled deployments… all these Nvidia Hopper- and Blackwell-based solutions going forward require liquid cooling… The only place you can put those racks is in data centers that are being newly built now in the middle of nowhere,” said Sohmers.

All told, Positron’s ability to execute quickly and capital-efficiently has helped distinguish it in a crowded AI hardware market.

Memory is what you need

Sohmers and Agrawal point to a fundamental shift in AI workloads: from compute-bound convolutional neural networks to memory-bound transformer architectures.

While older models demanded high FLOPs (floating-point operations), modern transformers require massive memory capacity and bandwidth to run efficiently.

While Nvidia and others continue to focus on compute scaling, Positron is betting on a memory-first design.

Sohmers noted that with transformer inference, the ratio of compute to memory operations flips to near 1:1, meaning that boosting memory utilization has a direct and dramatic impact on performance and power efficiency.
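
To see why that ratio matters, a rough back-of-the-envelope model of memory-bound autoregressive decoding helps: each generated token has to stream the full set of weights from memory, so achievable tokens per second is roughly effective bandwidth divided by model size. The sketch below uses illustrative, assumed numbers (a 70-billion-parameter model in 16-bit weights on an accelerator with about 3.3 TB/s of peak bandwidth), not measured figures from Positron or Nvidia.

```python
# Back-of-the-envelope: single-stream decode throughput when inference is
# memory-bandwidth bound. Numbers are illustrative, not vendor benchmarks.

def decode_tokens_per_sec(peak_bw_tbs: float, utilization: float,
                          params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Rough ceiling on tokens/sec: every token reads all model weights once."""
    model_bytes = params_billion * 1e9 * bytes_per_param   # e.g. BF16 weights
    effective_bw = peak_bw_tbs * 1e12 * utilization        # usable bytes per second
    return effective_bw / model_bytes

# A hypothetical 70B-parameter model on a 3.3 TB/s accelerator: moving memory
# bandwidth utilization from ~30% to ~93% raises the ceiling by roughly 3x.
for util in (0.30, 0.93):
    print(f"{util:.0%} utilization -> ~{decode_tokens_per_sec(3.3, util, 70):.0f} tokens/s")
```

Under these assumptions, raising bandwidth utilization from roughly 30% to 93% triples the throughput ceiling without adding a single FLOP, which is essentially the argument the company is making for memory-first design.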

With Atlas already outperforming contemporary GPUs on key efficiency metrics, Titan aims to take this further by offering the highest memory capacity per chip in the industry.

At launch, Titan is expected to offer an order-of-magnitude increase over typical GPU memory configurations, without demanding specialized cooling or boutique networking setups.

U.S.-built chips

Positron’s manufacturing pipeline is proudly domestic. The company’s first-generation chips were fabricated in the U.S. at Intel facilities, with final server assembly and integration also based domestically.

For the Asimov chip, fabrication will shift to TSMC, though the team aims to keep as much of the rest of the manufacturing chain in the U.S. as possible, depending on foundry capacity.

Geopolitical resilience and supply chain stability are becoming key purchasing criteria for many customers, one more reason Positron believes its U.S.-made hardware offers a compelling alternative.

What’s next?

Agrawal noted that Positron’s silicon targets not just broad compatibility but maximum utility for enterprises, clouds, and research labs alike.

While the company has not named any frontier model providers as customers yet, he confirmed that outreach and conversations are underway.

Agrawal emphasized that selling physical infrastructure on its economics and performance, rather than bundling it with proprietary APIs or business models, is part of what gives Positron credibility in a skeptical market.

“If you can’t convince a customer to deploy your hardware based on its economics, you’re not going to be profitable,” he said.

