Reddit accuses Perplexity of stealing person posts, increasing knowledge rights battle with AI business

Metro Loud
4 Min Read


Sopa Photos | Lightrocket | Getty Photos

Social media big Reddit has launched a lawsuit in opposition to synthetic intelligence firm Perplexity, alleging that it illegally scraped person posts to coach its AI mannequin, marking the most recent data-rights conflict between content material homeowners and the AI business. 

The criticism filed in New York federal courtroom on Wednesday additionally named three defendants, which Reddit says helped Perplexity acquire its knowledge: Lithuanian knowledge scraper Oxylabs, “former Russian botnet” AWMProxy, and Texas startup SerpApi.

Reddit alleged that the three smaller entities had been in a position to extract its copyrighted content material “by masking their identities, hiding their areas and disguising their net scrapers as common individuals.”

Perplexity, which runs an AI-powered search engine, denied the allegations and accused Reddit of “extortion” and opposition to an open web, whereas SerpApi advised CNBC it “strongly disagrees” with Reddit’s claims and intends to defend itself in courtroom. 

The case represents one in all many filed by content material homeowners accusing AI companies of utilizing copyrighted materials with out permission to coach their giant language fashions. Reddit, particularly, has been on the entrance traces of that battle, having launched the same ongoing lawsuit in opposition to AI startup Anthropic in June. CNBC was unable to succeed in Oxylabs and AWMProxy.

In a press release shared with CNBC, Ben Lee, Chief Authorized Officer at Reddit, mentioned that AI corporations are” locked in an arms race for high quality human content material” and that strain has fueled an “industrial-scale ‘knowledge laundering’ financial system.”

Scrapers bypass technological protections to steal knowledge, then promote it to purchasers hungry for coaching materials. Reddit is a chief goal as a result of it is one of many largest and most dynamic collections of human dialog ever created.

Reddit — which hosts over 100,000 interest-based “subreddit” communities — mentioned in its lawsuit that its person posts had turn out to be probably the most generally cited supply for AI-generated solutions on Perplexity. 

It added that it despatched Perplexity a cease-and-desist letter, after which it elevated the amount of citations to Reddit “forty-fold.”

AI researchers have beforehand famous that Reddit’s giant quantity of moderated conversations will help make AI chatbots produce extra natural-sounding responses.

Within the age of synthetic intelligence, Reddit has labored to leverage its large knowledge pool, allowing entry to it solely by way of AI-related licensing agreements. The social media firm has signed such agreements with OpenAI and Alphabet‘s Google. 

In a response to the lawsuit, Perplexity, in a publish on the Reddit platform, argued that it doesn’t practice AI fashions on content material however merely summarizes and cites public Reddit discussions. Subsequently, it mentioned it’s “unattainable” to signal a license settlement.

“A yr in the past, after explaining this, Reddit insisted we pay anyway, regardless of lawfully accessing Reddit knowledge. Bowing to sturdy arm techniques simply is not how we do enterprise,” the assertion learn, happening to explain the go well with as a “present of drive in Reddit’s coaching knowledge negotiations with Google and OpenAI.” 

“Perplexity believes it is a unhappy instance of what occurs when public knowledge turns into an enormous a part of a public firm’s enterprise mannequin,” Perplexity added, noting that knowledge licensing has turn out to be an more and more vital income for Reddit. 

In February, Reddit’s COO Jen Wong advised the commerce publication Adweek that AI licensing offers with Google and OpenAI made up practically 10% of Reddit’s income. 

Share This Article