FTC investigation launched into Reddit's sale of user data for AI training


Reddit said ahead of its IPO next week that it could generate $203 million in revenue over the next few years from licensing user posts to Google and others for AI projects. The community-driven platform was forced to disclose on Friday that US regulators already have questions about that new line of business.

In a regulatory filing, Reddit said it received a letter from the US Federal Trade Commission on Thursday asking about “our selling, licensing, or sharing of user-generated content with third parties to train AI models.”

The FTC, the US government's primary antitrust regulator, has the power to sanction companies found to engage in unfair or deceptive trade practices. The idea of licensing user-generated content for AI projects has raised questions from lawmakers and rights groups about privacy risks, fairness, and copyright.

Reddit isn't alone in trying to make money from licensing data, including user-generated data, for AI. Programming Q&A site Stack Overflow has signed a deal with Google, the Associated Press has signed a deal with OpenAI, and Tumblr owner Automattic has said it is working “with select AI companies” but will allow users to opt out of having their data passed along. None of the licensors immediately responded to requests for comment. Reddit also isn't the only company to receive an FTC letter about data licensing, Axios reported Friday, citing an unnamed former agency official.

It is unclear whether the letter to Reddit is directly related to any review of other companies.

Reddit said in Friday's disclosure that it does not believe it engaged in any inappropriate or deceptive activities, but warned that dealing with any government investigation could be costly and time-consuming. “The letter indicated that FTC staff were interested in meeting with us to learn more about our plans and that the FTC intended to request information and documents from us as its inquiry continues,” the filing said. Reddit said the FTC letter described the investigation as “a non-public investigation.”

Reddit, whose 17 billion posts and comments are seen by AI experts as valuable for training chatbots in the art of conversation, announced a deal last month to license content to Google. Reddit and Google did not immediately respond to requests for comment. The FTC declined to comment. (Advance Magazine Publishers, parent of WIRED's publisher Condé Nast, owns a stake in Reddit.)

AI chatbots like OpenAI's ChatGPT and Google's Gemini are seen as competitive threats to Reddit, publishers, and other ad-supported, content-driven businesses. Last year the possibility of licensing data to AI developers emerged as a potential benefit of generative AI for some companies.

But the use of data collected online to train AI models has raised many questions in boardrooms, courtrooms, and Congress. For Reddit and others whose data is generated by users, those questions include who actually owns the content and whether it is fair to license it without giving the creator a share of the proceeds. Security researchers have found that AI models can leak personal data included in the content used to train them. And some critics have suggested the deals could make powerful companies even more influential.