Most Top News Sites Block AI Bots. Right-Wing Media Welcomes Them


“A process called reinforcement learning from human feedback is used in every state-of-the-art model right now,” says Baum, to fine-tune their responses. Most AI companies aim to create systems that appear neutral. If the humans steering the AI see an uptick in right-wing content but judge it to be unsafe or inaccurate, they could undo any attempt to feed the machine a particular perspective.

OpenAI spokesperson Kayla Wood says that in the pursuit of AI models that “deeply represent all cultures, industries, ideologies, and languages,” the company uses extensive collections of training data. “Any one sector, including news, and any single news site is a small piece of the overall training data, and does not have a measurable impact on the model's intended learning and output,” she says.

Fight for Rights

The disconnect in which news sites block AI crawlers may also reflect an ideological divide over copyright. The New York Times is currently suing OpenAI for copyright infringement, arguing that the AI upstart's data collection is illegal. Other leaders in mainstream media also view this scraping as theft. Condé Nast CEO Roger Lynch told a recent Senate hearing that many AI tools have been built with “stolen goods.” (WIRED is owned by Condé Nast.) Right-wing media owners have been largely absent from the debate. Perhaps they quietly allow data scraping because they support the argument that scraping data to build AI tools is protected by the fair use doctrine?

When WIRED contacted nine right-wing outlets to ask why they allowed AI scrapers, the responses pointed to a different, less ideological reason. The Washington Examiner did not respond to questions about its intentions, but it began blocking OpenAI's GPTBot within 48 hours of WIRED's request, suggesting that it may not have previously known about or prioritized the option to block web crawlers.

Meanwhile, the Daily Caller admitted that its permissiveness towards AI crawlers was a simple mistake. “We do not support bots stealing our assets. That may have been a mistake, but it's now being fixed,” says Neil Patel, co-founder and publisher of the Daily Caller.

Right-wing media outlets are influential, and they are particularly adept at leveraging social media platforms like Facebook to share articles. But outlets like the Washington Examiner and the Daily Caller are smaller and leaner than established media giants like The New York Times, which have extensive technical teams.

Data journalist Ben Welsh maintains a running list of news websites that block AI crawlers from OpenAI, Google, and the nonprofit Common Crawl project, whose data is widely used in AI. His results showed that about 53 percent of the 1,156 media publishers surveyed block one of those three bots. His sample is much larger than Originality AI's and includes smaller and less popular news sites, which suggests that outlets with larger staffs and higher traffic are more likely to block AI bots, perhaps because of better resources or technical knowledge.

At least one right-wing news site is considering how it might take advantage of its mainstream competitors' efforts to stonewall AI projects in order to counter perceived political bias. “Our legal terms prohibit scraping, and we are exploring new tools to protect our IP. That said, we’re also looking for ways to help ensure that AI doesn’t end up with all the same biases as the establishment press,” says Daily Wire spokesperson Jen Smith. As of today, GPTBot and other AI bots were still free to scrape content from the Daily Wire.

Updated at 10:20 AM ET on January 24, 2024, to include the specific number of top news sites from which Originality AI collected data.