Apple's AI training faces backlash as major publishers opt out

By Aleksandar Anastasov

Published: Aug 29, 2024, 8:29 AM

Is Apple's AI momentum about to crash? A growing number of news outlets and social platforms are saying "no" to the tech giant's data-hungry web crawlers.

For decades, these digital bots have been quietly collecting information from the internet, feeding it to everything from search engines to AI models. But as AI has become more powerful, the stakes have risen. Now, publishers are drawing a line in the sand, demanding control over their content and challenging Apple's AI ambitions.

Apple's web crawler, Applebot, was originally designed to power features such as Siri and Spotlight. Recently, however, it has taken on another major role: gathering data to train the foundation models behind what the company calls "Apple Intelligence". That data includes text, images, and other content.

To appease publishers, Apple introduced Applebot-Extended, a robots.txt setting that lets website owners opt out of having their content used for AI training. Many publishers are taking advantage of that option. By adding a rule for Applebot-Extended to their robots.txt files, they can keep their content out of Apple's AI training while still letting Applebot index it for Siri and Spotlight, and they can block other AI crawlers the same way.
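
As a rough illustration, here is what such an opt-out might look like, checked with Python's standard urllib.robotparser module. The robots.txt contents and the example.com URL are hypothetical. According to Apple's documentation, Applebot-Extended is not a separate crawler but a token that governs whether content Applebot collects may be used for training; the parser below only evaluates the rules and says nothing about how Apple applies them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a publisher that stays searchable via Applebot
# but opts out of Apple's AI training via the Applebot-Extended token.
# The Applebot-Extended group comes first because Python's parser uses the
# first group whose token is a substring of the user agent, and "applebot"
# is a substring of "applebot-extended".
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: Applebot
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
url = "https://example.com/news/story.html"  # placeholder URL

for agent in ("Applebot", "Applebot-Extended"):
    print(agent, parser.can_fetch(agent, url))
```

Running it prints `Applebot True` and `Applebot-Extended False`. The group ordering is a quirk of Python's parser; real crawlers apply their own matching rules.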

What is Robots.txt?

Robots.txt is a plain-text file, placed at the root of a website, that site owners use to tell bots which parts of the site they may access. Publishers are increasingly using it to block AI bots from scraping their websites for training data, citing concerns about copyright and the potential misuse of their content.

While robots.txt is a relatively simple tool, keeping it effective has become harder in the age of AI. With new AI agents emerging rapidly, it can be challenging for publishers to keep their block lists up to date. As a result, many are turning to services that update their robots.txt files automatically.
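
The kind of automation described above can be sketched in a few lines: keep a list of AI user-agent tokens, merge in newly discovered ones, and regenerate robots.txt. This is a minimal sketch under stated assumptions: the agent list is illustrative rather than exhaustive, and update_block_list and render_robots_txt are hypothetical helpers, not the API of any particular service.

```python
from urllib.robotparser import RobotFileParser

# Illustrative, non-exhaustive list of AI-related user-agent tokens.
# A real service would pull this from a maintained source.
AI_AGENTS = ["Applebot-Extended", "GPTBot", "Google-Extended", "CCBot"]


def update_block_list(current: list[str], discovered: list[str]) -> list[str]:
    """Hypothetical helper: append new tokens, dropping exact duplicates, keeping order."""
    return list(dict.fromkeys([*current, *discovered]))


def render_robots_txt(blocked_agents: list[str]) -> str:
    """Hypothetical helper: one Disallow-all group per blocked agent, then allow everyone else."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in blocked_agents]
    groups.append("User-agent: *\nAllow: /")
    return "\n\n".join(groups) + "\n"


agents = update_block_list(AI_AGENTS, ["GPTBot", "ClaudeBot"])
robots_txt = render_robots_txt(agents)
print(robots_txt)

# Sanity check: parse the generated file and test two agents.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("ClaudeBot", "https://example.com/"))
print(parser.can_fetch("Applebot", "https://example.com/"))
```

Each blocked token gets its own Disallow-all group, and a wildcard group allows everything else, so ordinary search crawlers such as Applebot are left unaffected.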

The backlash

Because robots.txt files are publicly accessible, anyone can see which sites are opting out of Apple's AI training, and that is exactly what Wired examined.
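
Checking a site's robots.txt for AI opt-outs is easy to script. The sketch below, which assumes Python's urllib.robotparser, reports which user-agent tokens a robots.txt file bars from the whole site. The sample file is invented for illustration and does not reproduce any publisher's actual rules; parser_from_site needs network access and is not called in the example.

```python
from urllib.robotparser import RobotFileParser

# Invented sample file: two AI tokens blocked site-wide, one private path for everyone.
SAMPLE_ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def parser_from_text(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text already in hand."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def parser_from_site(domain: str) -> RobotFileParser:
    """Fetch https://<domain>/robots.txt (requires network access)."""
    parser = RobotFileParser(f"https://{domain}/robots.txt")
    parser.read()
    return parser


def opted_out(parser: RobotFileParser, agents: list[str], path: str = "/") -> list[str]:
    """Return the agents the rules bar from `path`.

    Checking "/" only detects site-wide blocks; partial blocks are not reported.
    """
    return [agent for agent in agents if not parser.can_fetch(agent, path)]


parser = parser_from_text(SAMPLE_ROBOTS_TXT)
print(opted_out(parser, ["Applebot-Extended", "GPTBot", "Applebot", "Googlebot"]))
```

With the sample file this prints `['Applebot-Extended', 'GPTBot']`: Applebot and Googlebot fall through to the wildcard group, which only blocks /private/.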

Some media outlets have been outspoken in their criticism of Apple's opt-out approach. The New York Times, for example, which is suing OpenAI over copyright infringement, argues that publishers shouldn't have to opt out in the first place; instead, web crawlers should need permission before accessing a publication's content.

Other popular websites that have opted out include Instagram, Facebook, Tumblr, Craigslist, the Financial Times, The Atlantic, Vox Media, the USA Today network, and Wired's parent company, Condé Nast.

So, what's next? Will Apple be forced to rethink its AI strategy? Or will it find a way to appease publishers and continue its data-driven ambitions? The battle for control of the internet's digital goldmine is far from over.

Aleksandar is a tech enthusiast with a broad range of interests, from smartphones to space exploration. His curiosity extends to hands-on DIY experiments with his gadgets, and he enjoys switching between different brands to experience the latest innovations. Prior to joining PhoneArena, Aleksandar worked on the Google Art Project, digitizing valuable artworks and gaining diverse perspectives on technology. When he's not immersed in tech, Aleksandar is an outdoorsman who enjoys mountain hikes, wildlife photography, and nature conservation. His interests also extend to martial arts, running, and snowboarding, reflecting his dynamic approach to life and technology.
