WordPress and Tumblr to sign AI training content deal

OpenAI and Midjourney may soon have access to a huge trove of user-generated content to train their artificial intelligence (AI) models.

Content management system giant WordPress and micro-blogging site Tumblr are getting ready to sign a content-sharing deal with OpenAI and Midjourney.

No official announcement has been made, but news site 404 Media is reporting it has spoken to an internal source about the upcoming deal. The site has also seen internal communications and documentation regarding Tumblr’s preparations to hand over a tranche of user-generated content.

According to an internal post by a Tumblr product manager, the process has not been easy and has led to more data being collected than originally intended, including private posts on public blogs, posts on deleted or suspended blogs, unanswered asks, and posts marked explicit, NSFW, and/or mature.

It’s not known if that data had already been sent to either OpenAI or Midjourney or if it was being sanitised before sending. However, the post does confirm the data to be shared includes content created between 2014 and 2023.

Automattic, which owns the two companies, has not yet confirmed the specific deal, but it did release a statement on the importance of its users being able to opt out of having their content used to train AI models.

“AI is rapidly transforming nearly every aspect of our world, including the way we create and consume content,” Automattic said. “At Automattic, we’ve always believed in a free and open web and individual choice. Like other tech companies, we’re closely following these advancements, including how to work with AI companies in a way that respects our users’ preferences.”

The company said it currently blocks AI platform crawlers and will keep blocking new ones as they launch. Users can also turn on a setting to “discourage” search engine indexing, which will also discourage AI crawlers.

However, Automattic does note that “no law exists that requires crawlers to follow these preferences, though this may change soon with pending legislation in the European Union”.

But the company does admit it is “working directly with select AI companies as long as their plans align with what our community cares about: attribution, opt-outs, and control”.

“Our partnerships will respect all opt-out settings. We also plan to take that a step further and regularly update any partners about people who newly opt out and ask that their content be removed from past sources and future training,” it said.

David Hollingworth

David Hollingworth has been writing about technology for over 20 years, and has worked for a range of print and online titles in his career. He is enjoying getting to grips with cyber security, especially when it lets him talk about Lego.
