Apple, NVIDIA and Anthropic reportedly used YouTube transcripts without permission to train AI models

The dataset includes transcripts of YouTube videos from the platform's biggest creators.

Senior Editor

Updated 16 July 2024 at 6:17 pm

Some of the world’s largest tech companies trained their AI models on a dataset that included transcripts of more than 173,000 YouTube videos without permission, a new investigation from Proof News has found. The dataset, which was created by a nonprofit company called EleutherAI, contains transcripts of YouTube videos from more than 48,000 channels and was used by Apple, NVIDIA and Anthropic among other companies. The findings of the investigation spotlight AI’s uncomfortable truth: the technology is largely built on the backs of data siphoned from creators without their consent or compensation.

The dataset doesn’t include any videos or images from YouTube, but contains video transcripts from the platform's biggest creators including Marques Brownlee and MrBeast, as well as large news publishers like The New York Times, the BBC, and ABC News. Subtitles from videos belonging to Engadget are also part of the dataset.

“Apple has sourced data for their AI from several companies,” Brownlee posted on X. “One of them scraped tons of data/transcripts from YouTube videos, including mine,” he added. “This is going to be an evolving problem for a long time.”

Apple has sourced data for their AI from several companies

One of them scraped tons of data/transcripts from YouTube videos, including mine

Apple technically avoids "fault" here because they're not the ones scraping

But this is going to be an evolving problem for a long time https://t.co/U93riaeSlY
— Marques Brownlee (@MKBHD) July 16, 2024

A Google spokesperson told Engadget that previous comments by YouTube CEO Neal Mohan, who said that companies using YouTube's data to train AI models would violate the platform's terms of service, still stand. Apple, NVIDIA, Anthropic and EleutherAI did not respond to a request for comment from Engadget.

So far, AI companies haven’t been transparent about the data used to train their models. Earlier this month, artists and photographers criticized Apple for failing to reveal the source of training data for Apple Intelligence, the company's own spin on generative AI coming to millions of Apple devices this year.

However, Apple told 9to5Mac on July 17 that its OpenELM model doesn't power any of its AI or machine learning features, including Apple Intelligence. Rather, the company said that the model was created strictly for research purposes. Previously, Apple has stated that its Apple Intelligence models were trained on "licensed data, including data selected to enhance specific features, as well as publicly available data collected by our web-crawler," as 9to5Mac noted.

YouTube, the world’s largest repository of videos, is a goldmine of not only transcripts but also audio, video, and images, making it an attractive source of data for training AI models. Earlier this year, OpenAI’s chief technology officer, Mira Murati, evaded questions from The Wall Street Journal about whether the company used YouTube videos to train Sora, OpenAI’s upcoming AI video generation tool. “I’m not going to go into the details of the data that was used, but it was publicly available or licensed data,” Murati said at the time. Alphabet CEO Sundar Pichai has also said that companies using data from YouTube to train their AI models would violate the platform’s terms of service.

If you want to check whether subtitles from your YouTube videos or from your favorite channels are part of the dataset, head over to Proof News' lookup tool.

Update, July 16, 2024, 3:17 PM PT: This story has been updated to include a statement from Google.

Update, July 19, 2024, 2:32 AM ET: This story has been updated with comment from Apple that its OpenELM model doesn't power Apple Intelligence or any of its other AI or machine learning features.
