Google reportedly let OpenAI transcribe a million hours of YouTube videos to train GPT-4
From Neowin: 2024-04-07 02:16:00
AI companies like OpenAI, Google, and Meta are reportedly resorting to shady tactics to collect data to train their AI models. According to a report from The New York Times, OpenAI transcribed over a million hours of YouTube videos to train GPT-4. Google has also allegedly been involved in similar data scraping practices.
The NY Times report aligns with The Information’s report, which stated that OpenAI scraped data from YouTube videos and podcasts to train its AI systems. OpenAI’s president, Greg Brockman, was reportedly part of the team involved in this practice.
YouTube CEO Neal Mohan stated that using YouTube data for AI training violates the platform’s terms of service. Google, which owns YouTube, was also reported to engage in similar data scraping practices for its own AI models, claiming to do so only with creators’ consent.
Google also allegedly asked a team to adjust the company’s privacy policy so that it could tap into publicly available Google Docs and other online materials to train its AI products.