OpenAI, Meta Debate Wild Solutions As They Run Out of Data to Train AI
From Business Insider: 2024-04-07 16:08:00
Big Tech firms like Meta, Google, and OpenAI are on track to run out of high-quality data to train their AI models by 2026. In response, Google considered tapping consumer data from Google Docs, Sheets, and Slides, Meta executives brainstormed options such as buying Simon & Schuster, and OpenAI is exploring synthetic data as an alternative.
Google’s legal department sought to broaden the ways consumer data from Google Docs, Sheets, and Slides could be used to train its AI systems. Meta executives, for their part, discussed purchasing Simon & Schuster to gain new data sources, and also weighed a cheaper option: paying $10 per book for full licensing rights to new titles.
OpenAI is considering synthetic data, meaning data generated by AI systems, as an alternative source of training material. Synthetic data has drawbacks, however: a model trained on AI-generated output can reinforce existing limitations and mistakes. To address this, OpenAI is working on a process in which one AI system produces data and a second system judges it, with the aim of improving what goes into training.
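The article does not say how OpenAI's produce-and-judge process works in practice. As a rough illustration of the general idea only, the toy Python sketch below has one function stand in for the system that produces data and another stand in for the system that judges it. Every name here (generate_candidates, judge, build_synthetic_dataset), the arithmetic task, and the 30% error rate are assumptions made for the example, not details from the report.

```python
import random


def generate_candidates(rng, n):
    """Hypothetical 'generator': emits addition examples, some deliberately wrong."""
    candidates = []
    for _ in range(n):
        a, b = rng.randint(0, 9), rng.randint(0, 9)
        answer = a + b
        if rng.random() < 0.3:  # assumed error rate, simulating generator mistakes
            answer += 1
        candidates.append({"prompt": f"{a} + {b}", "answer": answer})
    return candidates


def judge(example):
    """Hypothetical 'judge': independently recomputes the sum and checks the answer."""
    a, b = (int(part) for part in example["prompt"].split(" + "))
    return example["answer"] == a + b


def build_synthetic_dataset(n, seed=0):
    """Generate n candidates and keep only those the judge accepts."""
    rng = random.Random(seed)
    return [ex for ex in generate_candidates(rng, n) if judge(ex)]


if __name__ == "__main__":
    dataset = build_synthetic_dataset(100)
    print(f"kept {len(dataset)} of 100 candidates")
```

In this toy setup the judge can verify answers exactly. A real judging model would itself make errors, which is part of why synthetic data can still reinforce mistakes.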