Four Takeaways on the Race to Amass Data for A.I.

From New York Times: 2024-04-06 05:03:13

Tech companies such as Meta and Google rely on online data to develop their AI systems, and more data has produced more accurate and powerful models, including large language models. However, by some projections the supply of high-quality digital data could run out as soon as 2026. That has driven companies to go to great lengths to obtain more, in some cases testing legal boundaries.

In the search for more data, companies such as OpenAI and Google have turned to new methods, including transcribing YouTube videos into text. They have also explored the controversial use of copyrighted material to train their AI models, which tests the limits of privacy and copyright law. Some have even considered buying a major publisher to gain access to more content.

One potential answer to the data shortage is ‘synthetic’ data: text generated by AI systems and then used to train other models. This approach can supply more training data, but it also risks introducing errors and inaccuracies into the models. Companies are exploring it as a way to offset the expected shortage of high-quality digital data.

Read more at New York Times: Four Takeaways on the Race to Amass Data for A.I.
