Is OpenAI’s Sora Trained on YouTube Videos? A Question of Ethics and Licensing

From CineD Magazine:

OpenAI’s video generator Sora has created both excitement and controversy in the filmmaking community. Chief Technology Officer Mira Murati avoided questions about the data used to train Sora, raising ethical and licensing concerns. Sora is a text-to-video generator that could replace stock footage and may affect job opportunities in the industry.

In an interview with The Wall Street Journal, Murati did not disclose whether Sora was trained on YouTube videos. Concerns were raised about the legality of using publicly available data for commercial purposes. The lack of transparency regarding the datasets used for training generative AI models is a common issue in the industry.

OpenAI’s Sora reportedly used content from Shutterstock, but it’s unclear if other sources, including YouTube, were also used. The copyright and attribution implications of using publicly available data for training generative AI models are a growing concern. Companies like OpenAI face legal challenges from media companies for unauthorized use of materials.

Generative AI developers, including OpenAI, often face copyright challenges because of the lack of regulations governing dataset usage. Some companies have taken legal action to prevent unauthorized use of their content. A lack of transparency about training data sources raises ethical and legal concerns for AI developers.

The release of Sora to the public raises questions about how OpenAI will address the use of YouTube videos and other content for training the model. Failure to clarify training data sources or compensate content creators could lead to copyright lawsuits from studios and production companies. OpenAI must find solutions to address ethical and legal concerns before launching Sora commercially.


