OpenAI's Sora trained on Netflix videos?

Sora is one of the best text-to-video models on the market. When OpenAI announced it in early December 2024, the model produced nearly photorealistic videos (albeit with some noticeable errors). The Washington Post has found that content from Netflix, TikTok, and Twitch may have been used to train it.
Training only with public data?
At the time of the announcement, OpenAI disclosed that Sora was trained using public and licensed data, without clearly specifying the sources. The Washington Post (which has a partnership with the California-based company) generated hundreds of videos, finding that many are similar to those shown in movies, TV shows, games, and social media.
Some videos generated by Sora (about 20 seconds long, without sound) appear to be clips from Netflix TV series (Wednesday), popular games (Minecraft), and TikTok. The generated videos also feature logos and watermarks from the companies offering the original content, confirming that this content was used to train the model.
However, this doesn't necessarily mean the content was copied or obtained directly from its owners. It could have been "lifted" from video-sharing platforms (like YouTube) or social media, where it was uploaded without the copyright holder's consent. Spokespeople for Netflix and Twitch stated that their respective companies have no agreements with OpenAI.
YouTube's terms prohibit downloading videos. Last year, a group of creators sued OpenAI because audio transcripts of videos were used to train the model behind ChatGPT. The California-based company has also received several complaints over its use of books, articles, and other sources. OpenAI has not yet received a complaint over the data used to train Sora, likely because the final quality is poor.
Punto Informatico