OpenAI, a leading artificial intelligence (AI) research laboratory, has announced a new AI system called Sora that generates videos from text prompts. Sora represents a major advance in AI's ability to understand and simulate the physical world.
According to OpenAI's blog post, Sora can create high-quality videos up to a minute long while closely adhering to user prompts. OpenAI says the system shows an emerging understanding of physics, allowing it to generate complex scenes with multiple characters and coherent motion.
Sora uses a diffusion transformer architecture, which allows it to be trained on visual data of varying durations, resolutions, and aspect ratios. It builds on previous OpenAI models such as DALL-E for image generation and GPT for language understanding.
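OpenAI's accompanying technical report describes representing videos as sequences of "spacetime patches," which is what lets one transformer handle clips of different shapes. The sketch below is a minimal, hypothetical illustration of that idea (the patch size and `patchify` helper are assumptions, not Sora's actual implementation): videos of any duration or resolution become token sequences of varying length but fixed token dimensionality.

```python
import numpy as np

def patchify(video, patch=(2, 4, 4)):
    """Split a video array (frames, height, width, channels) into
    flattened spacetime patches -- a token sequence whose length
    varies with duration and resolution, while each token keeps
    the same dimensionality."""
    t, h, w, c = video.shape
    pt, ph, pw = patch
    assert t % pt == 0 and h % ph == 0 and w % pw == 0
    tokens = (video
              .reshape(t // pt, pt, h // ph, ph, w // pw, pw, c)
              .transpose(0, 2, 4, 1, 3, 5, 6)  # group patch dims together
              .reshape(-1, pt * ph * pw * c))
    return tokens

# Clips of different shapes yield different sequence lengths,
# but every token has the same size, so one model can process both.
short_clip = np.random.rand(4, 16, 16, 3)   # 4 frames, 16x16
long_clip = np.random.rand(8, 32, 16, 3)    # 8 frames, 32x16
print(patchify(short_clip).shape)  # (32, 96)
print(patchify(long_clip).shape)   # (128, 96)
```

This variable-length tokenization is what distinguishes the reported approach from earlier text-to-video models that were trained on fixed-size crops.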
OpenAI notes that Sora still has limitations in accurately modeling complex physical interactions and tracking objects over time. Even so, the company believes the model is an important step toward AI systems that can genuinely understand and interact with the real world.
Initially, OpenAI is granting access to Sora only to researchers and creative professionals who can provide feedback. The company says it will take safety precautions before any public release, including working with experts to identify potential harms, and plans to eventually embed metadata in generated videos to indicate their AI origin.
While promising, the unveiling of Sora raises concerns about how synthesized video could be misused for fraud or misinformation. Policymakers will likely want to work with OpenAI to ensure proper safeguards are in place as the technology continues to advance rapidly.