Can Shengshu’s Vidu AI Text-to-Video Generator Dethrone the Competition?

2024-09-17

Introduction

In the fast-moving field of text-to-video generation, a new contender has surfaced: Vidu AI, developed by Shengshu Technology and launched globally on July 30. The platform converts text prompts, in either Chinese or English, and images into video clips lasting 4 to 8 seconds. Promising 1080p resolution, Vidu AI aims to set itself apart on quality.

The emergence of Vidu AI marks a milestone in China's ambition to lead in generative artificial intelligence. Part of an expanding lineup that includes Kuaishou's Kling AI and MiniMax's Hailuo AI, Vidu AI joins the race against established global players like OpenAI's Sora and Google's Veo.

Performance Insights: Vidu AI vs. The Rest

What makes Vidu AI stand out? For starters, speed: the platform is billed as producing a 4-second clip in about 30 seconds, which would place it among the fastest in the industry. A feature called “reference-to-video” keeps visual elements, including subjects, settings, and styles, consistent, which matters to creators working on projects where coherence is essential, such as films and video games.

To evaluate Vidu AI, KrASIA ran a series of tests using prompts previously tried with Kling AI and Hailuo AI, examining video quality, coherence, creativity, and speed. One test involved generating a clip of a “realistic puppy driving a car.” Vidu AI produced a visually appealing clip, but the puppy appeared largely static and the car looked more like a toy. Hailuo AI showed the same weakness, suggesting that more detailed prompts may yield better results.

In a lighter test, Vidu AI was asked to create a “cute kitten eating lunch like a human.” Here it performed well, delivering an adorable scene that compared favorably with both Kling AI and Hailuo AI, although Kling AI edged it out slightly on realism.

A more demanding prompt, “astronauts repairing a space station orbiting Earth,” revealed Vidu AI's strength in animation. Although its space station was rendered conservatively, the astronauts moved in a lively, action-oriented way. By contrast, Kling AI's output lacked dynamism.

However, the prompt “medieval knights in combat” exposed how both Vidu AI and Hailuo AI struggle to produce coherent, fluid fight sequences. Adjusting the prompt improved Vidu AI's output, underscoring the importance of specificity.

When pushed further to generate “samurais in combat, anime style,” Vidu AI outperformed its competitors, producing visually captivating animation that echoed traditional anime conventions. Kling AI veered towards hyper-realism, while Hailuo AI fell short in portraying the stylized combat.

In a final test of its key feature, Vidu AI replicated a character from a reference image, a blonde woman with blue eyes, and placed her seamlessly in a beach environment, consistent with its promise of visual continuity.

Although Vidu AI did not always hit the anticipated 30-second generation time, it consistently delivered clips in under a minute, considerably faster than Kling AI.

The Tech Behind Vidu AI

Powering Vidu AI is Shengshu’s universal vision transformer (U-ViT) model, introduced in 2022 by chief scientist Zhu Jun and his team. The architecture combines transformer and diffusion techniques, enabling the platform to generate a wide range of video outputs.

Since its launch, Vidu AI has drawn growing attention, particularly in the film industry. Notably, Chinese director Li Ning has used Vidu AI in his project, billed as China's first fully AI-generated movie and set to premiere later this year. The project's emphasis on visual consistency will likely depend on Vidu AI’s capabilities, hinting at AI's potential to transform filmmaking.

Founded in March 2023 by a team from Tsinghua University’s Institute for AI Industry Research, Shengshu has quickly attracted financial backing. It has closed a funding round with participation from Qiming Venture Partners, Baidu, and others, a sign of strong investor confidence.

The generative AI space remains intensely competitive, especially following the recent debut of Zhipu AI's Ying. Meanwhile, ByteDance’s Faceu Technology is also entering the field with its own video tools. Shengshu CEO Tang Jiayu aims to challenge giants like OpenAI and Google, focusing on film production, anime creation, and the digital restoration of cultural relics, segments that align with China's strategic push to lead in AI technology.

As the battle heats up, can Vidu AI carve a niche for itself among giants? Only time will answer this pivotal question for generative AI in the modern world. Stay tuned!