On Tuesday, Google launched Veo 3, a new AI video synthesis model that can do something no major AI video generator has been able to do before: create a synchronized audio track. While we saw early steps in AI video generation from 2022 to 2024, each video was silent and usually very short in duration. Now you can hear voices, dialog, and sound effects in eight-second high-definition video clips.
Shortly after the launch, people began asking the most obvious benchmarking question: How good is Veo 3 at faking Oscar-winning actor Will Smith eating spaghetti?
First, a brief recap. The spaghetti benchmark in AI video traces its origins back to March 2023, when we first covered an early example of horrific AI-generated video made with an open source video synthesis model called ModelScope. The spaghetti example later became well-known enough that Smith parodied it almost a year later, in February 2024.
Here is what the original viral video looked like:
One thing people forget is that at the time, the model behind the Smith example wasn’t the best AI video generator out there: a video synthesis model called Gen-2 from Runway had already achieved superior results (though it was not yet publicly accessible). But the ModelScope result was funny and weird enough to stick in people’s memories as an early poor example of video synthesis, handy for future comparisons as AI models progressed.
AI app developer Javi Lopez first came to the rescue for curious spaghetti fans earlier this week, performing the Smith test with Veo 3 and posting the results on X. But as you’ll notice when you watch below, the soundtrack has a curious quality: The faux Smith appears to be crunching on the spaghetti.
On X, Javi Lopez ran “Will Smith eating spaghetti” in Google’s Veo 3 AI video generator and received this result.
The crunching is a glitch in Veo 3’s experimental ability to apply sound effects to video, likely because the training data used to create Google’s AI models featured many examples of chewing mouths with crunching sound effects. Generative AI models are pattern-matching prediction machines, and they need to be shown enough examples of various types of media to generate convincing new outputs. If a concept is over-represented or under-represented in the training data, you’ll see unusual generation results, such as jabberwockies.