I recreated the first YouTube video using Veo 3, and the result is remarkably close to the original.

In 2005, a humble 19-second video uploaded to YouTube marked the start of a digital revolution. It featured co-founder Jawed Karim at the zoo, commenting on elephants. Fast-forward to 2025, and Google's newest video model, Veo 3, shows how far generative video has come.

Unveiled at Google I/O 2025, Veo 3 can create videos with synced dialogue, sound effects, and background noise from a single prompt. Surprisingly, most of these clips are generated within five minutes.

While experimenting with Veo 3, I decided to recreate the first famous social video: the YouTube clip of Karim at the zoo. Knowing how much a precise prompt matters, I turned to another AI, Google AI Mode, to help me gather the relevant details.

Google AI Mode quickly produced a detailed description from the YouTube URL, which I then adapted into a prompt for Veo 3, requesting an 8mm-style video in a 4:3 aspect ratio. After some waiting, Veo 3 produced a video, although it was incomplete: the dialogue cut off mid-sentence, due to high demand on the service at the time.

The result is impressive. Veo 3 nailed the film quality, giving the clip a pleasing 2005 aesthetic, but it ignored the 4:3 aspect ratio and added unnecessary labels at the top of the frame. Despite this, the audio was exceptional, with synced dialogue and distinct background noise.

The challenge lies in how detailed the prompt is. Without specific instructions, Veo 3 makes its own decisions, and the result often differs from the video you had in mind. In my case, Veo 3 struggled to depict Karim accurately because my prompt didn't include any particulars about his appearance.
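One way to leave fewer decisions to the model is to write down every detail you care about before composing the prompt. The Python sketch below is only an illustration of that idea: `VideoPromptSpec` and `build_prompt` are hypothetical names of my own, not part of any Veo or Gemini API, and the code does nothing more than assemble prompt text. The bracketed values are placeholders to fill in yourself.

```python
from dataclasses import dataclass


@dataclass
class VideoPromptSpec:
    """Hypothetical checklist of details a video prompt should pin down."""
    scene: str
    subject_appearance: str
    dialogue: str
    film_style: str = "8mm-style home video"
    aspect_ratio: str = "4:3"
    allow_on_screen_text: bool = False


def build_prompt(spec: VideoPromptSpec) -> str:
    """Join the fields into one explicit, line-per-detail prompt string."""
    parts = [
        f"Scene: {spec.scene}.",
        f"Subject: {spec.subject_appearance}.",
        f'Dialogue (spoken to camera): "{spec.dialogue}"',
        f"Look: {spec.film_style}, {spec.aspect_ratio} aspect ratio.",
    ]
    if not spec.allow_on_screen_text:
        # Explicitly rule out the labels the model added on its own.
        parts.append("No captions, labels, or on-screen text.")
    return "\n".join(parts)


if __name__ == "__main__":
    spec = VideoPromptSpec(
        scene="a man standing in front of an elephant enclosure at a zoo",
        subject_appearance="[describe hair, clothing, and build here]",
        dialogue="[paste the exact lines to be spoken here]",
    )
    print(build_prompt(spec))
```

Whether Veo 3 honors each line is a separate question (it ignored my aspect ratio request, after all), but a checklist like this at least makes it obvious which details were never specified.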

Since I get only two Veo 3 video generations per day as a Google AI Pro member, I plan to revisit this experiment tomorrow with your suggestions.

Veo 3 is available through Google Gemini, and its standout advance is audio-synced video generation: lifelike audio, high-resolution visuals, and complex scenes from a text prompt. In the clip I generated, the dialogue matched the lip movements closely, and the ambient sounds made it feel like a real conversation.
