How to Make Ads with Veo 3 [STEP-BY-STEP]
Struggling to make Veo 3 ads that don't look fake? This step-by-step guide walks you from prompt formula to a shipped 8-second video ad, in minutes.
If you're searching for how to make ads with Veo 3, you're probably one prompt away from a shippable creative and looking for the formula that gets you there without burning a weekend on bad takes. This step-by-step tutorial covers the prompt structure that actually works, the workflow for stitching 8-second clips into a full ad, and the cost math so you know what you're spending per output. By the end you'll be able to ship a Veo 3 ad from blank prompt to ready-to-upload MP4 in about an hour.
Veo 3 is the first major video model that generates synchronized audio (dialogue, SFX, ambient) in the same pass as the visuals. Per the Google Cloud announcement, it ships 4- to 8-second clips at up to 4K, in both 16:9 and 9:16 aspect ratios, at 24fps with 48kHz stereo audio. That spec is what makes ad-grade output realistic for the first time: no separate voiceover pass, no extra sound-design step.

Watch: a Veo 3 ad built in real time
A short walkthrough that pairs with the workflow below. Skip if you already have a Flow or Vertex AI account open.
What Veo 3 actually is (and what it isn't)
Veo 3 is Google DeepMind's flagship text-to-video model, announced in May 2025 alongside Imagen 4 and Lyria 2. DeepMind's product page positions it as best-in-class on physics, realism, and prompt adherence, with the headline feature being native audio synthesis. The follow-up release, Veo 3.1, adds "Ingredients to Video": up to three reference images per generation so your character, product, or scene stays consistent across cuts.
What Veo 3 is not: a long-form storytelling engine. The hard cap is 8 seconds per clip. For a 30-second ad you stitch four clips together in a timeline editor and trim the excess (three clips top out at 24 seconds). For a 15-second TikTok or Reel you typically need two. Plan your concept around that constraint before you write a single prompt.
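The clip count falls out of the 8-second cap by ceiling division. Here is a minimal sketch, assuming every clip runs at the full cap and you trim the overshoot in the edit. The name `clips_needed` is illustrative and not part of any Veo tooling.

```python
import math

MAX_CLIP_SECONDS = 8  # per-clip cap described above


def clips_needed(ad_seconds: float, clip_seconds: float = MAX_CLIP_SECONDS) -> int:
    """Smallest number of clips whose combined length covers the ad."""
    if ad_seconds <= 0 or clip_seconds <= 0:
        raise ValueError("durations must be positive")
    return math.ceil(ad_seconds / clip_seconds)


for target in (15, 24, 30):
    print(f"{target}s ad -> {clips_needed(target)} clips")
# 15s ad -> 2 clips
# 24s ad -> 3 clips
# 30s ad -> 4 clips
```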
Where to actually run Veo 3
Three official entry points, each with a different cost and latency profile:
- Google AI Studio (Veo 3 page). Free credits to try, fastest path to your first generation, no setup. The right place to validate a prompt before you commit budget.
- Flow. Google's video-first creative app, designed for filmmakers and ad creatives. Best UX, built-in timeline, scene-extension and Ingredients-to-Video features. The right place to ship a final ad.
- Vertex AI. The enterprise API. Use it when you're batch-generating dozens of variants, wiring Veo 3 into a custom creative ops pipeline, or running it inside a workflow tool like n8n.
For your first ad, start in AI Studio, finish in Flow. Skip Vertex until you have a winner you want to scale.
The Veo 3 prompt formula (six slots)
Vague prompts produce slop. Every Veo 3 prompt that ships should cover six slots, in this order. Treat them as fill-in-the-blanks, not optional.
- Subject + age + emotion. "A 28-year-old woman, gym clothes, mid-laugh." Specificity here is non-negotiable.
- Action + product interaction. "Picks up a protein shake, takes one sip, raises an eyebrow at the camera."
- Environment + time of day. "Sunlit home kitchen, late morning, soft window light from the left."
- Camera + lens + movement. "Handheld 35mm, slow push-in, eye-level, shallow depth of field."
- Dialogue (in quotes). Anything in quotation marks gets generated as on-screen dialogue with lip-sync. Example: She says, "Okay, but how is this only 8 bucks."
- Audio bed. "Ambient kitchen sounds, faint pop music in the background, no music swell." Veo 3 generates this natively; you just have to ask for it.
Stack those six in one paragraph. Don't bullet them in the prompt itself: Veo 3 reads cinematic prose better than schemas.
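If you keep prompts in a spreadsheet or script, the six slots can be stored as fields and joined into one paragraph at the last step. This is a rough sketch, not an official schema. The `AdPrompt` class and its field names are made up for illustration, and it simply refuses to render when a slot is empty.

```python
from dataclasses import dataclass, fields


def _sentence(text: str) -> str:
    """Trim the slot and make sure it ends like a sentence."""
    text = text.strip()
    closers = (".", "!", "?", '."', '!"', '?"')
    return text if text.endswith(closers) else text + "."


@dataclass
class AdPrompt:
    # Hypothetical field names, one per slot, in the order listed above.
    subject: str
    action: str
    environment: str
    camera: str
    dialogue: str
    audio: str

    def render(self) -> str:
        """Join the six slots into a single prose paragraph."""
        empty = [f.name for f in fields(self) if not getattr(self, f.name).strip()]
        if empty:
            raise ValueError("empty prompt slots: " + ", ".join(empty))
        return " ".join(_sentence(getattr(self, f.name)) for f in fields(self))


prompt = AdPrompt(
    subject="A 28-year-old woman, gym clothes, mid-laugh",
    action="She picks up a protein shake, takes one sip, raises an eyebrow at the camera",
    environment="Sunlit home kitchen, late morning, soft window light from the left",
    camera="Handheld 35mm, slow push-in, eye-level, shallow depth of field",
    dialogue='She says, "Okay, but how is this only 8 bucks."',
    audio="Ambient kitchen sounds, faint pop music in the background, no music swell",
)
print(prompt.render())
```

Keeping the slots separate until render time also makes it easy to swap one slot (a different subject, a different line of dialogue) while holding the rest constant.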
A full example prompt (copy and adapt)
Use this as a starting template for a product-first 8-second hook. Replace the bracketed values, keep the structure:
A [28-year-old woman, casual hoodie, no makeup], standing in a [bright Brooklyn apartment kitchen, morning light pouring through the window from camera-left]. She [picks up a sleek matte-black can of [product name] from the counter, cracks it open, takes one sip, eyes go wide]. She turns to the camera and says, "Okay, this actually slaps, why did nobody tell me about this." Shot on a handheld 35mm lens, slow push-in, eye-level, shallow depth of field, soft natural color grade. Ambient morning kitchen sounds, faint birdsong outside, no music. 8 seconds, 9:16 vertical.
Read that prompt back to yourself before you run it. If you can picture the shot frame-for-frame, Veo 3 probably can too. If any slot is fuzzy in your head, it'll be fuzzy in the output.
The full ad workflow, prompt to MP4
Eight steps. Run them in order on your first ad, then shortcut once you have your own pipeline.
- Pick an angle, not a script. Write the hook in one sentence ("she didn't know it cost $8 until the end"). Skip the full storyboard.
- Draft three prompt variants. Same hook, different subjects, environments, or deliveries. Cheap insurance against a bad first generation.
- Generate in AI Studio at lowest quality first. You're testing the concept, not shipping. Once a variant looks right, re-run it at the highest quality preset.
- Use Ingredients to Video for product consistency. Upload up to three reference images of your actual product so the can, bottle, or device matches across cuts. Veo 3.1 only.
- Generate B-roll prompts in parallel. While waiting on the hero clip, draft a close-up product shot and a context shot (using the product in the wild). Three clips total fills a 24-second ad cleanly.
- Stitch in a timeline editor. CapCut, Premiere, DaVinci, whatever you already use. Cut on motion and hold the audio bed continuous across cuts. The natural dialogue from Veo 3 carries the spot.
- Color-match between clips. Veo 3 outputs are close but not identical across generations. A 5-minute LUT pass in your editor erases the seam.
- Export per platform. 9:16 1080×1920 for Meta Reels and TikTok, 1:1 1080×1080 for the Feed, 16:9 1920×1080 for YouTube. Veo 3 can output 9:16 and 16:9 natively; square is a quick crop.
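For the square crop in the export step, the arithmetic is simple enough to script. The sketch below is an assumption-laden helper and not part of any Google tool. It computes a centered crop for a target aspect ratio and formats it as an ffmpeg `crop=w:h:x:y,scale=w:h` filter string. The `EXPORTS` keys and the function name are illustrative.

```python
# Export sizes from the step above (width, height in pixels).
EXPORTS = {
    "reels_tiktok": (1080, 1920),  # 9:16
    "feed_square": (1080, 1080),   # 1:1
    "youtube": (1920, 1080),       # 16:9
}


def center_crop_filter(src_w: int, src_h: int, dst_w: int, dst_h: int) -> str:
    """Largest centered crop matching the target aspect ratio, then a scale."""
    if src_w * dst_h > src_h * dst_w:
        # Source is wider than the target: keep full height, trim the sides.
        crop_h = src_h
        crop_w = src_h * dst_w // dst_h
    else:
        # Source is taller (or equal): keep full width, trim top and bottom.
        crop_w = src_w
        crop_h = src_w * dst_h // dst_w
    x = (src_w - crop_w) // 2
    y = (src_h - crop_h) // 2
    return f"crop={crop_w}:{crop_h}:{x}:{y},scale={dst_w}:{dst_h}"


# Square Feed version cut from a 9:16 master.
print(center_crop_filter(1080, 1920, *EXPORTS["feed_square"]))
# crop=1080:1080:0:420,scale=1080:1080
```

A centered crop assumes the product sits mid-frame. If your subject is off-center, adjust the `x` or `y` offset by hand before exporting.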
What it costs (and how to keep that number low)
Cost varies by entry point and quality preset, but the order of magnitude is roughly:
| Surface | Per 8s clip | Best for |
|---|---|---|
| AI Studio (free tier) | $0 (limited credits) | Concept testing |
| Flow (Pro) | ~$0.50–$2 / clip | Final ad shipping |
| Vertex AI (API) | ~$0.40–$1.50 / clip | Batch generation |
| Veo 3.1 Lite | ~30–50% cheaper | High-volume creative testing |
A full 24-second ad (three 8-second clips, two cheap variants scrapped per winner) tends to come in around $5 to $15 in raw generation cost. Compare that against the "AI video generation lets you cut production costs by about 85% compared to a traditional shoot" benchmark that Syllaby reports and the math is hard to argue with for top-of-funnel testing.
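To sanity-check that range against your own numbers, the arithmetic is a one-liner. This is a back-of-envelope sketch using the approximate per-clip rates from the table, so treat the rates as placeholders and check current pricing. The function name is illustrative.

```python
def ad_cost_range(winning_clips: int, scrapped_per_winner: int,
                  low_per_clip: float, high_per_clip: float):
    """Return (total generations, low estimate, high estimate) in dollars."""
    generations = winning_clips * (1 + scrapped_per_winner)
    return generations, generations * low_per_clip, generations * high_per_clip


# Three winners, two scrapped variants each, every run billed at ~$0.40 to ~$2.00.
gens, low, high = ad_cost_range(3, 2, 0.40, 2.00)
print(f"{gens} generations, ${low:.2f} to ${high:.2f}")
# 9 generations, $3.60 to $18.00
```

Billing every generation at the full per-clip rate gives a wider band than $5 to $15. Running the scrapped variants at a cheaper preset pulls the top end down.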
For pricing details, always check Google's Veo 3.1 Lite announcement and the current Vertex AI pricing page. Both are the source of truth and update faster than this post can.
Five mistakes that kill Veo 3 ads
- Asking for too much in one clip. If your prompt covers more than one scene or one beat, split it. Two 8-second clips will beat one frankenstein every time.
- Forgetting the audio slot. The whole point of Veo 3 is native synced audio. If you don't ask for it, you'll get silence and end up paying for a separate voiceover pass.
- Vague camera language. "Cinematic" and "high quality" mean nothing. "Handheld 35mm, slow push-in" gives the model something to render.
- Skipping reference images for products. Without Ingredients-to-Video, your hero product will drift across generations. The label, color, or shape will mutate. Use references on anything customers need to recognize.
- Treating the first output as the answer. Even the best operators throw out 60–80% of generations. Budget for waste. Treat the first hit as your concept-validation step, not the final ad.
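A discard rate translates directly into how many generations to budget. Here is a minimal sketch, assuming the rate applies evenly to every generation (a simplification, since real hit rates vary by prompt). The function name is hypothetical, and it takes whole-number percentages to avoid floating-point rounding.

```python
def generations_to_budget(keepers: int, discard_percent: int) -> int:
    """Generations to plan for so that `keepers` clips survive on average."""
    if keepers < 0 or not 0 <= discard_percent < 100:
        raise ValueError("need keepers >= 0 and 0 <= discard_percent < 100")
    kept_percent = 100 - discard_percent
    # Integer ceiling of keepers * 100 / kept_percent.
    return (keepers * 100 + kept_percent - 1) // kept_percent


# A three-clip ad at the 60% and 80% discard rates mentioned above.
print(generations_to_budget(3, 60), generations_to_budget(3, 80))
# 8 15
```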
Veo 3 as an AI UGC engine
Veo 3 is the model that finally makes synthetic UGC ad-grade. A prompt describing a real-feeling creator (specific age, specific room, specific delivery) plus the dialogue slot is, structurally, an AI UGC pipeline.
The same trade-off applies as with any synthetic creator: AI UGC ships volume, real UGC ships trust. We unpack where each one belongs in your funnel in AI UGC vs UGC. The short version: use Veo 3 to test angles at the top of the funnel, then pay a real creator to refilm the winner before you put it on a product page.
If you want to compare Veo 3 against the other top video-ad engines (Runway, Sora 2, HeyGen, Creatify), our breakdown of the best AI ad generators walks through where each one wins, with real ad-spend results.
FAQ
How long can a Veo 3 ad be?
Each Veo 3 generation maxes out at 8 seconds. For longer ads you stitch multiple clips together in a timeline editor. Veo 3.1's scene-extension and Ingredients-to-Video features make this much cleaner than it was on Veo 2, but the per-clip limit hasn't moved.
Does Veo 3 actually generate audio?
Yes, natively. Dialogue, sound effects, and ambient noise all come out of the same generation as the video, lip-synced to the on-screen character. This is the single biggest spec change between Veo 2 and Veo 3, and it's why Veo 3 is the first model that's genuinely useful for ad production end-to-end.
Can I use Veo 3 ads commercially?
Yes, under Google's commercial-use terms (check the AI Studio / Vertex AI usage policy for your jurisdiction). Every Veo 3 output carries an invisible SynthID watermark, which doesn't prevent commercial use but does let platforms detect AI provenance. Plan for AI disclosure on your ad creative where required, because the rules are tightening fast.
Veo 3 vs Sora 2: which is better for ads?
Veo 3 wins on prompt adherence, physics realism, and the audio slot. Sora 2 is stronger on stylized, cinematic, fantasy-leaning output and has a slight edge on creative motion. For product-led ads (UGC-style, demo, testimonial, hook reels), Veo 3 is the safer pick today. For brand-led, mood-driven films, Sora 2 is competitive.
Do I need reference images to use Veo 3?
No, text-only prompts work. But if your ad features a specific product, character, or brand element that has to stay consistent across multiple clips, Veo 3.1's Ingredients to Video (up to three reference images) is what makes that consistency possible. For anything with a product hero shot, use it.
What resolution does Veo 3 output?
Up to 4K, at 24fps, in 16:9 or 9:16, with 48kHz stereo audio. That's overkill for most paid social, where 1080p is the platform standard. For ad shipping, generate at 1080p and reserve 4K for case-study reels or anything destined for a connected-TV placement.
The takeaway
Veo 3 turns AI video from a novelty into a production tool the moment you stop treating it like one. The six-slot prompt formula plus the eight-step workflow is enough to ship your first ad this week. The cost is small, the iteration loop is fast, and the audio finally sounds like it belongs in the clip.
Fresh Veo 3 prompts, workflow tweaks, and ad teardowns land each Sunday in the newsletter. For the broader picture of how AI fits into your ad stack, start with what AI advertising is and the best AI ad generators breakdown.