Perplexity but for Video Generation

Hey folks, I’m an AI enthusiast (not a developer) who’s been experimenting with video generation tools lately: Runway, Pika, Luma, etc. One thing I’ve noticed: each model shines in different situations (realism, anime, cinematic shots, smooth motion). But as a casual user, I never know which model is best for my prompt until I try them all, which is slow and expensive.

It got me thinking: what if there was a tool that worked like Perplexity, but for video generation?


The Core Idea

  • You give a natural prompt (like “a cinematic drone shot over snowy mountains at sunrise”).

  • The system rewrites/optimizes that prompt into versions that each model “understands” better.

  • It then fans out the prompt to multiple video models via API.

  • An evaluation layer decides which output is the best match for your request (or shows a side-by-side comparison).

Basically: a meta-layer orchestrator that sits on top of video generation models.
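
Roughly, the fan-out step might look like the sketch below. The call_runway / call_pika functions are hypothetical stand-ins, since each vendor’s real API and SDK differ:

import concurrent.futures

# Hypothetical stand-ins for the real vendor APIs; the names and
# return values are made up for illustration.
def call_runway(prompt):
    return f"runway clip for {prompt!r}"

def call_pika(prompt):
    return f"pika clip for {prompt!r}"

MODELS = {"runway": call_runway, "pika": call_pika}

def fan_out(per_model_prompts):
    # per_model_prompts: {"runway": "...", "pika": "..."} from the rewrite step.
    # Fire all requests in parallel and collect one result per model.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(MODELS[name], prompt)
                   for name, prompt in per_model_prompts.items()}
        return {name: f.result() for name, f in futures.items()}

print(fan_out({"runway": "snowy mountains, aerial glide, sunrise",
               "pika": "snowy mountains at sunrise, drone shot"}))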


Why It Matters

  • Removes “model-picking anxiety” — you don’t need to know whether Runway or Pika is better for your style.

  • Lets people see outputs across engines quickly.

  • Builds a feedback loop: over time, the system learns what you (and the community) like.


What I’m Looking For

I don’t have the engineering chops to build this, but I’d love to:

  • Hear whether this is technically feasible in practice.

  • Learn how one might start with a small experiment (e.g. Runway + Pika only).

  • Find potential collaborators who are interested in hacking on this.

If this sounds fun to you, I’d love to chat! 🙌

Hear whether this is technically feasible in practice.

Feasible. There are various attempts to optimize prompts for each model (Prompt Extender, Prompt Translator). I’m not familiar with existing tools aimed specifically at video models, but…
Even a simple rule-based script like the one below can get some results. For a more advanced approach, you would collect a large number of prompts, build a training dataset from them, and fine-tune a generative model. Here’s the simple version:

import re

# Keyword -> phrase lookup tables for camera moves and times of day.
CAMERA = [(r"\b(drone|aerial)\b", "smooth aerial drone glide"),
          (r"\b(orbit|circle)\b", "slow orbital move"),
          (r"\b(dolly|push in|forward)\b", "dolly forward"),
          (r"\b(pan)\b", "slow pan"),
          (r"\b(tilt)\b", "gentle tilt")]
TIMES = [(r"\b(sunrise|dawn|golden hour)\b", ("sunrise", "warm golden backlight")),
         (r"\b(sunset|dusk|twilight)\b", ("sunset", "soft amber light")),
         (r"\b(night)\b", ("night", "cool moonlight"))]

def pick(pairs, text, default):
    # Return the value of the first pattern that matches, else the default.
    for pat, val in pairs:
        if re.search(pat, text, flags=re.I):
            return val
    return default

def detect_camera(t): return pick(CAMERA, t, "smooth camera glide")
def detect_time(t):   return pick(TIMES, t, ("unspecified time", "neutral soft light"))

def detect_style(t):
    if re.search(r"\b(anime|toon|manga|cel)\b", t, re.I): return "anime"
    if re.search(r"\b(real|photoreal)\b", t, re.I): return "photoreal"
    return "cinematic"

def clean_subject(t):
    # Drop camera/style keywords plus any article attached to them,
    # so "a cinematic drone shot over X" leaves "over X", not "a over X".
    t = re.sub(r"\b(a|an|the)\s+(?=(drone|aerial|cinematic|photoreal|anime|shot|video|clip)\b)", "", t, flags=re.I)
    t = re.sub(r"\b(drone|aerial|cinematic|photoreal|anime|shot|video|clip)\b", "", t, flags=re.I)
    return re.sub(r"\s{2,}", " ", t).strip(",. ")

def split_setting(s):
    # Split "subject over/in/at setting" into (subject, "prep setting").
    m = re.search(r"(.*?)(?:\b(in|over|at|inside|on)\b\s+)(.*)", s, flags=re.I)
    return (s, None) if not m else (m.group(1).strip(), f"{m.group(2)} {m.group(3)}".strip())

def positive_only(t):
    # Runway tends to do better without negative phrasing ("no people").
    t = re.sub(r"\b(no|not|without|avoid|never)\b.*?(,|\.|$)", "", t, flags=re.I)
    return re.sub(r"\s{2,}", " ", t).strip()

def join_parts(*parts):
    # Comma-join the non-empty pieces (the subject can end up empty).
    return ", ".join(p for p in parts if p)

def expand(prompt):
    cam = detect_camera(prompt)
    tod, light = detect_time(prompt)
    style = detect_style(prompt)
    subj, setting = split_setting(clean_subject(prompt))
    scene = join_parts(subj, setting)
    runway = positive_only(join_parts(scene, cam, tod, light, f"{style} look", "steady motion"))
    luma   = f"Wide establishing shot: {scene}. The camera performs a {cam}. Lighting: {light} at {tod}. Style: {style}. Mood: natural."
    pika   = join_parts(scene, cam, tod, style)
    return {"runway": runway, "luma": luma, "pika": pika}

print(expand("a cinematic drone shot over snowy mountains at sunrise"))
# {'runway': 'over snowy mountains at sunrise, smooth aerial drone glide, sunrise, warm golden backlight, cinematic look, steady motion',
#  'luma': 'Wide establishing shot: over snowy mountains at sunrise. The camera performs a smooth aerial drone glide. Lighting: warm golden backlight at sunrise. Style: cinematic. Mood: natural.',
#  'pika': 'over snowy mountains at sunrise, smooth aerial drone glide, sunrise, cinematic'}
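
As for the evaluation layer from your post, one cheap starting point is scoring a few sampled frames from each clip against the original prompt with CLIP and keeping the highest-scoring model. A sketch, assuming the Hugging Face transformers CLIP wrapper and frames already extracted as PIL images:

import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def score_video(prompt, frames):
    # frames: a few PIL images sampled from one generated clip.
    inputs = processor(text=[prompt], images=frames, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # logits_per_image: (num_frames, 1) image-text similarity per frame.
    return out.logits_per_image.mean().item()

def pick_best(prompt, videos):
    # videos: {"runway": frames, "pika": frames, ...}
    return max(videos, key=lambda name: score_video(prompt, videos[name]))

Frame-level CLIP scores only measure prompt adherence and ignore motion quality, so treat this as a first cut, but it is enough to wire up an end-to-end Runway + Pika comparison.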