Perplexity but for Video Generation

Hey folks, I’m an AI enthusiast (not a developer) who’s been experimenting with video generation tools lately: Runway, Pika, Luma, etc. One thing I’ve noticed: each model shines in different situations (realism, anime, cinematic shots, smooth motion). But as a casual user, I never know which model is best for my prompt until I try them all, which is slow and expensive.

It got me thinking: what if there was a tool that worked like Perplexity, but for video generation?


The Core Idea

  • You give a natural prompt (like “a cinematic drone shot over snowy mountains at sunrise”).

  • The system rewrites/optimizes that prompt into versions that each model “understands” better.

  • It then fans out the prompt to multiple video models via API.

  • An evaluation layer decides which output is the best match for your request (or shows a side-by-side comparison).

Basically: a meta-layer orchestrator that sits on top of video generation models.
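
Roughly, the fan-out step might look like the sketch below. The call_runway / call_pika functions are hypothetical stand-ins, since each vendor’s real API and SDK differ:

import concurrent.futures

# Hypothetical stand-ins for the real vendor APIs; the names and
# return values are made up for illustration.
def call_runway(prompt):
    return f"runway clip for {prompt!r}"

def call_pika(prompt):
    return f"pika clip for {prompt!r}"

MODELS = {"runway": call_runway, "pika": call_pika}

def fan_out(per_model_prompts):
    # per_model_prompts: {"runway": "...", "pika": "..."} from the rewrite step.
    # Fire all requests in parallel and collect one result per model.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(MODELS[name], prompt)
                   for name, prompt in per_model_prompts.items()}
        return {name: f.result() for name, f in futures.items()}

print(fan_out({"runway": "snowy mountains, aerial glide, sunrise",
               "pika": "snowy mountains at sunrise, drone shot"}))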


Why It Matters

  • Removes “model-picking anxiety” — you don’t need to know whether Runway or Pika is better for your style.

  • Lets people see outputs across engines quickly.

  • Builds a feedback loop: over time, the system learns what you (and the community) like.


What I’m Looking For

I don’t have the engineering chops to build this, but I’d love to:

  • Hear whether this is technically feasible in practice.

  • Learn how one might start with a small experiment (e.g. Runway + Pika only).

  • Find potential collaborators who are interested in hacking on this.

If this sounds fun to you, I’d love to chat! 🙌

Hear whether this is technically feasible in practice.

Feasible. There are various attempts to optimize prompts for each model (Prompt Extender, Prompt Translator). I’m not familiar with existing tools aimed specifically at video models, but…
Even a simple rule-based script like the one below can get some results. For a more advanced approach, you would collect a large number of prompts, build a training dataset from them, and fine-tune a generative model. Here’s the simple version:

import re

# Keyword -> phrase lookup tables for camera moves and times of day.
CAMERA = [(r"\b(drone|aerial)\b", "smooth aerial drone glide"),
          (r"\b(orbit|circle)\b", "slow orbital move"),
          (r"\b(dolly|push in|forward)\b", "dolly forward"),
          (r"\b(pan)\b", "slow pan"),
          (r"\b(tilt)\b", "gentle tilt")]
TIMES = [(r"\b(sunrise|dawn|golden hour)\b", ("sunrise", "warm golden backlight")),
         (r"\b(sunset|dusk|twilight)\b", ("sunset", "soft amber light")),
         (r"\b(night)\b", ("night", "cool moonlight"))]

def pick(pairs, text, default):
    # Return the value of the first pattern that matches, else the default.
    for pat, val in pairs:
        if re.search(pat, text, flags=re.I):
            return val
    return default

def detect_camera(t): return pick(CAMERA, t, "smooth camera glide")
def detect_time(t):   return pick(TIMES, t, ("unspecified time", "neutral soft light"))

def detect_style(t):
    if re.search(r"\b(anime|toon|manga|cel)\b", t, re.I): return "anime"
    if re.search(r"\b(real|photoreal)\b", t, re.I): return "photoreal"
    return "cinematic"

def clean_subject(t):
    # Drop camera/style keywords plus any article attached to them,
    # so "a cinematic drone shot over X" leaves "over X", not "a over X".
    t = re.sub(r"\b(a|an|the)\s+(?=(drone|aerial|cinematic|photoreal|anime|shot|video|clip)\b)", "", t, flags=re.I)
    t = re.sub(r"\b(drone|aerial|cinematic|photoreal|anime|shot|video|clip)\b", "", t, flags=re.I)
    return re.sub(r"\s{2,}", " ", t).strip(",. ")

def split_setting(s):
    # Split "subject over/in/at setting" into (subject, "prep setting").
    m = re.search(r"(.*?)(?:\b(in|over|at|inside|on)\b\s+)(.*)", s, flags=re.I)
    return (s, None) if not m else (m.group(1).strip(), f"{m.group(2)} {m.group(3)}".strip())

def positive_only(t):
    # Runway tends to do better without negative phrasing ("no people").
    t = re.sub(r"\b(no|not|without|avoid|never)\b.*?(,|\.|$)", "", t, flags=re.I)
    return re.sub(r"\s{2,}", " ", t).strip()

def join_parts(*parts):
    # Comma-join the non-empty pieces (the subject can end up empty).
    return ", ".join(p for p in parts if p)

def expand(prompt):
    cam = detect_camera(prompt)
    tod, light = detect_time(prompt)
    style = detect_style(prompt)
    subj, setting = split_setting(clean_subject(prompt))
    scene = join_parts(subj, setting)
    runway = positive_only(join_parts(scene, cam, tod, light, f"{style} look", "steady motion"))
    luma   = f"Wide establishing shot: {scene}. The camera performs a {cam}. Lighting: {light} at {tod}. Style: {style}. Mood: natural."
    pika   = join_parts(scene, cam, tod, style)
    return {"runway": runway, "luma": luma, "pika": pika}

print(expand("a cinematic drone shot over snowy mountains at sunrise"))
# {'runway': 'over snowy mountains at sunrise, smooth aerial drone glide, sunrise, warm golden backlight, cinematic look, steady motion',
#  'luma': 'Wide establishing shot: over snowy mountains at sunrise. The camera performs a smooth aerial drone glide. Lighting: warm golden backlight at sunrise. Style: cinematic. Mood: natural.',
#  'pika': 'over snowy mountains at sunrise, smooth aerial drone glide, sunrise, cinematic'}
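
As for the evaluation layer from your post, one cheap starting point is scoring a few sampled frames from each clip against the original prompt with CLIP and keeping the highest-scoring model. A sketch, assuming the Hugging Face transformers CLIP wrapper and frames already extracted as PIL images:

import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def score_video(prompt, frames):
    # frames: a few PIL images sampled from one generated clip.
    inputs = processor(text=[prompt], images=frames, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # logits_per_image: (num_frames, 1) image-text similarity per frame.
    return out.logits_per_image.mean().item()

def pick_best(prompt, videos):
    # videos: {"runway": frames, "pika": frames, ...}
    return max(videos, key=lambda name: score_video(prompt, videos[name]))

Frame-level CLIP scores only measure prompt adherence and ignore motion quality, so treat this as a first cut, but it is enough to wire up an end-to-end Runway + Pika comparison.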