Video Data Generator

A lightweight pipeline for generating and labeling synthetic videos using Replicate API + OpenCV.


Features

  • Generates short videos from text prompts
  • Stores videos and metadata locally
  • Allows manual labeling with point clicks per frame using a simple OpenCV UI
  • Saves annotations in structured .json format
  • Easy to integrate into ML pipelines

Setup

1. Install Dependencies

pip install opencv-python requests

2. Set Replicate API Key

export REPLICATE_API_TOKEN=your-api-key
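
Before generating anything, it can help to confirm the token is visible to Python. A small sanity check (the helper name is ours, not part of the pipeline):

```python
import os

def check_replicate_token():
    """Return True if REPLICATE_API_TOKEN is set and non-empty; raise otherwise."""
    token = os.environ.get("REPLICATE_API_TOKEN", "")
    if not token:
        raise EnvironmentError("REPLICATE_API_TOKEN is not set; export it first.")
    return True
```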

How It Works

generate_video_data(prompt, columns, num_rows=1)

Generates num_rows videos from a given prompt. For each video:

  • Calls Replicate API (ZeroScope model version)
  • Downloads the video (MP4, 24 frames, 6 FPS, 576×320)
  • Saves metadata in a .json label file
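
A minimal sketch of what this function might look like against the Replicate HTTP predictions endpoint. The `build_label` helper and the output layout are assumptions based on the examples below; the model version hash is truncated in this README and must be replaced with the full one:

```python
import json
import os

API_URL = "https://api.replicate.com/v1/predictions"
# Version hash is truncated in this README ("8ba52bde11...");
# substitute the full zeroscope-v2-xl version hash before use.
MODEL_VERSION = "8ba52bde11..."

def build_label(prompt, video_path):
    """Metadata record saved alongside each generated video."""
    return {
        "prompt": prompt,
        "video_path": video_path,
        "annotations": [],
        "summary_labels": [],
    }

def generate_video_data(prompt, columns, num_rows=1, out_dir="dataset/Video"):
    # `columns` is accepted for API compatibility; the label file always
    # stores the full schema shown below.
    import time
    import requests  # imported here so build_label stays importable without requests

    os.makedirs(os.path.join(out_dir, "labels"), exist_ok=True)
    headers = {"Authorization": f"Token {os.environ['REPLICATE_API_TOKEN']}"}
    results = []
    for i in range(num_rows):
        resp = requests.post(
            API_URL,
            headers=headers,
            json={"version": MODEL_VERSION, "input": {"prompt": prompt}},
        )
        prediction = resp.json()
        # Poll the prediction until it reaches a terminal state.
        while prediction["status"] not in ("succeeded", "failed", "canceled"):
            time.sleep(2)
            prediction = requests.get(prediction["urls"]["get"], headers=headers).json()
        video_path = os.path.join(out_dir, f"video_{i}.mp4")
        label_path = os.path.join(out_dir, "labels", f"video_{i}.json")
        if prediction["status"] == "succeeded":
            video_url = prediction["output"][0]
            with open(video_path, "wb") as f:
                f.write(requests.get(video_url).content)
            with open(label_path, "w") as f:
                json.dump(build_label(prompt, video_path), f, indent=2)
        results.append({"status": prediction["status"],
                        "video_path": video_path,
                        "label_path": label_path})
    return results
```

The returned dicts match the keys used in Example Usage below (`status`, `video_path`, `label_path`).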

annotate_video(video_path, label_path)

Interactive tool to manually label objects in individual frames using mouse clicks.
Each click records:

{
  "frame": 12,
  "x": 214,
  "y": 156,
  "label": "person"
}
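
A sketch of how the annotation loop might be wired up with OpenCV's mouse callback. Key bindings and the `record_click` helper are assumptions; clicks are appended in the schema above and written back to the label file on exit:

```python
import json

def record_click(annotations, frame_idx, x, y, label):
    """Append one point annotation in the schema shown above."""
    annotations.append({"frame": frame_idx, "x": x, "y": y, "label": label})
    return annotations

def annotate_video(video_path, label_path, label="object"):
    import cv2  # imported here so record_click stays importable without OpenCV

    with open(label_path) as f:
        meta = json.load(f)
    annotations = meta.setdefault("annotations", [])

    cap = cv2.VideoCapture(video_path)
    frame_idx = 0
    ok, frame = cap.read()

    def on_mouse(event, x, y, flags, param):
        if event == cv2.EVENT_LBUTTONDOWN:
            record_click(annotations, frame_idx, x, y, label)

    cv2.namedWindow("annotate")
    cv2.setMouseCallback("annotate", on_mouse)
    while ok:
        cv2.imshow("annotate", frame)
        key = cv2.waitKey(0) & 0xFF
        if key == ord("q"):        # 'q' quits and saves
            break
        ok, frame = cap.read()     # any other key advances one frame
        frame_idx += 1
    cap.release()
    cv2.destroyAllWindows()

    with open(label_path, "w") as f:
        json.dump(meta, f, indent=2)
```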

Dataset Structure

dataset/
└── Video/
    ├── video_0.mp4
    └── labels/
        └── video_0.json
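
To walk this layout downstream, a small helper (name assumed) can pair each video with its label file:

```python
from pathlib import Path

def iter_dataset(root="dataset/Video"):
    """Yield (video_path, label_path) pairs following the layout above."""
    root = Path(root)
    for video in sorted(root.glob("*.mp4")):
        label = root / "labels" / (video.stem + ".json")
        if label.exists():
            yield video, label
```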

Sample label file (video_0.json):

{
  "prompt": "a cat riding a skateboard",
  "video_path": "dataset/Video/video_0.mp4",
  "annotations": [
    {
      "frame": 12,
      "x": 200,
      "y": 150,
      "label": "cat"
    }
  ],
  "summary_labels": []
}
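
Downstream code can load such a label file and group its point annotations by frame; a minimal sketch (function name is ours):

```python
import json
from collections import defaultdict

def load_annotations_by_frame(label_path):
    """Map frame index -> list of (x, y, label) points from a label file."""
    with open(label_path) as f:
        meta = json.load(f)
    by_frame = defaultdict(list)
    for ann in meta.get("annotations", []):
        by_frame[ann["frame"]].append((ann["x"], ann["y"], ann["label"]))
    return dict(by_frame)
```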

Example Usage

prompt = "a cat riding a skateboard"
columns = ["video_path", "prompt", "annotations"]

results = generate_video_data(prompt=prompt, columns=columns, num_rows=1)

for result in results:
    if result["status"] == "succeeded":
        annotate_video(result["video_path"], result["label_path"])

More prompt ideas to try:

  • "a person riding a bicycle in the rain"
  • "a robot walking through a forest"
  • "a spaceship flying over a futuristic city"
  • "a lion chasing prey across the savannah"

Notes

  • Replicate model: zeroscope-v2-xl (version: 8ba52bde11...)
  • All videos are short clips (24 frames) suitable for downstream tasks like:
      • Object detection
      • Action recognition
      • Prompt-to-video grounding
  • Manual annotation supports single-point labels per frame, but the format is extendable

To Do / Ideas

  • [ ] Add bounding box UI
  • [ ] Include auto-labeling stub (e.g. via SAM)
  • [ ] Export dataset in COCO-video format