Video Data Generator

A lightweight pipeline for generating and labeling synthetic videos using Replicate API + OpenCV.


Features

  • Generates short videos from text prompts
  • Stores videos and metadata locally
  • Allows manual labeling with point clicks per frame using a simple OpenCV UI
  • Saves annotations in structured .json format
  • Easy to integrate into ML pipelines

Setup

1. Install Dependencies

pip install opencv-python requests

2. Set Replicate API Key

export REPLICATE_API_TOKEN=your-api-key
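
Before generating anything, it can help to confirm the token is visible to Python. A small sanity check (the helper name is ours, not part of the pipeline):

```python
import os

def check_replicate_token():
    """Return True if REPLICATE_API_TOKEN is set and non-empty; raise otherwise."""
    token = os.environ.get("REPLICATE_API_TOKEN", "")
    if not token:
        raise EnvironmentError("REPLICATE_API_TOKEN is not set; export it first.")
    return True
```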

How It Works

generate_video_data(prompt, columns, num_rows=1)

Generates num_rows videos from a given prompt. For each video:

  • Calls Replicate API (ZeroScope model version)
  • Downloads the video (MP4, 24 frames, 6 FPS, 576×320)
  • Saves metadata in a .json label file
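
A minimal sketch of what this function might look like against the Replicate HTTP predictions endpoint. The `build_label` helper and the output layout are assumptions based on the examples below; the model version hash is truncated in this README and must be replaced with the full one:

```python
import json
import os

API_URL = "https://api.replicate.com/v1/predictions"
# Version hash is truncated in this README ("8ba52bde11...");
# substitute the full zeroscope-v2-xl version hash before use.
MODEL_VERSION = "8ba52bde11..."

def build_label(prompt, video_path):
    """Metadata record saved alongside each generated video."""
    return {
        "prompt": prompt,
        "video_path": video_path,
        "annotations": [],
        "summary_labels": [],
    }

def generate_video_data(prompt, columns, num_rows=1, out_dir="dataset/Video"):
    # `columns` is accepted for API compatibility; the label file always
    # stores the full schema shown below.
    import time
    import requests  # imported here so build_label stays importable without requests

    os.makedirs(os.path.join(out_dir, "labels"), exist_ok=True)
    headers = {"Authorization": f"Token {os.environ['REPLICATE_API_TOKEN']}"}
    results = []
    for i in range(num_rows):
        resp = requests.post(
            API_URL,
            headers=headers,
            json={"version": MODEL_VERSION, "input": {"prompt": prompt}},
        )
        prediction = resp.json()
        # Poll the prediction until it reaches a terminal state.
        while prediction["status"] not in ("succeeded", "failed", "canceled"):
            time.sleep(2)
            prediction = requests.get(prediction["urls"]["get"], headers=headers).json()
        video_path = os.path.join(out_dir, f"video_{i}.mp4")
        label_path = os.path.join(out_dir, "labels", f"video_{i}.json")
        if prediction["status"] == "succeeded":
            video_url = prediction["output"][0]
            with open(video_path, "wb") as f:
                f.write(requests.get(video_url).content)
            with open(label_path, "w") as f:
                json.dump(build_label(prompt, video_path), f, indent=2)
        results.append({"status": prediction["status"],
                        "video_path": video_path,
                        "label_path": label_path})
    return results
```

The returned dicts match the keys used in Example Usage below (`status`, `video_path`, `label_path`).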

annotate_video(video_path, label_path)

Interactive tool to manually label objects in individual frames using mouse clicks.
Each click records:

{
  "frame": 12,
  "x": 214,
  "y": 156,
  "label": "person"
}
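
A sketch of how the annotation loop might be wired up with OpenCV's mouse callback. Key bindings and the `record_click` helper are assumptions; clicks are appended in the schema above and written back to the label file on exit:

```python
import json

def record_click(annotations, frame_idx, x, y, label):
    """Append one point annotation in the schema shown above."""
    annotations.append({"frame": frame_idx, "x": x, "y": y, "label": label})
    return annotations

def annotate_video(video_path, label_path, label="object"):
    import cv2  # imported here so record_click stays importable without OpenCV

    with open(label_path) as f:
        meta = json.load(f)
    annotations = meta.setdefault("annotations", [])

    cap = cv2.VideoCapture(video_path)
    frame_idx = 0
    ok, frame = cap.read()

    def on_mouse(event, x, y, flags, param):
        if event == cv2.EVENT_LBUTTONDOWN:
            record_click(annotations, frame_idx, x, y, label)

    cv2.namedWindow("annotate")
    cv2.setMouseCallback("annotate", on_mouse)
    while ok:
        cv2.imshow("annotate", frame)
        key = cv2.waitKey(0) & 0xFF
        if key == ord("q"):        # 'q' quits and saves
            break
        ok, frame = cap.read()     # any other key advances one frame
        frame_idx += 1
    cap.release()
    cv2.destroyAllWindows()

    with open(label_path, "w") as f:
        json.dump(meta, f, indent=2)
```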

Dataset Structure

dataset/
└── Video/
    ├── video_0.mp4
    └── labels/
        └── video_0.json
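
To walk this layout downstream, a small helper (name assumed) can pair each video with its label file:

```python
from pathlib import Path

def iter_dataset(root="dataset/Video"):
    """Yield (video_path, label_path) pairs following the layout above."""
    root = Path(root)
    for video in sorted(root.glob("*.mp4")):
        label = root / "labels" / (video.stem + ".json")
        if label.exists():
            yield video, label
```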

Sample label file (video_0.json):

{
  "prompt": "a cat riding a skateboard",
  "video_path": "dataset/Video/video_0.mp4",
  "annotations": [
    {
      "frame": 12,
      "x": 200,
      "y": 150,
      "label": "cat"
    }
  ],
  "summary_labels": []
}
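
Downstream code can load such a label file and group its point annotations by frame; a minimal sketch (function name is ours):

```python
import json
from collections import defaultdict

def load_annotations_by_frame(label_path):
    """Map frame index -> list of (x, y, label) points from a label file."""
    with open(label_path) as f:
        meta = json.load(f)
    by_frame = defaultdict(list)
    for ann in meta.get("annotations", []):
        by_frame[ann["frame"]].append((ann["x"], ann["y"], ann["label"]))
    return dict(by_frame)
```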

Example Usage

prompt = "a cat riding a skateboard"
columns = ["video_path", "prompt", "annotations"]

results = generate_video_data(prompt=prompt, columns=columns, num_rows=1)

for result in results:
    if result["status"] == "succeeded":
        annotate_video(result["video_path"], result["label_path"])

More prompt ideas to try:

  • "a person riding a bicycle in the rain"
  • "a robot walking through a forest"
  • "a spaceship flying over a futuristic city"
  • "a lion chasing prey across the savannah"

Notes

  • Replicate model: zeroscope-v2-xl (version: 8ba52bde11...)
  • All videos are short clips (24 frames) suitable for downstream tasks like:
      • Object detection
      • Action recognition
      • Prompt-to-video grounding
  • Manual annotation supports single-point labels per frame, but the format is extendable

To Do / Ideas

  • [ ] Add bounding box UI
  • [ ] Include auto-labeling stub (e.g. via SAM)
  • [ ] Export dataset in COCO-video format