# Video Data Generator

A lightweight pipeline for generating and labeling synthetic videos using the Replicate API + OpenCV.
## Features
- Generates short videos from text prompts
- Stores videos and metadata locally
- Allows manual labeling with per-frame point clicks using a simple OpenCV UI
- Saves annotations in a structured `.json` format
- Easy to integrate into ML pipelines
## Setup
### 1. Install Dependencies
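For example (exact package names are an assumption; match them to your `requirements.txt`):

```bash
pip install replicate opencv-python requests
```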
### 2. Set Replicate API Key
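The `replicate` Python client reads the `REPLICATE_API_TOKEN` environment variable:

```bash
export REPLICATE_API_TOKEN="r8_..."  # your token from replicate.com
```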
## How It Works
### `generate_video_data(prompt, columns, num_rows=1)`
Generates `num_rows` videos from a given prompt (a minimal sketch follows this list). For each video, it:
- Calls Replicate API (ZeroScope model version)
- Downloads the video (MP4, 24 frames, 6 FPS, 576×320)
- Saves metadata in a `.json` label file
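The implementation isn't shown here, so below is a minimal sketch of what `generate_video_data` could look like, assuming the `replicate` and `requests` packages. The `MODEL_REF` placeholder, the `out_dir` default, and the ZeroScope input names (`num_frames`, `fps`, `width`, `height`) are assumptions derived from the specs above; verify them against the model's schema on Replicate.

```python
import json
import os

import replicate
import requests

# Placeholder model reference: this README truncates the version hash
# ("8ba52bde11..."), so substitute the full "owner/zeroscope-v2-xl:<hash>"
# string from Replicate before running.
MODEL_REF = "zeroscope-v2-xl:8ba52bde11..."


def generate_video_data(prompt, columns, num_rows=1, out_dir="dataset/Video"):
    """Generate `num_rows` clips for `prompt`; save each video and its label file."""
    os.makedirs(out_dir, exist_ok=True)
    results = []
    for i in range(num_rows):
        video_path = os.path.join(out_dir, f"video_{i}.mp4")
        label_path = os.path.join(out_dir, f"video_{i}.json")
        try:
            # Input names mirror the specs listed above; check the model schema.
            output = replicate.run(
                MODEL_REF,
                input={"prompt": prompt, "num_frames": 24, "fps": 6,
                       "width": 576, "height": 320},
            )
            # ZeroScope returns one or more video URLs (newer clients may
            # return file-like objects whose str() is the URL).
            url = str(output[0]) if isinstance(output, (list, tuple)) else str(output)
            resp = requests.get(url, timeout=120)
            resp.raise_for_status()
            with open(video_path, "wb") as f:
                f.write(resp.content)
            # Label schema matches the sample in "Dataset Structure" below.
            # `columns` is accepted for parity with the usage example; this
            # sketch always writes the fixed schema.
            label = {"prompt": prompt, "video_path": video_path,
                     "annotations": [], "summary_labels": []}
            with open(label_path, "w") as f:
                json.dump(label, f, indent=2)
            results.append({"status": "succeeded",
                            "video_path": video_path,
                            "label_path": label_path})
        except Exception as exc:
            results.append({"status": "failed", "error": str(exc)})
    return results
```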
### `annotate_video(video_path, label_path)`
Interactive tool to manually label objects in individual frames using mouse clicks.
Each click records the frame index, the (x, y) pixel coordinates, and a text label (see the sample label file below).
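A minimal sketch of the annotation loop, using OpenCV's HighGUI mouse callback; the `label` parameter (one fixed text label per session) is a hypothetical simplification. Any key advances a frame, `q` quits, and clicks are appended to the label file's `annotations` list:

```python
import json

import cv2


def annotate_video(video_path, label_path, label="object"):
    """Step through frames; left-click to record point annotations."""
    with open(label_path) as f:
        meta = json.load(f)
    annotations = meta.setdefault("annotations", [])
    frame_idx = 0

    def on_click(event, x, y, flags, param):
        # Record a single-point label at the clicked pixel.
        if event == cv2.EVENT_LBUTTONDOWN:
            annotations.append({"frame": frame_idx, "x": x, "y": y, "label": label})

    cap = cv2.VideoCapture(video_path)
    cv2.namedWindow("annotate")
    cv2.setMouseCallback("annotate", on_click)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imshow("annotate", frame)
        # Any key advances to the next frame; 'q' quits early.
        if cv2.waitKey(0) & 0xFF == ord("q"):
            break
        frame_idx += 1
    cap.release()
    cv2.destroyAllWindows()
    with open(label_path, "w") as f:
        json.dump(meta, f, indent=2)
```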
## Dataset Structure
Sample label file (`video_0.json`):
```json
{
  "prompt": "a cat riding a skateboard",
  "video_path": "dataset/Video/video_0.mp4",
  "annotations": [
    {
      "frame": 12,
      "x": 200,
      "y": 150,
      "label": "cat"
    }
  ],
  "summary_labels": []
}
```
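Based on the paths in the sample file, the on-disk layout looks roughly like this (the `dataset/Video/` location is inferred from the paths above):

```
dataset/
└── Video/
    ├── video_0.mp4   # generated clip
    └── video_0.json  # matching label file
```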
## Example Usage
```python
prompt = "a cat riding a skateboard"
columns = ["video_path", "prompt", "annotations"]

results = generate_video_data(prompt=prompt, columns=columns, num_rows=1)

for result in results:
    if result["status"] == "succeeded":
        annotate_video(result["video_path"], result["label_path"])
```
## Recommended Prompts
- "a person riding a bicycle in the rain"
- "a robot walking through a forest"
- "a spaceship flying over a futuristic city"
- "a lion chasing prey across the savannah"
## Notes
- Replicate model: `zeroscope-v2-xl` (version: `8ba52bde11...`)
- All videos are short clips (24 frames) suitable for downstream tasks like:
  - Object detection
  - Action recognition
  - Prompt-to-video grounding
- Manual annotation supports single-point labels per frame, but is extendable
## To Do / Ideas
- [ ] Add bounding box UI
- [ ] Include auto-labeling stub (e.g. via SAM)
- [ ] Export dataset in COCO-video format