Pipeline for building large-scale video datasets, covering the full workflow from raw data to training-ready format:
- Download (0_download/) - bulk video acquisition
- Curation (1_curation/) - scene detection and splitting, frame extraction
- Captioning (2_captioning/) - audio, frame, and video captioning via VLMs
- Training (3_training/) - training of ViCLIP, CLAP and CLIP
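
The numeric prefixes on the stage directories give the order in which the stages run. As a minimal sketch, assuming the four directories sit directly under the repository root, the snippet below reports which stages are missing from a checkout. The helper names `stage_order` and `missing_stages` are hypothetical and not part of this repository.

```python
from pathlib import Path

# Stage directories as named in the list above; the numeric prefix gives the run order.
STAGES = ["0_download", "1_curation", "2_captioning", "3_training"]


def stage_order(name: str) -> int:
    """Return the numeric prefix of a stage directory name, e.g. '1_curation' -> 1."""
    return int(name.split("_", 1)[0])


def missing_stages(repo_root: Path) -> list[str]:
    """Hypothetical helper: stage directories absent from repo_root, in pipeline order."""
    ordered = sorted(STAGES, key=stage_order)
    return [stage for stage in ordered if not (repo_root / stage).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    for stage in missing_stages(Path(".")):
        print(f"missing stage directory: {stage}")
```

Running it from the repository root prints one line per missing stage directory and prints nothing when all four are present.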