Creating Video Datasets
How to convert a video dataset to Hub format.

This tutorial is also available as a Colab Notebook.

Video datasets are becoming increasingly common in Computer Vision applications. This tutorial demonstrates how to convert a simple video classification dataset into Hub format. Uploading videos in Hub is nearly identical to uploading images, aside from minor differences in sample compression that are described below.

Create the Hub Dataset

The first step is to download the small dataset below called running_walking (running_walking.zip, 7MB).
The dataset has the following folder structure:
```
data_dir
|_running
    |_video_1.mp4
    |_video_2.mp4
|_walking
    |_video_3.mp4
    |_video_4.mp4
```
Now that you have the data, let's create a Hub Dataset in the ./running_walking_hub folder by running:
```python
import hub
from PIL import Image, ImageDraw
import numpy as np
import os

ds = hub.empty('./running_walking_hub') # Create the dataset locally
```
Next, let's inspect the folder structure for the source dataset ./running_walking to find the class names and the files that need to be uploaded to the Hub dataset.
```python
# Find the class_names and list of files that need to be uploaded
dataset_folder = './running_walking'

class_names = os.listdir(dataset_folder)

fn_vids = []
for dirpath, dirnames, filenames in os.walk(dataset_folder):
    for filename in filenames:
        fn_vids.append(os.path.join(dirpath, filename))
```
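To see exactly what this loop collects, here is a self-contained sketch that recreates the folder layout above with empty stand-in files (the temporary directory and empty files are just for illustration; the real dataset contains actual mp4 videos):

```python
import os
import tempfile

# Recreate the tutorial's folder layout with empty stand-in files
root = tempfile.mkdtemp()
for cls, vids in [('running', ['video_1.mp4', 'video_2.mp4']),
                  ('walking', ['video_3.mp4', 'video_4.mp4'])]:
    os.makedirs(os.path.join(root, cls))
    for v in vids:
        open(os.path.join(root, cls, v), 'w').close()

# Same walk as above: collects the full path of every file
fn_vids = []
for dirpath, dirnames, filenames in os.walk(root):
    for filename in filenames:
        fn_vids.append(os.path.join(dirpath, filename))

print(sorted(os.path.basename(p) for p in fn_vids))
# ['video_1.mp4', 'video_2.mp4', 'video_3.mp4', 'video_4.mp4']
```

Note that `os.walk` visits every subfolder, so the class of each file can later be recovered from its parent directory name.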
Finally, let's create the tensors and iterate through all the videos in the dataset in order to populate the data in Hub.
The key difference between video and image htypes is that Hub does not explicitly perform compression for videos. The sample_compression input in the create_tensor function is used to verify that the compression of the input video file to hub.read() matches the sample_compression parameter. If there is a match, the video is uploaded in compressed format. Otherwise, an error is thrown.
Images have a slightly different behavior, because the input image files are stored and re-compressed (if necessary) to the sample_compression format.
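This verification can be pictured with the toy check below. This is a conceptual sketch only, not Hub's actual implementation — the hypothetical helper here compares file extensions, whereas Hub verifies the file's real encoding:

```python
import os

def check_sample_compression(file_path, sample_compression):
    # Toy stand-in: compare the file extension against the tensor's
    # sample_compression (Hub inspects the actual encoding, not the name)
    file_ext = os.path.splitext(file_path)[1].lstrip('.').lower()
    if file_ext != sample_compression.lower():
        raise ValueError(
            f"file is '{file_ext}' but tensor expects '{sample_compression}'"
        )
    return True  # stored as-is, in its original compressed form

check_sample_compression('video_1.mp4', 'mp4')   # passes
# check_sample_compression('video_1.avi', 'mp4') # would raise ValueError
```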
```python
with ds:
    ds.create_tensor('videos', htype='video', sample_compression = 'mp4')
    ds.create_tensor('labels', htype='class_label', class_names = class_names)

    for fn_vid in fn_vids:
        label_text = os.path.basename(os.path.dirname(fn_vid))
        label_num = class_names.index(label_text)

        # Append data to tensors
        ds.videos.append(hub.read(fn_vid))
        ds.labels.append(np.uint32(label_num))
```
In order for Activeloop Platform to correctly visualize the labels, class_names must be a list of strings, where the numerical labels correspond to the index of the label in the list.
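With the two folders in this dataset, the mapping looks like this (the exact order of class_names depends on what os.listdir returned; the order below is only an example):

```python
# Suppose os.listdir returned the classes in this order
class_names = ['running', 'walking']

# A numeric label is visualized as the string at that index
label_num = 1
print(class_names[label_num])  # walking
```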

Inspect the Hub Dataset

Let's check out the first frame in the third sample from this dataset.
```python
ind = 2
img = Image.fromarray(ds.videos[ind][0].numpy())
```
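Each indexed frame comes back as a (height, width, 3) uint8 array, which Image.fromarray converts directly. The same call on a dummy frame shows the shapes involved (synthetic data, not a real video frame):

```python
import numpy as np
from PIL import Image

# Dummy 240x320 RGB frame — the same shape/dtype a decoded video frame has
frame = np.zeros((240, 320, 3), dtype=np.uint8)
frame[:, :, 0] = 255  # solid red

img = Image.fromarray(frame)
print(img.size, img.mode)  # (320, 240) RGB
```

Note that PIL reports size as (width, height), the reverse of the NumPy shape.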
```python
# Load the numeric label and read the class name from ds.labels.info.class_names
ds.labels.info.class_names[ds.labels[ind].numpy()[0]]
```
```python
# Display the frame
img
```
Congrats! You just created a video classification dataset! πŸŽ‰