Creating Hub datasets is simple: you have full control over connecting your source data (files, images, etc.) to specific tensors in the Hub dataset.
Let's follow the example below to create our first dataset. First, download and unzip the small classification dataset below, called the animals dataset.
The dataset has the following folder structure:
Now that you have the data, you can create a Hub dataset and initialize its tensors. Running the following code will create a Hub dataset inside of the
import hub
from PIL import Image
import numpy as np
import os
import glob

hub_dataset_path = './animals_hub'

ds = hub.empty(hub_dataset_path)  # Creates the dataset

# Create the tensors with names of your choice.
ds.create_tensor('images', htype = 'image', sample_compression = 'jpeg')
ds.create_tensor('labels', htype = 'class_label')

# Add arbitrary metadata - Optional
ds.info.update(description = 'My first Hub dataset')
ds.images.info.update(camera_type = 'SLR')
Next, populate the tensors with data using the following code:
dataset_folders = glob.glob('./animals/*')  # Paths to source data

# Iterate through the subfolders (/dogs, /cats)
for label, folder_path in enumerate(dataset_folders):
    paths = glob.glob(os.path.join(folder_path, '*'))  # Get image paths in this subfolder

    # Iterate through images in the subfolder
    for path in paths:
        ds.images.append(hub.read(path))  # Append to images tensor using hub.read
        ds.labels.append(np.uint32(label))  # Append to labels tensor
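The loop above derives each label from the position of its folder in the `glob` result. The following is a minimal stand-alone sketch of that pairing step, using only the standard library so it can be run without Hub. The helper name `collect_labeled_paths` and the file names in the demo are hypothetical, and sorting the folders is an assumption added here so that each class keeps the same integer label across runs (`glob` does not guarantee an order).

```python
import glob
import os
import tempfile


def collect_labeled_paths(root):
    """Return (class_names, samples), where samples is a list of (path, label)."""
    # Sort folder paths so each class keeps the same integer label across runs.
    folders = sorted(
        p for p in glob.glob(os.path.join(root, '*')) if os.path.isdir(p)
    )
    class_names = [os.path.basename(p) for p in folders]

    samples = []
    for label, folder in enumerate(folders):
        for path in sorted(glob.glob(os.path.join(folder, '*'))):
            samples.append((path, label))
    return class_names, samples


# Demo on a throwaway directory tree with hypothetical file names.
with tempfile.TemporaryDirectory() as root:
    for name, files in {'cats': ['c1.jpg'], 'dogs': ['d1.jpg', 'd2.jpg']}.items():
        os.makedirs(os.path.join(root, name))
        for file_name in files:
            open(os.path.join(root, name, file_name), 'w').close()

    class_names, samples = collect_labeled_paths(root)
    print(class_names)
    print([(os.path.basename(p), label) for p, label in samples])
```

Each `(path, label)` pair could then be appended exactly as in the loop above, with `hub.read(path)` for the image and `np.uint32(label)` for the label. Keeping `class_names` around also makes it possible to map integer labels back to folder names later.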
Check out the first image from this dataset. More details about Accessing Data are available in Step 5.
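As a rough sketch of what inspecting the first sample might look like, the helper below reads the first entry of the `images` and `labels` tensors created earlier and converts each to an array with `.numpy()`. The function name `first_sample_summary` is hypothetical, and the code assumes `ds` is the dataset built in the steps above.

```python
def first_sample_summary(ds):
    """Return (image_shape, label) for the first sample of a Hub-style dataset."""
    image = ds.images[0].numpy()  # First image as an array
    label = ds.labels[0].numpy()  # Label stored at the same index
    return tuple(image.shape), label


# Usage, assuming `ds` is the dataset populated above:
# shape, label = first_sample_summary(ds)
# print(shape, label)
```

To view the picture itself, the array returned by `ds.images[0].numpy()` can be passed to `Image.fromarray` from Pillow, which is already imported in the setup code.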
It is often useful to create tensors hierarchically, because information in different tensors may be inherently coupled, such as bounding boxes and their corresponding labels. A hierarchy can be created using the following lines of code:
ds.create_tensor('localization/bbox')
ds.create_tensor('localization/label')

# Tensors are accessed via:
ds.localization.bbox
ds.localization.label
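To illustrate the naming convention only, the sketch below groups slash-separated tensor names into a nested dictionary, mirroring how `'localization/bbox'` is reached as `ds.localization.bbox`. This is a plain-Python model with a hypothetical helper name (`group_tensor_names`); it is not how Hub stores groups internally.

```python
def group_tensor_names(names):
    """Build a nested dict of groups from slash-separated tensor names."""
    tree = {}
    for name in names:
        node = tree
        *groups, leaf = name.split('/')
        # Walk (or create) one nested dict per group in the path.
        for group in groups:
            node = node.setdefault(group, {})
        node[leaf] = name  # Leaf maps back to the full tensor name
    return tree


tree = group_tensor_names(['images', 'localization/bbox', 'localization/label'])
print(tree['localization']['bbox'])
```

Seen this way, a group is just a shared prefix: tensors that belong together sit under the same key and can be handled as a unit.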
For more detailed information regarding accessing datasets and their tensors, check out the next section.