Step 6: Connecting Hub Datasets to ML Frameworks

Connecting Hub Datasets to machine learning frameworks such as PyTorch and TensorFlow.

You can connect Hub Datasets to popular ML frameworks such as PyTorch and TensorFlow with minimal boilerplate code, and Hub takes care of the parallel processing.


PyTorch

You can train a model by creating a PyTorch DataLoader from a Hub Dataset using ds.pytorch().

import hub
from torch.utils.data import DataLoader

ds = hub.dataset('./dataset_path')  # Hub Dataset
dataloader = ds.pytorch(batch_size=16, num_workers=2)  # PyTorch DataLoader

for data in dataloader:
    # Training loop goes here
    pass
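
To make the batch_size argument concrete, here is a minimal, framework-free sketch of how a loader groups individual samples into batches. It does not use Hub or PyTorch: the helper iter_batches and the field names images and labels are hypothetical and exist only for illustration. The sketch keeps the final partial batch, which matches the default behavior of PyTorch's DataLoader (drop_last=False).

```python
from typing import Dict, Iterable, Iterator, List


def iter_batches(
    samples: Iterable[Dict[str, int]], batch_size: int
) -> Iterator[Dict[str, List[int]]]:
    """Group per-sample dicts into per-batch dicts of lists.

    Hypothetical helper for illustration only; not part of Hub or PyTorch.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: Dict[str, List[int]] = {}
    count = 0
    for sample in samples:
        # Append each field of the sample to the matching list in the batch.
        for key, value in sample.items():
            batch.setdefault(key, []).append(value)
        count += 1
        if count == batch_size:
            yield batch
            batch, count = {}, 0
    if count:
        # Emit the final batch even if it is smaller than batch_size.
        yield batch


# Stand-in data: 40 samples with made-up field names and integer values.
samples = [{"images": i, "labels": i % 2} for i in range(40)]

batches = list(iter_batches(samples, batch_size=16))
batch_sizes = [len(b["labels"]) for b in batches]
print(batch_sizes)  # [16, 16, 8]
```

With 40 samples and batch_size = 16, the loop body runs three times: two full batches and one final batch of 8. A real DataLoader additionally stacks each field into a tensor and can spread the loading work across num_workers processes.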

TensorFlow - Coming Soon

Similarly, you can convert a Hub Dataset to a TensorFlow Dataset via the tf.data API.

ds # Hub Dataset object, to be used for training
ds = ds.tensorflow() # A TensorFlow Dataset