A jump-start guide to using Hub.

How to Get Started with Activeloop Hub in Under 5 Minutes

Installing Hub

Hub can be installed through pip. By default, Hub does not install the optional dependencies for audio, video, and Google Cloud Storage (GCS) support. Full installation details are available in the Hub installation documentation.
$ pip3 install hub
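To confirm that the package is visible to your Python interpreter before moving on, a quick check along these lines may help. This is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of Hub.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be located without importing it."""
    return importlib.util.find_spec(package_name) is not None


# "hub" is the import name used throughout this guide.
print("hub installed:", is_installed("hub"))
```

If this reports `False`, the usual cause is that pip3 installed the package into a different Python environment than the one running the check.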

Fetching Your First Hub Dataset

Let's load the MNIST dataset, the "hello world" dataset of machine learning. Datasets hosted on Activeloop Platform are typically identified by the host organization name followed by the dataset name: activeloop/mnist-train.
import hub
dataset_path = 'hub://activeloop/mnist-train'
ds = hub.load(dataset_path) # Returns a Hub Dataset but does not download data locally
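The path convention described above (organization name, then dataset name) can be illustrated with a small parser. This is only a sketch of the naming scheme; `HubPath` and `parse_hub_path` are hypothetical helpers, not part of the Hub API, and Hub itself may accept other path forms that this sketch rejects.

```python
from typing import NamedTuple


class HubPath(NamedTuple):
    org: str      # host organization, e.g. "activeloop"
    dataset: str  # dataset name, e.g. "mnist-train"


def parse_hub_path(path: str) -> HubPath:
    """Split 'hub://<org>/<dataset>' into its two components."""
    prefix = "hub://"
    if not path.startswith(prefix):
        raise ValueError(f"expected a path starting with {prefix!r}, got {path!r}")
    org, sep, dataset = path[len(prefix):].partition("/")
    if not (org and sep and dataset):
        raise ValueError(f"expected 'hub://<org>/<dataset>', got {path!r}")
    return HubPath(org, dataset)


parsed = parse_hub_path("hub://activeloop/mnist-train")
print(parsed.org, parsed.dataset)  # prints: activeloop mnist-train
```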

Reading Samples From a Hub Dataset

Data is not immediately read into memory because Hub operates lazily. You can fetch data by calling the .numpy() method, which reads data into a numpy array.
# Indexing
image = ds.images[0].numpy() # Fetch the first image and return a numpy array
labels = ds.labels[0].data() # Fetch the labels for the first sample
# Slicing
# Fetch 100 images and return a single numpy array. This raises an
# exception if the images are not all the same size.
img_array = ds.images[0:100].numpy()
# Fetch 100 labels and store them as a list of numpy arrays
label_list = ds.labels[0:100].numpy(aslist=True)
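The two behaviors described above, lazy reading and the need for a list when samples differ in size, can be pictured with a toy model. This is a conceptual sketch only: `LazySamples` and its `fetch` method are hypothetical names, plain Python lists stand in for numpy arrays, and none of it reflects how Hub is actually implemented.

```python
from typing import Callable, List, Sequence


class LazySamples:
    """Toy stand-in for a lazily loaded tensor; not Hub's implementation."""

    def __init__(self, loaders: Sequence[Callable[[], list]]):
        self._loaders = list(loaders)  # store how to read; read nothing yet
        self.reads = 0                 # counts how many samples were fetched

    def fetch(self, aslist: bool = False) -> List[list]:
        """Read every sample; require equal sizes unless aslist=True."""
        samples = []
        for load in self._loaders:
            samples.append(load())
            self.reads += 1
        if not aslist and len({len(s) for s in samples}) > 1:
            raise ValueError("samples differ in size; pass aslist=True")
        return samples


ragged = LazySamples([lambda: [1, 2, 3], lambda: [4, 5]])
print(ragged.reads)               # prints: 0 (constructing the object read nothing)
print(ragged.fetch(aslist=True))  # prints: [[1, 2, 3], [4, 5]]
```

Calling `ragged.fetch()` without `aslist=True` raises a `ValueError` in this sketch, mirroring the exception mentioned for images of different sizes.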

Visualizing a Hub Dataset

Hub enables users to visualize and interpret large datasets. The tensor layout for a dataset can be inspected using:
ds.summary()
The dataset can be visualized in Activeloop Platform, or using an iframe in a Jupyter notebook:
ds.visualize()
Visualizing datasets in Activeloop Platform unlocks more features and faster performance than visualization in Jupyter notebooks.
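The two inspection steps can be wrapped in one small helper. This is a sketch that assumes the dataset object exposes `summary()` and `visualize()` methods, as Hub 2.x datasets are understood to; `inspect_dataset` and its `show_visualizer` flag are hypothetical, and the visualizer is expected to render only inside a Jupyter notebook.

```python
def inspect_dataset(ds, show_visualizer: bool = True) -> None:
    """Print the tensor layout, then optionally open the notebook visualizer."""
    ds.summary()  # assumed Hub 2.x call: prints the dataset's tensor layout
    if show_visualizer:
        ds.visualize()  # assumed Hub 2.x call: renders an iframe in Jupyter


# Usage in a notebook (requires hub to be installed and network access):
#   import hub
#   inspect_dataset(hub.load('hub://activeloop/mnist-train'))
```

Passing `show_visualizer=False` keeps the helper usable in plain scripts, where an iframe has nowhere to render.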
Congratulations, you've got Hub working on your local machine!