Step 1: Hello World
Installing Hub and accessing your first Hub Dataset.

Installing Hub

Hub can be installed through pip. By default, Hub does not install the dependencies for audio, video, or Google Cloud Storage (GCS) support.
pip install hub
Support for audio, video, and GCS can be installed via:
pip install "hub[av]"   # Video and audio support via PyAV

pip install "hub[gcp]"  # GCS support via google-* dependencies

pip install "hub[all]"  # Installs everything: audio, video, and GCS support
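To confirm which of these packages ended up in the current environment, a quick check with the Python standard library can help. This is a minimal sketch, not part of Hub: is_installed is a hypothetical helper, and it assumes that PyAV is imported under the module name av.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


# "hub" is the core package; "av" is assumed to be the import name of PyAV,
# which the hub[av] extra pulls in for audio and video support.
for name in ("hub", "av"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

The check only looks the modules up; it does not import them, so it stays fast even when the packages are large.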

Fetching Your First Hub Dataset

Let's load MNIST, the hello world dataset of machine learning.
First, instantiate a Dataset by pointing to its storage location. Datasets hosted on Activeloop Platform are typically identified by the namespace of the organization followed by the dataset name: activeloop/mnist-train.
import hub

dataset_path = 'hub://activeloop/mnist-train'
ds = hub.load(dataset_path)  # Returns a Hub Dataset but does not download data locally
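The path passed to hub.load follows the convention described above: the hub:// prefix, then the organization namespace, then the dataset name. The sketch below splits such a path into its parts purely for illustration. split_hub_path is a hypothetical helper, not a Hub API, and it assumes the simple two-part form shown in this step.

```python
from typing import Tuple

PREFIX = "hub://"


def split_hub_path(path: str) -> Tuple[str, str]:
    """Split 'hub://<organization>/<dataset>' into (organization, dataset)."""
    if not path.startswith(PREFIX):
        raise ValueError(f"expected a path starting with {PREFIX!r}: {path!r}")
    parts = path[len(PREFIX):].split("/")
    # Assumption: exactly one organization and one dataset name, both non-empty.
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'hub://<organization>/<dataset>': {path!r}")
    return parts[0], parts[1]


organization, dataset_name = split_hub_path("hub://activeloop/mnist-train")
print(organization, dataset_name)  # prints: activeloop mnist-train
```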

Reading Samples From a Hub Dataset

Data is not immediately read into memory because Hub operates lazily. You can fetch data by calling the .numpy() method, which reads data into a NumPy array.
# Indexing
W = ds.images[0].numpy()  # Fetch the 1st image and return a NumPy array
X = ds.labels[0].numpy(aslist=True)  # Fetch the 1st label and store it
                                     # as a list

V = ds.videos[0][4].numpy()  # Fetch the 5th frame in the 1st video
                             # and return a NumPy array
                             # (only for datasets that contain a videos tensor)

# Slicing
Y = ds.images[0:100].numpy()  # Fetch 100 images and return a NumPy array
                              # The method above produces an exception if
                              # the images are not all the same size

Z = ds.labels[0:100].numpy(aslist=True)  # Fetch 100 labels and store
                                         # them as a list of NumPy arrays
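The exception mentioned for slices of differently sized images follows from how NumPy arrays work: a single array needs one uniform shape. The sketch below reproduces the idea with plain NumPy and made-up array shapes. It does not call Hub, and it only approximates the difference between .numpy() and .numpy(aslist=True).

```python
import numpy as np

# Two hypothetical "images" with different heights and widths.
samples = [
    np.zeros((28, 28), dtype=np.uint8),
    np.zeros((32, 24), dtype=np.uint8),
]

# Combining them into one array fails because the shapes differ.
try:
    np.stack(samples)
    stacked_ok = True
except ValueError:
    stacked_ok = False

# Keeping them as a list of arrays works for any mix of shapes.
shapes = [sample.shape for sample in samples]
print(stacked_ok, shapes)  # prints: False [(28, 28), (32, 24)]
```

For datasets whose samples vary in size, requesting a list is therefore the safer choice when slicing.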
Congratulations, you've got Hub working on your local machine! 🤓