Step 4: Accessing Data
Accessing and loading Hub Datasets.

Loading Datasets

Hub Datasets can be created in, and loaded from, a variety of storage locations with minimal configuration.
import hub
# Local Filepath
ds = hub.load('./my_dataset_path')
# S3
ds = hub.load('s3://my_dataset_bucket', creds={...})
## Activeloop Storage - See Step 6
# Public Dataset hosted by Activeloop
ds = hub.load('hub://activeloop/public_dataset_name')
# Dataset in another workspace on Activeloop Platform
ds = hub.load('hub://workspace_name/dataset_name')
Since ds = hub.dataset(path) can be used to both create and load datasets, you may accidentally create a new dataset if the path contains a typo when you intended to load an existing one. If that occurs, use ds.delete() to permanently remove the unintended dataset.
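For local paths, one way to guard against this is to check that the directory exists before calling a function that can also create datasets. The following is a minimal sketch, not part of Hub: load_existing_local is a hypothetical helper, and its loader argument stands in for whichever Hub call you use (for example hub.dataset). Remote paths such as s3:// or hub:// are outside its scope.

```python
import os
from typing import Any, Callable


def load_existing_local(path: str, loader: Callable[[str], Any]) -> Any:
    """Call loader(path) only if path is an existing local directory.

    Hypothetical helper, not part of Hub. `loader` stands in for a Hub
    call such as hub.dataset; remote paths (s3://, hub://) are not handled.
    """
    if not os.path.isdir(path):
        raise FileNotFoundError(
            f"No dataset directory at {path!r}; check the path for typos"
        )
    return loader(path)
```

With Hub installed, this could be used as ds = load_existing_local('./my_dataset_path', hub.dataset), so a mistyped path raises an error instead of silently creating an empty dataset.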

Referencing Tensors

Hub allows you to reference specific tensors using keys or via the "." notation outlined below.
Note: these commands only create references to the tensors; no data is loaded yet.
ds.images # is equivalent to
ds['images']
ds.labels # is equivalent to
ds['labels']
ds.localization.boxes # is equivalent to
ds['localization/boxes']
ds.localization.labels # is equivalent to
ds['localization/labels']
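
To make the equivalence between the two notations concrete, here is a toy model of how attribute access and key access can resolve to the same object. It is an illustration only, not Hub's implementation: ToyNode is a hypothetical class, and the "/"-separated key form for nested tensors is an assumption of this sketch.

```python
class ToyNode:
    """Toy stand-in for a dataset, group, or tensor; NOT Hub's implementation."""

    def __init__(self, children=None):
        self._children = children or {}

    def __getitem__(self, key):
        # Walk a "/"-separated key one path component at a time.
        node = self
        for part in key.split("/"):
            node = node._children[part]
        return node

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        children = self.__dict__.get("_children", {})
        if name in children:
            return children[name]
        raise AttributeError(name)


# Build a small tree that mirrors the tensor names used above.
localization = ToyNode({"boxes": ToyNode(), "labels": ToyNode()})
ds = ToyNode({"images": ToyNode(), "labels": ToyNode(), "localization": localization})

# Both notations return the very same object.
assert ds.images is ds["images"]
assert ds.localization.boxes is ds["localization/boxes"]
```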

Accessing Data

Data within the tensors is loaded and accessed using the .numpy() method:
# Indexing
# Fetch an image and return a NumPy array
W = ds.images[0].numpy()
# Fetch a label and store it as a list of NumPy arrays
X = ds.labels[0].numpy(aslist=True)
# Slicing
# Fetch 100 images and return a NumPy array.
# This raises an exception if the images are not all the same size.
Y = ds.images[0:100].numpy()
# Fetch 100 labels and store them as a list of NumPy arrays
Z = ds.labels[0:100].numpy(aslist=True)
The .numpy() method raises an exception if the samples in the requested tensor do not all have the same shape. In that case, .numpy(aslist=True) solves the problem by returning a list of NumPy arrays, where each index of the list corresponds to a different sample.
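
When it is not known in advance whether the samples share a shape, the two calls can be combined into a fallback. The sketch below is a hypothetical helper, not part of Hub; it assumes only that the object passed in has a .numpy() method accepting aslist=True, as described above. The exact exception class Hub raises is not stated here, so the broad default is an assumption that should be narrowed in real code.

```python
def numpy_or_list(view, error=Exception):
    """Return view.numpy(), falling back to view.numpy(aslist=True).

    Hypothetical helper, not part of Hub. `error` is the exception type
    treated as "shapes are not uniform"; the broad default is an
    assumption because the exact exception class is not given here.
    """
    try:
        # Works when every requested sample has the same shape.
        return view.numpy()
    except error:
        # Otherwise return one array per sample in a list.
        return view.numpy(aslist=True)
```

For example, Y = numpy_or_list(ds.images[0:100]) would return a single array for uniformly sized images and a list of arrays otherwise. Because the default catches any exception, unrelated failures would also trigger the fallback, so passing the specific exception type is the safer choice.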