Step 4: Accessing Data
Learn how Hub Datasets can be accessed quickly or loaded from a variety of storage locations.

How to Access and Load Datasets with Hub

Loading Datasets

Hub Datasets can be loaded from a variety of storage locations using:
import hub

# Local Filepath
ds = hub.load('./my_dataset_path') # Similar functionality to hub.dataset(path)

# S3
ds = hub.load('s3://my_dataset_bucket', creds={...})

# Public Dataset hosted by Activeloop
## Activeloop Storage - See Step 6
ds = hub.load('hub://activeloop/public_dataset_name')

# Dataset in another organization on Activeloop Platform
ds = hub.load('hub://org_name/dataset_name')
Since ds = hub.dataset(path) can be used both to create and to load datasets, a typo in the path may accidentally create a new dataset when you intended to load an existing one. If that occurs, use ds.delete() to permanently remove the unintended dataset.
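One way to reduce the risk of such typos for local datasets is to check that the path exists before passing it to hub.dataset(path). The sketch below is only an illustration: require_existing_path and REMOTE_PREFIXES are hypothetical names, not part of Hub, and the check assumes that a local dataset lives in a directory. Remote paths (s3://, hub://) are passed through unchanged because they cannot be checked this way.

```python
import os
import tempfile

# Prefixes of the remote locations shown in the loading example above.
REMOTE_PREFIXES = ("s3://", "hub://")


def require_existing_path(path: str) -> str:
    """Return `path` unchanged, or raise if a local path does not exist.

    Hypothetical helper, not part of Hub. Assumes a local dataset is a
    directory. Remote paths are returned as-is without any check.
    """
    if path.startswith(REMOTE_PREFIXES):
        return path
    if not os.path.isdir(path):
        raise FileNotFoundError(f"No dataset directory at {path!r}; check for typos.")
    return path


# Demonstration with a temporary directory standing in for a dataset folder.
with tempfile.TemporaryDirectory() as existing:
    checked = require_existing_path(existing)  # existing directory: accepted
    try:
        require_existing_path(existing + "_typo")  # misspelled path: rejected
        typo_rejected = False
    except FileNotFoundError:
        typo_rejected = True

# Intended use (requires Hub, so it is left as a comment here):
# ds = hub.dataset(require_existing_path('./my_dataset_path'))
```

With this guard, a misspelled local path fails with an error instead of reaching hub.dataset(path) and creating an empty dataset.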

Referencing Tensors

Hub allows you to reference specific tensors using keys or via the "." notation outlined below.
Note: these commands only reference the tensors; no data is loaded yet.
### NO HIERARCHY ###
ds.images # is equivalent to
ds['images']

ds.labels # is equivalent to
ds['labels']

### WITH HIERARCHY ###
ds.localization.boxes # is equivalent to
ds['localization/boxes']

ds.localization.labels # is equivalent to
ds['localization/labels']
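To make the equivalence between the "." notation and the key notation concrete, here is a toy container in which both forms reach the same object. This is an illustrative sketch only, not Hub's implementation: the Group class, its add method, and the placeholder string values are all invented for this example.

```python
class Group:
    """Toy container: attribute access and '/'-separated keys reach the same object."""

    def __init__(self):
        self._children = {}

    def add(self, key, value):
        # Split "localization/boxes" into "localization" and "boxes".
        head, _, rest = key.partition("/")
        if rest:
            child = self._children.setdefault(head, Group())
            child.add(rest, value)
        else:
            self._children[head] = value

    def __getitem__(self, key):
        head, _, rest = key.partition("/")
        child = self._children[head]
        return child[rest] if rest else child

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


ds = Group()
ds.add("images", "images-tensor")              # placeholder values, not real tensors
ds.add("localization/boxes", "boxes-tensor")

same_flat = ds.images is ds["images"]
same_nested = ds.localization.boxes is ds["localization/boxes"]
```

Both same_flat and same_nested are True here, mirroring the equivalences listed above: each "." step corresponds to one "/"-separated segment of the key.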

Accessing Data

Data within the tensors is loaded and accessed using the .numpy(), .data(), and .tobytes() methods. When the underlying data can be converted to a NumPy array, .data() and .numpy() return equivalent objects.
# Indexing
img = ds.images[0].numpy() # Fetch the 1st image and return a NumPy array
label = ds.labels[0].numpy(aslist=True) # Fetch the 1st label and store it as a list

# frame = ds.videos[0][4].numpy() # Fetch the 5th frame in the 1st video
#                                 # and return a NumPy array

text_labels = ds.labels[0].data()['text'] # Fetch the labels of the 1st sample and return them as text

# Slicing
imgs = ds.images[0:100].numpy() # Fetch 100 images and return a NumPy array
                                # This produces an exception if
                                # the images are not all the same size

labels = ds.labels[0:100].numpy(aslist=True) # Fetch 100 labels and store
                                             # them as a list of NumPy arrays
The .numpy() method produces an exception if the samples in the requested tensor do not all have the same shape. In that case, .numpy(aslist=True) solves the problem by returning a list of NumPy arrays, where each list index corresponds to a different sample.
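The reason for this behavior can be reproduced with plain NumPy: samples of different shapes cannot be stacked into a single array, but they can always be kept in a list. The sketch below is a stand-in for the behavior described above, not Hub's implementation; to_numpy is a hypothetical function, and the exception type raised by Hub itself may differ from the ValueError that np.stack raises here.

```python
import numpy as np


def to_numpy(samples, aslist=False):
    """Toy stand-in for the behavior described above (not Hub's implementation)."""
    arrays = [np.asarray(s) for s in samples]
    if aslist:
        return arrays            # one array per sample; shapes may differ
    return np.stack(arrays)      # single array; requires identical shapes


uniform = [np.zeros((2, 2)), np.ones((2, 2))]
ragged = [np.zeros((2, 2)), np.ones((3, 2))]

batch = to_numpy(uniform)                   # stacks into one array
as_list = to_numpy(ragged, aslist=True)     # list of two arrays

try:
    to_numpy(ragged)                        # shapes differ, so stacking fails
    stacking_failed = False
except ValueError:
    stacking_failed = True
```

In this sketch the uniform samples stack into an array of shape (2, 2, 2), while the ragged samples can only be returned as a list.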