API Basics
Summary of the most important Hub API commands
Please note that the code examples below may not run standalone.

Creating and Loading Hub Datasets

from hub import Dataset

# Load a Hub Dataset if it already exists, or initialize a new
# Hub Dataset if it does not already exist.
ds = Dataset('./local_path')  # Local path
ds = Dataset('hub://username/dataset_name')  # Activeloop Platform Storage
ds = Dataset('s3://bucket_name/dataset_name', creds={})  # AWS S3

# Automatically create a Hub Dataset - Coming Soon
ds = Dataset.from_path('source_path', 'hub_dataset_path')
ds = Dataset.from_kaggle('kaggle_path', 'hub_dataset_path')

# Delete a Hub Dataset
ds.delete()
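
The three dataset paths above differ only in their prefix. As a rough illustration of how such a path maps to a storage location, the sketch below classifies a path by its URL scheme. The `classify_path` helper and its return labels are hypothetical and are not part of the Hub API; Hub resolves paths internally, and this is not its implementation.

```python
from urllib.parse import urlparse


def classify_path(path: str) -> str:
    """Return a label for the storage a dataset path points to.

    Hypothetical helper for illustration only; not part of Hub.
    """
    scheme = urlparse(path).scheme
    if scheme == "hub":
        return "activeloop"  # hub://username/dataset_name
    if scheme == "s3":
        return "s3"          # s3://bucket_name/dataset_name
    if scheme == "":
        return "local"       # ./local_path has no scheme
    raise ValueError(f"unsupported scheme: {scheme!r}")


for example in ("./local_path",
                "hub://username/dataset_name",
                "s3://bucket_name/dataset_name"):
    print(example, "->", classify_path(example))
```

This sketch only inspects the prefix; a Windows drive-letter path would need separate handling.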

Creating Tensors and Adding Data

import numpy as np

# Create a tensor
# Specifying htype and dtype is recommended for maximizing performance
ds.create_tensor('my_tensor', htype='bbox', dtype='int32')
ds.create_tensor('localization/my_tensor', htype='bbox', dtype='float32')

# Specifying the correct compression is critical for images, videos, and
# other rich data types.
ds.create_tensor('images', htype='image', sample_compression='jpeg')

# Append a single sample array at the end of a tensor
ds.my_tensor.append(np.ones((1, 4)))

# Append multiple samples at the end of a tensor. The first axis in the
# numpy array is assumed to be the sample axis for the tensor
ds.my_tensor.extend(np.ones((5, 1, 4)))

# Append multiple samples at the end of a tensor by passing a list of
# arrays. The samples in the list may have different shapes.
ds.my_tensor.extend([np.ones((1, 4)), np.ones((3, 4)), np.ones((2, 4))])
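
To make the sample-axis convention concrete, here is a toy model that tracks only the shape of each sample. `ToyTensor` and its method names are hypothetical stand-ins written for this illustration; the class stores no data and is not how Hub implements tensors.

```python
from typing import List, Sequence, Tuple

Shape = Tuple[int, ...]


class ToyTensor:
    """Records the shape of each sample (illustration only, not Hub code)."""

    def __init__(self) -> None:
        self.sample_shapes: List[Shape] = []

    def append(self, shape: Shape) -> None:
        # One call adds exactly one sample with the given shape.
        self.sample_shapes.append(tuple(shape))

    def extend_batch(self, batch_shape: Shape) -> None:
        # The first axis is the sample axis:
        # (5, 1, 4) means 5 samples, each of shape (1, 4).
        count, *sample_shape = batch_shape
        for _ in range(count):
            self.append(tuple(sample_shape))

    def extend_list(self, shapes: Sequence[Shape]) -> None:
        # A list adds one sample per element; shapes may differ.
        for shape in shapes:
            self.append(shape)


t = ToyTensor()
t.append((1, 4))                          # 1 sample
t.extend_batch((5, 1, 4))                 # 5 samples of shape (1, 4)
t.extend_list([(1, 4), (3, 4), (2, 4)])   # 3 samples of varying shape
print(len(t.sample_shapes))               # 9
```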

Maximizing performance

# Data gets written to long-term storage at the end of the 'with'
# block or whenever the cache is full. This minimizes the number of
# write operations during dataset creation.

with Dataset('dataset_path') as ds:

    ds.create_tensor('my_tensor')

    for i in range(10):
        ds.my_tensor.append(i)
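
The reason a `with` block reduces write operations can be shown with a toy write-back cache. The `BufferedWriter` class below, including its `capacity` parameter and `write_ops` counter, is a hypothetical model written for this illustration; it is a sketch of the general buffering idea and not Hub's caching code.

```python
from typing import Any, List


class BufferedWriter:
    """Toy write-back cache: buffers samples and writes them in batches."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.buffer: List[Any] = []
        self.storage: List[Any] = []  # stands in for long-term storage
        self.write_ops = 0            # number of batched writes performed

    def __enter__(self) -> "BufferedWriter":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Leaving the 'with' block writes whatever is still pending.
        self.flush()

    def append(self, sample: Any) -> None:
        self.buffer.append(sample)
        if len(self.buffer) >= self.capacity:
            self.flush()  # the cache is full

    def flush(self) -> None:
        if self.buffer:
            self.storage.extend(self.buffer)
            self.buffer.clear()
            self.write_ops += 1


with BufferedWriter(capacity=4) as writer:
    for i in range(10):
        writer.append(i)

print(writer.write_ops)  # 3 batched writes (4 + 4 + 2 samples) instead of 10
```

In this model the last two samples reach storage only because the block exit triggers a flush.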

Accessing Tensor Data

# Read tensor sample into numpy array
np_array = ds.my_tensor[0].numpy()

# Read multiple tensor samples into a single numpy array
# Raises an error if the tensor samples do not all have the same shape
np_array = ds.my_tensor[0:10].numpy()

# Read multiple tensor samples into a list of numpy arrays
np_array_list = ds.my_tensor[0:10].numpy(aslist=True)
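
The equal-shape requirement follows from how NumPy combines arrays: samples can be stacked into one array only when their shapes match, whereas a list can hold arrays of any shape. The sketch below reproduces that behavior with plain NumPy. The `read_samples` helper is hypothetical and only mirrors the two read modes; it is not Hub's `numpy()` method.

```python
import numpy as np


def read_samples(samples, aslist=False):
    """Hypothetical helper mirroring the two read modes (not Hub code)."""
    arrays = [np.asarray(s) for s in samples]
    if aslist:
        return arrays  # a list tolerates differing shapes
    # np.stack raises ValueError when the shapes differ.
    return np.stack(arrays)


uniform = [np.ones((1, 4)) for _ in range(3)]
ragged = [np.ones((1, 4)), np.ones((3, 4)), np.ones((2, 4))]

print(read_samples(uniform).shape)              # (3, 1, 4)
print(len(read_samples(ragged, aslist=True)))   # 3

try:
    read_samples(ragged)
except ValueError as err:
    print("cannot stack:", err)
```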

Connecting Hub Datasets to ML Frameworks

# PyTorch Dataloader
dataloader = ds.pytorch()

# TensorFlow Dataset
ds_tensorflow = ds.tensorflow()
Last modified 6mo ago