API Basics
Summary of the most important Hub API commands.
Please note that the code examples below may not run standalone.

Creating and Loading Hub Datasets

```python
import hub

# Load a Hub Dataset if it already exists (same as hub.load), or initialize
# a new Hub Dataset if it does not already exist (same as hub.empty)
ds = hub.dataset('./local_path') # Local path
ds = hub.dataset('hub://username/dataset_name') # Activeloop Platform Storage
ds = hub.dataset('s3://bucket_name/dataset_name') # AWS S3
ds = hub.dataset('gcs://bucket_name/dataset_name') # Google Cloud Storage
ds = hub.dataset('mem://dataset_name') # In-memory dataset that is not stored permanently; useful for testing

# Load a Hub Dataset
ds = hub.load('dataset_path') # Can use any of the paths above

# Create an empty Hub Dataset
ds = hub.empty('dataset_path') # Can use any of the paths above

# Create a dataset with the same tensors as another dataset.
# source_ds can be a Dataset object or the path of an existing dataset.
ds = hub.like('new_dataset_path', source_ds)

# Automatically create a Hub Dataset
ds = hub.ingest('source_path', 'hub_dataset_path')
ds = hub.ingest_kaggle('kaggle_path', 'hub_dataset_path')

# Delete a Hub Dataset
ds.delete()
```
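The load-or-create behavior of `hub.dataset` can be pictured with a small toy model. `ToyStorage` and its methods are hypothetical stand-ins backed by a Python dict; they are not part of Hub, and real storage backends, credentials, and Hub's own error types are not modeled.

```python
# Hypothetical dict-backed model of "load if it exists, otherwise create".
# This is NOT Hub's implementation; it only mirrors the control flow.
class ToyStorage:
    def __init__(self):
        self.datasets = {}  # maps a dataset path to a dict of tensors

    def load(self, path):
        # Like hub.load in this toy: fail if nothing exists at the path.
        if path not in self.datasets:
            raise KeyError(f"no dataset at {path!r}")
        return self.datasets[path]

    def empty(self, path):
        # Like hub.empty in this toy: create a new, empty dataset.
        # This toy refuses to overwrite an existing dataset.
        if path in self.datasets:
            raise KeyError(f"dataset already exists at {path!r}")
        self.datasets[path] = {}
        return self.datasets[path]

    def dataset(self, path):
        # Like hub.dataset: load if present, otherwise create.
        if path in self.datasets:
            return self.load(path)
        return self.empty(path)


store = ToyStorage()
ds = store.dataset("mem://example")    # created: nothing exists yet
ds["my_tensor"] = [1, 2, 3]
same = store.dataset("mem://example")  # loaded: it exists now
print(same is ds)  # True
```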

Creating Tensors and Adding Data

```python
import numpy as np

# Create a tensor
# Specifying htype is recommended for maximizing performance.
# Specify dtype only if you need a dtype other than the default
# dtype for the specified htype.
ds.create_tensor('my_tensor', htype = 'bbox', dtype = 'int32')

# Specifying the correct compression is critical for images, videos, and
# other rich data types.
ds.create_tensor('images', htype = 'image', sample_compression = 'jpeg')

# Create tensor hierarchies
ds.create_group('my_group')
ds.my_group.create_tensor('my_tensor')
ds.create_tensor('my_group/my_other_tensor') # Creates the group 'my_group' automatically if needed

# Append a single sample array at the end of a tensor
ds.my_tensor.append(np.ones((1,4)))

# Append multiple samples at the end of a tensor. The first axis in the
# numpy array is assumed to be the sample axis for the tensor
ds.my_tensor.extend(np.ones((5,1,4)))

# Append multiple samples at the end of a tensor from a list.
# The samples in the list may have different shapes.
ds.my_tensor.extend([np.ones((1,4)), np.ones((3,4)), np.ones((2,4))])
```
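The append/extend rules above can be illustrated without Hub. In this sketch `ToyTensor` is a hypothetical list-backed class, not Hub's tensor; it only mirrors how many samples each call adds and what shape each sample has.

```python
import numpy as np


# Hypothetical list-backed tensor used only to illustrate sample counting.
class ToyTensor:
    def __init__(self):
        self.samples = []

    def append(self, sample):
        # One call adds exactly one sample, whatever its shape.
        self.samples.append(np.asarray(sample))

    def extend(self, samples):
        # Iterating a NumPy array walks its first axis, so an array of shape
        # (5, 1, 4) contributes 5 samples of shape (1, 4). A list contributes
        # one sample per element, and the elements may differ in shape.
        for sample in samples:
            self.append(sample)

    def __len__(self):
        return len(self.samples)


t = ToyTensor()
t.append(np.ones((1, 4)))
t.extend(np.ones((5, 1, 4)))
t.extend([np.ones((1, 4)), np.ones((3, 4)), np.ones((2, 4))])
print(len(t))                              # 9
print([s.shape for s in t.samples[-3:]])   # [(1, 4), (3, 4), (2, 4)]
```

The total is 1 + 5 + 3 = 9 samples: the 3-D array is split along its first axis, while each list element counts as one sample.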

Adding User-Specified Metadata to Datasets and Tensors

```python
# Add or update dataset metadata
ds.info.update(key1 = 'text', key2 = 42)
# A dictionary also works:
# ds.info.update({'key1': 'value1', 'key2': 42})

# Add or update tensor metadata
ds.my_tensor.info.update(key1 = 'text', key2 = 42)

# Delete metadata (each line below is an alternative)
ds.info.delete() # Delete all metadata
ds.info.delete('key1') # Delete one key
ds.info.delete(['key1', 'key2']) # Delete multiple keys
```
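The update and delete call patterns can be modeled with a plain dict. `ToyInfo` is a hypothetical stand-in for `ds.info` written for illustration; it is not Hub's class, and it only reproduces the three delete forms shown above.

```python
# Hypothetical dict-backed stand-in for ds.info / tensor.info.
# Not Hub's implementation; it mirrors the call patterns only.
class ToyInfo:
    def __init__(self):
        self._data = {}

    def update(self, *args, **kwargs):
        # Accepts a dict positionally, keyword arguments, or both,
        # exactly like Python's dict.update.
        self._data.update(*args, **kwargs)

    def delete(self, key=None):
        if key is None:              # delete all metadata
            self._data.clear()
        elif isinstance(key, str):   # delete one key
            del self._data[key]
        else:                        # delete each key in a list
            for k in key:
                del self._data[k]

    def as_dict(self):
        return dict(self._data)


info = ToyInfo()
info.update(key1="text", key2=42)
info.update({"key3": "value3"})
info.delete("key1")
print(info.as_dict())   # {'key2': 42, 'key3': 'value3'}
info.delete(["key2", "key3"])
print(info.as_dict())   # {}
```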

Maximizing Performance

```python
# Data is written to long-term storage at the end of the 'with'
# block or whenever the cache is full. This minimizes the number of
# write operations during dataset creation.

ds = hub.load('dataset_path')

with ds:
    ds.create_tensor('my_tensor')

    for i in range(10):
        ds.my_tensor.append(i)
```
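Why batching writes helps can be seen by counting write operations in a toy write-back cache. `ToyCachedStore` is hypothetical and not Hub's cache; its capacity is counted in items for simplicity, and "writing after every append" is modeled as a cache of capacity 1.

```python
# Hypothetical write-back cache that counts writes to "long-term storage".
# Illustrates the flush-on-exit / flush-when-full idea; not Hub's cache.
class ToyCachedStore:
    def __init__(self, cache_capacity):
        self.cache_capacity = cache_capacity
        self.cache = []
        self.storage = []
        self.write_ops = 0

    def append(self, item):
        self.cache.append(item)
        if len(self.cache) >= self.cache_capacity:
            self.flush()  # cache is full: write it out

    def flush(self):
        if self.cache:
            self.storage.extend(self.cache)
            self.cache.clear()
            self.write_ops += 1

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.flush()  # end of the `with` block: write whatever is left
        return False


# Writing after every append: one write per sample.
unbatched = ToyCachedStore(cache_capacity=1)
for i in range(10):
    unbatched.append(i)

# Larger cache inside a `with` block: flushes at 4 and 8 items, then on exit.
with ToyCachedStore(cache_capacity=4) as batched:
    for i in range(10):
        batched.append(i)

print(unbatched.write_ops, batched.write_ops)  # 10 3
```

Both stores end up holding the same ten items; only the number of write operations differs.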

Accessing Tensor Data

```python
# Read a tensor sample into a numpy array
np_array = ds.my_tensor[0].numpy()

# Read tensor samples from tensors in a hierarchical group
np_array_1 = ds.my_group.my_tensor_1[0].numpy()
np_array_2 = ds.my_group.my_tensor_2[0].numpy()

# Read multiple tensor samples into a numpy array
# Raises an error if the tensor samples do not all have the same shape
np_array = ds.my_tensor[0:10].numpy()

# Read multiple tensor samples into a list of numpy arrays
np_array_list = ds.my_tensor[0:10].numpy(aslist=True)
```
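Plain NumPy shows why a multi-sample read needs `aslist=True` when shapes differ. This snippet does not call Hub; Hub's actual exception type for a ragged `.numpy()` call may differ from NumPy's `ValueError`.

```python
import numpy as np

# Three samples with different shapes, as a bounding-box tensor might hold.
samples = [np.ones((1, 4)), np.ones((3, 4)), np.ones((2, 4))]

# A single array needs every sample to have the same shape,
# so stacking ragged samples fails.
try:
    np.stack(samples)
except ValueError as err:
    print("cannot stack:", err)

# A list of arrays works for any mix of shapes.
as_list = list(samples)
print([a.shape for a in as_list])  # [(1, 4), (3, 4), (2, 4)]

# Equal-shape samples stack cleanly along a new first (sample) axis.
equal = [np.ones((1, 4)) for _ in range(10)]
print(np.stack(equal).shape)       # (10, 1, 4)
```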

Connecting Hub Datasets to ML Frameworks

```python
# PyTorch Dataloader
dataloader = ds.pytorch(batch_size = 16, num_workers = 2)

# TensorFlow Dataset
ds_tensorflow = ds.tensorflow()
```