API Basics
Summary of the most important Hub API commands
Please note that the code examples below may not run standalone.

Creating and Loading Hub Datasets

```python
import hub

# Load a Hub Dataset if it already exists (same as hub.load), or initialize
# a new Hub Dataset if it does not already exist (same as hub.empty)
ds = hub.dataset('./local_path') # Local path
ds = hub.dataset('hub://username/dataset_name') # Activeloop Platform Storage
ds = hub.dataset('s3://bucket_name/dataset_name', creds = {}) # AWS S3

# Load a Hub Dataset
ds = hub.load('dataset_path') # Can use any of the paths above

# Create an empty Hub Dataset
ds = hub.empty('dataset_path') # Can use any of the paths above

# Automatically create a Hub Dataset - Coming Soon
ds = hub.ingest.from_path('source_path', 'hub_dataset_path')
ds = hub.ingest.from_kaggle('kaggle_path', 'hub_dataset_path')

# Delete a Hub Dataset
ds.delete()
```

Creating Tensors and Adding Data

```python
import numpy as np

# Create a tensor.
# Specifying htype is recommended for maximizing performance.
# Specify dtype only if you need a dtype other than the default
# for the specified htype.
ds.create_tensor('my_tensor', htype = 'bbox', dtype = 'int32')
ds.create_tensor('localization/my_tensor', htype = 'bbox', dtype = 'float32')

# Specifying the correct compression is critical for images, videos, and
# other rich data types.
ds.create_tensor('images', htype = 'image', sample_compression = 'jpeg')

# Append a single sample (one array) at the end of a tensor
ds.my_tensor.append(np.ones((1,4)))

# Append multiple samples at the end of a tensor. The first axis of the
# NumPy array is treated as the sample axis, so this appends 5 samples
# of shape (1, 4).
ds.my_tensor.extend(np.ones((5,1,4)))

# Append multiple samples from a list of arrays. The samples in the list
# may have different shapes.
ds.my_tensor.extend([np.ones((1,4)), np.ones((3,4)), np.ones((2,4))])
```
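
The difference between `append` and `extend` comes down to the first axis. The sketch below models a tensor as an ordered list of samples, with nested Python lists standing in for NumPy arrays. `ToyTensor`, `ones`, and `shape_of` are hypothetical names for this illustration, not Hub classes.

```python
def ones(*shape):
    """Nested-list stand-in for np.ones(shape)."""
    if len(shape) == 1:
        return [1.0] * shape[0]
    return [ones(*shape[1:]) for _ in range(shape[0])]


def shape_of(x):
    """Shape of a rectangular nested list, read along the first element."""
    shape = []
    while isinstance(x, list):
        shape.append(len(x))
        x = x[0] if x else None
    return tuple(shape)


class ToyTensor:
    """Hypothetical tensor: an ordered list of samples, each with its own shape."""

    def __init__(self):
        self.samples = []

    def append(self, sample):
        # One call adds exactly one sample, whatever its shape.
        self.samples.append(sample)

    def extend(self, samples):
        # Iterating over the outer list walks the first axis,
        # so each element along that axis becomes one sample.
        for sample in samples:
            self.append(sample)

    def __len__(self):
        return len(self.samples)


t = ToyTensor()
t.append(ones(1, 4))                            # 1 sample of shape (1, 4)
t.extend(ones(5, 1, 4))                         # 5 samples of shape (1, 4)
t.extend([ones(1, 4), ones(3, 4), ones(2, 4)])  # 3 samples with different shapes
print(len(t), [shape_of(s) for s in t.samples])
```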

Adding User-Specified Metadata to Datasets and Tensors

```python
# Add or update dataset metadata
ds.info.update(key1 = 'text', key2 = 42)
# Equivalently, pass a dictionary:
# ds.info.update({'key1': 'text', 'key2': 42})

# Add or update tensor metadata
ds.my_tensor.info.update(key1 = 'text', key2 = 42)

# Delete metadata
ds.info.delete() # Delete all metadata
ds.info.delete('key1') # Delete one key
ds.info.delete(['key1', 'key2']) # Delete multiple keys
```
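
The three `delete` forms above differ only in what they receive: nothing, one key, or a list of keys. The dict-backed toy below mirrors those call forms so their effect is easy to see. `ToyInfo` is hypothetical; it raises `KeyError` for a missing key, which is the toy's choice and not a claim about Hub's behavior in that case.

```python
class ToyInfo:
    """Hypothetical dict-backed model of the info.update / info.delete calls above."""

    def __init__(self):
        self._data = {}

    def update(self, *args, **kwargs):
        # Same calling convention as dict.update: a mapping,
        # keyword arguments, or both.
        self._data.update(*args, **kwargs)

    def delete(self, key=None):
        if key is None:              # delete() removes every key
            self._data.clear()
        elif isinstance(key, str):   # delete('key1') removes one key
            del self._data[key]
        else:                        # delete(['key1', 'key2']) removes several
            for k in key:
                del self._data[k]

    def as_dict(self):
        return dict(self._data)


info = ToyInfo()
info.update(key1='text', key2=42)
info.update({'key3': [1, 2, 3]})
info.delete('key1')
info.delete(['key2', 'key3'])
print(info.as_dict())
```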

Maximizing Performance

```python
# Data is written to long-term storage at the end of the 'with'
# block or whenever the cache is full. This minimizes the number of
# write operations during dataset creation.

ds = hub.load('dataset_path')

with ds:
    ds.create_tensor('my_tensor')

    for i in range(10):
        ds.my_tensor.append(i)
```
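
The saving comes from write-back buffering: writes collect in memory and reach storage in a few large batches instead of one at a time. The sketch below shows that general technique with a hypothetical `ToyBufferedWriter` whose capacity counts items. Hub's actual cache is more involved, so treat this only as an illustration of the idea, not of Hub's internals.

```python
class ToyBufferedWriter:
    """Hypothetical write-back buffer: collects writes in memory and flushes
    them to a backing store when the buffer is full or the with-block exits."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []
        self.store = []        # stands in for long-term storage
        self.flush_count = 0   # number of write operations against the store

    def append(self, item):
        self.buffer.append(item)
        if len(self.buffer) >= self.capacity:  # cache full: flush now
            self.flush()

    def flush(self):
        if self.buffer:
            self.store.extend(self.buffer)
            self.buffer.clear()
            self.flush_count += 1

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Flush whatever is left when the block exits
        # (this toy flushes even if an exception occurred).
        self.flush()
        return False


with ToyBufferedWriter(capacity=4) as w:
    for i in range(10):
        w.append(i)
print(w.flush_count, w.store)  # 3 store writes instead of 10
```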

Accessing Tensor Data

```python
# Read a tensor sample into a NumPy array
np_array = ds.my_tensor[0].numpy()

# Read multiple tensor samples into a single NumPy array.
# Raises an error if the samples do not all have the same shape.
np_array = ds.my_tensor[0:10].numpy()

# Read multiple tensor samples into a list of NumPy arrays
# (this works even when the samples have different shapes)
np_array_list = ds.my_tensor[0:10].numpy(aslist=True)
```
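
The equal-shape requirement exists because a single array needs a rectangular shape: n samples of shape S become one array of shape (n, *S). The sketch below makes that check explicit using nested lists. `read_samples` and `shape_of` are hypothetical helpers, not part of Hub.

```python
def shape_of(x):
    """Shape of a rectangular nested list, read along the first element."""
    shape = []
    while isinstance(x, list):
        shape.append(len(x))
        x = x[0] if x else None
    return tuple(shape)


def read_samples(samples, aslist=False):
    """Hypothetical stand-in for tensor[a:b].numpy(aslist=...).

    aslist=False combines the samples into one block with an extra leading
    axis, which only works when every sample has the same shape.
    aslist=True returns the samples side by side, so shapes may differ.
    """
    samples = list(samples)
    if aslist:
        return samples
    shapes = {shape_of(s) for s in samples}
    if len(shapes) > 1:
        raise ValueError(f"samples have different shapes: {sorted(shapes)}")
    # A list of equally shaped samples is already a block of shape (n, *S).
    return samples


same = [[[1, 1, 1, 1]], [[2, 2, 2, 2]]]                   # shapes (1, 4), (1, 4)
ragged = [[[1, 1, 1, 1]], [[1, 1, 1, 1], [1, 1, 1, 1]]]   # shapes (1, 4), (2, 4)
print(shape_of(read_samples(same)))            # (2, 1, 4)
print(len(read_samples(ragged, aslist=True)))  # 2
```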

Connecting Hub Datasets to ML Frameworks

```python
# PyTorch DataLoader
dataloader = ds.pytorch(batch_size = 16, num_workers = 2)

# TensorFlow Dataset
ds_tensorflow = ds.tensorflow()
```
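
`batch_size = 16` means the loader hands samples to the training loop in groups of 16. The framework-free sketch below shows only that grouping. `batches` is a hypothetical helper that keeps the final partial batch; it does not model shuffling, worker processes, or tensor conversion, and it makes no claim about how Hub's loaders treat a partial last batch.

```python
from itertools import islice


def batches(samples, batch_size):
    """Yield consecutive lists of up to `batch_size` samples."""
    it = iter(samples)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


print([len(b) for b in batches(range(40), 16)])  # [16, 16, 8]
```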