Performant Dataloader

How to use Deep Lake's performant Dataloader built and optimized in C++

Deep Lake offers an optimized dataloader implemented in C++ that is 1.5-3X faster than the pure-Python implementation and supports distributed training. The C++ and Python dataloaders can be used interchangeably; only their syntax differs, as shown below.

Pure-Python Dataloader

train_loader = ds_train.pytorch(
    num_workers=8,
    transform=transform,
    batch_size=32,
    tensors=['images', 'labels'],
    shuffle=True,
)
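
The `transform` argument used in the loader snippets on this page is not defined here. The following is a minimal sketch of one way to build it, assuming the loader hands each sample to `transform` as a dict keyed by tensor name. The helper `make_transform` and the stand-in `scale_pixels` are hypothetical names used for illustration only; in practice the per-tensor callables would usually be real preprocessing functions operating on arrays.

```python
from typing import Any, Callable, Dict, Optional

Sample = Dict[str, Any]


def make_transform(
    per_tensor: Dict[str, Optional[Callable[[Any], Any]]]
) -> Callable[[Sample], Sample]:
    """Build a sample-level transform from per-tensor callables.

    Hypothetical helper, not part of Deep Lake. A value of None means
    "pass this tensor through unchanged". Tensors that are not listed
    in per_tensor are dropped from the output sample.
    """
    def transform(sample: Sample) -> Sample:
        out = {}
        for name, fn in per_tensor.items():
            value = sample[name]
            out[name] = value if fn is None else fn(value)
        return out

    return transform


def scale_pixels(pixels):
    # Stand-in for real image preprocessing: map 0..255 ints to 0.0..1.0 floats.
    return [p / 255.0 for p in pixels]


# Assumption: the dataset exposes tensors named 'images' and 'labels',
# matching the tensors=[...] argument in the loader snippets.
transform = make_transform({"images": scale_pixels, "labels": None})

# Quick check on a hand-made sample, independent of any dataset.
sample = {"images": [0, 255, 51], "labels": 7}
result = transform(sample)
```

Checking the transform on a hand-made sample before passing it to a loader helps separate preprocessing bugs from data loading problems.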

C++ Dataloader

The C++ dataloader and the Deep Lake query engine are only accessible to registered and authenticated users, and they apply usage restrictions based on your Deep Lake Plan.

PyTorch (returns PyTorch Dataloader)

train_loader = (
    ds.dataloader()
    .transform(transform)
    .batch(32)
    .shuffle(True)
    .offset(10000)
    .pytorch(tensors=['images', 'labels'], num_workers=8)
)

TensorFlow

train_loader = (
    ds.dataloader()
    .transform(transform)
    .batch(32)
    .shuffle(True)
    .offset(10000)
    .tensorflow(tensors=['images', 'labels'], num_workers=8)
)
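
The 1.5-3X speedup quoted at the top of this page will vary with the dataset, the transform, and the hardware. One way to check it on your own workload is a loader-agnostic timing helper, sketched below. It relies only on the loader being iterable; `measure_throughput` and `fake_loader` are hypothetical names introduced for this example and are not part of Deep Lake.

```python
import time
from typing import Iterable

_DONE = object()  # sentinel marking an exhausted iterator


def measure_throughput(loader: Iterable, max_batches: int = 100, warmup: int = 5) -> float:
    """Return batches per second for any iterable loader (hypothetical helper).

    The first `warmup` batches are consumed but not timed, so worker
    start-up and cache filling do not skew the result.
    Returns 0.0 if the loader ends before any batch could be timed.
    """
    it = iter(loader)
    for _ in range(warmup):
        if next(it, _DONE) is _DONE:
            return 0.0

    start = time.perf_counter()
    timed = 0
    for _batch in it:
        timed += 1
        if timed >= max_batches:
            break
    elapsed = time.perf_counter() - start
    return timed / elapsed if timed and elapsed > 0 else 0.0


def fake_loader(n_batches: int, delay_s: float = 0.001):
    # Stand-in for a real dataloader: yields dummy batches with a small delay.
    for i in range(n_batches):
        time.sleep(delay_s)
        yield {"images": [i], "labels": [i]}


rate = measure_throughput(fake_loader(20), max_batches=10, warmup=5)
print(f"{rate:.1f} batches/s")
```

To compare the two implementations, call `measure_throughput(train_loader)` once with the pure-Python loader and once with the C++ loader, keeping the batch size, transform, and number of workers the same.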

Further Information