Deep Lake Dataloaders

Overview of Deep Lake's dataloader built and optimized in C++

How to use Deep Lake's Dataloaders for Training Models

Deep Lake offers an optimized dataloader built in C++ that is 1.5-3x faster than the pure-Python implementation and supports distributed training. The C++ and Python dataloaders can be used interchangeably, although their syntax differs as shown below.

The Deep Lake Compute Engine is only accessible to registered and authenticated users, and it applies usage restrictions based on your Deep Lake Plan.

Pure-Python Dataloader

# Create the pure-Python dataloader directly from the dataset
train_loader = ds_train.pytorch(num_workers=8,
                                transform=transform,
                                batch_size=32,
                                tensors=['images', 'labels'],
                                shuffle=True)
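The `transform` argument above is not defined on this page. The following is a minimal sketch of one way to build it, assuming each sample reaches the transform as a dict-like object keyed by tensor name (`images`, `labels`); this is an assumption to verify against the Deep Lake version in use. The helper `make_transform` and its parameters are hypothetical names introduced for illustration only, and plain Python lists stand in for image arrays.

```python
def make_transform(image_fn, label_fn=None):
    """Build a per-sample transform (hypothetical helper, not part of Deep Lake).

    Assumes each sample is a dict-like object keyed by tensor name.
    """
    def transform(sample):
        image = image_fn(sample["images"])
        # Labels pass through unchanged unless a label function is given.
        label = sample["labels"] if label_fn is None else label_fn(sample["labels"])
        return {"images": image, "labels": label}
    return transform


# Stand-in sample: a flat list of pixel values instead of a real image array.
transform = make_transform(image_fn=lambda img: [p / 255.0 for p in img])
print(transform({"images": [0, 255], "labels": 1}))
```

In practice `image_fn` would more likely be a torchvision or similar preprocessing pipeline; the closure simply keeps the per-tensor logic in one place.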

C++ Dataloader

The C++ dataloader is only accessible to registered and authenticated users.

The Deep Lake query engine is only accessible to registered and authenticated users, and it applies usage restrictions based on your Deep Lake Plan.

PyTorch (returns PyTorch Dataloader)

train_loader = (ds.dataloader()
                  .transform(transform)
                  .batch(32)
                  .shuffle(True)
                  .offset(10000)
                  .pytorch(tensors=['images', 'labels'], num_workers=8))
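Once a loader exists, it is consumed like any other iterable of batches. The sketch below shows one epoch of iteration, assuming each batch is a dict-like object keyed by the tensor names passed in `tensors` (an assumption, not confirmed by this page). `run_epoch`, `train_step`, and the stand-in loader are hypothetical names for illustration; a real `train_step` would run the forward pass, the backward pass, and the optimizer update.

```python
def run_epoch(loader, train_step):
    """Iterate once over `loader` and return (num_batches, mean_loss).

    Assumes each batch is dict-like with "images" and "labels" keys.
    """
    total_loss = 0.0
    num_batches = 0
    for batch in loader:
        total_loss += float(train_step(batch["images"], batch["labels"]))
        num_batches += 1
    mean_loss = total_loss / num_batches if num_batches else 0.0
    return num_batches, mean_loss


# Stand-in for train_loader so the sketch runs without a dataset.
fake_loader = [
    {"images": [[0.1, 0.2]], "labels": [0]},
    {"images": [[0.3, 0.4]], "labels": [1]},
]


def dummy_step(images, labels):
    # Placeholder "loss": the number of labels in the batch.
    return float(len(labels))


print(run_epoch(fake_loader, dummy_step))
```

Because the loop depends only on iteration and key lookup, the same function should work with either the pure-Python or the C++ loader, provided the batch layout matches the assumption above.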

TensorFlow

train_loader = (ds.dataloader()
                  .transform(transform)
                  .batch(32)
                  .shuffle(True)
                  .offset(10000)
                  .tensorflow(tensors=['images', 'labels'], num_workers=8))

Further Information

Training Models

Training Reproducibility Using Deep Lake and Weights & Biases