Performant Dataloader

Overview of Deep Lake's dataloader built and optimized in C++

How to use Deep Lake's performant dataloader, built and optimized in C++

Deep Lake offers an optimized dataloader implemented in C++. It is 1.5-3x faster than the pure-Python implementation and supports distributed training. The C++ and pure-Python dataloaders can be used interchangeably; only their syntax differs, as shown below.

Pure-Python Dataloader

# ds is a Deep Lake dataset; transform is a user-defined function
train_loader = ds.pytorch(num_workers=8,
                          transform=transform,
                          batch_size=32,
                          tensors=['images', 'labels'],
                          shuffle=True)
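The examples on this page pass a user-defined transform without showing it. Below is a minimal sketch of what such a function might look like, assuming each sample arrives as a dict keyed by tensor name ('images', 'labels'). The dict layout and the scaling to [0, 1] are illustrative assumptions, and plain Python lists stand in for the arrays Deep Lake actually yields so that the sketch runs without dependencies.

```python
# Hypothetical per-sample transform for the dataloader examples.
# Assumption: each sample is a dict keyed by tensor name, e.g.
# {"images": <rows of 8-bit pixels>, "labels": <class id>}.
# Plain lists replace real arrays only to keep the sketch dependency-free.


def transform(sample):
    """Scale 8-bit pixel values to [0, 1] and return an (image, label) pair."""
    image = [[pixel / 255.0 for pixel in row] for row in sample["images"]]
    label = int(sample["labels"])
    return image, label


print(transform({"images": [[0, 255], [51, 102]], "labels": 3}))
# ([[0.0, 1.0], [0.2, 0.4]], 3)
```

Returning a tuple rather than a dict is a design choice for this sketch; whatever the transform returns is what each batch will contain, so it should match what the training loop unpacks.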

C++ Dataloader

The C++ dataloader is installed with pip install "deeplake[enterprise]". Details on all installation options are available in the Deep Lake installation documentation.

PyTorch (returns a PyTorch DataLoader)

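train_loader = (ds.dataloader()
                  .transform(transform)   # apply the user-defined transform to each sample
                  .batch(32)              # 32 samples per batch
                  .shuffle(True)
                  .offset(10000)          # start iterating at sample index 10000
                  .pytorch(tensors=['images', 'labels'], num_workers=8))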
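To make the builder chain easier to read, here is a toy, single-process model in pure Python of what each call requests: .offset(n) starts at sample index n, .shuffle(True) randomizes the order, .transform(f) maps f over each sample, and .batch(k) groups samples into lists of k. This is not how Deep Lake implements its C++ loader; the function toy_loader, its seed parameter, and the order offset, shuffle, transform, batch are all assumptions made for illustration.

```python
import random


def toy_loader(samples, transform=None, batch_size=32, shuffle=False,
               offset=0, seed=None):
    """Toy model of the dataloader builder chain (NOT Deep Lake's implementation).

    Assumed order of operations: offset -> shuffle -> transform -> batch.
    """
    indices = list(range(offset, len(samples)))  # .offset(n): skip the first n samples
    if shuffle:                                   # .shuffle(True): randomize order
        random.Random(seed).shuffle(indices)
    batch = []
    for i in indices:
        sample = samples[i]
        batch.append(transform(sample) if transform else sample)  # .transform(f)
        if len(batch) == batch_size:              # .batch(k): emit full batches
            yield batch
            batch = []
    if batch:  # keep the final partial batch, mirroring drop_last=False
        yield batch


print(list(toy_loader(list(range(10)), transform=lambda x: x * 10,
                      batch_size=4, offset=3)))
# [[30, 40, 50, 60], [70, 80, 90]]
```

With offset=3 on ten samples, only indices 3 through 9 are loaded, which is why the first batch starts at 30 and the last batch holds the three remaining samples.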

TensorFlow

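train_loader = (ds.dataloader()
                  .transform(transform)   # apply the user-defined transform to each sample
                  .batch(32)              # 32 samples per batch
                  .shuffle(True)
                  .offset(10000)          # start iterating at sample index 10000
                  .tensorflow(tensors=['images', 'labels'], num_workers=8))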
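Because the PyTorch and TensorFlow variants differ only in their final call, the shared options can be collected in one place. The helper below is hypothetical (make_loader is not part of Deep Lake) and chains only the methods that appear in the examples above; it assumes ds is a Deep Lake dataset and that the enterprise extra is installed.

```python
def make_loader(ds, framework, transform, tensors, batch_size=32,
                shuffle=True, offset=0, num_workers=8):
    """Hypothetical helper: build a C++ dataloader for PyTorch or TensorFlow.

    Uses only the builder methods shown in the examples, so both framework
    variants share one configuration and cannot drift apart.
    """
    loader = (ds.dataloader()
                .transform(transform)
                .batch(batch_size)
                .shuffle(shuffle)
                .offset(offset))
    if framework == "pytorch":
        return loader.pytorch(tensors=tensors, num_workers=num_workers)
    if framework == "tensorflow":
        return loader.tensorflow(tensors=tensors, num_workers=num_workers)
    raise ValueError(f"unsupported framework: {framework!r}")
```

For instance, make_loader(ds, "pytorch", transform, ['images', 'labels'], offset=10000) builds the same loader as the PyTorch example above.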
