How to Write Data Concurrently to Deep Lake Datasets
Deep Lake offers three solutions for writing data concurrently, depending on the scale the application requires. Concurrency is not native to the Deep Lake format, so these solutions use locks and queues to schedule and linearize write operations to Deep Lake.
Concurrency Using External Locks
Concurrent writes can be supported by using an in-memory database as the locking mechanism for Deep Lake datasets. Tools such as ZooKeeper or Redis are highly performant and reliable, and they can be deployed using a few lines of code. External locks are recommended for small-to-medium workloads.
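The pattern can be sketched as follows. This is a minimal illustration, not Deep Lake's own implementation: every writer acquires a shared lock before it performs a read-modify-write step, so the updates are applied one at a time. To keep the sketch runnable without extra services, a `threading.Lock` and a plain dictionary stand in for the external lock and the dataset; `locked_update` and `run_writers` are hypothetical helper names.

```python
import threading


def locked_update(lock, state, key):
    """Increment state[key] while holding `lock`, so updates never interleave."""
    with lock:
        current = state[key]      # read
        state[key] = current + 1  # modify and write back


def run_writers(lock, n_writers=4, n_updates=250):
    """Start several writer threads that all update the same counter."""
    # Stand-in for a dataset: a counter of rows written (assumption for the sketch).
    state = {"rows_written": 0}

    def worker():
        for _ in range(n_updates):
            locked_update(lock, state, "rows_written")

    threads = [threading.Thread(target=worker) for _ in range(n_writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["rows_written"]


# threading.Lock only coordinates threads inside one process; writers in
# separate processes or machines need an external lock service instead.
total = run_writers(threading.Lock())
```

With Redis, the same `with lock:` pattern should carry over, assuming the redis-py client is used: `redis.Redis(...).lock("my-dataset-lock", timeout=60)` returns a lock object that works as a context manager. The lock name and timeout here are placeholders, and each writer process would create its own client and lock handle under the same name.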
COMING SOON. Deep Lake will offer a Managed Tensor Database that supports read (search) and write operations at scale. Deep Lake ensures the operations are performant by provisioning the necessary infrastructure and executing the underlying user requests in a distributed manner. This approach is recommended for production applications that require a separate service to handle the high computational loads of vector search.
Concurrency Using Deep Lake Locks
Deep Lake datasets internally support file-based locks. File-based locks are generally slower and less reliable than the other listed solutions, and they should only be used for prototyping.
By default, Deep Lake datasets are loaded in write mode and a lock file is created. This can be avoided by passing read_only = True to the APIs that load datasets.
An error will occur if the Deep Lake dataset is locked and the user tries to open it in write mode. To wait for the lock to be released instead, pass lock_timeout = <timeout_in_s> to the APIs that load datasets.
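The two options might be combined as in the sketch below. It assumes that `deeplake.load` is the loading API in use and that it accepts the `read_only` and `lock_timeout` keyword arguments described above; `build_load_kwargs` and `open_dataset` are hypothetical helpers, and the path is supplied by the caller.

```python
def build_load_kwargs(read_only=False, lock_timeout=None):
    """Collect the locking-related keyword arguments for a dataset load call."""
    kwargs = {}
    if read_only:
        # Read-only loads do not create a lock file.
        kwargs["read_only"] = True
    if lock_timeout is not None:
        if lock_timeout < 0:
            raise ValueError("lock_timeout must be non-negative")
        # Seconds to wait for an existing lock to be released.
        kwargs["lock_timeout"] = lock_timeout
    return kwargs


def open_dataset(path, read_only=False, lock_timeout=None):
    """Load a dataset; assumes deeplake.load accepts these keyword arguments."""
    import deeplake  # imported lazily so the helper above works without it

    return deeplake.load(path, **build_load_kwargs(read_only, lock_timeout))
```

A reader process would call `open_dataset(path, read_only=True)`, while a writer that is willing to wait up to a minute for the lock would call `open_dataset(path, lock_timeout=60)`.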
Locks can be set or released manually using:
from deeplake.core.lock import lock_dataset, unlock_dataset
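When locks are managed manually, it helps to guarantee that a lock is released even if the write fails. The sketch below wraps any acquire/release pair in a context manager. `held_lock` is a hypothetical helper, and the call signatures of `lock_dataset` and `unlock_dataset` are not given here, so the sketch takes the two functions as parameters rather than calling them directly.

```python
from contextlib import contextmanager


@contextmanager
def held_lock(target, acquire, release):
    """Acquire a lock on `target`, and always release it when the block exits."""
    acquire(target)
    try:
        yield target
    finally:
        # Runs on normal exit and when the body raises.
        release(target)
```

Assuming both functions accept the dataset as their first argument (check this against the installed Deep Lake version), usage would look like `with held_lock(ds, lock_dataset, unlock_dataset): ...`, with the write operations placed inside the block.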