Concurrent writes in Deep Lake
Deep Lake offers three solutions for concurrently writing data, depending on the required scale of the application. Concurrency is not native to the Deep Lake format, so these solutions use locks and queues to schedule and linearize write operations to Deep Lake.
COMING SOON. Deep Lake will offer a Managed Tensor Database that supports read (search) and write operations at scale. Deep Lake ensures the operations are performant by provisioning the necessary infrastructure and executing the underlying user requests in a distributed manner. This approach is recommended for production applications that require a separate service to handle the high computational loads of vector search.
Deep Lake datasets internally support file-based locks. File-based locks are generally slower and less reliable than the other listed solutions, and they should only be used for prototyping.
By default, Deep Lake datasets are loaded in write mode and a lock file is created. This can be avoided by passing `read_only = True` to APIs that load datasets.
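For example, a dataset can be loaded in read-only mode like this (the dataset path `hub://org/dataset` is a placeholder for illustration):

```python
import deeplake

# Load the dataset in read-only mode so no lock file is created
# and no write lock is held on the dataset.
ds = deeplake.load("hub://org/dataset", read_only=True)
```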
An error will occur if the Deep Lake dataset is locked and the user tries to open it in write mode. To specify a waiting time for the lock to be released, you can pass `lock_timeout = <timeout_in_s>` to APIs that load datasets.
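As a minimal sketch, following the parameter name described above (support may vary across Deep Lake versions):

```python
import deeplake

# Wait up to 30 seconds for an existing lock to be released before
# raising an error. "hub://org/dataset" is a placeholder path.
ds = deeplake.load("hub://org/dataset", lock_timeout=30)
```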
Locks can manually be set or released using:
```python
from deeplake.core.lock import lock_dataset, unlock_dataset
```
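A rough usage sketch is shown below; `ds` is assumed to be a loaded dataset, and the exact signatures of `lock_dataset` and `unlock_dataset` may differ between versions:

```python
import deeplake
from deeplake.core.lock import lock_dataset, unlock_dataset

# "hub://org/dataset" is a placeholder path for illustration.
ds = deeplake.load("hub://org/dataset")

# Manually acquire the lock, perform writes, then release it.
lock_dataset(ds)
try:
    # ... perform write operations on ds ...
    pass
finally:
    unlock_dataset(ds)
```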