Restoring Corrupted Datasets

Restoring Deep Lake datasets that may be corrupted.

How to restore a corrupted Deep Lake dataset

Deliberate or accidental interruption of code may leave a Deep Lake dataset, or some of its tensors, unreadable. Interruptions become more likely at scale, and Deep Lake's version control is the primary tool for recovery.

How to Use Version Control to Retrieve Data

When manipulating Deep Lake datasets, it is recommended to commit periodically in order to create snapshots of the dataset that can be accessed later. This can be done automatically when creating datasets with deeplake.compute, or manually using our version control API.
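As a rough illustration of committing periodically, the sketch below appends samples and commits after every fixed number of appends. The helper name `append_with_checkpoints` and the `commit_every` parameter are hypothetical. The only assumption about `ds` is that it exposes `append(sample)` and `commit(message)`, as a Deep Lake dataset does.

```python
def append_with_checkpoints(ds, samples, commit_every=1000):
    """Append samples to a dataset, committing every `commit_every` appends.

    Hypothetical helper: `ds` is assumed to expose `append(sample)` and
    `commit(message)`. Returns the number of commits made.
    """
    if commit_every < 1:
        raise ValueError("commit_every must be >= 1")

    commits = 0
    pending = 0  # samples appended since the last commit
    for sample in samples:
        ds.append(sample)
        pending += 1
        if pending == commit_every:
            ds.commit(f"checkpoint: {pending} samples appended")
            commits += 1
            pending = 0

    if pending:
        # Commit the remainder so no appended data is left uncommitted.
        ds.commit(f"checkpoint: {pending} samples appended")
        commits += 1
    return commits
```

With a real dataset, each sample would typically be a dictionary mapping tensor names to values. A smaller `commit_every` bounds how much work is lost if a run is interrupted, at the cost of a longer commit history.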

If a dataset becomes corrupted, loading it may raise an error like:

DatasetCorruptError: Exception occured (see Traceback). The dataset maybe corrupted. Try using `reset=True` to reset HEAD changes and load the previous commit. This will delete all uncommitted changes on the branch you are trying to load.

To discard the uncommitted, corrupted changes, load the dataset with the reset=True flag:

import deeplake
ds = deeplake.load("<dataset_path>", reset=True)

Note: this operation deletes all uncommitted changes on the branch being loaded.
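One way to apply the reset only when it is needed is to try a normal load first and fall back to reset=True if the corruption error is raised. The sketch below is a minimal illustration rather than an official recipe. The helper name `load_with_reset_fallback` is hypothetical, and the load function and exception type are passed in as arguments so that no import path is assumed.

```python
import logging

logger = logging.getLogger(__name__)


def load_with_reset_fallback(path, load_fn, corrupt_error):
    """Load a dataset, retrying with reset=True if it appears corrupted.

    Hypothetical helper: `load_fn` is assumed to accept a path and an
    optional `reset` keyword (as `deeplake.load` does), and `corrupt_error`
    is the exception type raised for a corrupted dataset.
    """
    try:
        return load_fn(path)
    except corrupt_error:
        # The fallback discards all uncommitted changes on the loaded branch.
        logger.warning(
            "Dataset at %s appears corrupted; reloading with reset=True. "
            "Uncommitted changes will be deleted.",
            path,
        )
        return load_fn(path, reset=True)
```

With Deep Lake, this might be called as `load_with_reset_fallback(path, deeplake.load, DatasetCorruptError)`, assuming `DatasetCorruptError` can be imported from the library's exceptions module. Because the fallback deletes uncommitted work, some workflows may prefer to surface the error and reset manually.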
