Version Control and Querying
Understanding Deep Lake's Version Control and Querying Layout
Version control is the core of the Deep Lake data format, and it interacts with queries and views as follows:
- Datasets have commits and branches, and they can be traversed or merged using Deep Lake's Python API.
- Queries are applied on top of commits. To save a query result as a view, the dataset must not have uncommitted changes (that is, no changes may have been made since the prior commit).
- Each saved view is associated with a particular commit, and the view itself contains information on which dataset indices satisfied the query condition.
This approach was chosen to preserve data lineage. Otherwise, the data on which a query was executed could be changed afterward, potentially invalidating the saved view, because the indices that satisfied the query condition may no longer be correct once the dataset has changed.
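The lineage argument can be illustrated with a small toy model. The sketch below is not the Deep Lake API: `ToyDataset`, `SavedView`, and all of their methods are hypothetical names invented for illustration. It only assumes what the list above states, namely that a view stores a commit identifier plus the matching indices, and that saving is refused while uncommitted changes exist.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class SavedView:
    """Hypothetical saved view: a commit id plus the indices that matched."""
    commit_id: str
    indices: Tuple[int, ...]


class ToyDataset:
    """Toy stand-in for a versioned dataset (illustration only, not Deep Lake)."""

    def __init__(self) -> None:
        self._commits: Dict[str, Tuple[int, ...]] = {}  # commit id -> snapshot
        self._working: List[int] = []                   # current, possibly uncommitted, state
        self._head: Optional[str] = None                # id of the latest commit

    def append(self, value: int) -> None:
        self._working.append(value)

    def has_uncommitted_changes(self) -> bool:
        committed = self._commits.get(self._head, ())
        return tuple(self._working) != committed

    def commit(self) -> str:
        commit_id = f"c{len(self._commits)}"
        self._commits[commit_id] = tuple(self._working)
        self._head = commit_id
        return commit_id

    def query(self, predicate: Callable[[int], bool]) -> List[int]:
        # A query result is just the list of indices that satisfy the condition.
        return [i for i, v in enumerate(self._working) if predicate(v)]

    def save_view(self, indices: Sequence[int]) -> SavedView:
        # Refuse to save unless the indices can be pinned to an exact commit.
        if self._head is None or self.has_uncommitted_changes():
            raise RuntimeError("commit changes before saving a view")
        return SavedView(self._head, tuple(indices))

    def load_view(self, view: SavedView) -> List[int]:
        # Resolve indices against the commit the view was saved on,
        # not against whatever the dataset looks like now.
        snapshot = self._commits[view.commit_id]
        return [snapshot[i] for i in view.indices]


ds = ToyDataset()
for value in [5, 12, 7, 20]:
    ds.append(value)
ds.commit()
view = ds.save_view(ds.query(lambda v: v > 10))

ds.append(30)  # uncommitted change
try:
    ds.save_view(ds.query(lambda v: v > 10))
    rejected = False
except RuntimeError:
    rejected = True

ds.commit()
rows = ds.load_view(view)  # still resolved against the first commit
```

Because the view is pinned to the commit it was saved on, later commits do not alter what it returns. In this toy model, that is the property the restriction on uncommitted changes protects.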
An example workflow using version control and queries is shown below.