Version Control and Querying

Understanding Deep Lake's Version Control and Querying Layout

Understanding the Interaction Between Deep Lake's Versions, Queries, and Dataset Views

Version control is the core of the Deep Lake data format, and it interacts with queries and views as follows:
  • Datasets have commits and branches, and they can be traversed or merged using Deep Lake's Python API.
  • Queries are applied on top of commits. To save a query result as a view, the dataset must not have uncommitted changes (no changes may have been made since the most recent commit).
  • Each saved view is associated with a particular commit, and the view itself contains information on which dataset indices satisfied the query condition.
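The rules above can be sketched with a small in-memory model. This is a hypothetical illustration, not Deep Lake's Python API: `ToyVersionedDataset`, `SavedView`, and their methods are invented names, and the model assumes a flat list of integers and a linear commit history (no branches or merges). It shows a saved view storing a commit ID plus the matching indices, and saving a view being refused while there are uncommitted changes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class SavedView:
    """A saved query result: the commit it ran on plus the matching indices."""
    commit_id: str
    indices: List[int]


class ToyVersionedDataset:
    """Hypothetical in-memory stand-in for a versioned dataset (not Deep Lake's API)."""

    def __init__(self) -> None:
        self.samples: List[int] = []              # working copy of the data
        self.commits: Dict[str, List[int]] = {}   # commit ID -> frozen snapshot
        self.head: Optional[str] = None           # most recent commit ID
        self.dirty = False                        # True while changes are uncommitted
        self.views: Dict[str, SavedView] = {}

    def append(self, value: int) -> None:
        self.samples.append(value)
        self.dirty = True

    def commit(self) -> str:
        commit_id = f"c{len(self.commits)}"
        self.commits[commit_id] = list(self.samples)  # store a copy, not a reference
        self.head = commit_id
        self.dirty = False
        return commit_id

    def save_view(self, name: str, condition: Callable[[int], bool]) -> SavedView:
        # Refuse to save if the working copy has diverged from the last commit.
        if self.head is None or self.dirty:
            raise RuntimeError("commit your changes before saving a view")
        snapshot = self.commits[self.head]
        indices = [i for i, v in enumerate(snapshot) if condition(v)]
        view = SavedView(commit_id=self.head, indices=indices)
        self.views[name] = view
        return view

    def load_view(self, name: str) -> List[int]:
        # Resolve the stored indices against the commit the view was created on.
        view = self.views[name]
        snapshot = self.commits[view.commit_id]
        return [snapshot[i] for i in view.indices]


ds = ToyVersionedDataset()
for value in [3, 8, 1, 9]:
    ds.append(value)

try:
    ds.save_view("large", lambda v: v > 5)
except RuntimeError as err:
    print(err)  # commit your changes before saving a view

ds.commit()
print(ds.save_view("large", lambda v: v > 5))  # SavedView(commit_id='c0', indices=[1, 3])

ds.append(100)                  # uncommitted change made after the view was saved
print(ds.load_view("large"))    # [8, 9]
```

Because `load_view` reads from the snapshot recorded at save time, the later uncommitted `100` does not change the view's contents. Copying the whole list on every commit is only for clarity in this toy model.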
Tying each view to a commit in this way preserves data lineage. Otherwise, the data on which a query was executed could be changed afterwards, potentially invalidating the saved view, because the indices that satisfied the query condition may no longer be correct once the dataset has changed.
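A minimal sketch of that failure mode, using plain Python lists rather than Deep Lake: the stored indices point at the wrong samples once the data changes, unless they are resolved against the commit the query ran on. The labels and the inserted sample are made-up example data.

```python
# Data as of the commit on which the query was run (a frozen snapshot).
snapshot = ["cat", "dog", "cat", "bird"]

# The saved view only stores the indices that satisfied the condition.
matching = [i for i, label in enumerate(snapshot) if label == "cat"]
print(matching)  # [0, 2]

# Later, the working copy is edited: a new sample is inserted at the front.
working_copy = ["dog"] + snapshot

# Resolving the stored indices against the edited data returns the wrong samples.
stale = [working_copy[i] for i in matching]
print(stale)  # ['dog', 'dog']

# Resolving them against the pinned commit still reproduces the query result.
pinned = [snapshot[i] for i in matching]
print(pinned)  # ['cat', 'cat']
```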
Please check out our Getting Started Guide to learn how to use the Python API to version your data, run queries, and save views.
An example workflow using version control and queries is shown below.