Dataset Visualization

How to visualize Deep Lake datasets

How to visualize machine learning datasets

Deep Lake has a web interface for visualizing, versioning, and querying machine learning datasets. It uses the Deep Lake format under the hood and can connect to datasets stored in any Deep Lake storage location.

Visualization can be performed in three ways:

  1. In the Deep Lake UI (the most feature-rich and performant option)

  2. In the Python API using ds.visualize()

  3. In your own application using our integration options
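As a minimal sketch of the second option, the helper below loads a dataset and opens the visualizer on it. It assumes the Deep Lake 3.x Python package (`deeplake.load` and `ds.visualize()`) and a notebook environment where the visualizer can render. The helper name `show_dataset` and the dataset path in the usage note are illustrative, not part of the library.

```python
def show_dataset(path):
    """Load a Deep Lake dataset read-only and open the visualizer on it.

    `show_dataset` is a hypothetical helper. It assumes the Deep Lake 3.x
    API and is meant to be called from a notebook cell.
    """
    # Imported lazily so that this module loads even without deeplake installed.
    import deeplake

    # Open the dataset without write access; visualization does not modify it.
    ds = deeplake.load(path, read_only=True)

    # Render the interactive visualizer for the loaded dataset.
    ds.visualize()
    return ds
```

In a notebook, this could be called as `show_dataset("hub://activeloop/mnist-train")`, where the path is only an example of a Deep Lake dataset location.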

Requirements for correctly visualizing your own datasets

Deep Lake makes assumptions about the underlying data types and the relationships between tensors in order to display the data correctly. To use the visualizer, you need to understand the following concepts:

Visualizer Controls and Modes

Downsampling Data for Faster Visualization

For faster visualization of images and masks, tensors can be downsampled during dataset creation. The downsampled data are stored in the dataset and are automatically rendered by the visualizer depending on the zoom level.

To add downsampling to your tensors, specify the downsampling factor and the number of downsampling layers during tensor creation:

# Downsample by a factor of 3 per layer, with 2 layers
ds.create_tensor('images', htype='image', downsampling=(3, 2))

Note: Because downsampling requires decompressing and recompressing the data, it slows down dataset ingestion.
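To get a feel for what a setting such as (3, 2) implies, the sketch below estimates the resolution of each downsampled layer for a given image size. It is plain Python and does not call Deep Lake. The function name `pyramid_shapes` is hypothetical, and the floor rounding is an assumption; Deep Lake's actual resizing may round differently.

```python
def pyramid_shapes(height, width, factor, layers):
    """Estimate the (height, width) of each downsampled layer.

    Assumes each layer divides both dimensions of the previous one by
    `factor`, rounding down and never going below 1 pixel.
    """
    shapes = []
    for _ in range(layers):
        height = max(1, height // factor)
        width = max(1, width // factor)
        shapes.append((height, width))
    return shapes


# An 1800 x 1200 image with downsampling = (3, 2)
print(pyramid_shapes(1800, 1200, factor=3, layers=2))
# [(600, 400), (200, 133)]
```

Under these assumptions, each extra layer stores a copy that is roughly factor squared times smaller than the previous one, so the added storage is small compared with the ingestion-time cost described in the note.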