Creating Time-Series Datasets
Deep Lake is a powerful tool for easily storing and sharing time-series datasets with your team.
Deep Lake is an intuitive format for storing large time-series datasets, and it offers compression for reducing storage costs. This tutorial demonstrates how to convert time-series data to Deep Lake format and load the data for plotting.
The first step is to download the small dataset below, called sensor data.
This is a subset of a larger dataset, and it contains the iPhone x, y, z acceleration for 24 users (subjects) under conditions of walking and jogging. The dataset has the folder structure below. subjects_info.csv contains metadata such as height, weight, etc. for each subject, and the sub_n.csv files contain the time-series acceleration data for the nth subject.
Now that you have the data, let's create a Deep Lake Dataset in the ./sensor_data_deeplake folder by running:
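A minimal sketch of that step, assuming the deeplake Python package is installed (the exact call may differ slightly across versions):

```python
import deeplake

# Create an empty Deep Lake dataset in a local folder
ds = deeplake.empty('./sensor_data_deeplake')
```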
Next, let's specify the folder path containing the existing dataset, load the subjects metadata to a Pandas DataFrame, and create a list of all of the time-series files that should be converted to Deep Lake format.
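The sketch below assumes the download was extracted to a ./sensor_data folder that contains subjects_info.csv plus the per-subject sub_n.csv files; the folder name and the sub_ filename prefix are assumptions for illustration:

```python
import os
import pandas as pd

dataset_folder = './sensor_data'  # assumed location of the downloaded data

# Load the subject metadata into a Pandas DataFrame
subjects_info = pd.read_csv(os.path.join(dataset_folder, 'subjects_info.csv'))

# Collect all per-subject time-series files (sub_1.csv, sub_2.csv, ...)
fns_series = []
for dirpath, _, filenames in os.walk(dataset_folder):
    for filename in filenames:
        if filename.startswith('sub_'):
            fns_series.append(os.path.join(dirpath, filename))
```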
Next, let's create the tensors and add relevant metadata, such as the dataset source, the tensor units, and other information. We leverage groups to separate the primary acceleration data from other user data, such as the weight and height of the subjects.
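One way this might look is sketched below; the tensor names, units, and sampling rate are chosen for illustration (only weight and height are mentioned above, everything else is an assumption):

```python
with ds:
    # Dataset-level metadata
    ds.info.update(source='sensor data tutorial subset',
                   notes='iPhone acceleration for 24 subjects (walking and jogging)')

    # Acceleration tensors with lz4 chunk compression
    ds.create_tensor('acceleration_x', chunk_compression='lz4')
    ds.create_tensor('acceleration_y', chunk_compression='lz4')
    ds.create_tensor('acceleration_z', chunk_compression='lz4')

    # Tensor-level metadata, e.g. units and sampling rate (values assumed)
    ds.acceleration_x.info.update(units='g', sampling_rate_s=0.1)
    ds.acceleration_y.info.update(units='g', sampling_rate_s=0.1)
    ds.acceleration_z.info.update(units='g', sampling_rate_s=0.1)

    # A 'subject' group keeps the per-subject metadata separate
    ds.create_tensor('subject/weight_kg')
    ds.create_tensor('subject/height_cm')
```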
Finally, let's iterate through all the time-series data and upload it to the Deep Lake dataset.
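Continuing from the sketches above, the upload loop could look like this; the CSV column names ('x', 'y', 'z', 'code', 'weight', 'height') are assumptions about the file layout rather than values taken from the dataset itself:

```python
with ds:
    for fn in fns_series:
        # Read one subject's time series
        df = pd.read_csv(fn)

        # Parse the subject number from the filename (sub_n.csv -> n)
        subject_num = int(os.path.splitext(os.path.basename(fn))[0].split('_')[1])
        info = subjects_info[subjects_info['code'] == subject_num]

        # Append the acceleration channels (column names assumed)
        ds.acceleration_x.append(df['x'].values)
        ds.acceleration_y.append(df['y'].values)
        ds.acceleration_z.append(df['z'].values)

        # Append the matching subject metadata
        ds.subject.weight_kg.append(info['weight'].values)
        ds.subject.height_cm.append(info['height'].values)
```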
Let's check out the first sample from this dataset and plot the acceleration time-series.
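For example, something along these lines, assuming the tensor and metadata names from the sketches above and using matplotlib for plotting:

```python
import numpy as np
import matplotlib.pyplot as plt

# Pull the first sample's x-acceleration into a numpy array
x_data = ds.acceleration_x[0].numpy()

# Build a time axis from the sampling rate stored in the tensor metadata
dt = ds.acceleration_x.info.sampling_rate_s
t = np.arange(x_data.size) * dt

plt.plot(t, x_data, label='acceleration_x')
plt.xlabel('time (s)')
plt.ylabel('acceleration (g)')
plt.legend()
plt.show()
```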
It is noteworthy that the Deep Lake dataset takes 36% less memory than the original dataset due to lz4 chunk compression for the acceleration tensors.
Congrats! You just converted a time-series dataset to Deep Lake format! 🎉