Vector Store Quickstart
A jump-start guide to using Deep Lake for Vector Search.
How to Get Started with Vector Search in Deep Lake in Under 5 Minutes
Installing Deep Lake
Deep Lake can be installed using pip. By default, Deep Lake does not install dependencies for the compute engine, google-cloud, and other features. Details on all installation options are available here.
This quickstart also requires LangChain, tiktoken, and OpenAI.
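The installs might look like the following (package names assumed from the libraries mentioned above; check each project's docs for extras you may need):

```shell
pip install deeplake
pip install langchain openai tiktoken
```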
Creating Your First Vector Store
A Vector Store can be created using the Deep Lake integration with LangChain. This abstracts the low-level Deep Lake API from the user, and under the hood, the LangChain integration creates a Deep Lake dataset with `text`, `embedding`, `id`, and `metadata` tensors.
Let's embed and store Paul Graham's essays in a Deep Lake vector store. First, we download the data:
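The download step might look like the following. The source URL is not specified here, so it is shown as a placeholder to substitute:

```shell
# Placeholder URL: substitute the actual location of the essay text
wget -O paul_graham_essay.txt <URL_TO_ESSAY_TEXT>
```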
Next, let's import the required modules and set the OpenAI environmental variables for embeddings:
Next, let's specify paths for the source text data and the underlying Deep Lake dataset. In this example, we store the dataset locally, but Deep Lake Vector Stores can also be created in memory, in the Deep Lake Tensor Database, or in your cloud. Further details are available here.
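For example, with hypothetical filenames (adjust both to your own locations):

```python
# Hypothetical paths: point these at your downloaded text and preferred storage location
source_text = "paul_graham_essay.txt"        # the downloaded essay text
dataset_path = "./paul_graham_vector_store"  # local Deep Lake dataset directory
```

A cloud path such as `s3://...` or a Tensor Database path would go in `dataset_path` instead.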
Next, let's chunk the essay text, create the Vector Store, and populate it with data and embeddings:
The underlying Deep Lake dataset object is accessible in `db.ds`, and the data structure can be summarized using `db.ds.summary()`, which shows 4 tensors with 100 samples:
Performing Vector Search
Deep Lake offers a variety of vector search options depending on where the Vector Store is stored and which infrastructure should run the computations:
Search Method | Compute Location | Execution Algorithm | Query Syntax | Required Storage |
---|---|---|---|---|
Python | Client-side | Deep Lake OSS Python Code | LangChain API | In memory, local, user cloud, Tensor Database |
Compute Engine | Client-side | Deep Lake C++ Compute Engine | LangChain API or TQL | User cloud (must be connected to Deep Lake), Tensor Database |
Managed Tensor Database | Managed Database | Deep Lake C++ Compute Engine | LangChain API or TQL | Tensor Database |
Vector Search in Python
Let's continue from the Vector Store we created above and run an embeddings search based on a user prompt using the LangChain API.
Let's run the prompt and check out the output. Internally, this API performs an embedding search to find the most relevant data to feed into the LLM context.
'The first programs he tried writing were on the IBM 1401 that his school district used for "data processing" in 9th grade.'
Vector Search Using the Compute Engine on the Client Side in LangChain
Vector search using the Compute Engine + LangChain API will be available soon.
Vector Search Using the Compute Engine on the Client Side In the Deep Lake API
To run the C++ Compute Engine on the client side, please install:
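The install command might look like the following (extras name assumed; see the Deep Lake installation options for details):

```shell
pip install "deeplake[enterprise]"
```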
Let's load an existing Vector Store containing embeddings of the Twitter recommendation algorithm. We use the raw Deep Lake API, which loads the same dataset object as `db.ds` in the LangChain API:
Next, let's define the search term and embed it using OpenAI.
Finally, let's define the TQL query and run it using the Compute Engine on the client.
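A sketch of the TQL query, assuming cosine similarity against the `embedding` tensor (consult the TQL reference for the exact syntax):

```python
def run_vector_search(ds, query_embedding, limit: int = 5):
    """Build a TQL cosine-similarity query and run it with the
    client-side Compute Engine via ds.query()."""
    embedding_str = ", ".join(str(x) for x in query_embedding)
    tql = (
        "select * from (select text, cosine_similarity(embedding, "
        f"ARRAY[{embedding_str}]) as score) order by score desc limit {limit}"
    )
    return ds.query(tql)
```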
`ds_view.summary()` shows that the result contains the top 5 samples by score:
We can lazy-load the data for those samples using:
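A sketch of the lazy load, which materializes only the selected samples rather than the full dataset:

```python
def fetch_texts(ds_view):
    # Only the samples in the query result view are fetched and materialized
    return ds_view.text.numpy()
```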
Vector Search Using the Managed Tensor Database in LangChain
Vector search using the Tensor Database + LangChain API will be available soon.
Vector Search Using the Managed Tensor Database + REST API
The same query above on the Twitter Algorithm can be run on the Managed Tensor Database using a REST API. This step requires registration and creation of an API token.
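A sketch of the request, with the endpoint URL assumed (verify it against the current Deep Lake REST API reference before use):

```python
import requests

def query_tensor_db(tql_query: str, token: str):
    """Send a TQL query to the Managed Tensor Database over HTTP."""
    # Endpoint assumed; check the Deep Lake REST API reference for the current URL
    response = requests.post(
        "https://app.activeloop.ai/api/query/v1",
        json={"query": tql_query},
        headers={"Authorization": f"Bearer {token}"},
    )
    response.raise_for_status()
    return response.json()
```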
Visualizing your Vector Store
Deep Lake enables users to visualize and interpret large datasets, including Vector Stores with embeddings. Visualization is available for each dataset stored in or connected to Deep Lake. Here's an example.
Authentication
To use Deep Lake features that require authentication (Activeloop storage, Tensor Database storage, connecting your cloud dataset to the Deep Lake UI, etc.) you should register in the Deep Lake App and authenticate on the client using the methods in the link below:
User Authentication

Next Steps
Check out our Getting Started Guide for a comprehensive walk-through of Deep Lake. Also check out tutorials on Running Queries, Training Models, and Creating Datasets, as well as Playbooks about powerful use-cases that are enabled by Deep Lake.
Congratulations, you've created a Vector Store and performed vector search using Deep Lake🤓