Vector Store Quickstart
A jump-start guide to using Deep Lake for Vector Search.
If you prefer to use Deep Lake with LangChain, check out the LangChain integration. This quickstart focuses on vector storage and search rather than end-to-end LLM apps, and it offers more customization and search options compared to the LangChain integration.
Deep Lake can be installed using pip. By default, Deep Lake does not install dependencies for the compute engine, google-cloud, and other features. This quickstart also requires OpenAI.
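A typical installation looks like the following (`deeplake` and `openai` are the PyPI package names; optional extras for the compute engine or cloud storage are installed separately):

```shell
# Core Deep Lake package, without optional compute-engine or cloud extras
pip install deeplake

# Required for the OpenAI embedding calls in this quickstart
pip install openai
```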
Let's embed and store one of the source texts in a Deep Lake Vector Store stored locally. First, we download the data:
Next, let's import the required modules and set the OpenAI environmental variables for embeddings:
Let's also read and chunk the essay text based on a constant number of characters.
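A minimal sketch of this fixed-size chunking (the `CHUNK_SIZE` value and `chunk_text` helper are illustrative, not part of Deep Lake's API):

```python
CHUNK_SIZE = 1000  # characters per chunk (illustrative value)

def chunk_text(text: str, chunk_size: int = CHUNK_SIZE) -> list:
    """Split text into consecutive fixed-size character chunks."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

# Example: a 2500-character input yields chunks of 1000, 1000, and 500 characters.
chunks = chunk_text("a" * 2500)
```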
Next, let's define an embedding function using OpenAI. It must work for both a single string and a list of strings, so that it can be used both to embed a prompt and to embed a batch of texts.
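One way to satisfy that requirement is to normalize the input before calling the embedding model. The sketch below is generic: `embed_batch` stands in for whatever actually calls the OpenAI API, so the single-string vs. list-of-strings handling can be shown without a network call; the helper names are illustrative, not part of Deep Lake or OpenAI.

```python
from typing import Callable, List, Union

def make_embedding_function(
    embed_batch: Callable[[List[str]], List[List[float]]]
) -> Callable[[Union[str, List[str]]], List[List[float]]]:
    """Wrap a batch embedder so it accepts a single string or a list of strings."""
    def embedding_function(texts: Union[str, List[str]]) -> List[List[float]]:
        # Promote a lone string to a one-element batch before embedding.
        batch = [texts] if isinstance(texts, str) else list(texts)
        return embed_batch(batch)
    return embedding_function

# Stand-in embedder for demonstration: maps each text to a 1-D "vector" of its length.
fake_embed = make_embedding_function(lambda batch: [[float(len(t))] for t in batch])
single = fake_embed("hello")        # one string -> list with one vector
batch = fake_embed(["a", "abc"])    # list of strings -> list of vectors
```

In practice, `embed_batch` would call the OpenAI embeddings endpoint on the whole batch and return one vector per input text.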
The `path` parameter is bi-directional:

- When a new `path` is specified, a new Vector Store is created
- When an existing `path` is specified, the existing Vector Store is loaded

The Vector Store's data structure can be summarized using `vector_store.summary()`, which shows 4 tensors with 76 samples:
To create a Vector Store using pre-computed embeddings, instead of the `embedding_data` and `embedding_function` parameters, you may run:
`search_results` is a dictionary with keys for the `text`, `score`, `id`, and `metadata`, with data ordered by score. If we examine the first returned text using `search_results['text'][0]`, it appears to contain the answer to the prompt.
Next, let's specify paths for the source text and the Deep Lake Vector Store. Though we store the Vector Store locally, Deep Lake Vector Stores can also be created in memory, in the Deep Lake Tensor Database, or in your cloud.
Finally, let's create the Deep Lake Vector Store and populate it with data. We use a default tensor configuration, which creates tensors for `text` (str), `metadata` (json), `id` (str, auto-populated), and `embedding` (float32).
Deep Lake offers highly flexible vector search and hybrid search options. In this quickstart, we show a simple example of vector search using default options, which performs cosine similarity search in Python on the client.
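Under the default options, the client-side search boils down to computing cosine similarity between the query embedding and each stored embedding, then sorting by score. A self-contained sketch of that computation (pure Python, not the Deep Lake API itself):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def search(query_vec, embeddings, k=4):
    """Return (index, score) pairs for the k most similar stored embeddings."""
    scores = [(i, cosine_similarity(query_vec, e)) for i, e in enumerate(embeddings)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]

# Three stored 2-D embeddings; the query points along the first axis.
stored = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
top = search([1.0, 0.0], stored, k=2)  # best matches: index 0, then index 2
```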
Visualization is available for Vector Stores stored in or connected to Deep Lake. The Vector Store above is stored locally, so it cannot be visualized.
To use Deep Lake features that require authentication (Deep Lake storage, Tensor Database storage, connecting your cloud dataset to the Deep Lake UI, etc.), you should register and authenticate on the client using the methods in the link below:
Deep Lake provides a Managed Tensor Database that stores and runs queries on Deep Lake infrastructure, instead of on the client. To use this service, specify `runtime = {"tensor_db": True}` when creating the Vector Store.
For a comprehensive walk-through of Deep Lake Vector Stores, and for scaling Deep Lake to production-level applications, check out the guides linked below.
Congratulations, you've created a Vector Store and performed vector search using Deep Lake!