LlamaIndex Integration
How to Use Deep Lake as a Vector Store in LlamaIndex
Deep Lake can be used as a VectorStore in LlamaIndex for building apps that require filtering and vector search. In this tutorial we will show how to create a Deep Lake Vector Store in LlamaIndex and use it to build a Q&A app about the Twitter OSS recommendation algorithm. This tutorial requires installation of the packages below.
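A minimal install sketch (the package names are assumptions based on the imports used later; with the modern llama-index package layout, the Deep Lake integration ships as a separate package):

```bash
pip install llama-index llama-index-vector-stores-deeplake deeplake
```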
Downloading and Preprocessing the Data
First, let's import the necessary packages and make sure the Activeloop and OpenAI keys are in the environment variables ACTIVELOOP_TOKEN and OPENAI_API_KEY.
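A minimal sketch of that setup (the getpass prompts are just one way to set the variables; any mechanism that exports them works):

```python
import os
import getpass

# Deep Lake and OpenAI authenticate via these environment variables.
os.environ["ACTIVELOOP_TOKEN"] = getpass.getpass("Activeloop token: ")
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")
```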
Next, let's clone the Twitter OSS recommendation algorithm:
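```bash
git clone https://github.com/twitter/the-algorithm
```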
Next, let's specify a local path to the files and add a reader for processing and chunking them.
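A sketch using LlamaIndex's built-in SimpleDirectoryReader (import paths vary across llama-index versions; this assumes the llama_index.core layout, and the local path assumes the clone above landed in the working directory):

```python
from llama_index.core import SimpleDirectoryReader

# Local path to the cloned repository (adjust if you cloned elsewhere).
repo_path = "./the-algorithm"

# Recursively load the source files into LlamaIndex Document objects;
# the index will chunk and embed them during ingestion below.
documents = SimpleDirectoryReader(repo_path, recursive=True).load_data()
```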
Creating the Deep Lake Vector Store
First, we create an empty Deep Lake Vector Store using a specified path:
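A sketch of the store creation (the dataset path below is a placeholder; use hub://<org_id>/<dataset_name> for a Deep Lake cloud dataset, or a local directory path):

```python
from llama_index.vector_stores.deeplake import DeepLakeVectorStore

# Placeholder path; replace <org_id> with your Activeloop organization.
dataset_path = "hub://<org_id>/twitter_algorithm"

# overwrite=True starts from an empty Vector Store at this path.
vector_store = DeepLakeVectorStore(dataset_path=dataset_path, overwrite=True)
```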
The Deep Lake Vector Store has 4 tensors: text, embedding, ids, and metadata (which includes the filename of the text).
Next, we create a LlamaIndex StorageContext and VectorStoreIndex, and use the from_documents() method to populate the Vector Store with data. This step takes several minutes because of the time to embed the text.
We observe that the Vector Store has 8286 rows of data.
Using the Vector Store in a Q&A App
We can now use the Vector Store in a Q&A app, where the embeddings are used to retrieve relevant documents (texts) that are fed into an LLM in order to answer a question.
If we were on another machine, we would load the existing Vector Store without re-ingesting the data:
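A sketch of reconnecting to the existing dataset (same placeholder dataset_path as above; read_only avoids accidental writes):

```python
# Connect to the already-populated Vector Store instead of rebuilding it.
vector_store = DeepLakeVectorStore(dataset_path=dataset_path, read_only=True)
index = VectorStoreIndex.from_vector_store(vector_store)
```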
Next, let's create the LlamaIndex query engine and run a query:
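A sketch of the query step (the question is illustrative, chosen to match the response shown below):

```python
query_engine = index.as_query_engine()
response = query_engine.query(
    "What programming language is most of the SimClusters project written in?"
)
print(response)
```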
Most of the SimClusters project is written in Scala.
Congrats! You just used the Deep Lake Vector Store in LlamaIndex to create a Q&A app! 🎉