Step 2: Creating Deep Lake Vector Stores
How to Create a Deep Lake Vector Store
Let's create a Vector Store in LangChain for storing and searching information about the Twitter OSS recommendation algorithm.
Downloading and Preprocessing the Data
First, let's import the necessary packages and make sure the Activeloop and OpenAI keys are set in the environment variables ACTIVELOOP_TOKEN and OPENAI_API_KEY.
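A small check like the one below fails fast before any network call is made. It is a sketch: the helper name missing_keys is hypothetical, and only the two variable names come from the text above.

```python
import os
from typing import List, Mapping, Optional

# The two keys this walkthrough relies on (names taken from the text above).
REQUIRED_KEYS = ("ACTIVELOOP_TOKEN", "OPENAI_API_KEY")


def missing_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required key names that are absent or empty in `env`.

    `env` defaults to os.environ; passing a plain dict makes the check testable.
    """
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

For example, calling missing_keys() at the top of a script and raising an error when the result is non-empty surfaces a missing key immediately, instead of as an authentication failure halfway through ingestion.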
Next, let's clone the Twitter OSS recommendation algorithm repository and define paths for the source data and the Vector Store.
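One way to script the clone step in Python is sketched below. The GitHub URL is the public repository for Twitter's recommendation algorithm; the local directory names, the shallow `--depth 1` clone, and the helper names clone_command and clone_if_missing are assumptions made for illustration. Deep Lake can also keep a Vector Store at a hub:// path in Activeloop's cloud instead of a local directory.

```python
import subprocess
from pathlib import Path
from typing import List, Union

REPO_URL = "https://github.com/twitter/the-algorithm"
repo_path = Path("the-algorithm")  # hypothetical local checkout directory
vector_store_path = Path("twitter_algorithm_vector_store")  # hypothetical local store path


def clone_command(url: str, dest: Union[str, Path]) -> List[str]:
    """Build a shallow `git clone` command; depth 1 skips the commit history."""
    return ["git", "clone", "--depth", "1", url, str(dest)]


def clone_if_missing(url: str = REPO_URL, dest: Union[str, Path] = repo_path) -> Path:
    """Clone `url` into `dest` unless that path already exists, then return it."""
    dest = Path(dest)
    if not dest.exists():
        # check=True raises CalledProcessError if git exits with an error.
        subprocess.run(clone_command(url, dest), check=True)
    return dest
```

The shallow clone keeps only the latest commit, which is enough when only the current source files are indexed.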
Next, let's load all the files from the repo into lists of data that will be added to the Vector Store (chunked_text and metadata). We use simple chunking that splits each file's text into pieces of a fixed number of characters.
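The loading and fixed-size chunking step might look like the sketch below. The chunk size of 1,000 characters, skipping the .git directory, the UTF-8 decoding rule, and the names chunk_text and load_chunks are assumptions; only the output names chunked_text and metadata come from the text.

```python
import os
from typing import Dict, List, Tuple

CHUNK_SIZE = 1000  # characters per chunk; an assumed value, not taken from the text


def chunk_text(text: str, size: int = CHUNK_SIZE) -> List[str]:
    """Split text into consecutive, non-overlapping pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def load_chunks(root: str, size: int = CHUNK_SIZE) -> Tuple[List[str], List[Dict[str, str]]]:
    """Walk `root`, chunk every UTF-8 text file, and return (chunked_text, metadata).

    metadata[i] records the file that chunked_text[i] came from, so the two
    lists stay aligned index for index.
    """
    chunked_text: List[str] = []
    metadata: List[Dict[str, str]] = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # don't descend into git internals
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            try:
                with open(full_path, "r", encoding="utf-8") as f:
                    text = f.read()
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            chunks = chunk_text(text, size)
            chunked_text.extend(chunks)
            metadata.extend({"filepath": full_path} for _ in chunks)
    return chunked_text, metadata
```

Fixed-size character chunks are cheap to compute but can cut through functions or sentences; overlapping windows or syntax-aware splitting are common refinements.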
Next, let's define an embedding function using OpenAI. It must accept both a single string and a list of strings, so that it can embed either a single prompt or a batch of texts.
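The sketch below shows the shape such a function can take: a single string is wrapped into a one-item batch, so callers always get back a list of vectors. To keep it runnable offline, the OpenAI call is replaced by a hypothetical hash-based stand-in (_fake_embed_batch) that produces deterministic but meaningless vectors; in the real pipeline that function would call OpenAI's embeddings endpoint (for example, client.embeddings.create(model=..., input=texts) in version 1 and later of the openai Python package).

```python
import hashlib
from typing import List, Sequence, Union

EMBEDDING_DIM = 8  # toy dimension for the stand-in embedder


def _fake_embed_batch(texts: Sequence[str]) -> List[List[float]]:
    """Hypothetical stand-in for an OpenAI embeddings call.

    Maps each text to a deterministic vector derived from its SHA-256 digest.
    The vectors carry no semantic meaning; they only let the wrapper run offline.
    """
    vectors = []
    for t in texts:
        digest = hashlib.sha256(t.encode("utf-8")).digest()
        vectors.append([b / 255.0 for b in digest[:EMBEDDING_DIM]])
    return vectors


def embedding_function(texts: Union[str, Sequence[str]]) -> List[List[float]]:
    """Embed a single prompt or a batch of texts; always returns a list of vectors."""
    if isinstance(texts, str):
        texts = [texts]  # a lone prompt becomes a batch of one
    # Replace newlines with spaces, a common preprocessing step before embedding.
    texts = [t.replace("\n", " ") for t in texts]
    return _fake_embed_batch(texts)
```

Returning a list even for one prompt keeps a single code path for ingestion (many texts) and querying (one prompt); the caller takes element 0 when it needs the query vector.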
Finally, let's create the Deep Lake Vector Store and populate it with data. We use the default tensor configuration, which creates the tensors text (str), metadata (json), id (str, auto-populated), and embedding (float32). Tensor customization is described in the Deep Lake documentation.
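To make the default tensor layout concrete, here is a toy in-memory stand-in, not Deep Lake itself: the class MiniVectorStore and its add method are hypothetical, loosely modeled on the embedding_data and embedding_function parameters mentioned in this section. It embeds the texts, auto-generates an id per sample, and stores each vector as 32-bit floats.

```python
import uuid
from array import array
from typing import Callable, Dict, List, Sequence

EmbedFn = Callable[[Sequence[str]], List[List[float]]]


class MiniVectorStore:
    """Hypothetical in-memory store with the four default tensors:
    text (str), metadata (json-like dict), id (str, auto-populated), embedding (float32).
    """

    def __init__(self) -> None:
        self.tensors: Dict[str, list] = {"text": [], "metadata": [], "id": [], "embedding": []}

    def add(self, text: Sequence[str], metadata: Sequence[dict],
            embedding_data: Sequence[str], embedding_function: EmbedFn) -> List[str]:
        """Embed `embedding_data`, append one sample per text, and return the new ids."""
        if not (len(text) == len(metadata) == len(embedding_data)):
            raise ValueError("text, metadata and embedding_data must have equal length")
        vectors = embedding_function(list(embedding_data))
        if len(vectors) != len(text):
            raise ValueError("embedding_function returned the wrong number of vectors")
        ids = [str(uuid.uuid4()) for _ in text]  # auto-populated ids
        self.tensors["text"].extend(text)
        self.tensors["metadata"].extend(metadata)
        self.tensors["id"].extend(ids)
        # array("f") stores each vector as 32-bit floats, mirroring a float32 tensor.
        self.tensors["embedding"].extend(array("f", v) for v in vectors)
        return ids
```

A typical call would pass the chunks both as stored text and as the data to embed, e.g. store.add(text=chunked_text, metadata=metadata, embedding_data=chunked_text, embedding_function=embedding_function).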
The Vector Store's data structure can be summarized using vector_store.summary(), which shows 4 tensors with 21055 samples.
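vector_store.summary() is Deep Lake's own method, and its output format is not reproduced here. The hypothetical helpers below only illustrate the invariant behind a figure like "4 tensors with 21055 samples": each sample has exactly one entry in every tensor, so all tensors share the same length.

```python
from typing import Dict, Sequence, Tuple


def summary_counts(tensors: Dict[str, Sequence]) -> Tuple[int, int]:
    """Return (number_of_tensors, number_of_samples) for column-aligned tensors.

    Raises ValueError if the tensors disagree on length, because every sample
    must have exactly one entry in each tensor.
    """
    lengths = {name: len(values) for name, values in tensors.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"tensors have different lengths: {lengths}")
    samples = next(iter(lengths.values()), 0)
    return len(lengths), samples


def format_summary(tensors: Dict[str, Sequence]) -> str:
    """Render a header line plus one line per tensor with its sample count."""
    n_tensors, n_samples = summary_counts(tensors)
    lines = [f"{n_tensors} tensors, {n_samples} samples"]
    lines += [f"  {name}: {len(values)}" for name, values in tensors.items()]
    return "\n".join(lines)
```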
To create a Vector Store from precomputed embeddings, supply the embeddings directly instead of passing embedding_data and embedding_function.
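With precomputed embeddings, no embedding function runs at insert time; the vectors only need to line up with the texts and share one dimension. The helper below (rows_from_precomputed, hypothetical) sketches that validation in plain Python. In Deep Lake itself this presumably means passing the vectors to the embedding tensor directly (for example, an embedding= keyword argument to vector_store.add); check the Deep Lake API reference for the exact call.

```python
import uuid
from array import array
from typing import Dict, Sequence


def rows_from_precomputed(text: Sequence[str], metadata: Sequence[dict],
                          embedding: Sequence[Sequence[float]]) -> Dict[str, list]:
    """Build aligned tensors from embeddings computed ahead of time.

    No embedding function is called: the vectors are validated and stored as-is.
    """
    if not (len(text) == len(metadata) == len(embedding)):
        raise ValueError("text, metadata and embedding must have the same length")
    dims = {len(v) for v in embedding}
    if len(dims) > 1:
        raise ValueError(f"embeddings have inconsistent dimensions: {sorted(dims)}")
    return {
        "text": list(text),
        "metadata": list(metadata),
        "id": [str(uuid.uuid4()) for _ in text],  # auto-populated, as in the default layout
        "embedding": [array("f", v) for v in embedding],  # stored as float32
    }
```

Precomputing pays off when the same chunks are re-ingested or when embeddings come from a batch job, since the store then never calls the embedding model.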