Improving Search Accuracy using Deep Memory
Using Deep Memory to improve the accuracy of your Vector Search
Deep Memory computes a transformation that converts your embeddings into an embedding space tailored for your use case, based on several examples for which the most relevant embedding is known. This can increase the accuracy of your Vector Search by up to 22%.
In this example, we'll use Deep Memory to improve the accuracy of Vector Search on the SciFact dataset, where the input prompt is a scientific claim, and the search result is the corresponding abstract.
First, let's specify our Activeloop and OpenAI tokens. Make sure to install the datasets package (pip install datasets), because we'll download the source data from HuggingFace.
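A minimal setup sketch; the getpass prompts are just one way to supply the credentials, and any method of setting these environment variables works:

```python
import os
import getpass

# Supply credentials interactively so they don't end up saved in the notebook.
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")
os.environ["ACTIVELOOP_TOKEN"] = getpass.getpass("Activeloop token: ")
```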
Next, let's download the dataset locally:
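Assuming the HuggingFace datasets package and the standard SciFact dataset card on the Hub (allenai/scifact), the two configs can be pulled like this:

```python
from datasets import load_dataset

# SciFact ships as two configs: the paper abstracts ("corpus")
# and the claims with their cited documents ("claims").
corpus = load_dataset("allenai/scifact", "corpus")
claims = load_dataset("allenai/scifact", "claims")
```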
Now let's define an embedding function for the text data and create a Deep Lake Vector Store in our Managed Database. Deep Memory is only available for Vector Stores in our Managed Database.
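A sketch of the embedding function and Vector Store creation, assuming the OpenAI text-embedding-ada-002 model and the deeplake VectorStore API; the hub:// path is a placeholder for your own organization and dataset name:

```python
import openai
from deeplake.core.vectorstore import VectorStore

def embedding_function(texts, model="text-embedding-ada-002"):
    if isinstance(texts, str):
        texts = [texts]
    # Newlines can degrade embedding quality, so flatten them.
    texts = [t.replace("\n", " ") for t in texts]
    response = openai.embeddings.create(input=texts, model=model)
    return [item.embedding for item in response.data]

# runtime={"tensor_db": True} places the Vector Store in the
# Managed Database, which Deep Memory requires.
vectorstore = VectorStore(
    path="hub://<org_id>/scifact_demo",  # placeholder path
    embedding_function=embedding_function,
    runtime={"tensor_db": True},
)
```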
Next, let's extract the data from the SciFact dataset and add it to our Vector Store. In this example, we embed the abstracts of the scientific papers. Normally, the id tensor is auto-populated, but in this case we want to use the ids from the SciFact dataset, in order to preserve the existing connections between ids, abstracts, and claims in SciFact.
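A sketch of the ingestion step; SciFact stores each abstract as a list of sentences, so we join them into a single string per paper before embedding:

```python
data = corpus["train"]

texts = [" ".join(abstract) for abstract in data["abstract"]]
ids = [str(doc_id) for doc_id in data["doc_id"]]
metadata = [{"title": title} for title in data["title"]]

vectorstore.add(
    text=texts,
    id=ids,  # override the auto-populated id tensor with the SciFact doc_ids
    metadata=metadata,
    embedding_data=texts,
    embedding_function=embedding_function,
)
```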
We must create a relationship between the claims and their corresponding most relevant abstracts. This correspondence already exists in the SciFact dataset, and we extract that information using the helper function below.
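One possible implementation of that helper, assuming the claim and cited_doc_ids fields in the SciFact claims split:

```python
def preprocess_queries(queries):
    """Extract (claim, relevance) pairs from a SciFact claims split."""
    claims_list, relevances = [], []
    for query in queries:
        # Skip claims that cite no documents.
        if len(query["cited_doc_ids"]) > 0:
            claims_list.append(query["claim"])
            # Mark each cited abstract as positively relevant (label 1).
            relevances.append(
                [(str(doc_id), 1) for doc_id in query["cited_doc_ids"]]
            )
    return claims_list, relevances

train_claims, train_relevances = preprocess_queries(claims["train"])
```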
Let's print the first 10 claims and their relevant abstracts. The relevances are a list of tuples, where each tuple's id corresponds to the id tensor value in the Abstracts Vector Store, and 1 indicates a positive relevance.
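For example:

```python
for claim, relevance in zip(train_claims[:10], train_relevances[:10]):
    print(f"claim: {claim}")
    print(f"relevant abstracts: {relevance}\n")
```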
Now we can run a Deep Memory training, which runs asynchronously and executes on our managed service.
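A sketch of the training call, assuming the deep_memory.train API on the Vector Store:

```python
# Kicks off an asynchronous training job on the managed service
# and returns a job id for tracking progress.
job_id = vectorstore.deep_memory.train(
    queries=train_claims,
    relevance=train_relevances,
    embedding_function=embedding_function,
)
```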
All of the Deep Memory training jobs for this Vector Store can be listed using the command below. The PROGRESS field shows the state of each training job, as well as the recall improvement on the training data.
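```python
vectorstore.deep_memory.list_jobs()
```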
recall@k corresponds to the percentage of queries for which the correct (most relevant) result was returned in the top k vector search results. For example, if 80 of 100 claims have a cited abstract among the top 10 results, recall@10 is 80%.
Let's evaluate the recall improvement for an evaluation dataset that was not used in the training process. Deep Memory inference, and by extension this evaluation process, runs on the client.
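A sketch of the evaluation, reusing the helper above on the validation split (which was not used for training) and assuming the deep_memory.evaluate API:

```python
validation_claims, validation_relevances = preprocess_queries(claims["validation"])

# Compares recall@k with and without Deep Memory; runs on the client.
recalls = vectorstore.deep_memory.evaluate(
    queries=validation_claims,
    relevance=validation_relevances,
    embedding_function=embedding_function,
)
```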
We observe that recall has improved by up to 16%, depending on the k value.
To use Deep Memory in your applications, specify the deep_memory=True parameter during vector search. If you are using the LangChain integration, you can specify this parameter during Vector Store initialization. Let's run a search for a sample prompt, with and without Deep Memory.
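A sketch of both searches; the prompt below is a hypothetical example claim, not one taken from the dataset:

```python
prompt = "Vitamin D deficiency is associated with respiratory infections."  # hypothetical prompt

results_deep_memory = vectorstore.search(
    embedding_data=prompt,
    embedding_function=embedding_function,
    deep_memory=True,
)

results_baseline = vectorstore.search(
    embedding_data=prompt,
    embedding_function=embedding_function,
)

# Compare which abstracts each method returned.
print(results_deep_memory["id"])
print(results_baseline["id"])
```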
We observe that there are overlapping results for both search methods, but 50% of the answers differ.
Congrats! You just used Deep Memory to improve the accuracy of Vector Search on a specific use-case! 🎉