Managed Database REST API

Running Vector Search in the Deep Lake Tensor Database using the REST API

How to Run Vector Search in the Deep Lake Tensor Database using the REST API

The REST API is currently in Alpha, and the syntax may change without announcement.

To use the REST API, Deep Lake data must be stored in the Managed Tensor Database by specifying the deeplake_path = hub://org_id/dataset_name and runtime = {"tensor_db": True}. Full details on path and storage management are available here.

Performing Vector Search Using the REST API

Let's query this Vector Store stored in the Managed Tensor Database using the REST API. The steps are:

  1. Define the authentication tokens and search terms

  2. Embed the search search term using OpenAI

  3. Reformat the embedding to an embedding_search string that can be passed to the REST API request.

  4. Create the query string using Deep Lake TQL. The dataset_path and embedding_search are a part of the query string.

  5. Submit the request and print the response data data

import requests
import openai
import os

# Tokens should be set in environmental variables.
ACTIVELOOP_TOKEN = os.environ['ACTIVELOOP_TOKEN']
DATASET_PATH = 'hub://activeloop/twitter-algorithm'
ENDPOINT_URL = 'https://app.activeloop.ai/api/query/v1'
SEARCH_TERM = 'What do the trust and safety models do?'
# os.environ['OPENAI_API_KEY'] OPEN AI TOKEN should also exist in env variables

# The headers contains the user token
headers = {
    "Authorization": f"Bearer {ACTIVELOOP_TOKEN}",
}

# Embed the search term
embedding = openai.Embedding.create(input=SEARCH_TERM, model="text-embedding-ada-002")["data"][0]["embedding"]

# Format the embedding array or list as a string, so it can be passed in the REST API request.
embedding_string = ",".join([str(item) for item in embedding])

# Create the query using TQL
query = f"select * from (select text, cosine_similarity(embedding, ARRAY[{embedding_string}]) as score from \"{dataset_path}\") order by score desc limit 5"
          
# Submit the request                              
response = requests.post(ENDPOINT_URL, json={"query": query}, headers=headers)

data = response.json()

print(data)

Congrats! You performed a vector search using the Deep Lake Managed Database! 🎉