TQL Syntax

How to properly format TQL queries

Query syntax for the Tensor Query Language (TQL)

CONTAINS and ==

# Exact match, which generally requires that the sample
# has 1 value, i.e. no lists or multi-dimensional arrays
select * where tensor_name == 'text_value'    # If value is text
select * where tensor_name == numeric_value   # If value is numeric

select * where contains(tensor_name, 'text_value')

Tensor or group names that contain special characters should be wrapped in double quotes:

select * where contains("tensor-name", 'text_value')

select * where "tensor_name/group_name" == numeric_value
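When queries are assembled in Python, the quoting rule above can be applied by a small helper. The following is a minimal sketch, not part of the Deep Lake API: `quote_tensor_name` and `contains_query` are hypothetical names, and the definition of "special character" (anything other than letters, digits, and underscores) is an assumption made for illustration.

```python
import re

# Assumption: a name is "plain" if it contains only letters, digits, and
# underscores and does not start with a digit. Anything else gets quoted.
_PLAIN_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_tensor_name(name: str) -> str:
    """Wrap a tensor or group name in double quotes unless it is a plain identifier."""
    if _PLAIN_NAME.fullmatch(name):
        return name
    return f'"{name}"'


def contains_query(tensor_name: str, text_value: str) -> str:
    """Build a `contains` query string; the text value goes in single quotes."""
    return f"select * where contains({quote_tensor_name(tensor_name)}, '{text_value}')"


print(contains_query("tensor-name", "text_value"))
print(contains_query("labels", "car"))
```

The helper does not escape quote characters inside the name or the value, so it is only suitable for trusted inputs.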

SHAPE

select * where shape(tensor_name)[dimension_index] > numeric_value 
select * where shape(tensor_name)[1] > numeric_value # Second array dimension > value

LIMIT

select * where contains(tensor_name, 'text_value') limit num_samples

AND, OR, NOT

select * where contains(tensor_name, 'text_value') and NOT contains(tensor_name_2, numeric_value)
select * where contains(tensor_name, 'text_value') or tensor_name_2 == numeric_value

select * where (contains(tensor_name, 'text_value') and shape(tensor_name_2)[dimension_index]>numeric_value) or contains(tensor_name, 'text_value_2')

UNION and INTERSECT

(select * where contains(tensor_name, 'value')) intersect (select * where contains(tensor_name, 'value_2'))

(select * where contains(tensor_name, 'value') limit 100) union (select * where shape(tensor_name)[0] > numeric_value limit 100)

ORDER BY

# ORDER BY requires that each sample is numeric and has 1 value,
# i.e. no lists or multi-dimensional arrays

# The default order is ASCENDING (asc)

select * where contains(tensor_name, 'text_value') order by tensor_name asc

ANY, ALL, and ALL_STRICT

select * where all_strict(tensor_name[:,2]>numeric_value)

select * where any(tensor_name[0:6]>numeric_value)

all follows NumPy and Python list logic, where all(empty_sample) returns True.

all_strict is more intuitive for queries: all_strict(empty_sample) returns False.
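The difference between the two only shows up on empty samples. The sketch below mimics the described behavior in plain Python; `all_like` and `all_strict_like` are illustrative stand-ins, not TQL or Deep Lake functions.

```python
def all_like(values):
    """Mirror Python/NumPy `all`: vacuously True for an empty sample."""
    return all(values)


def all_strict_like(values):
    """Mirror the described `all_strict`: an empty sample returns False."""
    values = list(values)
    return len(values) > 0 and all(values)


empty_sample = []
box_widths = [12, 30, 45]
threshold = 10

print(all_like(v > threshold for v in empty_sample))         # True
print(all_strict_like(v > threshold for v in empty_sample))  # False
print(all_strict_like(v > threshold for v in box_widths))    # True
```

In practice this means a filter written with all can return samples that have no values at all (for example, images without bounding boxes), while all_strict excludes them.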

LOGICAL_AND and LOGICAL_OR

select * where any(logical_and(tensor_name_1[:,3]>numeric_value, tensor_name_2 == 'text_value'))

SAMPLE BY

select * sample by weight_choice(expression_1: weight_1, expression_2: weight_2, ...)
        replace True limit N

weight_choice resolves the weight that is used when multiple expressions evaluate to True for a given sample. Options are max_weight and sum_weight. For example, if weight_choice is max_weight, then the maximum weight will be chosen for that sample.
replace determines whether samples should be drawn with replacement. It defaults to True.
limit specifies the number of samples that should be returned. If unspecified, the sampler will return a number of samples equal to the length of the dataset.
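To make the weight resolution concrete, here is a rough Python simulation of how a per-sample weight could be derived and then used for drawing with replacement. It is a sketch under assumptions, not Deep Lake's implementation: `resolve_weight` and `sample_with_replacement` are hypothetical names, sum_weight is assumed to add the matching weights, and a sample that matches no expression is assumed to get zero weight.

```python
import random


def resolve_weight(sample, rules, weight_choice="max_weight"):
    """Return the sampling weight for one sample.

    rules is a list of (predicate, weight) pairs standing in for
    the `expression: weight` pairs of a SAMPLE BY clause.
    """
    matched = [weight for predicate, weight in rules if predicate(sample)]
    if not matched:
        return 0.0  # Assumption: no matching expression means zero weight.
    if weight_choice == "max_weight":
        return max(matched)
    if weight_choice == "sum_weight":
        return sum(matched)  # Assumption: sum_weight adds the matching weights.
    raise ValueError(f"unknown weight_choice: {weight_choice}")


def sample_with_replacement(samples, rules, weight_choice, limit, seed=None):
    """Draw `limit` samples with replacement, proportionally to the resolved weights."""
    weights = [resolve_weight(s, rules, weight_choice) for s in samples]
    rng = random.Random(seed)
    return rng.choices(samples, weights=weights, k=limit)


rules = [
    (lambda s: "car" in s["labels"], 0.5),
    (lambda s: "night" in s["labels"], 0.25),
]
samples = [
    {"id": 0, "labels": ["car", "night"]},  # matches both expressions
    {"id": 1, "labels": ["car"]},           # matches the first expression
    {"id": 2, "labels": ["tree"]},          # matches neither expression
]

print(resolve_weight(samples[0], rules, "max_weight"))  # 0.5
print(resolve_weight(samples[0], rules, "sum_weight"))  # 0.75
drawn = sample_with_replacement(samples, rules, "max_weight", limit=5, seed=0)
print(len(drawn))  # 5
```

Only the replace True case is simulated here; drawing without replacement would need a different sampling routine.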

EMBEDDING SEARCH

Deep Lake supports several vector operations for embedding search. Typically, a vector operation is used to compute a score for each sample, and the data is returned ordered by that score.

select * from (select tensor_1, tensor_2, <VECTOR_OPERATION> as score) order by score desc limit 10

# THE SUPPORTED VECTOR_OPERATIONS ARE:

l1_norm(<embedding_tensor> - ARRAY[<search_embedding>]) # Order should be asc

l2_norm(<embedding_tensor> - ARRAY[<search_embedding>]) # Order should be asc

linf_norm(<embedding_tensor> - ARRAY[<search_embedding>]) # Order should be asc

cosine_similarity(<embedding_tensor>, ARRAY[<search_embedding>]) # Order should be desc
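The ordering hints follow from what each operation measures: the norms are distances (smaller means closer), while cosine similarity grows as vectors align. The plain-Python sketch below uses the standard mathematical definitions to show this; it illustrates the math only and is not how Deep Lake evaluates these operations. The `subtract` helper and the toy embeddings are made up for the example.

```python
import math


def subtract(a, b):
    """Element-wise difference, standing in for `<embedding_tensor> - ARRAY[...]`."""
    return [x - y for x, y in zip(a, b)]


def l1_norm(v):
    return sum(abs(x) for x in v)


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


def linf_norm(v):
    return max(abs(x) for x in v)


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (l2_norm(a) * l2_norm(b))


search = [1.0, 0.0]
embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [3.0, 4.0]}

# Distances: the best match has the smallest score, so sort ascending.
by_l2 = sorted(embeddings, key=lambda k: l2_norm(subtract(embeddings[k], search)))
# Similarity: the best match has the largest score, so sort descending.
by_cos = sorted(embeddings, key=lambda k: cosine_similarity(embeddings[k], search), reverse=True)

print(by_l2)   # ['a', 'b', 'c']
print(by_cos)  # ['a', 'c', 'b']
```

The two rankings differ for "b" and "c" because cosine similarity ignores vector length while the norms do not, which is worth keeping in mind when choosing an operation.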

VIRTUAL TENSORS

Virtual tensors are tensors that are the result of a computation and are not tensors in the underlying Deep Lake data. They can be treated as tensors in the API.

# "score" is a virtual tensor
select * from (select tensor_1, tensor_2, <VECTOR_OPERATION> as score) order by score desc limit 10

# "box_beyond_image" is a virtual tensor
select *, any(boxes[:,0] < 0) as box_beyond_image where ....

# "tensor_sum" is a virtual tensor
select *, tensor_1 + tensor_3 as tensor_sum where ......

When combining embedding search with filtering (where conditions), the filter condition is evaluated prior to the embedding search.
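The evaluation order matters for what a limit returns. The following Python sketch models the behavior described above (filter first, then rank and limit) on toy data; `filtered_search` and the sample fields are hypothetical and only illustrate the ordering, not Deep Lake internals.

```python
def filtered_search(samples, predicate, score, limit, descending=True):
    """Apply the filter first, then rank the survivors by score and apply the limit."""
    candidates = [s for s in samples if predicate(s)]           # where ...
    ranked = sorted(candidates, key=score, reverse=descending)  # order by score
    return ranked[:limit]                                       # limit N


samples = [
    {"id": 0, "label": "cat", "similarity": 0.9},
    {"id": 1, "label": "dog", "similarity": 0.8},
    {"id": 2, "label": "dog", "similarity": 0.7},
    {"id": 3, "label": "cat", "similarity": 0.1},
]

top_dogs = filtered_search(
    samples,
    predicate=lambda s: s["label"] == "dog",
    score=lambda s: s["similarity"],
    limit=2,
)
print([s["id"] for s in top_dogs])  # [1, 2]
```

Had the top 2 by similarity been selected before filtering, only sample 1 would have survived the filter, so filtering first yields the full requested number of matching results whenever enough samples pass the filter.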

