Hub Tensors are scalable NumPy-like arrays stored in the cloud and accessible over the internet as if they were local NumPy arrays. Their chunked structure keeps interaction fast, since an operation generally needs only the chunks it touches rather than the whole array.
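To illustrate why a chunked layout helps, here is a minimal sketch in plain Python of how a slice along axis 0 maps to chunk indices. It is not Hub's implementation: the function name `chunks_for_slice` and the chunk length are hypothetical, chosen only for illustration.

```python
from typing import List


def chunks_for_slice(start: int, stop: int, chunk_len: int) -> List[int]:
    """Return the indices of the axis-0 chunks that overlap [start, stop)."""
    if stop <= start:
        return []  # empty slice touches no chunks
    first = start // chunk_len        # chunk holding the first element
    last = (stop - 1) // chunk_len    # chunk holding the last element
    return list(range(first, last + 1))


# Elements 3 and 4 straddle a chunk boundary when chunks hold 4 elements each,
# so two chunks are needed; every other chunk can stay on the remote store.
touched = chunks_for_slice(start=3, stop=5, chunk_len=4)
```

Under these assumptions, reading a small slice of a very large array only requires fetching the few chunks that overlap it.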

A Tensor represents a single array whose elements all share one data type. It can hold a collection of text files, audio clips, images, or videos. The first dimension is the batch dimension.
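The batch-first convention can be illustrated with a plain NumPy array, which Hub tensors are modeled on. This sketch uses only NumPy, not Hub itself, and the variable names are illustrative.

```python
import numpy as np

# A batch of 10 grayscale images, each 512 x 512, all stored as uint8.
batch = np.zeros((10, 512, 512), dtype="uint8")

# Indexing the first (batch) dimension selects a single sample.
first_image = batch[0]
num_samples = batch.shape[0]
```

Here `first_image` has shape `(512, 512)` and every element in `batch` has the same dtype, which is what "homogeneous" means in this context.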

You can specify a dtag for a tensor to describe the kind of data it stores (image, mask, label, …).


You can initialize a tensor of zeros like this.

from hub import tensor

t = tensor.from_zeros((10, 512, 512), dtype="uint8")

You can also initialize the tensor object from a NumPy array.

import numpy as np
from hub import tensor

# Build the tensor from an existing NumPy array
t = tensor.from_array(np.zeros((10, 512, 512)))

Concat or Stack

Concatenating and stacking tensors work as they do in NumPy and similar frameworks.

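from hub import tensor

t1 = tensor.from_zeros((10, 512, 512), dtype="uint8")
t2 = tensor.from_zeros((20, 512, 512), dtype="uint8")
t3 = tensor.from_zeros((10, 512, 512), dtype="uint8")

# Concatenation joins along an existing axis, so only the other dimensions must match
tensor.concat([t1, t2], axis=0, chunksize=-1)

# Stacking adds a new axis, so the inputs should have identical shapes
tensor.stack([t1, t3], axis=0, chunksize=-1)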
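For comparison, the NumPy equivalents show how the two operations differ in shape requirements and results. This sketch uses NumPy only; that Hub's `concat` and `stack` follow the same shape rules is an assumption based on the statement that they work as in other frameworks.

```python
import numpy as np

a = np.zeros((2, 4, 4), dtype="uint8")
b = np.zeros((3, 4, 4), dtype="uint8")

# Concatenate along axis 0: the batch sizes add up (2 + 3 = 5).
joined = np.concatenate([a, b], axis=0)

# Stack along a new axis 0: inputs must share the same shape.
stacked = np.stack([a, a], axis=0)

# Stacking arrays whose shapes differ raises ValueError in NumPy.
try:
    np.stack([a, b], axis=0)
    mismatch_allowed = True
except ValueError:
    mismatch_allowed = False
```

In this sketch `joined` has shape `(5, 4, 4)` while `stacked` has shape `(2, 2, 4, 4)`.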


class hub.dataset.Tensor(meta: dict, daskarray, delayed_objs: tuple = None)

Does lazy computation and converts the data to a NumPy array.

Returns: NumPy array of the tensor's data
Return type: np.ndarray
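The idea of lazy computation can be sketched in plain Python: work is described up front and only carried out when the result is requested. The class `LazyArray` and its `compute` method below are hypothetical stand-ins for illustration, not Hub's actual class or API.

```python
from typing import Callable, List


class LazyArray:
    """Hypothetical lazily evaluated container (not part of Hub)."""

    def __init__(self, build: Callable[[], List[int]]) -> None:
        self._build = build      # deferred recipe for producing the data
        self.evaluated = False   # flips to True once the data is materialized

    def compute(self) -> List[int]:
        """Run the deferred recipe and return the concrete data."""
        self.evaluated = True
        return self._build()


lazy = LazyArray(lambda: [0] * 4)
evaluated_before = lazy.evaluated   # nothing has been computed yet
data = lazy.compute()               # the data is materialized only here
```

Deferring the work this way lets a library combine or slice large arrays cheaply and load data only when concrete values are needed.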

property count

Returns: Number of elements on axis 0 if that number is known, -1 otherwise
Return type: int

property dtag

Returns: Information about the data stored in the tensor (image, mask, label, …)
Return type: str

property dtype

Returns: Tensor data type, equivalent to a NumPy dtype
Return type: str

property meta

Returns: Metadata dict of the tensor
Return type: dict

property ndim

Returns: Number of dimensions (len(shape))
Return type: int

property shape

Returns: Tensor shape
Return type: tuple
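The relationship between shape, ndim, and count described above can be sketched in plain Python. The helper `describe` is hypothetical, and representing an unknown axis-0 length as `None` is an assumption made for this illustration, not something the Hub documentation states.

```python
from typing import Optional, Tuple


def describe(shape: Tuple[Optional[int], ...]) -> Tuple[int, int]:
    """Return (ndim, count) for a shape tuple.

    Assumption: an unknown length on axis 0 is represented as None.
    """
    ndim = len(shape)  # ndim is defined as len(shape)
    # count is the axis-0 length, or -1 when it is not known
    count = shape[0] if shape and shape[0] is not None else -1
    return ndim, count


known = describe((10, 512, 512))
unknown = describe((None, 512, 512))
```

With these inputs, `known` is `(3, 10)` and `unknown` is `(3, -1)`.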