API Reference

Tensor

class hub.dataset.Tensor(meta: dict, daskarray, delayed_objs: tuple = None)
compute()

Performs the lazy computation and converts the data to a numpy array.

Returns

numpy array of the tensor's data

Return type

np.ndarray

property count

Returns

Number of elements along axis 0 if that number is known, -1 otherwise

Return type

int

property dtag

Returns

Information about the kind of data stored in the tensor (image, mask, label, …)

Return type

str

property dtype

Returns

Tensor data type, equivalent of a numpy dtype

Return type

str

property meta

Returns

Metadata dict of the tensor

Return type

dict

property ndim

Returns

Number of dimensions (len(shape))

Return type

int

property shape

Returns

Tensor shape

Return type

tuple
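
The properties above fit together simply: ndim is len(shape) and count is the size of axis 0 (or -1 when unknown). A minimal pure-Python sketch (a toy stand-in, not the real hub.dataset.Tensor implementation) illustrating these relationships:

```python
# Toy illustration of the Tensor property semantics documented above.
# Not the actual hub implementation -- names and behaviour are assumed
# from the descriptions in this reference.

class ToyTensor:
    def __init__(self, shape, dtype="uint8"):
        self._shape = tuple(shape)
        self._dtype = dtype

    @property
    def shape(self):
        return self._shape

    @property
    def ndim(self):
        # ndim is defined as len(shape)
        return len(self._shape)

    @property
    def count(self):
        # number of elements along axis 0, or -1 when unknown
        return self._shape[0] if self._shape else -1

    @property
    def dtype(self):
        return self._dtype


t = ToyTensor((100, 256, 256))
print(t.count, t.ndim, t.shape)  # 100 3 (100, 256, 256)
```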

Dataset

class hub.dataset.Dataset(tensors: Dict[str, hub.collections.tensor.core.Tensor], metainfo={})
property citation

Dataset citation

property count

Length of the dataset: the length of its tensors along axis 0 (all tensors must have the same length). Returns -1 if the length is unknown

delete(tag, creds=None, session_creds=True) → bool

Deletes the dataset identified by tag (filepath), using the given credentials (optional)

property description

Dataset description

property howtoload

Dataset howtoload

items()

Returns tensors

keys()

Returns names of tensors

property license

Dataset license

property meta

Dict mapping each tensor's name to its metadata; a tensor's metadata contains all the information required for tensor storage

store(tag, creds=None, session_creds=True) → hub.collections.dataset.core.Dataset

Stores the dataset under tag (filepath), using the given credentials (can be omitted)

to_pytorch(transform=None, max_text_len=30)

Transforms into pytorch dataset

Parameters
  • transform (func) – any transform that takes a sample dictionary as input and returns the transformed dictionary

  • max_text_len (integer) – the maximum length of text strings that will be stored. Strings longer than this are snipped
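
The max_text_len behaviour described above can be sketched as follows. This snip_text helper is hypothetical (it is not part of the hub API); it simply truncates any string values in a sample to the given length:

```python
# Hypothetical illustration of max_text_len: string fields longer than
# max_text_len are truncated ("snipped") before storage.
def snip_text(sample, max_text_len=30):
    out = {}
    for key, value in sample.items():
        if isinstance(value, str):
            out[key] = value[:max_text_len]  # snip long strings
        else:
            out[key] = value                 # non-string fields pass through
    return out


sample = {"caption": "x" * 50, "label": 3}
snipped = snip_text(sample, max_text_len=30)
print(len(snipped["caption"]))  # 30
```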

to_tensorflow(max_text_len=30)

Transforms into tensorflow dataset

Parameters

max_text_len (integer) – the maximum length of text strings that will be stored. Strings longer than this are snipped

values()

Returns tensors
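
Dataset behaves like a read-only mapping from tensor names to tensors, with count tied to axis 0 of every tensor. A toy sketch (assumed from the descriptions above, not the hub implementation) of that dict-like interface and the count convention:

```python
# Toy illustration of the Dataset dict-like interface documented above.
# Not the actual hub implementation.

class ToyDataset:
    def __init__(self, tensors):
        self._tensors = dict(tensors)

    def keys(self):
        # names of the tensors
        return self._tensors.keys()

    def items(self):
        return self._tensors.items()

    def values(self):
        return self._tensors.values()

    @property
    def count(self):
        # all tensors must agree along axis 0; -1 if they do not (unknown)
        counts = {len(v) for v in self._tensors.values()}
        return counts.pop() if len(counts) == 1 else -1


ds = ToyDataset({"image": [0] * 10, "label": [0] * 10})
print(list(ds.keys()))  # ['image', 'label']
print(ds.count)         # 10
```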

Transform

class hub.Transform
forward()

Takes an element of a list or a sample from the dataset and returns a sample of the dataset

Parameters

input – an element of list or dict of arrays

Returns

dict of numpy arrays

Return type

dict

Examples

>>> def forward(input):
>>>     ds = {}
>>>     ds["image"] = np.empty(1, object)
>>>     ds["image"][0] = np.zeros((256, 256))
>>>     return ds
meta()

Provides the metadata for all tensors, including shape, dtype, dtag, and chunksize for each array, in the form shown in the example below

Returns

dict of tensor metadata

Return type

dict

Examples

>>> def meta():
>>>     return {
>>>         ...
>>>         "tesnor_name":{
>>>             "shape": (1,256,256),
>>>             "dtype": "uint8",
>>>             "chunksize": 100,
>>>             "dtag": "segmentation"
>>>         }
>>>         ...
>>>     }

Load

class hub.load(tag, creds=None, session_creds=True)

Loads a dataset from the repository using the given tag and credentials (optional)

Init

class hub.init(token: str = '', cloud=False, n_workers=1, memory_limit=None, processes=False, threads_per_worker=1, distributed=True)

Initializes a cluster, either locally or on the cloud

Parameters
  • token (str) – token provided by snark

  • cache (float) – amount of local memory to use for caching, default 2e9 (2GB)

  • cloud (bool) – whether to run locally or on the cloud

  • n_workers (int) – number of concurrent workers, defaults to 1

  • threads_per_worker (int) – number of threads per worker