How to Contribute
Guidelines for open-source enthusiasts who want to contribute to our data format.
Clone the repository:
git clone https://github.com/activeloopai/deeplake
cd deeplake
If you are using Linux, install environment dependencies:
apt-get -y update
apt-get -y install git wget build-essential python-setuptools python3-dev libjpeg-dev libpng-dev zlib1g-dev
apt install build-essential
If you are planning to work on videos, install codecs:
apt-get install -y ffmpeg libavcodec-dev libavformat-dev libswscale-dev
Install the package locally with plugins and development dependencies:
pip install -r deeplake/requirements/plugins.txt
pip install -r deeplake/requirements/tests.txt
pip install -e .
Run the local tests to make sure everything works as expected:
pytest -x --local .
You can use docker-compose to run the tests:
docker-compose -f ./bin/docker-compose.yaml up --build local
You can also work inside the container by building the image and opening a bash shell in it:
docker build -t activeloop-deeplake:latest -f ./bin/Dockerfile.dev .
docker run -it -v $(pwd):/app activeloop-deeplake:latest bash
$ python3 -c "import deeplake"
Changes made to your local files will now be reflected directly in the package running inside the container.
Deep Lake uses the black Python code formatter. You can auto-format your code by installing black:
pip install black
and then running black . inside the directory you want to format.
Deep Lake uses static typing for function arguments and variables for better code readability. A GitHub action runs mypy . (in the same way you would run pytest .) to check for valid static typing. Refer to the mypy documentation for more information.
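For example, a fully annotated helper might look like the sketch below (the function, its name, and its parameters are hypothetical and only illustrate the annotation style that mypy checks):
from typing import List, Optional

def make_chunk_keys(prefix: str, count: int, suffix: Optional[str] = None) -> List[str]:
    # Hypothetical helper: every argument and the return value are annotated,
    # so `mypy .` can flag callers that pass or use the wrong types.
    keys = [f"{prefix}/{i}" for i in range(count)]
    if suffix is not None:
        keys = [f"{key}.{suffix}" for key in keys]
    return keys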
To see a list of Deep Lake's custom pytest options, run this command:
pytest -h | sed -En '/custom options:/,/\[pytest\] ini\-options/p'
The tests use the following storage fixtures:
- `memory_storage`: If `--memory-skip` is provided, tests with this fixture will be skipped. Otherwise, the test will run with only a `MemoryProvider`.
- `local_storage`: If `--local` is not provided, tests with this fixture will be skipped. Otherwise, the test will run with only a `LocalProvider`.
- `s3_storage`: If `--s3` is not provided, tests with this fixture will be skipped. Otherwise, the test will run with only an `S3Provider`.
- `storage`: All tests that use the `storage` fixture will be parametrized with the enabled `StorageProvider`s (enabled via the command-line options above). If `--cache-chains` is provided, `storage` may also be a cache chain. Cache chains have the same interface as a `StorageProvider`, but instead of a single provider they chain multiple providers in a sequence, where the last provider in the chain is considered the actual storage.
- `ds`: The same as the `storage` fixture, but the parametrized storages are wrapped with a `Dataset`.
Each `StorageProvider`/`Dataset` that is created for a test via a fixture will automatically have a root created before the test, and it will be destroyed after the test. If you want to keep this data after the test run, you can use the `--keep-storage` option.
Single storage provider fixture:
def test_memory(memory_storage):
    # Test will skip if `--memory-skip` is provided
    memory_storage["key"] = b"1234"  # This data will only be stored in memory

def test_local(local_storage):
    # Test will skip if `--local` is not provided
    local_storage["key"] = b"1234"  # This data will only be stored locally

def test_s3(s3_storage):
    # Test will skip if `--s3` is not provided
    # Test will fail if credentials are not provided
    s3_storage["key"] = b"1234"  # This data will only be stored in s3
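These storage fixtures act like key-to-bytes mappings, so a test can also read a value back after writing it. A minimal sketch (assuming the providers support item reads with the same mapping syntax used above):
def test_local_roundtrip(local_storage):
    # Test will skip if `--local` is not provided
    local_storage["key"] = b"1234"
    assert local_storage["key"] == b"1234"  # read the value back through the same provider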
Multiple storage providers/cache chains:
from deeplake.core.tests.common import parametrize_all_storages, parametrize_all_caches, parametrize_all_storages_and_caches

@parametrize_all_storages
def test_storage(storage):
    # Storage will be parametrized with all enabled `StorageProvider`s
    pass

@parametrize_all_caches
def test_caches(storage):
    # Storage will be parametrized with all common caches containing enabled `StorageProvider`s
    pass

@parametrize_all_storages_and_caches
def test_storages_and_caches(storage):
    # Storage will be parametrized with all enabled `StorageProvider`s and common caches containing enabled `StorageProvider`s
    pass
Dataset storage providers/cache chains:
from deeplake.core.tests.common import parametrize_all_dataset_storages, parametrize_all_dataset_storages_and_caches

@parametrize_all_dataset_storages
def test_dataset(ds):
    # `ds` will be parametrized with 1 `Dataset` object per enabled `StorageProvider`
    pass

@parametrize_all_dataset_storages_and_caches
def test_dataset_and_caches(ds):
    # `ds` will be parametrized with 1 `Dataset` object per enabled `StorageProvider` and all cache chains containing enabled `StorageProvider`s
    pass
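As a rough sketch of a real dataset test (assuming the public create_tensor/append/numpy dataset API; adapt the tensor names and shapes to what your test actually needs):
import numpy as np
from deeplake.core.tests.common import parametrize_all_dataset_storages

@parametrize_all_dataset_storages
def test_append_and_read(ds):
    # `ds` is a `Dataset` backed by each enabled `StorageProvider`
    ds.create_tensor("values")
    ds.values.append(np.zeros((4, 4)))  # append one sample
    assert len(ds.values) == 1
    assert ds.values[0].numpy().shape == (4, 4)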
Deep Lake would not be possible without the work of our community.
Activeloop Deep Lake open-source contributors