Data scientists spend the majority of their time building infrastructure, transferring data, and writing boilerplate code. Hub streamlines these tasks so that you can focus on building amazing ML models.
Hub lets you stream datasets of any size from the cloud to any machine, with performance comparable to reading from local storage.
Hub connects datasets to PyTorch and TensorFlow with minimal boilerplate code. It also contains powerful tools for version control, building ML pipelines, and running distributed workloads.
These features are possible because Hub stores data as compressed chunked arrays that can be stored anywhere and accessed through a simple and intuitive API.
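To make the idea of compressed chunked arrays concrete, here is a minimal conceptual sketch in plain Python. It is not Hub's implementation or API: the names `write_chunks`, `read_slice`, and `CHUNK_SIZE` are hypothetical, a `dict` stands in for any key-value storage backend (local disk, cloud bucket, and so on), and `zlib` stands in for whatever compression is actually used. It only illustrates why chunking lets a reader fetch and decompress just the pieces it needs.

```python
import zlib

# Hypothetical chunk size (elements per chunk); real systems use far larger chunks.
CHUNK_SIZE = 4


def write_chunks(values, store, chunk_size=CHUNK_SIZE):
    """Split small integers (0-255) into fixed-size chunks, compress each one,
    and save it in `store` under its chunk index."""
    for start in range(0, len(values), chunk_size):
        chunk = bytes(values[start:start + chunk_size])
        store[start // chunk_size] = zlib.compress(chunk)


def read_slice(store, start, stop, chunk_size=CHUNK_SIZE):
    """Return values[start:stop], decompressing only the chunks that overlap the range."""
    out = []
    if start >= stop:
        return out
    first = start // chunk_size
    last = (stop - 1) // chunk_size
    for idx in range(first, last + 1):
        chunk = zlib.decompress(store[idx])
        base = idx * chunk_size
        lo = max(start, base) - base
        hi = min(stop, base + len(chunk)) - base
        out.extend(chunk[lo:hi])
    return out


# A dict stands in for any storage backend that maps keys to bytes.
store = {}
data = list(range(10))
write_chunks(data, store)
result = read_slice(store, 3, 9)
print(result)
```

Because each chunk is compressed and stored independently, a reader only has to fetch the chunks covering the requested range, which is what makes streaming from remote storage practical.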
Check out Hub's GitHub repository and give us a ⭐ if you like the project!
Join Hub's Slack Community if you need help or have suggestions for improving the documentation!