Authentication
How to authenticate using Activeloop storage, AWS S3, and Google Cloud Storage.
Hub datasets can be stored on several cloud storage providers including Activeloop Storage, AWS S3, and Google Cloud Storage. In all cases, the datasets are accessed by choosing the correct prefix for the dataset path that is passed to methods such as hub.load(path), hub.dataset(path), hub.empty(path) and others. The path prefixes are:
Storage         Path
Activeloop      hub://workspace_name/dataset_name
AWS S3          s3://bucket_name/dataset_name
Google Cloud    gcs://bucket_name/dataset_name
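
The prefix convention above can be illustrated with a minimal sketch. The helper `storage_provider` and the `PREFIX_TO_PROVIDER` mapping are hypothetical names introduced here for illustration; they are not part of the Hub API and only inspect the path string.

```python
# Hypothetical helper, not part of the Hub API: it only inspects the path prefix.
PREFIX_TO_PROVIDER = {
    "hub://": "Activeloop",
    "s3://": "AWS S3",
    "gcs://": "Google Cloud",
}


def storage_provider(path: str) -> str:
    """Return the storage provider implied by a dataset path prefix."""
    for prefix, provider in PREFIX_TO_PROVIDER.items():
        if path.startswith(prefix):
            return provider
    raise ValueError(f"Unrecognized dataset path prefix: {path!r}")


print(storage_provider("hub://workspace_name/dataset_name"))  # Activeloop
print(storage_provider("s3://bucket_name/dataset_name"))      # AWS S3
```

A check like this can be useful before calling hub.load(path), for example to decide which kind of credentials to prepare.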

Authentication for each cloud storage provider:

Activeloop Storage

To authenticate with Activeloop storage, users must register with Activeloop and log in through the CLI using:
activeloop register

activeloop login -u username -p password
An Activeloop account can also be created on the Activeloop Platform (coming soon).

AWS S3

Authentication with AWS S3 has 4 options:
    1. Use Hub on a machine in the AWS ecosystem that has access to the relevant S3 bucket via AWS IAM, in which case there is no need to pass credentials to access datasets in that bucket.
    2. Configure AWS through the CLI using aws configure. This creates a credentials file on your machine that is automatically accessed by Hub during authentication.
    3. Create a dictionary with the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN (optional), and pass it to Hub using: hub.load('s3://...', creds = {'aws_access_key_id': 'abc', 'aws_secret_access_key': 'xyz', 'aws_session_token': '123'})
       Note: the dictionary keys must be lowercase!
    4. Save the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN (optional) in environment variables of the same name, which are loaded as default credentials if no other credentials are specified.
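
As a minimal sketch of how options 3 and 4 relate, the following builds the lowercase-keyed dictionary from the uppercase environment variable names. The function `aws_creds_from_env` is a hypothetical helper introduced for illustration, not part of Hub, and the example values are the placeholders used above.

```python
import os
from typing import Mapping, Optional


def aws_creds_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build a creds dictionary with lowercase keys from uppercase variables.

    Hypothetical helper, not part of the Hub API.
    """
    env = os.environ if env is None else env
    creds = {
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }
    # The session token is optional, so include it only when it is set.
    token = env.get("AWS_SESSION_TOKEN")
    if token:
        creds["aws_session_token"] = token
    return creds


# Placeholder values; real code would read from os.environ instead.
example_env = {"AWS_ACCESS_KEY_ID": "abc", "AWS_SECRET_ACCESS_KEY": "xyz"}
creds = aws_creds_from_env(example_env)
print(sorted(creds))

# With Hub installed, the dictionary could then be passed as in option 3:
# import hub
# ds = hub.load('s3://...', creds=creds)
```

Passing a mapping explicitly keeps the helper easy to test; calling it with no argument reads from the process environment.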

Google Cloud Storage

Authentication with Google Cloud Storage has 2 options:
    1. Create a service account, download the JSON file containing the keys, and then pass that file to the creds parameter in hub.load('gcs://.....', creds = 'path_to_keys.json'). It is also possible to manually pass the information from the JSON file into the creds parameter using: hub.load('gcs://.....', creds = {information from the JSON file})
    2. Authenticate through the browser using hub.load('gcs://.....', creds = 'browser'). This requires that the project credentials are stored on your machine, which happens after gcloud is initialized and logged in through the CLI.
       After this step, re-authentication through the browser can be skipped using: hub.load('gcs://.....', creds = 'cache')
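
For the dictionary variant of option 1, a minimal sketch of reading the service account key file into a Python dictionary might look as follows. The function `load_service_account_info` is a hypothetical helper introduced for illustration, not part of Hub, and the file name is the placeholder used above.

```python
import json


def load_service_account_info(json_path: str) -> dict:
    """Read a service account key file and return its contents as a dict.

    Hypothetical helper, not part of the Hub API.
    """
    with open(json_path, "r", encoding="utf-8") as f:
        info = json.load(f)
    if not isinstance(info, dict):
        raise ValueError("Expected the key file to contain a JSON object")
    return info


# Usage (requires Hub and a real key file, so it is left commented out):
# import hub
# creds = load_service_account_info('path_to_keys.json')
# ds = hub.load('gcs://.....', creds=creds)
```

Passing the file path directly, as in option 1, is simpler; loading the dictionary yourself may help when the key material comes from somewhere other than a local file.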