VCTK Dataset


Visualization of the VCTK dataset in the Deep Lake UI

MNIST dataset

What is VCTK Dataset?

The VCTK dataset contains speech from 109 native speakers of English with diverse accents. Each speaker reads about 400 sentences. Most are newspaper sentences; the rest are the Rainbow Passage and an elicitation paragraph designed to identify the speaker's accent. The Rainbow Passage and the elicitation paragraph are the same for all speakers. The newspaper texts were taken from The Herald (Glasgow) with permission from Herald & Times Group. Each speaker reads a different set of newspaper sentences, and each set was chosen with a greedy selection algorithm.

Download VCTK Dataset in Python

Instead of downloading the VCTK dataset manually, you can load it in Python with the open-source Deep Lake package in a single line of code.

Load VCTK Dataset Training Subset in Python

import deeplake
ds = deeplake.load("hub://activeloop/vctk")

VCTK Dataset Structure

VCTK Data Fields

audios: tensor containing the audio clip in WAV format.
texts: tensor containing the text transcript of the audio.
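To get a feel for the two fields, the sketch below pairs each transcript with its clip duration for the first few samples. It is a minimal sketch under stated assumptions: `summarize_samples` and `VCTK_SAMPLE_RATE` are hypothetical names, the 48 kHz rate is VCTK's published rate and may not match the copy stored in Deep Lake, and it assumes `ds.audios[i].numpy()` returns the waveform with time on axis 0 and `ds.texts[i].numpy()` returns the transcript, possibly wrapped in an array.

```python
import numpy as np

# Assumption: VCTK's published 48 kHz sampling rate; verify against the
# dataset you actually load, since a hosted copy may have been resampled.
VCTK_SAMPLE_RATE = 48_000


def summarize_samples(ds, n=3, sample_rate=VCTK_SAMPLE_RATE):
    """Return (transcript, duration_in_seconds) for the first n samples of ds.

    Hypothetical helper. Assumes ds.audios[i].numpy() yields a waveform with
    time on axis 0 and ds.texts[i].numpy() yields the transcript, either as a
    plain string or wrapped in an array.
    """
    summary = []
    for i in range(min(n, len(ds))):
        audio = ds.audios[i].numpy()
        # Unwrap a possible array around the string and take its first element.
        text = np.asarray(ds.texts[i].numpy()).reshape(-1)[0]
        summary.append((str(text), audio.shape[0] / sample_rate))
    return summary
```

With the dataset loaded as `ds = deeplake.load("hub://activeloop/vctk")`, something like `for text, seconds in summarize_samples(ds): print(f"{seconds:.2f}s {text}")` would print a short preview, provided the assumptions above hold for your Deep Lake version.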

VCTK Data Splits

The VCTK dataset training set is composed of 16262 samples.
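As a quick check on epoch size, assuming the 16262 figure counts individual samples and the final partial batch is kept, the PyTorch loader example on this page (batch_size=4) yields:

```latex
\left\lceil \frac{16262}{4} \right\rceil = \lceil 4065.5 \rceil = 4066 \text{ batches per epoch},
\qquad 16262 - 4 \times 4065 = 2 \text{ samples in the last batch.}
```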

How to use VCTK Dataset with PyTorch and TensorFlow in Python

Train a model on the VCTK dataset with PyTorch in Python

Use Deep Lake's built-in one-line PyTorch data loader to connect the data to the compute:

dataloader = ds.pytorch(num_workers=0, batch_size=4, shuffle=False)
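VCTK clips have different lengths, so batching more than one clip generally requires padding them to a common length. A minimal sketch of a padding collate function, assuming mono audio and that each sample reaches the collate step as a dict keyed by tensor name (`audios`, `texts`); `pad_audio_batch` is a hypothetical helper, not part of Deep Lake.

```python
import numpy as np


def pad_audio_batch(samples, audio_key="audios", text_key="texts"):
    """Collate variable-length mono clips into one zero-padded batch.

    Hypothetical helper. Assumes each sample is a dict whose `audio_key` entry
    is a mono waveform of shape (length,) or (length, 1) and whose `text_key`
    entry is the transcript.
    """
    # Flatten (length, 1) arrays to (length,); only valid for mono audio.
    clips = [np.asarray(s[audio_key]).reshape(-1) for s in samples]
    lengths = np.array([clip.shape[0] for clip in clips], dtype=np.int64)
    # Zeros act as silence after each clip ends.
    batch = np.zeros((len(clips), int(lengths.max())), dtype=clips[0].dtype)
    for row, clip in enumerate(clips):
        batch[row, : clip.shape[0]] = clip
    return {
        audio_key: batch,
        "lengths": lengths,  # true lengths, e.g. for masking padded frames
        text_key: [s[text_key] for s in samples],
    }
```

Assuming your Deep Lake version's `pytorch()` method accepts a `collate_fn` argument, you could pass it as `ds.pytorch(num_workers=0, batch_size=4, shuffle=False, collate_fn=pad_audio_batch)` and convert the padded array with `torch.from_numpy(batch["audios"])` inside the training loop. Returning the true lengths alongside the batch lets a model ignore the padded region.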

Train a model on the VCTK dataset with TensorFlow in Python

dataloader = ds.tensorflow()
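With either loader, padding wastes compute when short and long clips land in the same batch. One common mitigation, length bucketing, groups clips of similar length together. The sketch below is framework-agnostic and assumes you have already collected each clip's length in samples (for example with one pass over `ds.audios`); `length_bucketed_batches` is a hypothetical helper, and how you feed the resulting index lists to Deep Lake depends on your version.

```python
import random


def length_bucketed_batches(lengths, batch_size, seed=0):
    """Group sample indices into batches of similar audio length.

    Hypothetical helper. `lengths[i]` is the number of audio samples in clip i.
    Sorting by length before chunking keeps clips within a batch close in
    size, so less padding is needed; shuffling the batch order keeps each
    epoch from always running shortest-to-longest.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    random.Random(seed).shuffle(batches)
    return batches
```

The trade-off is that batches are no longer random samples of the whole dataset, which is usually acceptable for speech models because the batch order is still shuffled every epoch (pass a different `seed` per epoch).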

Additional Information about VCTK Dataset

VCTK Dataset Description

Homepage: https://datashare.ed.ac.uk/handle/10283/2950
Paper: Yamagishi, Junichi; Veaux, Christophe; MacDonald, Kirsten. CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit (version 0.92). University of Edinburgh, 2019.
Point of Contact: N/A

VCTK Dataset Curators

Yamagishi, Junichi and Veaux, Christophe and MacDonald, Kirsten

VCTK Dataset Licensing Information

Deep Lake users may have access to a variety of publicly available datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have a license to use the datasets. It is your responsibility to determine whether you have permission to use the datasets under their license.

If you’re a dataset owner and do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thank you for your contribution to the ML community!

VCTK Dataset Citation Information

@misc{yamagishi2019vctk,
  author={Yamagishi, Junichi and Veaux, Christophe and MacDonald, Kirsten},
  title={ {CSTR VCTK Corpus}: English Multi-speaker Corpus for {CSTR} Voice Cloning Toolkit (version 0.92)},
  publisher={University of Edinburgh. The Centre for Speech Technology Research (CSTR)},
  year=2019,
  doi={10.7488/ds/2645},
}

VCTK Dataset FAQs

What is the VCTK dataset for Python?

The VCTK dataset is an English multi-speaker speech (audio) dataset. It was created to build HMM-based text-to-speech synthesis systems, in particular speaker-adaptive HMM-based speech synthesis that uses average voice models trained on multiple speakers together with speaker adaptation techniques.

How to download the VCTK dataset in Python?

You can load the VCTK dataset with one line of code using Activeloop's open-source Deep Lake package for Python. See the "Load VCTK Dataset Training Subset in Python" section for details.

How can I use the VCTK dataset in PyTorch or TensorFlow?

You can stream the VCTK dataset while training a model in PyTorch or TensorFlow with one line of code using Activeloop's open-source Deep Lake package for Python. See the sections "Train a model on the VCTK dataset with PyTorch in Python" and "Train a model on the VCTK dataset with TensorFlow in Python" for details.
