The Free Spoken Digit Dataset, also known as the Spoken MNIST dataset (after MNIST, the Modified National Institute of Standards and Technology database), contains recordings of spoken digits stored as WAV files sampled at 8 kHz. The recordings have been trimmed at the beginning and end so that they contain almost no silence. The dataset holds 50 recordings of each of the 10 digits from each of 6 speakers, for a total of 3,000 recordings, all pronounced in English.
Download Spoken MNIST Dataset in Python
Instead of downloading the Spoken MNIST dataset manually, you can load it in Python with a single line of code using our open-source package Hub.
Load Spoken MNIST Dataset in Python
import hub
ds = hub.load("hub://activeloop/spoken_mnist")
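As a quick sanity check after loading, the number of samples can be compared with the total stated in the description. The sketch below assumes the object returned by hub.load supports len(). The constants and function names are illustrative and not part of Hub, and the hosted copy may not match the documented total exactly.

```python
# Counts taken from the dataset description above.
RECORDINGS_PER_DIGIT = 50
DIGITS = 10
SPEAKERS = 6


def expected_size():
    """Total number of recordings implied by the documented counts."""
    return RECORDINGS_PER_DIGIT * DIGITS * SPEAKERS


def matches_documented_size(ds):
    """Return True if len(ds) equals the documented total.

    `ds` is assumed to be the dataset returned by
    hub.load("hub://activeloop/spoken_mnist"); any object that
    supports len() works here.
    """
    return len(ds) == expected_size()
```

With the dataset loaded as ds, matches_documented_size(ds) returning False would suggest that the hosted version differs from the description, not necessarily that the load failed.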
Spoken MNIST Dataset Structure
Spoken MNIST Data Fields
audio: tensor containing the audio recordings
labels: tensor containing the digit label for each audio recording
spectrograms: tensor containing the spectrogram of each recording
speakers: tensor containing the speaker ID for each audio recording
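The fields listed above can be read by name, one sample at a time. The following is a minimal sketch, assuming the loaded dataset allows tensors to be looked up by name, indexed by sample, and converted with .numpy(). The helper describe_sample is hypothetical and not part of Hub, and the shapes it reports depend on the hosted data.

```python
# Tensor names as listed in the data fields above.
FIELDS = ("audio", "labels", "spectrograms", "speakers")


def describe_sample(ds, index=0):
    """Return the array shape of every documented field for one sample.

    `ds` is assumed to be the dataset returned by
    hub.load("hub://activeloop/spoken_mnist").
    """
    shapes = {}
    for name in FIELDS:
        # Look up the tensor by name, pick one sample, fetch it as an array.
        sample = ds[name][index].numpy()
        shapes[name] = sample.shape
    return shapes
```

Calling describe_sample(ds) on the loaded dataset returns a dictionary keyed by field name, which is a convenient way to confirm that all four tensors are present before building a training pipeline.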
How to use Spoken MNIST Dataset with PyTorch and TensorFlow in Python
Train a model on Spoken MNIST dataset with PyTorch in Python