GTZAN Music Speech Dataset

Visualisation of the GTZAN Music Speech dataset in the Deep Lake UI

What is the GTZAN Music Speech Dataset?

The GTZAN Music Speech dataset was created for music/speech discrimination and is similar to the GTZAN Genre dataset. It consists of 120 tracks, each containing 30 seconds of audio, with 60 samples per class (music and speech). All tracks are 22,050 Hz, mono, 16-bit audio files in .wav format.
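To make these figures concrete, the short calculation below derives the per-clip sample count and an approximate uncompressed size from the numbers quoted above. It is only a back-of-the-envelope sketch: it ignores .wav file headers, and the constant names are illustrative rather than part of any library.

```python
# Figures quoted in the dataset description.
SAMPLE_RATE_HZ = 22_050   # samples per second
CLIP_SECONDS = 30         # length of each track
BYTES_PER_SAMPLE = 2      # 16-bit audio
CHANNELS = 1              # mono
NUM_TRACKS = 120          # track count from the description

# Number of audio samples in one 30-second clip.
samples_per_clip = SAMPLE_RATE_HZ * CLIP_SECONDS * CHANNELS

# Raw PCM payload of one clip, excluding the .wav header.
bytes_per_clip = samples_per_clip * BYTES_PER_SAMPLE

# Totals across the whole collection.
total_minutes = NUM_TRACKS * CLIP_SECONDS / 60
total_megabytes = NUM_TRACKS * bytes_per_clip / 1e6

print(f"samples per clip: {samples_per_clip}")
print(f"bytes per clip:   {bytes_per_clip}")
print(f"total audio:      {total_minutes:.0f} minutes")
print(f"total raw size:   {total_megabytes:.2f} MB")
```

Under these assumptions the collection holds about an hour of audio, which is small enough to load into memory on most machines.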

Download GTZAN Music Speech Dataset in Python

Instead of downloading the GTZAN Music Speech dataset manually, you can load it in Python with just one line of code using Deep Lake, our open-source package.

Load GTZAN Music Speech Dataset Training Subset in Python

				
import deeplake

# Load the dataset from the Activeloop hub
ds = deeplake.load("hub://activeloop/gtzan-music-speech")

GTZAN Music Speech Dataset Structure

GTZAN Music Speech Data Fields

audio: tensor containing the audio file in .wav format.
label: tensor identifying each .wav file as music or speech.
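As a rough illustration of how these two fields might be read, the helper below pulls one sample out of a loaded dataset. It assumes the dataset can be indexed by the tensor names "audio" and "label" listed above and that each indexed tensor offers a .numpy() method; read_sample is a hypothetical name introduced here, not part of Deep Lake.

```python
def read_sample(ds, index=0):
    """Return (waveform, label) for one sample.

    Assumes `ds` exposes tensors named "audio" and "label" and that an
    indexed tensor can be converted with `.numpy()`.
    """
    waveform = ds["audio"][index].numpy()
    label = ds["label"][index].numpy()
    return waveform, label


# Example usage (requires the deeplake package and network access):
#   import deeplake
#   ds = deeplake.load("hub://activeloop/gtzan-music-speech")
#   waveform, label = read_sample(ds)
#   print(waveform.shape, label)
```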

GTZAN Music Speech Data Splits

The GTZAN Music Speech dataset training set is composed of 128 audio files, slightly more than the 120 tracks cited in the collection's description.
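Because only a training set is listed, you may want to hold out part of it for validation. The sketch below shows one way to do that with a stratified split over sample indices, using only the standard library. The function name stratified_split, the 20% validation fraction, and the evenly balanced placeholder labels are assumptions made for illustration; in practice the labels would come from the dataset's label tensor.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Split sample indices into train/validation, preserving class balance."""
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train_idx, val_idx = [], []
    for lab in sorted(by_class):
        idxs = by_class[lab]
        rng.shuffle(idxs)
        n_val = round(len(idxs) * val_fraction)
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Placeholder labels: 128 files, assumed here to be evenly balanced.
labels = ["music"] * 64 + ["speech"] * 64
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))
```

The resulting index lists can then be used to select the corresponding samples for training and validation.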

How to use GTZAN Music Speech Dataset with PyTorch and TensorFlow in Python

Train a model on the GTZAN Music Speech dataset with PyTorch in Python

Let’s use Deep Lake’s built-in one-line PyTorch dataloader to connect the data to the compute:

				
dataloader = ds.pytorch(num_workers=0, batch_size=4, shuffle=False)
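Once the dataloader exists, its batches can be consumed in an ordinary training loop. The sketch below is deliberately framework-agnostic so that it runs without PyTorch installed. It assumes each batch is dict-like and keyed by the tensor names "audio" and "label"; run_epoch and train_step are hypothetical names, and a real train_step would run the forward pass, the backward pass, and the optimizer update before returning the loss.

```python
def run_epoch(dataloader, train_step):
    """Feed every batch to `train_step` and return the mean loss.

    Assumes each batch supports lookup by the keys "audio" and "label".
    """
    total_loss, num_batches = 0.0, 0
    for batch in dataloader:
        audio, label = batch["audio"], batch["label"]
        # `train_step` is user-supplied and returns the loss for this batch.
        total_loss += float(train_step(audio, label))
        num_batches += 1
    # Avoid dividing by zero when the dataloader yields nothing.
    return total_loss / num_batches if num_batches else 0.0
```

Keeping the per-batch logic in a separate train_step function makes it straightforward to reuse the same loop for validation by passing a step that only computes the loss.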

Train a model on the GTZAN Music Speech dataset with TensorFlow in Python

				
dataloader = ds.tensorflow()

Additional Information about GTZAN Music Speech Dataset

GTZAN Music Speech Dataset Description

Homepage: http://marsyas.info/index.html
Paper: Tzanetakis, George: GTZAN Music/Speech Collection
Point of Contact: [email protected]

GTZAN Music Speech Dataset Curators

Tzanetakis, George

GTZAN Music Speech Dataset Licensing Information

Deep Lake users may have access to a variety of publicly available datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have a license to use the datasets. It is your responsibility to determine whether you have permission to use the datasets under their license.

If you’re a dataset owner and do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thank you for your contribution to the ML community!

GTZAN Music Speech Dataset Citation Information

				
@ONLINE{MusicSpeech,
    author = "Tzanetakis, George",
    title  = "GTZAN Music/Speech Collection",
    year   = "1999",
    url    = "http://marsyas.info/index.html"
}

GTZAN Music Speech Dataset FAQs

What is the GTZAN Music Speech dataset for Python?

The GTZAN Music Speech dataset consists of 120 tracks, each containing 30 seconds of audio. The dataset was created for music/speech discrimination and is similar to the GTZAN Genre dataset. The tracks in the dataset are 22050Hz Mono 16-bit audio files and each class (music/speech) has 60 samples.

What is the GTZAN Music Speech dataset used for?

The GTZAN Music Speech dataset is commonly used for music/speech classification in machine learning.

How to download the GTZAN Music Speech dataset in Python?

You can load the GTZAN Music Speech dataset quickly with one line of code using the open-source package Activeloop Deep Lake in Python. See the detailed instructions on how to load the GTZAN Music Speech dataset training subset in Python.

How can I use the GTZAN Music Speech dataset in PyTorch or TensorFlow?

You can stream the GTZAN Music Speech dataset while training a model in PyTorch or TensorFlow with one line of code using the open-source package Activeloop Deep Lake in Python. See detailed instructions on how to train a model on the GTZAN Music Speech dataset with PyTorch in Python or train a model on the GTZAN Music Speech dataset with TensorFlow in Python.