NIH Chest X-ray Dataset
Load the NIH Chest X-ray dataset in Python with one line of code in seconds, and plug it into TensorFlow and PyTorch with Activeloop Hub.
Visualization of the NIH Chest X-ray Dataset on the Activeloop Platform

NIH Chest X-ray Dataset (ChestX-ray14)

What is the NIH Chest X-ray Dataset?

The NIH ChestX-ray (ChestX-ray14) dataset contains 112,120 X-ray images from 30,805 unique patients, labeled with fourteen thoracic disease categories. The labels were text-mined from the associated radiology reports using natural language processing (NLP). Releasing the dataset publicly lets researchers train algorithms to detect and diagnose disease, with the aim of helping physicians make better diagnostic decisions.
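
A single image can carry more than one of the fourteen categories, so the findings are usually treated as a multi-label target rather than a single class. The sketch below turns a list of finding names into a 0/1 vector per image. The class names are the fourteen ChestX-ray14 categories, but their order here and the helper name multi_hot are assumptions for illustration; check how the Hub findings tensor indexes its classes before relying on this.

```python
# The fourteen ChestX-ray14 disease categories. The ORDER is an assumption
# and may not match the class indices used by the Hub `findings` tensor.
CLASS_NAMES = [
    "Atelectasis", "Cardiomegaly", "Effusion", "Infiltration", "Mass",
    "Nodule", "Pneumonia", "Pneumothorax", "Consolidation", "Edema",
    "Emphysema", "Fibrosis", "Pleural_Thickening", "Hernia",
]
CLASS_TO_INDEX = {name: i for i, name in enumerate(CLASS_NAMES)}


def multi_hot(findings: list[str]) -> list[int]:
    """Encode an image's finding names as a 14-element 0/1 vector (hypothetical helper).

    An empty list (an image with no finding) becomes all zeros.
    Unknown names raise KeyError so label mismatches are not silently ignored.
    """
    vector = [0] * len(CLASS_NAMES)
    for name in findings:
        vector[CLASS_TO_INDEX[name]] = 1
    return vector


print(multi_hot(["Cardiomegaly", "Effusion"]))
# → [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

A vector like this pairs naturally with a per-class sigmoid output and a binary cross-entropy loss, since each disease is predicted independently.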

Download NIH Chest X-ray Dataset in Python

Instead of downloading the NIH ChestX-ray dataset, you can load it in Python with a single line of code using Hub, our open-source package.

Load NIH Chest X-ray Dataset Training Subset in Python

import hub
ds = hub.load('hub://activeloop/nih-chest-xray-train')

Load NIH Chest X-ray Dataset Testing Subset in Python

import hub
ds = hub.load('hub://activeloop/nih-chest-xray-test')

NIH Chest X-ray Dataset Structure

NIH ChestX-ray Data Fields

  • images: tensor containing the X-ray images, each of size 1024x1024.
  • findings: tensor containing the labels that represent the disease categories for the image.
  • boxes/bbox: tensor containing the bounding box coordinates.
  • boxes/finding: tensor containing the disease label for each bounding box.
  • metadata/patient_id: tensor containing the patient ID.
  • metadata/patient_age: tensor containing the patient's age.
  • metadata/patient_gender: tensor containing the patient's gender.
  • metadata/follow_up_num: tensor containing the patient's follow-up number.
  • metadata/view_position: tensor containing the X-ray view position.
  • metadata/orig_img_w: tensor containing the original image width.
  • metadata/orig_img_h: tensor containing the original image height.
  • metadata/orig_img_pix_spacing_x: tensor containing the original image pixel spacing along the x-axis.
  • metadata/orig_img_pix_spacing_y: tensor containing the original image pixel spacing along the y-axis.
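
The metadata fields listed above can be combined to recover physical sizes. A minimal sketch, assuming (none of this is stated above, so verify it against the actual tensors) that pixel spacing is given in millimetres per pixel of the original image, that each original image was rescaled independently along each axis to 1024x1024, and that a boxes/bbox entry is (x, y, width, height) in the stored 1024x1024 pixel coordinates. ImageMeta, resized_spacing and bbox_size_mm are hypothetical helpers.

```python
from dataclasses import dataclass

# Stored image size, per the `images` field description.
STORED_SIZE = 1024


@dataclass
class ImageMeta:
    """Hypothetical container for one sample's metadata/* values."""
    orig_img_w: int
    orig_img_h: int
    orig_img_pix_spacing_x: float  # assumed: mm per pixel in the original image
    orig_img_pix_spacing_y: float  # assumed: mm per pixel in the original image


def resized_spacing(meta: ImageMeta) -> tuple[float, float]:
    """Effective (x, y) mm per pixel of the stored 1024x1024 image.

    Assumes each axis was rescaled independently, so the physical extent
    is unchanged: orig_w * spacing_x == STORED_SIZE * new_spacing_x.
    """
    sx = meta.orig_img_w * meta.orig_img_pix_spacing_x / STORED_SIZE
    sy = meta.orig_img_h * meta.orig_img_pix_spacing_y / STORED_SIZE
    return sx, sy


def bbox_size_mm(bbox: tuple[float, float, float, float],
                 meta: ImageMeta) -> tuple[float, float]:
    """Width and height in mm of a box assumed to be (x, y, w, h) in stored pixels."""
    _, _, w, h = bbox
    sx, sy = resized_spacing(meta)
    return w * sx, h * sy


if __name__ == "__main__":
    meta = ImageMeta(orig_img_w=2048, orig_img_h=2048,
                     orig_img_pix_spacing_x=0.1, orig_img_pix_spacing_y=0.1)
    print(bbox_size_mm((100, 200, 100, 50), meta))
```

If the boxes turn out to be stored in original-image coordinates instead, multiply by orig_img_pix_spacing_x and orig_img_pix_spacing_y directly and skip the rescaling step.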

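NIH Chest X-ray Data Splits

The dataset is provided as separate training and testing subsets, available at hub://activeloop/nih-chest-xray-train and hub://activeloop/nih-chest-xray-test.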
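
With 112,120 images from 30,805 patients (roughly 3.6 images per patient on average), a random image-level split can put scans of the same person on both sides and overstate test performance. If you carve out your own split, for example a validation set from the training subset, a minimal sketch of a patient-level split follows. It assumes the metadata/patient_id values have already been read into a Python list; patient_level_split and its salted SHA-256 bucketing are illustrative choices, not how the provided subsets were built.

```python
import hashlib
from collections import defaultdict


def patient_level_split(patient_ids, test_fraction=0.2, salt="nih-split"):
    """Split sample indices so that no patient appears in both subsets.

    patient_ids: one ID per image (e.g. values read from metadata/patient_id).
    Returns (train_indices, test_indices), each sorted.
    """
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be in [0, 1]")

    # Group image indices by patient.
    images_by_patient = defaultdict(list)
    for index, pid in enumerate(patient_ids):
        images_by_patient[pid].append(index)

    train, test = [], []
    for pid, indices in images_by_patient.items():
        # Hash the salted patient ID so the assignment is deterministic across runs.
        digest = hashlib.sha256(f"{salt}:{pid}".encode("utf-8")).digest()
        # Keep 53 bits so the value is an exact float in [0, 1).
        bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
        # Every image of this patient goes to the same side.
        (test if bucket < test_fraction else train).extend(indices)

    return sorted(train), sorted(test)
```

Hashing the ID instead of shuffling means a patient keeps the same assignment even if the images are reordered or new images are added; changing salt gives a different but equally reproducible split.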

How to use NIH Chest X-ray Dataset with PyTorch and TensorFlow in Python

Train a model on NIH Chest X-ray dataset with PyTorch in Python

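Use Hub's built-in PyTorch dataloader to connect the dataset to your training code with one line, here with a batch size of 4, no shuffling, and no extra worker processes:
# ds is the dataset returned by hub.load() above
dataloader = ds.pytorch(num_workers=0, batch_size=4, shuffle=False)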

Train a model on NIH Chest X-ray dataset with TensorFlow in Python

Likewise, ds.tensorflow() converts the dataset loaded above into a TensorFlow tf.data.Dataset:
dataloader = ds.tensorflow()

Additional Information about NIH Chest X-ray Dataset

NIH Chest X-ray Dataset Description

NIH Chest X-ray Dataset Curators

Ronald Summers

NIH Chest X-ray Dataset Licensing Information

Hub users may have access to a variety of publicly available datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have a license to use the datasets. It is your responsibility to determine whether you have permission to use the datasets under their license.
If you're a dataset owner and do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thank you for your contribution to the ML community!

NIH Chest X-ray Dataset Citation Information

Summers, R. M. NIH Chest X-ray Dataset of 14 Common Thorax Disease Categories.