Training Models Using MMDetection
How to Train Deep Learning models using Deep Lake's integration with MMDetection
This tutorial assumes the reader has experience training models using MMDET and has installed it successfully.
Deep Lake works with mmcv-full<=1.7.1 and mmdet<=2.28.2
Deep Lake offers an integration with MMDetection, a popular open-source object detection toolbox based on PyTorch. The integration enables users to train models while streaming Deep Lake datasets using the transformation, training, and evaluation tools built by MMDet.
Integration Interface
Training using MMDET is typically executed using wrapper scripts like the training script provided in the MMDetection repo. In the example below, we write a similar wrapper script for training using a Deep Lake dataset.
The integration with MMDET occurs in the deeplake.integrations.mmdet module. At a high level, Deep Lake is responsible for the PyTorch dataloader that streams data to the training framework, while MMDET is used for the training, transformation, and evaluation logic.
In the example script below, the user should apply the build_detector and train_detector functions provided by Deep Lake. The build_detector is mostly boilerplate, and the Deep Lake-related features primarily exist in train_detector.
Inputs to train_detector
Inputs to the Deep Lake train_detector are a modified MMDET config file, optional dataset objects (see below), and flags specifying whether to perform distributed training and validation.
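A minimal wrapper script built around these inputs might look like the following sketch. The config path and flag values are placeholders, and the script assumes mmcv-full<=1.7.1, mmdet<=2.28.2, and deeplake are installed; it launches a full training run, so it is shown here for illustration rather than as a runnable snippet.

```python
# Sketch of a training wrapper using Deep Lake's MMDET integration.
from mmcv import Config

from deeplake.integrations import mmdet as mmdet_deeplake

# Load a standard MMDET config extended with the Deep Lake inputs
# described below; the filename is a placeholder.
cfg = Config.fromfile("./deeplake_coco_config.py")

# build_detector is mostly boilerplate around MMDET's model builder.
model = mmdet_deeplake.build_detector(cfg.model)

# train_detector streams the Deep Lake dataset specified in the config.
mmdet_deeplake.train_detector(
    model,
    cfg,
    distributed=False,  # set True for multi-GPU training
    validate=True,      # run evaluation on the validation split
)
```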
Modifications to the cfg file
The Deep Lake train_detector takes in a standard MMDET config file, but it also expects the inputs highlighted in the ----Deep Lake Inputs---- section of the config file below:
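A sketch of that section of the config is shown below. The dataset path, token, and tensor names ("images", "boxes", "categories") are examples and should be replaced with the values from your own Deep Lake dataset; the rest of the config file (model, schedules, pipelines) stays standard MMDET.

```python
# ----Deep Lake Inputs---- section of an otherwise standard MMDET config.
data = dict(
    train=dict(
        pipeline=train_pipeline,  # standard MMDET transform pipeline
        # Deep Lake dataset to stream during training (example path)
        deeplake_path="hub://activeloop/coco-train",
        # credentials are only needed for private datasets
        deeplake_credentials={"token": None},
        # map the keys MMDET expects to tensors in the Deep Lake dataset
        deeplake_tensors={
            "img": "images",
            "gt_bboxes": "boxes",
            "gt_labels": "categories",
        },
        # parameters forwarded to the Deep Lake dataloader
        deeplake_dataloader={"shuffle": True, "batch_size": 4, "num_workers": 8},
    ),
)
```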
Passing Deep Lake dataset objects to the train_detector (Optional)
The Deep Lake dataset object or dataset view can be passed to the train_detector directly, thus overriding any dataset information in the config file. Below are the respective modifications that should be made to the training script above:
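The modification can be sketched as follows. The dataset paths are illustrative, cfg refers to the config loaded in the training script, and the ds_train/ds_val keyword names are an assumption about the integration's signature; check the deeplake.integrations.mmdet reference for your installed version.

```python
import deeplake

from deeplake.integrations import mmdet as mmdet_deeplake

# Load datasets (or dataset views) directly; paths are examples.
ds_train = deeplake.load("hub://activeloop/coco-train")
ds_val = deeplake.load("hub://activeloop/coco-val")

model = mmdet_deeplake.build_detector(cfg.model)

# Passing the dataset objects overrides any deeplake_path in the config.
mmdet_deeplake.train_detector(
    model,
    cfg,
    ds_train=ds_train,  # assumed keyword name
    ds_val=ds_val,      # assumed keyword name
    distributed=False,
    validate=True,
)
```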
Congrats! You're now able to train models using MMDET while streaming Deep Lake Datasets! 🎉
What is MMDetection?
MMDetection is a powerful open-source object detection toolbox that provides a flexible and extensible platform for computer vision tasks. Developed by the Multimedia Laboratory (MMLab) as part of the OpenMMLab project, MMDetection is built upon the PyTorch framework and offers a composable and modular API design. This unique feature enables developers to easily construct custom object detection and segmentation pipelines. This article will delve deeper into how to use MMDetection with Activeloop Deep Lake.
MMDetection Features
MMDetection's Modular and Composable API Design
MMDetection's API design follows a modular approach, enabling seamless integration with frameworks like Deep Lake and easy component customization. This flexibility allows users to adapt the object detection pipeline to meet specific project requirements.
Custom Object Detection and Segmentation Pipelines
MMDetection streamlines custom pipeline creation, allowing users to construct tailored models by selecting and combining different backbones, necks, and heads for more accurate and efficient computer vision pipelines.
Comprehensive Training & Inference Support
MMDetection's toolbox supports various data augmentation techniques, distributed training, mixed-precision training, and detailed evaluation metrics to help users assess their model's performance and identify areas for improvement.
Extensive Model Zoo & Configurations
MMDetection offers a vast model zoo with numerous pre-trained models and configuration files for diverse computer vision tasks, such as object detection, instance segmentation, and panoptic segmentation.
Primary Components of MMDetection
MMDetection Backbone
Backbones are pre-trained convolutional neural networks (CNNs) that extract feature maps from input images. Popular backbones include ResNet, VGG, and MobileNet.
MMDetection Head
Heads generate the final task-specific predictions, such as bounding boxes, class labels, or masks. Examples include RPN (Region Proposal Network), FCOS (Fully Convolutional One-Stage Object Detector), and Mask Head.
MMDetection Neck
Necks, such as FPN (Feature Pyramid Network) and PAN (Path Aggregation Network), refine and consolidate the features extracted by backbones, connecting them to the head.
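These components are composed declaratively in an MMDET model config. The fragment below sketches a Faster R-CNN-style layout with a ResNet backbone, FPN neck, and RPN/RoI heads; the values shown are typical defaults, and several required sub-fields are omitted for brevity, so this is an illustration of the composition pattern rather than a complete config.

```python
# Illustrative MMDET model config composing backbone, neck, and heads.
model = dict(
    type="FasterRCNN",
    backbone=dict(
        type="ResNet",
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),  # which stages feed the neck
    ),
    neck=dict(
        type="FPN",
        in_channels=[256, 512, 1024, 2048],  # ResNet-50 stage channels
        out_channels=256,
        num_outs=5,
    ),
    rpn_head=dict(type="RPNHead", in_channels=256, feat_channels=256),
    roi_head=dict(type="StandardRoIHead"),  # bbox head etc. omitted
)
```

Swapping the backbone, neck, or head is a matter of editing the corresponding dict, which is what makes custom pipelines straightforward to assemble.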
MMDetection ROI Extractor
The Region of Interest (RoI) Extractor is a critical MMDetection component that extracts RoI features from the feature maps generated by the backbone and neck, improving the accuracy of final predictions (e.g., bounding boxes and class labels). One of the most popular methods for RoI feature extraction is RoI Align, a technique that addresses the misalignment between RoI features and the input image caused by quantization in RoI Pooling.
MMDetection Loss
The loss component calculates loss values during training, estimating the difference between model predictions and ground truth labels. Users can choose suitable loss functions (e.g., Focal Loss, GHMLoss, L1 Loss) for specific use cases to evaluate and improve the model's performance.
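To illustrate how a loss component scores predictions, here is a minimal NumPy sketch of binary Focal Loss, one of the losses named above. The function name and default hyperparameters are our own for illustration, not MMDetection's API; MMDET provides its own configurable FocalLoss module.

```python
import numpy as np

def focal_loss(probs, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights easy, well-classified examples.

    probs: predicted probabilities for the positive class, shape (N,)
    targets: ground-truth labels in {0, 1}, shape (N,)
    """
    probs = np.clip(probs, 1e-7, 1 - 1e-7)
    # p_t is the probability the model assigned to the true class
    p_t = np.where(targets == 1, probs, 1 - probs)
    # alpha_t balances positive vs. negative examples
    alpha_t = np.where(targets == 1, alpha, 1 - alpha)
    # (1 - p_t)^gamma shrinks the loss for confident correct predictions
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)
```

With gamma=0 and alpha=0.5 this reduces to half the standard binary cross-entropy; raising gamma focuses training on hard examples, which is why Focal Loss is popular for class-imbalanced detection.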