Detection and Segmentation

This database collects the major papers that turned image classification backbones into systems that localize, segment, and interactively select objects.

Object Detection

Year	Paper	Topic	Note
2013	Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation	R-CNN	CNN features plus region proposals.
2014	Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition	SPPnet	Pools variable-size regions into fixed-length features with spatial pyramid pooling.
2015	Fast R-CNN	Detection pipeline	Shares one convolutional feature map across all region proposals via RoI pooling.
2015	Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks	Region proposals	Region Proposal Network inside the detector.
2015	You Only Look Once: Unified, Real-Time Object Detection	YOLO	Single-pass real-time detection.
2015	SSD: Single Shot MultiBox Detector	SSD	Dense multi-scale one-stage detection.
2016	Feature Pyramid Networks for Object Detection	FPN	Multi-scale feature pyramids for detection.
2017	Focal Loss for Dense Object Detection	RetinaNet	Addresses foreground-background imbalance in one-stage detection.
2020	End-to-End Object Detection with Transformers	DETR	Detection as set prediction with Transformers.
2020	Deformable DETR: Deformable Transformers for End-to-End Object Detection	Efficient DETR	Sparse deformable attention for faster convergence and multi-scale features.
2022	DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection	DETR training	Strong DETR variant with denoising and anchor refinement.
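
The RetinaNet entry above says focal loss addresses foreground-background imbalance. A minimal sketch of the binary form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), may make that concrete. The function names are illustrative only, and the defaults gamma=2 and alpha=0.25 are assumed here as the commonly cited settings; this is not the reference implementation.

```python
import math


def cross_entropy(p, y):
    """Binary cross-entropy for one prediction.

    p: predicted foreground probability in (0, 1)
    y: 1 for foreground, 0 for background
    """
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one prediction (illustrative sketch).

    The factor (1 - p_t) ** gamma shrinks the loss of well-classified
    examples, so the many easy background anchors contribute little.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # Easy background anchor (confidently predicted as background)
    # versus a hard one (wrongly predicted as foreground).
    for p in (0.01, 0.9):
        print(p, cross_entropy(p, 0), focal_loss(p, 0))
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is a quick sanity check on any implementation.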

Segmentation

Year	Paper	Topic	Note
2014	Fully Convolutional Networks for Semantic Segmentation	FCN	Converts classification CNNs into dense predictors.
2015	U-Net: Convolutional Networks for Biomedical Image Segmentation	Biomedical segmentation	Encoder-decoder with skip connections for precise localization.
2016	DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs	DeepLab	Atrous convolution and CRF refinement.
2017	Mask R-CNN	Instance segmentation	Adds mask prediction to Faster R-CNN.
2018	Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation	DeepLabv3+	Encoder-decoder refinement with atrous separable convolution.
2021	Masked-attention Mask Transformer for Universal Image Segmentation	Mask2Former	Unified semantic, instance, and panoptic segmentation.
2023	Segment Anything	SAM	Promptable segmentation foundation model.
2024	SAM 2: Segment Anything in Images and Videos	SAM 2	Extends promptable segmentation to video with streaming memory.
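
The DeepLab entries above rely on atrous (dilated) convolution, which spaces the kernel taps apart to enlarge the receptive field without adding weights. The following is a minimal 1D sketch under simplifying assumptions (no padding, stride 1, a single channel); the function name `atrous_conv1d` is hypothetical and not taken from any library.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """Valid-mode 1D atrous correlation (illustrative sketch).

    Kernel taps are placed `rate` samples apart, so a kernel of length k
    covers (k - 1) * rate + 1 input samples while keeping k weights.
    """
    span = (len(kernel) - 1) * rate + 1  # input samples covered by one output
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + k * rate] for k, w in enumerate(kernel)))
    return out


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5, 6, 7]
    kernel = [1, 0, -1]
    print(atrous_conv1d(signal, kernel, rate=1))  # each output spans 3 samples
    print(atrous_conv1d(signal, kernel, rate=2))  # same 3 weights, spans 5 samples
```

Raising `rate` widens the context each output sees at the same parameter count, which is the property the DeepLab line of work exploits for dense prediction.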

Reading Path

Step	Read
1	R-CNN, Fast R-CNN, and Faster R-CNN for the region-based lineage.
2	YOLO, SSD, and RetinaNet for one-stage detection.
3	FCN, U-Net, DeepLab, and Mask R-CNN for dense prediction.
4	DETR, Deformable DETR, and DINO for Transformer-based detection.
5	Mask2Former, SAM, and SAM 2 for modern unified and promptable segmentation.