DETR-based (DEtection TRansformer) algorithms are a family of object detection algorithms that use transformers to identify and locate objects in images. Their fundamental concepts were introduced in the seminal 2020 paper by Carion et al.[1][2]

DETR-based algorithms treat object detection as a set prediction problem: each of a fixed number of learned queries yields one candidate object, and only a subset of these candidates corresponds to actual objects in the image. A fundamental training component is a loss based on bipartite (one-to-one) matching between predicted and ground-truth objects.
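The matching step can be illustrated with a minimal sketch. It assumes a simplified pairwise cost (negative class probability plus a weighted L1 box distance) and finds the best assignment by brute force, which is only practical for a handful of objects; the original DETR paper uses the Hungarian algorithm for this step. The function names, the `box_weight` value, and the toy data are hypothetical and chosen only for illustration.

```python
from itertools import permutations

def l1_box_cost(pred_box, gt_box):
    """L1 distance between two boxes given as (cx, cy, w, h)."""
    return sum(abs(p - g) for p, g in zip(pred_box, gt_box))

def match_cost(pred, gt, box_weight=5.0):
    """Cost of assigning one prediction to one ground-truth object.

    Hypothetical simplified cost: negative predicted probability of the
    ground-truth class plus a weighted L1 box distance.
    """
    class_cost = -pred["probs"][gt["label"]]
    return class_cost + box_weight * l1_box_cost(pred["box"], gt["box"])

def bipartite_match(preds, gts):
    """Return (pred_index, gt_index) pairs with minimal total cost.

    Brute force over all one-to-one assignments, so only viable for
    tiny inputs. Assumes len(preds) >= len(gts).
    """
    best_pairs, best_total = [], float("inf")
    for pred_indices in permutations(range(len(preds)), len(gts)):
        total = sum(match_cost(preds[p], gts[g])
                    for g, p in enumerate(pred_indices))
        if total < best_total:
            best_total = total
            best_pairs = [(p, g) for g, p in enumerate(pred_indices)]
    return sorted(best_pairs)

# Three queries, two ground-truth objects; class 0 = "cat", class 1 = "dog".
preds = [
    {"probs": [0.1, 0.8], "box": (0.70, 0.70, 0.20, 0.20)},
    {"probs": [0.3, 0.3], "box": (0.50, 0.50, 0.90, 0.90)},
    {"probs": [0.9, 0.05], "box": (0.25, 0.25, 0.30, 0.30)},
]
gts = [
    {"label": 0, "box": (0.24, 0.26, 0.30, 0.28)},
    {"label": 1, "box": (0.70, 0.72, 0.22, 0.20)},
]

matches = bipartite_match(preds, gts)
matched_preds = {p for p, _ in matches}
# Queries left without a partner are the ones trained to predict "no object".
unmatched = [i for i in range(len(preds)) if i not in matched_preds]
print(matches, unmatched)
```

In this toy example each ground-truth object is paired with exactly one query, and the remaining query is left unmatched. Because the assignment is one-to-one, two queries cannot both be credited with the same object.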

References

  1. Carion, Nicolas; Massa, Francisco; Synnaeve, Gabriel; Usunier, Nicolas; Kirillov, Alexander; Zagoruyko, Sergey (2020-05-28), End-to-End Object Detection with Transformers, arXiv, doi:10.48550/arXiv.2005.12872, arXiv:2005.12872, retrieved 2026-04-23
  2. Carion, Nicolas; Massa, Francisco; Synnaeve, Gabriel; Usunier, Nicolas; Kirillov, Alexander; Zagoruyko, Sergey (2020). Vedaldi, Andrea; Bischof, Horst; Brox, Thomas; Frahm, Jan-Michael (eds.). "End-to-End Object Detection with Transformers". Computer Vision – ECCV 2020. Cham: Springer International Publishing: 213–229. doi:10.1007/978-3-030-58452-8_13. ISBN 978-3-030-58452-8.