sing-group/deep-learning-colonoscopy

Deep Learning for Polyp Detection and Classification in Colonoscopy

This repository was created from the following review paper: A. Nogueira-Rodríguez; R. Domínguez-Carbajales; H. López-Fernández; Á. Iglesias; J. Cubiella; F. Fdez-Riverola; M. Reboiro-Jato; D. Glez-Peña (2020) Deep Neural Networks approaches for detecting and classifying colorectal polyps. Neurocomputing.

Please, cite it if you find it useful for your research.

AI4PolypNet

As part of AI4PolypNet, we are involved in a challenge that will be held at iSMIT (September 2024). This edition focuses only on colonoscopy images and, apart from classical polyp detection and segmentation, presents an extended version of polyp classification that includes the challenging sessile serrated adenoma class. All the information is available here.

About this repository

This repository collects the most relevant studies applying Deep Learning for Polyp Detection and Classification in Colonoscopy from a technical point of view, focusing on the low-level details for the implementation of the DL models. First, each study is assigned to one of three categories: (i) polyp detection and localization (through bounding boxes or binary masks, i.e. segmentation), (ii) polyp classification, and (iii) simultaneous polyp detection and classification (i.e. studies that use a single model such as YOLO or SSD to perform polyp detection and classification at the same time). Second, a summary of the public datasets available as well as the private datasets used in the studies is provided. The third section focuses on technical aspects such as the Deep Learning architectures, the data augmentation techniques, and the libraries and frameworks used. Finally, the fourth section summarizes the performance metrics reported by each study.
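The two localization formats mentioned above are related: a binary mask can be reduced to a bounding box, which is useful when a dataset annotated with masks is used with a bounding-box detector. The following is a minimal sketch of that conversion, not code from any of the listed studies. The function name `mask_to_bbox` and the nested-list mask representation are assumptions made for illustration, and the sketch returns a single box enclosing every foreground pixel, so a frame with several polyps would need a connected-component step first.

```python
def mask_to_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) enclosing all non-zero pixels.

    `mask` is a list of rows, each row a list of 0/1 values (hypothetical
    representation chosen for this sketch). Returns None for an empty mask.
    """
    # Row indices that contain at least one foreground pixel.
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    # Column indices of every foreground pixel.
    cols = [x for row in mask for x, value in enumerate(row) if value]
    return min(cols), min(rows), max(cols), max(rows)


# Toy 6 x 8 mask with a rectangular "polyp" on rows 2-3, columns 3-5.
example_mask = [[0] * 8 for _ in range(6)]
for y in range(2, 4):
    for x in range(3, 6):
        example_mask[y][x] = 1

print(mask_to_bbox(example_mask))  # (3, 2, 5, 3)
```

The coordinates are inclusive pixel indices; formats that expect width and height can derive them as `x_max - x_min + 1` and `y_max - y_min + 1`.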

Suggestions are welcome, please check the contribution guidelines before submitting a pull request.

Research

Polyp Detection and Localization

Study Date Endoscopy type Imaging technology Localization type Multiple polyp Real time
Tajbakhsh et al. 2014, Tajbakhsh et al. 2015 Sept. 2014 / Apr. 2015 Conventional N/A Bounding box No Yes
Zhu R. et al. 2015 Oct. 2015 Conventional N/A Bounding box (16x16 patches) Yes No
Park and Sargent 2016 March 2016 Conventional NBI, WL Bounding box No No
Yu et al. 2017 Jan. 2017 Conventional NBI, WL Bounding box No No
Zhang R. et al. 2017 Jan. 2017 Conventional NBI, WL No No No
Yuan and Meng 2017 Feb. 2017 WCE N/A No No No
Brandao et al. 2018 Feb. 2018 Conventional/WCE N/A Binary mask Yes No
Zhang R. et al. 2018 May 2018 Conventional WL Bounding box No No
Misawa et al. 2018 June 2018 Conventional WL No Yes No
Zheng Y. et al. 2018 July 2018 Conventional NBI, WL Bounding box Yes Yes
Shin Y. et al. 2018 July 2018 Conventional WL Bounding box Yes No
Urban et al. 2018 Sep. 2018 Conventional NBI, WL Bounding box No Yes
Mohammed et al. 2018, GitHub Sep. 2018 Conventional WL Binary mask Yes Yes
Wang et al. 2018, Wang et al. 2018 Oct. 2018 Conventional N/A Binary mask Yes Yes
Qadir et al. 2019 Apr. 2019 Conventional NBI, WL Bounding box Yes No
Blanes-Vidal et al. 2019 March 2019 WCE N/A Bounding box Yes No
Zhang X. et al. 2019 March 2019 Conventional N/A Bounding box Yes Yes
Misawa et al. 2019 June 2019 Conventional N/A No Yes No
Zhu X. et al. 2019 June 2019 Conventional N/A No No Yes
Ahmad et al. 2019 June 2019 Conventional WL Bounding box Yes Yes
Sornapudi et al. 2019 June 2019 Conventional/WCE N/A Binary mask Yes No
Wittenberg et al. 2019 Sept. 2019 Conventional WL Binary mask Yes No
Yuan Y. et al. 2019 Sept. 2019 WCE N/A No No No
Ma Y. et al. 2019 Oct. 2019 Conventional N/A Bounding box Yes No
Tashk et al. 2019 Dec. 2019 Conventional N/A Binary mask No No
Jia X. et al. 2020 Jan. 2020 Conventional N/A Binary mask Yes No
Ma Y. et al. 2020 May 2020 Conventional N/A Bounding box Yes No
Young Lee J. et al. 2020 May 2020 Conventional N/A Bounding box Yes Yes
Wang W. et al. 2020 July 2020 Conventional WL No No No
Li T. et al. 2020 Oct. 2020 Conventional N/A No No No
Sánchez-Peralta et al. 2020 Nov. 2020 Conventional NBI, WL Binary mask No No
Podlasek J. et al. 2020 Dec. 2020 Conventional N/A Bounding box No Yes
Qadir et al. 2021 Feb. 2021 Conventional WL Bounding box Yes Yes
Xu J. et al. 2021 Feb. 2021 Conventional WL Bounding box Yes Yes
Misawa et al. 2021 Apr. 2021 Conventional WL No Yes Yes
Livovsky et al. 2021 June 2021 Conventional N/A Bounding box Yes Yes
Pacal et al. 2021 July 2021 Conventional WL Bounding box Yes Yes
Liu et al. 2021 July 2021 Conventional N/A Bounding box Yes Yes
Nogueira-Rodríguez et al. 2021 Aug. 2021 Conventional NBI, WL Bounding box Yes Yes
Yoshida et al. 2021 Aug. 2021 Conventional WL, LCI Bounding box Yes Yes
Ma Y. et al. 2021 Sep. 2021 Conventional WL Bounding box Yes No
Pacal et al. 2022 Nov. 2021 Conventional WL Bounding box Yes Yes
Nogueira-Rodríguez et al. 2022 April 2022 Conventional NBI, WL Bounding box Yes Yes
Nogueira-Rodríguez et al. 2023 March 2023 Conventional NBI, WL Bounding box Yes Yes

Polyp Classification

Study Date Endoscopy type Imaging technology Classes Real time
Ribeiro et al. 2016 Oct. 2016 Conventional WL Neoplastic vs. Non-neoplastic No
Zhang R. et al. 2017 Jan. 2017 Conventional NBI, WL Adenoma vs. hyperplastic
Resectable vs. non-resectable
Adenoma vs. hyperplastic vs. serrated
No
Byrne et al. 2017 Oct. 2017 Conventional NBI Adenoma vs. hyperplastic Yes
Komeda et al. 2017 Dec. 2017 Conventional NBI, WL, Chromoendoscopy Adenoma vs. non-adenoma No
Chen et al. 2018 Feb. 2018 Conventional NBI Neoplastic vs. hyperplastic No
Lui et al. 2019 Apr. 2019 Conventional NBI, WL Endoscopically curable lesions vs. endoscopically incurable lesions No
Kandel et al. 2019 June 2019 Conventional N/A Adenoma vs. hyperplastic vs. serrated (sessile serrated adenoma/traditional serrated adenoma) No
Zachariah et al. 2019 Oct. 2019 Conventional NBI, WL Adenoma vs. serrated Yes
Bour et al. 2019 Dec. 2019 Conventional N/A Paris classification: not dangerous (types Ip, Is, IIa, and IIb) vs. dangerous (type IIc) vs. cancer (type III) No
Patino-Barrientos et al. 2020 Jan. 2020 Conventional WL Kudo's classification: malignant (types I, II, III, and IV) vs. non-malignant (type V) No
Cheng Tao Pu et al. 2020 Feb. 2020 Conventional NBI, BLI Modified Sano's (MS) classification: MS I (Hyperplastic) vs. MS II (Low-grade tubular adenomas) vs. MS IIo (Nondysplastic or low-grade sessile serrated adenoma/polyp [SSA/P]) vs. MS IIIa (Tubulovillous adenomas or villous adenomas or any high-grade colorectal lesion) vs. MS IIIb (Invasive colorectal cancers) Yes
Young Joo Yang et al. 2020 May 2020 Conventional WL 7-class: CRC T1 vs. CRC T2 vs. CRC T3 vs. CRC T4 vs. high-grade dysplasia (HGD) vs. tubular adenoma with or without low grade dysplasia (TA) vs. non-neoplastic lesions

4-class: advanced CRC (T2, T3, and T4) vs. early CRC/HGD (CRC T1 and HGD) vs. TA vs. non-neoplastic lesions

Advanced colorectal lesions (HGD and T1, T2, T3, and T4 lesions) vs. non-advanced colorectal lesions (TA and non-neoplastic lesions)

Neoplastic lesions (TA, HGD, and stages T1, T2, T3, and T4) vs. non-neoplastic lesions
No
Yoshida et al. 2021 Aug. 2021 Conventional WL, LCI Neoplastic vs. hyperplastic Yes

Simultaneous Polyp Detection and Classification

Study Date Endoscopy type Imaging technology Localization type Multiple polyp Classes Real time
Tian Y. et al. 2019¹ Apr. 2019 Conventional N/A Bounding box Yes Modified Sano's (MS) classification: MS I (Hyperplastic) vs. MS II (Low-grade tubular adenomas) vs. MS IIo (Nondysplastic or low-grade sessile serrated adenoma/polyp [SSA/P]) vs. MS IIIa (Tubulovillous adenomas or villous adenomas or any high-grade colorectal lesion) vs. MS IIIb (Invasive colorectal cancers) No
Liu X. et al. 2019 Oct. 2019 Conventional WL Bounding box Yes Polyp vs. adenoma No
Ozawa. et al. 2020² Feb. 2020 Conventional NBI, WL Bounding box Yes Adenoma vs. hyperplastic vs. sessile serrated adenoma/polyp (SSAP) vs. cancer vs. other types (Peutz-Jeghers, juvenile, or inflammation polyps) Yes
Li K. et al. 2021³ Aug. 2021 Conventional N/A Bounding box Yes Adenoma vs. hyperplastic Yes
  1. The work of Tian Y. et al. 2019 is based on a single model (RetinaNet) that performs simultaneous polyp detection and classification. However, the paper only reports detection results using the ETIS-Larib dataset, and therefore these results are included in the Polyp Detection and Localization section.
  2. The work of Ozawa. et al. 2020 is based on a single model (Single Shot MultiBox Detector, SSD) that performs simultaneous polyp detection and classification. Nevertheless, since the detection and classification results are reported independently, they are included in the sections Polyp Detection and Localization and Polyp Classification, respectively.
  3. The work of Li K. et al. 2021 is based on several single models that perform simultaneous polyp detection and classification. As they report different types of results (frame-based polyp localization, polyp-based classification, and simultaneous frame-based polyp detection and classification), they are included in the three results sections.

Datasets

Public Datasets

Dataset References Description Format Resolution (w x h) Ground truth Used in
CVC-ClinicDB Bernal et al. 2015
https://polyp.grand-challenge.org/CVCClinicDB/
612 sequential WL images with polyps extracted from 31 sequences (23 patients) with 31 different polyps. Image 384 × 288 Polyp locations (binary mask) Brandao et al. 2018, Zheng Y. et al. 2018, Shin Y. et al. 2018, Wang et al. 2018, Qadir et al. 2019, Sornapudi et al. 2019, Wittenberg et al. 2019, Jia X. et al. 2020, Ma Y. et al. 2020, Young Lee J. et al. 2020, Podlasek J. et al. 2020, Qadir et al. 2021, Xu J. et al. 2021, Pacal et al. 2021, Liu et al. 2021, Nogueira-Rodríguez et al. 2022
CVC-ColonDB Bernal et al. 2012
Vázquez et al. 2017
300 sequential WL images with polyps extracted from 13 sequences (13 patients). Image 574 × 500 Polyp locations (binary mask) Tajbakhsh et al. 2015, Brandao et al. 2018, Zheng Y. et al. 2018, Sornapudi et al. 2019, Jia X. et al. 2020, Podlasek J. et al. 2020, Qadir et al. 2021, Xu J. et al. 2021, Pacal et al. 2021, Li K. et al. 2021, Nogueira-Rodríguez et al. 2022
CVC-EndoSceneStill Vázquez et al. 2017 912 WL images with polyps extracted from 44 videos (CVC-ClinicDB + CVC-ColonDB). Image 574 × 500, 384 × 288 Locations for polyp, background, lumen and specular lights (binary mask) Sánchez-Peralta et al. 2020
CVC-PolypHD Bernal et al. 2012
Vázquez et al. 2017
Bernal et al. 2021
https://giana.grand-challenge.org
56 WL images. Image 1920 × 1080 Polyp locations (binary mask) Sornapudi et al. 2019, Nogueira-Rodríguez et al. 2022
ETIS-Larib Silva et al. 2014
https://polyp.grand-challenge.org/ETISLarib/
196 WL images with polyps extracted from 34 sequences with 44 different polyps. Image 1225 × 966 Polyp locations (binary mask) Brandao et al. 2018, Zheng Y. et al. 2018, Shin Y. et al. 2018, Tian Y. et al. 2019, Ahmad et al. 2019, Sornapudi et al. 2019, Wittenberg et al. 2019, Jia X. et al. 2020, Podlasek J. et al. 2020, Qadir et al. 2021, Xu J. et al. 2021, Pacal et al. 2021, Liu et al. 2021, Pacal et al. 2022, Nogueira-Rodríguez et al. 2022
Kvasir-SEG / HyperKvasir Pogorelov et al. 2017
Jha et al. 2020
Borgli et al. 2020
https://datasets.simula.no/kvasir-seg
https://datasets.simula.no/hyper-kvasir/
1 000 polyp images Image Various resolutions Polyp locations (binary mask and bounding box) Sánchez-Peralta et al. 2020, Podlasek J. et al. 2020, Nogueira-Rodríguez et al. 2022
ASU-Mayo Clinic Colonoscopy Video Tajbakhsh et al. 2016
https://polyp.grand-challenge.org/AsuMayo/
38 small SD and HD video sequences: 20 training videos annotated with ground truth and 18 testing videos without ground truth annotations. WL and NBI. Video 688 × 550 Polyp locations (binary mask) Yu et al. 2017, Brandao et al. 2018, Zhang R. et al. 2018, Ahmad et al. 2019, Sornapudi et al. 2019, Wittenberg et al. 2019, Mohammed et al. 2018, Li K. et al. 2021
CVC-ClinicVideoDB Angermann et al. 2017
Bernal et al. 2018
Bernal et al. 2021
https://giana.grand-challenge.org
38 short and long sequences: 18 SD videos for training. Video 768 × 576 Polyp locations (binary mask) Shin Y. et al. 2018, Qadir et al. 2019, Ma Y. et al. 2020, Xu J. et al. 2021, Nogueira-Rodríguez et al. 2022
Colonoscopic Dataset Mesejo et al. 2016
http://www.depeca.uah.es/colonoscopy_dataset/
76 short videos (both NBI and WL). Video 768 × 576 Polyp classification (Hyperplastic vs. adenoma vs. serrated) Zhang R. et al. 2017, Li K. et al. 2021
PICCOLO Sánchez-Peralta et al. 2020
https://www.biobancovasco.org/en/Sample-and-data-catalog/Databases/PD178-PICCOLO-EN.html
3 433 images (2 131 WL and 1 302 NBI) from 76 lesions from 40 patients. Image 854 × 480, 1920 × 1080 Polyp locations (binary mask)
Polyp classification, including: Paris and NICE classifications, Adenocarcinoma vs. Adenoma vs. Hyperplastic, and histological stratification
Sánchez-Peralta et al. 2020, Pacal et al. 2022, Nogueira-Rodríguez et al. 2022
LDPolypVideo Ma Y. et al. 2021
https://github.com/dashishi/LDPolypVideo-Benchmark
160 videos (40 187 frames: 33 876 polyp images and 6 311 non-polyp images) with 200 labeled polyps.
103 videos (861 400 frames: 371 400 polyp images and 490 000 non-polyp images) without full annotations.
Video 768 × 576 (videos), 560 × 480 (images) Polyp locations (bounding box) Ma Y. et al. 2021, Nogueira-Rodríguez et al. 2022
KUMC dataset Li K. et al. 2021
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/FCBUOR
80 colonoscopy video sequences. It also aggregates the CVC-ColonDB, ASU-Mayo Clinic Colonoscopy Video, and Colonoscopic Dataset datasets. Image Various resolutions Polyp locations (bounding box)
Polyp classification: Adenoma vs. Hyperplastic
Li K. et al. 2021, Nogueira-Rodríguez et al. 2022
CP-CHILD-A, CP-CHILD-B Wang W. et al. 2020
https://figshare.com/articles/dataset/CP-CHILD_zip/12554042
CP-CHILD-A contains 1 000 polyp images and 7 000 non-polyp images.
CP-CHILD-B contains 400 polyp images and 1 100 normal or other pathological images.
Image 256 × 256 Polyp detection: polyp vs. non-polyp annotations Wang W. et al. 2020
SUN Misawa et al. 2021
http://amed8k.sundatabase.org/
49 136 images with polyps from 100 different polyps. 109 554 non-polyp images from 13 video sequences. Image N/A Polyp locations (bounding box) Misawa et al. 2021, Pacal et al. 2022, Nogueira-Rodríguez et al. 2022
Colorectal Polyp Image Cohort (PIBAdb) https://www.iisgaliciasur.es/home/biobanco/colorectal-polyp-image-cohort-pibadb/?lang=en ~31 400 polyp images (~22 600 WL and ~8 800 NBI) from 1 176 different polyps.
~17 300 non-polyp images (including ~2 800 normal-mucosa images and ~500 clean-mucosa images)
Video and image 768 × 576 Polyp locations (bounding box)
Polyp classification: Adenoma vs. Hyperplastic vs. Sessile Serrated Adenoma vs. Traditional Serrated Adenoma vs. Non Epithelial Neoplastic vs. Invasive
Nogueira-Rodríguez et al. 2022, Nogueira-Rodríguez et al. 2023
POLAR (POLyp Artificial Recognition) database https://clinicaltrials.gov/study/NCT03822390
https://www.amc.nl/web/polar-database.htm
Training dataset: 2 637 non-magnified NBI images from 1 339 unique polyps detected during 555 different colonoscopies.
Validation dataset: 730 polyps from 251 patients, prospectively collected by 20 endoscopists from 8 hospitals.
Image N/A Polyp locations (bounding box)
Polyp classification: Adenomas vs. Hyperplastic vs. Sessile Serrated Adenoma
NBIPolyp-UCdb Figueiredo et al. 2019
https://www.mat.uc.pt/~isabelf/Polyp-UCdb/NBIPolyp-UCdb.html
86 NBI images from 11 colonoscopy videos. Image 576 × 720 pixels Polyp locations (binary mask)
WLPolyp-UCdb Figueiredo et al. 2019
Figueiredo et al. 2020
https://www.mat.uc.pt/~isabelf/Polyp-UCdb/WLPolyp-UCdb.html
1 680 polyp images from 42 different polyps (40 images/polyp).
1 360 normal colonic mucosa images.
Image 576 × 720 pixels No ground truth provided.
PolypGen Ali et al. 2023
https://github.com/DebeshJha/PolypGen
https://www.synapse.org/#!Synapse:syn26376615/wiki/613312
1 537 polyp images, 2 225 positive video sequences, and 4 275 negative frames. Video and image N/A Polyp locations (binary mask).

Private Datasets

Study Patients No. Images No. Videos No. Unique Polyps Purpose Comments
Tajbakhsh et al. 2015 N/A 35 000
With polyps: 7 000
Without polyps: 28 000
40 short videos (20 positive and 20 negative) N/A Polyp localization -
Zhu R. et al. 2015 N/A 180 - N/A Polyp localization -
Park and Sargent 2016 N/A 652
With polyps: 92
35 (20’ to 40’) N/A Polyp localization -
Ribeiro et al. 2016 66 to 86 85 to 126 - N/A Polyp classification (neoplastic vs non-neoplastic) 8 datasets by combining: (i) with or without staining mucosa, (ii) 4 acquisition modes (without CVC, i-Scan1, i-Scan2, i-Scan3).
Zhang R. et al. 2017, Zheng Y. et al. 2018 N/A 1 930
Without polyps: 1 104
Hyperplastic: 263
Adenomatous: 563
- 215 polyps (65 hyperplastic and 150 adenomatous) Polyp classification (hyperplastic vs. adenomatous) PWH Database.
Images taken under either WL or NBI endoscopy.
Yuan and Meng 2017 35 4 000
Normal WCE images: 3 000 (1 000 bubbles, 1 000 turbid, and 1 000 clear)
Polyp images: 1 000
- N/A Polyp detection -
Byrne et al. 2017 N/A N/A 388 N/A Polyp classification (hyperplastic vs. adenomatous)
Komeda et al. 2017 N/A 1 800
Adenomatous: 1 200
Non-adenomatous: 600
- N/A Polyp classification (adenomatous vs. non-adenomatous) -
Chen et al. 2018 N/A 2 441
Training:
- Neoplastic: 1 476
- Hyperplastic: 681
Testing:
- Neoplastic: 188
- Hyperplastic: 96
- N/A Polyp classification (hyperplastic vs. neoplastic) -
Misawa et al. 2018 73 N/A 546 (155 positive and 391 negative) 155 Polyp detection -
Urban et al. 2018 > 2000 8 641 - 4 088 Polyp localization Used as training dataset.
Urban et al. 2018 N/A 1 330
With polyps: 672
Without polyps: 658
- 672 Polyp localization Used as independent dataset for testing.
Urban et al. 2018 9 44 947
With polyps: 13 292
Without polyps: 31 655
9 45 Polyp localization Used as independent dataset for testing.
Urban et al. 2018 11 N/A 11 73 Polyp localization Used as independent dataset for testing with “deliberately more challenging colonoscopy videos”.
Wang et al. 2018 1 290 5 545
With polyps: 3 634
Without polyps: 1 911
- N/A Polyp localization Used as training dataset.
Wang et al. 2018 1 138 27 113
With polyps: 5 541
Without polyps: 21 572
- 1 495 Polyp localization Used as testing dataset.
Wang et al. 2018 110 - 138 138 Polyp localization Used as testing dataset.
Wang et al. 2018 54 - 54 0 Polyp localization Used as testing dataset.
Lui et al. 2019 N/A 8 000
Curable lesions: 4 000
Incurable lesions: 4 000
- Curable lesions: 159
Incurable lesions: 493
Polyp classification (endoscopically curable vs. incurable lesions) Used as training dataset.
This study is focused on larger endoscopic lesions with risk of submucosal invasion and lymphovascular permeation.
Lui et al. 2019 N/A 567 - Curable: 56
Incurable: 20
Polyp classification (endoscopically curable vs. incurable lesions) Used as testing dataset.
This study is focused on larger endoscopic lesions with risk of submucosal invasion and lymphovascular permeation.
Tian Y. et al. 2019 218 871
MS I: 102
MS II: 346
MS IIo: 281
MS IIIa: 79
MS IIIb: 63
- N/A Polyp classification (5 classes) -
Blanes-Vidal et al. 2019 255 11 300
With polyps: 4 800
Without polyps: 6 500
N/A 331 polyps (OC) and 375 (CCE) Polyp localization CCE: Colorectal capsule endoscopy.
OC: conventional optical colonoscopy.
Zhang X. et al. 2019 215 404 - N/A Polyp localization -
Misawa et al. 2019 N/A 3 017 088 - 930 Polyp detection Used as training set.
Misawa et al. 2019 64 (47 with polyps and 17 without polyps) N/A N/A 87 Polyp detection Used as testing set.
Kandel et al. 2019 552 N/A - 963 Polyp classification (hyperplastic, serrated adenomas (sessile/traditional), adenomas)
Zachariah et al. 2019 N/A 5 278
Adenoma: 3 310
Serrated: 1 968
- 5 278 Polyp classification (adenoma vs. serrated) Used as training set.
Zachariah et al. 2019 N/A 634 - N/A Polyp classification (adenoma vs. serrated) Used as testing set.
Zhu X. et al. 2019 283 1 991 - N/A Polyp detection Adenomatous polyps.
Ahmad et al. 2019 N/A 83 716
With polyps: 14 634
Without polyps: 69 082
17 83 Polyp localization White Light Images.
Sornapudi et al. 2019 N/A 55 N/A 67 Polyp localization Wireless Capsule Endoscopy videos.
Used as testing set.
Sornapudi et al. 2019 N/A 1 800
With polyps: 530
Without polyps: 1 270
18 N/A Polyp localization Wireless Capsule Endoscopy videos.
Used as training set.
Wittenberg et al. 2019 N/A 2 484 - 2 513 Polyp localization -
Yuan Y. et al. 2019 80 7 200
Polyp images: 1 200
Normal images (mucosa, bubbles, and turbid): 6 000
80 N/A Polyp detection -
Ma Y. et al. 2019 1 661 3 428 - N/A Polyp localization -
Liu X. et al. 2019 2 000 8 000
Polyp: 872
Adenoma: 1 210
- N/A Polyp localization and classification (polyp vs. adenoma) -
Bour et al. 2019 N/A 785
Not dangerous: 699
Dangerous: 25
Cancer: 61
- N/A Polyp classification (not dangerous vs. dangerous vs. cancer) -
Patino-Barrientos et al. 2020 142 600
Type I: 47
Type II: 90
Type III: 183
Type IV: 187
Type V: 93
- N/A Polyp classification (malignant vs. non-malignant) -
Cheng Tao Pu et al. 2020 N/A 1 235
MS I: 103
MS II: 429
MS IIo: 293
MS IIIa: 295
MS IIIb: 115
- N/A Polyp classification (5 classes) Australian (AU) dataset (NBI).
Used as training set.
Cheng Tao Pu et al. 2020 N/A 20
MS I: 3
MS II: 5
MS IIo: 2
MS IIIa: 7
MS IIIb: 3
- N/A Polyp classification (5 classes) Japan (JP) dataset (NBI).
Used as testing set.
Cheng Tao Pu et al. 2020 N/A 49
MS I: 9
MS II: 10
MS IIo: 10
MS IIIa: 11
MS IIIb: 9
- N/A Polyp classification (5 classes) Japan (JP) dataset (BLI).
Used as testing set.
Ozawa. et al. 2020 3 417 (3 021 with polyps and 396 without polyps) 20 431

WL: 17 566
- Adenoma: 9 310
- Hyperplastic: 2 002
- SSAP: 116
- Cancer: 1 468
- Other types: 657
- Normal mucosa: 4 013

NBI: 2 865
- Adenoma: 2 085
- Hyperplastic: 519
- SSAP: 23
- Cancer: 131
- Other types: 107
- Normal mucosa: 0
- 4 752
Adenoma: 3 513
Hyperplastic: 1 058
SSAP: 22
Cancer: 68
Other types: 91
Polyp localization and classification (Adenoma vs. hyperplastic vs. SSAP vs. cancer vs. other types) Used as training set.
Ozawa. et al. 2020 174 7 077

WL: 6 748
- Adenoma: 639
- Hyperplastic: 145
- SSAP: 33
- Cancer: 30
- Other types: 27
- Normal mucosa: 5 874

NBI: 329
- Adenoma: 208
- Hyperplastic: 69
- SSAP: 8
- Cancer: 3
- Other types: 10
- Normal mucosa: 31
- 309
Adenoma: 218
Hyperplastic: 63
SSAP: 7
Cancer: 4
Other types: 17
Polyp localization and classification (Adenoma vs. hyperplastic vs. SSAP vs. cancer vs. other types) Used as testing set.
Young Lee J. et al. 2020 103 8 075 181 N/A Polyp localization Used as training set.
Young Lee J. et al. 2020 203 420 N/A 322 hyperplastic or sessile serrated adenomas Polyp localization Used as training set.
Young Lee J. et al. 2020 7 108 778
- With polyps: 7 022
- Without polyps: 101 756
7 26 Polyp localization Used as testing set.
Young Joo Yang et al. 2020 1 339 3 828
- Tubular adenoma: 1 316
- Non-neoplastic: 896
- High-grade dysplasia: 621
- N/A Polyp classification Used as training/test set.
Young Joo Yang et al. 2020 240 240
- Tubular adenoma: 116
- Non-neoplastic: 113
- Early CRC/High-grade dysplasia: 8
- Advanced CRC: 3
- N/A Polyp classification External validation dataset.
Li T. et al. 2020 - 7 384
- With polyps: 509
- Without polyps: 6 875
23 N/A Polyp detection Colonoscopy videos obtained from YouTube, VideoGIE, and Vimeo.
Podlasek J. et al. 2020 123 79 284 157 N/A Polyp localization Used as development (train/validation split) dataset.
Podlasek J. et al. 2020 - 2 678 - N/A Polyp localization Used as development (train/validation split) dataset.
Podlasek J. et al. 2020 34 - 42 N/A Polyp localization Used as testing dataset.
Xu J. et al. 2021 262 1 482 - 1 683 Polyp localization RenjiImageDB. Used as testing set.
Xu J. et al. 2021 14 8 837
With polyps: 3 294
Without polyps: 5 543
14 15 Polyp localization RenjiVideoDB. Used as testing set.
Misawa et al. 2021 N/A 56 668
With polyps: 55 644
Without polyps: 1 024
N/A N/A Polyp localization Used as development (train/validation split) dataset.
Livovsky et al. 2021 2 487 With polyps: 204 687 (189 994 video frames + 14 693 still images)
Without polyps: 80 M (80 M video frames + 158 646 still images)
3 611 8 471 Polyp localization Used as training set.
Livovsky et al. 2021 1 181 33 M video frames 1 393 3 680 Polyp localization Used as testing set.
Nogueira-Rodríguez et al. 2021 330 28 576
White-light: 21 046
NBI: 7 530
- 941 Polyp localization -
Yoshida et al. 2021 25 N/A N/A 100:
LED endoscope: 53 (25 neoplastic and 28 hyperplastic)
LASER endoscope: 47 (30 neoplastic and 17 hyperplastic)
Polyp localization and classification (neoplastic vs. hyperplastic) Testing set to evaluate the CAD EYE (Fujifilm) system.
Fitting et al. 2022 N/A ENDOTEST validation dataset: 24 polyp and their corresponding non-polyp video sequences (22 856 images: 12 161 with polyps and 10 695 without polyps) ENDOTEST performance dataset: 10 full length colonoscopy videos with 24 different polyps (230 898 images) N/A Polyp locations (bounding box)

Deep Learning Models and Architectures

Deep Learning Architectures

Off-the-shelf Architectures

Study Task Models Framework TL Layers fine-tuned Layers replaced Output layer
Ribeiro et al. 2016 Classification AlexNet, GoogLeNet, Fast CNN, Medium CNN, Slow CNN, VGG16, VGG19 - ImageNet N/A Layers after last CNN layer SVM
Zhang R. et al. 2017 Detection and classification CaffeNet - ImageNet and Places205 N/A Tested connecting classifier to each convolutional layer (5 convolutional layers) SVM (Poly, Linear, RBF, and Tanh)
Chen et al. 2018 Classification Inception v3 - ImageNet N/A Last layer FCL
Tian Y. et al. 2019 Localization and Classification RetinaNet (based on ResNet-50) N/A ImageNet N/A Last layer N/A
Misawa et al. 2018, Misawa et al. 2019 Detection C3D - N/A N/A N/A N/A
Zheng Y. et al. 2018 Localization - YOLOv1 PASCAL VOC 2007 and 2012 All - -
Shin Y. et al. 2018 Localization Inception ResNet-v2 Faster R-CNN with post-learning schemes COCO All - RPN and detector layers
Urban et al. 2018 Localization ResNet-50, VGG16, VGG19 - ImageNet
Also without TL
All Last layer FCL
Wang et al. 2018 Localization VGG16 SegNet N/A N/A N/A N/A
Wittenberg et al. 2019 Localization ResNet101 Mask R-CNN COCO All (incrementally) Last layer FCL
Yuan Y. et al. 2019 Detection DenseNet Tensorflow - All - FCL
Ma Y. et al. 2019 Localization SSD Inception v2 Tensorflow N/A N/A - -
Liu X. et al. 2019 Localization and classification Faster R-CNN with Inception Resnet v2 Tensorflow COCO All - -
Zachariah et al. 2019 Classification Inception ResNet-v2 Tensorflow ImageNet N/A Last layer Graded scale transformation with sigmoid activation
Bour et al. 2019 Classification ResNet-50, ResNet-101, Xception, VGG19, Inception v3 Keras (Tensorflow) Yes N/A Last layer N/A
Patino-Barrientos et al. 2020 Classification VGG16 Keras (Tensorflow) ImageNet None, Last three Last layer Dense with sigmoid activation
Ozawa. et al. 2020 Localization and Classification SSD (Single Shot MultiBox Detector) Caffe N/A All - -
Ma Y. et al. 2020 Localization YOLOv3, RetinaNet N/A ImageNet N/A N/A N/A
Young Lee J. et al. 2020 Localization YOLOv2 N/A N/A N/A N/A N/A
Young Joo Yang et al. 2020 Classification ResNet-152, Inception-ResNet-v2 PyTorch ImageNet All N/A N/A
Wang W. et al. 2020 Detection VGG16, VGG19, ResNet-101, ResNet-152 PyTorch - All Last layer Fully Connected Layer or Global Average Pooling
Li T. et al. 2020 Detection AlexNet Caffe ImageNet N/A N/A N/A
Sánchez-Peralta et al. 2020 Localization Backbone: VGG-16 or Densenet121, Encoder-decoder: U-Net or LinkNet Keras (Tensorflow) No - N/A N/A
Podlasek J. et al. 2020 Localization EfficientNet B4, RetinaNet N/A No - N/A N/A
Misawa et al. 2021 Localization YOLOv3 N/A Yes N/A - FCL
Livovsky et al. 2021 Localization LSTM-SSD N/A No - - -
Nogueira-Rodríguez et al. 2021, Nogueira-Rodríguez et al. 2022, Nogueira-Rodríguez et al. 2023 Localization YOLOv3 MXNet PASCAL VOC 2007 and 2012 All - FCL
Ma Y. et al. 2021 Localization RetinaNet, Faster RCNN, YOLOv3, and CenterNet N/A ImageNet - - N/A

Custom Architectures

Study Task Based on Highlights
Tajbakhsh et al. 2014, Tajbakhsh et al. 2015 Localization None Combination of classic computer vision techniques (detection and location) with DL (correction of prediction).
The ML method proposes candidate polyps. Then, three sets of multi-scale patches around the candidate are generated (color, shape and temporal). Each set of patches is fed to a corresponding CNN.
Each CNN has 2 convolutional layers, 2 fully connected layers, and an output layer.
The maximum score for each set of patches is computed and averaged.
Zhu R. et al. 2015 Localization LeNet-5 CNN fed with 32x32 images taken from patches generated via a sliding window of 16 pixels over the original images.
The LeNet-5 network inspires the CNN architecture. ReLU used as activation function.
Last two layers replaced with a cost-sensitive SVM.
Positively selected patches are combined to generate the final output.
Park and Sargent 2016 Localization None Based on a previous work with no DL techniques.
An initial quality assessment and preprocessing step filters and cleans images, and proposes candidate regions of interest (RoI).
CNN replaces previous feature extractor. Three convolutional layers with two interspersed subsampling layers followed by a fully connected layer.
A final step uses a Conditional Random Field (CRF) for RoI classification.
Yu et al. 2017 Localization None Two 3D-FCN are used:
- An offline network trained with a training dataset.
- An online network initialized with the offline weights and updated every 60 frames with the video frames. Only the last two layers are updated.

The last 16 frames are used for predicting each frame.
Two convolutional layers followed by a pooling layer each, followed by two groups of two convolutional layers followed by a pooling layer each and finished with two convolutional layers converted from fully connected layers.
The output of each network is combined to generate the final output.
Yuan and Meng 2017 Detection Stacked Sparse AutoEncoder (SSAE) A modification of a Sparse AutoEncoder to include an image manifold constraint, named Stacked Sparse AutoEncoder with Image Manifold Constraint (SSAEIM).
SSAEIM is built by stacking three SAEIM layers followed by an output layer. Image manifold information is used on each layer.
Byrne et al. 2017 Classification Inception v3 Last layer replaced with a fully connected layer.
A credibility score is calculated for each frame with the current frame prediction and the credibility score of the previous frame.
Komeda et al. 2017 Classification None Two convolutional layers followed by a pooling layer each, followed by a final fully connected output layer.
Brandao et al. 2018, Ahmad et al. 2019 Localization AlexNet, GoogLeNet, ResNet-50, ResNet-101, ResNet-152, VGG Networks pre-trained with the PASCAL VOC and ImageNet datasets were converted into fully convolutional networks by replacing the fully connected and scoring layers with a convolution layer. A final deconvolution layer produces an output with the same size as the input.
A regularization operation is added between every convolutional and activation layer.
VGG, ResNet-101 and ResNet-152 were also tested using shape-from-shading features.
Zhang R. et al. 2018 Localization YOLO Custom architecture RYCO that consists of two networks:
1. A regression-based deep learning with residual learning (ResYOLO) detection model to locate polyps in a frame.
2. A Discriminative Correlation Filter (DCF) based method called Efficient Convolution Operators (ECO) to track the detected polyps.

The ResYOLO network detects new polyps in a frame, starting the polyp tracking.
During tracking, both ResYOLO and ECO tracker are used to determine the polyp location.
Tracking stops when a confidence score calculated using last frames is under a threshold value.
Urban et al. 2018 Detection None Two custom CNNs are proposed. The first CNN is built just with convolutional, maximum pooling and fully connected layers. The second CNN also includes batch normalization layers and inception modules.
Urban et al. 2018 Localization YOLO The 5 CNNs used for detection (two custom, VGG16, VGG19 and ResNet-50) are modified by replacing the fully connected layers with convolutional layers.
The last layer has 5 filter maps whose outputs are spaced over a grid over the input image. Each grid cell predicts its confidence with a sigmoid unit, the position of the polyp relative to the grid cell center, and its size. The final output is the weighted sum of all the adjusted positions and size predictions, weighted with the confidences.
Mohammed et al. 2018 Detection Y-Net The framework consists of two fully convolutional encoder networks which are connected to a single decoder network that matches the encoder network resolution at each down-sampling operation. The networks are trained with encoder-specific adaptive learning rates that update the parameters of the randomly initialized encoder network with a larger step size than those of the encoder with pre-trained weights. The features of the two encoders are merged with the decoder network at each down-sampling path through sum-skip connections.
Lui et al. 2019 Classification ResNet Network with 5 convolutional layers and 2 fully connected layers but based on a pre-trained ResNet CNN backbone.
Qadir et al. 2019 Localization None A framework for false positive (FP) reduction is proposed.
The framework adds a FP reduction unit to an RPN network. This unit exploits temporal dependencies between frames (forward and backward) to correct the output.
Faster R-CNN and SSD RPNs were tested.
Blanes-Vidal et al. 2019 Localization R-CNN with AlexNet Several modifications done to AlexNet:
- Last fully connected layer replaced to output two classes.
- 5 convolutional and 3 fully connected layers were fine-tuned.
- Max-Pooling kernels, ReLU activation function and dropout used to avoid overfitting and build robustness to intra-class deformations.
- Stochastic gradient descent with momentum used as the optimization algorithm.
Zhang X. et al. 2019 Localization SSD SSD was modified to add three new pooling layers (Second-Max Pooling, Second-Min Pooling and Min-Pooling) and a new deconvolution layer whose features are concatenated to those from the Max-Pooling layer that are fed into the detection layer.
Model was pre-trained on the ILSVRC CLS-LOC dataset.
Kandel et al. 2019 Classification CapsNet A convolutional layer followed by 7 convolutional capsule layers and finalized with a global average pool by capsule type.
Sornapudi et al. 2019 Localization Mask R-CNN The region proposal network (RPN) uses a Feature Pyramid Network with a ResNet backbone. ResNet-50 and ResNet-101 were used, improved by extracting features from 5 different levels of layers. ResNet networks were initialized with COCO and ImageNet. Additionally, 76 random balloon images from Flickr were used to fine-tune networks initialized with COCO.
The regions proposed by the RPN were filtered before the ROIAlign layer.
The ROIAlign layer is followed by a pixel probability mask network, comprised of 4 convolutional layers followed by a transposed convolutional layer and a final convolutional layer with a sigmoid activation function that generates the final output. All convolutional layers except the final one use the ReLU activation function.
Tashk et al. 2019 Localization U-Net The U-Net architecture was modified to use as input any image or video formats associated with optical colonoscopy modalities.
Patino-Barrientos et al. 2020 Classification None The model is composed of four convolutional layers, each one of them followed by a max pooling layer. After that, the model has a dropout layer to reduce overfitting, followed by a final dense layer with sigmoid activation that outputs the probability of the current polyp being malignant. The model was trained using the RMSprop optimizer with a learning rate of 1×10⁻⁴.
Jia X. et al. 2020 Localization ResNet-50, Feature Pyramid Network, and Faster R-CNN Authors propose a two-stage framework, where the polyp proposal stage (stage I) is constructed as a region-level polyp detector that is capable of guiding the pixel-level learning in the polyp segmentation stage (stage II), aiming to accurately segment the area the polyp occupies in the image. This framework has a backbone network composed of a ResNet-50 followed by a Feature Pyramid Network, producing a set of feature maps that are used by the two-stage framework. The polyp proposal stage was created as an extension of Faster R-CNN, which performs as a region-level polyp detector to recognize the lesion area as a whole. Then, the polyp segmentation stage is built in a fully convolutional fashion for pixelwise segmentation. This two-stage framework has a feature sharing strategy in which the learned semantics of polyp proposals of stage I are transferred to the segmentation task of stage II.
Qadir et al. 2021 Localization Resnet34 and MDeNet Authors propose a modified version of MDeNet, proposed by them in Qadir et al. 2019. See section 2.3. F-CNN models for polyp detection of Qadir et al. 2021 for more details.
Xu J. et al. 2021 Localization YOLOv3 Authors present a framework based on YOLOv3 to improve detection. This framework adds: (i) a False Positive Relearning Module (FPRM) to make the detector network learn more about the features of FPs for higher precision; (ii) an Image Style Transfer Module (ISTM) to enhance the features of polyps for higher sensitivity; (iii) an Inter-Frame Similarity Correlation unit (ISCU) to integrate spatiotemporal information, which is combined with the image detector network to improve performance in video detection in order to reduce FPs.
Pacal et al. 2021 Localization YOLOv4 Authors propose several models based on YOLOv4. To create their "Proposed Model1 (Small)" they first replaced the whole structure with Cross Stage Partial Networks (CSPNet), then substituted the Mish activation function for the Leaky ReLU activation function and also substituted the Distance Intersection over Union (DIoU) loss for the Complete Intersection over Union (CIoU) loss.
Liu et al. 2021 Localization Resnet101 and Domain adaptive Faster R-CNN Authors propose a consolidated domain adaptive framework with a training free style transfer process, a hierarchical network, and a centre besiegement loss for accurate cross-domain polyp detection and localization.
Pacal et al. 2022 Localization YOLOv3, YOLOv4 Authors propose modified versions of YOLOv3 and YOLOv4 by integrating Cross Stage Partial Network (CSPNet). With the aim of improving the detection performance, they also use the Sigmoid-weighted Linear Unit (SiLU) activation function and the Complete Intersection over Union (CIoU) loss function.
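
The confidence-weighted output described above for the localization models of Urban et al. 2018 can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the tuple layout of each grid cell and the normalization of the weights by the total confidence are assumptions made only for this example.

```python
def aggregate_grid_predictions(cells):
    """Combine per-cell predictions into one estimate, weighting by confidence.

    Each cell is a tuple (confidence, x, y, size). This layout is hypothetical:
    x and y are assumed to be already converted from cell-relative offsets to
    absolute image coordinates.
    """
    total = sum(conf for conf, _, _, _ in cells)
    if total == 0:
        return None  # no cell reports any confidence, so there is nothing to average
    # Assumption: weights are normalized so that they sum to one.
    x = sum(conf * cx for conf, cx, _, _ in cells) / total
    y = sum(conf * cy for conf, _, cy, _ in cells) / total
    size = sum(conf * s for conf, _, _, s in cells) / total
    return x, y, size


# Two hypothetical grid cells: a confident one and a weak one.
cells = [(0.9, 100.0, 80.0, 40.0), (0.1, 140.0, 120.0, 60.0)]
print(aggregate_grid_predictions(cells))
```

With this scheme a low-confidence cell only shifts the final position and size slightly, so the estimate stays close to the most confident cells.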

Data Augmentation Strategies

  Rotation Flipping (Mirroring) Shearing Crop Random brightness Translation (Shifting) Scale Zooming Gaussian smoothing Blurring Saturation adjustment Gaussian distortion Resize Random contrast Exposure adjustment Color augmentations in HSV Mosaic Mix-up Histogram equalization Skew Random erasing Color distribution adjust Clipping Sharpening Cutmix Color jittering Random image expansion
Num. Studies 28 26 12 9 9 8 8 6 4 4 3 3 3 3 2 2 2 2 1 1 1 1 1 1 1 1 1
Tajbakhsh et al. 2015 X X X X X
Park and Sargent 2016 X X
Ribeiro et al. 2016 X X
Yu et al. 2017 X X
Byrne et al. 2017 X X X
Brandao et al. 2018 X
Zhang R. et al. 2018 X X X X X
Zheng Y. et al. 2018 X
Shin Y. et al. 2018 X X X X X X
Urban et al. 2018 X X X
Mohammed et al. 2018 X X X X X X
Qadir et al. 2019 X X X X
Tian Y. et al. 2019 X X X X X
Blanes-Vidal et al. 2019 X X X
Zhang X. et al. 2019 X
Zhu X. et al. 2019 X X X
Sornapudi et al. 2019 X X X X X X
Wittenberg et al. 2019 X X
Yuan Y. et al. 2019 X X X X X
Ma Y. et al. 2019 X X X
Bour et al. 2019 X X X X X X X X X
Patino-Barrientos et al. 2020 X X X X X
Cheng Tao Pu et al. 2020 X X X
Ma Y. et al. 2020 X X X X X
Young Lee J. et al. 2020 X X X X
Young Joo Yang et al. 2020 X
Wang W. et al. 2020 X X X
Li T. et al. 2020 X X X X
Podlasek J. et al. 2020 X X
Qadir et al. 2021 X X X X
Xu J. et al. 2021 X X X
Misawa et al. 2021 X X X
Livovsky et al. 2021 X X
Pacal et al. 2021a X X X X X X X X X X X X X X
Liu et al. 2021 X X X X X X
Nogueira-Rodríguez et al. 2021 X X X X
Ma Y. et al. 2021 X X X X X
Pacal et al. 2021b X X X X X
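
Rotation and flipping are the two most frequently used strategies in the table above. The following is a minimal sketch of both on an image stored as a list of pixel rows. The function names are hypothetical, and the studies listed typically rely on the augmentation utilities of their deep learning frameworks rather than on code like this.

```python
def flip_horizontal(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Rotate an image (a list of pixel rows) by 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]


def augment(image):
    """Return the original image plus one flipped and three rotated variants."""
    variants = [image, flip_horizontal(image)]
    rotated = image
    for _ in range(3):  # 90, 180 and 270 degrees
        rotated = rotate_90_clockwise(rotated)
        variants.append(rotated)
    return variants


tiny_image = [[1, 2], [3, 4]]
for variant in augment(tiny_image):
    print(variant)
```

For localization tasks, the bounding boxes or binary masks need the same geometric transformation as the image, which this sketch leaves out.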

Frameworks and Libraries

Framework/Library # Studies Used by
TensorFlow 9 Chen et al. 2018, Shin Y. et al. 2018, Mohammed et al. 2018, Yuan Y. et al. 2019, Ma Y. et al. 2019, Liu X. et al. 2019, Zachariah et al. 2019, Bour et al. 2019, Patino-Barrientos et al. 2020, Sánchez-Peralta et al. 2020
Caffe 8 Zhu X. et al. 2019, Yu et al. 2017, Brandao et al. 2018, Wang et al. 2018, Zhang X. et al. 2019, Ozawa. et al. 2020, Jia X. et al. 2020, Li T. et al. 2020
Keras 6 Urban et al. 2018, Mohammed et al. 2018, Sornapudi et al. 2019, Wittenberg et al. 2019, Bour et al. 2019, Patino-Barrientos et al. 2020, Sánchez-Peralta et al. 2020, Xu J. et al. 2021
PyTorch 5 Young Joo Yang et al. 2020, Wang W. et al. 2020, Pacal et al. 2021, Liu et al. 2021, Pacal et al. 2022
MXNet 3 Nogueira-Rodríguez et al. 2021, Nogueira-Rodríguez et al. 2022, Nogueira-Rodríguez et al. 2023
C3D 2 Misawa et al. 2018, Misawa et al. 2019
DarkNet 2 Pacal et al. 2021, Pacal et al. 2022
MatConvNet (MATLAB) 1 Ribeiro et al. 2016

Performance

Note: Some performance metrics are not directly reported in the papers, but were derived using raw data or confusion matrices provided by them.
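
As an illustration of how such metrics can be derived, the sketch below computes the usual measures from the four cells of a confusion matrix, and the F-beta score from precision and recall. It uses the standard definitions; the function names and the example counts are hypothetical, and the exact procedure followed for each study is not described here.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (all values in [0, 1])."""
    return {
        "recall": tp / (tp + fn),        # sensitivity
        "precision": tp / (tp + fp),     # positive predictive value (PPV)
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def f_beta(precision, recall, beta=1.0):
    """F-beta score: beta=1 gives F1, beta=2 gives F2 (recall weighs more)."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts, only to show the call.
metrics = detection_metrics(tp=8, fp=2, tn=18, fn=2)
print(metrics)
print(f_beta(metrics["precision"], metrics["recall"], beta=2))
```

For example, with the precision (88.1%) and recall (71%) listed for Yu et al. 2017 in the table below, `f_beta` gives approximately 0.786 for F1 and 0.739 for F2, which agrees with the values in that row.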

Polyp Detection and Localization

Performance metrics on public and private datasets of all polyp detection and localization studies.

  • The type of performance metric is given in parentheses: i = image-based, bb = bounding-box-based, p = polyp-based, pa = patch-based, and pi = pixel-based.
  • The training dataset is given in curly brackets, where "P" stands for private.
  • The test dataset used for computing the performance metric is given in square brackets, where "P" stands for private.
  • For instance, {[P]} means that development and test splits of the same private dataset have been used for training and testing respectively.
  • Performances marked with an * are reported on training datasets (e.g. k-fold cross-validation).
  • AP stands for Average Precision.
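
For readers who want to process the table programmatically, the notation above can be parsed with a regular expression. This is a minimal sketch that assumes entries of the form `value% (type) {train} [test]`, optionally followed by `*`. The pattern and the function name are hypothetical, and entries that carry a metric prefix such as `F1:` are not handled.

```python
import re

# value, metric type in parentheses, then either {[dataset]} (same dataset for
# training and testing) or {train} optionally followed by [test], then "*".
ENTRY = re.compile(
    r"(?P<value>~?\d+(?:\.\d+)?)%?\s*"
    r"\((?P<kind>i|bb|p|pa|pi)\)\s*"
    r"(?:\{\[(?P<same>[^\]]+)\]\}"
    r"|\{(?P<train>[^}]+)\}\s*(?:\[(?P<test>[^\]]+)\])?)"
    r"\s*(?P<star>\*)?"
)


def parse_entry(text):
    """Parse one table entry; return None when it does not follow the notation."""
    match = ENTRY.fullmatch(text.strip())
    if match is None:
        return None
    return {
        "value": float(match["value"].lstrip("~")),  # "~" marks approximate values
        "kind": match["kind"],
        "train": match["same"] or match["train"],
        "test": match["same"] or match["test"],
        "on_training_data": match["star"] is not None,
    }


print(parse_entry("80.3% (bb) {CVC-ClinicDB} [ETIS-Larib]"))
print(parse_entry("71% (bb) {[ASU-Mayo]}"))
```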

Note: Since February 2022, the former frame-based (f) type has been split into image-based and bounding-box-based, which more accurately reflects the type of evaluation done. Please note that our review paper uses frame-based and includes both.

Study Recall (sensitivity) Precision (PPV) Specificity Others Manually selected images?
Tajbakhsh et al. 2015 70% (bb) {[P]} 63% (bb) {[P]} 90% (bb) {[P]} F1: 0.66, F2: 0.68 (bb) {[P]} No
Zhu R. et al. 2015 79.44% (pa) {[P]} N/A 79.54% (pa) {[P]} Acc: 79.53% (pa) {[P]} Yes
Park and Sargent 2016 86% (bb) {P} * - 85% (bb) {P} * AUC: 0.86 (bb) {P} * Yes (on training)
Yu et al. 2017 71% (bb) {[ASU-Mayo]} 88.1% (bb) {[ASU-Mayo]} N/A F1: 0.786, F2: 0.739 (bb) {[ASU-Mayo]} No
Zhang R. et al. 2017 97.6% (i) {[P]} 99.4% (i) {[P]} N/A F1: 0.98, F2: 0.98, AUC: 1.00 (i) {[P]} Yes
Yuan and Meng 2017 98% (i) {P} * 97% (i) {P} * 99% (i) {P} * F1: 0.98, F2: 0.98 (i) [P] Yes
Brandao et al. 2018 ~90% (bb) {CVC-ClinicDB + ASU-Mayo} [ETIS-Larib]

~90% (bb) {CVC-ClinicDB + ASU-Mayo} [CVC-ColonDB]
~73% (bb) {CVC-ClinicDB + ASU-Mayo} [ETIS-Larib]

~80% (bb) {CVC-ClinicDB + ASU-Mayo} [CVC-ColonDB]
N/A F1: ~0.81, F2: ~0.86 (bb) {CVC-ClinicDB + ASU-Mayo} [ETIS-Larib]

F1: ~0.85, F2: ~0.88 (bb) {CVC-ClinicDB + ASU-Mayo} [CVC-ColonDB]
Yes
Zhang R. et al. 2018 71.6% (bb) {[ASU-Mayo]} 88.6% (bb) {[ASU-Mayo]} 97% (bb) {[ASU-Mayo]} F1: 0.792, F2: 0.744 (bb) {[ASU-Mayo]} No
Misawa et al. 2018 90% (i) {[P]}

94% (p) {[P]}
55.1% (i) {[P]}

48% (p) {[P]}
63.3% (i) {[P]}

40% (p) {[P]}
F1: 0.68 (i) 0.63 (p), F2: 0.79 (i) 0.78 (p) {[P]}

Acc: 76.5% (i) 60% (p) {[P]}
No
Zheng Y. et al. 2018 74% (bb) {CVC-ClinicDB + CVC-ColonDB} [ETIS-Larib] 77.4% (bb) {CVC-ClinicDB + CVC-ColonDB} [ETIS-Larib] N/A F1: 0.757, F2: 0.747 (bb) {CVC-ClinicDB + CVC-ColonDB} [ETIS-Larib] Yes
Shin Y. et al. 2018 80.3% (bb) {CVC-ClinicDB} [ETIS-Larib]

84.2% (bb) {CVC-ClinicDB} [ASU-Mayo]

84.3% (bb) {CVC-ClinicDB} [CVC-ClinicVideoDB]
86.5% (bb) {CVC-ClinicDB} [ETIS-Larib]

82.7% (bb) {CVC-ClinicDB} [ASU-Mayo]

89.7% (bb) {CVC-ClinicDB} [CVC-ClinicVideoDB]
N/A F1: 0.833, F2: 0.815 (bb) {CVC-ClinicDB} [ETIS-Larib]

F1: 0.834, F2: 0.839 (bb) {CVC-ClinicDB} [ASU-Mayo]

F1: 0.869, F2: 0.853 (bb) {CVC-ClinicDB} [CVC-ClinicVideoDB]
Yes (ETIS-Larib)

No (ASU-Mayo, CVC-ClinicVideoDB)
Urban et al. 2018 93% (bb) {P1} [P2]

100% (p) {P1} [P2]

93% (p) {P1} [P3]
74% (bb) {P1} [P2]

35% (p) {P1} [P2]

60% (p) {P1} [P3]
93% (bb) {P1} [P2] F1: 0.82, F2: 0.88 (bb) {P1} [P2]

F1: 0.52, F2: 0.73 (p) {P1} [P2]

F1: 0.73, F2: 0.84 (p) {P1} [P3]
No
Wang et al. 2018 88.24% (bb) {P} [CVC-ClinicDB]

94.38% (bb) {P} [P (dataset A)]

91.64% (bb), 100% (p) {P} [P (dataset C)]
93.14% (bb) {P} [CVC-ClinicDB]

95.76% (bb) {P} [P (dataset A)]
95.40% (bb) {P} [P (dataset D)] F1: 0.91, F2: 0.89 (bb) {P} [CVC-ClinicDB]

F1: 0.95, F2: 0.95, AUC: 0.984 (bb) {P} [P (dataset A)]
Yes (dataset A, CVC-ClinicDB)

No (dataset C/D)
Mohammed et al. 2018 84.4% (bb) {[ASU-Mayo]} 87.4% (bb) {[ASU-Mayo]} N/A F1: 0.859, F2: 0.85 (bb) {[ASU-Mayo]} No
Qadir et al. 2019 81.51% (bb) {CVC-ClinicDB} [CVC-ClinicVideoDB] 87.51% (bb) {CVC-ClinicDB} [CVC-ClinicVideoDB] 84.26% (bb) {CVC-ClinicDB} [CVC-ClinicVideoDB] F1: 0.844, F2: 0.83 (bb) {CVC-ClinicDB} [CVC-ClinicVideoDB] No
Tian Y. et al. 2019 64.42% (bb) {P} [ETIS-Larib] 73.6% (bb) {P} [ETIS-Larib] - F1: 0.687, F2: 0.66 (bb) {P} [ETIS-Larib] Yes
Blanes-Vidal et al. 2019 97.1% (bb) {[P]} 91.4% (bb) {[P]} 93.3% (bb) {[P]} Acc: 96.4%, F1: 0.94, F2: 0.95 (bb) {[P]} N/A (not clear in the paper)
Zhang X. et al. 2019 76.37% (bb) {[P]} 93.92% (bb) {[P]} N/A F1: 0.84, F2: 0.79 (bb) {[P]} Yes
Misawa et al. 2019 86% (p) {P1} [P2] N/A 74% (i) {P1} [P2] - No
Zhu X. et al. 2019 88.5% (i) {P1} [P2] N/A 96.4% (i) {P1} [P2] - No
Ahmad et al. 2019 91.6% (bb) {P} [ETIS-Larib]

84.5% (bb) {ETIS-Larib + P} [P]
75.3% (bb) {P} [ETIS-Larib] 92.5% (bb) {ETIS-Larib + P} [P] F1: 0.83, F2: 0.88 (bb) {P} [ETIS-Larib] Yes (ETIS-Larib)

No (private)
Sornapudi et al. 2019 91.64% (bb) {CVC-ClinicDB} [CVC-ColonDB]

78.12% (bb) {CVC-ClinicDB} [CVC-PolypHD]

80.29% (bb) {CVC-ClinicDB} [ETIS-Larib]

95.52% (bb) {[P]}
89.94% (bb) {CVC-ClinicDB} [CVC-ColonDB]

83.33% (bb) {CVC-ClinicDB} [CVC-PolypHD]

72.93% (bb) {CVC-ClinicDB} [ETIS-Larib]

98.46% (bb) {[P]}
N/A F1: 0.9073, F2: 0.9127 (bb) {CVC-ClinicDB} [CVC-ColonDB]

F1: 0.8065, F2: 0.7911 (bb) {CVC-ClinicDB} [CVC-PolypHD]

F1: 0.7643, F2: 0.7870 (bb) {CVC-ClinicDB} [ETIS-Larib]

F1: 0.966, F2: 0.961 (bb) {[P]}
Yes (CVC-ClinicDB, CVC-ColonDB, ETIS-Larib)

No (WCE video)
Wittenberg et al. 2019 86% (bb) {P} [CVC-ClinicDB]

83% (bb) {P} [ETIS-Larib]

93% (bb) {[P]}
80% (bb) {P} [CVC-ClinicDB]

74% (bb) {P} [ETIS-Larib]

86% (bb) {[P]}
N/A F1: 0.82, F2: 0.85 (bb) {P} [CVC-ClinicDB]

F1: 0.79, F2: 0.81 (bb) {P} [ETIS-Larib]

F1: 0.89, F2: 0.92 (bb) {[P]}
Yes
Yuan Y. et al. 2019 90.21% (i) {[P]} 74.51% (i) {[P]} 94.07% (i) {[P]} Accuracy: 93.19%, F1: 0.81, F2: 0.86 (i) {[P]} Yes
Ma Y. et al. 2019 93.67% (bb) {[P]} N/A 98.36% (bb) {[P]} Accuracy: 96.04%, AP: 94.92% (bb) {[P]} Yes
Tashk et al. 2019 82.7% (pi) {[CVC-ClinicDB]}

90.9% (pi) {[ETIS-Larib]}

82.4% (pi) {[CVC-ColonDB]}
70.2% (pi) {[CVC-ClinicDB]}

70.2% (pi) {[ETIS-Larib]}

62% (pi) {[CVC-ColonDB]}
- Accuracy: 99.02%, F1: 0.76, F2: 0.798 (pi) {[CVC-ClinicDB]}

Accuracy: 99.6%, F1: 0.7923, F2: 0.858 (pi) {[ETIS-Larib]}

Accuracy: 98.2%, F1: 0.707, F2: 0.773 (pi) {[CVC-ColonDB]}
Yes (CVC-ClinicDB, CVC-ColonDB, ETIS-Larib)
Jia X. et al. 2020 92.1% (bb) {CVC-ColonDB} [CVC-ClinicDB]

59.4% (pi) {CVC-ColonDB} [CVC-ClinicDB]

81.7% (bb) {CVC-ClinicDB} [ETIS-Larib]
84.8% (bb) {CVC-ColonDB} [CVC-ClinicDB]

85.9% (pi) {CVC-ColonDB} [CVC-ClinicDB]

63.9% (bb) {CVC-ClinicDB} [ETIS-Larib]
- F1: 0.883, F2: 0.905 (bb) {CVC-ColonDB} [CVC-ClinicDB]

F1: 0.702, F2: 0.633, Jaccard: 74.7±20.5, Dice: 83.9±13.6 (pi) {CVC-ColonDB} [CVC-ClinicDB]

F1: 0.717, F2: 0.774 (bb) {CVC-ClinicDB} [ETIS-Larib]
Yes (CVC-ClinicDB, ETIS-Larib)
Ozawa. et al. 2020 92% (bb) {P1} [P2]

90% (bb) {P1} [P2: WL]

97% (bb) {P1} [P2: NBI]

98% (p) {P1} [P2]
86% (bb) {P1} [P2]

83% (bb) {P1} [P2: WL]

97% (bb) {P1} [P2: NBI]
N/A F1: 0.88, F2: 0.88 (bb) {P1} [P2]

F1: 0.86, F2: 0.84 (bb) {P1} [P2: WL]

F1: 0.97, F2: 0.97 (bb) {P1} [P2: NBI]
Yes
Ma Y. et al. 2020 92% (bb) {CVC-ClinicDB} [CVC-ClinicVideoDB] 87.50% (bb) {CVC-ClinicDB} [CVC-ClinicVideoDB] N/A F1: 0.897, F2: 0.911 (bb) {CVC-ClinicDB} [CVC-ClinicVideoDB] No
Young Lee J. et al. 2020 96.7% (bb) {[P]}

90.2% (bb) {P} [CVC-ClinicDB]
97.4% (bb) {[P]}

98.2% (bb) {P} [CVC-ClinicDB]
N/A F1: 0.97, F2: 0.97 (bb) {[P]}

F1: 0.94, F2: 0.96 (bb) {P} [CVC-ClinicDB]
Yes (CVC-ClinicDB, private)
Wang W. et al. 2020 97.5% (i) {[CP-CHILD-A]}

98% (i) {CP-CHILD-A} [CP-CHILD-B]
N/A 99.85% (i) {[CP-CHILD-A]}

99.83% (i) {CP-CHILD-A} [CP-CHILD-B]
Accuracy: 99.25% (i) {[CP-CHILD-A]}

Accuracy: 99.34% (i) {CP-CHILD-A} [CP-CHILD-B]
Yes
Li T. et al. 2020 73% (i) {[P]} 93% (i) {[P]} 96% (i) {[P]} NPV: 83%, Acc: 86%, AUC: 0.94 (i) {[P]} Yes
Sánchez-Peralta et al. 2020 74.73% (pi) {PICCOLO} [Kvasir-SEG]

71.88% (pi) {PICCOLO} [CVC-EndoSceneStill]

72.89% (pi) {[PICCOLO]}

69.77% (pi) {PICCOLO} [PICCOLO-WL]

63.31% (pi) {CVC-EndoSceneStill} [Kvasir-SEG]

79.22% (pi) {[CVC-EndoSceneStill]}

45.09% (pi) {CVC-EndoSceneStill} [PICCOLO]

57.06% (pi) {CVC-EndoSceneStill} [PICCOLO-WL]

88.98% (pi) {[Kvasir-SEG]}

83.46% (pi) {Kvasir-SEG} [CVC-EndoSceneStill]

58.11% (pi) {Kvasir-SEG} [PICCOLO]

54.63% (pi) {Kvasir-SEG} [PICCOLO-WL]
81.31% (pi) {PICCOLO} [Kvasir-SEG]

84.35% (pi) {PICCOLO} [CVC-EndoSceneStill]

77.58% (pi) {[PICCOLO]}

71.33% (pi) {PICCOLO} [PICCOLO-WL]

77.80% (pi) {CVC-EndoSceneStill} [Kvasir-SEG]

87.88% (pi) {[CVC-EndoSceneStill]}

52.84% (pi) {CVC-EndoSceneStill} [PICCOLO]

60.93% (pi) {CVC-EndoSceneStill} [PICCOLO-WL]

81.68% (pi) {[Kvasir-SEG]}

83.54% (pi) {Kvasir-SEG} [CVC-EndoSceneStill]

59.54% (pi) {Kvasir-SEG} [PICCOLO]

63.61% (pi) {Kvasir-SEG} [PICCOLO-WL]
97.41% (pi) {PICCOLO} [Kvasir-SEG]

98.85% (pi) {PICCOLO} [CVC-EndoSceneStill]

97.96% (pi) {[PICCOLO]}

97.37% (pi) {PICCOLO} [PICCOLO-WL]

98.15% (pi) {CVC-EndoSceneStill} [Kvasir-SEG]

99.00% (pi) {[CVC-EndoSceneStill]}

97.30% (pi) {CVC-EndoSceneStill} [PICCOLO]

91.12% (pi) {CVC-EndoSceneStill} [PICCOLO-WL]

96.49% (pi) {[Kvasir-SEG]}

97.65% (pi) {Kvasir-SEG} [CVC-EndoSceneStill]

93.29% (pi) {Kvasir-SEG} [PICCOLO]

98.06% (pi) {Kvasir-SEG} [PICCOLO-WL]
F1: 0.779, F2: 0.760, Jaccard: 65.33±30.66, Dice: 73.54±30.15 (pi) {PICCOLO} [Kvasir-SEG]

F1: 0.776, F2: 0.741, Jaccard: 64.18±33.04, Dice: 71.66±32.98 (pi) {PICCOLO} [CVC-EndoSceneStill]

F1: 0.752, F2: 0.738, Jaccard: 64.01±36.23, Dice: 70.10±36.45 (pi) {[PICCOLO]}

F1: 0.705, F2: 0.701, Jaccard: 58.70±38.90, Dice: 64.51±39.18 (pi) {PICCOLO} [PICCOLO-WL]

F1: 0.698, F2: 0.658, Jaccard: 56.12±34.29, Dice: 64.26±35.35 (pi) {CVC-EndoSceneStill} [Kvasir-SEG]

F1: 0.833, F2: 0.808, Jaccard: 72.16±30.93, Dice: 78.61±29.48 (pi) {[CVC-EndoSceneStill]}

F1: 0.487, F2: 0.465, Jaccard: 39.52±37.9, Dice: 45.5±41.51 (pi) {CVC-EndoSceneStill} [PICCOLO]

F1: 0.589, F2: 0.578, Jaccard: 45.00±35.60, Dice: 52.81±38.33 (pi) {CVC-EndoSceneStill} [PICCOLO-WL]

F1: 0.852, F2: 0.874, Jaccard: 74.52±22.81, Dice: 82.68±21.28 (pi) {[Kvasir-SEG]}

F1: 0.835, F2: 0.835, Jaccard: 71.82±29.87, Dice: 78.78±28.14 (pi) {Kvasir-SEG} [CVC-EndoSceneStill]

F1: 0.588, F2: 0.584, Jaccard: 44.92±37.37, Dice: 51.87±39.79 (pi) {Kvasir-SEG} [PICCOLO]

F1: 0.588, F2: 0.562, Jaccard: 47.74±39.55, Dice: 53.62±41.68 (pi) {Kvasir-SEG} [PICCOLO-WL]
Yes
Podlasek J. et al. 2020 91.2% (bb) {P} [CVC-ClinicDB]

88.2% (bb) {P} [Hyper-Kvasir]

74.1% (bb) {P} [CVC-ColonDB]

67.3% (bb) {P} [ETIS-Larib]
97.4% (bb) {P} [CVC-ClinicDB]

97.5% (bb) {P} [Hyper-Kvasir]

92.4% (bb) {P} [CVC-ColonDB]

79% (bb) {P} [ETIS-Larib]
N/A F1: 0.942, F2: 0.924 (bb) {P} [CVC-ClinicDB]

F1: 0.926, F2: 0.899 (bb) {P} [Hyper-Kvasir]

F1: 0.823, F2: 0.771 (bb) {P} [CVC-ColonDB]

F1: 0.727, F2: 0.693 (bb) {P} [ETIS-Larib]
Yes
Qadir et al. 2021 86.54% (bb) {CVC-ClinicDB} [ETIS-Larib]

91% (bb) {CVC-ClinicDB} [CVC-ColonDB]
86.12% (bb) {CVC-ClinicDB} [ETIS-Larib]

88.35% (bb) {CVC-ClinicDB} [CVC-ColonDB]
N/A F1: 0.863, F2: 0.864 (bb) {CVC-ClinicDB} [ETIS-Larib]

F1: 0.896, F2: 0.904 (bb) {CVC-ClinicDB} [CVC-ColonDB]
Yes
Xu J. et al. 2021 75.70% (bb) {CVC-ClinicDB + CVC-ColonDB + ETIS-Larib + CVC-ClinicVideoDB} [P]

71.63% (bb) {CVC-ClinicDB} [ETIS-Larib]

66.36% (bb) {CVC-ClinicDB} [CVC-ClinicVideoDB]
85.54% (bb) {CVC-ClinicDB + CVC-ColonDB + ETIS-Larib + CVC-ClinicVideoDB} [P]

83.24% (bb) {CVC-ClinicDB} [ETIS-Larib]

88.5% (bb) {CVC-ClinicDB} [CVC-ClinicVideoDB]
N/A F1: 0.799, F2: 0.773 (bb) {CVC-ClinicDB + CVC-ColonDB + ETIS-Larib + CVC-ClinicVideoDB} [P]

F1: 0.77, F2: 0.737 (bb) {CVC-ClinicDB} [ETIS-Larib]

F1: 0.7586, F2: 0.698 (bb) {CVC-ClinicDB} [CVC-ClinicVideoDB]
Yes (ETIS-Larib, Private)

No (CVC-ClinicVideoDB)
Misawa et al. 2021 98% (p) {P} [SUN]

90.5% (i) {P} [SUN]
88.2% (i) {P} [SUN] 93.7% (i) {P} [SUN] F1: 0.893, F2: 0.900, NPV: 94.96% (i) {P} [SUN] No
Livovsky et al. 2021 97.1% (p) {P1} [P2] N/A N/A N/A No
Pacal et al. 2021 82.55% (bb) {CVC-ClinicDB} [ETIS-Larib]

96.68% (bb) {CVC-ClinicDB} [CVC-ColonDB]
91.62% (bb) {CVC-ClinicDB} [ETIS-Larib]

96.04% (bb) {CVC-ClinicDB} [CVC-ColonDB]
N/A F1: 0.868, F2: 0.842 (bb) {CVC-ClinicDB} [ETIS-Larib]

F1: 0.964, F2: 0.965 (bb) {CVC-ClinicDB} [CVC-ColonDB]
Yes
Liu et al. 2021 87.5% (bb) {CVC-ClinicDB} [ETIS-Larib] 77.8% (bb) {CVC-ClinicDB} [ETIS-Larib] - F1: 0.824, F2: 0.854 (bb) {CVC-ClinicDB} [ETIS-Larib] Yes (ETIS-Larib)
Li K. et al. 2021 86.2% (bb) {[KUMC]} 91.2% (bb) {[KUMC]} N/A F1: 0.886, F2: 0.8715, AP: 88.5% (bb) {[KUMC]} Yes
Nogueira-Rodríguez et al. 2021 87% (bb) {[P]}

89.91% (p) {[P]}
89% (bb) {[P]} 54.97% (p) {[P]} F1: 0.881, F2: 0.876 (bb) {[P]} Yes
Yoshida et al. 2021 83% (p) {CAD EYE} [P-LED WLI]

87.2% (p) {CAD EYE} [P-LASER WLI] 88.7% (p) {CAD EYE} [P-LED LCI]

89.4% (p) {CAD EYE} [P-LASER LCI]

N/A N/A N/A -
Ma Y. et al. 2021 64% (bb) {CVC-ClinicDB} [CVC-ClinicVideoDB]

47% (bb) {CVC-ClinicDB} [LDPolypVideo]
85% (bb) {CVC-ClinicDB} [CVC-ClinicVideoDB]

65% (bb) {CVC-ClinicDB} [LDPolypVideo]
N/A F1: 0.73, F2: 0.67 (bb) {CVC-ClinicDB} [CVC-ClinicVideoDB]

F1: 0.55, F2: 0.50 (bb) {CVC-ClinicDB} [LDPolypVideo]
Yes (CVC-ClinicDB)

No (LDPolypVideo, CVC-ClinicVideoDB)
Pacal et al. 2022 91.04% (bb) {SUN + PICCOLO + CVC-ClinicDB} [ETIS-Larib]

90.57% (bb) {SUN + CVC-ClinicDB} [ETIS-Larib]

88.24% (bb) {SUN} [ETIS-Larib]

75.53% (bb) {PICCOLO} [ETIS-Larib]

79.85% (bb) {[PICCOLO]}

86.48% (bb) {[SUN]}
90.61% (bb) {SUN + PICCOLO + CVC-ClinicDB} [ETIS-Larib]

90.14% (bb) {SUN + CVC-ClinicDB} [ETIS-Larib]

88.24% (bb) {SUN} [ETIS-Larib]

87.29% (bb) {PICCOLO} [ETIS-Larib]

92.60% (bb) {[PICCOLO]}

96.49% (bb) {[SUN]}
N/A F1: 0.908, F2: 0.909 (bb) {SUN + PICCOLO + CVC-ClinicDB} [ETIS-Larib]

F1: 0.903, F2: 0.902 (bb) {SUN + CVC-ClinicDB} [ETIS-Larib]

F1: 0.882, F2: 0.882 (bb) {SUN} [ETIS-Larib]

F1: 0.809, F2: 0.846 (bb) {PICCOLO} [ETIS-Larib]

F1: 0.857, F2: 0.821 (bb) {[PICCOLO]}

F1: 0.912, F2: 0.883 (bb) {[SUN]}
Yes (ETIS-Larib, PICCOLO)

No (SUN)
Nogueira-Rodríguez et al. 2022 82% (bb) {P} [CVC-ClinicDB]

84% (bb) {P} [CVC-ColonDB]

75% (bb) {P} [CVC-PolypHD]

72% (bb) {P} [ETIS-Larib]

78% (bb) {P} [Kvasir-SEG]

60% (bb) {P} [PICCOLO]

80% (bb) {P} [CVC-ClinicVideoDB]

81% (bb) {P} [KUMC dataset]

76% (bb) {P} [KUMC dataset–Test]

78% (bb) {P} [SUN]

49% (bb) {P} [LDPolypVideo]
87% (bb) {P} [CVC-ClinicDB]

81% (bb) {P} [CVC-ColonDB]

86% (bb) {P} [CVC-PolypHD]

71% (bb) {P} [ETIS-Larib]

84% (bb) {P} [Kvasir-SEG]

76% (bb) {P} [PICCOLO]

75% (bb) {P} [CVC-ClinicVideoDB]

83% (bb) {P} [KUMC dataset]

81% (bb) {P} [KUMC dataset–Test]

83% (bb) {P} [SUN]

56% (bb) {P} [LDPolypVideo]
- F1: , F2: 0.83, AP: 0.82 (bb) {P} [CVC-ClinicDB]

F1: 0.83, F2: 0.83, AP: 0.85 (bb) {P} [CVC-ColonDB]

F1: 0.80, F2: 0.77, AP: 0.79 (bb) {P} [CVC-PolypHD]

F1: 0.72, F2: 0.72, AP: 0.69 (bb) {P} [ETIS-Larib]

F1: 0.81, F2: 0.82, AP: 0.79 (bb) {P} [Kvasir-SEG]

F1: 0.67, F2: 0.62, AP: 0.63 (bb) {P} [PICCOLO]

F1: 0.77, F2: 0.79, AP: 0.77 (bb) {P} [CVC-ClinicVideoDB]

F1: 0.82, F2: 0.81, AP: 0.83 (bb) {P} [KUMC dataset]

F1: 0.78, F2: 0.77, AP: 0.79 (bb) {P} [KUMC dataset–Test]

F1: 0.81, F2: 0.79, AP: 0.81 (bb) {P} [SUN]

F1: 0.52, F2: 0.50, AP: 0.44 (bb) {P} [LDPolypVideo]
Yes (CVC-ClinicDB, CVC-ColonDB, CVC-PolypHD, ETIS-Larib, Kvasir-SEG, PICCOLO, KUMC)

No (CVC-ClinicVideoDB, SUN, LDPolypVideo)
Nogueira-Rodríguez et al. 2023 87.2% (bb) {P} [P]
86.7% (bb) {P2} [P]
87.5% (bb) {P5} [P]
85% (bb) {P10} [P]
88% (bb) {P15} [P]
Intra-dataset Evaluation
Nogueira et al. 2021
89% (bb) {P} [P]
88.2% (bb) {P} [P2]
87.1% (bb) {P} [P5]
85.2% (bb) {P} [P10]
83.6% (bb) {P} [P15]

Not-polyp images increment 2%
89.4% (bb) {P2} [P]
89% (bb) {P2} [P2]
88.6% (bb) {P2} [P5]
87.9% (bb) {P2} [P10]
87.1% (bb) {P2} [P15]

Not-polyp images increment 5%
90.2% (bb) {P5} [P]
89.9% (bb) {P5} [P2]
89.5% (bb) {P5} [P5]
88.8% (bb) {P5} [P10]
88.1% (bb) {P5} [P15]

Not-polyp images increment 10%
90.4% (bb) {P10} [P]
90.2% (bb) {P10} [P2]
90.1% (bb) {P10} [P5]
89.7% (bb) {P10} [P10]
89.5% (bb) {P10} [P15]

Not-polyp images increment 15%
91% (bb) {P15} [P]
90.9% (bb) {P15} [P2]
90.7% (bb) {P15} [P5]
90.4% (bb) {P15} [P10]
90.1% (bb) {P15} [P15]
-
Intra-dataset Evaluation
Nogueira et al. 2021
F1:0.881 (bb) {P} [P]
F1:0.882 (bb) {P} [P2]
F1:0.871 (bb) {P} [P5]
F1:0.852 (bb) {P} [P10]
F1:0.836 (bb) {P} [P15]

Not-polyp images increment 2%
F1:0.880 (bb) {P2} [P]
F1:0.890 (bb) {P2} [P2]
F1:0.886 (bb) {P2} [P5]
F1:0.879 (bb) {P2} [P10]
F1:0.871 (bb) {P2} [P15]

Not-polyp images increment 5%
F1:0.888 (bb) {P5} [P]
F1:0.899 (bb) {P5} [P2]
F1:0.895 (bb) {P5} [P5]
F1:0.888 (bb) {P5} [P10]
F1:0.881 (bb) {P5} [P15]

Not-polyp images increment 10%
F1:0.876 (bb) {P10} [P]
F1:0.902 (bb) {P10} [P2]
F1:0.901 (bb) {P10} [P5]
F1:0.897 (bb) {P10} [P10]
F1:0.895 (bb) {P10} [P15]

Not-polyp images increment 15%
F1:0.895 (bb) {P15} [P]
F1:0.909 (bb) {P15} [P2]
F1:0.907 (bb) {P15} [P5]
F1:0.904 (bb) {P15} [P10]
F1:0.901 (bb) {P15} [P15]

Inter-dataset Evaluation
LDPolypVideo
F1:0.522 (bb) {P} [LDPolypVideo]
F1:0.563 (bb) {P2} [LDPolypVideo]
F1:0.516 (bb) {P5} [LDPolypVideo]
F1:0.491 (bb) {P10} [LDPolypVideo]
F1:0.564 (bb) {P15} [LDPolypVideo]

CVC-ClinicVideoDB
F1:0.774 (bb) {P} [CVC-ClinicVideoDB]
F1:0.803 (bb) {P2} [CVC-ClinicVideoDB]
F1:0.813 (bb) {P5} [CVC-ClinicVideoDB]
F1:0.809 (bb) {P10} [CVC-ClinicVideoDB]
F1:0.800 (bb) {P15} [CVC-ClinicVideoDB]

KUMC dataset
F1:0.818 (bb) {P} [KUMC dataset]
F1:0.811 (bb) {P2} [KUMC dataset]
F1:0.819 (bb) {P5} [KUMC dataset]
F1:0.762 (bb) {P10} [KUMC dataset]
F1:0.831 (bb) {P15} [KUMC dataset]

PICCOLO
F1:0.667 (bb) {P} [PICCOLO]
F1:0.601 (bb) {P2} [PICCOLO]
F1:0.691 (bb) {P5} [PICCOLO]
F1:0.759 (bb) {P10} [PICCOLO]
F1:0.691 (bb) {P15} [PICCOLO]

CVC-ClinicDB
F1:0.845 (bb) {P} [CVC-ClinicDB]
F1:0.843 (bb) {P2} [CVC-ClinicDB]
F1:0.867 (bb) {P5} [CVC-ClinicDB]
F1:0.786 (bb) {P10} [CVC-ClinicDB]
F1:0.824 (bb) {P15} [CVC-ClinicDB]

CVC-ColonDB
F1:0.826 (bb) {P} [CVC-ColonDB]
F1:0.848 (bb) {P2} [CVC-ColonDB]
F1:0.883 (bb) {P5} [CVC-ColonDB]
F1:0.689 (bb) {P10} [CVC-ColonDB]
F1:0.797 (bb) {P15} [CVC-ColonDB]

SUN
F1:0.805 (bb) {P} [SUN]
F1:0.764 (bb) {P2} [SUN]
F1:0.738 (bb) {P5} [SUN]
F1:0.765 (bb) {P10} [SUN]
F1:0.746 (bb) {P15} [SUN]

Kvasir-SEG
F1:0.807 (bb) {P} [Kvasir-SEG]
F1:0.800 (bb) {P2} [Kvasir-SEG]
F1:0.797 (bb) {P5} [Kvasir-SEG]
F1:0.840 (bb) {P10} [Kvasir-SEG]
F1:0.830 (bb) {P15} [Kvasir-SEG]

ETIS-Larib
F1:0.718 (bb) {P} [ETIS-Larib]
F1:0.732 (bb) {P2} [ETIS-Larib]
F1:0.679 (bb) {P5} [ETIS-Larib]
F1:0.594 (bb) {P10} [ETIS-Larib]
F1:0.685 (bb) {P15} [ETIS-Larib]

CVC-PolypHD
F1:0.800 (bb) {P} [CVC-PolypHD]
F1:0.729 (bb) {P2} [CVC-PolypHD]
F1:0.826 (bb) {P5} [CVC-PolypHD]
F1:0.820 (bb) {P10} [CVC-PolypHD]
F1:0.820 (bb) {P15} [CVC-PolypHD]
Yes (PIBAdb, CVC-ClinicDB, CVC-ColonDB, CVC-PolypHD, ETIS-Larib, Kvasir-SEG, PICCOLO, KUMC)
No (CVC-ClinicVideoDB, SUN, LDPolypVideo)

Polyp Classification

Performance metrics on public and private datasets of all polyp classification studies.

  • The training dataset is given in curly brackets, where "P" stands for private.
  • The test dataset used for computing the performance metric is given in square brackets, where "P" stands for private.
  • For instance, {[P]} means that development and test splits of the same private dataset have been used for training and testing respectively.
  • Performances marked with an * are reported on training datasets (e.g. k-fold cross-validation).
Study Classes Recall (sensitivity) Specificity PPV NPV Others Polyp-level vs. frame-level Dataset type
Zhang R. et al. 2017 Adenoma vs. hyperplastic

Resectable vs. non-resectable

Adenoma vs. hyperplastic vs. serrated
92% (resectable vs. non-resectable) {[Colonoscopic Dataset]}

87.6% (adenoma vs. hyperplastic) {[P]}
89.9% (resectable vs. non-resectable) {[Colonoscopic Dataset]}

84.2% (adenoma vs. hyperplastic) {[P]}
95.4% (resectable vs. non-resectable) {[Colonoscopic Dataset]}

87.30% (adenoma vs. hyperplastic) {[P]}
84.9% (resectable vs. non-resectable) {[Colonoscopic Dataset]}

87.2% (adenoma vs. hyperplastic) {[P]}
Acc: 91.3% (resectable vs. non-resectable) {[Colonoscopic Dataset]}

Acc: 86.7% (adenoma vs. serrated adenoma vs. hyperplastic) {[Colonoscopic Dataset]}

Acc: 85.9% (adenoma vs. hyperplastic) {[P]}
frame video (manually selected images)
Byrne et al. 2017 Adenoma vs. hyperplastic 98% {P1} [P2] 83% {P1} [P2] 90% {P1} [P2] 97% {P1} [P2] - polyp unaltered video
Chen et al. 2018 Neoplastic vs. hyperplastic 96.3% {P1} [P2] 78.1% {P1} [P2] 89.6% {P1} [P2] 91.5% {P1} [P2] N/A frame image dataset
Lui et al. 2019 Endoscopically curable lesions vs. endoscopically incurable lesions 88.2% {P1} [P2] 77.9% {P1} [P2] 92.1% {P1} [P2] 69.3% {P1} [P2] Acc: 85.5% {P1} [P2] frame image dataset
Kandel et al. 2019 Hyperplastic vs. serrated adenoma (near focus)

Hyperplastic vs. adenoma (far focus)
57.14% (hyperplastic vs. serrated) {P} *

75.63% (hyperplastic vs. adenoma) {P} *
68.52% (hyperplastic vs. serrated) {P} *

63.79% (hyperplastic vs. adenoma) {P} *
N/A N/A Acc: 67.21% (hyperplastic vs. serrated) {P} *

Acc: 72.48% (hyperplastic vs. adenoma) {P} *
frame image dataset
Zachariah et al. 2019 Adenoma vs. serrated 95.7% {P} * 89.9% {P} * 94.1% {P} * 92.6% {P} * Acc: 93.6%, F1: 0.948, F2: 0.953 {P} * polyp image dataset
Bour et al. 2019 Not dangerous vs. dangerous vs. cancer 88% (Cancer vs. others) [P]

84% (Not dangerous vs. others) [P]

90% (Dangerous vs. others) [P]
94% (Cancer vs. others) [P]

93% (Not dangerous vs. others) [P]

93% (Dangerous vs. others)
88% (Cancer vs. others) [P]

87% (Not dangerous vs. others) [P]

86% (Dangerous vs. others)
N/A Acc: 87.1% [P]

F1: 0.88 (Cancer vs. others) [P]

F1: 0.86 (Not dangerous vs. others) [P]

F1: 0.88 (Dangerous vs. others)
frame image dataset
Patino-Barrientos et al. 2020 Malignant vs. non-malignant 86% {[P]} N/A 81% {[P]} N/A Acc: 83% {[P]}

F1: 0.83 {[P]}
frame image dataset
Cheng Tao Pu et al. 2020 5-class (I, II, IIo, IIIa, IIIb)

Adenoma (classes II + IIo + IIIa) vs. hyperplastic (class I)
97% (adenoma vs. hyperplastic) {P: AU} *

100% (adenoma vs. hyperplastic) {P: AU} [P: JP-NBI]

100% (adenoma vs. hyperplastic) {P: AU} [P: JP-BLI]
51% (adenoma vs. hyperplastic) {P: AU} *

0% (adenoma vs. hyperplastic) {P: AU} [P: JP-NBI]

0% (adenoma vs. hyperplastic) {P: AU} [P: JP-BLI]
95% (adenoma vs. hyperplastic) {P: AU} *

82.4% (adenoma vs. hyperplastic) {P: AU} [P: JP-NBI]

77.5% (adenoma vs. hyperplastic) {P: AU} [P: JP-BLI]
63.5% (adenoma vs. hyperplastic) {P: AU} *

- (adenoma vs. hyperplastic) {P: AU} [P: JP-NBI]

- (adenoma vs. hyperplastic) {P: AU} [P: JP-BLI]
AUC (5-class): 94.3% {P: AU} *

AUC (5-class): 84.5% {P: AU} [P: JP-NBI]

AUC (5-class): 90.3% {P: AU} [P: JP-BLI]

Acc: 72.3% (5-class) {P: AU} *

Acc: 59.8% (5-class) {P: AU} [P: JP-NBI]

Acc: 53.1% (5-class) {P: AU} [P: JP-BLI]

Acc: 92.7% (adenoma vs. hyperplastic) {P: AU} *

Acc: 82.4% (adenoma vs. hyperplastic) {P: AU} [P: JP-NBI]

Acc: 77.5% (adenoma vs. hyperplastic) {P: AU} [P: JP-BLI]
frame image dataset
Ozawa et al. 2020 Adenoma vs. hyperplastic vs. SSAP vs. cancer vs. other types 97% (adenoma vs. other classes) {P1} [P2: WL]

90% (adenoma vs. hyperplastic) {P1} [P2: WL]

97% (adenoma vs. other classes) {P1} [P2: NBI]

86% (adenoma vs. hyperplastic) {P1} [P2: NBI]


81% (adenoma vs. hyperplastic) {P1} [P2: WL]

88% (adenoma vs. hyperplastic) {P1} [P2: NBI]
86% (adenoma vs. other classes) {P1} [P2: WL]

98% (adenoma vs. hyperplastic) {P1} [P2: WL]

83% (adenoma vs. other classes) {P1} [P2: NBI]

98% (adenoma vs. hyperplastic) {P1} [P2: NBI]
85% (adenoma vs. other classes) {P1} [P2: WL]

48% (adenoma vs. hyperplastic) {P1} [P2: WL]

91% (adenoma vs. other classes) {P1} [P2: NBI]

54% (adenoma vs. hyperplastic) {P1} [P2: NBI]
Acc: 83% (5-class) {P1} [P2: WL]

F1: 0.91, F2: 0.88 (adenoma vs. other classes) {P1} [P2: WL]

F1: 0.94, F2: 0.96 (adenoma vs. hyperplastic) {P1} [P2: WL]

Acc: 81% (5-class) {P1} [P2: NBI]

F1: 0.89, F2: 0.85 (adenoma vs. other classes) {P1} [P2: NBI]

F1: 0.92, F2: 0.95 (adenoma vs. hyperplastic) {P1} [P2: NBI]
frame image dataset
Young Joo Yang et al. 2020 7-class (CRC T1 vs. CRC T2 vs. CRC T3 vs. CRC T4 vs. high-grade dysplasia (HGD) vs. tubular adenoma with or without low grade dysplasia (TA) vs. non-neoplastic lesions)

4-class (advanced CRC (T2, T3, and T4) vs. early CRC/HGD (CRC T1 and HGD) vs. TA vs. non-neoplastic lesions)

Advanced colorectal lesions vs. non-advanced colorectal lesions

Neoplastic lesions vs. non-neoplastic lesions
94.1% (Neoplastic vs. non-neoplastic) {[P1]}

83.2% (Advanced vs. non-advanced) {[P1]}
34.1% (Neoplastic vs. non-neoplastic) {[P1]}

89.7% (Advanced vs. non-advanced) {[P1]}
86.1% (Neoplastic vs. non-neoplastic) {[P1]}

84.5% (Advanced vs. non-advanced) {[P1]}
65% (Neoplastic vs. non-neoplastic) {[P1]}

88.7% (Advanced vs. non-advanced) {[P1]}
Acc: 79.5%, F1: 0.899, F2: 0.923, AUC: 0.832 (Neoplastic vs. non-neoplastic) {[P1]}

Acc: 93.5%, F1: 0.838, F2: 0.934, AUC: 0.935 (Advanced vs. non-advanced) {[P1]}

Acc: 71.5%, AUC: 0.760 (Neoplastic vs. non-neoplastic) {P1} [P2]

Acc: 87.1%, AUC: 0.935 (Advanced vs. non-advanced) {P1} [P2]

Acc (7-class): 60.2% {[P1]} 74.7% {P1} [P2]

Acc (4-class): 67.7% {[P1]} 76% {P1} [P2]
frame image dataset
Li K. et al. 2021 Adenoma vs. hyperplastic 86.8% {[KUMC]} N/A 85.8% {[KUMC]} N/A F1: 0.863 {[KUMC]} polyp image dataset
Yoshida et al. 2021 Neoplastic vs. hyperplastic 91.7% {CAD EYE} [P non-magnified BLI]

90.9% {CAD EYE} [P-magnified BLI]
86.8% {CAD EYE} [P non-magnified BLI]

85.2% {CAD EYE} [P-magnified BLI]
82.5% {CAD EYE} [P non-magnified BLI]

83.3% {CAD EYE} [P-magnified BLI]
93.9% {CAD EYE} [P non-magnified BLI]

92.0% {CAD EYE} [P-magnified BLI]
Acc: 88.8% {CAD EYE} [P non-magnified BLI]

Acc: 87.8% {CAD EYE} [P-magnified BLI]
polyp live video
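
The F1 and F2 values in the "Others" column can be related to the recall (sensitivity) and PPV (precision) columns through the standard F-beta formula. The following is a minimal sketch, assuming the studies use the usual definition; the function name `f_beta` is illustrative and does not come from any of the reviewed papers. Small differences from the reported values are expected because the inputs are already rounded.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score from precision (PPV) and recall (sensitivity), both in [0, 1].

    beta = 1 weighs precision and recall equally (F1);
    beta = 2 weighs recall more heavily than precision (F2).
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both inputs are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values taken from the Li K. et al. 2021 row (adenoma vs. hyperplastic, KUMC):
# recall (sensitivity) 86.8%, PPV 85.8%, reported F1 0.863.
f1_li = f_beta(precision=0.858, recall=0.868)
print(round(f1_li, 3))
```

Under this assumption the computed F1 agrees with the 0.863 reported for that study.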

Simultaneous Polyp Detection and Classification

Performance metrics on public and private datasets of all simultaneous polyp detection and classification studies.

  • Curly brackets specify the training dataset, where "P" stands for private.
  • Square brackets specify the test dataset used for computing the performance metric, where "P" stands for private.
  • For instance, {[P]} means that the development and test splits of the same private dataset have been used for training and testing, respectively.
  • AP_IoU stands for Average Precision and mAP_IoU for Mean Average Precision (i.e. the mean of the per-class APs), both calculated at the specified IoU (Intersection over Union) threshold; for example, AP0.5 is the AP at an IoU of 0.5.
| Study | Classes | AP | mAP | Recall (sensitivity) | Specificity | PPV | NPV | Others | Manually selected images? |
|---|---|---|---|---|---|---|---|---|---|
| Liu X. et al. 2019 | Polyp vs. adenoma | Polyp: AP0.5 = 83.39% {[P]}<br>Adenoma: AP0.5 = 97.90% {[P]} | mAP0.5 = 90.645% {[P]} | N/A | N/A | N/A | N/A | N/A | Yes |
| Li K. et al. 2021 | Adenoma vs. hyperplastic | Adenoma: AP = 81.1% {[KUMC]}<br>Hyperplastic: AP = 65.9% {[KUMC]} | mAP = 73.5% {[KUMC]} | 61.3% {[KUMC]} | 86.3% {[KUMC]} | 92.2% {[KUMC]} | 49.1% {[KUMC]} | F1: 0.736 {[KUMC]} | Yes |
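
The mAP values reported for both studies can be reproduced as the plain mean of the per-class APs, as the notes on this table state. The sketch below illustrates that relation together with the IoU computation that decides whether a predicted box counts as a match at a given threshold. The helper names and the two example boxes are hypothetical and are not taken from either study; only the AP figures come from the table.

```python
from statistics import mean


def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Width and height of the overlap region (zero when the boxes are disjoint).
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_average_precision(ap_per_class):
    """mAP as the unweighted mean of the per-class AP values."""
    return mean(ap_per_class.values())


# Per-class APs (in %) as listed in the table.
liu_2019 = {"polyp": 83.39, "adenoma": 97.90}
li_2021 = {"adenoma": 81.1, "hyperplastic": 65.9}

print(round(mean_average_precision(liu_2019), 3))
print(round(mean_average_precision(li_2021), 1))

# Hypothetical boxes: a prediction shifted by half its width from the ground truth.
overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
print(overlap >= 0.5)  # whether this prediction would match at an IoU threshold of 0.5
```

The two means agree with the 90.645% and 73.5% mAP figures in the table, which suggests both studies average the class APs without weighting.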

List of Acronyms and Abbreviations

  • AP: Average Precision.
  • BLI: Blue Light Imaging.
  • LCI: Linked-Color Imaging.
  • mAP: Mean Average Precision.
  • NBI: Narrow Band Imaging.
  • SSAP: Sessile Serrated Adenoma/Polyp.
  • WCE: Wireless Capsule Endoscopy.
  • WL: White Light.

References and Further Reading

Reviews

Datasets

Randomized Clinical Trials

| Study | Title | Date | Number of patients |
|---|---|---|---|
| Wang et al. 2019 | Real-time automatic detection system increases colonoscopic polyp and adenoma detection rates: a prospective randomised controlled study | Sep. 2019 | 1058 |
| Gong et al. 2020 | Detection of colorectal adenomas with a real-time computer-aided system (ENDOANGEL): a randomised controlled study | Jan. 2020 | 704 |
| Wang et al. 2020 | Effect of a deep-learning computer-aided detection system on adenoma detection during colonoscopy (CADe-DB trial): a double-blind randomised study | Jan. 2020 | 1010 |
| Liu et al. 2020 | Study on detection rate of polyps and adenomas in artificial-intelligence-aided colonoscopy | Feb. 2020 | 1026 |
| Su et al. 2019 | Impact of a real-time automatic quality control system on colorectal polyp and adenoma detection: a prospective randomized controlled study (with videos) | Feb. 2020 | 659 |
| Repici et al. 2020 | Efficacy of Real-Time Computer-Aided Detection of Colorectal Neoplasia in a Randomized Trial | Aug. 2020 | 685 |