
Authors

Qin Wang, Hui Che, Weizhen Ding, Li Xiang, Guanbin Li, Zhen Li, Shuguang Cui

Abstract

Differentiation of colorectal polyps is an important clinical examination. A computer-aided diagnosis system is required to assist in performing accurate diagnosis from colonoscopy images. Most previous studies attempt to develop models for polyp differentiation using Narrow-Band Imaging (NBI) or other enhanced images. However, the limited adoption of these imaging techniques restricts the clinical usage scenarios of the developed models. Considering the priority of using white light (WL) in examinations, in this paper we propose a novel framework based on a teacher-student architecture for the colorectal polyp classification (CPC) task directly from WL colonoscopy images. In practice, NBI data is utilized to train a teacher network and guide a student network to learn richer feature representations from WL images. The feature transfer is realized by domain alignment and contrastive learning. Eventually, the final student network is able to extract aligned features from WL images to facilitate CPC. Besides, we release the first publicly available paired CPC dataset containing WL-NBI pairs for alignment training purposes. Quantitative and qualitative evaluation indicates that the proposed method outperforms the previous method on CPC, improving accuracy by 5.6%.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87234-2_3

SharedIt: https://rdcu.be/cyl7Y

Link to the code repository

https://github.com/qinwang-ai/PolypsAlign

Link to the dataset(s)

https://drive.google.com/drive/folders/1e2t5HhQf08sTAE_CPRNVgpi6YUKgQSHn?usp=sharing


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes colorectal polyp image classification in white-light images by training a network with paired narrow-band imaging (NBI) and white-light (WL) images. The network training incorporates a student-teacher learning algorithm so that the student extracts features from WL images similar to those of the teacher (NBI). This is further refined with a contrastive triplet loss that minimizes the distance between positive pairs (WL-NBI) and maximizes the distance between positives and negatives (aligned vs. unaligned WL features). The dataset of paired WL-NBI images is also made available.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Training the network to learn NBI-type features from WL images.
    2. Combining student-teacher algorithm with contrastive learning to improve domain alignment.
    3. Ablation studies to evaluate the contribution of the different loss functions.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The dataset is an imbalanced dataset with 307 adenomas and 116 hyperplastic images. Accuracy as a metric is thus insufficient for evaluating the proposed network. Sensitivity, Specificity and F1-score need to be included to better evaluate the model.
    2. The proposed approach is missing a generalizability analysis. Training on the hospital data and testing on the public ISIT-UMR dataset (and possibly vice versa) could be included to better understand generalizability.
    3. The dataset used crops the image to focus on the polyp region. This does not account for the challenge of localizing the polyp in real-time colonoscopy images. An evaluation and/or discussion on the same is missing.
    4. The approach in [14], which is used as a baseline, was trained on whole polyp images and not cropped ones. This makes the comparison unfair. Alternative baseline approaches should be evaluated and/or the proposed algorithm trained on whole polyp images.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The dataset appears to have been stored on temporary file storage and is currently not available (the link has expired). Since this is one of the contributions listed, it would be better to store the data in a more permanent location.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. With the dataset being imbalanced, including additional evaluation metrics such as sensitivity, specificity and F1-score would be beneficial.
    2. The proposed analysis can be strengthened further with a generalizability analysis, i.e., training on the hospital data and testing on the public ISIT-UMR dataset (and possibly vice versa). Use of other publicly available polyp datasets as external validation sets is another avenue.
    3. The dataset used crops the image to focus on the polyp region. This does not account for the challenge of localizing the polyp in real-time colonoscopy images. An evaluation and/or discussion on the same is missing.
    4. Additional baseline algorithms can be evaluated.
    5. In Figure 3, the similarity between aligned and NBI features is not clearly visible, specifically in columns 5 & 6. Also, a colormap/scale for the feature representations in the image would be useful for interpreting the feature strength/magnitude.
    6. Another interesting analysis that could be included in future work would be to observe how the alignment space changes as each loss is included individually.
    7. Were any serrated adenomas included in the dataset? If so, providing additional class labels when releasing the dataset could be beneficial to the research community.
  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The primary concerns were:

    1. Insufficient evaluation metrics
    2. Potentially unfair baseline approach which was designed for whole polyp images.
    3. Unknown performance on whole polyp images.
  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    The authors propose a method to transform features extracted from white light images into a more discriminative domain (similar to the ones that could be obtained using NBI).

    The authors propose a method that uses student-teacher networks in a GAN-style formulation, training on paired WL and NBI images to bring the WL feature domain closer to the NBI feature domain, which is known to be more discriminative.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is very interesting; it fuses relevant state-of-the-art techniques to improve the accuracy of methods on WL images by “learning to translate” them to a different, more discriminative domain, similar to when the NBI imaging modality is available.

    While the techniques used in the work are not 100% new, the formulation is interesting and can be applied to many other similar problems. It fuses solutions from image translation (but in the feature space) with student-teacher architectures in order to solve the domain transferability problem.

    The approach could be seen as “teaching the network to artificially see beyond WL images as if they were NBI images”, and it reminds me of similar problems such as monocular depth estimation or MRI <> CT translation. Only in this case the translation is not the output but happens in the background as a means to improve accuracy when only WL is available.

    The authors release a new dataset of paired images (WL <> NBI).

    Domain transferability is a very relevant topic that has not been studied enough.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper is generally good, well written and easy to follow. It is very interesting, but the comparisons to other (or similar) SOTA methods are limited.

    The methods used in the paper are not novel, and similar techniques have been used in other computer vision fields. The application is novel.

    Limited technical novelty and in-depth analysis of the proposed loss functions. E.g., the components of the loss function in Eq. (4) are uniformly weighted; I assume the different components have different scales and their contributions would have to be weighted, but there is limited analysis of the impact of the individual losses or methods on the final accuracy of the system.

    The datasets in this work are rather small - unless I misunderstood the number of images, in which case the text needs clarification. The work would benefit from further validation on larger datasets.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors propose to release the dataset and provide a link to it, but the link was dead at the time of my review. Source code is provided in a ZIP as supplementary material, and I suppose it will be uploaded to GitHub or another public repository later.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Read above.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The approach is interesting and relevant. But the experimental results lack sufficient comparison and the dataset seems small, which makes it difficult to validate the technical novelty of the paper.

    The approach seems to be novel and interesting.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    4

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    The authors proposed a deep learning model that utilizes a domain alignment strategy to classify colorectal polyps from white-light colonoscopy images. Besides, they release the first publicly available paired CPC dataset containing white-light/narrow-band imaging pairs.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) The release of a new dataset is good for promoting research in this field. (2) The definition of the colorectal polyp classification task is well described. (3) The authors compared the baseline method and the proposed method on the dataset, and the proposed method achieved superior classification performance.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) The compared methods are limited to several deep learning architectures. However, when tackling small-scale medical imaging datasets, traditional machine learning strategies are very competitive. (2) The performance of the domain alignment is not discussed.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors provide enough information for reproducing the reported results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    (1) Since the release of a new dataset is among the main contributions of this submission, more traditional machine learning methods (handcrafted features + classifier structures) should be discussed. (2) It would be better to show a visualization (such as t-SNE) of domain-aligned vs. non-domain-aligned features in the latent space, to demonstrate whether the proposed method can learn a common representation across the different image domains.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    (1) Presentation and discussion of the proposed method. (2) The novelty and reproducibility of this study.

  • What is the ranking of this paper in your review stack?

    4

  • Number of papers in your stack

    7

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Reviewers commented that a strength of the paper is the incorporation of transfer learning via the student-teacher learning architecture. Another strength is the public release of the colorectal polyp classification dataset, which contains paired white-light and narrow-band colonoscopy images for training.

    The novelty lies in the application, which, combined with the public release of the data, adds to the strength of the paper.

    The authors are invited to comment on the insufficient evaluation metrics, the size of the dataset, and the concerns about the experimental results.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5




Author Feedback

We thank all the reviewers for their careful consideration and helpful suggestions. We will further polish our paper. The common questions are answered first, then we reply to individual reviewers.

Novelties and Contributions. In this paper, we focus on direct colorectal polyp classification (CPC) using WL images. This task is of great clinical significance because it avoids manually switching between imaging modalities, which is also confirmed by the reviewers and the meta-reviewer. Besides, we release the first publicly available polyp classification dataset, named CPC-Paired, including WL-NBI image pairs. Furthermore, we propose a teacher-student model with domain alignment to improve CPC accuracy.
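For illustration, a minimal PyTorch-style sketch of the kind of teacher-student feature alignment with a contrastive triplet objective described above; the feature dimensions, margin, and distance choice are assumptions made for this sketch, not the exact formulation from the paper.

```python
import torch
import torch.nn.functional as F

def alignment_triplet_loss(wl_feat, nbi_feat, neg_wl_feat, margin=1.0):
    """Pull the student's WL feature toward the teacher's NBI feature of the
    same polyp (positive pair) and push it away from an unpaired feature
    (negative). Margin and L2 distance are illustrative assumptions."""
    d_pos = F.pairwise_distance(wl_feat, nbi_feat)      # paired WL-NBI distance
    d_neg = F.pairwise_distance(wl_feat, neg_wl_feat)   # unaligned (negative) distance
    return F.relu(d_pos - d_neg + margin).mean()

# Toy usage with random features (batch of 8, 512-dim embeddings).
wl  = torch.randn(8, 512)   # student features from WL images
nbi = torch.randn(8, 512)   # teacher features from paired NBI images
neg = torch.randn(8, 512)   # features from non-matching images
loss = alignment_triplet_loss(wl, nbi, neg)
```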

Reviewer #1

Q1: Additional evaluation metrics for imbalanced data. A1: On the imbalanced dataset, the classification accuracies for adenomas and hyperplastic lesions are 91.5% and 77.4%, respectively; on the balanced dataset, they are 87.2% and 80.6%. The F1-score for the latter is 88.7%. The ROC curve will be included in the final version.
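For reference, the per-class sensitivity, specificity, and F1-score requested by the reviewer can be derived from the binary confusion matrix; a minimal scikit-learn sketch with made-up predictions (not the paper's results):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

# Illustrative placeholder labels/predictions (1 = adenoma, 0 = hyperplastic).
y_true = np.array([1, 1, 1, 0, 0, 1, 0, 1])
y_pred = np.array([1, 0, 1, 0, 1, 1, 0, 1])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # recall for the adenoma class
specificity = tn / (tn + fp)   # recall for the hyperplastic class
f1 = f1_score(y_true, y_pred)
print(sensitivity, specificity, f1)
```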

Q2: Generalizability analysis. A2: Thank you for suggesting the generalizability analysis. When training on the hospital data and testing on ISIT-UMR, the accuracies are 81.3% for adenomas and 71.4% for hyperplastic lesions, and the F1-score is 83.9%, which indicates good generalization capacity. More results will be added in the final version.

Q3: Cropped polyp regions for real-time applications. A3: In this paper we focus only on the CPC task with WL images. Such cropped regions can be readily obtained by previous methods; e.g., HarDNet-MSEG achieves a high accuracy of 0.969 with a fast inference speed of 86.7 FPS. For the CPC task, our model runs at 58.6 FPS. Hence, if the two models are combined, the overall throughput is around 35.0 FPS, which still satisfies the real-time requirement in the clinic.
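The ~35.0 FPS figure is consistent with running the segmentation and classification models sequentially per frame so that their latencies add; a quick check under that (assumed) serial-pipeline model:

```python
# Combined throughput of two stages run back-to-back per frame.
seg_fps, cls_fps = 86.7, 58.6                        # HarDNet-MSEG, CPC model
combined_fps = 1.0 / (1.0 / seg_fps + 1.0 / cls_fps)
print(round(combined_fps, 1))                        # ~35.0
```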

Q4: Training on whole polyp images.

A4: For a fair comparison with Yang et al. [14], our model is also trained at a resolution of 400x400, but our original images are much larger than that; hence, cropping is applied. Besides, all models are trained on the same datasets to ensure a fair comparison.

Q5: Experimental details. A5: We focus on the hyperplastic and adenomatous categories in our dataset and the ISIT-UMR dataset, which are more important for clinical diagnosis. Serrated adenomas will be labeled in future work. Columns 5 & 6 in Figure 3 will be improved accordingly.

Reviewer #3

Q1: Comparison with other SOTA methods. A1: This is a new task of great clinical importance. The limited existing work studies the classification performance of various popular CNNs on multi-modal data. In the experiments, we select VGG, InceptionV3, and ResNet50 as backbones. On the CPC task with WL colonoscopy images as input, our method outperforms all the previous methods.

Q2: The work would benefit from further validation on larger datasets. A2: Our paired dataset includes white-light and NBI images for 307 adenomatous and 116 hyperplastic lesions, containing 846 images in total, which is currently the largest for the CPC task. Note that in the clinic, it is difficult for endoscopists to collect paired images at the same position because of the colonoscopy device and bowel movement. We will progressively collect a larger dataset. Besides, the dataset and code will be made publicly available upon acceptance.

Q3: Details of the loss function in Eq. (4). A3: We will add the weighting hyper-parameters in the final version.
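As an illustration of what a weighted version of Eq. (4) could look like, a minimal sketch; the term names and lambda values are hypothetical placeholders, not the paper's tuned settings:

```python
# Hypothetical weights for the loss terms of Eq. (4); in practice they would be
# tuned on a validation set.
lambda_cls, lambda_align, lambda_contrast = 1.0, 0.5, 0.1

def total_loss(l_cls, l_align, l_contrast):
    return lambda_cls * l_cls + lambda_align * l_align + lambda_contrast * l_contrast
```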

Reviewer #4

Q1: Comparisons with traditional machine learning methods. A1: The classification accuracy of an SVM with WL images is only 78%, much lower than ours; this result will be included in the final version.
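For context, a rough sketch of the kind of traditional baseline mentioned above (handcrafted features + SVM); the color-histogram features and dummy data are our assumptions, not the exact pipeline behind the 78% figure:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def color_histogram(img, bins=16):
    """Concatenate per-channel intensity histograms of an RGB patch (H, W, 3)."""
    return np.concatenate(
        [np.histogram(img[..., c], bins=bins, range=(0, 255))[0] for c in range(3)]
    ).astype(float)

# Dummy stand-ins for WL crops and labels (1 = adenoma, 0 = hyperplastic).
rng = np.random.default_rng(0)
wl_images = [rng.integers(0, 256, size=(224, 224, 3)) for _ in range(20)]
labels = rng.integers(0, 2, size=20)

X = np.stack([color_histogram(img) for img in wl_images])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X, labels)
```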

Q2: Visualization (such as t-SNE) of aligned features in the latent space. A2: Thanks for the comment; we will provide a better feature-space visualization figure using t-SNE.
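A minimal sketch of the t-SNE visualization being discussed, assuming the aligned (student) and NBI (teacher) features are available as NumPy arrays; the array names and sizes are placeholders:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Placeholder feature matrices; in practice these are the extracted embeddings.
rng = np.random.default_rng(0)
wl_aligned = rng.normal(size=(100, 512))   # student features from WL images
nbi_feats = rng.normal(size=(100, 512))    # teacher features from NBI images

feats = np.concatenate([wl_aligned, nbi_feats])
emb = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(feats)

plt.scatter(emb[:100, 0], emb[:100, 1], label="aligned WL", alpha=0.6)
plt.scatter(emb[100:, 0], emb[100:, 1], label="NBI", alpha=0.6)
plt.legend()
plt.savefig("tsne_alignment.png")
```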




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors incorporate transfer learning using the student-teacher learning architecture in a novel manner and apply it to the problem of colorectal polyp classification. Additionally, the authors propose to publicly release the dataset of paired white-light and narrow-band colonoscopy images so that other networks can be trained on it. This is commendable.

    The novelty lies in the application, which, combined with the public release of the data, adds to the strength of the paper.

    The authors were invited to comment on the insufficient evaluation metrics, the size of the dataset, and the concerns about the experimental results.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    3



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors mentioned they will include the AUC results, but it would have been good to see them in the rebuttal. The authors also mentioned that they will include a generalisability analysis, training on the hospital data and testing on the public ISIT-UMR dataset (and vice versa), but again, it would have been good to see the full results in the rebuttal. Experimental details were clarified. Given the remaining issues, I believe the paper is not yet mature enough to be accepted to MICCAI, so it should be rejected.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    17



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    A valid link for the released data is still not provided in the rebuttal. Considering that a major contribution of this paper is the release of a dataset, and given the insufficient experimental evaluation, the data need to be released publicly.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    7


