
Authors

Anuja Vats, Marius Pedersen, Ahmed Mohammed, Øistein Hovde

Abstract

The progress in Computer Aided Diagnosis (CADx) of Wireless Capsule Endoscopy (WCE) is thwarted by the lack of data. The scarcity of data richly representing healthy and abnormal conditions results in isolated analyses of pathologies that cannot handle realistic multi-pathology scenarios. In this work, we explore how to learn more for free from limited data by solving a multicentric, multi-pathology WCE classification problem. Learning more here means learning more than full supervision would allow with the same data. This is done by combining self-supervision with full supervision under multi-task learning. Additionally, we draw inspiration from the Human Visual System (HVS) in designing self-supervision tasks and investigate whether seemingly ineffectual signals within the data itself can be exploited to gain performance and, if so, which signals are better than others. Further, we present our analysis of the high-level features as a stepping stone towards more robust multi-pathology CADx in WCE. Code accompanying this work will be made available on GitHub.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87234-2_1

SharedIt: https://rdcu.be/cyl7W

Link to the code repository

N/A

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

In this work, the authors propose a novel Multi Task Learning (MTL) framework that combines two types of tasks: a supervised pathology classification task (SPT) and a self-supervised (SS) distortion-level classification task (SSDT).
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
Some strengths of the paper are listed below.
1. The authors claim to be the first to perform multicentric pathology classification on WCE using MTL and SS.
2. The method increases pathology classification accuracy by 7% over Single Task (ST) pathology classification.
3. The authors discover that the clusters of similarity that emerge automatically in a high-level feature space correspond to those manually identified in other works.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. Some text should be converted into tables so that it is more visible and clearer to the reader.
2. The images in the figures are of poor quality and should be replaced in accordance with the provided template.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

NA
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
The authors have contributed the paper entitled “Learning More for Free - A Multi Task Learning Approach for Improved Pathology Classification in Capsule Endoscopy”. The paper is very interesting and very well written. I would like to offer the following comments:
1. The authors could use more tables to explain the techniques and datasets more clearly.
2. How does the method perform on WCE frames in which a pathology is obscured or blurred by bubbles or bile present in the same frame?
3. How does this experiment correlate task uncertainties with the nature of the tasks?
4. How did you create the dataset for the various clusters, given that many of the characteristics may overlap?
Please state your overall opinion of the paper

accept (8)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The authors provide good technical work on the proposed methodology and have clarified the techniques and datasets used in this work. The result analysis and further investigation adequately validate the work.
What is the ranking of this paper in your review stack?

2
Number of papers in your stack

5
Reviewer confidence

Confident but not absolutely certain

Review #2

Please describe the contribution of the paper

Applying multi-task learning and multi-level supervision to classify pathologies in wireless capsule endoscopy images.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Novel objective function integrating a multi-task objective with semi-supervised classification.
- Fig 4 is hard to parse (too small) but is interesting in that it reveals commonalities in the representations.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- It is not clear what the clinical relevance is (do experts perform this task poorly? What is gained?).
- It is not entirely clear why MTL and multiple levels of supervision are required here. Would a more naive approach perform poorly?
- The two tasks appear to be (1) classifying for a pathology and (2) classifying for the presence of an artifact (termed distortion). Why classify the distortion level?
- Without validation curves, Fig 2b does not show that there is less overfitting.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Data description is incomplete. Experimental setup is not fully described.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
- In Equation 2, please clarify what sigma represents: variance of what? Classifier variance is different from classifier uncertainty.
- Clarify the experimental design: was this a one-vs-all setup or a 3-class problem? What was the class distribution? How much optimization?
- What are Config 1 and Config 2 as “design choices”?
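Equation 2 is not reproduced on this page, so the following is only a sketch of what Reviewer 2's question about sigma could refer to. It assumes the loss follows the widely used homoscedastic task-uncertainty weighting of Kendall et al. (2018), classification form, in which each task loss L_i is scaled by 1/sigma_i^2 and regularised by log sigma_i. Under that assumption, sigma_i is a learned per-task scale, not the variance of the classifier's outputs. The function name and loss values are hypothetical, and the log-variances are plain floats here, whereas in training they would be learnable parameters updated with the network weights.

```python
import math


def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses with learned task uncertainty (assumed form).

    Assumed objective (Kendall et al., 2018, classification variant):
        total = sum_i (1 / sigma_i^2) * L_i + log(sigma_i)
    parameterised with s_i = log(sigma_i^2) for numerical stability, so
    1 / sigma_i^2 = exp(-s_i) and log(sigma_i) = 0.5 * s_i.
    """
    if len(task_losses) != len(log_vars):
        raise ValueError("need exactly one log-variance per task")
    total = 0.0
    for loss, s in zip(task_losses, log_vars):
        total += math.exp(-s) * loss + 0.5 * s
    return total


# Hypothetical per-batch losses for the SPT (pathology) and SSDT
# (distortion-level) heads.
spt_loss, ssdt_loss = 0.9, 0.4

# sigma_i = 1 for both tasks (s_i = 0): the result is spt_loss + ssdt_loss.
print(uncertainty_weighted_loss([spt_loss, ssdt_loss], [0.0, 0.0]))

# A larger s for the auxiliary task down-weights it, at a log-penalty cost.
print(uncertainty_weighted_loss([spt_loss, ssdt_loss], [0.0, 1.0]))
```

The log term is what stops the optimiser from trivially inflating every sigma to zero out the losses. A task whose loss stays noisy ends up with a larger sigma and therefore a smaller effective weight.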
Please state your overall opinion of the paper

borderline reject (5)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The core idea is interesting, but the experimental design and validation lack detail. There is also no strong motivation for the work as a whole or for the approach specifically.
What is the ranking of this paper in your review stack?

4
Number of papers in your stack

4
Reviewer confidence

Very confident

Review #3

Please describe the contribution of the paper

The authors use MTL with two levels of supervision to classify lesions in WCE images. They use a public database (CAD-CAP) in this work and study the influence of SSDTs on performance.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper is well written, and the method seems novel and innovative. The improvements that MTL brings to this problem are clear, and the explanation of the method itself is well constructed.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- I have many doubts about the experiments done by the authors. They use a “public” database (CAD-CAP) and state that they use “1800 images belonging to three classes “normal”, “inflammatory lesion including ulcers” and “vascular lesion””. This database has more than 25,000 images, and according to the division of this database, the three classes the authors state comprise more than 24,000 images. It is not explained why only some of the images were used, and the authors should specify which images were used and which were not. Another point that should be clarified is where the authors analyse the “medically-perceived similarity” and the “representational similarity”. They analyze different “medical features”, but they never state how this classification was done, or how many images lie in the other clusters. For example, it is hard to see the differences between the lumen and the mucosal folds clusters when looking at the images. Was this done by physicians?
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Although it is stated that the database is public, it is not available anywhere. It seems it was made available only for a specific challenge and has not yet been made publicly available. The authors did not use all the images of the database and did not specify why, or which images were used.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

You should improve the explanations of the experiments and the result analysis in the “A note on similarity” section.
Please state your overall opinion of the paper

borderline accept (6)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Although it is an innovative method, there are many questions about the experiments performed by the authors, so it is difficult to be sure that these results are in fact reproducible and that the conclusions hold.
What is the ranking of this paper in your review stack?

2
Number of papers in your stack

3
Reviewer confidence

Confident but not absolutely certain

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This paper presents a framework for multi-task learning with applications to WCE (lesion classification and distortion classification). The work seems novel for this application, and the topic is definitely of great interest. However, the reviewers have raised several questions that need to be clarified; please respond to these. In particular, the issues to focus on are: (1) What is the need for distortion classification? (2) Clarification of design and experimental choices. (3) Clarification of the dataset used.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

3

Author Feedback

We would like to thank the reviewers for their feedback and appreciate that they find our work innovative, the formulation of the objective function ‘novel’, and the explanations well constructed. Our responses to some of the concerns follow.

What is the need for distortion classification? Our method draws inspiration from how medical practitioners diagnose. They can quickly identify distortions in addition to pathologies. The question under consideration in our work is whether this knowledge of distortion changes their way of looking at an image, i.e., whether an unconscious adjustment informs their decision. If so, then learning a distortion can provide a useful inductive bias for pathology classification. We consider this an important question in wireless capsule endoscopy (and thus of clinical significance [R2 #3]), as the appearance of pathologies can change considerably under distortions. Multi Task Learning (MTL) and self-supervision give us a way to mimic this idea algorithmically [R2 #4.2].

To address R2's further question, ‘Why classify distortion levels and not just the presence of distortion?’, we do this for two reasons:

a) From preliminary experiments, we verified that classifying the presence of distortion (i.e., saying which of Motion Blur, Contrast, or Brightness is globally most pervasive) is an easy (Config 2) task. Such a task plateaus early and loses its advantage in MTL. Since awareness of the presence of distortion may inform the network's understanding of pathologies, classifying distortion levels lets us match the complexity of the two tasks (main and auxiliary) so that plateauing can be prevented. The levels help tune MTL to reach a beneficial configuration that would not be possible by just predicting the presence of distortions.

b) We choose only those distortions that are inherently present locally in capsule endoscopy, in varying degrees. Adding these distortions on top and forcing the network to discriminate the level may force comparisons between different local areas, which is ultimately beneficial for the representation.
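The argument for classifying distortion levels rather than mere presence can be illustrated with a minimal self-supervised labelling step. An image is distorted at a randomly chosen level, and the index of that level becomes a label that costs nothing to obtain. This sketch assumes brightness is applied as a multiplicative factor on 8-bit pixel intensities. The level values [0.7, 1, 1.3, 1.6] are the Config 2 brightness example quoted in the authors' response. The function names and the list-of-lists image format are hypothetical and are not the authors' implementation.

```python
import random

# Config 2 brightness factors quoted in the authors' response (spacing 0.3).
BRIGHTNESS_LEVELS = [0.7, 1.0, 1.3, 1.6]


def adjust_brightness(image, factor):
    """Scale every 8-bit pixel intensity by `factor`, clipping to [0, 255]."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


def make_ssdt_sample(image, levels=BRIGHTNESS_LEVELS, rng=random):
    """Distort `image` at a random level and return (distorted, level_index).

    The level index is the self-supervised target. It comes for free
    because we chose the distortion ourselves.
    """
    label = rng.randrange(len(levels))
    return adjust_brightness(image, levels[label]), label


# Usage on a tiny hypothetical 2x2 grey-level image.
img = [[100, 200], [50, 180]]
distorted, label = make_ssdt_sample(img, rng=random.Random(0))
print(label, distorted)
```

In an MTL setup, each training image would then carry two targets: its pathology class for the supervised head and `label` for the distortion-level head. Predicting one of four closely spaced levels is a harder auxiliary task than predicting which distortion is present, which matches the rationale given in point (a).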

Clarification of design and experimental choices: Experimental: 3-way pathology classification with a balanced class distribution, Adam optimizer, batch size of 64, an initial learning rate of 0.01 and a decay rate of 0.1 every 50 epochs, on Nvidia Twin Titan RTX. Design: for Brightness and Contrast, the design choice is the discrimination between levels. Config 1 has a discrimination of 0.1 or less between levels, e.g., [0.8, 0.9, 1, 1.1], whereas Config 2 has 0.3 or more, e.g., [0.7, 1, 1.3, 1.6]. For Motion Blur, the design choice is the kernel size, with each size in [3, 5, 10, 15] belonging to Config 2.
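The optimisation schedule and the two level configurations described above can be written out explicitly. This is a minimal sketch. It assumes that “a decay rate of 0.1 every 50 epochs” means multiplying the learning rate by 0.1 every 50 epochs, which is the behaviour of PyTorch's `StepLR` with `step_size=50, gamma=0.1`. It also assumes the Config 1 and Config 2 thresholds apply to the smallest gap between consecutive Brightness or Contrast levels. The helper names are hypothetical.

```python
def step_lr(epoch, base_lr=0.01, gamma=0.1, step_size=50):
    """Learning rate at a 0-indexed epoch under multiplicative step decay."""
    return base_lr * gamma ** (epoch // step_size)


def min_level_gap(levels):
    """Smallest spacing between consecutive (sorted) distortion levels."""
    s = sorted(levels)
    return min(b - a for a, b in zip(s, s[1:]))


def config_of(levels, tol=1e-9):
    """Classify a Brightness/Contrast level set by its smallest gap.

    Config 1: gap <= 0.1; Config 2: gap >= 0.3 (thresholds from the
    authors' response). `tol` absorbs floating-point error such as
    1.0 - 0.9 == 0.09999999999999998.
    """
    gap = min_level_gap(levels)
    if gap <= 0.1 + tol:
        return 1
    if gap >= 0.3 - tol:
        return 2
    return None  # spacing falls between the two reported regimes


for epoch in (0, 49, 50, 100):
    print(epoch, step_lr(epoch))
print(config_of([0.8, 0.9, 1.0, 1.1]), config_of([0.7, 1.0, 1.3, 1.6]))
```

Under this reading, the learning rate is 0.01 for epochs 0 to 49, 0.001 for epochs 50 to 99, and so on. Motion Blur kernel sizes are integers on a different scale, so `config_of` is not meant to be applied to them.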

Clarification about the dataset: CAD-CAP [14] is a multicentric database of 25,000 images that has been used in the MICCAI 2017 and 2018 challenges [https://endovis.grand-challenge.org/]. However, as shown in Fig 2 in [14], 20,000 of the 25,000 images are normal, and the remaining 5,000 are images of different pathologies of varying relevance for different tasks (detection, localization). For a more challenging classification task, in accordance with clinical expectations, a balanced dataset of 1812 images with three classes (inflammatory, vascular lesion, and normal) was created. This dataset has been used in our work. It is balanced and has 600 images of each of the three conditions.

Were physicians involved? [R3 #4.1] We would like to address a misunderstanding here. As mentioned in Section ‘A note on similarity’, a gastroenterologist involved in this work brings a medical perspective and helped identify the dominant factors presented in Fig 4.

We have updated our manuscript to accommodate all helpful comments and suggestions by the reviewers (including additional validation curves and explanations) without extending the paper.

[14] CAD-CAP: a 25,000-image database serving the development of artificial intelligence for capsule endoscopy.

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

I think the authors have addressed the concerns about the dataset and design. Since the method they propose has novelty, I think this paper can contribute to the MICCAI publications.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

1

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper proposes a Multi Task Learning (MTL) framework with a supervised classification task (SPT) and a self-supervised (SS) distortion-level classification task. Although the authors have explained their reasons for using distortion for diagnosis, the argument is still neither clear nor convincing. What kinds of distortion are used in this work, and with what parameters? Moreover, this paper merely uses existing frameworks, including the uncertainty component, for diagnosis, and lacks novelty. Although the authors clarify the dataset issues in the rebuttal, how the 1812 images in this experiment were chosen is still not clear. If the dataset includes 1812 images, why do the authors mention “600 images each of the three conditions”?
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Reject
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

13

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

I believe the authors addressed most of the concerns from the reviewers; therefore, I recommend acceptance.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

6


Learning More for Free - A Multi Task Learning Approach for Improved Pathology Classification in Capsule Endoscopy