Authors

Zihao Liu, Ruiqin Xiong, Tingting Jiang

Abstract

Automated skin lesion recognition of dermoscopy images is effective for improving diagnostic performance. Current popular solutions either leverage single images to learn better feature representations or take advantage of pairwise images for more discriminative recognition. However, they ignore modelling the relationship between important regions within the central lesion area, or mining the deeper semantic correlation between different images. In this paper, we propose a novel Multi-level Relationship Capture Network (MRCN), which focuses on the relationship mining at two different levels, the region level and the image level. Specifically, with the guidance of the expertise of dermatologists, a region-correlation learning module is proposed to model the relationship between different important regions in the central lesion area. Meanwhile, a cross-image learning module is designed to model the deep semantic correlation between multiple images. Besides, a lesion discerning module and a consistency regularization module are adopted to extract the feature of the lesion area and to serve as an extra consistency constraint respectively. Comprehensive experiments are conducted on five challenging datasets, and the experimental results show that our method can achieve the state-of-the-art performance compared to previous work, which demonstrates the advantages and superiority of our method. For reproducible scientific research, our code will be publicly available.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87234-2_15

SharedIt: https://rdcu.be/cyl8a

Link to the code repository

N/A

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

Based on the practical cues that doctors might use to perform skin lesion diagnosis, the authors propose a set of modules, mainly based on attention mechanisms, that capture different aspects:
1. a segmentation module to delimit the lesion
2. a region correlation module to compute the similarity between different regions of the same lesion
3. a cross-image correlation module that captures similarities between two different images
4. a fusion module that merges the outputs of the previous modules
5. a regularizer that ensures that information flowing from different images can be assigned correctly to the respective image
The resulting model is evaluated on three versions of the ISIC challenge, and an ablation study shows the effectiveness of each of the modules.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

1. The introduced modules are inspired by practical cues or difficulties and are well motivated. Although the theoretical foundation of each of those methods is not new, arranging all of them together in a pipeline is novel and interesting.
2. The ablation study shows that the introduced modules let the network learn more about different aspects of the lesion, and that combining this information effectively leads to better results.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

1. The paper is hard to follow since all learned functions are defined in the implementation details.
2. Although the ablation study shows the effectiveness of each of the introduced modules, there is no evidence that each of them is doing what it is supposed to do. This does not apply to the LD block, which is trained to perform semantic segmentation with explicit supervision.
3. Reporting results on ISIC 2016 and ISIC 2017 in the main paper and pushing the larger and more challenging ISIC 2019 to the supplementary material is not justified and is counterintuitive.
4. Missing information about the size and depth of the proposed model and of the other SOTA methods raises questions about the fairness of the comparison.
5. Reporting a single number for each model raises questions about the statistical significance of the recorded improvement.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

This paper presents enough information about the training pipeline, the architecture and the datasets to make the results reproducible. The authors did not report the time required to train their model. They also do not report the results in the form of mean ± std, which would help to evaluate the statistical significance of the recorded improvement.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

Since the paper relies on a heavily engineered framework, it is crucial for the reader to fully understand each component before moving to the next one. This is unfortunately not the case in this paper, since details of the learned functions are introduced late in the paper. Similarly, it would have been helpful to tell the reader from the beginning that the segmentation backbone is trained separately with suitable supervision. For the sake of fairness, the authors should also consider reporting the size of all considered networks. It is known that larger and deeper networks generally outperform their shallow counterparts, so a direct comparison to the current SOTA only makes sense if all networks have roughly the same capacity and were updated the same number of times. It is also crucial to report the performance in the form of mean ± standard deviation to better understand the significance of the realized improvement. An extended ablation study showing the effect of each single module would be nice to have in order to understand the potential of each component and its possible limitations.
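The mean ± standard deviation reporting requested above can be sketched as follows. This is a minimal illustration using only the Python standard library; the model names and accuracy values are hypothetical placeholders, not results from the paper, and the sketch assumes each model was trained several times with different random seeds.

```python
from statistics import mean, stdev


def summarize(scores):
    """Return (mean, sample standard deviation) of per-run scores."""
    return mean(scores), stdev(scores)


# Hypothetical accuracies from five runs with different random seeds.
# These numbers are illustrative only and are NOT taken from the paper.
runs = {
    "baseline": [0.80, 0.82, 0.81, 0.79, 0.83],
    "proposed": [0.84, 0.86, 0.85, 0.83, 0.87],
}

for name, scores in runs.items():
    m, s = summarize(scores)
    print(f"{name}: {m:.3f} ± {s:.3f}")
```

If the gap between the two means is small relative to the standard deviations, a single reported number per model would not be enough to support a claim of improvement.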
Please state your overall opinion of the paper

Probably accept (7)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Although this paper relies on a heavily engineered framework, every single module is well motivated. The results are consistent and show substantial improvement over the considered baselines. Minor missing details (standard deviation in the evaluation, depth and number of parameters for each model) prevent a stronger recommendation, since they are necessary for a better understanding of the potential and limitations of the proposed method.
What is the ranking of this paper in your review stack?

2
Number of papers in your stack

5
Reviewer confidence

Confident but not absolutely certain

Review #2

Please describe the contribution of the paper

This paper presents a Multi-level Relationship Capture Network (MRCN) for image-based skin lesion classification, which focuses on relationship mining from two levels: the region and the image levels. Specifically, it contains four blocks: a region-relation learning module, a cross-image module, a lesion discerning module and a consistency regularization module. The proposed MRCN achieves state-of-the-art performance on three ISIC benchmark datasets.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The idea of mining both intra- and inter-image, region-based relationship for skin lesion recognition seems interesting.
- SOTA results in the active competition.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Description of the method is unclear and confusing, and worsened by bad English writing.
- The manuscript is not carefully prepared, e.g., (1) typo: “(e.g., “mechtnism”)”, (2) unpaired “ sign, etc.
- The innovation is quite incremental. Despite the idea of exploiting inter-region relationship, it is implemented as using a self-attention mechanism similar to the classical squeeze-and-excitation method. Also, the performance improvement upon competing methods might come from the complex engineering (e.g., sequential segmentation network followed by complicated classification networks).
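For readers unfamiliar with the squeeze-and-excitation (SE) method mentioned above, the following is a minimal sketch of the classical SE recipe: global average pooling per channel, a small bottleneck with ReLU, a sigmoid gate, and channel-wise rescaling. It is not the paper's region-correlation module; the function names and the toy weights are assumptions made purely for illustration.

```python
import math


def global_avg_pool(feature_map):
    """Squeeze: average each channel (a 2D list) down to one scalar."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]


def dense(weights, vec):
    """Fully connected layer without bias: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def se_gate(feature_map, w_reduce, w_expand):
    """Excitation: pooled vector -> FC -> ReLU -> FC -> sigmoid."""
    squeezed = global_avg_pool(feature_map)
    hidden = [max(0.0, h) for h in dense(w_reduce, squeezed)]
    return [1.0 / (1.0 + math.exp(-z)) for z in dense(w_expand, hidden)]


def se_block(feature_map, w_reduce, w_expand):
    """Rescale every channel by its gate value."""
    gate = se_gate(feature_map, w_reduce, w_expand)
    return [[[g * x for x in row] for row in ch]
            for g, ch in zip(gate, feature_map)]


# Toy input: 2 channels of size 2x2, with hypothetical weights (2 -> 1 -> 2).
fmap = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
w_reduce = [[0.5, 0.5]]
w_expand = [[0.0], [1.0]]
print(se_gate(fmap, w_reduce, w_expand))
print(se_block(fmap, w_reduce, w_expand))
```

Note that the bottleneck mixes the pooled statistics of all channels, so each channel's gate generally depends on the other channels as well.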
Please rate the clarity and organization of this paper

Poor
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

It is unlikely that the method can be reproduced from the manuscript alone. However, the authors promise to release the code.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
- Introduction: “Most methods utilize a single image for the final recognition.” I understand training of the proposed method uses paired input. However, doesn’t it utilize paired images for testing, too? If not, then the proposed method is the same as “most methods”.
- The “intra-channel attention learning block”: with a pooling function and \varphi implemented as a conv layer and a ReLU layer, I believe this block totally relies on inter-channel computations. Why is it named “intra-channel attention”?
- Below Eq. (2), “where “*” means matrix multiplication”: inconsistent with Fig. 5, which indicates “Element-wise multiplication”.
- Eq. (3) is confusing. I understand d_i,j as a scalar, yet it also looks like an operator (function).
- Page 6 first paragraph: acronyms CL and RC used before definition.
- Please add spaces before parentheses, where applicable.
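The inconsistency noted above between "matrix multiplication" and "element-wise multiplication" matters because the two operations generally give different results. A minimal sketch with plain Python lists, using arbitrary example matrices that are not taken from the paper:

```python
def matmul(a, b):
    """Matrix product: entry (i, j) sums a[i][k] * b[k][j] over k."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def hadamard(a, b):
    """Element-wise (Hadamard) product: entry (i, j) is a[i][j] * b[i][j]."""
    return [[x * y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

print(matmul(A, B))    # [[19, 22], [43, 50]]
print(hadamard(A, B))  # [[5, 12], [21, 32]]
```

The matrix product also requires the inner dimensions to agree, whereas the element-wise product requires identical shapes, so the choice affects which tensor shapes are valid in the equation.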
Please state your overall opinion of the paper

probably reject (4)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Unclear and confusing method description, reckless manuscript preparation, and incremental innovation.
What is the ranking of this paper in your review stack?

3
Number of papers in your stack

3
Reviewer confidence

Very confident

Review #3

Please describe the contribution of the paper

This paper proposed a novel approach (Multi-level Relationship Capture Network) for skin lesion classification using dermoscopy images. Specifically, it explored both region level relationship within an image and image level information across images and designed the architecture to model such correlations. It evaluated on three public ISIC challenges and achieved SOTA results compared to the other methods.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The approach the author proposed is pretty novel. Papers in the past have explored either region level information using attention mechanism, or cross-image semantic relationship, but never both. The network proposed in the paper incorporated both, and worked really well on the three public challenges.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- More qualitative/quantitative results could be shown to ‘explain’ why the network, with its different architecture components, is able to capture all the relationships better than competing methods.
- More discussions of the top ranking methods in the ISIC challenges and how they differ from this paper would be helpful.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors agreed to publish both training and evaluation code, and the datasets are all public, so reproducibility shouldn’t be an issue.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

See what’s mentioned in the weakness section.
Please state your overall opinion of the paper

strong accept (9)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

I recommend this paper because of its novelty in the methodology, the SOTA results, and its good reproducibility.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

3
Reviewer confidence

Very confident

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This paper proposes a novel approach (Multi-level Relationship Capture Network) for skin lesion classification using dermoscopy images. The method explores both region-level relationships within an image and image-level information across images, and achieves SOTA results compared to the other methods on three public ISIC challenges. In general, the proposed method is interesting, but the English writing needs improvement.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

4

Author Feedback

R1 Q1: No evidence that each of the modules is doing what it is supposed to do. On the last four lines of Sec. 3.3, we compare the effectiveness of each module and give the discussions. Besides, we also provide visualization results in the Supplementary Material to analyze the proposed modules.

Q2: Unreasonable to put the results of ISIC2019 in the Supplementary Material. Since the results of ISIC2019 are visible and active on the platform, and because of the page limit, we put the results in the Supplementary Material.

Q3: Missing information: the number of parameters, the standard deviation of the evaluation results. To the best of our knowledge, this information is not reported for previous works, but we will add it for our method in the final version.

R2 Q1: Paired images for testing? As described in L3-4 in P7, only a single image is used for testing.

Q2: Typos: Eq. (2) inconsistent with Fig.5; CL/RC used before definition; add spaces before parentheses. We will fix these typos in the final version.

R3 Q1: More qualitative/quantitative results and discussions of the top-ranking methods. We will add more results and discussions in the final version.

Multi-level Relationship Capture Network for Automated Skin Lesion Recognition