
Authors

Youbao Tang, Jinzheng Cai, Ke Yan, Lingyun Huang, Guotong Xie, Jing Xiao, Jingjing Lu, Gigin Lin, Le Lu

Abstract

Accurately segmenting a variety of clinically significant lesions from whole body computed tomography (CT) scans is a critical task on precision oncology imaging, denoted as universal lesion segmentation (ULS). Manual annotation is the current clinical practice, being highly time-consuming and inconsistent on tumor’s longitudinal assessment. Effectively training an automatic segmentation model is desirable but relies heavily on a large number of pixel-wise labelled data. Existing weakly-supervised segmentation approaches often struggle with regions nearby the lesion boundaries. In this paper, we present a novel weakly-supervised universal lesion segmentation method by building an attention enhanced model based on the High-Resolution Network (HRNet), named AHRNet, and propose a regional level set (RLS) loss for optimizing lesion boundary delineation. AHRNet provides advanced high-resolution deep image features by involving a decoder, dual-attention and scale attention mechanisms, which are crucial to performing accurate lesion segmentation. RLS can optimize the model reliably and effectively in a weakly-supervised fashion, forcing the segmentation close to lesion boundary. Extensive experimental results demonstrate that our method achieves the best performance on the publicly large-scale DeepLesion dataset and a hold-out test set.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87196-3_48

SharedIt: https://rdcu.be/cyl2Z

Link to the code repository

N/A

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

This paper presents a deep learning-based approach that is applied to segmentation of tumors or lesions in CT images.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

They propose a unique architecture assembled from previously proposed blocks, and a loss function that includes recently proposed regional level set terms, modified to enable weakly supervised learning. This is an interesting method with broad applicability, especially where accurately labelled data is difficult to come by. The clarity and quality of the writing are strong. The method outperforms the state of the art for automatic tumor measurement, and the ablation study clearly demonstrates the utility of each proposed contribution.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The validation study is strong but could have been more comprehensive (see below)
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

It appears reproducible.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

The training dataset used is relatively large, and it would have enhanced the work to evaluate how sensitive the method is to the size of the training dataset. If the weak supervision method performs well even on smaller datasets, it would show that it is even more widely applicable.

The intersection over union loss should be referenced/defined.
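
For readers who want the definition the reviewer asks for, the following is a minimal sketch of one common "soft" IoU loss, written with NumPy. It is an assumption that the paper uses this form; the function name `soft_iou_loss` and the smoothing constant `eps` are illustrative and not taken from the paper.

```python
import numpy as np

def soft_iou_loss(pred, target, eps=1e-6):
    """Soft IoU loss: 1 - |P ∩ G| / |P ∪ G|, with probabilities in place of binary masks.

    pred   : predicted foreground probabilities in [0, 1]
    target : (pseudo) ground-truth mask with values in {0, 1}
    eps    : hypothetical smoothing constant to avoid division by zero
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    intersection = np.sum(pred * target)
    # Union via inclusion-exclusion: |P| + |G| - |P ∩ G|
    union = np.sum(pred) + np.sum(target) - intersection
    return 1.0 - (intersection + eps) / (union + eps)
```

The loss is close to 0 when the prediction matches the mask and close to 1 when the two are disjoint.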
Please state your overall opinion of the paper

strong accept (9)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This is an interesting method that makes local use of regional level set terms in the loss function. I think it has broad applicability, especially where accurately labelled data is difficult to come by. This interesting work should be presented at MICCAI.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

5
Reviewer confidence

Very confident

Review #2

Please describe the contribution of the paper

This paper proposes a weakly-supervised universal lesion segmentation method for more precise tumor boundary delineation. The authors first propose an attention-enhanced high-resolution network named AHRNet, which incorporates a decoder (DE) and an attention mechanism containing both dual attention (DA) and scale attention (SA). A regional level set (RLS) loss is then proposed for optimizing lesion boundary delineation reliably and effectively in a weakly supervised fashion. The experimental results on the public large-scale DeepLesion dataset and a hold-out test dataset demonstrate the effectiveness of the proposed method.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

– The authors constructed an attention-based HRNet which renders rich, high-resolution representations with strong position sensitivity by augmenting with a decoder, dual attention, and scale attention mechanisms. This model can perform more accurate lesion segmentation compared with UNet.
– The proposed RLS loss is reformulated to adapt to lesion segmentation by some special modifications. This loss can force the segmentation results to be as close as possible to the actual lesion boundary.
– The segmentation results of AHRNet achieved the best performance on the DeepLesion dataset and a hold-out test dataset in both quantitative and qualitative analyses.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

– Some detailed descriptions of some modules, formulas, and figures are missing.
– The diagram of the framework and related elaboration is not very clear. For instance, where is the decoder?
– For some conclusions in the paper, there is no corresponding experiment to support their claims.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The implementation details of this paper are detailed, so it is not hard to reproduce them.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

– There is no description of the dual attention mechanism. A brief description should be added.
– Some details are missing in Fig. 2. E.g., the dimensions of the input and output feature maps are not indicated, making it difficult for readers to understand the full-scale attention (SA) process. It is also unclear from this figure whether the SA acts on the feature maps of each scale separately or on all scales together. Moreover, there is no ablation study showing that SA can better address lesion segmentation under different scales.
– The RLS loss is modified from the classic level set energy function based on Eq. 1 by removing the Length(ϕ) and Area(ϕ) terms to adapt to lesions of different sizes. However, there is no corresponding comparative experiment in the following part, which makes me question its effectiveness.
– The constrained region I′ is lesion-adaptive, being four times the size of the pseudo mask g. The size of this region may significantly impact the calculation of the RLS loss, but it is not discussed in the paper.
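
To make the last two points concrete, here is a minimal NumPy sketch of a region-fitting level set energy with the length and area terms removed, evaluated only inside a lesion-adaptive box around the pseudo mask. It is not the authors' implementation. The function names, the `area_factor` parameter, and the reading of "four times the size" as four times the bounding-box area (each side scaled by 2) are all assumptions made for illustration.

```python
import numpy as np

def constrained_region(pseudo_mask, area_factor=4.0):
    """Box centred on the pseudo mask's bounding box, with its area scaled by
    area_factor (assumed interpretation), clipped to the image bounds."""
    rows, cols = np.nonzero(pseudo_mask)
    r_min, r_max = rows.min(), rows.max() + 1
    c_min, c_max = cols.min(), cols.max() + 1
    side_scale = np.sqrt(area_factor)  # area x4 -> each side x2
    half_h = 0.5 * (r_max - r_min) * side_scale
    half_w = 0.5 * (c_max - c_min) * side_scale
    r_c, c_c = 0.5 * (r_min + r_max), 0.5 * (c_min + c_max)
    h, w = pseudo_mask.shape
    r0 = max(int(np.floor(r_c - half_h)), 0)
    r1 = min(int(np.ceil(r_c + half_h)), h)
    c0 = max(int(np.floor(c_c - half_w)), 0)
    c1 = min(int(np.ceil(c_c + half_w)), w)
    return r0, r1, c0, c1

def regional_level_set_loss(image, prob, pseudo_mask, area_factor=4.0, eps=1e-6):
    """Region-fitting energy (no length/area terms) inside the constrained region."""
    r0, r1, c0, c1 = constrained_region(pseudo_mask, area_factor)
    img = image[r0:r1, c0:c1].astype(np.float64)
    p = prob[r0:r1, c0:c1].astype(np.float64)
    # Mean intensities of the predicted foreground and background
    c_in = np.sum(img * p) / (np.sum(p) + eps)
    c_out = np.sum(img * (1.0 - p)) / (np.sum(1.0 - p) + eps)
    # Penalise intensity variance within each predicted region
    energy = p * (img - c_in) ** 2 + (1.0 - p) * (img - c_out) ** 2
    return float(np.mean(energy))
```

Under these assumptions, changing `area_factor` changes how much background enters the `c_out` estimate, which is why the reviewer expects the region size to affect the loss.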
Please state your overall opinion of the paper

Probably accept (7)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

– The result of this paper outperforms previous methods.
– There are good ablation experiments and necessary visualization results in this paper.
– The model and RLS loss proposed in this paper are innovative.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

3
Reviewer confidence

Very confident

Review #3

Please describe the contribution of the paper

The paper proposes a segmentation framework consisting of a neural network architecture and a novel loss, which can be trained using weak supervision (bounding boxes) to segment lesions of a variety of types. Results beating the current SOTA are demonstrated on the DeepLesion dataset and on a private dataset.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper compares to relevant other studies, and uses ablation studies to identify the contributions of the components of their approach.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

It would have been interesting to analyse if the methods rank consistently on an image by image basis, or if some methods work better on some images and worse on others. Cross validation, or at least multiple runs would have been valuable to ascertain variability. Reproducibility is limited. No code, details of hyperparameters not presented.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Unfortunately, the authors choose not to release their code. The test set collected from the hospital is not described in any detail - types of lesions, criteria for inclusion. No discussion of the ethics of collection of this dataset is present despite the reproducibility statement saying it is.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

The wrong paper may be cited for HRNet in Sec. 2.1: paper 11 is cited, but it seems you actually refer to paper 24.

Fig. 4 should not drop to zero near Dice 1, since it is cumulative.
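
The point about Fig. 4 can be illustrated with a short NumPy sketch. It assumes the figure is meant to be a standard empirical cumulative distribution (the fraction of lesions with Dice at or below each threshold); the paper's exact plotting convention is not stated in this review, and the function name `empirical_cdf` is hypothetical.

```python
import numpy as np

def empirical_cdf(dice_scores, thresholds):
    """Fraction of samples whose Dice score is <= each threshold."""
    scores = np.sort(np.asarray(dice_scores, dtype=np.float64))
    # side="right" counts the scores that are <= each threshold
    counts = np.searchsorted(scores, thresholds, side="right")
    return counts / scores.size
```

Under this convention the curve is non-decreasing and equals 1 at a Dice threshold of 1, so it cannot fall back to zero at the right edge of the plot.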
Please state your overall opinion of the paper

Probably accept (7)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper proposes a novel approach to a challenging segmentation problem and achieves a new SOTA. Unfortunately the authors do not release their code or their new dataset.
What is the ranking of this paper in your review stack?

2
Number of papers in your stack

5
Reviewer confidence

Confident but not absolutely certain

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The clarity of the paper and its relevance to the MICCAI community are underlined by all reviewers. The method appears novel and performs well, and the experiments seem to be well designed and conducted. Please see the reviewers' comments for suggestions for further improvement.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

1

Author Feedback

We highly appreciate all the insightful and helpful comments from the reviewers (R1, R2 and R3) and the Area Chair. We would like to clarify the following issues:

Q1: It would have enhanced the work to evaluate how sensitive the method is to the size of the training dataset. [R1]
A1: This is a good suggestion! We will consider it in our future work.

Q2: There is no description of the dual attention mechanism. [R2]
A2: We described dual attention only briefly in the paper due to the limited space. For more details, please refer to its original paper.

Q3: Some parts in Fig. 1 and Fig. 2 are not very clear. [R2]
A3: Thanks for the comments. We will clarify them in the next version.

Q4: There is no ablation study for that SA can better address lesion segmentation under different scales. [R2]
A4: We did an ablation study with and without the SA module. From Table 2, we can see that using SA improves the overall segmentation performance, meaning that it can further refine the multi-scale features for our task. We may consider evaluating performance stratified by lesion scale in the future.

Q5: The RLS loss is modified from the classic level set energy function based on Eq.1 by removing the Length(ϕ) and Area(ϕ) to adapt to the lesion of different sizes. However, there is no corresponding comparative experiment in the following part, which makes me question its effectiveness. [R2]
A5: We did train with the loss without removing the first two terms, which are sensitive to object size, and found that this harmed training convergence and segmentation performance. We did not describe these experiments in the paper, because removing the terms is intuitive and understandable for our task of segmenting lesions of different sizes, and a previous work [17] also removed them.

Q6: The size of this region may significantly impact calculating the RLS loss. [R2]
A6: Actually, we tested the model using different sizes (2/3/4/5 times the size of the pseudo mask) of the constrained region for RLS loss computation. We found that (1) using 4 times provides the best performance and (2) the performance changes only slightly when using 3/4/5 times.

Q7: It would have been interesting to analyze if the methods rank consistently on an image by image basis. [R3]
A7: For some specific cases, the proposed method did not produce the best segmentation results. But overall, it achieved the best performance on a large proportion of the samples.

Q8: Cross validation, or at least multiple runs would have been valuable to ascertain variability. [R3]
A8: We trained the model until convergence multiple times and tested the resulting models. From our observation, the conclusions are consistent across runs.


Weakly-Supervised Universal Lesion Segmentation with Regional Level Set Loss