
Authors

Bhavani Sambaturu, Ashutosh Gupta, C.V. Jawahar, Chetan Arora

Abstract

Semantic segmentation of medical images is an essential first step in computer-aided diagnosis systems for many applications. However, given many disparate imaging modalities and inherent variations in the patient data, it is difficult to consistently achieve high accuracy using modern deep neural networks (DNNs). This has led researchers to propose interactive image segmentation techniques where a medical expert can interactively correct the output of a DNN to the desired accuracy. However, these techniques often need separate training data with the associated human interactions, and do not generalize to various diseases, and types of medical images. In this paper, we suggest a novel conditional inference technique for DNNs which takes the intervention by a medical expert as test time constraints and performs inference conditioned upon these constraints. Our technique is generic can be used for medical images from any modality. Unlike other methods, our approach can correct multiple structures simultaneously and add structures missed at initial segmentation. We report an improvement of 13.3, 12.5, 17.8, 10.2, and 12.4 times in user annotation time than full human annotation for the nucleus, multiple cells, liver and tumor, organ, and brain segmentation respectively. We report a time saving of 2.8, 3.0, 1.9, 4.4, and 8.6 fold compared to other interactive segmentation techniques. Our method can be useful to clinicians for diagnosis and post-surgical follow-up with minimal intervention from the medical expert. The source-code and the detailed results are available here [1].

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87196-3_58

SharedIt: https://rdcu.be/cyl29

Link to the code repository

N/A

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper
- Proposing a new segmentation workflow using test-time model refinement for image segmentation.
- No joint training with ground-truth labels and user corrections is required.
- The proposed method can be generalized to various medical image modalities.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The authors conducted extensive performance evaluation by comparing with various conventional and recent segmentation methods using multiple publicly available datasets and various image quality metrics (segmentation accuracy) as well as user interaction metrics (interaction time) for evaluating interactivity, usability, and accuracy of the method.
- No assumption of specific target organs of interest is required. The performance degradation of the pretrained network due to the training set and testing set distribution mismatches or missing labels during training can be handled well using interactive user input.
- Multiple objects can be segmented at the same time.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Technical novelty seems weak. A similar idea has been introduced in the previous work.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
- The paper gives a simple description of the framework.
- The optimization function and the workflow of the algorithm are well written.
- Hardware specification is well listed.
- There is no detailed description about the publicly available state-of-the-art DNN model used in the experiment.
- The datasets used are publicly available and commonly used.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
This paper proposes an interesting interactive image segmentation method for medical images. The main idea is to fine-tune the pre-trained CNN model at inference time using user scribbles. Even though the proposed idea seems to work well for interactively editing medical image segmentations, it is unclear how this method is intrinsically different from Wang et al. [24], apart from minor technical differences (neural net models, numerical methods, etc.). If possible, please clarify the novelty of the proposed idea during the rebuttal period.

User interaction is not clearly explained. The authors mention that the method can accept points, boxes, and scribbles as feedback (Table 1). However, the main text focuses only on scribbles. The ablation study on user input type in the supplementary material does not fully explain the mentioned capability, beyond showing that scribbles are the fastest. The authors used user interaction time as one of the metrics to measure the usability of the system, but this is also unclear without a more detailed analysis (e.g., how many user clicks or how much mouse movement is required; even if the time taken is shorter, a user may have drawn many scribbles, or vice versa).

The compared state-of-the-art interactive segmentation methods based on deep neural networks use different types of user input. It would be better to describe how user correction was provided for each method.

Recording dice coefficient improvements after every user interaction would be more helpful to assess the accuracy of the proposed method compared to the other methods in Table 2.
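
For reference, the Dice coefficient between a predicted mask A and a ground-truth mask B is 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of the per-interaction logging suggested above, assuming binary masks stored as nested lists of 0/1. The function names `dice` and `dice_per_interaction` are hypothetical and do not come from the paper or its code.

```python
def dice(pred, truth):
    """Dice coefficient between two equal-shaped binary masks (nested lists of 0/1)."""
    p = [v for row in pred for v in row]
    t = [v for row in truth for v in row]
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    total = sum(1 for a in p if a) + sum(1 for b in t if b)
    # Convention (an assumption): two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def dice_per_interaction(masks, truth):
    """Dice after each user interaction; masks[0] is the initial network output."""
    return [dice(m, truth) for m in masks]
```

Plotting the returned list against the interaction index for each method would give the kind of curve requested in this comment.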

Misc:
- In abstract, “Our technique is generic can be used…” -> “Our technique is generic to be used…”
- Page 8, the reference to table 3 is switched.
- Table 3 (left) would be better visualized as a plot, in the same way as the plots in the supplementary material.
- For the plots in the supplementary material, the maximum of the dice coefficient axis should be 1.0, as that is the maximum possible value. Using an axis limit higher than 1.0 shrinks the plot and makes the details harder to notice.
- Scribble region growing method seems to assume grayscale one channel input images as the intensity difference can be calculated by pixel intensity. If the intensity difference is further formulated for RGB 3 channel images, the proposed method can be applied for color images as well.
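
To illustrate the last point, here is a minimal sketch of region growing in which the intensity difference is replaced by a Euclidean distance between pixel vectors, so the same code handles 1-channel and 3-channel images. This is a generic region-growing rule, not the rule used in the paper; the 4-connectivity, the `threshold` parameter, and the function names are all assumptions made for illustration.

```python
import math
from collections import deque


def color_distance(a, b):
    """Euclidean distance between two pixels given as tuples (1 or 3 channels)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def grow_scribble(image, seeds, threshold):
    """Grow scribble seeds over 4-connected neighbours with similar colour.

    image: nested list of pixel tuples, seeds: list of (row, col) positions.
    A neighbour joins the region if its distance to the pixel it was reached
    from is at most `threshold` (a hypothetical tuning parameter).
    """
    height, width = len(image), len(image[0])
    region = set(seeds)
    queue = deque(seeds)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < height and 0 <= nc < width and (nr, nc) not in region:
                if color_distance(image[r][c], image[nr][nc]) <= threshold:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region
```

For a grayscale image, each pixel can be passed as a one-element tuple, in which case the distance reduces to the absolute intensity difference.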
Please state your overall opinion of the paper

borderline accept (6)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
- The proposed algorithm is simple and intuitive thus is generally applicable.
- The experiments involve different modalities and multiple datasets within each modality.
- However, the experiments seem insufficient in diversity of the evaluation.
What is the ranking of this paper in your review stack?

3
Number of papers in your stack

4
Reviewer confidence

Very confident

Review #2

Please describe the contribution of the paper

The authors proposed a method for improving interactive annotation of segmented regions in medical imaging applications using a pre-trained deep neural network.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Enhances the annotation speed for medical image segmentation.
- The idea of segmentation enhancement is not new but interesting. The authors implemented a new technique for reducing the time required for segmentation annotation.
- Makes effective use of the user's annotations to improve the segmentation through a restricted optimization.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Weak presentation of the method and results. The manuscript is very difficult to follow technically. A large portion of the paper (up to the middle of page 4) is spent explaining the problem and itemizing methods. In general, the algorithm is complicated for non-expert readers.
- The manuscript is more about a restricted optimization method than using Deep learning for enhancing data segmentation.
Please rate the clarity and organization of this paper

Poor
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

It seems that the users worked with software intended for medical experts; it would have been better to provide it so that readers can evaluate and test the idea.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

The idea of the paper, improving segmentation results through interactive feedback after the segmentation process, is very interesting. I think your idea of restricting a neural network so that it reiterates and optimizes its output is a great one. However, your manuscript is very difficult to follow and the explanation is weak. First, you introduce many concepts but leave them unfinished, for example DNNs and neural networks. I could not understand the detailed structure of your network. How many layers? What filter sizes, and so on? Second, there are many databases and experiments but no qualitative results. What was the purpose of Table 1? Another important point is that, although time is vital in clinical applications, segmentation accuracy is of paramount importance. You focused on reducing time, while accuracy was not really considered. This is at odds with the method: the proposed method restricts the neural network so that it provides better segmentation results from annotations, yet the reported results focus on time saving.
Please state your overall opinion of the paper

strong accept (9)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Although the presentation of the manuscript is weak, I think the idea is very useful for the MICCAI community. I recommend acceptance only after a major revision with a better explanation of the method (simpler structure), more qualitative results, and a more detailed description of the technique.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

3
Reviewer confidence

Very confident

Review #3

Please describe the contribution of the paper

This paper presents a method for correcting segmentation mistakes, integrated within a DNN framework. The method is well-designed and has general appeal and applications for segmentation problems.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The technical details provided indicate very good contribution and the evaluation is valuable.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

While the evaluation shows a comparison of the time taken for user interaction, it would be nice to show an evaluation of the segmentation accuracy with/without user interaction and its comparison with similar methods. While the formulation and results show segmentation correction through scribbles for under-segmentation, it hasn’t been discussed whether the method is suitable for correcting over-segmentation. This should be discussed in the paper.
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

There are sufficient details to reproduce the work
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
This paper presents a method for correcting segmentation mistakes, integrated within a DNN framework. The method is well-designed and has general appeal and applications for segmentation problems. The technical details provided indicate very good contribution and the evaluation is valuable. While the evaluation shows a comparison of the time taken for user interaction, it would be nice to show an evaluation of the segmentation accuracy with/without user interaction and its comparison with similar methods. While the formulation and results show segmentation correction through scribbles for under-segmentation, it hasn’t been discussed whether the method is suitable for correcting over-segmentation. This should be discussed in the paper.

Minor remarks:
- may need proof correction (e.g. 1st sentence of section 1 - ‘Image segmentation is an important imaging processing techniques..’)
- The manuscript’s premise and the broad statement in the abstract that ‘modern DNNs have generally shown unsatisfactory performance for clinical use’ are not substantiated, and the manuscript is focused entirely on DNN methods. There are several areas where DNN-based segmentation is used satisfactorily (e.g. digital pathology), so this statement/premise needs toning down.
Please state your overall opinion of the paper

strong accept (9)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The technical details provided indicate very good contribution and the evaluation is valuable.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

4
Reviewer confidence

Very confident

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This work presents a generic framework for segmentation correction through interaction. The paper is clearly written and the proposed idea of restricting the network is novel and interesting. The evaluation of the method focuses on reporting the user interaction and machine time, but it does not report accuracy. While this is not the goal of the paper, it is important to report it as a way to show that despite the reduced interaction time, the system remains performant. The paper could be greatly improved by adding this information. Therefore, the authors are recommended to report the accuracy of their experiments.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

2

Author Feedback

We thank the program chairs, area chairs and the reviewers for provisionally accepting our paper and considering it as belonging to the top 13% of the papers. The reviews indicate several minor areas for clarification, such as the comparison of improvement in segmentation accuracy (R1, R2, R3, meta-reviewer) and to highlight the differences between our method and that of Wang et al. [24] (R1). We will be happy to incorporate these changes into the camera-ready version.

Segmentation Accuracy (R1, R2, R3 and meta reviewer) - Our method obtains the best segmentation accuracy (0.95-0.99) while saving the highest amount of user time (12.4 - 17.8 x) for various image modalities compared to other segmentation methods. In the submitted paper, we had mainly presented the time required to obtain a segmentation accuracy (dice coefficient) of 0.95 compared to other methods (Tables 2 and 5). However, in Table 3a, we had presented results for obtaining dice coefficient improvement with every user interaction. We agree with reviewers on more evaluation for segmentation accuracy, given its importance in the medical scenario. We will add more results to show the improvement in dice coefficient by our method compared to others.

Comparison of our paper with Wang et al. [24] (R1) - Our method is significantly different from Wang et al. in the following ways: (a) Deep Neural Network - Wang et al. use their own custom neural networks for interactive segmentation. However, our method can use pre-existing segmentation networks. We have demonstrated the same using Hovernet, Autofocus and U-Net in our experiments. This would allow our method to use new segmentation architectures which may be proposed in the future as well. (b) Optimization - Wang et al. use CRF based regularization for label correction. We propose a novel restricted Lagrangian based formulation. This enables us to do a sample specific fine-tuning of the network, and allows our method to do multiple label corrections in a single iteration. This is a novel capability. (c) User Inputs - Wang et al. use scribbles and bounding boxes as user inputs. Our method can carry out label correction irrespective of the type of user input provided, which is unique.
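
The general idea of treating user corrections as test-time constraints and fine-tuning on a single sample can be sketched with a generic primal-dual (Lagrangian) update. The sketch below is not the formulation from the paper, whose details are not given on this page. It assumes a toy per-pixel logistic model with one weight and one bias, a hinge-style constraint per scribbled pixel, and a proximity term that keeps the parameters close to their pre-trained values. The names `predict` and `refine` and the parameters `margin`, `lr`, and `steps` are hypothetical.

```python
def predict(theta, x):
    """Hard label from the sign of the logit w*x + b (same as sigmoid > 0.5)."""
    w, b = theta
    return 1 if w * x + b > 0 else 0


def refine(theta0, scribbles, margin=0.1, lr=0.1, steps=2000):
    """Fine-tune (w, b) on one sample so that scribbled pixels get the user's label.

    scribbles: list of (feature, label) pairs with label in {0, 1}.
    Lagrangian: 0.5*||theta - theta0||^2 + sum_i lam_i * g_i(theta),
    where g_i = margin - s_i*(w*x_i + b) <= 0 and s_i is +1 or -1.
    """
    w0, b0 = theta0
    w, b = w0, b0
    lam = [0.0] * len(scribbles)  # one multiplier per constrained pixel
    signs = [1.0 if y == 1 else -1.0 for _, y in scribbles]
    for _ in range(steps):
        # Constraint values at the current parameters (positive means violated).
        g = [margin - s * (w * x + b) for (x, _), s in zip(scribbles, signs)]
        # Gradient of the Lagrangian with respect to the parameters.
        grad_w = (w - w0) - sum(l * s * x for l, s, (x, _) in zip(lam, signs, scribbles))
        grad_b = (b - b0) - sum(l * s for l, s in zip(lam, signs))
        # Primal descent, then projected dual ascent (multipliers stay >= 0).
        w -= lr * grad_w
        b -= lr * grad_b
        lam = [max(0.0, l + lr * gi) for l, gi in zip(lam, g)]
    return (w, b), lam
```

In this toy setting all scribbled pixels enter the same update, so several corrections are handled jointly, and a scribble with label 0 on a pixel currently predicted as 1 would correspond to the over-segmentation case. For example, with `theta0 = (1.0, 0.0)` and scribbles `[(-0.5, 1), (2.0, 1)]`, the first pixel is initially labelled 0 and is labelled 1 after `refine`.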

Neural Network Details (R2) - We were unable to provide additional details about the neural networks due to space constraints. We will create an extended version on arXiv where we will add all the details.

Software (R1 and R2) - We used LabelMe for 2D images and 3D Slicer for 3D images in our method (also see Section 4 in the paper).

User Interaction (R1) - Our method can work with point, box and scribble inputs. We first performed experiments to determine the most suitable user input type for segmentation correction. We found that scribbles required the fewest user interactions (30% fewer mouse clicks), as well as the least user and machine time. Hence, the remaining experiments were done with scribbles only. Our primary aim was to highlight the performance of our method in the main paper for various datasets. To justify the use of scribbles, we have described the experiment comparing various types of inputs in the supplementary material. We will add the number of user interactions as an additional column in Table 4.

Oversegmentation (R3) – We thank the reviewer for raising this question. Fixing over-segmentation can be accomplished by providing user inputs where the predicted label does not match the ground truth. We would like to confirm that our existing method can also correct over-segmentation.

Insufficient diversity in evaluation (R1) - We have shown a variety of experiments on 2D Microscopy, CT and MRI images within the space constraints. We will include more results in the extended version on arXiv.

In summary, we will incorporate necessary changes to clarify accuracy, and user interaction using our method. We will also be happy to do language corrections and improve the readability as pointed out by R2.


Efficient and Generic Interactive Segmentation Framework to Correct Mispredictions during Clinical Evaluation of Medical Images