
Authors

Pak-Hei Yeung, Ana I. L. Namburete, Weidi Xie

Abstract

The objective of this work is to segment arbitrary structures of interest (SOI) in 3D volumes by annotating only a single slice, i.e. semi-automatic 3D segmentation. We show that high accuracy can be achieved by simply propagating the 2D slice segmentation with an affinity matrix between consecutive slices, which can be learnt in a self-supervised manner, namely by slice reconstruction. Specifically, we compare our proposed framework, termed Sli2Vol, with supervised approaches and two other unsupervised/self-supervised slice-registration approaches on 8 public datasets (both CT and MRI scans), spanning 9 different SOIs. Without any parameter tuning, the same model achieves superior performance, with Dice scores (0-100 scale) of over 80 on most of the benchmarks, including ones unseen during training. Our results show the generalizability of the proposed approach across data from different machines and with different SOIs: a major use case of semi-automatic segmentation methods, where fully supervised approaches would normally struggle.
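
Editor’s note: to make the core idea concrete, here is a minimal sketch of affinity-based mask propagation, assuming dense per-pixel features from some encoder. It is illustrative only, not the authors’ implementation (Sli2Vol restricts the affinity to a local neighbourhood; a dense matrix keeps the sketch short):

```python
import numpy as np

def propagate_mask(feat_a, feat_b, mask_a, temperature=0.07):
    """Propagate a 2D mask from slice A to the next slice B via an
    affinity matrix computed between per-pixel features.

    feat_a, feat_b : (C, H, W) feature maps of two consecutive slices
                     (from any encoder; hypothetical here).
    mask_a         : (H, W) mask annotated or propagated on slice A.
    Returns a soft (H, W) mask for slice B.
    """
    c, h, w = feat_a.shape
    fa = feat_a.reshape(c, -1)                       # (C, H*W)
    fb = feat_b.reshape(c, -1)                       # (C, H*W)

    # Affinity: every pixel of B attends to every pixel of A.
    logits = fb.T @ fa / temperature                 # (H*W, H*W)
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    aff = np.exp(logits)
    aff /= aff.sum(axis=1, keepdims=True)            # row-wise softmax

    mask_b = aff @ mask_a.reshape(-1).astype(float)  # weighted label copy
    return mask_b.reshape(h, w)
```

Applied slice by slice, this carries the single annotated mask through the whole volume; small errors accumulate along the way, which is what the paper’s verification module is meant to counteract.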

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87196-3_7

SharedIt: https://rdcu.be/cyl1x

Link to the code repository

https://github.com/pakheiyeung/Sli2Vol

Link to the dataset(s)

http://medicaldecathlon.com/

https://sliver07.grand-challenge.org/

https://chaos.grand-challenge.org/

https://www.cancerimagingarchive.net/

https://www.ircad.fr/research/3dircadb/

https://www.kaggle.com/c/second-annual-data-science-bowl


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a model for segmenting 3D volumes based on sparse manual annotations and template matching. The model is evaluated against baselines.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A reasonable idea. It is intuitive, and under the assumption that the volume contains strict boundaries that are distinct in texture, the model works.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The technical novelty is weak. The presentation can be improved. See details below.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The model and results are reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The overall technical novelty in the paper is weak. The crux of this proposal is similar to a variety of template matching, contour matching, and related co-segmentation methods proposed and well studied in medical imaging. To that end, it is important to showcase the efficacy and impact of this work via evaluations and presentation. Some points to address:

    • Please do not include sideways tables. This is a technical paper and the authors need to follow the standards and structure; find a way to report the results in top-down tables!

    • It is critical to add visual examples of the datasets and evaluations, to give a sense of the low-level visual and textural characteristics of the different organs.

    • What is the simplest way to interpret a 1-pt change in the Dice coefficient?

    • During the inference stage, does it matter on which slice the initial user annotation is provided? That is, how is this first slice for annotation chosen?

    • The correspondence flow network is not really a deep network, is it, or am I missing something here? The flow network essentially captures local pixel-neighbourhood similarity between two consecutive slices? So essentially it is a template-matching filter between two slices?
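
Editor’s note: regarding R1’s question on interpreting a 1-pt Dice change, the metric itself is straightforward; a minimal sketch on the 0-100 scale used in the paper (illustrative, not from the authors’ code):

```python
import numpy as np

def dice_score(pred, gt, eps=1e-8):
    """Dice coefficient on a 0-100 scale: 100 * 2|P∩G| / (|P| + |G|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    return 100.0 * 2.0 * intersection / (pred.sum() + gt.sum() + eps)

# Rough intuition: for two masks of ~1000 voxels each, Dice moves by
# one point when about 10 voxels shift between overlap and error.
```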

  • Please state your overall opinion of the paper

    probably reject (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See above.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    4

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    This work presents Sli2Vol, a semi-automatic 3D segmentation framework able to segment any arbitrary 3D structure from a single annotated slice. Sli2Vol is trained with self-supervised learning to produce affinity matrices between consecutive slices, which are used to propagate the 2D slice segmentation. The authors compare their framework against supervised and unsupervised/self-supervised approaches on 8 public datasets (both MRI and CT), achieving Dice scores above 80 (0-100 scale) on most benchmarks without parameter tuning.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • They developed a general semi-automatic segmentation approach that can segment and analyze any SOIs using one annotated slice without prior parameter tuning.

    • A major strength of the paper is that they validated their method on large-scale and diverse datasets using both MRI and CT scans.

    • The addition of the verification module is interesting.

    • It’s great to see that the authors intend to make their method available to the research community.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The methodological contributions/models are not very novel per se, but I don’t think this detracts much from the overall importance of the work in this case.

    • I believe that the methods the authors compared against (VoxelMorph & Optical Flow) were not specifically designed for this semi-automatic segmentation task, and hence their performance is not a true benchmark for comparison.

    • Fully supervised model results (the baseline benchmark to compare against) are not reported for all the tasks.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    ● The mathematical setting, model and algorithms are described clearly (Section 2, Fig. 1 and Fig. 2).

    ● The model and algorithm assumptions are defined clearly.

    ● The details about the training/testing datasets are provided; they used public datasets.

    ● The different experiments are well categorized.

    ● Experimental settings are defined clearly, with details in both tables (Tables 2 and 3).

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Developing a general self-supervised segmentation approach is highly interesting and timely, given the limitations of fully supervised approaches, including the need for large annotated datasets and their sensitivity to domain shift. The reviewer appreciated the development and evaluation of this method.

    For the fully supervised single-slice models, the models are trained on 20 annotated slices and then used to segment the whole volume. This approach uses the same amount of manual annotation as Sli2Vol, so it serves as a baseline comparison. Based on the results provided in Table 1 for Sli2Vol (row i), a large boost in performance occurred when the verification module was applied, which is a post-processing step that improves/refines the initial segmentation. Hence, it is not fair to compare a fully supervised single-slice model to a self-supervised approach in which post-processing is applied.

    It would be great if computation time (training and testing) were provided and compared across all fully supervised and self-supervised approaches.

    For future analyses, it would be great to evaluate the model on a multi-class segmentation task such as tumours in the BraTS dataset, a brain MRI dataset with multi-class lesions.

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper is well-written, organized and validated against different approaches on large-scale and diverse datasets. It is also an interesting contribution and new general approach to solve and improve segmentation in a self-supervised manner. In addition, it has a potential clinical impact since it can provide users with more flexibility to segment and analyze different SOIs using a semi-automatic segmentation tool without any prior parameter tuning or a large, annotated dataset.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This work presents an interesting method for segmentation of multiple organs using a single model which is trained using scribble annotations from a single slice. While the idea is very interesting, several aspects raised by the reviewers need to be clarified. These include:

    • The baseline benchmark to compare against is not always reported. Please discuss.
    • How is the slice to be annotated chosen? Please discuss.
    • Results of VoxelMorph & Optical Flow are not directly comparable. These should be removed, or their inclusion better justified.
    • Visual examples should be included.

    Furthermore:

    • The authors claim to be the first ones to test a segmentation model on multiple organs. This is inaccurate: the Medical Segmentation Decathlon (http://medicaldecathlon.com/index.html) has been addressing this problem. See [1]. Please position the work with respect to it.
    • The idea of propagating the segmentation has been first explored by [2] for placenta segmentation. Please position the contributions with respect to this work.

    [1] Simpson, A.L., et al.: A large annotated medical image dataset for the development and evaluation of segmentation algorithms. arXiv preprint arXiv:1902.09063 (2019).

    [2] Wang, G., et al.: Slic-Seg: Slice-by-Slice Segmentation Propagation of the Placenta in Fetal MRI Using One-Plane Scribbles and Online Learning. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015 (2015).

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    11




Author Feedback

We thank all the reviewers for their constructive comments and suggestions.

  1. Novelty of our proposed approach (R1, R2 & MR): The proposed method falls within the category of self-supervised learning, with the goal of efficient volume segmentation by querying user annotation for only a single slice. The idea originates from learning pixel-wise correspondences between consecutive slices through slice reconstruction. We introduce the edge profile and verification modules (considered a strength by R2) and show their effects through extensive ablation studies (Table 1, rows g-j), surpassing strong baselines.
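
Editor’s note: the proxy task described above (learning correspondences by reconstructing one slice from its neighbour) can be sketched as follows. The encoder is a placeholder for any 2D CNN that preserves spatial resolution; this is an illustration, not the authors’ code:

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(encoder, slice_a, slice_b, temperature=0.07):
    """Self-supervised objective: rebuild slice_b by copying intensities
    from slice_a through the affinity matrix, so pixel correspondences
    are learnt without any manual labels.

    slice_a, slice_b : (1, 1, H, W) consecutive slices of one volume.
    encoder          : 2D CNN mapping (1, 1, H, W) -> (1, C, H, W).
    """
    fa = encoder(slice_a).flatten(2).squeeze(0)            # (C, H*W)
    fb = encoder(slice_b).flatten(2).squeeze(0)            # (C, H*W)

    aff = torch.softmax(fb.t() @ fa / temperature, dim=1)  # (H*W, H*W)
    recon_b = aff @ slice_a.flatten()                      # intensities from A
    return F.l1_loss(recon_b, slice_b.flatten())
```

At inference the same affinity matrix propagates the annotated mask instead of intensities, which is what ties the training proxy to the segmentation task.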

In order to demonstrate generalizability, we conducted comprehensive experiments. To our knowledge, this is the first work showing that a single model, trained purely with self-supervised learning, is applicable across diverse datasets covering 9 different structures from 2 modalities (Table 2 in Supplementary Materials), which implies strong clinical potential (as pointed out by R2). Note that our work is fundamentally different from the Medical Segmentation Decathlon (i.e. the collection of a large-scale diverse dataset) and from Slic-Seg, which is tested only on a single modality (MRI placenta) and requires an individually online-trained model for each testing volume. Nevertheless, we thank the MR for suggesting these two related works and will update the references accordingly.

  2. Response to R1: We respectfully disagree with most of R1’s comments; for example, R1 states that “this proposal is similar to… template matching…” and “The correspondence flow network is not really a deep network…”. As stated in Section 2.2, “the idea is to task a deep network for slice reconstruction…”: our approach is in fact based on deep networks trained end-to-end with self-supervised learning. We therefore believe R1’s criticism of novelty derives mainly from a fundamental misinterpretation of our paper.

The Dice coefficient has been the standard segmentation metric for decades and, we assume, is well understood by the community. Nevertheless, to avoid confusion, we will add more detailed visual examples (in addition to those already included in Fig. 3 of the Supplementary Materials) and reformat Table 1, if space allows.

  3. Baselines (R2 & MR): (1) Both VoxelMorph and Optical Flow share the same spirit as our model, relying on pixel-wise correspondences between consecutive slices. In addition, we compare the fully supervised approach against ours, trained with the same amount of annotated data, for a fair comparison. Our ablation studies show that (i) Sli2Vol without post-processing (Table 1, row g), which ensures a fair comparison, already surpasses the baselines, and (ii) the introduction of the novel edge profile and verification modules further boosts performance.

(2) In contrast to the fully-supervised approach, our proposed Sli2Vol requires “zero training” at inference time, even on unseen structures. This is discussed in Section 4.2: “Sli2Vol… is agnostic to SOIs and domains.”

(3) We only include results for fully supervised methods from the literature as a reference for the approximate upper bound on performance, as well as to illustrate their limited generalizability (Section 3.2, 2nd paragraph). They are not meant as a direct comparison to our approach, but we will include results for 3DIRCADb-Liver (and potentially more, if found in the literature) in the camera-ready version.

  4. Initial slice for annotation (R1 & MR): In Section 3.2 (last paragraph), we explain how the first slice is selected from a practical perspective: “… pick one of the ±3 slices around the slice with the largest ground-truth annotation as the initial mask. This simulates the process of a user sliding through the whole volume…”
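
Editor’s note: the quoted protocol might look like the sketch below; the function is hypothetical and uses the ground truth only to simulate which slice a user would annotate:

```python
import numpy as np

def pick_initial_slice(gt_volume, jitter=3, rng=None):
    """Pick one of the +/-3 slices around the slice with the largest
    ground-truth annotation, per the protocol quoted above.

    gt_volume : (D, H, W) binary ground-truth masks (evaluation only).
    """
    if rng is None:
        rng = np.random.default_rng()
    areas = gt_volume.reshape(gt_volume.shape[0], -1).sum(axis=1)
    best = int(np.argmax(areas))                 # slice with largest SOI
    offset = int(rng.integers(-jitter, jitter + 1))
    return int(np.clip(best + offset, 0, gt_volume.shape[0] - 1))
```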

  5. Computation time (R2): We thank R2 for the suggestion and have benchmarked inference time. Our model processes 2.27 slices per second (roughly 90 seconds for a 200-slice volume); we will include this in the camera-ready paper.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal fails to address most of the points raised during the reviews, and the authors mostly re-state what is written in the paper. The following points remain unaddressed:

    1) Some relevant related works had been pointed out. No comments on this were provided.

    2) R1 asked to clarify whether the method is a template-matching strategy. The authors say they disagree with the claim but do not provide any elements to show that the reviewer’s claim is inaccurate.

    3) R1 asked to discuss how a 1-pt change in the Dice score can be interpreted. The authors focus on justifying the choice of the Dice as a metric, which had never been put into question, and fail to answer the raised point.

    4) The selected benchmark had been questioned, as it cannot be directly compared with the proposed method. The authors simply state that VoxelMorph and Optical Flow are comparable but, as before, do not provide any concrete evidence to support the claim.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    19



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This is a highly interesting work that tackles efficient annotation from a new and refreshing angle. The authors use a tracking-based approach to train a network to predict affinity matrices between slices, which allows the propagation of a one-slice annotation to the entire volume. Along the way, the authors introduce an edge profile generator and a verification module to overcome some issues with the basic framework.

    I found the experiments comprehensive. While I do agree with R1 that the comparisons to VoxelMorph/Optical Flow may not seem sufficient, I don’t believe there are many other baselines designed for the authors’ problem definition. I would suggest they also try DEEDS (slow but quite accurate) and newer deep-learning extensions to VoxelMorph, but for MICCAI I believe the current experiments are more than sufficient.

    The authors’ rebuttal was quite convincing. However, I shared the reviewers’ confusion as to how the initial slice is picked at inference. The authors state that this was explained in the last paragraph of Section 3.2, but the paragraph in question refers to the VoxelMorph and Optical Flow approaches. This needs to be made clearer, as it is an essential question. A sensitivity analysis on the choice of inference slice (e.g., a repeatability analysis on the same volume) would be valuable, but for the purposes of MICCAI I do not think it is necessary, provided the authors make it much clearer how each inference slice is picked.
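
Editor’s note: the repeatability analysis suggested here could be sketched as below, assuming hypothetical propagate(volume, start_slice, mask) and dice(a, b) functions like those outlined earlier:

```python
import itertools
import numpy as np

def repeatability(volume, masks_by_slice, propagate, dice):
    """Propagate from several candidate start slices and report the mean
    and spread of pairwise Dice between the resulting 3D segmentations.

    masks_by_slice : {slice_index: 2D mask} candidate initialisations.
    """
    results = {s: propagate(volume, s, m) for s, m in masks_by_slice.items()}
    scores = [dice(results[a], results[b])
              for a, b in itertools.combinations(results, 2)]
    return float(np.mean(scores)), float(np.std(scores))
```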

    A very good read and I hope to see it accepted.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    1



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper only has two reviews with strongly diverging scores. While both reviewers see merit, they also criticise a lack of technical novelty. The authors address this in the rebuttal. While the technical similarity to prior work can’t be denied, the use of self-supervised learning for this particular task seems novel and, in my opinion, is an interesting approach.

    The “probably reject” score of R1 seems to be mostly based on a number of perceived shortcomings which, unfortunately, are vague and not fully supported in the review (e.g. “the crux of this proposal is similar to … template matching, contour matching, and related co-segmentation methods” – how similar, and which methods?). It is difficult to rebut such claims without having been given specifics; however, I believe the authors have sufficiently addressed R1’s points and I recommend following the recommendation of R2: accept.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    8


