
Authors

Tan Nguyen, Binh-Son Hua, Ngan Le

Abstract

Medical image segmentation has so far achieved promising results with Convolutional Neural Networks (CNNs). However, it is arguable that in traditional CNNs the pooling layers tend to discard important information such as positions. Moreover, CNNs are sensitive to rotation and affine transformations. Capsule networks are a data-efficient network design proposed to overcome such limitations by replacing pooling layers with dynamic routing and convolutional strides, which aims to preserve part-whole relationships. Capsule networks have shown strong performance in image recognition and natural language processing, but applications to medical image segmentation, particularly volumetric image segmentation, have been limited. In this work, we propose 3D-UCaps, a 3D voxel-based capsule network for medical volumetric image segmentation. We build the concept of capsules into a CNN by designing a network with two pathways: the first pathway is encoded by 3D capsule blocks, whereas the second pathway is decoded by 3D CNN blocks. 3D-UCaps therefore inherits the merits of both capsule networks, which preserve spatial relationships, and CNNs, which learn visual representations. We conducted experiments on various datasets to demonstrate the robustness of 3D-UCaps, including iSeg-2017, LUNA16, Hippocampus, and Cardiac, where our method outperforms previous capsule networks and 3D-UNets. Our code is available at https://github.com/VinAIResearch/3D-UCaps.
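
For readers who want a concrete picture of the two-pathway design described in the abstract, the following is a minimal, hypothetical PyTorch sketch: a convolutional feature extractor whose channels are grouped into primary capsules, followed by a conventional 3D convolutional decoder. The module `ToyUCaps3D`, the `squash` helper, and all layer sizes are illustrative placeholders, not the architecture in the linked repository.

```python
# Minimal, hypothetical sketch of a two-pathway design: capsule-style encoder features,
# conventional 3D convolutional decoder. Names and sizes are placeholders only.
import torch
import torch.nn as nn

def squash(s, dim=-1, eps=1e-8):
    """Standard capsule squashing non-linearity."""
    norm2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm2 / (1.0 + norm2)) * s / torch.sqrt(norm2 + eps)

class ToyUCaps3D(nn.Module):
    def __init__(self, in_ch=1, num_classes=2, caps_dim=8, num_caps=4):
        super().__init__()
        self.num_caps, self.caps_dim = num_caps, caps_dim
        # Visual feature extraction with plain 3D convolutions.
        self.features = nn.Sequential(
            nn.Conv3d(in_ch, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(16, num_caps * caps_dim, 3, padding=1),
        )
        # Decoder pathway: conventional 3D convolutions producing per-voxel logits.
        self.decoder = nn.Sequential(
            nn.Conv3d(num_caps * caps_dim, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(16, num_classes, 1),
        )

    def forward(self, x):
        f = self.features(x)                                    # B, C, D, H, W
        b, _, d, h, w = f.shape
        # Primary capsules: group channels into (num_caps, caps_dim) vectors and squash.
        caps = squash(f.view(b, self.num_caps, self.caps_dim, d, h, w), dim=2)
        return self.decoder(caps.flatten(1, 2))                 # per-voxel class logits

if __name__ == "__main__":
    logits = ToyUCaps3D()(torch.randn(1, 1, 16, 32, 32))
    print(logits.shape)  # torch.Size([1, 2, 16, 32, 32])
```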

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87193-2_52

SharedIt: https://rdcu.be/cyhMv

Link to the code repository

https://github.com/VinAIResearch/3D-UCaps

Link to the dataset(s)

https://iseg2017.web.unc.edu/download/

https://luna16.grand-challenge.org/Download/

http://medicaldecathlon.com/


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors present an investigation of how to exploit capsule networks in medical image segmentation tasks. Compared to the previous SegCaps, the proposed 3D-UCaps makes 3D volumetric segmentation possible and enjoys the merits of both capsule blocks (rotation and translation invariance) and CNNs (feature extraction). With only 17 layers, the results outperform previous leading approaches.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The exploration of how to effectively use capsule networks for medical image segmentation is in itself a novel contribution.

    • The work presents a thorough and extensive experimental section using iSeg, LUNA16, Hippocampus, and Cardiac datasets for evaluation.

    • The current method presents a design of more capsule types in the lower-level layers, which contrasts with the SegCaps design. The resulting network obtains better results when compared to the previous work.

    • The paper is clear and well written; I enjoyed reading it.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I couldn’t find major weaknesses.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors listed “yes” for both code and pre-trained models. In that case, both training and testing should be straightforward to reproduce. If reproduction were based only on the descriptions in the paper, it could be somewhat difficult.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Minor concerns:

    • It would be nice if the authors could show some examples for demonstration.
    • I have some concerns about the empirical finding, i.e., that the expanding path has negligible effects. It would be great if the authors could further discuss this finding.
    • I know the optimization could be an obstacle, but I am still interested to see if the proposed network can handle multi-class segmentation, e.g., the 2019 head and neck MICCAI challenge.
  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The investigation of how to exploit capsule networks in medical image segmentation stands out as the paper's novelty. The paper also has some unique designs and fruitful findings, which are neat and principled. Thus, I recommend this paper for acceptance.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    3

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    This paper investigates a 3D voxel-based capsule network that uses 3D capsule blocks in the encoder branch and 3D deconvolution blocks in the decoder branch for medical image segmentation. It uses publicly available datasets and shows the robustness of the proposed network.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper uses capsule networks to tackle some of the issues CNNs have with rotation, affine transformations, and pooling. The paper includes contextual information along the temporal axis. The proposed methodology is an extension to volumetric data with additional improvements, and it can be considered novel.

    Some design selections are explained well with their reasons in the paper, which is helpful.

    Extensive experiments are done in the paper and comparisons as well as discussions are provided. Rotation variance and motion artifacts are studied. Considerable performance improvements are shown.

    For a meaningful comparison, the paper also implements 3D-SegCaps, a 3D extension of the existing SegCaps.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    One major weakness is that the paper does not provide any results for 3D SegCaps on the LUNA16, Hippocampus, and Cardiac datasets, even though the authors have already implemented the network. What is the reasoning behind this?

    Also, why does the paper compare 3D SkipDense instead of 3D U-Net which is more popular in the literature when considering rotation invariance and motion artifacts?

    The motion artifacts created in the paper may not be realistic, and they do not cover the whole range of motion artifacts that could occur in a clinical scenario. It would help if the authors could address this point.

    Lastly, when the rotation variance was created for the testing dataset, was it random? Also, what would happen if the rotation were about the x-axis, the y-axis, both the x- and y-axes, or all axes? How is the performance affected by those rotations?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper provides the code according to the reproducibility checklist and the datasets are publicly available so it should not be difficult to reproduce the results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    I think overall the paper is presented well; however, as a reviewer and a reader, it would help if the authors answered the questions above regarding the missing 3D SegCaps results on the other datasets, the selection of 3D SkipDense instead of 3D U-Net, rotation invariance, and motion artifacts.

  • Please state your overall opinion of the paper

    strong accept (9)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Novel methodology, extensive experiments, and well-explained design choices.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Based on the reviewers’ comments, we think this is a high-quality paper with significant technical contributions. Authors should address reviewers’ comments in the camera-ready submission.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    4




Author Feedback

We thank the reviewers for their feedback. We are glad to receive positive comments that value the contribution of our work. We clarify the key points and then address specific concerns individually.

MAJOR CONCERNS

KP1. Reproducibility and extensibility

We will release both the source code and pre-trained models. Our code is implemented in PyTorch Lightning and uses the MONAI framework for data processing, sliding-window inference, and Dice score computation with a fixed seed. We hope that by building on up-to-date frameworks, the community can reproduce our results and extend our method to other problems.
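
A minimal sketch of how these pieces (fixed seed, sliding-window inference, Dice score) fit together in MONAI is given below; the stand-in model, shapes, and parameters are illustrative and may differ from the released pipeline.

```python
# Sketch of the MONAI utilities mentioned above: fixed seed, sliding-window inference,
# and Dice computation. The model and tensor shapes are placeholders only.
import torch
from monai.inferers import sliding_window_inference
from monai.metrics import DiceMetric
from monai.networks.utils import one_hot
from monai.utils import set_determinism

set_determinism(seed=0)  # fix random seeds for reproducibility

model = torch.nn.Conv3d(1, 2, kernel_size=3, padding=1)  # stand-in for the segmentation model
model.eval()

image = torch.randn(1, 1, 64, 64, 64)              # B, C, D, H, W
label = torch.randint(0, 2, (1, 1, 64, 64, 64))    # binary ground truth

with torch.no_grad():
    # Run the model patch-wise over the volume and stitch the predictions back together.
    logits = sliding_window_inference(image, roi_size=(32, 32, 32),
                                      sw_batch_size=2, predictor=model)

pred = torch.argmax(logits, dim=1, keepdim=True)
dice = DiceMetric(include_background=False, reduction="mean")
dice(one_hot(pred, num_classes=2), one_hot(label, num_classes=2))
print("Dice:", dice.aggregate().item())
```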

KP2. Motion artifacts

Motion artifact correction is a complex problem with many partial solutions, as discussed in Zaitsev et al., “Motion artifacts in MRI: A complex problem with many partial solutions,” Journal of Magnetic Resonance Imaging, 2015. In this work, we simulate the motion artifacts caused by patient movement during scanning in a simplified way, by rotating 20% of the slices by an angle randomly chosen from (-5, 5) degrees about the x/y/z-axes. More realistic experiments with various k-space motion artifact augmentations will be examined with TorchIO in our future study.
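
The simplified slice-rotation artifact can be sketched as follows; the 20% fraction and the (-5, 5) degree range follow the description above, while the array shape, the function name `simulate_slice_motion`, and the remaining parameters are illustrative.

```python
# Sketch of the simplified motion artifact: rotate a random 20% of the slices by an
# angle drawn from (-5, 5) degrees. Shapes and interpolation settings are illustrative.
import numpy as np
from scipy.ndimage import rotate

def simulate_slice_motion(volume, frac=0.2, max_angle=5.0, rng=None):
    """volume: 3D array (D, H, W); returns a copy with a random subset of slices rotated in-plane."""
    rng = np.random.default_rng() if rng is None else rng
    out = volume.copy()
    idx = rng.choice(volume.shape[0], size=int(frac * volume.shape[0]), replace=False)
    for i in idx:
        angle = rng.uniform(-max_angle, max_angle)
        out[i] = rotate(volume[i], angle, reshape=False, order=1, mode="nearest")
    return out

corrupted = simulate_slice_motion(np.random.rand(64, 128, 128))

# For more realistic k-space motion artifacts, TorchIO provides a dedicated transform:
# import torchio as tio
# motion = tio.RandomMotion(degrees=5, translation=2, num_transforms=2)
```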

KP3. “Performance of 3D SegCaps on the LUNA16, Hippocampus, cardiac datasets”

Our best trials with 3D SegCaps produced results no better than [24], so we shifted our focus to our proposed method. A possible explanation can be found in our response to Reviewer 1's third concern. As requested, we will provide additional segmentation results for 3D SegCaps to give a full comparison.

INDIVIDUAL CONCERNS

Reviewer 1

  1. Reproducibility: Please refer KP1.

  2. Demonstration: Qualitative results will be made as a supplementary and will be added to the code release.

  3. “The empirical finding, i.e., the expanding path has negligible effects”: Further investigation is needed to understand the effects of capsules in the expanding path. Our hypothesis is that the routing algorithm works by routing-by-agreement over part-to-whole relationships; the whole-to-part relationship, however, is one-to-many, and it is ambiguous to find agreement between a whole and its active corresponding parts (a minimal routing sketch is given after this list).

  4. Multi-class segmentation: We have conducted multi-class segmentation experiments with 3D-UCaps, as reported in Tables 1, 2, and 3 on the iSeg dataset with three classes, i.e., WM, GM, and CSF.
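
To make the routing-by-agreement argument in point 3 concrete, the following is a minimal sketch of the generic dynamic routing algorithm of Sabour et al. (2017); it is not the exact routing used in 3D-UCaps, and the capsule counts and dimensions are illustrative.

```python
# Minimal sketch of dynamic routing-by-agreement (Sabour et al., 2017): lower-level
# ("part") capsules vote for higher-level ("whole") capsules, and coupling coefficients
# are refined by how well each vote agrees with the consensus output.
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    norm2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm2 / (1.0 + norm2)) * s / torch.sqrt(norm2 + eps)

def dynamic_routing(u_hat, num_iters=3):
    """u_hat: vote tensor of shape (B, num_in, num_out, out_dim)."""
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)   # routing logits
    for _ in range(num_iters):
        c = F.softmax(b, dim=2)                              # couplings over output capsules
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)             # weighted sum of votes
        v = squash(s)                                        # (B, num_out, out_dim)
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)         # agreement update
    return v

v = dynamic_routing(torch.randn(2, 32, 8, 16))
print(v.shape)  # torch.Size([2, 8, 16])
```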

Reviewer 3

  1. “3D SkipDense instead of 3D U-Net which is more popular in the literature when considering rotation invariance and motion artifacts?”

We are motivated by the iSeg-2017 challenge [26], in which all participating methods performed poorly on the testing subjects acquired with motion artifacts and unusual scan poses. We chose to report 3D SkipDense because it was proposed in 2019 and has state-of-the-art performance; its code and pretrained model are also publicly available.

  2. “When the rotation variance is created for the testing dataset, was this random? Also, what would happen if the rotation was about the x-axis, the y-axis, both the x- and y-axes, or all axes? How is the performance affected if those rotations happen?”

We trained our network on the data provided in the dataset without any rotation augmentation. During testing, we choose an axis about which to rotate the volume and apply the rotation with angle values fixed to 5, 10, 15, ..., 90 degrees. In our experiments, the performance tends to drop slightly as the rotation angle increases. We also found that rotating about a single axis versus all axes does not make a significant difference in performance, which demonstrates the capability of a capsule network to handle rotation.
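
A sketch of this test-time rotation protocol is given below; `predict_and_score`, the volume shape, and the choice of rotation plane are placeholders.

```python
# Sketch of the test-time protocol described above: no rotation augmentation at training
# time; at test time the volume is rotated by fixed angles (5, 10, ..., 90 degrees) about
# a chosen axis before inference. `predict_and_score` stands in for model inference + Dice.
import numpy as np
from scipy.ndimage import rotate

def predict_and_score(volume):
    return 0.0  # placeholder: run the trained model and compute the Dice score here

volume = np.random.rand(64, 128, 128)  # placeholder test volume (D, H, W)
plane = (1, 2)  # rotation plane; which anatomical axis this corresponds to
                # depends on the volume orientation

for angle in range(5, 95, 5):
    rotated = rotate(volume, angle, axes=plane, reshape=False, order=1, mode="nearest")
    print(angle, predict_and_score(rotated))
```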


