
Authors

Hongyi Wang, Lanfen Lin, Hongjie Hu, Qingqing Chen, Yinhao Li, Yutaro Iwamoto, Xian-Hua Han, Yen-Wei Chen, Ruofeng Tong

Abstract

3D medical image segmentation at high resolution is important for accurate diagnosis. The main challenges of this task are its large computational cost and GPU memory constraints. Most existing 3D medical image segmentation methods are patch-based; they ignore the global context needed for accurate segmentation and also reduce inference efficiency. To tackle this problem, we propose a patch-free 3D medical image segmentation method that can produce high-resolution (HR) segmentation from low-resolution (LR) input. It consists of a multi-task learning framework (semantic segmentation and super-resolution (SR)) and a Self-Supervised Guidance Module (SGM). SR is used as an auxiliary task to the main segmentation task to restore HR details, while the SGM, which uses an original HR image patch as a guidance image, is designed to preserve high-frequency information for accurate segmentation. In addition, we introduce a Task-Fusion Module (TFM) to exploit the interconnections between the segmentation and SR tasks. Since the SR task and the TFM are used only in the training phase, they introduce no extra computational cost at inference time. We conduct experiments on two different datasets, and the results show that our framework outperforms current patch-based methods while achieving a 4× higher inference speed.
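
In rough terms, training optimizes a weighted combination of a segmentation loss (BCE + Dice) and an auxiliary super-resolution loss (MSE), plus the guidance and fusion terms. A minimal sketch of the core combination (hypothetical weights and function names, assuming PyTorch; the SGM and TFM terms are omitted for brevity, so this is a simplification rather than the authors' exact formulation):

```python
import torch.nn.functional as F

def dice_loss(pred, target, eps=1e-6):
    # Soft Dice loss for probability maps.
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def training_loss(seg_prob, seg_gt, sr_out, hr_image, w_seg=1.0, w_sr=0.5):
    # Main segmentation task: BCE + Dice against the HR ground-truth mask.
    loss_seg = F.binary_cross_entropy(seg_prob, seg_gt) + dice_loss(seg_prob, seg_gt)
    # Auxiliary super-resolution task: MSE against the original HR image (training only).
    loss_sr = F.mse_loss(sr_out, hr_image)
    return w_seg * loss_seg + w_sr * loss_sr
```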

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87193-2_13

SharedIt: https://rdcu.be/cyhLF

Link to the code repository

https://github.com/Dootmaan/PFSeg

Link to the dataset(s)

https://www.med.upenn.edu/cbica/brats2020/


Reviews

Review #1

  • Please describe the contribution of the paper
    • Improves the segmentation of medical images by exploiting global context, using low-resolution images as input and a multi-task approach that combines image super-resolution and semantic segmentation.
    • Adds a self-guidance module to better capture the high-frequency information that guides segmentation, and a task-fusion module to effectively combine the segmentation and super-resolution tasks.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Addition of self-supervised guidance modules to both the segmentation and super-resolution branches, and a task-fusion module to combine the two tasks.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors state that the main objective is to use larger/global context to improve segmentation using patch-free models, but they use a patch size of 96×96×64 in their experiments.

    The method is a minor improvement on [1]. However, the increase in model size with the addition of the self-guidance modules is not provided.

    Some of the claims are unwarranted as there is no data to support them, e.g., the claimed 4× higher speed when predicting.

    [1] Wang et al., Dual super-resolution learning for semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3774-3783 (2020)

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    There is not enough information in the paper to reimplement / reproduce the proposed model

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The statement “if the network is trained with patches, it also have to use patches (such as sliding window strategy) in inference stage” is not true. Most segmentation models (U-Net or FCN models) are fully convolutional and can be trained on patches and inferred on the entire image.
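
    For illustration, a minimal sketch (assuming PyTorch; unrelated to the paper's specific architecture) of a fully convolutional 3D model that can be trained on 96×96×64 patches and, memory permitting, applied directly to a full 192×192×128 volume:

```python
import torch
import torch.nn as nn

class TinyFCN3D(nn.Module):
    # Minimal fully convolutional 3D network: no fully connected layers,
    # so it accepts inputs of (almost) any spatial size.
    def __init__(self, in_ch=1, n_classes=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(in_ch, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(16, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(16, n_classes, 1),
        )

    def forward(self, x):
        return self.body(x)

net = TinyFCN3D()
patch = torch.randn(1, 1, 96, 96, 64)      # training-time patch
volume = torch.randn(1, 1, 192, 192, 128)  # full volume at inference
print(net(patch).shape, net(volume).shape)  # both work, GPU memory permitting
```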

    The first point in the contribution, “We propose a patch-free 3D medical image segmentation method, which can realize HR segmentation with LR input”, is misleading, as the authors use a patch of 96×96×64 in their experiments. No results are provided with the entire / larger 3D volumes.

    How is the patch for self-supervised guidance cropped: randomly within the input patch in high resolution or is it the central patch?

    Fig. 3: Are the SGM modules in the super-resolution and segmentation branches shared, similar to the shared encoder, or are they separate?

    Equation (1) seems to be the MSE loss for super resolution task in the ROI of the high resolution self-supervised guidance patch. If so, this is redundant with the MSE loss for SR in the overall objective function.

    The term “task fusion module” is misleading, as it adds a few terms to the overall objective function rather than being a module that combines the features from the two tasks.

    The spatial similarity loss in the task-fusion module is equivalent to the feature affinity loss in [1], and the target enhanced loss is the new component added by the authors.
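
    For reference, the feature affinity loss in [1] penalizes the difference between pairwise feature-similarity matrices computed from the two branches. A minimal sketch (hypothetical tensor names and shapes, assuming PyTorch):

```python
import torch
import torch.nn.functional as F

def feature_affinity_loss(feat_seg, feat_sr):
    # feat_*: (B, C, D, H, W) feature maps from the segmentation and SR branches.
    # In practice the features are heavily subsampled so the N x N matrix stays tractable.
    B, C = feat_seg.shape[:2]
    a = F.normalize(feat_seg.reshape(B, C, -1), dim=1)  # (B, C, N)
    b = F.normalize(feat_sr.reshape(B, C, -1), dim=1)
    sim_a = torch.bmm(a.transpose(1, 2), a)             # pairwise voxel similarities (B, N, N)
    sim_b = torch.bmm(b.transpose(1, 2), b)
    return (sim_a - sim_b).abs().mean()
```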

    Was the inference done on patches with a sliding window strategy due to limited GPU memory? What was the batch size during training?

    Both BRATS2020 and the liver dataset are relatively small. I would suggest using cross-validation splits for training.

    For BRATS2020, a comparison with U-Net models that use a larger patch size (128×128×128) [2, 3] would be better suited here than models that use a smaller patch. Also, the models in [2, 3] achieved higher performance than the SOTA models listed in Table 2. If the authors were focusing on models evaluated on both the BRATS2020 and liver datasets, I would suggest splitting the table for the two datasets and providing the SOTA results for each task independently.

    [2] Isensee et al., No New-Net. MICCAI 2018. [3] Henry et al., Brain Tumor Segmentation with Self-ensembled, Deeply-Supervised 3D U-Net Neural Networks: A BraTS 2020 Challenge Solution. MICCAI 2020.

    The statement “Since our framework can directly output a complete segmentation mask at a time, it also has a faster inference speed than most of the other methods” is unclear. Was the inference done on the entire 192×192×128 volume? Please provide the inference times of the various models to support this statement.

    Overall, there are a few novel components in the proposed model. However, with the details provided in the paper it is hard to compare with current SOTA models: is there an improvement in performance at a similar model size, or an improvement in inference time?

  • Please state your overall opinion of the paper

    probably reject (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Claims not supported by data/results provided

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    The authors propose a patch-free 3D image segmentation method that integrates super-resolution guidance. The method is interesting and important for clinical applications, which are challenged by large 3D volumetric data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • A patch-free 3D medical image segmentation method that uses low-resolution input to generate a high-resolution segmentation result.
    • A self-supervised guidance module is proposed to guide the segmentation and preserve high-frequency information.
    • A multi-task framework is employed to construct the network for information learning without hampering inference efficiency.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The organization and writing need improvement, as the method is hard to follow and understand.
    • Some descriptions are unclear, such as the way the HR 3D patch is extracted.
    • The results are limited, as no standard deviations are reported.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It seems the code will be published, but the statistical results are insufficient without standard deviations.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • First of all, please include the standard deviation in the statistical evaluation; otherwise it is hard to see the distribution of the results.
    • The main thing I care about is how the patch is extracted from the HR image: is there any requirement on its position, or is it obtained randomly? I believe different patch locations would significantly influence the segmentation results, as they contain different high-frequency information. However, the paper only mentions this without detailed discussion.
    • I am curious about the performance of the super-resolution branch, as the authors state the method is a multi-task learning approach (just interested in this).

    Overall, this paper is interesting to readers and has some potential to be improved in the future.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Method is interesting

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    7

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    This is overall a well-presented work with certain novelties. It presents an effective patch-free segmentation model that takes down-sampled medical images as input and outputs high-quality segmentations. Compared with patch-based approaches, the presented patch-free model learns useful global information and therefore produces fewer false segmentations. The logic of the presented method is sound.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This work is well motivated by the introduction section and Fig. 3.

    2. The method is sound and of certain novelty. The workflow is novel to integrate the auxiliary super resolution task into the pipeline of segmentation. The objective functions, including self-supervised guidance loss, spatial similarity loss, target enhanced loss, BCE loss, Dice loss, MSE loss are clearly introduced.

    3. The ablation study is convincing showing the proposed SR, TEL, SSL, and SGM components are effective.

    4. Compared with many existing studies, the proposed method shows superior accuracy and relatively fast speed.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There is no obvious weakness in the paper. Only some minor comments:

    1. How is the feature of the HR 3D patch concatenated to the feature of the LR 3D image? It is not intuitive, as the patch and the image represent different image regions. If the features are concatenated directly, how is the misalignment handled, i.e., the semantic misalignment in which the features represent different image contents?

    2. On page 6, “The cropped MRI image and its segmentation mask are used as ground truth of US and SR, while the input is the cropped image after down-sampling.” should be “… are used as ground truth of SR and US, …”

    3. The errors in case 1 of Fig. 4 could be removed by connected component analysis. If time allows, some simple post-processing operations could be applied after segmentation (a minimal sketch follows).
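
    A minimal sketch of such a post-processing step (assuming SciPy; hypothetical helper name), keeping only the largest connected component of the predicted mask:

```python
import numpy as np
from scipy import ndimage

def keep_largest_component(mask):
    # mask: binary 3D numpy array; keep only its largest connected component.
    labeled, n = ndimage.label(mask)
    if n == 0:
        return mask
    sizes = ndimage.sum(mask, labeled, index=range(1, n + 1))
    return (labeled == (np.argmax(sizes) + 1)).astype(mask.dtype)
```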

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    This article will provide its code and uses public datasets. Therefore, it should be possible to reproduce the main reported results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Please see part 4.

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Given parts 2 and 3, I give a probably accept rating to this article.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper presented a patch-free 3D medical image segmentation framework, which takes down-sampled medical images as input and outputs high-quality segmentations. Overall, it is well written, and the logic of the method is sound. Although the novelty may be limited, the techniques are applied effectively. The most important issues raised by the reviewers concern the experiments. The authors should provide details about the batch size, the standard deviations, and the pre-processing, such as the patch extraction. I agree with Reviewer #1 that cross-validation should be applied for a solid evaluation. There are other minor errors and missing pieces of information that should be revised and added. Detailed comments have been provided to the authors to help make their work more solid.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5




Author Feedback

Thank you for your valuable comments. Our itemized responses to the questions are as follows:

Comment 1: The authors state that the main objective is to use larger/global context to improve segmentation using patch-free models, but they use a patch size of 96×96×64 in their experiments. (Reviewer #1) Response: We have to point out that Reviewer #1 misunderstood our method. Our patch-free segmentation method does NOT use patches as input; it uses the entire 3D image, down-sampled to 96×96×64 as LR input, due to limited GPU memory. The proposed method directly generates an HR segmentation mask in a single pass by combining super-resolution and self-supervised guidance. In contrast, conventional patch-based methods usually first decompose an HR 3D image into several small patches (the patch size is also 96×96×64 in our experiments) and perform segmentation on these patches separately. The final result is assembled from the segmentation results of all the patches, and the whole process is very time-consuming.
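
For clarity, a minimal sketch contrasting the two inference styles (hypothetical helper names, assuming PyTorch and a single-channel output; border handling and the actual networks are simplified):

```python
import torch
import torch.nn.functional as F

def sliding_window_inference(net, hr_volume, patch=(96, 96, 64)):
    # Conventional patch-based inference: crop overlapping patches, predict each,
    # and average the stitched predictions (border handling simplified).
    D, H, W = hr_volume.shape[2:]
    out = torch.zeros_like(hr_volume)
    count = torch.zeros_like(hr_volume)
    sd, sh, sw = patch[0] // 2, patch[1] // 2, patch[2] // 2
    for z in range(0, D - patch[0] + 1, sd):
        for y in range(0, H - patch[1] + 1, sh):
            for x in range(0, W - patch[2] + 1, sw):
                crop = hr_volume[:, :, z:z + patch[0], y:y + patch[1], x:x + patch[2]]
                out[:, :, z:z + patch[0], y:y + patch[1], x:x + patch[2]] += net(crop)
                count[:, :, z:z + patch[0], y:y + patch[1], x:x + patch[2]] += 1
    return out / count.clamp(min=1)

def patch_free_inference(net, hr_volume, lr_size=(96, 96, 64)):
    # Patch-free inference: one forward pass on the down-sampled whole volume;
    # the network upsamples internally and outputs the HR mask directly.
    lr = F.interpolate(hr_volume, size=lr_size, mode='trilinear', align_corners=False)
    return net(lr)
```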

Comment 2: The increase in model size with the addition of self-guidance modules is not provided. (Reviewer #1) Response: A single self-guidance module has about 0.4M parameters, which accounts for 4.7% of the parameters of the main segmentation network.

Comment 3: Some of the claims are unwarranted as there is no data to support these claims, e.g., the 4× higher speed when predicting. (Reviewer #1) Response: There are in fact experimental results supporting our claims: the last column of Table 2 provides the inference times of the different models.

Comment 4: How is the patch for self-supervised guidance cropped: randomly within the input patch in high resolution, or is it the central patch? (Reviewer #1, #2, Meta-Reviewer) Response: We ran experiments on this issue, comparing random cropping and central-area cropping. The experiments showed that central-area cropping leads to a better result (+0.27% Dice). Random cropping may cause instability, since the content of the guidance patch can vary considerably from case to case. Although this comparison is not included in the current paper, we will add it in the final version.
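
A minimal sketch of the central-area cropping (the crop size shown is hypothetical and may differ from the guidance-patch size used in the paper):

```python
import torch

def central_crop(hr_volume, crop=(64, 64, 32)):
    # hr_volume: (B, C, D, H, W); return a patch cropped from the centre of the volume.
    D, H, W = hr_volume.shape[2:]
    z0, y0, x0 = (D - crop[0]) // 2, (H - crop[1]) // 2, (W - crop[2]) // 2
    return hr_volume[:, :, z0:z0 + crop[0], y0:y0 + crop[1], x0:x0 + crop[2]]
```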

Comment 5: Was the inference done on patches with a sliding-window strategy due to limited GPU memory? What was the batch size during training? (Reviewer #1, Meta-Reviewer) Response: Yes, inference for the patch-based methods uses a sliding-window strategy, as described in Section 3.2. The batch size for our proposed method is 1. For the other patch-based methods, we tested batch sizes of 1, 2, and 4 (on BRATS2020) and found that quite a few models, such as UNet3D and ResUNet3D, perform best with a batch size of 1. The final experimental results in the paper are all reported with a batch size of 1.

Comment 6: Would suggest using cross-validation splits for training. (Reviewer #1, Meta-Reviewer) Response: We managed to finish 4 splits of the 5-fold cross-validation of our method before the rebuttal deadline; the average DSC is 83.83% (84.27%, 83.82%, 83.62%, 83.62%), which is very close to what is reported in the paper and thus demonstrates the stability of the proposed method. We also conducted extra experiments with 6/2/2 train/val/test splits of BRATS2020 after submission, and the results are consistent with those in the paper, with our framework outperforming the backbone baseline by 2.2% DSC.

Comment 7: Results are limited without reporting std. (Reviewer #2) Response: Some standard deviations of the results on BRATS2020 are listed below in the format [Method, DSC STD, HD95 STD]; the full STD results will be presented along with our code on the GitHub page. V-Net, 0.1345, 18.6006; UNet3D, 0.1277, 21.1204; ResUNet3D, 0.1182, 19.4195; ResUNet3D↑, 0.1525, 8.0853; HDResUNet, 0.1474, 12.0561; Ours, 0.1433, 8.6250. Our framework shows strong stability on the 95% Hausdorff distance and also outperforms the other patch-free methods in the STD of DSC.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper presented a patch-free 3D medical image segmentation framework. The paper is well written. The method is sound, though with limited novelty. The authors have addressed the concerns raised by the reviewers well, such as the details about the batch size, the standard deviations, and the pre-processing.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    7



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This is interesting work on deep-learning-based 3D segmentation. It takes down-sampled volumes as input and produces high-resolution segmentations by integrating super-resolution techniques. The rebuttal clarified key concerns raised about the experimental evaluation.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    8



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes a patch-free segmentation framework for 3D images. It is in line with a recent trend of combining global context with local details for semantic segmentation. While the reviewers put forward many detailed issues, the merit of this paper was mostly acknowledged. In the rebuttal, the authors provided further justification, mostly regarding their experiments. The authors also noted they would release code and data, which is necessary to support their paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    10


