
Authors

Wenxuan Wang, Chen Chen, Meng Ding, Hong Yu, Sen Zha, Jiangyun Li

Abstract

Transformer, which can benefit from global (long-range) information modeling using self-attention mechanisms, has been successful in natural language processing and 2D image classification recently. However, both local and global features are crucial for dense prediction tasks, especially for 3D medical image segmentation. In this paper, we for the first time exploit Transformer in 3D CNN for MRI Brain Tumor Segmentation and propose a novel network named TransBTS based on the encoder-decoder structure. To capture the local 3D context information, the encoder first utilizes 3D CNN to extract the volumetric spatial feature maps. Meanwhile, the feature maps are reformed elaborately for tokens that are fed into Transformer for global feature modeling. The decoder leverages the features embedded by Transformer and performs progressive upsampling to predict the detailed segmentation map. Extensive experimental results on both BraTS 2019 and 2020 datasets show that TransBTS achieves comparable or higher results than previous state-of-the-art 3D methods for brain tumor segmentation on 3D MRI scans. The source code is available at https://github.com/Wenxuan-1119/TransBTS.
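The tokenization step described in the abstract can be illustrated with shape arithmetic alone. The sketch below is not taken from the TransBTS code: the crop size, the downsampling factor, and the function name `token_grid` are assumptions chosen for illustration, and it assumes that each voxel of the downsampled feature map becomes one token.

```python
from math import prod


def token_grid(volume_shape, downsample):
    """Return (grid, num_tokens) for a 3D volume whose spatial axes are each
    reduced by `downsample` in a CNN encoder, with one token per remaining voxel."""
    if any(size % downsample for size in volume_shape):
        raise ValueError("each axis must be divisible by the downsampling factor")
    grid = tuple(size // downsample for size in volume_shape)
    return grid, prod(grid)


# Hypothetical 128x128x128 crop with an assumed 8x overall downsampling.
grid, num_tokens = token_grid((128, 128, 128), 8)
print(grid, num_tokens)  # (16, 16, 16) 4096
```

Under these assumed numbers the Transformer would see a sequence of 4096 tokens instead of the roughly two million voxels of the input crop, which is why the convolutional encoder runs before the global modeling step.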

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87193-2_11

SharedIt: https://rdcu.be/cyhLD

Link to the code repository

https://github.com/Wenxuan-1119/TransBTS

Link to the dataset(s)

https://ipp.cbica.upenn.edu/

Reviews

Review #1

Please describe the contribution of the paper

The authors present TransBTS, a method for 3D brain tumor segmentation based on U-Net and Transformers. The method uses an encoder to reduce the dimension of the input, a transformer encoder to learn global features, and a decoder to recover high-resolution features. The results show that TransBTS outperforms several methods on the BraTS 2019 dataset.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The TransBTS represents a novel application of transformers in the field of 3D medical segmentation. Also, the use of convolutional layers with transformers permits the analysis of local and global features together.
- Table 1 shows that the method obtains high results in the task of brain tumor segmentation.
- Qualitative results show that the segmentations obtained with TransBTS have a higher level of detail than other top-performing methods.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The main results (Table 1) show that TransBTS outperforms several methods tested on the BraTS 2019 dataset. However, the comparison does not include methods like [1,2], which obtain comparable and higher results than TransBTS, respectively. This goes against the claim that TransBTS outperforms the state of the art for brain tumor segmentation in 3D MRI scans.
- Based on the batch size and the GPU capacity used for training, TransBTS seems to be highly memory-demanding, which limits its usability. Also, considering that the methods that perform on par with or better than TransBTS are much more memory-efficient, it is unclear what the advantage of using transformers for this task is.
- The last ablation study about the skip-connections (SC) is not clear. There is no explanation about the SCs that are included in the transformer layers, so there is no way to know if the included connections are the standard SC used in transformers or if those originate in the convolutional encoder.
[1] Frey, M., & Nau, M. (2019, October). Memory efficient brain tumor segmentation using an autoencoder-regularized U-net. In International MICCAI Brainlesion Workshop (pp. 388-396). Springer, Cham. [2] Myronenko, A., & Hatamizadeh, A. (2019, October). Robust semantic segmentation of brain tumor regions from 3D MRIs. In International MICCAI Brainlesion Workshop (pp. 82-89). Springer, Cham.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The code will be made available. The dataset is public.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
- A more detailed comparison with methods that attain similar or higher performance would be helpful to understand the benefit of using transformers for this task. Because of the way it is currently presented, the model seems interesting but I fail to see any advantage of using it instead of other state-of-the-art methods, while the computational-cost drawback is evident.
- The comparison with the state-of-the-art is incomplete, which makes Table 1 misleading because the TransBTS does obtain better results than all the methods it is compared against, but there are other approaches with higher performances.
Please state your overall opinion of the paper

borderline reject (5)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The method presented has a clear technical novelty and obtains good results, which makes it a promising novel approach for the task of 3D segmentation. However, the major limitation of the model is the computational and memory costs that the transformer introduces, and that is never addressed in the paper. 3D segmentation is by itself a computationally expensive task, and creating methods that further increase that cost raises the question about their real utility. This question is even more relevant considering that there are methods that obtain better results while being cost-efficient (see citations in the weakness section). Consequently, the TransBTS does not outperform the state-of-the-art methods, as it was originally claimed.
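The memory concern raised here can be made concrete with a back-of-the-envelope estimate. Standard self-attention stores one score per pair of tokens, so the score matrices grow quadratically with the sequence length. The sketch below only illustrates that scaling: the token counts, the head count, and the 4-byte value size are assumptions, not figures from the paper.

```python
def attention_scores_mib(num_tokens, num_heads=8, bytes_per_value=4):
    """Memory in MiB of the attention score matrices for one layer and one
    sample: one (num_tokens x num_tokens) matrix per head."""
    return num_heads * num_tokens * num_tokens * bytes_per_value / 2**20


# Assumed sequence lengths; doubling the tokens quadruples the memory.
for n in (512, 1024, 4096):
    print(n, attention_scores_mib(n))
```

This counts only the score matrices; activations, gradients, and the convolutional layers add to the total, so the estimate is a lower bound for the attention part alone.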
What is the ranking of this paper in your review stack?

3
Number of papers in your stack

7
Reviewer confidence

Confident but not absolutely certain

Review #2

Please describe the contribution of the paper

In this paper, the authors extend the current 2D transformer-based segmentation framework to a 3D version and apply it to volumetric brain tumor MRI segmentation.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The clarity and organization of this paper are good.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The survey is insufficient since some published state-of-the-art methods [1-2] which obtained convincing results on BraTS 2019 validation dataset were not cited in this paper.
2. The novelty of the method is quite limited. As mentioned in the paper, there are two main distinctions between the proposed method and TransUNet [3], (1) the proposed TransBTS is a 3D network while the TransUNet is a 2D network; (2) The TransUNet adopts pre-trained backbones while the proposed TransBTS is trained from scratch. However, these changes are neither significant nor novel, especially for a MICCAI paper.
3. The results of the proposed method shown in Table 1 are not convincing, as no state-of-the-art (SOTA) method was included for comparison. Please refer to the leaderboard of the BraTS 2019 validation set (https://www.cbica.upenn.edu/BraTS19/lboardValidation.html) and the proceedings of BrainLes 2019 [4] for more information.
4. The motivation of applying transformer to brain tumor MRI segmentation is not clear, especially considering its poor segmentation performance.
[1] Jia H, Xia Y, Cai W, et al. Learning High-Resolution and Efficient Non-local Features for Brain Glioma Segmentation in MR Images, International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2020: 480-490. [2] Jiang Z, Ding C, Liu M, et al. Two-stage cascaded u-net: 1st place solution to brats challenge 2019 segmentation task, International MICCAI Brainlesion Workshop. Springer, Cham, 2019: 231-241. [3] Chen, Jieneng, et al. “Transunet: Transformers make strong encoders for medical image segmentation.” arXiv preprint arXiv:2102.04306 (2021). [4] Crimi, Alessandro, and Spyridon Bakas, eds. Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 5th International Workshop, BrainLes 2019, Held in Conjunction with MICCAI 2019, Shenzhen, China, October 17, 2019, Revised Selected Papers, Part I. Vol. 11992. Springer Nature, 2020.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The reproducibility of the paper is good, since all details of the implementation and network structure are provided. In addition, the author promised to release the code soon.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
1. The literature review should be improved and more SOTA methods should be cited.
2. It is necessary for the authors to compare the proposed method with the SOTAs on BraTS 2019 validation set. In addition, it would be more convincing if the authors can provide the experimental results on BraTS 2020 validation set.
3. The ablation study of the transformer module should be added in the experiments.
Please state your overall opinion of the paper

probably reject (4)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
1. The novelty of the proposed method is marginal.
2. The segmentation result of the proposed method on the BraTS 2019 validation set is far from the state of the art.
What is the ranking of this paper in your review stack?

3
Number of papers in your stack

5
Reviewer confidence

Very confident

Review #3

Please describe the contribution of the paper

This paper introduces a transformer based solution for 3D volumetric segmentation of brain tumor from MRI.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Using transformers for 3D volumetric segmentation is a novel application.
- TransBTS has an initial set of conv layers for low-level feature extraction and then has transformers in the bottleneck, which extract global dependencies.
- The ablation studies conducted are useful.
- TransBTS is able to use skip connections between conv layers in the encoder and decoder, which helps in efficient training.
- TransBTS does not use any pretrained weights from ViT.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The results in terms of performance metrics are not very impressive, as the method achieves an improvement of less than 0.3% for whole tumor and tumor core.
- A comparison in terms of number of parameters, FLOPs and inference time would shed light on the efficiency of TransBTS.
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Implementation details are clear. Code link can be added in the final version.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

Overall, this paper is novel as it is one of the first to apply transformers to 3D volumetric segmentation. Although the improvements in performance are not that impressive, using transformers for extracting global dependencies should be useful for brain tumor segmentation. A comparison in terms of number of parameters, FLOPs and inference time would make the paper better. Also, more comparison in terms of novelty with other concurrent works on transformers for medical segmentation could be added.
Please state your overall opinion of the paper

Probably accept (7)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Novelty, Application of transformers for 3D volumetric segmentation.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

5
Reviewer confidence

Very confident

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

Interesting work, but all reviewers expressed concerns about the lack of comparative evaluation against other state-of-the-art methods [1,2] and limited novelty. Reviewer 1 also expressed concerns about the high computational and memory costs that the Transformer introduces. Please carefully address the issues raised by the reviewers in your rebuttal.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

9

Author Feedback

Limited novelty (R2) We respectfully disagree. R1 and R3 acknowledged the novelty of TransBTS as the first application of Transformer in 3D CNN for volumetric MRI segmentation. Without bells and whistles, TransBTS is a clean and general 3D network that serves as a strong 3D baseline for volumetric segmentation. We argue that the two main distinctions between TransBTS and TransUNet are significant. 1) TransUNet is a 2D network that processes each 3D image in a slice-by-slice manner, while our TransBTS is based on 3D CNN and processes all the image slices at once, allowing the exploitation of better representations of continuous information between slices. There are fundamental differences between 2D and 3D networks (e.g. 2D UNet vs 3D UNet). Moreover, effectively incorporating Transformer in 3D CNN is not a trivial task, and we are the first to accomplish this. TransBTS sheds light on the utility of Transformer in 3D CNNs and inspires new research in this direction. 2) TransUNet adopts the ViT structure to leverage ViT models pre-trained on the large-scale ImageNet dataset. In contrast, TransBTS has a flexible network design and is trained from scratch on a task-specific dataset without depending on pre-trained weights, making it more flexible and dataset-friendly (a strength acknowledged by R3). We also stress that our method and TransUNet (arXiv preprint) are concurrent works.

Comparative evaluation (R1, R2) Since TransBTS is a 3D network for volumetric segmentation, we focus on comparing it with SOTA 3D networks to demonstrate its effectiveness. Among these, our method yields the best performance. After using Test-Time Augmentation (TTA), TransBTS has Dice scores of 78.93%, 90.0%, 81.94% on ET, WT, TC, which are comparable to or higher than those reported in the two references suggested by R1. Moreover, based on the performance comparison of 8 top-ranking methods in reference [1] provided by R2, TransBTS yields a mean rank of 7, which ranks 6th among those methods according to the BraTS19 leaderboard. TransBTS is highly competitive in terms of performance. Note that TransBTS is a clean and general 3D network (with basic 3D conv layers and a Transformer module) without any complex structures and add-ons. Any effective techniques such as multi-scale feature fusion (used in [1] provided by R2) can be easily plugged into TransBTS to boost the performance. It is worth emphasizing that the main goal of this work is to investigate how to effectively incorporate Transformer in the popular 3D CNNs to unleash the potential of both networks, rather than just beating the SOTA results with complex designs.

Transformer ablation and BraTS2020 (R2) We test TransBTS without the Transformer part with 5-fold cross-validation on the BraTS19 training set, resulting in Dice scores of 73.07%, 88.51%, 80.96% on ET, WT, TC. Compared to the results of TransBTS (see line 3 of Sec. 3.1 in the paper), significant improvements are brought by Transformer (5.85%, 1.72%, 0.23% for ET, TC, WT), verifying its effectiveness. We also evaluate TransBTS on the BraTS2020 validation set, adopting the same hyperparameters as on BraTS19. It achieves Dice scores of 78.73%, 90.09%, 81.73% and HD of 17.947, 4.964, 9.769 mm on ET, WT, TC. This is comparable to the SOTA method (nnU-Net for Brain Tumor Segmentation), which took 1st place in the segmentation task of BraTS2020, according to the mean scores of Dice and HD95 in Table 2 of that reference.
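For readers unfamiliar with the metric quoted throughout the rebuttal, the Dice score of two binary masks A and B is 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch over flat 0/1 lists, not the BraTS evaluation code; the function name and the convention of returning 1.0 when both masks are empty are assumptions.

```python
def dice_score(pred, target):
    """Dice overlap between two equally long binary masks given as 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


pred = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
print(dice_score(pred, target))  # 2*2 / (3+3), about 0.667
```

In BraTS the score is reported separately for the enhancing tumor (ET), whole tumor (WT), and tumor core (TC) regions, which is why each result above is a triple.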

Model complexity (R1, R3) TransBTS has 32.99M parameters and 340.9 GFLOPs (3D U-Net has 16.21M parameters and 1669.53 GFLOPs), which makes it a moderate-size model. By reducing the number of Transformer layers from 4 to 1, we obtain a lightweight TransBTS with 15.14M parameters and 212.99 GFLOPs that achieves Dice scores of 78.9%, 90.36%, 81.76% on ET, WT, TC (with TTA on BraTS19). Note that efficient Transformers such as Reformer can be used in our framework to reduce memory and computational complexity while maintaining accuracy, but this is beyond the scope of this work.
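The complexity figures quoted above are easier to compare as ratios against the 3D U-Net baseline. The short script below does only that arithmetic; the numbers are copied from the rebuttal, while the dictionary layout and the helper name `relative_to` are illustrative.

```python
# (parameters in millions, GFLOPs), as quoted in the rebuttal.
models = {
    "TransBTS": (32.99, 340.9),
    "3D U-Net": (16.21, 1669.53),
    "TransBTS (1 Transformer layer)": (15.14, 212.99),
}


def relative_to(baseline, table):
    """Express every model's parameters and FLOPs as a multiple of the baseline."""
    base_params, base_flops = table[baseline]
    return {
        name: (params / base_params, flops / base_flops)
        for name, (params, flops) in table.items()
    }


ratios = relative_to("3D U-Net", models)
for name, (param_ratio, flop_ratio) in ratios.items():
    print(f"{name}: {param_ratio:.2f}x parameters, {flop_ratio:.2f}x FLOPs")
```

Read this way, TransBTS has roughly twice the parameters of 3D U-Net but about one fifth of its FLOPs. Neither number measures peak GPU memory, which is the cost R1 asked about.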

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors’ response was convincing and clearly addressed the major comments raised by the reviewers. I agree that the use of Transformer in a 3D CNN is not a trivial task, and therefore I believe the paper can be interesting for many other researchers in the MICCAI community.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

11

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors combined transformer with 3D U-Net for brain tumor segmentation. The idea is interesting, but there are some contemporary works with similar ideas. The authors clarified the difference in the rebuttal, but the novelty still seems to be limited. The comparison with state-of-the-art methods for brain tumor segmentation was not very convincing.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Reject
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

16

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors addressed most reviewers’ concerns in the rebuttal. The idea of applying transformer to 3D segmentation is interesting and trending. Although the performance is not significantly improved when compared with other SOTAs, the paper could be of interest to the community. With the additional results presented in the rebuttal, I recommend acceptance of the paper.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

5


TransBTS: Multimodal Brain Tumor Segmentation Using Transformer