
# Authors

Cheng Peng, S. Kevin Zhou, Rama Chellappa

# Abstract

Medical image super-resolution (SR) is an active research area that has many potential applications, including reducing scan time, bettering visual understanding, increasing robustness in downstream tasks, etc. However, applying deep-learning-based SR approaches for clinical applications often encounters issues of domain inconsistency, as the test data may be acquired by different machines or on different organs. In this work, we present a novel algorithm called domain adaptable volumetric super-resolution (DA-VSR) to better bridge the domain inconsistency gap. DA-VSR uses a unified feature extraction backbone and a series of network heads to improve image quality over different planes. Furthermore, DA-VSR leverages the in-plane and through-plane resolution differences on the test data to achieve a self-learned domain adaptation. As such, DA-VSR combines the advantages of a strong feature generator learned through supervised training and the ability to tune to the idiosyncrasies of the test volumes through unsupervised learning. Through experiments, we demonstrate that DA-VSR significantly improves super-resolution quality across numerous datasets of different domains, thereby taking a further step toward real clinical applications.

SharedIt: https://rdcu.be/cyhUz

# Reviews

### Review #1

• Please describe the contribution of the paper

The authors describe a method for volumetric super-resolution (VSR) that can account for a domain mismatch during testing. The main contribution is a formulation that allows fine-tuning the feature extractor network on the test set while keeping the lightweight reconstruction network heads fixed. The method is trained for fixed upscaling factors on a publicly available lung CT dataset and evaluated on the volumetric SR problem for CT images of the liver, colon, and kidney. The method compares favourably to state-of-the-art methods (SAINT, 3D RCAN, 3D RDN) when matched for parameter count.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
• Innovative formulation of a common feature extractor network as the basis for subsequent inplane, throughplane and refinement network heads. This feature extractor is adapted to the test domain in a self-supervised setting, while keeping the reconstruction heads frozen.
• The authors conduct an ablation study that shows a clear trend that the proposed modifications are beneficial
• Visual results provided in Fig. 2 are convincing, and DA-VSR seems to outperform the compared reference methods quantitatively (Table 1)
• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
• The authors downscale the original dataset from 512x512 to 256x256 to reduce computational complexity. The authors further interpolate the slice thickness from original 1-3.5mm to a standardised 2.5mm. Both of those “data normalization steps” constitute a severe limitation of the practical applicability of the method. It is critical for volumetric SR methods to account for the two challenges: a) deal with varying through plane upscaling factors b) control computational complexity imposed by large 3D input volumes with high inplane resolution.
• It is unclear how important the sample size of the test set is for an efficient self-supervised adaptation. How does the method perform on ‘new’ samples from the target domain (after adaptation)? What if there is only a single volume available from test domain T? If we would split test domain T into T_a and T_b, adapt parameters on T_a and test on T_b, how would SR performance compare on T_a/b?
• The magnitude of the quantitative improvements due to domain adaptation is not very convincing, and no statistical significance tests were conducted.
• It is questionable whether the experiment summarised in Table 1 really tests the methods’ ability to handle domain shift or is mostly driven by the fact that there are ~10 times more training scans in the lung dataset allowing to train a better feature extractor regardless of the organ.
• Overall, the method is comparatively complex, and it is difficult to follow how exactly the network and reference methods are trained, e.g. which heads/components are trained and frozen at what stage. It seems the reference methods were re-implemented; to what extent were those reimplementations optimised to ensure a fair comparison?
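The T_a/T_b question raised in the list above can be made concrete with a small held-out protocol. The following is only a sketch under stated assumptions: the volume identifiers are hypothetical, and the function name `split_target_domain` and its parameters are illustrative and do not come from the paper.

```python
import random


def split_target_domain(volume_ids, adapt_fraction=0.5, seed=0):
    """Split test-domain volumes into T_a (used for adaptation) and T_b (held out).

    Both parts are guaranteed to be non-empty, so at least two volumes are needed.
    """
    ids = sorted(volume_ids)
    if len(ids) < 2:
        raise ValueError("need at least two volumes to form T_a and T_b")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(ids)
    cut = round(len(ids) * adapt_fraction)
    cut = max(1, min(len(ids) - 1, cut))  # keep both parts non-empty
    return ids[:cut], ids[cut:]


# Hypothetical identifiers for ten target-domain volumes.
t_a, t_b = split_target_domain([f"vol_{i:02d}" for i in range(10)])
```

Adapting only on `t_a` and then reporting SR quality separately on `t_a` and `t_b` would show whether the self-supervised adaptation transfers to unseen volumes from the same domain.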
• Please rate the clarity and organization of this paper

Satisfactory

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

There is a discrepancy between the provided reproducibility checklist and what is observed in the paper. For example, standard deviation/error bars/significance tests are not reported for the presented results; no description of hardware & software used for training.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

Potential typos:

• Equation (5): Should it be G_S^fro and G_F instead?

It would be valuable to report the ‘upper bound’ of the performance of a supervised method trained on the Liver/Colon/Kidney dataset with matched sample size.

Please discuss or correct the discrepancy in DA-VSR’s performance between Table 1 and Table 2. Wouldn’t one expect this performance to be identical? Why does DA-VSR have more parameters in Table 2 than in Table 1?

Evaluation in the context of a second application could further confirm the benefit of the method, e.g. perhaps on a modality with more pronounced domain shift across scanners/settings (e.g. MRI)

probably reject (4)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper is interesting as it relies on a common backbone feature extractor for the inplane, throughplane and refinement reconstruction tasks. My biggest concern is centered around the evaluation of the method. In particular, the evaluation is conducted in an idealized setting (inplane downscaling to 256x256, standardization of through-plane resolution to 2.5mm). Furthermore, the reference methods seem to be largely reimplemented, and it is not clear how they were trained and optimized to allow for a fair comparison. E.g. it should be discussed why DA-VSR^NA outperforms SAINT (Table 2) without any domain adaptation. Variability/error estimates for the reported performance are not provided.

• What is the ranking of this paper in your review stack?

2

• Number of papers in your stack

3

• Reviewer confidence

Confident but not absolutely certain

### Review #2

• Please describe the contribution of the paper

A new domain adaptation algorithm called DA-VSR is proposed for the 3D medical image super-resolution task. A unified encoder is used with different heads for super-resolution in multiple directions. In the test phase, a simple adaptation is employed with a backbone fixed to achieve the best results.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. Significant improvement in PSNR and visual quality in the samples given in the paper.
2. The new idea of performing multi-directional SR and combining the results using a shared encoder and different heads.
3. A general solution that is compatible with other methods like SAINT and SMORE.
• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The methods used for comparison are implemented by the authors. “we implement all compared models to have similar number of parameters” – Page 5
2. Comparisons with other methods are lacking. Related works are mentioned, but the differences are not highlighted.
3. Some references are missing. This paper also includes SR for multiple axes but in the same network. Georgescu, Mariana-Iuliana, Radu Tudor Ionescu, and Nicolae Verga. “Convolutional Neural Networks with Intermediate Loss for 3D Super-Resolution of CT and MRI Scans.” IEEE Access 8 (2020): 49112-49124.

This paper seems to be the first using multi-slices as input for SR in CT. Yu, Haichao, et al. “Computed tomography super-resolution using convolutional neural networks.” 2017 IEEE International Conference on Image Processing (ICIP). IEEE, 2017.

• Please rate the clarity and organization of this paper

Very Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Some details about the model are not given in the paper, such as the trade-off hyper-parameter $\lambda$, although the authors have committed to providing the source code for training and testing. The dataset is also publicly available. Overall, the reproducibility is good if the code is released in the future.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
1. The methods used for comparison are implemented by the authors, so it would be great to add some sentences to justify the correctness of the implementation, like ‘achieved similar PSNR compared to the original paper’.
2. Though the related works are mentioned, it would be better to clearly state the similarities and differences between the proposed method and existing ones.
3. Adding the missing references and some short discussion would make the paper more convincing.

accept (8)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The proposed method achieves significant improvement in terms of PSNR, and the idea is also novel (papers with similar ideas have been published, but the network structure and loss are different).

• What is the ranking of this paper in your review stack?

2

• Number of papers in your stack

5

• Reviewer confidence

Confident but not absolutely certain

### Review #3

• Please describe the contribution of the paper

The paper studies domain-adaptive volumetric super-resolution built on a series of slice-wise 2D super-resolution steps. The authors propose a technique termed DA-VSR. In contrast to existing methods, DA-VSR uses a single feature extractor with several task-specific network heads for upsampling and fusion. DA-VSR leverages in-plane and through-plane resolution differences as self-supervised signals for self-learned domain adaptation. The authors demonstrate the robustness of their method compared with the state of the art qualitatively and quantitatively.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

– The paper is well organized, and the motivation behind mixing domain adaptation is well justified.
– DA-VSR introduces a novel in-plane SR head in the self-learned domain adaptation stage.
– DA-VSR achieves better results than the state-of-the-art methods quantitatively and visually.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

– Quantitative results are reported in Tables 1 and 2. How are the results computed? For example, on specific cases or on the whole dataset?
– I think the claims of the paper are a bit bold. The paper claims in the Abstract section “DA-VSR significantly improves super-resolution quality across numerous datasets of different domains”. However, there is only one dataset, which includes different organs.
– Marginal improvement over the SoTA.
– Figure 1 needs to be presented more clearly. Would it be possible to make it clearer what is happening? Otherwise, the loss functions could be included for better illustration.
– The windows in the figures are too wide.
– The authors seem to have missed some relevant literature. Specifically, they don’t discuss learning-based methods for image enhancement at any length, missing out on several relevant citations, e.g. “Deep Generative Adversarial Networks for Compressed Sensing Automates MRI”, “Structurally-sensitive multi-scale deep neural network for low-dose CT denoising”, “Multiple Cycle-in-Cycle Generative Adversarial Networks for Unsupervised Image Super-Resolution”.

• Please rate the clarity and organization of this paper

Very Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

– Will code be available? It is not mentioned that code is made available.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

– “To facilitate faster and more efficient acquisitions or storage, it is routine to acquire or reconstruct a few high-resolution cross sectional images, leading to a low through-plane resolution when the acquired images are organized into an anisotropic volume.” is a little bit confusing. Please reorganize this sentence.
– The authors need to use the correct references. On Page 4, Section 2.2, “SMORE [23]”: I have read the paper cited by the authors but cannot find the term. Please check the manuscript carefully.

Probably accept (7)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

DA-VSR introduces a novel in-plane SR head in the self-learned domain adaptation stage.

• What is the ranking of this paper in your review stack?

1

• Number of papers in your stack

8

• Reviewer confidence

Very confident

# Primary Meta-Review

• Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

All reviewers agree that this is a mostly well motivated, innovative approach with convincing visual results. However, due to the large variance in the scores and some points brought up by R1 questioning the practicability of the method I recommend this paper for the rebuttal phase.

Among the reviewers there also seems to be a consensus that, despite the good visual results, the quantitative improvements seem very small, no significance tests are reported, and it is unclear whether the baselines have been reimplemented in a way that guarantees a fair comparison.

Specifically, R1 points out that the downscaling to 256x256 as well as the standardisation of the through-plane resolution to 2.5mm limits the method’s applicability in practice.

Furthermore, R1 points out that positive results may be due to the much larger size of the training dataset.

Please make sure to address the above points in particular, along with the other issues raised by the reviewers.

• What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

3

# Author Feedback

We sincerely thank all the reviewers for their time and valuable comments. We are glad to see that all reviewers consider our work to be interesting and novel. We also thank them for pointing out various presentation issues and suggestions on references which will be fixed. With limited space, we address some of the key questions from reviewers:

Reviewer 1+2: “Reference methods seem to be largely re-implemented”/Baseline Reliability. Most of the presented baselines are in fact implemented by their respective authors. RDN/RCAN are implemented based on https://github.com/sanghyun-son/EDSR-PyTorch, which is recommended by the original author. We obtained SAINT from its original author; the networks and training hyper-parameters follow the experiments done in SAINT by Peng et al. [12]. The only algorithm with which we took some liberty implementation-wise is SMORE [22]: for fair comparison, we replaced the original, shallow SR network with a deep Residual-Dense network. By constraining all models to a similar network size, we did our best to make fair comparisons.

Reviewer 1: “Comparison done in idealized setting” due to downscaling and slice thickness normalization. As we described in Section 3.2, downscaling to 256 by 256 is performed due to the large memory consumption of 3D CNNs for SISR. At a similar network size, 3D RDN and RCAN run out of memory for a 512 by 512 volume on an 11 GB NVIDIA 2080Ti GPU. This is also observed in Peng et al. [12], Chen et al. [2], and Wang et al. [16] (e.g. patch-wise inference and stitching is used to address OOM; however, that can lead to artifacts on patch boundaries). To be clear, DA-VSR is scalable to very high resolution, as it breaks volumetric SR down into multi-directional 2D SR and is thus much more memory friendly, akin to [12].
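As a rough back-of-the-envelope illustration of the memory argument above, the sketch below computes the size of a single float32 feature map for a 3D volume versus a 2D slice. The channel count (64) and the number of slices (128) are assumed values chosen only for illustration; they are not taken from the paper, and real training also stores many such maps plus gradients.

```python
def feature_map_gib(spatial_shape, channels=64, bytes_per_value=4):
    """Size in GiB of one feature map with the given spatial shape.

    channels=64 and float32 storage (4 bytes) are illustrative assumptions.
    """
    n_values = channels
    for size in spatial_shape:
        n_values *= size
    return n_values * bytes_per_value / 2**30


# Assumed volume depth of 128 slices.
full_3d = feature_map_gib((128, 512, 512))   # 3D map at 512 x 512 in-plane
down_3d = feature_map_gib((128, 256, 256))   # 3D map after downscaling
slice_2d = feature_map_gib((512, 512))       # one 2D slice at full resolution

print(f"3D 512x512: {full_3d} GiB")   # 8.0 GiB
print(f"3D 256x256: {down_3d} GiB")   # 2.0 GiB
print(f"2D 512x512: {slice_2d} GiB")  # 0.0625 GiB
```

Under these assumptions a single 3D feature map at 512 by 512 already approaches the capacity of an 11 GB card, whereas a 2D slice-wise map is more than two orders of magnitude smaller.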

While we agree with Reviewer 1 that varying slice thickness in CT/MRI is an important issue in real application, such an issue is orthogonal to our work. In our work, we seek to address the domain shift between training and test datasets. Therefore, we apply slice thickness normalization to eliminate a changing variable that can confound our experiments. Slice thickness normalization is widely adopted across works that deal with volumetric medical images and can be seen in previous SR works [2,16,22].

Reviewer 1: Whether DA-VSR handles domain shift caused by different organs or merely benefits from a larger training set and a better feature extractor. Table 1 shows that our method benefits from both the larger training set and domain adaptation. Specifically, DA-VSR outperforms DA-VSR_NA, despite both using the same large lung dataset for training; this demonstrates the utility of domain adaptation. DA-VSR also outperforms DA-VSR_SMORE, which shows the utility of pre-training with a large dataset despite its domain difference from the test set.

Reviewer 3: There is only one multi-organ dataset used for our experiments. As described in Section 3.2, we used multiple datasets (LIDC, KITS, Medical Segmentation Decathlon), each acquired by different scanners and focused on different organs.

Reviewer 1+3: Marginal performance improvement/significance test. We thank Reviewer 1 for suggesting significance tests and will add them to the final version, where more space is available. As an example, for DA-VSR vs. SAINT (X4) on the kidney dataset with 59 samples, the p-value is 0.03.

While the improvement is significant, PSNR/SSIM are limited in showing improvements in detail. As shown in Fig. 2 and in the supplemental material, there are distinctive domain drift artifacts, which DA-VSR addresses well. Domain drift also does not happen uniformly on a CT image. Some patches do not suffer as much since similar patches can be observed in the training set; however, some patches suffer heavily due to lack of observations. We argue that preventing the generation of erroneous detail in those patches is a clinically valuable pursuit.
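The rebuttal does not state which statistical test produced the quoted p-value. As one possible way to run such a comparison on paired per-volume scores, the sketch below implements a two-sided paired sign-flip permutation test using only the standard library. The choice of test, the function name, and the PSNR values are assumptions for illustration; the numbers are not results from the paper.

```python
import random


def paired_sign_flip_test(scores_a, scores_b, n_resamples=10000, seed=0):
    """Two-sided paired permutation test on the mean per-volume difference.

    Under the null hypothesis the sign of each paired difference is arbitrary,
    so signs are flipped at random and the resulting means are compared with
    the observed mean difference.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / len(diffs))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / len(diffs)) >= observed - 1e-12:
            hits += 1
    # Add-one correction avoids reporting an exact zero p-value.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical per-volume PSNR values (dB) for two methods on the same volumes.
method_a = [38.2, 37.9, 39.1, 38.6, 37.5, 38.8, 39.0, 38.1, 37.7, 38.4, 38.9, 38.3]
method_b = [x - 0.5 for x in method_a]
p_value = paired_sign_flip_test(method_a, method_b)
```

A paired test is the natural fit here because both methods are evaluated on the same volumes; a Wilcoxon signed-rank test or a paired t-test would be common alternatives.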

Code for DA-VSR will be released/submitted upon acceptance.

# Post-rebuttal Meta-Reviews

## Meta-review #1 (Primary)

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

Many of the points raised by the reviewers have been addressed.

• The fair baseline issue was answered satisfactorily for me.
• The issue with the small effect sizes has been partially addressed since the authors propose to include statistical significance tests and report one example comparison in the rebuttal. However, I believe such a change should not be judged at the meta-review stage and without the necessary context and details about the kind of test performed etc.
• The issues with in-plane and through-plane resolution relating to practicability have been mostly addressed. While the points raised by R1 are valid, I am willing to accept that they can be set aside for a proof-of-concept work.
• Positive results are partially due to pretraining on a larger dataset but also due to domain adaptation as shown in the quantitative results.

Based on the rebuttal I follow the recommendation of R2 & R3 in accepting the paper.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

7

## Meta-review #2

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper proposes a novel algorithm called domain adaptable volumetric super-resolution (DA-VSR) for 3D image data. Two reviewers give relatively high marks, and the other reviewer raises concerns related to the unclear experimental setting and small quantitative improvements.

In the rebuttal, the authors addressed these concerns, and they are strongly encouraged to add this information to their final version.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

10

## Meta-review #3

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

In the first reviews, the major concerns were raised by R1 with regard to practical applicability. The rebuttal has addressed most of the concerns, especially if we consider that practical steps have to be taken to address out-of-memory issues. However, the authors’ argument that the results in Table 1 demonstrate handling of domain shift could have been presented better.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

3