
Authors

Numan Celik, Sharib Ali, Soumya Gupta, Barbara Braden, Jens Rittscher

Abstract

Gastrointestinal (GI) cancer precursors require frequent monitoring for risk stratification of patients. Automated segmentation methods can help to assess risk areas more accurately and assist in therapeutic procedures or even removal. In clinical practice, in addition to conventional white-light imaging (WLI), complementary modalities such as narrow-band imaging (NBI) and fluorescence imaging are used. While most segmentation approaches today are supervised and concentrate on a single-modality dataset, this work exploits an unsupervised domain adaptation (UDA) technique that is capable of generalizing to an unseen target modality. In this context, we propose a novel UDA-based segmentation method that couples a variational autoencoder and U-Net with a common EfficientNet-B4 backbone, and uses a joint loss for latent-space optimization of target samples. We show that our model can generalize to the unseen NBI (target) modality when trained using only the WLI (source) modality. Our experiments on both upper and lower GI endoscopy data show the effectiveness of our approach compared to a naive supervised approach and to state-of-the-art UDA segmentation methods.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87199-4_29

SharedIt: https://rdcu.be/cyl4c

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper
    • The work exploits an unsupervised domain adaptation (UDA) technique that is capable of generalizing to an unseen target modality.

    • It proposes a novel UDA-based segmentation method that couples a variational autoencoder and U-Net with a common EfficientNet-B4 backbone, and uses a joint loss for latent-space optimization of target samples.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The impact that can be generated by the work is demonstrated by the statistics presented in the introduction:

    “Endoscopy, a vital tool for screening and disease surveillance, is however heavily operator dependent and 12% of cancers are missed”

    • The theoretical part is very well constructed.

    • The results are well presented, with comparison to the state of the art in the application area.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Some writing problems; I recommend a general proofread before final submission, if accepted (e.g. “visualisation”).

    • A point that deserves careful attention: in the dataset section the authors wrote “(train set: 90%, validation set: 10%)”.

    The authors could make it clearer at this point how the test set was (or will be) defined.

    • What are the criteria for splitting the datasets? At another point in the work, the authors use a different split: “(train set: 80%, validation set: 20%)”.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • The parameterization of the architectures is well described, which makes reproducibility much easier.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • Some statements in the introduction should be supported by references; I believe the authors could do a brief review. For example:

    “Today, high-definition endoscopes provide a sufficient resolution to allow for a detailed visualisation of the mucosal surface”

    • At some points we found problems in the writing; I believe a revision is also worthwhile. For example: “visualisation”

    “Today, high-definition endoscopes provide a sufficient resolution to allow for a detailed visualisation of the mucosal surface”

    • The authors could make it clearer how the public datasets were used, especially how they were split into training, validation and test sets.

    • The authors could add a visual scheme to help the reader understand the datasets, especially how the modeling dataset is obtained.

  • Please state your overall opinion of the paper

    strong accept (9)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is very well written; the authors paid attention to the details and limitations of current techniques, which inspires a lot of confidence.

    “To tackle these issues domain adaptation methods…”

    The technique can generate positive impact, but there are some points of attention in the methodology (in particular the datasets).

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    4

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    The authors propose an unsupervised domain adaptation method to segment polyps, with white-light imaging used as the source domain and narrow-band imaging acting as the target domain. The network uses a shared EfficientNet-B4 encoder for a variational autoencoder (VAE) and a U-Net-based segmentation model. The VAE learns the source-domain representations while the U-Net module leverages the learned encoded features for semantic segmentation. For inference, learnt latent-space embeddings are optimized via a joint loss-minimization scheme. The authors validate the approach via several experiments and ablation studies while benchmarking against other SOTA methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The method proposes an architecture for unsupervised domain adaptation which outperforms several of the previous methods.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The paper does not show what features are learnt that make the method superior to other methods.
    • The paper lacks statistical testing on the evaluated results.
    • “…. is used to generate the closest clone of the source domain with a learning rate  which is then used as an input to the segmentation network to predict the target mask.” It is not clear what the output of the joint optimization looks like (see below for suggestions).
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have marked most of the options as yes.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    (1) The authors should show, with the help of a t-SNE or PCA plot, how the learnt features are domain invariant. (2) The authors should provide statistical results for the different metrics reported for the compared methods in Table 2 with respect to the baseline; I would suggest the DeLong test for ROC/AUC. (3) The joint optimization for test-time inference does not show what the closest clone looks like. The authors could plot t-SNE/PCA features along with representative outputs, both for the reconstructed images in the VAE training module and for the test-time optimization module. (4) The paper could be posed as a “test-time domain adaptation” method, as it involves optimizing on the test image for a few epochs/iterations and learning a feature embedding before feeding it into the segmentation network.
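    As a sketch of suggestion (1), the encoder embeddings of both domains can be projected to 2-D and plotted together; overlapping point clouds would indicate domain-invariant features. The snippet below is a minimal illustration on synthetic vectors, not the paper's pipeline: all shapes, names, and the simulated domain shift are hypothetical stand-ins for the real EfficientNet-B4 embeddings.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic stand-ins for encoder embeddings: 100 source (WLI) and
    # 100 target (NBI) feature vectors of dimension 64.
    src = rng.normal(0.0, 1.0, size=(100, 64))
    tgt = rng.normal(0.5, 1.0, size=(100, 64))  # simulated domain shift

    X = np.vstack([src, tgt])
    Xc = X - X.mean(axis=0)

    # PCA via SVD: project onto the top two principal components.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    proj = Xc @ Vt[:2].T  # shape (200, 2); scatter-plot, coloured by domain

    # Domain-invariant features would make the two projected clouds overlap;
    # the simulated shift here keeps their centroids apart.
    gap = float(np.linalg.norm(proj[:100].mean(axis=0) - proj[100:].mean(axis=0)))
    print(proj.shape)
    ```

    A t-SNE plot (e.g. scikit-learn's `TSNE`) would serve the same purpose for non-linear structure; PCA is shown here only because it needs no extra dependency.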

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • The novelty of the method is not clear: the work combines different architectures. In particular, the contribution of the Sobel edge detection module within the deep network is unclear.
    • The authors should perform statistical tests on their evaluation to truly understand the significance of the method over the compared approaches as suggested above.
  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    In this work, the authors propose EndoUDA, a method for unsupervised domain-adaptive segmentation of endoscopy images. The proposed method is based on a VAE and a latent-optimization strategy. Extensive experiments on two GI datasets indicate the effectiveness of the proposed method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) The proposed method outperforms the comparison models on the UDA segmentation tasks on two GI datasets.

    2) Table 1 provides detailed results on the reason to select the EfficientUNet as the segmentation network.

    3) This paper is overall clearly written and easy to follow.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) The latent optimization module is similar to the Latent Search procedure proposed in GLSS [13], which lacks novelty.

    2) The comparison results in Tables 2 and 3 are questionable. For EndoUDA, the segmentation network is EfficientUNet, which has better segmentation performance than the naive U-Net, according to Table 1. Therefore, it is not clear whether the performance gain of EndoUDA comes from the domain adaptation algorithm or from the differences between EfficientUNet and U-Net.

    3) Throughout the paper, the definition of domain adaptation is not correct. In domain adaptation, the data in the target domain is unlabelled, not unseen.

    4) The paper lacks a comparison with recent domain-adaptive segmentation methods, such as [a], [b], [c]:

    [a] ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation, CVPR 2019
    [b] What Can Be Transferred: Unsupervised Domain Adaptation for Endoscopic Lesions Segmentation, CVPR 2020
    [c] Unsupervised Intra-domain Adaptation for Semantic Segmentation through Self-Supervision, CVPR 2020

    Among them, [b] works on endoscopy images, as this paper does.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code was not seen during the review. Since the datasets in this paper are private, it is questionable whether the method is effective without validation on public datasets.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    1) The claim on page 2 that there are only a few domain adaptation methods in medical image analysis is not correct. There are already many UDA methods; please refer to this survey [d] for details:

    [d] Domain Adaptation for Medical Image Analysis: A Survey, arXiv 2021.

    2) For Table 3 and the ablation studies in the supplementary material, please also report the results under the Dice score.

  • Please state your overall opinion of the paper

    probably reject (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Given the questionable experimental results and the lack of novelty, the reviewer proposes rejection, since the paper does not meet the MICCAI standard.

  • What is the ranking of this paper in your review stack?

    7

  • Number of papers in your stack

    8

  • Reviewer confidence

    Confident but not absolutely certain



Review #4

  • Please describe the contribution of the paper

    This paper tackles domain generalization for segmentation. The main idea is to first train a variational autoencoder (VAE) on the source domain. Given a target sample, the decoder part of the VAE is used to “translate” the target sample to the source distribution. To find the best “translation” of the target sample, the nearest point from the source domain is searched iteratively in the VAE latent space. The optimization of this nearest point is done by comparing the translated reconstruction and the original image, using a correlation and structural-similarity loss. Then, segmentation is performed using the “translated” image as input. The method is tested on two different datasets and shows SOTA results.
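    The iterative latent search at inference time can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: the trained VAE decoder is replaced by a fixed linear map, the joint correlation/structural-similarity loss by a plain MSE, and all names and constants are hypothetical.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical stand-in for the trained VAE decoder: a fixed linear map
    # from an 8-D latent space to a 32-D "image". In the paper this is a
    # deep decoder and the loss combines correlation and SSIM terms.
    D = rng.normal(size=(32, 8))
    x_target = rng.normal(size=32)  # a target-domain (e.g. NBI) sample

    def decode(z):
        return D @ z

    # Latent search: gradient descent on z to find the source-domain
    # reconstruction closest to the target sample.
    z = np.zeros(8)
    lr = 1e-3
    for _ in range(500):
        residual = decode(z) - x_target
        z -= lr * (2.0 * D.T @ residual)  # analytic gradient of the MSE

    final_loss = float(np.mean((decode(z) - x_target) ** 2))
    initial_loss = float(np.mean(x_target ** 2))
    # decode(z), the "translated" image, would then be fed to the
    # source-trained segmentation network.
    print(final_loss < initial_loss)
    ```

    With a deep decoder the gradient would come from autodiff rather than the closed form used here; the loop structure is the same.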

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The problem setting is challenging and still very new in the MICCAI community. The idea of the paper is nice and quite simple. The experimental section is sufficient: experiments on two different datasets; comparison to SOTA UDA methods; ablation studies; and different semi-supervised settings explored (adding a number of labeled examples to the target datasets).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The paper is an extension of the ideas from [1], proposed in the CV field. Here, the ideas from [1] are applied to two different medical imaging datasets. The methodological differences with [1] are: removal of the perceptual loss in the VAE training (the ablation study was already done in [1]), a small change in the network architecture (coupling of the encoder layer of the VAE and the segmentation network backbone), and an additional correlation loss in the VAE latent-search optimization. The changes are incremental, but the method is well validated on two medical imaging datasets.
    2. Some clarifications could be useful (see below).

    [1] Pandey et al., Unsupervised Domain Adaptation for Semantic Segmentation of NIR Images Through Generative Latent Search, ECCV 2020

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper has sufficient details about the framework, hyperparameter choices, and evaluation. The datasets used are a mix of private and public image datasets. The paper does not mention any available code, which limits reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. There seems to be a confusion in the paper between domain generalization (DG) and domain adaptation (DA). Both techniques aim to improve the robustness of a task learned on the source to out-of-distribution samples, with a difference: in DA, some samples from the target distribution are used for adapting, i.e. there is a target training dataset, and results are then shown on a validation (or test) dataset. On the contrary, in domain generalization, there are no target samples available for adapting, i.e. no target training dataset. The task in DG is to improve robustness on new target samples directly. It seems that in the main experiment, this paper tackles DG, a more difficult problem than DA. It would be worth emphasizing this contribution. For an example of a DG paper in medical imaging, see [1].

    2. Comparing the framework figure in [2] to the one in this manuscript, the gradient in the update of z present in the former has disappeared in the latter. Is this a mistake? If not, what is the reason for this disappearance? The update in the former makes more sense to me (gradient descent).

    3. Why is Figure 2 of the framework entitled “… the proposed EndoUDA - adversarial unsupervised domain adaptation”? What is adversarial about the proposed method?

    4. From what I understand, in the main experiment there are no (unlabeled) target samples available for training. I am not sure I understand how the method was compared to SOTA UDA methods that do require a target training dataset. The caption of Table 2 states that “All comparisons are provided for a source-only trained model and tested on target data”. In [3], the paper introducing AdaptSegNet, the training source and training target output masks are aligned, and the effectiveness of adaptation is shown on an external validation target set.

    5. Shouldn’t the results in the first line of Table 2 and Table 3 coincide? The IoU values are different.

    6. The drawbacks of the method are not discussed.

    [1] Liu et al., Shape-aware Meta-learning for Generalizing Prostate MRI Segmentation to Unseen Domains, MICCAI 2020
    [2] Pandey et al., Unsupervised Domain Adaptation for Semantic Segmentation of NIR Images Through Generative Latent Search, ECCV 2020
    [3] Tsai et al., Learning to Adapt Structured Output Space for Semantic Segmentation, CVPR

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is an extension of the ideas from [1]. The changes are incremental, but the method is well validated on two medical imaging datasets. The work presents ablation studies, experiments with semi-supervision introducing target samples into the training of the VAE, and a comparison with SOTA UDA methods.

    The paper is relatively clear, the setting is still new and challenging, making the paper interesting for the MICCAI community.

    [1] Pandey et al., Unsupervised Domain Adaptation for Semantic Segmentation of NIR Images Through Generative Latent Search, ECCV 2020

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    8

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Reviewers find the proposed unsupervised domain adaptation method interesting. However, reviewers also identify significant weaknesses of the paper, including a) incremental changes over published work, b) confusion in the definition of domain adaptation vs. generalization, and c) inconsistent results in Tables 2 & 3. The authors should address the major reviewer concerns in the rebuttal.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    6




Author Feedback

The paper presents a novel modality-agnostic approach to segment early cancer precursors in endoscopy images. All reviewers recognised its high relevance to clinical translation and acknowledged its contribution to the medical imaging field.

We thank the reviewers for highlighting positive remarks (R1: novel UDA-based segmentation, interesting, well written, well-presented results; R3: approach validated via several experiments and ablation studies, benchmarked against SOTA methods; R5: outperforms the current UDA models; R6: the idea of the paper is nice and simple, the problem setting new, and the experimental section sufficient, with ablation studies and different semi-supervised settings explored). The very constructive comments have improved the manuscript.

Response to the meta review:

C1: Incremental changes over published work? Response: Three contributions extend [13]: (1) coupling of the VAE and U-Net encoders with an EfficientNet-B4 backbone architecture – here, we simplified the network by sharing weights between the VAE and U-Net, i.e., with no extra encoder network in the VAE; (2) removal of the perceptual loss, which required two additional weighted networks (U-Net and DeepLabV3+), further simplifying our architecture; and (3) a new joint loss for latent-space optimization of target samples. Our ablation studies (Supp. Table 4) show that these modifications simplify the model, reduce computation time, and boost performance. Validation is performed on two datasets.

C2: Confusion of domain adaptation vs. generalization? Response: This extension of the previous UDA approach [13] is a “target-independent UDA” method. Unlike most UDA approaches, which access unlabelled target data during training, the closest point in the source domain for a given target data point is searched for during inference. A segmentation network trained only on the source domain is then used on target data. As the modalities of the source and target domains differ in these approaches, our method can also be coined “modality-independent”. Our model is thus a “modality- & target-independent UDA”, as the target modality is not seen during VAE training. To eliminate the confusion between the domain adaptation and generalization problems, we use the term “target-independent UDA”, as in [13], in the revised version.

C3: Inconsistent results in Tables 2 & 3? Response: We have re-run the experiments for U-Net with the same seed as used for the other experiments and will update the findings in Table 3, which now match the results in Table 2.

Review comments:

R1: Dataset separation into training, testing and validation? A: As stated in the paper, the training set of WLI images is split 80-20 for Barrett’s and 90-10 for polyps. A separate target dataset is used for testing.

R3: Statistical results for the different metrics should be provided for the compared methods in Table 2. A: We have computed the standard deviation for each reported metric for all experiments. We have also conducted a paired t-test to assess the statistical significance of the improvement of our method in Table 2 over the SOTA methods. The results will be incorporated in Table 2 of the revised paper.

R5: Report the results under the Dice score for Table 3. A: We will add this in the revised version.

R5: There is a lack of comparison with recent domain-adaptive segmentation methods. A: ADVENT (CVPR 2019) – results have been generated and do not change our conclusions. What Can Be Transferred (CVPR 2020) – does not address the use of multiple modalities. We will include the ADVENT (CVPR 2019) result in Table 2.

R6: The gradient in the update of z has disappeared in Figure 2. Is this a mistake? A: Thank you. We have fixed this typo.

R6: Why is Figure 2 of the framework entitled “the proposed EndoUDA - adversarial unsupervised domain adaptation”? What is adversarial about the method? A: The approach is a target-independent UDA and not adversarial. We have removed the word “adversarial”.
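The paired t-test mentioned in the response to R3 is computed directly from per-image scores of the two methods on the same test images. A minimal sketch with made-up IoU numbers (illustrative only, not values from the paper):

```python
import math
from statistics import mean, stdev

# Hypothetical per-image IoU scores for the proposed method and a
# baseline, evaluated on the same test images (illustrative only).
ours = [0.78, 0.81, 0.74, 0.83, 0.79, 0.77, 0.85, 0.80]
base = [0.71, 0.76, 0.70, 0.79, 0.73, 0.74, 0.78, 0.75]

# Paired t-test: t = mean(d) / (stdev(d) / sqrt(n)) with df = n - 1,
# where d are the per-image score differences.
d = [a - b for a, b in zip(ours, base)]
n = len(d)
t = mean(d) / (stdev(d) / math.sqrt(n))
df = n - 1
# The two-sided p-value follows from the t distribution with df degrees
# of freedom, e.g. scipy.stats.t.sf(abs(t), df) * 2.
print(df, round(t, 2))
```

Pairing is essential here: the test operates on per-image differences, so it controls for image difficulty in a way an unpaired comparison of mean IoU would not.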




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper has several weaknesses, including incremental changes over prior work, confusion between domain adaptation vs generalization, and inconsistent results in Tables 2 & 3. The rebuttal did not convincingly address key concerns from reviewers. For example, instead of explaining why there were inconsistencies in the results, the authors promise to include a new table. The paper is reviewed based on the submitted version, not the revised version. The area chair thinks that the paper did not meet the high MICCAI standard.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    12



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The main criticism of the paper is the lack of novelty, which I agree with. The authors apply small modifications to a recently published work. Nonetheless, the modifications seem to help, and the ablation study supports them. Despite being incremental, the paper has merit: it pushes the mentioned method forward.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    9



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have addressed the points highlighted by the original meta-reviewer. When addressing the reviewers’ specific points, however, the authors have chosen not to discuss what seem to me to be some of the biggest questions. For example, R5’s point about whether the improvements over the naive U-Net baseline are due to using a more powerful network or due to the proposed DA technique, or R6’s point about how baselines that require target labels were evaluated.

    This paper received 4 reviews by knowledgeable reviewers, 3 of which recommend acceptance. The rebuttal answered some points, with some questions still open, but there is nothing I would consider a major reason to reject. In sum, I follow the initially favourable recommendations of the reviewers to accept this paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    3


