
Authors

Lei Li, Veronika A. Zimmer, Julia A. Schnabel, Xiahai Zhuang

Abstract

Left atrial (LA) segmentation from late gadolinium enhanced magnetic resonance imaging (LGE MRI) is a crucial step needed for planning the treatment of atrial fibrillation. However, automatic LA segmentation from LGE MRI is still challenging, due to the poor image quality, high variability in LA shapes, and unclear LA boundary. Though deep learning-based methods can provide promising LA segmentation results, they often generalize poorly to unseen domains, such as data from different scanners and/or sites. In this work, we collect 140 LGE MRIs from different centers with different levels of image quality. To evaluate the domain generalization ability of models on the LA segmentation task, we employ four commonly used semantic segmentation networks for the LA segmentation from multi-center LGE MRIs. In addition, we investigate three domain generalization strategies, i.e., histogram matching, mutual information based disentangled representation, and random style transfer, among which simple histogram matching proves to be the most effective.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87231-1_54

SharedIt: https://rdcu.be/cyhWd

Link to the code repository

https://github.com/Marie0909/AtrialGeneral

Link to the dataset(s)

ISBI 2012: Left Atrium Fibrosis and Scar Segmentation Challenge http://www.cardiacatlas.org/challenges/left-atrium-fibrosis-and-scar-segmentation-challenge/

MICCAI 2018: Atrial Segmentation Challenge http://atriaseg2018.cardiacatlas.org/


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper compared several different domain adaptation methods for LA segmentation when the data were collected from different centers. The authors reported the performance of these methods and discussed the results.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is easy to follow. The paper is well-organized.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The novelty of this paper is limited. Experiments across different segmentation networks and DA methods are listed, but no new or improved method is introduced, which could be a serious weakness.
    2. The authors may confuse domain adaptation and domain generalization. In the title, domain generalization is utilized, while in experiments domain adaptation methods are compared.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The utilized networks are all public networks.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Some technical novelty is needed.

  • Please state your overall opinion of the paper

    probably reject (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The limited novelty of the paper.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    The authors propose to apply domain generalization techniques to deep-learning models for left atrial segmentation. They explore histogram matching, MID-Net and RST-Net and they show that the simple strategy of histogram matching achieves the best results.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Paper is well organized and easy to read; tables and figures are well presented.
    2. The performance of networks on the target domain did improve after the application of HM, MID-Net, RST-Net.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The authors explored the scenario of domain shift with 4 different networks and they tried 3 methods to improve the domain generalization ability. However, the application of these DG techniques on the networks looks very straightforward. The novelty is limited.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors will release the code and data. It will not be difficult to reproduce.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. domain generalization -> DG, domain adaptation -> DA
    2. the explanation in appendix is good, but ‘authors should not submit text materials beyond figure and table captions, the definition of variables in equations, or detailed proof of a theorem’ (https://miccai2021.org/en/PAPER-SUBMISSION-GUIDELINES.html#supplementary-material).
  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper focuses on applying current DA/DG methods to LA segmentation. It is good overall, easy to read, and the results are well presented, but the novelty is limited.

  • What is the ranking of this paper in your review stack?

    4

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    In this paper, the authors study how to address the domain shift in the segmentation of medical images. Their contribution is the comparison of 3 different domain generalization methods (histogram matching, random style transfer and MID-net). They use LGE MRI of the atria acquired in 4 different centers, and find that histogram matching produces the best results.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Addressing the domain shift is interesting and important to the deployment of ML models to the real clinic. Since clinical data is very scarce, working towards combining data from different centers is very useful to the community. The experiments were well conducted and clearly reported.

    Finally, automatic segmentation of LGE images is a problem that is still not completely solved.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors did not properly justify their choice of specific DA methods and other decisions like the use of 2D networks instead of 3D.

    Literature on, and discussion of, possible alternatives to the chosen methods is missing, namely fine-tuning and unsupervised domain adaptation.

    The discussion could be improved; I miss the authors’ interpretation of the experimental results.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    No code was provided, but the authors give details on their optimization methods, hyperparameters, and the libraries/packages they used. For completeness, it would be nice to also include the decay factor β2 of the Adam algorithm and the way the weights were initialized. Small typo: “was set to 5e-5 and multiply 0.95” -> “was set to 5e-5 and multiplied by 0.95”

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    1- “In the clinic, it is impractical to retrain a model each time for the data collected from new vendors or centres”: While a complete retraining may be infeasible, there exist methods in the literature for fine-tuning already trained models. Why were these methods not considered? Literature on unsupervised domain adaptation methods that try to align the target and source domains is also not discussed. This should also be addressed in the discussion, since HM seems to obtain the best results.

    2- Why did the authors choose specifically the 3 methods compared? The authors should also state more clearly the premises of the selected methods, give a short explanation of how they work, and explain why they believe these methods are an appropriate choice for their problem. Is reference #14 correct for RST? I could not find any reference to RST in that paper.

    3- The authors’ dataset is quite small. Did the authors consider combining both pre- and post-ablation studies? That would also be a nice application of domain generalization, since the authors state that the images present texture differences.

    4- A bit more description of the data is needed, in particular whether the demographics/ethnicity of the data coming from the different centers are comparable, and whether the data collection followed the Declaration of Helsinki and informed consent was obtained from the study participants.

    5- Why was only a 2D approach considered for the segmentation of a 3D volume? Was slice position, or coherence with the previous and following slices, taken into account?

    6- Since HM is completely independent of the training, have the authors considered applying it additionally to the other DA methods?

    7- HM outperforms the other methods. Why do the authors think that the other methods fail to find histogram-invariant features? The discussion of the results in that regard is a bit short. I would like some discussion of the need for (potentially unlabeled) target domain data [see comment #1].

    Minor comments:

    • The acronym of domain generalization is noted as DA in the paper. Shouldn’t it be DG?
    • Table 3: make sure all data in the same row has the same number of significant figures
    • Figure 1: It is not obvious to the non-expert. Mark the scarred areas.
    • Fig. 3: Are panels a) and b) the same? If the main objective is to compare pre- and post-ablation segmentation quality, it would be beneficial to put the boxplots of before and after in the same figure. Since the authors already established that DeepLab was the best model, they could put the other models in the supplementary material. Finally, add the quantitative data for pre- and post-ablation separately.
    • Section 4.1: specify the statistical test used to derive the p-values (“though the difference is not significant for U-Net (p = 0.479)”).
  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    In my opinion, this paper has both strong and weak points. I find the paper’s topic of overcoming the domain shift very interesting for the community. I also think that the experiments are well conducted.

    My main concern about the work is the lack of justification of the chosen methods, but I think that this can be amended.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    3

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The authors studied the scenario of domain shift and compared 3 methods to improve the cross-domain generalization for LA segmentation from LGE MRI. All reviewers agree that the work is well motivated and clearly written. However, the largest concern is the lack of novelty in this work. That being said, the reviewers still find that such a well-documented work can be beneficial to the community for better tackling the domain shift problem prevalent in many medical imaging applications. In the rebuttal, the authors are expected to better motivate and justify the methods chosen for comparison, and to add in-depth interpretation of the results and conclusions in the Discussion.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    6




Author Feedback

Dear Area Chairs,

We would like to thank the meta-reviewer (MR) and reviewers (R1,2,3) for their very constructive and thoughtful comments, which have greatly improved the clarity of our manuscript. We have summarized several main comments with corresponding responses.

  1. For the novelty concerns: All three reviewers raised concerns about the novelty of this paper.

We would like to emphasize that our main contribution is not to propose a new method, but to investigate and evaluate domain generalization (DG) strategies on a new application, i.e., left atrial (LA) segmentation of multi-center LGE MRIs. Note that a particularly strong evaluation and a novel application (alongside novel formulation) are highlighted in the statement of CALL FOR PAPERS and REVIEWER GUIDELINES of MICCAI 2021. Besides, MR mentioned, “it can be beneficial to the community for better tackling the domain shift problem prevalent in many medical imaging applications”, and also R3 mentioned, “automatic segmentation of LGE images is a problem that is still not completely solved”.

  2. For the methodology and data concerns: R3 asked for more explanation on the choice of DG methods and the use of 2D models. R3 also expected more clarity on the dataset description.

2.1 (1) We chose the three DG methods, i.e., HM, mutual information based disentangled representation (MID), and random style transfer (RST), because they are state-of-the-art, representative and effective. They try to solve the domain shift problem from different perspectives (see Sec 2.2): HM aligns all domains onto a space with similar intensity distribution, MID extracts domain-invariant features, and RST augments domains. (2) We used 2D models because it was reported in the literature that 2D U-Net performed slightly better than 3D U-Net for LA segmentation from LGE MRI, which has also been confirmed by our own experiments.

2.2 The datasets were acquired from EU and USA hospitals, and are partially from public datasets. We deleted some data information to ensure anonymity, and will include it in the final version.
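As an illustration of the HM idea described in 2.1 (aligning intensity distributions across domains), the following is a minimal NumPy sketch of histogram matching by quantile mapping. It is not the authors' implementation: the function name `match_histogram`, the choice of a single reference image, and the synthetic test arrays are assumptions made only for this example.

```python
import numpy as np


def match_histogram(source, reference):
    """Remap `source` intensities so their empirical CDF follows `reference`.

    Hypothetical helper for illustration; both inputs are arrays of any shape.
    """
    # Unique source intensities, the index of each pixel into them, and counts.
    src_values, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True
    )
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)

    # Empirical cumulative distribution functions of both images.
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size

    # For each source quantile, look up the reference intensity at that quantile.
    matched_values = np.interp(src_cdf, ref_cdf, ref_values)
    return matched_values[src_idx].reshape(source.shape)


# Synthetic stand-ins for slices from two centers (assumed data, not LGE MRI).
rng = np.random.default_rng(0)
source = rng.normal(100.0, 20.0, size=(32, 32))
reference = rng.normal(300.0, 50.0, size=(32, 32))
matched = match_histogram(source, reference)
```

The mapping is monotonic, so the relative ordering of intensities within the source image is preserved while its overall distribution is pulled toward the reference. scikit-image offers a comparable routine, `skimage.exposure.match_histograms`, which may be preferable in practice.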

  3. For the experiment concerns: R1 thought the methods we tested were domain adaptation (DA) methods instead of DG, and R3 expected us to test and discuss DA methods. R3 noted that HM is completely independent of training, and suggested applying it in addition to the other DG methods.

3.1 (1) While it is true that HM and MID can also be employed for DA (where the target domain is available), in this study we used them for DG (where the target domain is unknown). (2) DA is out of the scope of this study.

3.2 While it would be interesting to combine HM with MID/RST, it is important to note that HM and RST pursue opposing goals: HM aims to reduce the domain diversity among training data, whereas RST tries to increase it.

  4. For the discussion concerns: MR and R3 suggested adding in-depth interpretation of the results and conclusions in the discussion.

We provide the following interpretation, which could be added to the final paper, space permitting: (1) A main challenge of DG is that the available source domains often exhibit limited diversity, hindering the DG ability of models. RST evidently improved the DG ability, which demonstrates the effectiveness of domain augmentation. However, when we introduced multi-source domains (MSD), the DG improvement was minimal. This may be attributed to the large variations in the manual labels of LGE MRIs from different centers. (2) There was no evident performance difference in LA segmentation between post- and pre-ablation LGE MRI, probably because we directly combined them in the training and test stages. Note that there could be domain shift between post- and pre-ablation LGE MRI from the same center, which nevertheless was not considered here. (3) The two state-of-the-art DG methods both performed worse than the simple HM, which may indicate that there is still large scope for further algorithmic developments in DG.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper addressed an important problem for clinical applications of DL methods, and compared several different domain adaptation methods for LA segmentation when the data were collected from different centers. The authors have properly addressed the concern, mainly on novelty, and I agree that per MICCAI guideline “we encourage submission of papers that demonstrate clinical relevance and novel clinical applications” and this paper falls into this category.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This is an interesting and timely study on the domain generalization problem. Although no new methods were proposed, the empirical study on a good-sized dataset can offer valuable insights for other applications. The authors gave a good response to the points raised by the reviewers.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    7



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This work compared several DA methods on multi-site datasets. The major concern raised by reviewers is the novelty. Another distinction to make is between DA and DG. The authors clarified in the rebuttal that the goal of this paper is “to investigate and evaluate domain generalization (DG) strategies on a new application”. Personally, I would not call LA segmentation a “novel” application. Furthermore, if the goal is to evaluate domain generalization techniques, to be very honest, a short MICCAI paper is not an ideal venue to fully cover the SOTA DG methods. I would say an extended version of this work would be better suited to an application-oriented journal or, if the authors are willing, to hosting a challenge by releasing the dataset. But the current work falls short of inclusion in MICCAI.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    21


