
Authors

Zheren Li, Zhiming Cui, Sheng Wang, Yuji Qi, Xi Ouyang, Qitian Chen, Yuezhi Yang, Zhong Xue, Dinggang Shen, Jie-Zhi Cheng

Abstract

Lesion detection is a fundamental problem in computer-aided diagnosis schemes for mammography. Advances in deep learning techniques have brought remarkable progress to this task, provided that the training data are large and sufficiently diverse in terms of image style and quality. In particular, the diversity of image style may be largely attributed to the vendor factor. However, collecting mammograms from as many vendors as possible is very expensive and sometimes impractical for laboratory-scale studies. Accordingly, to further augment the generalization capability of deep learning models to various vendors with limited resources, a new contrastive learning scheme is developed. Specifically, the backbone network is first trained with a multi-style and multi-view unsupervised self-learning scheme to embed features invariant to various vendor styles. Afterward, the backbone network is recalibrated to the downstream task of lesion detection with specific supervised learning. The proposed method is evaluated with mammograms from four vendors and one unseen public dataset. The experimental results suggest that our approach can effectively improve detection performance on both seen and unseen domains, and outperforms many state-of-the-art (SOTA) generalization methods.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87234-2_10

SharedIt: https://rdcu.be/cyl75

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    Due to the difficulty of collecting large-scale labeled training data, unsupervised representation learning followed by supervised learning of the downstream task is becoming common practice. The authors propose a domain-specific unsupervised representation learning scheme that exploits contrastive learning of multi-view (CC, MLO) and multi-style (different vendor styles generated by StyleGAN) images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors properly used well-known previous methods, i.e., StyleGAN and learning with a contrastive loss, to resolve domain generalization issues in lesion detection in mammograms. The proposed pretraining method showed better performance compared to 1) basic types of pretraining methods and 2) several domain generalization methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    StyleGAN and contrastive learning are well-known techniques. Beyond the general settings of these two methods, the details specific to lesion detection in mammography also need to be added. E.g., training a StyleGAN with medical images tends to be unstable in terms of training and output quality, so the details of the training process need to be described.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The detailed training recipe of the proposed method is very important, especially for the StyleGAN part. The quality of the generated images is important for domain generalization. The authors checked ‘NO’ for all questions related to code release. Even though the data used for the experiments is hard to release, the code at least should be released for reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    MVCL and MSCL showed similar performance improvements in Table 2. Is the quality of the style-transferred images important for guaranteeing the level of performance improvement described in Table 2? If yes, the training details of the StyleGAN need to be added to the paper.

    The exemplary style-transferred images shown in the supplementary material seem to be normal images. Abnormal images with the two types of lesions (especially microcalcifications) would be helpful for understanding the features learned by the StyleGAN and for conjecturing why the proposed method works well.

    MSVCL on Style B is superior to MSCL and MVCL, while the three are similar on the other styles. The authors should describe this difference in trend in their experimental results. The mean and standard deviation of multiple runs would also be helpful.

    The MSVCL numbers in Tables 2 and 3 are different. Is this correct?

  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Good adoption of well-known previous works for the target problem. However, more details of the specific settings and additional explanation of the experimental results are needed to support the superiority of the proposed method in the target application.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    4

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    This paper combines multi-style and multi-view data for contrastive pretraining on multiple domains. The multi-style component is achieved using CycleGANs across different domains. The multi-view component uses the two views (CC and MLO) of the mammograms. The pretrained model, after fine-tuning, performs better on unseen domains than the baselines.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This work is a good combination of contrastive learning, multi-style augmentation and multi-view learning. It’s natural that it improves the domain generalization performance.

    2. The experimental results support that the proposed method outperformed a few recent methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. In Table 2, by comparing SimCLR (ImageNet → MammoPre) and MVCL, we can see that multi-view brought a 0.015 improvement. In Table 3, EISNet was 0.02 worse than MSVCL. Was EISNet (as well as the other baselines) trained with multi-view? If they were trained with a single view only, then this was not a fair comparison, as multi-view training could largely eliminate the performance gap.

    2. Multi-view learning on mammograms has been explored before, but related papers are not discussed or compared, e.g., “Multi-view Multi-task Learning for Improving Autonomous Mammogram Diagnosis”, Machine Learning for Healthcare Conference, 2019.

    3. The authors should also compare with adversarial domain adaptation methods, e.g. RevGrad.

    4. Although two unseen domains D and E were evaluated, they were actually both acquired using Siemens devices. In Fig. 3 we can also see that the example images from domains D and E are visually similar.

    5. Only three baseline methods were evaluated in Table 3, which does not seem to match the “many SOTA generalization methods” claimed in the abstract.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method seems not difficult for a third party to reproduce. Still, it would be a great relief for the community if the authors could release the code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. In Table 3, was EISNet (as well as the other baselines) trained with multi-view? If they were trained with a single view only, then this was not a fair comparison, as multi-view training could largely eliminate the performance gap.

    2. Why are the reported results of MSVCL in Tables 2 and 3 different (both at the bottom row)?

    3. Why did the “Random (no pretraining)” method in Table 2 perform so poorly on both the seen and unseen domains? Was it trained with supervision? If so, it should perform well on the seen domains.

    4. The authors are suggested to compare with adversarial domain adaptation methods, e.g., RevGrad. Comparisons with multi-view methods should also be included.

    5. If possible, the authors should evaluate on another unseen domain (other than Siemens). If that is infeasible, the authors should clearly indicate that domains D and E were both acquired using Siemens devices.

    6. It seems quite time-consuming to train M*(M-1)/2 CycleGANs, where M is the number of domains (e.g., already 6 CycleGANs for M = 4 vendors).

    7. A few typos/grammar errors:
       1) Paragraph 1 in Page 4: “The work [17] unidirectional takes …”
       2) Paragraph 1 in Section 3.2: “we compare lesion detection performance with 1) no,” should be “… 1) no pretraining”.
       3) Paragraph 2 in Section 3.2: “The details ablation analysis” => “The detailed ablation analysis”.
       4) Paragraph 2 in Page 7: “either seen and unseen domains” => “either seen or unseen domains”; “our method can outperform the EISNet” => “our method outperformed the EISNet”.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. It makes sense to combine contrastive learning, multi-style augmentation and multi-view learning. It’s natural that it improves the domain generalization performance.

    2. Although the experimental results showed that the proposed method outperformed a few recent methods, I have concerns whether the comparison was fair (See weakness 1).

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    3

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    These authors present a method for domain generalization in mammography lesion detection via multi-style and multi-view contrastive learning. First, a CycleGAN is used to generate different styles from one vendor, and the generated samples are used for contrastive learning. Second, for the multi-view part, the authors treat the CC and MLO views of the same breast from the same patient as a positive pair, whereas other combinations of CC and MLO are negative pairs for contrastive learning. Finally, the authors present a unified self-supervised contrastive learning scheme to learn generalizable, domain-invariant features, which can later be used for the downstream task. The exhaustive experiments clearly show the contribution of each component of the proposed approach.
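
    For concreteness, the pairing scheme described above can be expressed with a standard NT-Xent (SimCLR-style) contrastive loss. The following minimal sketch is an illustrative assumption, not the authors’ actual implementation; the encoder, the style-transfer operator, and the batch names are hypothetical placeholders.

        import torch
        import torch.nn.functional as F

        def nt_xent(z_a, z_b, temperature=0.5):
            # SimCLR-style NT-Xent loss over a batch of positive pairs.
            # Row i of z_a and row i of z_b form a positive pair (e.g., two
            # vendor-style renderings of one image, or the CC and MLO views
            # of the same breast); all other rows act as negatives.
            z = F.normalize(torch.cat([z_a, z_b], dim=0), dim=1)  # (2N, D)
            sim = z @ z.t() / temperature                         # cosine similarity
            sim.fill_diagonal_(float('-inf'))                     # drop self-pairs
            n = z_a.size(0)
            targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
            return F.cross_entropy(sim, targets)

        # Illustrative usage (encoder, style_transfer, cc_batch, mlo_batch
        # are hypothetical): multi-style pairs share content across vendor
        # styles; multi-view pairs share the breast across CC/MLO views.
        #   loss = nt_xent(encoder(style_transfer(cc_batch, 'A')),
        #                  encoder(style_transfer(cc_batch, 'B')))
        #   loss += nt_xent(encoder(cc_batch), encoder(mlo_batch))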

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • The authors address an interesting area of research for domain generalization in medical imaging, more specifically for multi-view and multi-style vendors.
    • The authors’ experiments clearly show the contribution of each component of their proposed approach.
    • The comparison to classical transfer learning as well as to state-of-the-art domain generalization methods is well described and conducted, with consistent improvement showing the superiority of the proposed approach.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    • The flow of the paper is good, but some parts are unclear; please see the constructive comments section.
    • The authors use abbreviations before introducing them. It may be hard for the reader to follow the different method abbreviations and the view names without them being defined first. See the constructive comments section.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors do not include statements regarding the reproducibility or open access to code or data.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    • In the introduction, the sentence “Meanwhile, for the multi-view contrastive learning, the CC and MLO views of the same breast are also paired as positive samples” does not make clear what exactly happens in the multi-view learning. Elaborating on this would make for an easier read.
    • The authors use abbreviations like CC, MLO, MSCL, MVCL, and MSVCL. The authors are encouraged to introduce and define these abbreviations before using the terms.
    • A large paragraph is repeated in the text: “We adopt ResNet-50 as the backbone model for contrastive learning and FCOS detector. For fair comparison, the learning rate and batch size for all contrastive learning schemes are set the same as 0.3 and 256, respectively. Meanwhile, all contrastive learning schemes in all experiments use the same diversifying operations, including random cropping, random rotation in ±10◦, horizontal flipping and random color jittering (strength=0.2).”
    • Figs. 2 and 4 could have better descriptions. Since Fig. 2 summarizes the proposed method, more information could be provided in the figure.

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper addresses an interesting topic in the MICCAI community: domain generalization. The experiments conducted are exhaustive and consistent, showing the success of the proposed method for domain generalization. The authors also present a very decent comparison to state-of-the-art methods in domain generalization as well as to transfer learning. With only minor changes, the paper will offer a really interesting finding for the MICCAI community.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    3

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This work adopts GAN and contrastive learning for domain generalization. The reviewers recognized the overall contribution in adopting these techniques. However, more details are needed on the specific settings and experiments, especially for the GAN part, which can be unstable to train and whose importance for the overall performance needs to be demonstrated. The experimental comparisons also need some clarification. Please note, the aim of the rebuttal is to clarify misunderstandings and the rationale behind the method and experiment settings. Promises of extra experiments will not be considered.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5




Author Feedback

Thank you for your time in reviewing this paper. We hereby clarify several major comments and cordially request your consideration of this paper.

Q1: Importance of style transfer quality for detection performance. (R2) A: Our goal is to obtain a detector encoder that is robust to various manufacturer domains via contrastive learning. High-quality style transfer is helpful for the contrastive learning. In our experience, reasonable style-transfer quality can be attained by training the CycleGAN models for 50 or more epochs. To systematically find suitable models in terms of the epoch setting, we devise an indirect selection method based on detection performance. Specifically, we use the detection performance as the metric to decide which checkpoints (epoch settings) produce style-transferred images of sufficient quality for the detection task (see “Implementation details” in Sec. 3). In this indirect selection scheme, we fix all hyper-parameters of the CycleGAN and FCOS except the CycleGAN training epoch.

The reasons for using the indirect selection method are twofold. First, it is relatively difficult to quantitatively assess the quality of style transfer, and subjective assessment is impractical and improper. Second, the chosen models are selected based on detection performance and may therefore better align with our ultimate goal of detection.

In summary, the style-transfer quality is important for contrastive learning. Accordingly, we devise an indirect model selection method to find models that produce good style-transfer quality and good detection results.
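
To make the indirect selection concrete, here is a minimal sketch of the loop described above. All callables (checkpoint loading, style translation, detector training, evaluation) are hypothetical placeholders for the pipeline, not the paper’s actual code; only the CycleGAN epoch varies, while all other hyper-parameters stay fixed.

    def select_cyclegan_epoch(candidate_epochs, load_gan, translate,
                              train_detector, evaluate):
        # Pick the CycleGAN checkpoint whose style-transferred images
        # yield the best downstream detection score on a validation set.
        best_epoch, best_score = None, float('-inf')
        for epoch in candidate_epochs:
            gan = load_gan(epoch)              # hypothetical checkpoint loader
            styled = translate(gan)            # style-transferred training copies
            detector = train_detector(styled)  # FCOS, hyper-parameters fixed
            score = evaluate(detector)         # e.g., recall at a fixed FP rate
            if score > best_score:
                best_epoch, best_score = epoch, score
        return best_epoch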

Q2: Training details of the StyleGAN. (R2) A: The original CycleGAN was adopted here. The generator backbone is a ResNet with 9 blocks and 20 conv layers, while the discriminator is a PatchGAN with 6 conv layers for binary classification. In training, the inputs and outputs are 512×512, obtained by random cropping of the images. MSE and L1 losses are used for classification and reconstruction, respectively. For each style transfer, e.g., A to B, 1,000 training images are used from each of A and B for balanced training. We will add more details in the Supplement.
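
As a rough illustration of the loss terms listed above (an MSE, i.e., least-squares, adversarial term plus an L1 cycle-consistency term), a one-direction sketch of the generator objective follows. The function signature and the weight lambda_cyc = 10.0 (the common CycleGAN default) are assumptions, not values reported in the paper.

    import torch
    import torch.nn as nn

    mse, l1 = nn.MSELoss(), nn.L1Loss()

    def generator_loss(G_ab, G_ba, D_b, real_a, lambda_cyc=10.0):
        # One direction (A -> B) of a CycleGAN generator objective:
        # an MSE (least-squares GAN) adversarial term against the
        # PatchGAN discriminator D_b, plus an L1 cycle-consistency
        # term for the A -> B -> A reconstruction.
        fake_b = G_ab(real_a)
        pred = D_b(fake_b)                      # patch-wise realness logits
        adv = mse(pred, torch.ones_like(pred))  # fool the discriminator
        cyc = l1(G_ba(fake_b), real_a)          # reconstruct the input
        return adv + lambda_cyc * cyc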

Q3: MSVCL numbers in Tables 2 & 3 are different. (R2/R3) A: These are typos. The results in Table 3 are the correct ones. We will fix this in the final version.

Q4: Support of reproducibility. (R2) A: The concerns about reproducibility appear to differ between R2 and R3. We will release the CycleGAN models, which were the major concern of R2, to support reproducibility.

Q5: Multi-view implementation for the 3 comparison methods. (R3) A: The 3 methods conduct domain generalization along with the downstream task. Therefore, we trained the 3 methods with multi-view images from various vendors and explored their domain generalization correspondingly. Due to space limitations, the training details for the 3 methods were not given in the submission but will be added to the new Supplement.

Q6: No-pretraining performed poorly on seen domains. (R3) A: The no-pretraining detector was trained with supervision. The major reason it performed poorly on seen domains may lie in the very limited training data (360×3 annotated images), compared to ImageNet pretraining on millions of images. Low-level features learned from ImageNet may still be helpful.

Q7: Domains D and E may be the same. (R3) A: Although the images of domains D and E shown in the manuscript may look similar, it is worth noting that the data of domain D were collected from Asian populations, whereas the images of domain E were from Europe. Therefore, a domain gap may still exist. We will try to collect data from non-Siemens devices for further validation in a future study.

Q8: Comparison with RevGrad. (R3) A: We have tried RevGrad, which mainly requires discriminators for the multi-view and multi-vendor domains. The preliminary results were not very good, and the method may need some modification for our problem. Therefore, we did not include it in the comparison. We will explore it in future work.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The overall method setting makes sense to the reviewers, though it is a combination of existing well-known methods. The major questions included 1) the details of training the CycleGAN, since GAN networks can be unstable and proper training is extremely important for representation learning, especially for disease cases; 2) the significance of the performance gain; and 3) the evaluation settings and their fairness. The rebuttal added more details of the method and properly clarified the “fairness” question. Still, I agree with the reviewers that it would be better if the two testing sets came from two different vendors, rather than differing only by a population shift.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    3



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper brings interesting practical contributions to the community on the important domain generalisation problem. The rebuttal addresses the concerns raised by the AC and the reviewers, so I recommend the acceptance of this paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    4



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The reviews are somewhat conflicting, for different reasons. The authors addressed most of the concerns. The reviewers generally agree that the work has value for presentation at MICCAI, which I would support. However, this is ultimately a borderline paper and the decision may go either way.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    6


