
Authors

Jiatong Cai, Chenglu Zhu, Can Cui, Honglin Li, Tong Wu, Shichuan Zhang, Lin Yang

Abstract

Ki67 is a significant biomarker in the diagnosis and prognosis of cancer, whose index can be evaluated by quantifying its expression in Ki67 immunohistochemistry (IHC) stained images. However, quantitative analysis of multi-source Ki67 images remains a challenging task in practice due to cross-domain distribution differences, which result from variations in imaging, staining styles, and lesion types. Many recent studies have made efforts on domain generalization (DG), but some noteworthy limitations remain. Specifically, in the case of Ki67 images, learning invariant representations is hampered by the insufficient number of domains and the mismatch of cell categories across domains. In this paper, we propose a novel method to improve DG by searching for a domain-agnostic subnetwork in a domain merging scenario. Partial model parameters are iteratively pruned according to the domain gap, which arises as the training data is converted from a single domain into merged domains. In addition, the model is optimized by fine-tuning on the merged domains to eliminate the interference of class mismatching among the various domains. Furthermore, an appropriate implementation is attained by applying the pruning method to different parts of the framework. Compared with known DG methods, our method yields excellent performance in multiclass nucleus recognition on Ki67 IHC images, especially in cases with missing categories. Moreover, our method also achieves competitive results on a public dataset compared with state-of-the-art DG methods.
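No code is released for the paper (see below), so as a purely illustrative aid, here is a minimal sketch of the pruning idea the abstract describes: weights whose gradients disagree most between a single-domain batch and a merged-domain batch are treated as domain-specific and pruned, after which the network would be fine-tuned on the merged domains. Everything here (the gradient-disagreement score, `grad_magnitudes`, the pruning fraction `p`) is an assumption for illustration, not the authors' actual criterion or implementation.

```python
# Hedged sketch of prune-based domain generalization (an assumption-laden
# reading of the abstract, NOT the authors' released code). Weights whose
# gradients differ most between a single-domain batch and a merged-domain
# batch are treated as domain-specific and zeroed out.
import torch
import torch.nn.functional as F

def grad_magnitudes(model, batch):
    """Per-parameter |gradient| of the classification loss on one batch."""
    x, y = batch
    model.zero_grad()
    F.cross_entropy(model(x), y).backward()
    return {n: p.grad.detach().abs().clone()
            for n, p in model.named_parameters() if p.grad is not None}

def domain_gap_scores(model, single_batch, merged_batch):
    """Proxy for the per-weight 'domain gap': gradient disagreement
    between single-domain and merged-domain data (assumed criterion)."""
    g_single = grad_magnitudes(model, single_batch)
    g_merged = grad_magnitudes(model, merged_batch)
    return {n: (g_single[n] - g_merged[n]).abs() for n in g_single}

def prune_domain_specific(model, scores, p=0.2):
    """Zero out the fraction p of weights with the highest domain-gap score.
    A real implementation would keep the resulting binary mask and
    re-apply it after every fine-tuning step on the merged domains."""
    flat = torch.cat([s.flatten() for s in scores.values()])
    k = max(1, int((1 - p) * flat.numel()))
    threshold = flat.kthvalue(k).values
    with torch.no_grad():
        for n, param in model.named_parameters():
            if n in scores:
                param.mul_((scores[n] <= threshold).float())
```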

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87237-3_27

SharedIt: https://rdcu.be/cymai

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

Looking at domain generalization (DG) and transforming it into a domain-agnostic subnetwork search problem (prune-based generalization), the paper reports good performance for multiclass nucleus recognition in Ki67 IHC images. Results are also evaluated on a public dataset in comparison with other DG methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper is well written and the ideas are clearly formulated. The prune-based generalization in Fig. 1 does in fact contain some new ideas. The testing is relatively comprehensive (Table 2).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Small number of images, though the nuclei count is perhaps high (no statistics provided). Comparing with PACS is not necessary and unrelated to IHC challenges. Hence, Section 3.2 and Table 3 are redundant for a medical image analysis paper.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
• No code available
    • Data seem to be private
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
• Remove PACS
    • Add more statistics on nuclei count
    • Justify the methods in Table 2. Why not others?
  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Nice idea, clearly proposed, good experiments.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    3

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

This paper proposes a model pruning strategy to address the domain generalization challenge in multi-source model training. The experiments are conducted on both medical and natural imaging tasks, demonstrating the method's robustness and generalizability.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The motivation and related work are clearly stated.

I appreciate that the authors validate the idea on both medical and natural imaging datasets.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The methodology is somewhat hard to follow, especially Section 2.2.

    The ablation study on the prune rate p% is unclear. If the experiments conclude that pruning on all modules achieves the best performance, what is the reason to introduce this hyperparameter?

    The interpretation of the results in Table 2 should be further clarified.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I think the pruning rate p% is not easy to determine when applying this idea to alternative applications, which may lower the reproducibility of the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Regarding nucleus recognition in Table 2, the authors present many results but fail to interpret them and make clear conclusions.

    1. What do you mean by “merge” and “unseen”? By comparing performance between “merge” and “unseen” for the same algorithm, what can we conclude?
    2. What is the reason to take ERM-F as a reference rather than ERM?
    3. What do “Ours-Encoder” and “Ours-Decoder” mean? What can be concluded by comparing encoder, decoder, and all?
4. Why are some of the results bolded?

    Considering the ablation study of pruning parameters, I am confused about the conclusion.

1. “Pruning on all modules achieves the best performance” and “the importance of allocating parameters to prune in a balanced manner” seem contradictory. How can this balance be ensured when addressing a new application? If pruning all modules works best, why do we need to consider a balance?
  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper tackles a significant problem (domain generalization), and the authors provide experimental evidence showing the proposed method is effective for both medical and natural imaging tasks. However, the result of nucleus recognition (Table 2) and the choice of pruning rate p need to be further clarified.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Somewhat confident



Review #3

  • Please describe the contribution of the paper

In this work, the authors propose a domain generalization model based on pruning and fine-tuning strategies. The proposed method is validated on the detection and classification tasks for a Ki67 dataset, as well as the PACS dataset for object classification, and has achieved state-of-the-art performance on both.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

1) The proposed study on domain generalization for cross-domain Ki67 image analysis is important, as it reduces the cost of data sharing.

2) The theoretical analysis is interesting and provides convincing support for the proposed method.

    3) The overall paper is clearly written and easy to follow.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

1) There is a lack of comparison with more recent domain generalization methods, such as [a], [b], and [c].

[a] Learning to Balance Specificity and Invariance for In and Out of Domain Generalization, in ECCV 2020.
    [b] Domain Generalization via Model-Agnostic Learning of Semantic Features, in NeurIPS 2019.
    [c] Domain Generalization by Solving Jigsaw Puzzles, in CVPR 2019.

    Currently, the comparison methods in Table 2 are out-of-date.

2) There is a lack of experiments on public medical datasets. Currently, the experiments are conducted on one in-house dataset and a general classification dataset. This makes the overall contribution of this paper to medical image analysis limited.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The medical dataset in this paper seems to be private, which makes it questionable whether the proposed method can be reproduced on related medical image analysis tasks.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

1) For the unseen test domains in the Ki67 dataset, there are three new tumor types (cervix, pancreas, and urethra). It would be more convincing to include the detailed results for the different tumor classes in Table 2, which could further prove the model's generalization ability.

2) In Table 2, it seems that the proposed method is not always the best under different metrics. Please include some explanation of this issue.

3) In Table 2, please include the results without generalization, and the upper-bound results with the model trained on the target domain in a fully supervised manner.

4) To better emphasize the contribution of this paper to medical image analysis, I suggest including more domain generalization tasks on medical images, rather than general images.

  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Due to the lack of comparison with related domain generalization methods, and the less convincing experimental results from the perspective of medical image analysis, the overall contribution of this paper is limited.

  • What is the ranking of this paper in your review stack?

    5

  • Number of papers in your stack

    8

  • Reviewer confidence

    Confident but not absolutely certain



Review #4

  • Please describe the contribution of the paper

This paper applied domain generalization to the multi-source nucleus recognition problem. The authors proposed a pruning-based method to find domain-invariant features across different domains. Experimental results demonstrated that the proposed model is better than several baselines.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed pruning-based algorithm is well explained.
    2. Experimental results are clearly discussed.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

1. The biggest concern with this paper is that the model architecture and loss functions are unknown. Although there is an implementation part, these details are not mentioned. At the very least, the overall optimization objective should be given; otherwise, it is hard to tell how the neural network is optimized.

    2. Some sentences are not properly phrased. For example, on page 2, “Yet, since the model is barely trained on the source domain and target domain, the performance is flawed on unseen domains.” For general domain adaptation, the model is trained on all source and target domains, so why “barely” here? The causality seems not solid. The sentence “What’s more, some studies are not applicable when certain categories do not exist overall training domains.” needs an “in” (“do not exist in all training domains”), and it seems that this problem still remains in this paper.
    3. One concern about the “Ki67” dataset is that the number of images is small (fewer than 200). Also, the domain difference is unknown here. It would be better to show some example images.
    4. In terms of the results, the paper did not compare with SOTA methods. The frequently mentioned DANN method was proposed several years ago. More recent methods should be compared, e.g., [1-2]. In addition, compared with the ERM method and other baselines in both Tables 1 and 2, the proposed method is not significantly better. In particular, the results on the PACS dataset are worse than those in [3].
    5. For the ablation study, it would be better to show a table. It seems that the lottery ticket hypothesis in the motivation part is not well proved.

[1] Kang, G., Jiang, L., Yang, Y., & Hauptmann, A. G. (2019). Contrastive adaptation network for unsupervised domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 4893-4902).
    [2] Zhang, Y., Tang, H., Jia, K., & Tan, M. (2019). Domain-symmetric networks for adversarial domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 5031-5040).
    [3] Carlucci, F. M., Porzi, L., Caputo, B., Ricci, E., & Rota Bulò, S. (2020). MultiDIAL: Domain alignment layers for (multisource) unsupervised domain adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    With the unknown model architecture and loss functions, it is difficult to reproduce the results in the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

See the weaknesses listed above.

  • Please state your overall opinion of the paper

    probably reject (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The limited novelty.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    4

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This paper received diverging scores from the reviewers. Although the reviewers found some merits in the paper (e.g., an interesting method to conduct nucleus recognition on cross-domain image data and achieve promising performance), they raised some important issues: the method is not well presented (Section 2.2) and some critical technical details are missing, the experimental results are not clearly interpreted (Table 2), the dataset is small and might not be sufficient to verify the effectiveness of the proposed method, and there is a lack of comparison with recent state-of-the-art methods.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    10




Author Feedback

R#1 Q3 Thank you for your suggestion. A total of 35,948 individual nuclei are annotated; this statistic can be included in the final version. Q6 Regarding the reasons for not comparing with other methods, R#4 pointed out the same issue; for a detailed interpretation, please refer to our answer to Q3 of R#4.

R#2 Q3 We conducted ablation experiments by adjusting p% from 1e-2 to 1e-5. Due to space limitations, this part is not included.
Q6 1. ‘Merge’ and ‘unseen’ denote domains participating in training and domains never involved in training, respectively. We list the performance on these two sets of domains in Table 2 not for comparison; the results show that our method achieves the goal of preserving accuracy on the training (merge) domains while performing well on unseen domains, as stated in Section 2.2.
2. In our method, the model is finally fine-tuned. We take ERM-F as a reference to guarantee that the improvement is not gained merely from fine-tuning.
3. ‘Ours-Encoder’ and ‘Ours-Decoder’ denote applying the pruning method only to the ‘encoder’ or ‘decoder’ modules, respectively. Previous works such as [1, a] always apply DG methods to the ‘decoder’ modules; however, our method performs best when applied to all modules.
4. The word ‘balance’ may be misleading; we mean that pruning weights globally is superior to pruning within certain modules.
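To make points 3 and 4 concrete, the distinction between module-wise pruning (‘Ours-Encoder’/‘Ours-Decoder’) and global pruning over all modules can be illustrated with PyTorch's built-in pruning utilities. This is only a hedged sketch: the toy layers, the 20% rate, and the use of L1 magnitude as the scoring criterion (standing in for the paper's domain-gap criterion) are all assumptions.

```python
# Toy contrast between module-wise ("Ours-Decoder") and global ("Ours")
# pruning; L1 magnitude is a stand-in for the paper's actual criterion.
import copy
import torch.nn as nn
import torch.nn.utils.prune as prune

encoder = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))
decoder = nn.Sequential(nn.Conv2d(32, 16, 3), nn.ReLU(), nn.Conv2d(16, 4, 1))
model = nn.Sequential(encoder, decoder)

# Module-wise: prune 20% of the weights inside each decoder layer only.
decoder_only = copy.deepcopy(model)
for layer in decoder_only[1]:
    if isinstance(layer, nn.Conv2d):
        prune.l1_unstructured(layer, name="weight", amount=0.2)

# Global: rank all weights of the network together, so the sparsity budget
# is allocated across modules instead of being fixed per module.
global_pruned = copy.deepcopy(model)
params = [(m, "weight") for m in global_pruned.modules()
          if isinstance(m, nn.Conv2d)]
prune.global_unstructured(params, pruning_method=prune.L1Unstructured,
                          amount=0.2)
```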

R#4 Q3 We did not intend to turn a blind eye to these methods. Although they achieve rather good performance in those papers, they are not applicable in our problem setting. [a] is designed to be task-specific, as the method is applied to certain classifiers; it is hard to reproduce on a cell recognition task. In [b], the class alignment loss cannot work because the categories are inconsistent among domains. Besides, the triplet loss in [b] relies on a pair-wise sampling strategy, which is biased when certain categories are missing from some training domains: ‘positive’ samples can only be selected from the domains containing those categories. Similarly, the SOTA DG method [2], based on pair-wise matching, also fails in our problem setting. The effectiveness of [c] on DG for classification relies on the spatial co-location of image parts, which is of limited significance in cell recognition, as the labels are cell-level rather than image-level.
Q6 In Table 2, the detection performance degradation of our method on the merge domains is reasonable, because ERM, with limited generalization, tends to overfit to domain-specific features, which may help the accuracy on the merge domains. Even so, our method surpasses ERM by a non-trivial margin of 1.73% (F1-score) on the merge domains in classification.

R#5 Q3 1. We have indeed stated the model architecture and loss function in our original paper (Section 3.1, 2nd paragraph, lines 6-8).
2. Unlike DA, DG is evaluated on unseen domains rather than target domains. The word ‘barely’ emphasizes the limitation of DA methods on unseen domains. The claim that the ‘missing category’ problem remains is incorrect; our method manages to work under the inconsistent-category condition, as shown in Table 1.
3. Our dataset is sufficient to verify the effectiveness of the proposed method. Compared with recent similar work [3], our data has more cases (41 vs 38), larger resolution (1920×1080 vs 500×500), and more annotations.
4. Our method focuses on DG rather than DA. The referenced methods conduct DA without evaluating results on unseen domains, so it is unfair to compare the results of our method with those in the referenced papers. For the reasons for not comparing with other recent DG methods in Table 2, please refer to Q3, R#4.
5. The lottery ticket hypothesis is a classic pruning method and needs no additional proof.
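For context on point 5, the lottery ticket hypothesis is usually operationalized as iterative magnitude pruning with weight rewinding (Frankle & Carbin, ICLR 2019). The sketch below shows that standard recipe, not the paper's procedure, whose pruning criterion is the domain gap; `train_fn`, `rounds`, and `rate` are illustrative placeholders.

```python
# Sketch of standard lottery-ticket iterative magnitude pruning
# (Frankle & Carbin, ICLR 2019), shown for reference only.
import copy
import torch

def iterative_magnitude_pruning(model, train_fn, rounds=3, rate=0.2):
    init_state = copy.deepcopy(model.state_dict())  # theta_0, kept for rewinding
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()}
    for _ in range(rounds):
        train_fn(model)  # train the current subnetwork to convergence
        for n, p in model.named_parameters():
            alive = p.detach().abs()[masks[n].bool()]
            # Prune the smallest `rate` fraction of the surviving weights.
            masks[n] *= (p.detach().abs() > alive.quantile(rate)).float()
        with torch.no_grad():  # rewind survivors to their initial values
            for n, p in model.named_parameters():
                p.copy_(init_state[n] * masks[n])
    return model, masks
```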

References:
[1] Self-challenging Improves Cross-Domain Generalization. ECCV 2020.
[2] Domain Generalization using Causal Matching. ICML 2020.
[3] Pixel-to-Pixel Learning with Weak Supervision for Single-Stage Nucleus Recognition in Ki67 Images. IEEE TBME 2019.
[a], [b], and [c] are as referenced by R#4.




Post-rebuttal Meta-Reviews

Meta-review #1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This manuscript presents a pruning-based domain generalization method that is able to recognize different types of nuclei in unseen domains. The rebuttal has addressed most of the major concerns regarding the interpretation of experimental results and the comparison with related state-of-the-art methods. The authors would need to improve the clarity of the presentation.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    14



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The paper proposes a prune-based domain generalization model for nucleus recognition, which is able to learn invariant representations across different domains. The method is interesting and has achieved good performance, though more analysis of the results should be discussed. The authors have responded to certain concerns of the reviewers, while questions such as missing technical details and the lack of results interpretation are not fully addressed.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    9



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The work proposes a model pruning framework to address the domain generalization challenge in multi-source model training. The proposed method was evaluated on detection and classification tasks for a Ki67 dataset, as well as the PACS dataset for object classification.

The work proposes an interesting method. However, as summarized in the original meta-review, a variety of issues were found with the paper's presentation and experimental design. The authors did promise to increase the dataset size in the rebuttal; however, this cannot be taken into account. The submitted work is immature for a MICCAI publication. The authors are encouraged to organize their latest results for publication at another venue.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    14


