
Authors

Renzhen Wang, Yichen Wu, Huai Chen, Lisheng Wang, Deyu Meng

Abstract

Consistency regularization has shown superiority in deep semi-supervised learning; it commonly estimates a pseudo-label conditioned on each single sample and its perturbations. However, such a strategy ignores the relations between data points and can give rise to error accumulation once a sample and its perturbations are integrally misclassified. To address this issue, we propose Neighbor Matching, a pseudo-label estimator that propagates labels to unlabeled samples according to their neighboring ones (labeled samples with the same semantic category) in an online manner during training. Different from existing methods, for an unlabeled sample, our Neighbor Matching defines a mapping function that predicts its pseudo-label conditioned on itself and its local manifold. Concretely, the local manifold is constructed by a memory padding module that memorizes the embeddings and labels of labeled data across different mini-batches. We experiment on two distinct benchmark datasets for semi-supervised classification of thoracic disease and skin lesion, and the results demonstrate the superiority of our approach over other state-of-the-art methods. The source code will be publicly available.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87196-3_41

SharedIt: https://rdcu.be/cyl2L

Link to the code repository

https://github.com/renzhenwang/neighbor-matching

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a neighbor matching algorithm to estimate soft labels for semi-supervised learning. Moreover, it introduces an attention-based memory mechanism to keep track of each class's embedding space during training.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    I very much enjoyed reading the paper. The manuscript is well-written and well-organized with clear definition of the problem. From the technical point of view, the authors have addressed the problem of error accumulation through memorization of the embedding space for each class. This is an interesting approach, since the latent vectors of data points provide a compact representation for calculating similarity. Finally, the authors provide enough experiments to evaluate their method.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Using a queue mechanism to dynamically update the memory bank sounds like a naive approach. This can lead to low-confidence samples entering the memory and gradually driving the pseudo-labels toward wrong predictions.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors provided sufficient details of the training algorithm and their hyper-parameters in the paper. The methodology is simple yet elegant. Therefore, I believe their results can be reproduced.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    I suggest investigating a selection strategy for the dynamic updates; as an example, this could be done via uncertainty measures. I also suggest expanding the scope of the work: with some modifications, the proposed algorithm might be applied to image segmentation.
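To make the suggestion concrete, a minimal sketch of an uncertainty-gated memory update is given below. All names and the threshold value are hypothetical; the paper's actual queue update enqueues labeled samples without such a gate, and this sketch only illustrates the reviewer's proposed refinement.

```python
from collections import deque

def make_bank(capacity):
    """Fixed-size FIFO memory bank; the oldest entry is dropped
    automatically once capacity is reached (the paper's queue update)."""
    return deque(maxlen=capacity)

def maybe_enqueue(bank, embedding, label, probs, threshold=0.9):
    """Confidence-gated update (the reviewer's suggestion):
    only samples whose maximum predicted probability exceeds a
    threshold are admitted into the bank."""
    if max(probs) >= threshold:
        bank.append((embedding, label))
        return True
    return False
```

Other uncertainty measures (e.g., predictive entropy or MC-dropout variance) could replace the max-probability gate without changing the queue structure.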

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes a simple yet elegant approach for generating pseudo labels. The manuscript is well-organized with sufficient experiments to support the hypothesis.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    4

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    This paper presents a semi-supervised image classification algorithm based on similarity-weighted label propagation for pseudo-label prediction and a novel “memory bank” mechanism. To generate a pseudo-label for an unlabeled image, the image is fed into the network to produce its embedding, which is compared to the embeddings of the labeled images in the “memory bank”. The pseudo-label is computed as the similarity-weighted average of the ground-truth labels. The “memory bank” is a set of embedding vectors that have high confidence during inference. It is updated at each mini-batch by enqueuing the high-confidence labeled samples of the current mini-batch and dequeuing embedding vectors from previous mini-batches. In this way, the “memory bank” is updated randomly so that different unlabeled and labeled samples can meet each other there. Experiments are conducted on two public datasets. The results seem convincing and promising.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The problem is of interest and of importance.
    2. The proposed method is a plug-and-play module, very easy to implement.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The experiments are a little weak: there is no comparison against at least a typical fully supervised method to highlight the effectiveness of the proposed method.
    2. The experiments could be stronger if the proposed method were compared to some state-of-the-art methods, like Mean Teacher and Temporal Ensembling.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It is likely to be reproducible for the following reasons:

    1. The datasets used are public, and the authors describe the data splits in detail;
    2. The authors are willing to release code upon paper acceptance;
    3. Details of the hyperparameters are included in the implementation details in the paper.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    This paper presents a semi-supervised image classification algorithm based on similarity-weighted label propagation for pseudo-label prediction and a novel “memory bank” mechanism. To generate a pseudo-label for an unlabeled image, the image is fed into the network to produce its embedding, which is compared to the embeddings of the labeled images in the “memory bank”. The pseudo-label is computed as the similarity-weighted average of the ground-truth labels. The “memory bank” is a set of embedding vectors that have high confidence during inference. It is updated at each mini-batch by enqueuing the high-confidence labeled samples of the current mini-batch and dequeuing embedding vectors from previous mini-batches. In this way, the “memory bank” is updated randomly so that different unlabeled and labeled samples can meet each other there. To counteract the class imbalance problem, the number of labeled samples from each class is kept equal in the “memory bank”. Experiments are conducted on two public datasets. The results seem convincing and promising. One small drawback is that the proposed method is not compared against a typical fully supervised method to show the accuracy gap, if there is any. The proposed method is also not compared to some state-of-the-art semi-supervised methods, like Mean Teacher, Temporal Ensembling, and VAT.

    1. The experiments are a little weak: there is no comparison against at least a typical fully supervised method to highlight the effectiveness of the proposed method.
    2. The experiments could be stronger if the proposed method were compared to some state-of-the-art methods, like Mean Teacher and Temporal Ensembling.
  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The problem is of interest and of importance.
    2. The proposed method is a plug-and-play module, very easy to implement.
  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    The authors proposed a semi-supervised learning method called Neighbor Matching, which 1) projects the unlabeled data (x_j) into a feature space, 2) measures the distances between the features of x_j and the features of x_i (labeled data), and 3) generates pseudo-labels using a weighted-sum function. The hypothesis is that images of the same disease should be close to each other in the feature space. In addition, they also proposed a memory padding module designed to dynamically update the local neighborhood.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The neighbor matching idea is inspired by clinical practice and is somewhat novel. The paper is well written and technically sound.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The proposed method is based on a strong assumption that similar diseases should be close to each other in the feature space. This assumption may be reasonable from the clinical point of view. However, from an algorithmic point of view, the performance may depend heavily on the feature extractor (h(·)).

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper includes all the necessary implementation details, e.g., network architecture, hyperparameters, etc.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The neighbor matching idea is somewhat novel, and the paper is well written. However, the reviewer has the following comments/concerns.

    1. The proposed method is based on a strong assumption that similar diseases should be close to each other in the feature space. It would be good if the authors could test this hypothesis, at least by visualizing some features with something like t-SNE.
    2. The authors evaluated the proposed method on two datasets against two baseline models. Though the proposed method performs better in almost all settings, the performance (number-wise) is not very impressive. For instance, C2L [1], a self-supervised method, achieves 0.88 AUC on CheXpert with a ResNet-18 backbone, whereas the highest number reported in this paper is 0.6934 AUC. Though there are different reasons why the number is so low, such as the performance being limited by the backbone network (AlexNet) or the method being trained with only 500 labeled samples, it would always be nice to see the highest performance the method can reach.
    3. IMHO, semi-supervised training is also related to network pretraining, because both deal with the scarcity of labeled data in deep learning training. It would be nice to show comparison results against popular pretraining methods, such as an ImageNet pre-trained model, self-supervised pre-trained models [1,2], or a weakly-supervised pretraining method [3], or at least to introduce a few such methods in the background. Overall, I think this is good work; however, it is slightly below the bar of a MICCAI paper.

    [1] Zhou, Hong-Yu, et al. “Comparing to learn: Surpassing imagenet pretraining on radiographs by comparing image representations.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2020.
    [2] Zhou, Zongwei, et al. “Models genesis: Generic autodidactic models for 3d medical image analysis.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2019.
    [3] Liang, Gongbo, et al. “Contrastive Cross-Modal Pre-Training: A General Strategy for Small Sample Medical Imaging.” arXiv preprint arXiv:2010.03060 (2020).

  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Two major factors led the reviewer to the overall score of this paper: 1) feature evaluation/visualization may be needed to support the hypothesis; 2) more evaluation results may be needed, especially to show how high the performance can be pushed with the proposed method.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    3

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper proposed a neighbor matching algorithm to estimate soft labels for deep learning under the semi-supervised learning setting. All three reviewers agree that the neighbor matching idea is interesting and novel. The AC agrees with the reviewers that the presented method can be useful for semi-supervised learning, as well as for its applications in MIC and CAI, and therefore recommends acceptance of this paper. The authors are encouraged to carefully check the review comments and address the issues raised therein.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    2




Author Feedback

Thanks to the reviewers for their comments. It is pleasing to see that the reviewers agree that the proposed neighbor matching method can be useful for pseudo-label estimation in semi-supervised learning. In particular, we are very grateful to the reviewers for their constructive comments, including: 1) Reviewer #1 suggests using uncertainty measures to dynamically update the memory bank for pseudo-label estimation with high confidence; 2) Reviewer #2 suggests conducting comparison experiments with other state-of-the-art semi-supervised learning methods; 3) Reviewer #3 suggests leveraging a more powerful backbone to reach the performance upper bound of the proposed method.

We will strengthen our work in the extended version according to all these valuable comments. In the following, we respond to the major issues raised by Reviewer #3.

Q1: The proposed method is based on a strong assumption that similar diseases should be close to each other in the feature space. A1: Our basic assumption is that images of the same category usually lie on a low-dimensional manifold, which does not mean that these images must be close to each other in the feature space. Actually, our memory padding module mainly aims to make the pseudo-label estimation work well when one class is multi-modal in the feature space. Beyond that, the temperature parameter T in Eq. 2 is designed to adjust the contributions of different labeled samples in the memory bank to the estimated pseudo-labels. For example, a large enough T can make the unlabeled sample inherit its label from the nearest one, which does not require that all samples with the same disease be close to each other.
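The effect of the temperature T described above can be illustrated numerically. This is a small sketch, not the authors' code: the similarity values and function name are invented, and a softmax over T-scaled similarities is assumed, consistent with the role of T in Eq. 2 as stated in the response.

```python
import numpy as np

def neighbour_weights(sims, T):
    """Softmax over similarity scores scaled by temperature T.
    A larger T concentrates the weight mass on the most similar
    labeled sample; a smaller T spreads it over the neighborhood."""
    s = T * np.asarray(sims, dtype=float)
    w = np.exp(s - s.max())   # subtract max for numerical stability
    return w / w.sum()

# Similarities of one unlabeled sample to three memory-bank entries.
sims = [0.9, 0.7, 0.2]
soft  = neighbour_weights(sims, T=1.0)    # diffuse weights
sharp = neighbour_weights(sims, T=50.0)   # nearly one-hot on the nearest
```

With large T the weight vector approaches one-hot, so the unlabeled sample effectively inherits the label of its single nearest neighbor, as the response argues.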


