
Authors

Kazuma Fujii, Daiki Suehiro, Kazuya Nishimura, Ryoma Bise

Abstract

Cell detection is an essential task in cell image analysis. Recent deep learning-based detection methods have achieved very promising results. In general, these methods require exhaustively annotating the cells in an entire image. If some of the cells are not annotated (imperfect annotation), the detection performance degrades significantly due to noisy labels. This often occurs in real collaborations with biologists and even in public datasets. Our proposed method takes a pseudo labeling approach to cell detection from imperfectly annotated data. A detection convolutional neural network (CNN) trained on such data with missing labels often produces over-detection. We treat the partially labeled cells as positive samples and the detected positions other than the labeled cells as unlabeled samples. We then select reliable pseudo labels from the unlabeled data using recent machine learning techniques: positive-and-unlabeled (PU) learning and P-classification. Experiments using microscopy images under five different conditions demonstrate the effectiveness of the proposed method.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87237-3_41

SharedIt: https://rdcu.be/cymax

Link to the code repository

https://github.com/FujiiKazuma/CDFIAPLSUP.git

Link to the dataset(s)

http://celltrackingchallenge.net/2d-datasets/


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper presents a method for cell detection that can learn from non-exhaustive annotations. In the first stage, the network is trained using only the positive samples, and the rest of the image is treated as an ignore region. Then, detections in the unlabeled regions are ranked, and the top/bottom-ranked detections are labeled as positive/negative samples. This process is repeated multiple times.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Most of the paper is well-written.
    • Fig. 1 gives a good overview of the proposed method.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • It is unclear what exactly the “masked loss” is. It seems the loss is only computed for regions close to the annotated cells, but what is the actual loss function? (See the sketch after this list for one possible reading.)
    • What is the contribution of the PU-learning and P-classification steps? Some ablation experiments should have been included, e.g., if the features from the first detection network were used instead, what would the difference in performance be?
    • One of the two datasets used is from the cell tracking challenge [1]; it would have been good to compare the performance with some of the methods listed on their website and to report the metrics used by the challenge, e.g., DET, which is used to evaluate detection performance. [1] http://celltrackingchallenge.net/latest-csb-results/
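
For reference, a minimal sketch of what such a masked heatmap-regression loss could look like (this is my own assumption about the implementation, with hypothetical tensor names, not taken from the paper):

```python
import torch

def masked_mse_loss(pred_heatmap, target_heatmap, mask):
    """MSE computed only where mask == 1 (regions near annotated cells).

    All tensors have shape (B, 1, H, W). Unlabeled regions (mask == 0)
    contribute no gradient, so missing annotations there are not
    penalized as if they were background.
    """
    diff = (pred_heatmap - target_heatmap) ** 2
    return (diff * mask).sum() / mask.sum().clamp(min=1)
```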
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    NA

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • “it is very costly to annotate all the cells in an image since there are as many as hundreds or thousands of cells in an image” It is not necessary to annotate the whole image; small regions of an image can be exhaustively annotated, and either only those regions are used during training or the loss is only calculated for those regions.
    • “some of the current public data-sets only provide partially annotated cells (imperfect annotation)” Please provide references to some of these datasets.
  • Please state your overall opinion of the paper

    probably reject (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • The contribution of the proposed techniques (PU-learning, P-classification) was not analyzed sufficiently.
    • The selected baselines are very weak, especially for the HSC dataset; some methods from the cell tracking challenge could have been used (they have DET scores around 0.99, i.e., almost perfect detection), which perform very significantly better than the chosen baseline methods (F-scores of 0.1-0.48).
  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Somewhat confident



Review #2

  • Please describe the contribution of the paper

    This paper proposes an iterative pseudo-labeling approach to cell detection from imperfect annotations in order to reduce false positives. The method is tested on two datasets, on which it obtains the best performance compared with the other baselines.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The iterative pseudo labeling idea is interesting.
    • The writing is very good. Figure 3 illustrates the effectiveness of the labeling process very well.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The biggest issue is that the baselines are too simple. As the method proposed by the authors is a pseudo labeling method to deal with label noise, the baselines it is compared with should also be SOTA methods for dealing with label noise (e.g., https://arxiv.org/pdf/1412.6596.pdf; this is not SOTA but very popular). However, the baselines the authors chose are not very competitive.
    • The workflow seems a little redundant to me, as two different feature extractors are used (one in the first U-Net, another in PU-learning). Why not use only one feature extractor and threshold the heatmap directly for the pseudo labeling? (See the sketch below.)
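
To make the suggested simpler alternative concrete, here is a minimal sketch of thresholding the detector's heatmap directly for pseudo labeling (the thresholds and the peak-detection choice are illustrative assumptions, not settings from the paper):

```python
from skimage.feature import peak_local_max

def pseudo_labels_from_heatmap(heatmap, hi=0.8, lo=0.2, min_distance=5):
    """Threshold the detector's own heatmap to get pseudo labels.

    Peaks above `hi` become pseudo positives; pixels below `lo` could be
    treated as confident background. `hi`, `lo`, and `min_distance` are
    illustrative values, not tuned settings from the paper.
    """
    peak_coords = peak_local_max(heatmap, min_distance=min_distance,
                                 threshold_abs=hi)   # (N, 2) array of (row, col)
    background_mask = heatmap < lo
    return peak_coords, background_mask
```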
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It should be reproducible as long as the code is released.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Please check the comments in the weakness section. Providing stronger baselines would largely improve the quality of the paper.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is interesting and well presented. The main weakness is that the baseline methods are not strong enough; more insight into why the current framework is used would be helpful for the reader.

  • What is the ranking of this paper in your review stack?

    4

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    This paper uses PU learning and P-classification to select pseudo labels from unlabeled data to train a cell detector.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Using PU learning as the feature extractor and P-classification as the ranking system is novel.

    2. The experiments are sufficient; especially in Table 2 and Table 3, it is great to see the improvement over iterations.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Some hyperparameters are not studied, such as the α in P-classification.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Code and evaluation can be released.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The ratio α of the selected top K in the ranking is set to 0.05. It would be very interesting to see how α affects the number of iterations and the performance.
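
For clarity, the top/bottom selection that this ratio controls amounts to something like the following sketch (my own reading of the paper; `alpha`/`beta` follow the paper's 5% setting):

```python
import numpy as np

def select_pseudo_labels(scores, alpha=0.05, beta=0.05):
    """Take the top `alpha` fraction of ranking scores as pseudo positives
    and the bottom `beta` fraction as pseudo negatives."""
    order = np.argsort(scores)            # ascending by ranking score
    k_pos = max(1, int(alpha * len(scores)))
    k_neg = max(1, int(beta * len(scores)))
    positives = order[-k_pos:]            # highest-ranked samples
    negatives = order[:k_neg]             # lowest-ranked samples
    return positives, negatives
```

Sweeping `alpha` would then trade off how quickly confident pseudo labels accumulate per iteration against the risk of admitting noisy ones.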

  • Please state your overall opinion of the paper

    strong accept (9)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method of using PU learning and P-classification is novel, and the results over iterations are sufficient to support it.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    7

  • Reviewer confidence

    Very confident



Review #4

  • Please describe the contribution of the paper

    This paper presents an interesting self-training method for solving a very practical problem in cell detection. The proposed method can utilize partially annotated cell detection data to train a network. The overall framework is self-training. First, a detection network is trained on partially annotated images with the loss being computed only for the labeled locations. The trained network is applied to the same partially annotated data to generate true positive and false positive detections. Next, patches are cropped based on the detection results. The patches are used to train a classifier network in a positive-and-unlabeled learning regime. The classifier network is applied to the patches to generate a feature representation for each patch. The feature vectors corresponding to highly confident detections are selected by a P-classification procedure. These are used as ground-truth labels in the next training iteration. The experimental results seem promising.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The problem under consideration is important. It addresses a very practical issue: there are many partially annotated pathology images in which only a small portion of the cells are annotated, whereas existing deep learning methods mostly assume that all instances are annotated.
    2. The method seems to work very well for simple microscopy images. It may serve as a foundation for future work on more challenging images.
    3. The organization and presentation are very smooth. It is a very good write-up.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There are no particular weak points; the only question is whether this can be applied to pathology images.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The data used are public. The authors will release code upon acceptance.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    In the positive-and-unlabeled learning, when cropping the patches, how do you ensure that a cropped unlabeled patch contains a cell at its centroid position? If the cell is not at the centroid position, how do you create the Gaussian peak for this cell for the next training session?
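
For reference, once a detected position is trusted, placing the Gaussian peak at that position (rather than at the patch center) is straightforward; a minimal sketch, with an illustrative σ that may differ from the paper's setting:

```python
import numpy as np

def gaussian_heatmap(shape, centroids, sigma=3.0):
    """Place a Gaussian peak at each (y, x) centroid in an (H, W) map.

    If a pseudo-labeled cell is off-center in its patch, the peak should
    be placed at the detected position rather than the patch center.
    """
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    heatmap = np.zeros(shape, dtype=np.float32)
    for cy, cx in centroids:
        g = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
        heatmap = np.maximum(heatmap, g)  # keep the strongest peak per pixel
    return heatmap
```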

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The problem under consideration is important. It addresses a very practical issue: there are many partially annotated pathology images in which only a small portion of the cells are annotated, whereas existing deep learning methods mostly assume that all instances are annotated;
    2. The method is novel;
    3. The method seems to work very well for simple microscopy images. It may serve as a foundation for future work on more challenging images.
    4. The organization and presentation are very smooth. It is a very good write-up.
  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    7

  • Reviewer confidence

    Very confident



Review #5

  • Please describe the contribution of the paper

    This paper is about how to detect cells in microscopy images with partial annotations. The main idea is to employ P-classification to select cell samples as pseudo labels to help train the detection network. The authors propose a new method to extract patch representations with PU-learning. Then, the top- and bottom-ranked samples from the P-classification are used as new labeled samples in the next training iteration.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strengths of this paper include the following items:

    1. They employ PU-learning to extract optimal image patch features using the positive and unlabeled label information. This step employs all the labeled and unlabeled patches in the original dataset.
    2. The P-classification gives ranking scores to all the input image features. After that, some confident positive and negative patches are selected and added to the dataset as pseudo labels (see the sketch after this list).
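
For context, the P-classification loss behind such ranking scores, in the form I know it from Ertekin and Rudin, is an exponential loss whose negative term is raised to the power p, so high-scoring negatives are pushed down hard; a sketch (p = 4 follows the rebuttal's stated setting, and the paper's exact formulation may differ):

```python
import torch

def p_classification_loss(scores_pos, scores_neg, p=4.0):
    """P-classification loss: exponential loss on positives plus a
    (1/p)-weighted exp(p * score) loss on negatives, which penalizes
    high-scoring negatives heavily and thus shapes the top of the ranking.
    """
    loss_pos = torch.exp(-scores_pos).mean()
    loss_neg = torch.exp(p * scores_neg).mean() / p
    return loss_pos + loss_neg
```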
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weaknesses of this paper include the following items:

    1. In the P-classification, top-ranked and bottom-ranked samples are selected in each training iteration. As the iterations proceed, several difficult image patches (for which the model is not very confident) will eventually be selected and added as pseudo labels, and the cell detection results will then degrade, as the results in Table 2 and Table 3 show. Therefore, I think the pseudo label selection metric should be adapted/updated for each training iteration.
    2. Based on the examples in Fig. 2, we can observe that the model performs well on cells with a regular shape (BF-C2DL-HSC) but generates some errors on irregular cell shapes. I think this weakness could be addressed during the pseudo-label selection procedure: the authors could consider selecting cells with different shapes (i.e., cells in different groups). Under the current model architecture (Fig. 1), the model is probably only good at selecting one certain type of cell (e.g., regular-shaped cells).
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    In this paper, the proposed model is evaluated on five conditions of data from two public datasets. However, I cannot find much information on implementation details, such as the details of the detection network and the hyperparameter settings. Providing more details of the proposed model would help other researchers reproduce this work.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Here are some additional comments about this paper:

    1. Please consider adding more implementation details.
    2. Please consider adding more discussion of the content in Table 2 and Table 3.
    3. Please add several more examples of selected pseudo labels for cells under different conditions (e.g., Fig. 3).
  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I base this decision on the proposed model, which learns an optimal patch representation with PU-learning and selects positive and negative samples with P-classification. These methods do not need human experts in the model training loop. Although the sample selection metrics are not perfect, this paper is still worth accepting.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    2

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper proposes an approach to train cell detectors from imperfect annotations, using positive and unlabeled (PU) learning and P-classification to select pseudo labels.

    The reviewers have a few questions/comments on improving the paper:

    1. Ablation studies are expected to analyze the effectiveness of PU-learning and P-classification. How will the pseudo label selection rules/parameters affect the training?
    2. More related methods are expected to be compared. The performance difference between fully supervised methods and learning from imperfect annotation needs to be analyzed. It is surprising to see one recent PU-learning method [19] perform so poorly compared to the proposed one. What is the major difference between the two PU-learning methods? If the prior is provided to [19] and the method is well trained on the cell image datasets, will it generate comparable performance to the proposed one? The reviewers have some other detailed questions/suggestions. Please consider addressing them in the rebuttal.
  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    4




Author Feedback

We would like to thank all the reviewers for their insightful comments and positive evaluation (4 reviewers gave acceptance scores, including a strong accept (9)). For instance, Reviewer 2 (R2) commented that the idea of iterative pseudo labeling is interesting. R4 commented that the problem setting is practical and that the paper may serve as a foundation for future work. R1, R2, and R4 commented that the paper is well written. No reviewer raised concerns about the novelty and technical contributions of our method; the main concerns are the explanation of the experimental details and additional experiments. We address these concerns below and performed additional experiments as far as possible within the short rebuttal period.

  1. Ablation study for PU-learning and P-classification. We first note that parts of the ablation study were performed and described in the paper: the results without PU-learning and P-classification are shown in Table 1 as “With mask”, and the results with PU-learning are shown in Table 1 as “Yang [19]”. A preliminary study we conducted showed that when pseudo labels are selected based on the output of PU-learning, many wrong pseudo labels are selected in patches where the foreground and background are ambiguous. Therefore, P-classification is considered effective for the selection of pseudo labels.

  2. Sensitivity to hyperparameters. The percentages of pseudo labels to be selected, α and β, were set to 5% in all our experiments; we did not perform an optimal parameter search, to keep the comparison fair. These parameters work well as long as they are not set to extremely large values. As an additional experiment, we set α and β to 2.5% and applied the proposed method to the Control dataset of C2C12. We obtained an F-measure of 0.871, better than the original score of 0.834 in the paper. Based on this result, we confirmed that the performance is not sensitive to these parameters. For p in P-classification, we used 4, as in the reference paper [4]. This p value works as long as it is not too small or too large.

  3. Comparison with fully supervised methods. We thank the reviewers for the suggestion to add a comparison with fully supervised methods. We applied the baseline method [10] trained with fully supervised data to the BF-C2DL-HSC dataset. Precision, Recall, and F-measure were all above 0.99, since this dataset is relatively easy. Compared with these results, in our setup, which uses only 10% of the training data, the proposed method achieved 0.952. A comparison with fully supervised learning for the four conditions of C2C12 can be found in the reference paper [Nishimura+, ECCV, 2020], although the experimental conditions are not the same: it reported 0.92 on Control, 0.92 on FGF2, 0.98 on BMP2, and 0.96 on FGF2+BMP2. As shown in Table 1 of our paper, our method achieved 0.834 on Control, 0.768 on FGF2, 0.769 on BMP2, 0.578 on FGF2+BMP2, and 0.952 on BF-C2DL-HSC.

  4. The reason for poor results of PU-learning on some datasets; if the prior is provided to [19], will it generate comparable performance? In our method, we treat the detection results of the detection CNN trained on sparsely supervised data as the unlabeled data in PU-learning. In this situation, the detection CNN is unstable, and thus the prior of the positive and negative samples in the test data is also unstable (e.g., sometimes the CNN detects many positions as unlabeled data). Even if we use the correct prior in training, the prior at test time may differ from that in training, and in this case PU-learning cannot work well. In contrast to directly using PU-learning for detection, our method re-trains the detection CNN using confident pseudo labels selected by P-classification, which is much more stable than [19]. We will add this discussion in the final version.
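
To make the role of the prior explicit, here is a sketch of the standard non-negative PU risk (Kiryo et al., 2017; not necessarily the exact variant used in [19]). The class prior π rescales both risk terms, so a mismatch between the training and test priors distorts the objective:

```python
import torch

def nnpu_risk(scores_pos, scores_unl, prior):
    """Non-negative PU risk (Kiryo et al., 2017) with the sigmoid loss.

    `prior` is the positive-class prior pi. Both risk terms are scaled by
    pi, so if the prior at test time drifts from the one used in training
    (as argued above for unstable detections), the objective is mis-weighted.
    """
    risk_pos = prior * torch.sigmoid(-scores_pos).mean()      # pi * R_p^+
    risk_neg = (torch.sigmoid(scores_unl).mean()              # R_u^-
                - prior * torch.sigmoid(scores_pos).mean())   # - pi * R_p^-
    return risk_pos + torch.clamp(risk_neg, min=0.0)          # non-negative correction
```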




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors’ feedback addressed the main questions raised by the reviewers. The ablation studies without PU-learning or P-classification should be clearly demonstrated. Sensitivity studies and fair comparisons are expected as well.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    4



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper presents a novel iterative method for cell detection from imperfect annotation by using pseudo-label selection. Experimental results on two datasets show the potential of the method compared to several baseline methods. The paper is well written and has strong reviewer support. In my opinion, the main weaknesses are the limited data used and the lack of comparison with more state-of-the-art methods that are known to perform better than the baseline methods considered in the experiments. Unfortunately, the rebuttal does not take away these concerns. Other reviewer concerns are sufficiently addressed, though. Overall, the work is interesting and can be considered for inclusion in the MICCAI 2021 program.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    2



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This manuscript presents a deep semi-supervised method for cell detection in microscopy images, using only partially annotated training data. It provides a promising approach to address an important issue for microscopy image analysis, i.e., individual cell annotation is expensive and the annotation might be imperfect. The rebuttal has addressed most reviewers’ concerns (e.g., effects of important components/hyperparameters and comparison with fully supervised method [10] and others [19]), and thus the paper is recommended for acceptance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    2


