Authors

Stanislav Lukyanenko, Won-Dong Jang, Donglai Wei, Robbert Struyven, Yoon Kim, Brian Leahy, Helen Yang, Alexander Rush, Dalit Ben-Yosef, Daniel Needleman, Hanspeter Pfister

Abstract

The developmental process of embryos follows a monotonic order. An embryo can progressively cleave from one cell to multiple cells and finally transform to morula and blastocyst. For time-lapse videos of embryos, most existing developmental stage classification methods conduct per-frame predictions using an image frame at each time step. However, classification using only images suffers from overlapping between cells and imbalance between stages. Temporal information can be valuable in addressing this problem by capturing movements between neighboring frames. In this work, we propose a two-stream model for developmental stage classification. Unlike previous methods, our two-stream model accepts both temporal and image information. We develop a linear-chain conditional random field (CRF) on top of neural network features extracted from the temporal and image streams to make use of both modalities. The linear-chain CRF formulation enables tractable training of global sequential models over multiple frames while also making it possible to inject monotonic development order constraints into the learning process explicitly. We demonstrate our algorithm on two time-lapse embryo video datasets: i) mouse and ii) human embryo datasets. Our method achieves 98.1% and 80.6% for mouse and human embryo stage classification, respectively. Our approach will enable more profound clinical and biological studies and suggests a new direction for developmental stage classification by utilizing temporal information.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87237-3_35

SharedIt: https://rdcu.be/cymaq

Link to the code repository

https://github.com/stlukyanenko/lc-crf-embryo-classification

Link to the dataset(s)

http://celltracking.bio.nyu.edu/


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors present a machine learning model for embryonic staging from time-lapse videos. The method is a two-stream model that uses image information for stage classification and also incorporates temporal information to detect changes. This information is then combined within a CRF model that enables the incorporation of additional constraints (such as monotonicity). The method performs very well in the validation on mouse and human videos presented in the manuscript.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper presents a strong method to classify stages from videos of development. Such a tool can be of great interest to many different applications in developmental biology (apart from the use-case given in the paper) and even beyond.

    The presentation of the paper is excellent. The story is well written and well motivated. The authors give all necessary technical details.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I do not see a major weakness of this paper.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I would rate the reproducibility of the paper as very high.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    I just have two minor questions:

    1. Footnote 1 says “Since the potentials in a CRF do not need to be log probabilities, normalization via the softmax function is not strictly necessary.” Did you really mean log probabilities here or just probabilities? Maybe I’ve been missing something here.

    2. On p. 6 when explaining the training, the authors mention: “To construct a batch, we randomly sample 50 frames from each video in a consecutive order.” Just to be clear: Does it mean you sample a starting point k and then take the frames in [k, k+50] or do you really sample 50 frames randomly (as long as they remain ordered in the end)?

  • Please state your overall opinion of the paper

    strong accept (9)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It is one of the best papers in my stack. I have no doubt that the presented method will be of interest to the community. I see a lot of applications in developmental biology in general, where stage classification is a common problem.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    The paper describes a method for automatic staging of microscopy images of early embryos (here: mouse and human embryos) that is of importance for in vitro fertilization. The method reasonably combines deep learning and a classical method to come up with the stage prediction. The obtained results are promising and outperform the state-of-the-art in most scenarios.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The work presents a combination of CNNs and a linear-chain CRF that elegantly exploits the strengths of both approaches. Exploiting both the spatial and the temporal domain for the stage prediction is clearly beneficial, and the linear-chain CRF makes it possible to constrain the model predictions with reasonable assumptions about the acquisition process of the image data. While the obtained results are not significantly different, the ablation studies indicate that all of the design choices are well-justified and yield the best-performing method.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors state many times in the paper that “most” other methods don’t use temporal information. I think this should not be stressed so boldly, as two of the state-of-the-art methods that are compared to their method already use the time domain as well (in a different fashion, however).

    It is not fully clear how the ResNet50 is used. Is this model retrained on this particular data set or do you use a network pretrained on ImageNet or the like?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Methods are clearly described, the data sets used are from public repositories, and the evaluation scores are also clearly described. Thus, I think the findings should definitely be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Except for the unclear usage of the ResNet50 as stated above, I think this is a solid conference contribution without much to add/change.

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the individual methods are not new per se, the authors present a valuable combination of a state-of-the-art deep learning approach with an established CRF-based method. The combination seems to be well-suited for the particular application of stage classification and outperforms the existing methods (at least for the most part). Overall, a solid conference contribution.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    3

  • Reviewer confidence

    Somewhat confident



Review #3

  • Please describe the contribution of the paper

    This paper combines an image model, a transition detector, and a CRF model to tackle the problem of embryo stage classification in videos.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed model takes both temporal and image information as input. By capturing the temporal movements between two frames, it performs better than a single-image model.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    In the transition detector, why feed only two consecutive frames rather than more frames?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Yes

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    My only concern is the use of two consecutive frames in the transition detector. To my understanding, it is possible that the transition from one frame to the next is not obvious, so adding more frames may help. The authors could run some experiments on this in the future.

  • Please state your overall opinion of the paper

    strong accept (9)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well-written and has enough experiments to support the work.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper uses two streams of inputs (image and temporal motion) for neural networks to extract features which are then input to a CRF to classify the stages of embryos.

    The reviewers support the proposed method as a good tool application in developmental biology. The paper is well written. Though individual components of the proposed method are not new, the combination convinced the reviewers.

    The reviewers also brought up a few comments/questions to improve the paper: more ablation studies on the temporal motion input; fair comparison with other methods (e.g., fine-tune ResNet50 on the datasets); incorrect claims (e.g., most other methods do not use temporal information); and some details on the training.

    In addition to the reviewers’ comments, the AC has a few more:

    1. Multi-stream bidirectional LSTMs have been widely used in the computer vision community for action/event detection. What is the advantage of the proposed “network-based feature extraction + CRF” method compared to those multi-stream (bidirectional) LSTMs?
    2. There is no ablation study in the paper to show the effectiveness of the two streams compared with single stream (i.e., image vs. motion vs. image + motion).
    3. There is no ablation study to show the effectiveness of the CRF. For example, if the CRF is replaced by a (bidirectional) LSTM using the same multi-stream inputs, what will its performance be?

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    1




Author Feedback

We thank all of the reviewers for their constructive comments.

[Reviewer1-Q1] Clarification of Footnote 1 (probabilities or log-probabilities)

A: Thank you for pointing this out. We meant probabilities. We will correct the footnote accordingly.
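
To make the corrected footnote concrete, the following is a short sketch, assuming the unary scores s_t(y) and pairwise scores \psi_t(y_{t-1}, y_t) enter the linear-chain CRF additively in log space. The symbols are illustrative and are not taken from the paper's notation.

```latex
p(y \mid x) =
  \frac{\exp\Big(\sum_{t=1}^{T} s_t(y_t) + \sum_{t=2}^{T} \psi_t(y_{t-1}, y_t)\Big)}
       {\sum_{y'} \exp\Big(\sum_{t=1}^{T} s_t(y'_t) + \sum_{t=2}^{T} \psi_t(y'_{t-1}, y'_t)\Big)}

\log \operatorname{softmax}(s_t)(y) = s_t(y) - c_t,
  \qquad c_t = \log \sum_{k} \exp s_t(k)
```

Under that assumption, applying a softmax to each frame only subtracts a per-frame constant c_t that does not depend on the label. The factor \exp(-\sum_t c_t) then appears in the numerator and in every term of the denominator, so it cancels and p(y | x) is unchanged. This is one way to see why the normalization is not strictly necessary.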

[Reviewer1-Q2] Details of sampling strategy

A: We randomly sample 50 frames from the entire video and then sort them in temporal order for training.
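
The two readings of the sampling strategy raised by Reviewer 1 can be sketched as below. This is a minimal illustration, not the authors' code: the function names, the `rng` parameter, and the handling of videos shorter than 50 frames are all assumptions.

```python
import random


def sample_sorted_frames(num_frames, k=50, rng=random):
    """Reading described in the answer: draw k distinct frame indices
    from the whole video, then sort them into temporal order."""
    k = min(k, num_frames)  # assumption: short videos use all frames
    return sorted(rng.sample(range(num_frames), k))


def sample_contiguous_window(num_frames, k=50, rng=random):
    """Alternative reading: pick a random start and take k
    consecutive frames from there."""
    k = min(k, num_frames)
    start = rng.randrange(num_frames - k + 1)
    return list(range(start, start + k))


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_sorted_frames(300, rng=rng)[:5])
    print(sample_contiguous_window(300, rng=rng)[:5])
```

With the first function, neighboring samples can be many frames apart in the original video, whereas the second always yields adjacent frames.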

[Reviewer2-Q1] Clarification of ResNet50 used in experiments

A: We have used ResNet50 pretrained on ImageNet.

[Reviewer3-Q1] Different number of input frames for the transition detector

A: When sampling frames for training, rare stages (e.g., 3 cells or 5-7 cells) may be present only on 1-2 frames. Accepting more frames in our transition detector might make the detection of these rare transitions harder. Nonetheless, we appreciate your suggestion and will experiment with more input frames for the transition detector in the future.

[MetaReviewer-Q1] What is the advantage of the proposed method compared to LSTM?

A: Our method allows imposing additional temporal constraints on monotonicity, which is not possible with an LSTM. In addition, our CRF-based method is easily interpretable, as it provides both unary and pairwise potentials. We will make this clear in the camera-ready version.
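
As a rough illustration of how pairwise potentials can enforce monotonicity, the sketch below runs Viterbi decoding over a linear chain in which any transition to an earlier stage is forbidden. It is not the authors' implementation. The hard 0 / -inf pairwise mask, the function name, and the toy scores are assumptions, and a learned model would derive its pairwise scores from the transition detector.

```python
import numpy as np


def monotonic_viterbi(unary):
    """Return the highest-scoring non-decreasing stage sequence.

    unary: (T, K) array of per-frame stage scores in log space.
    """
    T, K = unary.shape
    stages = np.arange(K)
    # pairwise[i, j] = 0 if stage j >= stage i, else -inf (assumed hard mask).
    pairwise = np.where(stages[:, None] <= stages[None, :], 0.0, -np.inf)

    score = unary[0].astype(float)
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + pairwise       # rows: previous, cols: current
        back[t] = cand.argmax(axis=0)          # best previous stage per current
        score = cand.max(axis=0) + unary[t]

    # Trace the best path backwards from the best final stage.
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


if __name__ == "__main__":
    # Toy scores whose per-frame argmax goes 0, 1, 0, 1, 2 (not monotonic).
    unary = np.array([[2, 0, 0], [0, 1, 0], [2, 0, 0], [0, 2, 0], [0, 0, 2]], float)
    print(unary.argmax(axis=1).tolist())
    print(monotonic_viterbi(unary))
```

In the toy input, frame-by-frame classification would step back from stage 1 to stage 0, while the constrained decoding has to choose a single non-decreasing path through all frames.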

[MetaReviewer-Q2] Effectiveness of the two streams compared with single stream

A: As shown in many video classification works, motion information is quite valuable. We agree that using motion information could improve the performance of stage classification methods. However, we think that this comparison is out of scope for this work, as we aim to show the effectiveness of the linear-chain CRF rather than compare different input modalities.

[MetaReviewer-Q3] What would the performance be if LSTM replaces CRF?

A: We can replace the CRF with an LSTM that takes the concatenation of features from the classifier and the transition detector as its input. While the CRF explicitly injects monotonicity into the model using pairwise potentials, there is no way to force an LSTM to use the transition information for monotonic predictions. As such, the CRF may perform better than the LSTM when consecutive frames have mixed predictions.


