
# Authors

Lisette Lockhart, Parvaneh Saeedi, Jason Au, Jon Havelock

# Abstract

In Vitro Fertilization (IVF) treatment is increasingly chosen by couples suffering from infertility as a means to conceive. Time-lapse imaging technology has enabled continuous monitoring of embryos in vitro and time-based development metrics for assessing embryo quality prior to transfer. Timing at which embryos reach certain development stages provides valuable information about their potential to become a positive pregnancy. Automating development stage detection of day 4-5 embryos remains difficult due to small variation between stages. In this paper, a classifier is trained to detect embryo development stage with learning strategies added to explicitly address challenges of this task. Synergic loss encourages the network to recognize and utilize stage similarities between different embryos. Short-range temporal learning incorporates chronological order to embryo sequence predictions. Image and sequence augmentations complement both approaches to increase generalization to unseen sequences. The proposed approach was applied to human embryo sequences with labeled morula and blastocyst stage onsets. Morula and blastocyst stage classification was improved by 5.71% and 1.11%, respectively, while morula and blastocyst stage mean absolute onset error was reduced by 19.1% and 8.7%, respectively. Code is available: https://github.com/llockhar/Embryo-Stage-Onset-Detection.

SharedIt: https://rdcu.be/cyl6s

N/A

# Reviews

### Review #1

• Please describe the contribution of the paper

In Vitro Fertilization (IVF) treatment can benefit from knowing the times at which embryos reach certain development stages, since these indicate their potential to result in a positive pregnancy. The authors use:

• Synergic loss to provide feedback for learning embryo-independent stage similarities
• Long Short-Term Memory (LSTM) layers to add temporal context to image stage classification
• Image and sequence training augmentations to promote generalization to unseen embryo sequences

They applied the method to human embryo sequences with labeled morula and blastocyst stage onsets. Morula and blastocyst stage classification was improved, while morula and blastocyst stage mean absolute onset error was reduced.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This is a valid problem. The solution is reasonable and validated.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Explain why they chose synergic loss to capture stage similarities. They identify very few stages; perhaps a simpler Markov chain, with fewer parameters than sequential neural networks, would be sufficient. Argue that multiparameter neural networks are necessary.

• Please rate the clarity and organization of this paper

Very Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

They provide a diagram to explain their neural network that is helpful.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

Justify why you select this methodology. You do not explicitly link the stages that you are modeling with the best timing for embryo transfer.

strong accept (9)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This is a valid problem with a valid and evaluated solution.

• What is the ranking of this paper in your review stack?

1

• Number of papers in your stack

5

• Reviewer confidence

Very confident

### Review #2

• Please describe the contribution of the paper

This paper presents a computational approach to classify developmental stages in time-lapse videos of embryos as quality control for IVF. The main difficulty of this task is the typically small variability of features between stages and the often high variability across experiments. The authors address this problem by adding a synergic loss, an LSTM sequence model and dedicated augmentations to a deep learning network classifier. These additions considerably improved model performance compared to the intrinsic baseline.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

I think the main strength of the paper is the idea to include the parallel input streams with the synergic loss in combination with a refinement via LSTM. This approach seems very promising in a more general context (staging of microscopy time-lapse experiments in general).

I also like the step-wise validation of the different parts of the method.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The main weakness in my opinion is lack of clarity in motivating or explaining the different parts of the model. E.g.

• it would be good to give the reasons why the synergic loss approach was chosen and what the expectation was,
• the explanation of the post-processing of the predictions was not clear to me

A minor but still important point: I think the paper should also come with the source code, otherwise it would be hard to rebuild such a system for e.g. a different application. It seems like the authors do not plan to publish the code.

• Please rate the clarity and organization of this paper

Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

There is no code available it seems. I think this is limiting the reproducibility of the paper.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

I do like the paper in principle. Overall it is well structured. I found some details in the text a bit confusing, and I think some crucial explanations are missing:

major:

• In 3.1. the description of the synergic network requires an explanation I think. It would be good to know what the intuition is behind this choice. Even a single sentence would be helpful to make the paper self-contained.
• In my opinion 3.3. does not contain enough information to understand and reproduce the optimization of stage predictions. An explanation, an additional panel in figure 2 or a small example would be very helpful.

minor:

• The presentation of the related work on p2 was a bit confusing. Maybe this could be given a clearer structure or order.
• “Embryo staging of embryonic cleavage times and morula through expanded blastocyst stage onset was first performed using image processing techniques [9].” I am not sure that this reference is really the first paper to use image processing for embryo staging. Shouldn’t there be an earlier reference, or is the phrase “image processing for embryo staging” meant in a narrower sense here?
• I found the explanation of the oversampling approach on p5 (top paragraph) a bit confusing. After reading the paper I kind of understood what was meant here, but maybe you can rewrite this part and include a little bit more explanation to make it clearer.

Probably accept (7)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

In principle I find it a good paper that describes an interesting approach to stage prediction from time-lapse data. I think it can be of interest to a lot of people working outside of IVF as well. But it would certainly improve this contribution considerably to provide some more explanations and intuitions and also make the actual implementation available.

• What is the ranking of this paper in your review stack?

3

• Number of papers in your stack

6

• Reviewer confidence

Confident but not absolutely certain

### Review #3

• Please describe the contribution of the paper

The paper presents a classifier to detect embryo development stage. Synergic loss encourages the network to recognize and utilize stage similarities between different embryos. Short-range temporal learning incorporates chronological order into embryo sequence predictions.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Morula and blastocyst stage onset prediction based on image processing and recognition technology is increasingly used in IVF clinics. Through embryo quality assessment, the embryos with the best transfer potential can be selected for transfer. Automating embryo development stage detection with a deep network is innovative.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The main contribution of this paper is to apply deep learning to the evaluation of embryo transfer potential. However, the network model itself is not very innovative. In addition, some details and notations are missing from the Methodology section.

• Please rate the clarity and organization of this paper

Satisfactory

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

In the experiments, only 117 embryo time-lapse imaging sequences are used. What is the ratio of positive to negative samples? Is this enough to train the classifier?

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

In the IVF clinic, what matters is assessing an embryo's potential for transfer, not judging its developmental stage.

probably reject (4)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Innovation, and the key problem to solve.

• What is the ranking of this paper in your review stack?

2

• Number of papers in your stack

2

• Reviewer confidence

Confident but not absolutely certain

### Review #4

• Please describe the contribution of the paper

This paper uses two CNN-LSTMs with a synergic loss to perform embryo stage detection.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

1. Unlike the common CNN-LSTM architecture for sequence classification, this paper uses a synergic loss between two CNN-LSTMs to implement pairwise learning.
2. The authors provide sufficient ablation studies.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

No comparison with other works.

• Please rate the clarity and organization of this paper

Satisfactory

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Code can be provided.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

Comparison with other works is needed. At a minimum, the authors should report single-CNN and single-CNN-LSTM performance.

borderline reject (5)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

In experiments, no comparison with previous works.

• What is the ranking of this paper in your review stack?

5

• Number of papers in your stack

7

• Reviewer confidence

Very confident

# Primary Meta-Review

• Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This paper presents a computational approach to classify developmental stages in time-lapse videos of embryos as quality control for IVF. The main difficulty of this task is the typically small variability of features between stages and the often high variability across experiments. The authors address this problem by adding a synergic loss, a CNN-LSTM sequence model, and dedicated augmentations to a deep learning network classifier. These additions considerably improved model performance compared to the intrinsic baseline.

Strength

I think the main strength of the paper is the idea to include the parallel input streams with the synergic loss in combination with a refinement via LSTM. This approach seems very promising in a more general context (staging of microscopy time-lapse experiments in general).

Weakness

• It would be good to give the reasons why the synergic loss approach was chosen and what the expectation was.
• The explanation of the post-processing of the predictions was not clear to me.
• No comparison with other works.

• What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

6

# Author Feedback

We thank all the reviewers for their feedback and interest in the importance of our work, which we agree “seems to be very promising in a more general context” [Rev. #4]. To facilitate further research and development, we have published the source code and will include a GitHub link in the final paper version.

Rev. #2 and Rev. #4: Synergic loss intuition

We agree the use of synergic loss was not motivated in the paper and will revise it for clarity. Synergic loss was chosen because it explicitly encourages all pairs of encoded features from the same stage to be similar. Encoded features from different stages are discouraged from being similar. Since embryos vary considerably throughout each stage but vary little near stage onsets, synergic loss was expected to improve the network’s ability to encode features that can better distinguish between stages.
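To make this intuition concrete, the following is a minimal NumPy sketch of a synergic-style pairwise loss. It assumes one common formulation, in which the embeddings of two images from the parallel streams are concatenated and fed to a single logistic unit that predicts whether both images show the same stage, trained with binary cross-entropy. The function name `synergic_loss`, the single-unit layer `w`/`b`, and the feature sizes are hypothetical illustrations, not the authors' released implementation. Only the forward loss is shown; during training its gradient would also flow back into both encoders, pulling same-stage embeddings together and pushing different-stage embeddings apart.

```python
import numpy as np

def synergic_loss(feats_a, feats_b, stages_a, stages_b, w, b=0.0, eps=1e-7):
    """Binary cross-entropy 'same stage?' loss over paired embeddings (sketch).

    feats_a, feats_b: (batch, d) embeddings from the two parallel streams.
    stages_a, stages_b: (batch,) integer stage labels of the paired images.
    w: (2*d,) weights of a hypothetical single-unit synergic layer; b: bias.
    """
    pair = np.concatenate([feats_a, feats_b], axis=1)  # (batch, 2d) joint embedding
    logits = pair @ w + b                              # (batch,) same-stage score
    p_same = 1.0 / (1.0 + np.exp(-logits))             # sigmoid probability
    p_same = np.clip(p_same, eps, 1.0 - eps)           # avoid log(0)
    same = (stages_a == stages_b).astype(float)        # target: 1 if same stage
    bce = -(same * np.log(p_same) + (1.0 - same) * np.log(1.0 - p_same))
    return bce.mean()

# Usage with random features for four image pairs (values are illustrative only).
rng = np.random.default_rng(0)
fa, fb = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
loss = synergic_loss(fa, fb, np.array([0, 1, 2, 1]), np.array([0, 2, 2, 0]),
                     rng.normal(size=16))
print(loss)
```

With `w` set to zeros, every pair receives a same-stage probability of 0.5, so the loss equals ln 2 (about 0.693) regardless of the labels; a trained synergic unit lowers it only by separating same-stage from different-stage pairs.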

Rev. #2 and #5: Embryo transfer

While selecting the best embryo for transfer is the end goal in a clinical setting, we argue that automating stage onset detection is important and may enable future innovations and research opportunities, as noted by Rev. #4. It can improve embryologists’ workflow for selecting embryos for implantation by removing the need for manual stage onset detection. As noted by Rev. #5, image recognition methods are becoming more prevalent in IVF clinics. Predicting development stage onset to assist with embryo selection is more interpretable and could be adopted into clinical practice more easily than an algorithm that predicts embryo implantation directly.

Rev. #6 and Meta-Rev #2: Comparison with other works

Directly comparing our work with others on the same dataset for the same task is ideal. However, there is no publicly available dataset to use for comparison. Previous works [Leahy et al., 2020 and Feyeux et al., 2020] do not have the same embryo stages and contain significantly different amounts of training and test sequences, so comparing reported performance would not be informative.

Single-CNN performance is provided as a baseline (top row of Tables 2 and 3). This single-CNN baseline is similar to the “Single Focus” method used in [Leahy et al., 2020], with VGG16 instead of ResNeXt101 (we found empirically that VGG16 was a stronger baseline for our staging task and dataset). It is not possible to re-implement their highest-performing “Full Setting” on our dataset, since that method uses multiple focal planes as network input and our dataset has only a single focal plane available. We did not include results for all combinations of the methods due to page limits.

Rev. #4: Explanation of post-processing

We acknowledge this paragraph could be confusing and could be rephrased as follows: Embryo stage is clinically defined to be monotonically non-decreasing across any time-lapse sequence. The network's stage predictions can oscillate between stages and are therefore restructured to enforce monotonically non-decreasing order. Each sequence is restructured by selecting the monotonically non-decreasing series of stage classifications that has the lowest error with respect to the network's stage predictions. The loss (NLL or MAE) is computed for each frame and summed across the sequence. For a sequence of length $N$, the set of all possible monotonically non-decreasing stage classifications is $\{(i, j) \mid i \in [2, N-1],\ j \in [3, N],\ j > i\}$, where $i$ and $j$ are the morula and blastocyst stage onset frames, respectively. The onset of each stage is then taken as the smallest frame index in the restructured sequence at which that stage is predicted.
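The exhaustive search described in this paragraph can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: three stages ordered pre-morula (0), morula (1), blastocyst (2); per-frame softmax probabilities as input; the NLL variant of the per-frame loss (the MAE variant is not shown); and 1-based frame indices to match $i$ and $j$. The function name `restructure_monotonic` and the cumulative-sum shortcut are our own illustration, not the authors' code.

```python
import numpy as np

def restructure_monotonic(probs, eps=1e-12):
    """Return (morula_onset, blastocyst_onset) as 1-based frame indices (sketch).

    probs: (N, 3) per-frame softmax outputs, N >= 3, with columns
           0 = pre-morula, 1 = morula, 2 = blastocyst (assumed order).
    Tries every pair 2 <= i < j <= N and keeps the labelling
    [0]*(i-1) + [1]*(j-i) + [2]*(N-j+1) with the lowest summed NLL.
    """
    nll = -np.log(np.clip(probs, eps, 1.0))  # (N, 3) per-frame NLL per stage
    # cum[k, s] = summed NLL of stage s over the first k frames (cum[0] = 0)
    cum = np.vstack([np.zeros(3), np.cumsum(nll, axis=0)])
    n = probs.shape[0]
    best, best_ij = np.inf, None
    for i in range(2, n):              # morula onset frame: 2 .. N-1
        for j in range(i + 1, n + 1):  # blastocyst onset frame: i+1 .. N
            cost = (cum[i - 1, 0]                    # frames 1..i-1 -> stage 0
                    + cum[j - 1, 1] - cum[i - 1, 1]  # frames i..j-1 -> stage 1
                    + cum[n, 2] - cum[j - 1, 2])     # frames j..N   -> stage 2
            if cost < best:
                best, best_ij = cost, (i, j)
    return best_ij

# Six frames whose argmax oscillates 0, 0, 1, 0, 1, 2 (illustrative values).
probs = np.array([[0.90, 0.05, 0.05],
                  [0.80, 0.15, 0.05],
                  [0.30, 0.60, 0.10],
                  [0.50, 0.40, 0.10],
                  [0.10, 0.70, 0.20],
                  [0.05, 0.15, 0.80]])
print(restructure_monotonic(probs))  # -> (3, 6)
```

Precomputing cumulative per-stage NLLs makes each candidate $(i, j)$ cost O(1) to evaluate, so the whole search is O(N²) rather than O(N³). In the example, frame 4 briefly favors stage 0 after frame 3 favored morula; the restructured labelling absorbs this oscillation and places the morula and blastocyst onsets at frames 3 and 6.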

# Post-rebuttal Meta-Reviews

## Meta-review # 1 (Primary)

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

All the main issues have been well addressed.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

6

## Meta-review #2

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

To facilitate in vitro fertilization (IVF) treatment, this paper proposes a classification method to predict the development stage of embryos from time-lapse imaging data using synergic loss and temporal learning. Experimental results on human embryo sequences with labeled morula and blastocyst stage onsets are promising. The reviewer recommendations are mixed. Overall the work is interesting and addresses an important problem. There is concern about the novelty of the network model used and the lack of comparison with other works. However, as the authors point out, there is little previous work to compare with and no public data sets are available. With the promised revisions I believe the paper is acceptable for MICCAI 2021.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

7

## Meta-review #3

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

I appreciate the authors' rebuttal, as it concisely answers the reviewers' concerns. For me, the comment regarding baselines was the most important limitation of the paper, and the authors have adequately cleared this up. Combined with the clarifications that they propose to include in the final version, I believe this work will be of interest to the general community.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

5