
Authors

Yucheng Tang, Riqiang Gao, Hohin Lee, Qi Yang, Xin Yu, Yuyin Zhou, Shunxing Bao, Yuankai Huo, Jeffrey Spraggins, Jack Virostko, Zhoubing Xu, Bennett A. Landman

Abstract

Pancreas CT segmentation offers promise for understanding the structural manifestation of metabolic conditions. To date, the primary medical record of conditions that impact the pancreas is the electronic health record (EHR), in the form of diagnostic phenotype data (e.g., ICD-10 codes). We posit that similar structural phenotypes could be revealed by studying subjects with similar medical outcomes. Segmentation is mainly driven by imaging data, but this direct approach may not consider differing canonical appearances with different underlying conditions (e.g., pancreatic atrophy versus pancreatic cysts). To this end, we exploit clinical features from EHR data to complement image features for enhancing the pancreas segmentation, especially in high-risk outcomes. Specifically, we propose, to the best of our knowledge, the first phenotype embedding model for pancreas segmentation by predicting representatives that share similar comorbidities. Such an embedding strategy can adaptively refine the segmentation outcome based on the discriminative contexts distilled from clinical features. Experiments with 2000 patients’ EHR data and 300 CT images with healthy pancreas, type II diabetes, and pancreatitis subjects show that segmentation by predictive phenotyping significantly improves performance over state-of-the-art methods (Dice score 0.775 to 0.791, p < 0.05, Wilcoxon signed-rank test). The proposed method additionally achieves superior performance on two public testing datasets, BTCV MICCAI Challenge 2015 and TCIA pancreas CT. Our approach provides a promising direction for advancing segmentation with phenotype features without requiring EHR data as input during testing.
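For readers unfamiliar with the Dice score quoted in the abstract, the following is a minimal sketch of how the metric is commonly computed for binary masks. It is not the authors' evaluation code; the function name `dice_score`, the flat 0/1 list representation, and the convention for two empty masks are assumptions made for illustration.

```python
def dice_score(pred, target):
    """Dice similarity coefficient between two binary masks.

    Both masks are flat sequences of 0/1 voxel labels of equal length
    (an illustrative representation, not the paper's data format).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled as foreground in both masks.
    intersection = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example: 3 predicted voxels, 3 reference voxels, 2 in common.
pred = [0, 1, 1, 1, 0, 0]
target = [0, 1, 1, 0, 1, 0]
print(dice_score(pred, target))  # 2 * 2 / (3 + 3)
```

A score of 1 means perfect overlap and 0 means no overlap, so the reported change from 0.775 to 0.791 is an increase in average overlap with the reference segmentation.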

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87193-2_3

SharedIt: https://rdcu.be/cyhLv

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors introduce a novel method for pancreas segmentation that extracts phenotype embeddings from the image. The algorithm requires EHR data for training, but not for inference. Two publicly available datasets were used for evaluation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Novel method for generating phenotype embeddings.
    • Suggested method is an interesting way to utilise EHR data, when available.
    • Method seems to outperform existing image segmentation methods.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Network architecture is only briefly described and hard to understand.
    • EHR dataset used for training is only briefly described. We know neither what preprocessing was used for the features nor the distribution of diagnoses.
    • It’s unclear if improvement in accuracy came from EHR data or from new network architecture. To check that, authors could train the same network without features from EHR data and measure performance.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors used a private dataset. The network architecture is difficult to understand and reproduce.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • Network architecture is only briefly described and hard to understand.
    • EHR dataset used for training is only briefly described. We know neither what preprocessing was used for the features nor the distribution of diagnoses.
    • It’s unclear if improvement in accuracy came from EHR data or from new network architecture. To check that, authors could train the same network without features from EHR data and measure performance.
  • Please state your overall opinion of the paper

    probably reject (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea and results are promising, but the level of depth, detail, and explanation is not sufficient for a MICCAI paper.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    3

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    The paper proposes a method to improve a pancreas segmentation model using EHR data at training time, but not at testing time. The EHR data is used to learn clusters of patients in a self-supervised way. The segmentation is done in a classical encoder-decoder approach, with the major difference being a predictor network whose output is the cluster the patient belongs to, using the latent representation of the image. This cluster prediction is concatenated to the latent representation of the image before being fed to the decoder. This approach is shown to improve performance compared to other SOTA methods on multiple datasets.
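The fusion step summarized in this review (a predictor infers the patient's cluster from the latent image representation, and that prediction is concatenated to the latent representation before decoding) can be sketched as follows. This is only an illustration of the idea as the reviewer describes it: the vector sizes, the use of softmax probabilities rather than a one-hot cluster label, and the names `softmax` and `fuse_latent_with_cluster` are assumptions, not details taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def fuse_latent_with_cluster(latent, cluster_logits):
    """Concatenate a latent image vector with the predicted cluster distribution.

    Assumption: the cluster prediction enters the decoder as soft
    probabilities over K clusters; the paper may use a different encoding.
    """
    return list(latent) + softmax(cluster_logits)


# Hypothetical 4-dimensional latent vector and predictor output for K = 3.
latent = [0.2, -1.3, 0.7, 0.05]
cluster_logits = [2.0, 0.1, -1.0]

fused = fuse_latent_with_cluster(latent, cluster_logits)
predicted_cluster = max(range(len(cluster_logits)), key=cluster_logits.__getitem__)

print(len(fused))          # latent size + K
print(predicted_cluster)   # index of the most likely cluster
```

Because the cluster is predicted from the image features alone, a design like this needs EHR data only to define the clusters during training, which matches the reviewer's point that no EHR input is required at test time.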

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This is a clever way to use EHR data to help segmentation, without requiring it for inference. The proposed method can definitely be extended to any segmentation task, not just pancreas CT.

    The experiments are convincing. The proposed method is compared to other SOTA methods, and while it does not improve the results by a wide margin, it does so by being consistently good on all images, unlike the other methods which have a high variance in performance on different images (Fig 3, left).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The description of the method is quite high-level, and there are multiple details that I am uncertain about (see the section below).

    In addition, I am uncertain the method works as is claimed by the authors. It seems the performance is not very sensitive to the number of clusters K (Fig 3, right). If the method performs reasonably well with K=1, doesn’t it mean it’s not the phenotype information that is useful?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    While some low-level details of the method are missing, since the authors will release the code, I believe the paper will be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Concerning the method:

    • The decoder is a U-Net, meaning there are skip-connections to the encoder x?
    • Why does the predictor have 2 outputs? Shouldn’t it be just 1? You only need to do the cluster prediction once, no? And yet there is FC1 and FC2 for the predictor in Fig 2.

    Fig 4: the performance of the proposed method is impressive for the Type I patient. Does the method perform systematically better than the others on Type I patients?

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I gave a score of 7, because the proposed method is, as far as I know, novel and can be generalized to any segmentation task when EHR data is available.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    4

  • Reviewer confidence

    Somewhat confident



Review #3

  • Please describe the contribution of the paper

    In this paper, the authors exploit clinical features from EHR data to complement image features for enhancing the pancreas segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors propose a novel multi-modal way to improve the pancreas segmentation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The writing and lack of ablation study. Refer to Section 7.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The implementation details (network parameters) are not listed in the paper, but the authors agree to release the code once accepted to make reproducing such framework possible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The authors claim that “the phenotype embedding model for pancreas segmentation by predicting representatives that share similar comorbidities can adaptively refine the segmentation outcome based on the discriminative contexts distilled from clinical features”. Such a multi-modal framework may help segmentation performance, but the paper lacks an ablation study of this factor. The reader may want to know to what extent the DSC improvement is determined by the phenotype embedding.

    The writing of the paper is poor. There is a lot of room for improvement in the writing.

    1. The ending of Section 2.1 should be a period.
    2. The first character of each word of a section / sub-section title should be capitalized.
    3. “Other details include” in Section 3.2 should be followed by a phrase instead of a sentence.
    4. The 4th and 5th lines in Section should be “optimizes…” and “updates…”
  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The writing and lack of ablation study.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    3

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The proposed method appears novel and the experiments provide indication of the robustness of the developed framework. It seems however that the origin of this improvement cannot be directly attributed to the use of EHR, and further work would be needed to disentangle the improvement brought by the use of the EHR from that of the network architecture itself. Improvements in the writing style and flow of the paper would further help clarity.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    7




Author Feedback

We appreciate the comments of the meta-reviewer and all reviewers on this paper. This paper proposes the first EHR embedded model for medical image segmentation. We show that segmentation tasks benefit from a new method of phenotype embedding that does not require EHR data to be available at inference. We agree that additional clarity is needed to address the two major points and two minor points raised by the reviewers. Below, we show how the requested information was present (with less than ideal clarity) in the manuscript.

Major concerns:

  1. Lack of clarity on the benefits of specific innovations (EHR data and network architecture):
     A. Benefit of the EHR (Meta-reviewer, Reviewers 1 and 3): In Table 1, we compared the backbone model [18] (row 5) with predictive phenotyping (row 6). The EHR improved performance on two datasets by 1.5% (significant under the Wilcoxon signed-rank test, p<0.001). For the diabetic patients, the performance gains were larger. Predictive phenotyping with the EHR outperforms the naïve approach of feature concatenation by a large margin, from 74.5% to 77.9% (Figure 3).
     B. Benefit of the network architecture (Reviewers 2 and 3): In Table 1 and Figure 3, we compared with state-of-the-art pancreas segmentation methods. Predictive phenotyping significantly improved performance in terms of DSC, ASD and HD (p<0.05, Wilcoxon signed-rank test). Importantly, the HD improvement shows that EHR information provided useful context to reduce outliers. In Table 2, we further report comparison results on external testing sets. For external validation, the two public challenge datasets do not include patient EHR data, i.e., demographics and ICD codes. Our method implicitly predicts the future outcomes from the image features and fuses them into the segmentation task. The method achieves a mean DSC of 0.757 on BTCV data, and 0.827 on TCIA pancreas CT.
  2. Lack of clarity on the sensitivity to the number of phenotype clusters K (Reviewer 2): In Figure 3b, we evaluated the performance by varying the number of clusters K from 1 to 10 on the in-house dataset. The Dice improvement between K=1 and K=10 is above 3%, indicating that the performance is sensitive to K (significant, p < 0.001). The scaling of Figure 3b will be improved to show the improvement in the curve clearly.

Minor concerns:

  3. Clarification of network architecture (Reviewers 1 and 2): The architecture of our model is clarified below, and additional details are given in citation [18].
     EHR Encoder: The model consists of two convolutional layers, each with a kernel size of 3×3×3, followed by rectified linear units (ReLU). A fully connected layer is then used to generate features of consistent dimensions.
     Image Encoder: The CNN is first used as a feature extractor to generate a feature map for the input. 3D volume data are encoded by two 3×3×3 convolutional layers followed by batch normalization and ReLU activations, then a 2×2×2 max pooling with a stride of two in each dimension.
     Segmentation backbone: We adopt a four-level U-Net-like structure as the segmentation backbone, as shown in Figure 2. Each level in the encoder consists of two 3×3×3 convolutional layers, followed by rectified linear units (ReLU) and a 2×2×2 max pooling with a stride of 2. In the decoder, 2×2×2 transpose convolutions with a stride of 2 are used, followed by two 3×3×3 convolutions and ReLU. Skip connections from layers of the same level in the encoder provide higher-resolution features to the decoder. The last layer is a 1×1×1 convolution that sets the number of output channels. We used Dice loss. The largest volume size is 168×168×64, which is the same as in the baseline hierarchical method.
     Preprocessing details are given in Section 3.1. The EHR encoding details, including the generation of EHR, are shown in Section 2.1. The architecture will be clarified in the method section and implementation details.
  4. Clarification of writing flow (Reviewer 3): We appreciate and will accept all comments regarding grammar.
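As an illustration of the segmentation backbone described in item 3, the sketch below traces the spatial size of the feature maps through the encoder for the stated largest input volume of 168 x 168 x 64. It rests on assumptions not stated in the rebuttal: that the 3x3x3 convolutions preserve spatial size ("same" padding) and that a four-level network applies pooling three times, between consecutive levels. The function name `trace_encoder_shapes` is hypothetical.

```python
def trace_encoder_shapes(input_shape, levels=4, pool=2):
    """Return the spatial size of the feature map at each encoder level.

    Assumptions (not confirmed by the rebuttal): convolutions keep the
    spatial size unchanged, and pooling with stride `pool` is applied
    between consecutive levels, i.e. levels - 1 times in total.
    """
    shapes = [tuple(input_shape)]
    for _ in range(levels - 1):
        prev = shapes[-1]
        if any(d % pool for d in prev):
            raise ValueError(f"shape {prev} is not divisible by pool size {pool}")
        shapes.append(tuple(d // pool for d in prev))
    return shapes


shapes = trace_encoder_shapes((168, 168, 64))
for level, shape in enumerate(shapes, start=1):
    print(f"level {level}: {shape}")
```

Under these assumptions the deepest level works on a 21 x 21 x 8 feature map, and each transpose convolution in the decoder doubles the size again so that it matches the skip connection coming from the same level.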




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The key points of concern raised by the reviewers, notably regarding clarity and the specific improvement associated with the proposed framework, have been well addressed in the rebuttal.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    10



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Reviewers and Meta-reviewer highlighted the lack of clarity of the paper in illustrating the use of EHR for the proposed task. The description of the EHR data (characteristic, processing, distributions, …) was found lacking, as well as the illustration of the improvement of the network component. The rebuttal addresses these aspects by clarifying that Table 1 and Figure 3 report the improvement of the proposed work with respect to baseline approaches not using EHR information. The same argument is provided for the demonstration of the network component. Finally, for the description of the EHR data, the authors point to the relevant sections 2.1 and 3.1.

    After reading the reviews and comments, the impression of a lack of clarity in the paper is confirmed. The reproducibility of the work is poor, since a very minimal (if not absent) description of the EHR data is given in the paper. Given that the use of EHR data is one of the main focuses of the work, the lack of clarity undermines the value of the study. For similar reasons, the lack of an ablation study makes it difficult to appreciate the contribution of the different components of the study (use of multiple phenotypes, networks, loss functions, …).

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    15



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Authors have addressed the major concerns from reviewers, i.e. contribution of the EHR phenotype data compared to backbone network architecture, and sensitivity to K in clustering. The paper has a novel idea of using phenotype information to refine pancreas segmentation.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    10


