
Authors

Jie-Neng Chen, Ke Yan, Yu-Dong Zhang, Youbao Tang, Xun Xu, Shuwen Sun, Qiuping Liu, Lingyun Huang, Jing Xiao, Alan L. Yuille, Ya Zhang, Le Lu

Abstract

The boundary of tumors (hepatocellular carcinoma, or HCC) contains rich semantics: capsular invasion, visibility, smoothness, folding, protuberance, etc. Capsular invasion on the tumor boundary has been shown to be clinically correlated with the prognostic indicator microvascular invasion (MVI). Investigating tumor boundary semantics therefore has tremendous clinical value. In this paper, we propose the first computational framework for this problem, which disentangles the task into two components: spatial vertex localization and sequential semantic classification. (1) An HCC tumor segmentor is built to extract the tumor mask boundary, followed by a polar transform that represents the boundary by radius and angle. A vertex generator produces a fixed-length sequence of boundary vertices, and vertex features are sampled at the corresponding spatial locations.
(2) The sampled deep vertex features, with positional embedding, are mapped into a sequential space and decoded by a multilayer perceptron (MLP) for semantic classification. Extensive experiments on tumor capsule semantics demonstrate the effectiveness of our framework. Mining the correlation between the boundary semantics and MVI status demonstrates the feasibility of integrating these boundary semantics as a valid HCC prognostic biomarker.
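To make step (1) concrete, the following NumPy sketch turns a binary tumor mask into a fixed-length polar vertex sequence. It is not the authors' implementation: the pole is assumed to be the mask centroid, each of the N equally spaced rays (Δθ = 360°/N) contributes the outermost in-mask point as its vertex, and the function name `polar_boundary_vertices` and the default of 90 vertices are illustrative choices.

```python
import numpy as np

def polar_boundary_vertices(mask, n_vertices=90, step=0.25):
    """Sketch: turn a binary tumor mask into a fixed-length (r, theta) vertex sequence.

    Assumptions (not taken from the paper): the pole is the mask centroid and,
    for each of n_vertices equally spaced rays, the vertex is the outermost
    point of the ray that still falls inside the mask.
    """
    rows, cols = np.nonzero(mask)
    cy, cx = rows.mean(), cols.mean()                          # pole = centroid
    thetas = 2 * np.pi * np.arange(n_vertices) / n_vertices    # delta_theta = 360 deg / N
    h, w = mask.shape
    radii = np.arange(0.0, np.hypot(h, w), step)               # radial samples along each ray
    r_out = np.zeros(n_vertices)
    for i, t in enumerate(thetas):
        ys = np.rint(cy + radii * np.sin(t)).astype(int)
        xs = np.rint(cx + radii * np.cos(t)).astype(int)
        ok = (ys >= 0) & (ys < h) & (xs >= 0) & (xs < w)        # keep samples inside the image
        inside = radii[ok][mask[ys[ok], xs[ok]] > 0]             # radii whose pixel is tumor
        r_out[i] = inside.max() if inside.size else 0.0
    # back to Cartesian (row, col) positions, where vertex features would be sampled
    vertices = np.stack([cy + r_out * np.sin(thetas), cx + r_out * np.cos(thetas)], axis=1)
    return r_out, thetas, vertices

# usage: a synthetic disk-shaped "tumor" of radius 20 centred at (50, 50)
yy, xx = np.mgrid[:101, :101]
disk = (yy - 50) ** 2 + (xx - 50) ** 2 <= 20 ** 2
r, theta, verts = polar_boundary_vertices(disk)
```

Returning Cartesian `verts` alongside (r, θ) mirrors the abstract's statement that vertex features are sampled at the corresponding spatial locations; a learned vertex generator would replace the simple ray-casting rule used here.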

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87234-2_72

SharedIt: https://rdcu.be/cyl9j

Link to the code repository

N/A

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

This paper proposes a method for hepatocellular carcinoma (HCC) boundary delineation via spatial boundary vertex localization, followed by classification of the detected boundary for microvascular invasion scoring.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This paper is well written and proposes a novel approach to a challenging clinical task. It has great potential for future clinical use, as supported by extensive experimental validation.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- It is not clear from the paper how the three variables (e.g., [0.6, 0.1, 0.3]) that make up the patient-specific capsular biomarker are chosen.
- The proposed method showed performance very similar to the GT upper bound. On the other hand, the authors mention that the reliability of human-labeled boundary ground truth is questionable due to the ill-posed nature of boundary detection and the uncertainty caused by human annotation error. So it is not clear from the paper why Gaussian blurring is required to mimic human annotation when the estimated boundary before blurring could be more accurate.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The method seems reproducible.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

This paper sufficiently describes the approach and its different methodological choices. Additional results for more than 90 vertices, and on the effect of Gaussian blurring, would add value to the paper.
Please state your overall opinion of the paper

strong accept (9)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
- The proposed method is technically novel and can be used for similar other applications.
- The paper is well written and shows promising results.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

5
Reviewer confidence

Very confident

Review #2

Please describe the contribution of the paper

Several medical studies cited in the paper report that the tumor boundary can be categorized into different classes of capsular invasion. Such a separation into different classes along the border could serve as a prognostic biomarker. The paper describes a methodology to segment a tumor boundary into such semantically different classes.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The authors introduce a novel block on top of the established U-Net architecture to facilitate learning of the different semantics on the border.

They perform a thorough ablation analysis on two datasets and report superior performance of the proposed method compared to a baseline.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
In the paper, the authors analyze the inter-rater variability of the datasets used. The reported results show that the best overlap between human expert annotations is around 50%. I am not sure that a segmentation task whose best inter-expert overlap is 50% calls for automation. With such a low level of agreement, even human annotation is not given high priority in clinical decision making. I am not saying that such datasets should be completely discarded, but rather that we should think about methodologies that primarily address this variability (e.g., learning with noisy labels [1,2]). The usefulness of plain supervised segmentation methods (resulting in 30-40% Dice) is questionable.

As proof of the method's superiority, I would expect a comparison with a baseline in the form of a U-Net trained to directly predict the boundary with different classes. Instead, the authors mention that they use a U-net with vertex-based mask sampling as a baseline. I am afraid such a baseline was chosen as it is inferior to the proposed one.

The clarity of the narrative should be improved. In a few places, I was not able to get the gist due to writing issues.
1. https://arxiv.org/abs/2007.08199
2. http://proceedings.mlr.press/v9/yan10a/yan10a.pdf
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The manuscript seems to correspond to the reproducibility checklist.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
- In the following sentence, I guess S should be in the numerator, since you define it as a ratio ($S = \{1/4, 1/8, \dots\}$) and not as a multiplier: “Multi-scale pyramid features are generated with each scale feature $x \in \mathbb{R}^{H/S \times H/S \times 256}$”
- missing reference: “We first build a tumor segmentation model (See )”
- A single blue arrow (labeled “Tumor segmentor”) in Fig. 2 corresponds to “upsample every scale feature into scale 1/4 and merge them with an add operator, followed by a 3×3 convolution layer, 4× upsampling operator and a softmax activation layer”. For the sake of clarity, I believe it is important to draw all these components in the figure.
- I am by no means the most intelligent person in the field, but with my background in applied math I am usually able to get the gist of MICCAI papers. Sorry, but I understood nothing of the following paragraph full of mathematical notation: “Specifically, grid $k \in \{1, 2, \dots, N\}$ is filled with a set of candidate vertices $(r_g, \theta_g) \in G^{R \times 3}$ in polar representation, where $\theta_g \in \{(k-1)\Delta\theta, k\Delta\theta, (k+1)\Delta\theta\}$, $r_g \in \{r \mid \theta = \theta_g\}$, and $R$ approximates $\max(r \mid \theta = k\Delta\theta) - \min(r \mid \theta = k\Delta\theta)$”
  What is $\Delta\theta$? $360/N$? Please explain. What is $G$, a topological group? Probably not; please explain. Using your notation, $\max(r \mid \theta = k\Delta\theta)$ would be $r_g$, and $\min(r \mid \theta = k\Delta\theta)$ would be 0. But you meant something different. Please clarify. Instead of explaining over multiple sentences in the paragraph above what polar coordinates are (which most people know, or can at least look up), I would suggest making this paragraph with mathematical notation more verbose and clear.
- “The MLP contains two layers with a GELU non-linearity”. Probably ReLU.
- “For each patient with multiple slices of images, we inference”. We infer.
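The merge path quoted in the third comment above (upsample every scale to 1/4, add, 3×3 convolution, 4× upsampling, softmax) is compact enough to sketch directly, which may help when redrawing it in Fig. 2. The NumPy sketch below is illustrative only: it assumes nearest-neighbour upsampling (the interpolation mode is not stated), pyramid strides of 4, 8, 16 and 32 for the scales {1/4, 1/8, …}, and random weights; all function names are hypothetical.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) array by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def conv3x3(x, weight, bias):
    """3x3 convolution with zero padding; x: (C_in, H, W), weight: (C_out, C_in, 3, 3)."""
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((weight.shape[0], h, w))
    for dy in range(3):
        for dx in range(3):
            out += np.einsum("oc,chw->ohw", weight[:, :, dy, dx], padded[:, dy:dy + h, dx:dx + w])
    return out + bias[:, None, None]

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def segmentor_head(pyramid, weight, bias):
    """pyramid maps stride -> (C, H/stride, W/stride) feature; strides are multiples of 4."""
    merged = sum(upsample_nearest(f, s // 4) for s, f in pyramid.items())  # to 1/4 scale, add
    logits = conv3x3(merged, weight, bias)      # 3x3 conv -> class logits at 1/4 scale
    logits = upsample_nearest(logits, 4)        # 4x upsampling to full resolution
    return softmax(logits, axis=0)              # per-pixel class probabilities

rng = np.random.default_rng(0)
C, H, W, n_classes = 16, 64, 64, 2              # the quoted text uses 256 channels; 16 keeps the demo small
pyramid = {s: rng.standard_normal((C, H // s, W // s)) for s in (4, 8, 16, 32)}
probs = segmentor_head(pyramid, rng.standard_normal((n_classes, C, 3, 3)) * 0.1, np.zeros(n_classes))
```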
Please state your overall opinion of the paper

borderline accept (6)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The peculiar choice of baseline method, the unclear description of some parts of the method section, and the generally questionable problem statement led me to lower my overall score.
What is the ranking of this paper in your review stack?

3
Number of papers in your stack

5
Reviewer confidence

Confident but not absolutely certain

Review #3

Please describe the contribution of the paper

In this paper, the authors propose two components, spatial vertex localization and sequential semantic classification, to investigate the tumour boundary.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. Experiments are sufficient and comprehensive.
2. The idea is interesting. However, important details are missing in the Method part.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. Many essential implementation details are missing, which makes it hard to evaluate the reproducibility of this work. Please see the detailed comments for multiple concerns.
2. The Methods part needs a major revision. Many details are unclear or missing. Please see the detailed comments.
3. I will raise the score if the authors can address my concerns in the detailed comments.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The main part, the polar transformation that converts the region mask into a boundary mask during training, is unclear. Please see the detailed comments.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
1. If the extracted boundary needs to be converted into a polar representation, why is Gaussian blurring used to increase its thickness? With the blurred (thicker) boundary, is the inner or the outer contour used as the final boundary for the polar representation? Similarly, Fig. 3(e) looks strange to me: why is the mask inverted from the polar system a thick boundary mask rather than a thin one? How are the vertices sampled along the boundary to invert a thick boundary?
2. The implementation of the polar transformation is unclear. How did the authors transform the extracted boundary mask into Cartesian coordinates during training? In other words, how did the authors generate the coordinate positional map during training?
3. If the centroid of the tumour mask is unknown, how did the authors predict the pole during testing? As far as I know, PolarMask [22] needs to predict the pole and the boundary vertices simultaneously during training, whereas in this work the authors only predict the boundary vertex locations in the polar system. How did the authors tackle the pole localization issue?
4. As far as I know, [20] does not convert the predicted region mask into a boundary mask during training. How did the authors achieve this?

6. The authors are encouraged to show X_p, X_coor, X_g, X_seq in Fig. 2 for clarity.
Please state your overall opinion of the paper

borderline reject (5)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The missing details in the Method part.
What is the ranking of this paper in your review stack?

4
Number of papers in your stack

6
Reviewer confidence

Confident but not absolutely certain

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The reviewers’ overall opinion about the proposed method, which addresses a very challenging tumour delineation task using sequential learning, is positive. They highlight the novel methodological aspects, the important clinical problem, and good scores considering the high inter-rater variability. There are, however, a number of points that require further clarification: the rationale for Gaussian blurring, the discussion of alternative methods for learning from noisy labels, a fairer baseline (“the authors mention that they use a U-net with vertex-based mask sampling as a baseline. I am afraid such a baseline was chosen as it is inferior to the proposed one”), and the detailed comments from reviewer #3 about the shortcomings of the method description. I believe these (relatively minor) concerns can be successfully addressed during the rebuttal and trust that the authors can provide convincing answers.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

4

Author Feedback

We thank all reviewers and AC for their constructive comments. We mainly address the questions raised in the meta-review below; R4’s detailed comments will be directly edited in the revised manuscript (to save space here).

The benefits of Gaussian blurring are threefold. (1) It avoids discontinuities in polar coordinates. If the thin extracted edge is used directly, some rays emitted at equidistant angles in the polar transform may not intersect the edge, especially when the rays are denser (i.e., the angle interval is smaller), which can produce void coordinates or “broken” boundaries in polar coordinates. Blurring the extracted tumor edge with a fixed Gaussian kernel fixes this issue. (2) Gaussian blurring ensures a margin between the inner and outer contours, which makes it feasible to construct spatial grids bounded by the emitted rays and the inner and outer contours, and facilitates data augmentation by randomly sampling vertices within the grids. The blurred boundary has a thickness with inner and outer contours, so that N rays emitted at equidistant angles divide it into N spatial grids, each containing several candidate vertex pixels to be randomly sampled. (3) Gaussian blurring improves fault tolerance in the foreground-boundary prediction stage. “GT_Vertices” in Table 3 are generated by sampling the vertices at fixed positions along the (thin) middle skeleton of the GT boundary. The comparison of the 4th and 5th rows in Table 3 demonstrates the superior performance obtained with data augmentation, for which Gaussian blurring is indispensable.
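Points (1) and (2) can be illustrated with a small NumPy sketch: a 1-pixel tumor edge is blurred with a fixed Gaussian kernel and thresholded into a thick band, each of N equally spaced rays then gets a radius interval between the inner and outer contours, and one vertex per grid is drawn at random. The σ = 1.5 kernel, the 0.05 threshold and all function names are assumptions for illustration, not the authors' settings; the grid rule follows the definition quoted by Review #2, θ_g ∈ {(k−1)Δθ, kΔθ, (k+1)Δθ}.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian filter (kernel truncated at 3 sigma, zero padding)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    out = np.apply_along_axis(np.convolve, 0, img.astype(float), k, mode="same")
    return np.apply_along_axis(np.convolve, 1, out, k, mode="same")

def thin_edge(mask):
    """1-pixel boundary: mask pixels with at least one 4-neighbour outside the mask."""
    p = np.pad(mask, 1)
    eroded = p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return mask & ~eroded

def ray_radius_ranges(band, center, n_rays, step=0.25):
    """For ray k at angle k * 2pi / n_rays, return (r_min, r_max) on the band, NaN if missed."""
    h, w = band.shape
    radii = np.arange(0.0, np.hypot(h, w), step)
    ranges = np.full((n_rays, 2), np.nan)
    for k in range(n_rays):
        t = 2 * np.pi * k / n_rays
        ys = np.rint(center[0] + radii * np.sin(t)).astype(int)
        xs = np.rint(center[1] + radii * np.cos(t)).astype(int)
        ok = (ys >= 0) & (ys < h) & (xs >= 0) & (xs < w)
        hits = radii[ok][band[ys[ok], xs[ok]]]
        if hits.size:
            ranges[k] = hits.min(), hits.max()    # inner and outer contour along this ray
    return ranges

def sample_grid_vertices(ranges, rng):
    """Grid k: pick a ray from {k-1, k, k+1}, then a radius between inner and outer contour."""
    n = len(ranges)
    ks = (np.arange(n) + rng.integers(-1, 2, size=n)) % n
    r = rng.uniform(ranges[ks, 0], ranges[ks, 1])
    return r, 2 * np.pi * ks / n

yy, xx = np.mgrid[:101, :101]
mask = (yy - 50) ** 2 + (xx - 50) ** 2 <= 20 ** 2
edge = thin_edge(mask)
band = gaussian_blur(edge, sigma=1.5) > 0.05          # threshold is an assumption
thin_ranges = ray_radius_ranges(edge, (50.0, 50.0), n_rays=360)
band_ranges = ray_radius_ranges(band, (50.0, 50.0), n_rays=360)
print("rays missing the thin edge:", np.isnan(thin_ranges[:, 0]).sum())
print("rays missing the blurred band:", np.isnan(band_ranges[:, 0]).sum())
r, theta = sample_grid_vertices(band_ranges, np.random.default_rng(0))
```

Comparing the two printed counts shows whether any of the 360 rays slip through the thin edge on a given mask; the blurred band is wide enough that every ray gets a non-empty radius interval, which is what keeps the random sampling in (2) well defined.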

In future work, we will study the following methods of learning from noisy labels to further address inter-rater variability: (1) computing prediction confidence via uncertainty estimation, (2) learning from readers’ annotation preferences: arxiv.org/abs/2104.05570, and (3) reweighting the loss according to the inconsistency among readers. The problem studied in this work inherently has a high degree of inter-rater ambiguity/variability, which is why AI-based quantification and standardization are increasingly needed and desirable for diagnosis.

Our baseline is a U-Net trained to predict the boundary with different classes, as R3 suggested. Vertex-based mask sampling is a post-processing step applied on top of the baseline U-Net’s semantic predictions, which converts them into our problem formulation of semantic sequence learning. Applying vertex-based mask sampling to the segmentation makes the baseline more comparable with the proposed method, since both can then be evaluated and compared in the sequence space. This can be considered a reasonable baseline. As shown in Table 1, our sequential learning models strongly outperform this baseline, with F1 scores of 48.42 vs. 25.28 on the CAP dataset and 36.58 vs. 26.26 on the FEN dataset.
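The comparison in the sequence space can be sketched as follows: the baseline's per-pixel labels are read off at the boundary vertices to form a label sequence, which is then scored against the ground-truth sequence. This NumPy sketch is illustrative only; the rebuttal does not say how F1 is aggregated over classes, so the unweighted per-class mean (macro-F1), the toy label map and the function names are assumptions.

```python
import numpy as np

def sample_labels_at_vertices(label_map, vertices):
    """Read the predicted class at each (row, col) vertex -> label sequence of length N."""
    rows = np.clip(np.rint(vertices[:, 0]).astype(int), 0, label_map.shape[0] - 1)
    cols = np.clip(np.rint(vertices[:, 1]).astype(int), 0, label_map.shape[1] - 1)
    return label_map[rows, cols]

def macro_f1(pred, gt, classes):
    """Unweighted mean of per-class F1 over the given classes (aggregation is an assumption)."""
    scores = []
    for c in classes:
        tp = np.sum((pred == c) & (gt == c))
        fp = np.sum((pred == c) & (gt != c))
        fn = np.sum((pred != c) & (gt == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))

# toy U-Net output: class 1 on the left half, class 2 on the right half
label_map = np.ones((10, 10), dtype=int)
label_map[:, 5:] = 2
vertices = np.array([[2.0, 2.0], [2.0, 7.0], [7.0, 2.0], [7.0, 7.0]])
pred_seq = sample_labels_at_vertices(label_map, vertices)   # -> [1, 2, 1, 2]
gt_seq = np.array([1, 2, 2, 2])
score = macro_f1(pred_seq, gt_seq, classes=[1, 2])
```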

Note that our main contributions are the novel method formulation for this challenging, clinically important problem and the clinically relevant quantitative performance achieved in the inter-reader user study (Table 2). Our algorithm performs comparably to a radiologist with four years of practice.

“Detailed comments from R3”: (1) “Multi-scale pyramid features are generated with each scale feature $x \in \mathbb{R}^{HS \times HS \times 256}$”. (2) Added reference: “We first build a tumor segmentation model as in Fig. 1(b)”. (3) We will present all described method components in the figure. (4) $\Delta\theta$ is the angle interval between two adjacent rays, equal to $360/N$. (5) $G$, the set of real numbers $\mathbb{R}$, defines the range of the vertices’ polar coordinates. A vertex $(r_g, \theta_g)$ is constrained to the line segment where the ray at angle $\theta_g$ intersects the boundary, and hence to a grid. The range of radius values for candidate vertices, $[\min(r \mid \theta = k\Delta\theta), \max(r \mid \theta = k\Delta\theta)]$, indicates that a vertex lies between the inner and outer contours. (6) The activation used here is the Gaussian Error Linear Unit (GELU) (https://arxiv.org/abs/1606.08415), not ReLU.
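Item (6) and the MLP described in the abstract can be sketched together. GELU weights its input by the standard normal CDF, GELU(x) = xΦ(x), so unlike ReLU it is non-zero for negative inputs. The NumPy sketch below decodes per-vertex features plus a positional embedding with a two-layer GELU MLP; it omits the sequence-mapping step, and the sinusoidal embedding, layer sizes, random weights and function names are assumptions, not details from the paper.

```python
import math
import numpy as np

def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    x = np.asarray(x, dtype=float)
    return x * 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def relu(x):
    return np.maximum(np.asarray(x, dtype=float), 0.0)

def sinusoidal_positions(n, d):
    """Transformer-style sinusoidal embedding (n positions, even dimension d); an assumed choice."""
    pos = np.arange(n)[:, None]
    i = np.arange(d // 2)[None, :]
    angle = pos / 10000 ** (2 * i / d)
    emb = np.zeros((n, d))
    emb[:, 0::2] = np.sin(angle)
    emb[:, 1::2] = np.cos(angle)
    return emb

def mlp_decode(x, w1, b1, w2, b2):
    """Two linear layers with a GELU in between -> per-vertex class logits."""
    return gelu(x @ w1 + b1) @ w2 + b2

rng = np.random.default_rng(0)
n_vertices, d, hidden, n_classes = 90, 64, 128, 3   # illustrative sizes only
features = rng.standard_normal((n_vertices, d)) + sinusoidal_positions(n_vertices, d)
logits = mlp_decode(features,
                    rng.standard_normal((d, hidden)) * 0.1, np.zeros(hidden),
                    rng.standard_normal((hidden, n_classes)) * 0.1, np.zeros(n_classes))
print(gelu([-1.0, 0.0, 1.0]), relu([-1.0, 0.0, 1.0]))
```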

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors have addressed a number of minor comments and gave a better rationale for the importance of the Gaussian smoothing step. I am still not quite convinced that the employed baseline (default Res-UNet with vertex sampling) can be considered current state-of-the-art given its poor performance, but I would give the authors the benefit of the doubt that it is “a reasonable baseline”. Unfortunately, the authors will release neither code nor dataset, so while this is a fair MICCAI paper that can initiate some interesting discussion, its adoption in other research will be severely limited (to put it more constructively: I would still encourage the authors to release some code and/or more method details that would enable exact replication).
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

6

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The reviewers are generally positive about the paper, especially regarding its novelty and significance. The concerns regarding Gaussian blurring and other points were addressed in the rebuttal to a reasonable level, so I recommend acceptance of the paper.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

8

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

Following my reading of the paper, reviews, and rebuttal, it seems the authors have addressed most of the concerns, in particular those from R4, who had several questions about the method. From my point of view, the idea of linking tumour boundary characteristics to prognostic values is appealing. If the authors integrate the clarifications given in the rebuttal, I would lean towards acceptance.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

2


Sequential Learning on Liver Tumor Boundary Semantics and Prognostic Biomarker Mining