
Authors

Trinh Thi Le Vuong, Kyungeun Kim, Boram Song, Jin Tae Kwak

Abstract

In digital pathology, cancer grading has been widely studied by utilizing hand-crafted features and advanced machine learning and deep learning methods. In most of such studies, cancer grading has been formulated as a multi-class categorical classification problem, likely overlooking the relationship among different cancer grades. Herein, we propose a ranking-based deep neural network for cancer grading in pathology images. Utilizing deep neural networks, pathology images are mapped into a latent space. Built based upon a triplet loss, a ranking loss is devised to maximize the inter-class distance among cancer grades in the latent space with respect to the aggressiveness of cancer, leading to the correct ordering or rank of pathology images. To evaluate the proposed method, a number of colorectal pathology images have been employed. The experimental results demonstrate that the proposed approach is capable of predicting cancer grades with high accuracy, outperforming the deep neural networks without the ranking loss.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87237-3_52

SharedIt: https://rdcu.be/cymbe

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This work proposes a ranking-based deep neural network for automatic cancer grading in digital pathology images. The authors propose a new ranking loss function to maximize the inter-class distance among different cancer grades. The proposed loss function was implemented using three different backbone networks (DenseNet, MobileNet, EfficientNet).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    All sections are well explained. The authors propose a new loss function that forces the maximization of the distance between different grades of cancer in pathology images. They extend a triplet loss function by adding another two terms. Each term is a conditional distance measure from the anchor’s embedding (from the positive images) to two different negative embeddings. It seems that the conditions on the distance measures help to identify the classes that are closer to each other and eventually maximize their distance.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • It has not been clearly explained whether the novelty is the ranking loss alone or the ranking loss plus the triplet loss.
    • The effect of the network architecture on the results has not been discussed, e.g., MobileNet nullifies the effect of the rank loss compared to the triplet loss.
    • No justification is given for why these three CNNs were picked.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The proposed method is expected to be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • Fig. 1 needs a clearer explanation, especially the bottom sub-figures.
    • What is the h(·) function in eq. (6)? And why is it easy to extend eq. (6) to eq. (7)?
    • What does each color in Fig. 2 represent (green, red, yellow)?
    • In eq. (2), j does not appear in the equation. Should k be replaced by j?
    • This statement is not accurate: “Regardless of the type of backbone networks, RankCNNs were substantially superior to both PlainCNNs and TripletCNNs.” This is clearly not the case for MobileNet.
    • There are a few grammatical errors, e.g.:
    • page 2, “To enforces” -> “To enforce”
    • page 4, rewrite the following: “the triplet loss is designed to have 𝑥𝑎 is closer to all…”
  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes an interesting approach for multi-class classification in cancer grading with potential to be applied to different cancer sites not only in digital pathology, but for medical imaging modalities such as MRI and CT.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    4

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    In most studies, cancer grading is formulated as a multi-class categorical classification problem, overlooking the ranking order of cancer grades. The authors propose a ranking-based loss: building on the triplet loss [24], they add two more terms to “maximize the inter-class distance among cancer grades”.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Intuitively, it makes sense to reflect the ranking order in the loss. The authors have sought a suitable representation of that ranking loss.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors need to justify their choices in different parts of the paper, as I detail in the comments section. This includes the proposed loss: the authors should better explain their choices and parameters there and elsewhere in the paper. The authors also fail to explain things clearly in general. One obvious example is Figure 1, which has several details, but there is no text describing what those details are. I am also concerned about the experiments. In the comments I list a set of details that are needed, including per-class data (most important), metrics, and cross-validation with statistical analysis (significance, etc.). There are also some apparently strange things in the results, such as the values for F1mic being exactly the same as those for accuracy. A more thorough review of related work on ranking-based losses would also be important.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I believe there is enough detail for reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The presentation can be improved significantly. I already mentioned figure 1 not being described at all, and the need to justify and explain several details.

    You need to explain everything that you show in Figure 1, near the figure. Why is lambda=2 in eq. (1)?

    There is confusion in an important phrase; correct it: “For an (anchor) image 𝑥𝑎, the triplet loss is designed to have 𝑥𝑎 is closer to all other (positive) images 𝑥𝑝 of the same class than it is to any (negative) image 𝑥𝑛 with a different class label from 𝑥𝑎.”

    In eq. (9), define P.

    The second and third terms in eq. (9) are not sufficiently well justified or explained. Why is the P part not sufficient? How do the second and third terms work, and why do they emphasize a particular perspective? You need to be much clearer and convince the reader of the why of each detail.

    “However, the triplet loss can partially handle the rank among the images of multiple classes since the loss only considers a pair of class labels at a time.” -> this phrase is very strange; explain it better, and correct the phrasing error (“however, the triplet loss can only partially handle…”). This is also a good place to explain better what is missing and what should be added to improve further. Be very pedagogic, because you are explaining exactly how the further details that you add to the formula work and why they are important.

    “These two terms aim to push negative images further away, forcing the minimum distance between anchor and negative images to be larger than the maximum distance between anchor and positive images, following the ordering of the class labels.” -> how? And why is the first term not sufficient to do that competently? Explain much better. As it is, it is not sufficiently clear.

    Implementation details: include parameters such as the learning rate. Why did you choose the data augmentation parameters you describe, and why not others? Justify.

    “margins 𝛼𝒫, 𝛼𝒬, and 𝛼R to 1.0, 1.0, and -0.5, respectively. Setting 𝛼R = −0.5, we relax the constraint for the equidistant cases.” -> why these values? Also, explain the margins better in eqs. (3), (5), and (9). Why 1.0, 1.0, and -0.5? Be clearer about all of this.

    Experiments need a lot more work:

    Your results seem strange. In particular, I do not understand why F1mic exactly equals accuracy.

    Metrics and evaluation: please improve the choice of metrics and the evaluation methodology. Accuracy is typically not sufficient for an accurate evaluation; recall should be paired with precision, so please show both. You should also show confusion matrices, as they are very informative. Most importantly, you need to show all the results, with precision and recall, for each individual class (per-class results). There is the danger that some improvement may come at the expense of some class, which is why you need to do all of this.

    Validity of results: you should do cross-validation, display the results per fold and overall, and then perform a study on significance.

    Additional typos to correct:
    • “(Year)” in several references
    • “To exploit the nature ordering” -> “natural ordering”
    • “the triplet loss is designed to have 𝑥𝑎 is closer to all other (positive) image”
    • “MoblieNet”
    • “However, the triplet loss can partially handle the rank among the images…”

  • Please state your overall opinion of the paper

    probably reject (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Some inconsistencies in the experimental work, as detailed above; some lack of explanations for choices and details, as detailed above; some phrasing errors; the need for more validating results and experimental statistical significance; and the need to review and include more comparisons.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    4

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    The authors proposed a ranking-based deep convolutional neural network (RankCNN) for colorectal cancer grading in pathology images. The proposed ranking loss makes the CNN aware of the diagnostic priority of colorectal pathology. The proposed RankCNN outperformed the models trained with cross-entropy loss and the common triplet loss.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A novel definition of loss function based on triplet loss for cancer grading.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The division of the dataset is not clear, which could make the results less convincing. The details of the visual assessment of colorectal cancer semantic segmentation are omitted.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The proposed method is easy to implement. The reproducibility checklist shows that the code will be made public, and the dataset will not be made public.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The paper proposed a novel ranking-based loss function for colorectal cancer grading with pathology images. The topic is interesting. Overall, the paper is well written and easy to follow. The motivation is clear and the method is technically sound. The experimental results have demonstrated the effectiveness of the proposed method. Here are some detailed comments.

    1) The three parts of the ranking loss seem to have different effects on grading performance. However, no hyper-parameters were defined to balance the effects of these loss terms. Is it necessary to study the contribution of each term by tuning the weights of each part of the loss function?

    2) The division of the experimental dataset is not clear. The dataset consists of 6 colorectal tissue microarrays (TMAs) and 3 whole slide images (WSIs), from which 10,000 image patches were extracted. I am concerned that there could be WSI-level data overlap between the training and testing sets. In that case, information leakage could undermine the credibility of the experiments. Please describe the details of the dataset division in the experimental section.

    3) The approach to generating the visualization in Fig. 2 is a little confusing. First, the meaning of the colors is not given. Second, the results are spatially smooth: are they obtained by a sliding-window method? If so, what is the stride of the sliding window?

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The topic is interesting. The motivation is clear. The method is technically sound. The result is good.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Given three inconsistent reviews, you are invited to submit a rebuttal addressing the major comments, especially to: 1) clearly explain whether the novelty is the ranking loss alone or the ranking loss plus the triplet loss; it is necessary to study the contribution of each term by tuning the weights of each part of the loss function; 2) explain why these three CNNs were picked and how the network architectures influence the final result (e.g., MobileNet nullifies the effect of the rank loss compared to the triplet loss); 3) explain the data partition in the experiments; reviewer #3 is concerned there could be WSI-level data overlap between the training and testing sets; 4) explain the details of Fig. 1, referring to the comments from reviewers #1 and #2; 5) explain the details of the visual assessment of colorectal cancer semantic segmentation, e.g., what does each color in Fig. 2 represent; 6) explain the selection of some important parameters, such as \lambda and the learning rate, as well as the evaluation metrics.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5




Author Feedback

1) Clearly explain the novelty of the manuscript and discuss the effect of the weight of each part of the ranking loss. The main contribution of the manuscript is the ranking loss, which is built upon the triplet loss. In the triplet loss, there is no notion of ordering among multiple classes. The ranking loss extends the triplet loss by introducing two additional terms that force the samples to be ranked in order of cancer grade. As the reviewers noted, the weights of the three terms of the ranking loss could have a substantial effect on performance. Due to time and space limits, we are unable to conduct extra experiments to study the effect of these weights, but we do believe that optimizing them could lead to improved classification.
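For readers unfamiliar with the construction, the idea of extending a triplet loss with rank-aware terms can be sketched in code. This is a minimal illustrative sketch, not the paper's exact eq. (9): the function names, the hinge form of the two extra terms, and the default margins (1.0, 1.0, and -0.5, matching the values stated in the rebuttal) are assumptions for illustration only.

```python
def triplet_term(d_ap, d_an, margin):
    # Standard triplet hinge: the anchor-positive distance should be
    # smaller than the anchor-negative distance by at least `margin`.
    return max(0.0, d_ap - d_an + margin)

def ranking_loss(d_ap, d_an_near, d_an_far, m_p=1.0, m_q=1.0, m_r=-0.5):
    # Hypothetical rank-aware triplet loss (not the paper's exact
    # formulation). `d_an_near` / `d_an_far` are distances from the
    # anchor to negatives whose grades are nearer / farther in rank.
    l_p = triplet_term(d_ap, d_an_near, m_p)      # positives closer than any negative
    l_q = triplet_term(d_an_near, d_an_far, m_q)  # nearer grade closer than farther grade
    l_r = triplet_term(d_ap, d_an_near, m_r)      # relaxed (negative) margin, cf. alpha_R = -0.5
    return l_p + l_q + l_r
```

When the embedding distances already respect the grade ordering with sufficient margins, all three hinges are zero and the loss vanishes.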

2) Explain why the three CNNs were chosen and how the network architectures influence the final results. We chose CNN architectures that are widely used and built upon different principles. DenseNet connects each layer to all subsequent layers via concatenation and scales the network by its depth. MobileNet is a lightweight network using depthwise separable convolutions. EfficientNet adopts a compound scaling method that balances the width, depth, and image resolution. As shown in Table 1, the ranking loss achieved the best performance regardless of the backbone network. However, the additive value of the ranking loss differed among the networks. Using L_CE only, MobileNet was better than the other two networks, but the performance gain from the ranking loss and the triplet loss was smaller for MobileNet. As a result, using the ranking loss, MobileNet was poorer than the other two networks. This indicates that the effect of the ranking loss can vary with the backbone network.

3) Explain the data partition in the experiments, in particular, the WSI-level overlap between the training and testing sets. The data partition is done at the WSI and TMA level, not at the image-patch level. There is no WSI- or TMA-level overlap in the data partition. Specifically, the training, validation, and test sets include 4 TMAs and one WSI, one TMA and one WSI, and one TMA and one WSI, respectively.

4) Explain the details of Fig. 1. x_a, x_p, and x_n denote an anchor, a positive, and a negative sample, respectively. c_1, c_2, c_3, and c_4 are the four classes, ordered as c_1<c_2<c_3<c_4. P denotes triplets where x_a and x_p belong to the same class. Q and R include triplets from three different classes. In Q, the class label of x_a is closer to the class label of x_p than to that of x_n. In R, the class label of x_a is equidistant from those of x_p and x_n, but x_p is ranked higher than x_a. L_rank and L_CE are the ranking loss and cross-entropy loss, respectively.

5) Explain the details of the visual assessment and what each color represents in Fig. 2. For each tissue image, we slide a rectangular window of size 1024x1024 pixels with a step size of 256 pixels, generating a set of image patches. The image patches are resized by half and used to produce the probabilities for the four classes. Averaging the probabilities over the overlapping patches, we assign a class label to each pixel in the tissue image. In Fig. 2, the colors denote the different class labels: blue, yellow, green, and red represent benign, WD, MD, and PD, respectively. We will add color boxes to Fig. 2 to clarify this.
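The sliding-window procedure described above can be sketched as follows. This is an assumed implementation consistent with the description (1024-pixel window, stride 256, probability averaging over overlaps, per-pixel argmax); `predict_patch` is a hypothetical stand-in for the trained classifier.

```python
import numpy as np

def label_map(image_h, image_w, predict_patch, patch=1024, stride=256, n_classes=4):
    # Slide a window over the tissue image, accumulate per-class
    # probabilities for every covered pixel, average over overlapping
    # windows, and take the argmax to get a per-pixel class label.
    probs = np.zeros((n_classes, image_h, image_w))
    counts = np.zeros((image_h, image_w))
    for y in range(0, image_h - patch + 1, stride):
        for x in range(0, image_w - patch + 1, stride):
            p = predict_patch(y, x)  # class probabilities, shape (n_classes,)
            probs[:, y:y + patch, x:x + patch] += p[:, None, None]
            counts[y:y + patch, x:x + patch] += 1
    probs /= np.maximum(counts, 1)   # average over overlaps; avoid div-by-zero
    return probs.argmax(axis=0)      # per-pixel class labels, shape (h, w)
```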

6) Explain the selection of parameters such as λ and the learning rate, and the evaluation metrics. The learning rate is set to 1e-4 and decayed by a factor of 10 after 30 epochs. λ is set to 2 after cross-validation experiments within the training set only. As the reviewers noted, accuracy alone is not sufficient to assess the classification results; hence, we used several metrics in the manuscript. Computing the average precision, as well as the precision and recall for each class, we observed that the ranking loss outperformed the others and that the results are not biased toward a certain class. We note that F1mic is omitted since F1mic is equivalent to accuracy for multi-class classification.
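The final point can be checked directly: in single-label multi-class classification, every misclassified sample contributes exactly one false positive (to the predicted class) and one false negative (to the true class), so pooled precision, recall, and hence micro-averaged F1 all collapse to accuracy. A small self-contained check (illustrative, not from the paper):

```python
def micro_f1(y_true, y_pred, n_classes):
    # Micro-averaged F1: pool true positives, false positives, and
    # false negatives across all classes before computing P and R.
    tp = fp = fn = 0
    for c in range(n_classes):
        tp += sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp += sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn += sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true = [0, 1, 2, 2, 1]
y_pred = [0, 2, 2, 1, 1]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
# For single-label multi-class predictions, micro_f1 equals accuracy.
```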




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Basically, most reviews are very positive, and the authors’ response addresses the issues I summarized as primary AC, especially the clarification of the novelty and some implementation details. Thus I agree to accept this paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    There are mixed reviews. Two reviewers have questions regarding various design choices and ask for more ablation studies. The result evaluation needs to include more metrics as well, and a more detailed dataset description is needed. While the rebuttal has provided clear responses to the reviewers’ comments, one important issue that might have been missed by the reviewers is that it is not clear what the novelty is. Ranking loss is not a new concept, and there are many ranking-based CNN designs. There is no performance comparison with such existing methods.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    16



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal addressed the comments and provided explanations/arguments for the raised points, especially the novelty of the pipeline and the rationale for the selected networks. The authors also provided missing details of the parameters, which still need to be added to the camera-ready version.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    8


