
Authors

Yukun Zhou, Moucheng Xu, Yipeng Hu, Hongxiang Lin, Joseph Jacob, Pearse A. Keane, Daniel C. Alexander

Abstract

Accurate multi-class segmentation is a long-standing challenge in medical imaging, especially in scenarios where classes share strong similarity. Segmenting retinal blood vessels in retinal photographs is one such scenario, in which arteries and veins need to be identified and differentiated from each other and from the background. Intra-segment misclassification, i.e. veins classified as arteries or vice versa, frequently occurs when arteries and veins intersect, whereas in binary retinal vessel segmentation, error rates are much lower. We thus propose a new approach that decomposes multi-class segmentation into multiple binary segmentation tasks, followed by a binary-to-multi-class fusion network. The network merges representations of artery, vein, and multi-class feature maps, each of which is supervised by expert vessel annotation in adversarial training. A skip-connection based merging process explicitly maintains class-specific gradients to avoid vanishing gradients in deep layers and to favor discriminative features. The results show that our model improves F1-score by 4.4%, 5.1%, and 4.2%, respectively, compared with three state-of-the-art deep learning based methods on the DRIVE-AV, LES-AV, and HRF-AV data sets.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87193-2_46

SharedIt: https://rdcu.be/cyhMo

Link to the code repository

https://github.com/rmaphoh/Learning-AVSegmentation

Link to the dataset(s)

https://medicine.uiowa.edu/eye/rite-dataset

https://figshare.com/articles/dataset/LES-AV_dataset/11857698

http://iflexis.com/downloads/HRF_AV_GT.zip

https://github.com/rmaphoh/Learning-AVSegmentation


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a framework for artery/vein segmentation from retinal images. Their framework splits up multi-class segmentation into first segmenting arteries and veins independently, and then fusing the binary segmentations with another network into the final multi-class segmentation. Evaluations on three publicly available datasets show good performance of the method, outperforming the previous state-of-the-art.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The idea of splitting up the a/v segmentation into first individual binary segmentation and then fusion and multi-class segmentation seems promising.
    • The extensive evaluation shows that the proposed method outperforms the previous state-of-the-art.
    • The ablation study shows the contributions of many individual components.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The method section is unclear, is missing some details, and contains some errors.
      • The GAN equations are wrong.
      • The notation in the text and the figures is not consistent and can be mixed up quite easily (e.g., f_m in section 2.2 and f^m in Fig 3 (a))
      • The losses BCE and MSE are not defined. For the binary network outputs, I suppose the authors use a sigmoid cross entropy. Is a sigmoid cross entropy (=binary) also used for the multi-class networks or do they use a softmax cross entropy?
      • More comments in 7.
    • Some parts of the method are not justified or discussed.
      • What is the influence of the “Skip-connection based merging process in segmenter”? As I understand from the ablation study, it is only evaluated in combination with the adversarial losses. It should be evaluated independently. Or is it needed for network convergence? If so, I would not see why. A discussion from the authors is needed.
      • Why is deep supervision only used in the “multi-class segmenter” and not in the binary segmenters?
      • Why do you need both BCE and MSE losses? Both of them optimize the same task. At least mention it in the discussion.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper should be well reproducible as the authors state that they will publish the source code upon acceptance and as publicly available datasets are used.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Introduction: The introduction is well written and motivates the proposed method.

    Figure 1: You could put the row captions directly into the figure, e.g., (a) Input, (b) GT, etc.

    Figure 3: “Adeversarial Loss” - should be “Adversarial Loss”. Some parts are not explained or mentioned in the text, e.g., what is “Artery to Multi-class”?

    Methods:

    Skip-connection based merging process in segmenter: This sub-section is hard to follow. The notation of Equation (1) is not consistent with Figure 3 and different terms are used. Also, there is no justification of the proposed merging strategy. Is it really needed? What would be the difference to a vanilla U-Net? Also without this merging strategy, there would be enough contributions in the paper.

    Adversarial training: “adopt” - should be “adapt”. The equations that define the generator and discriminator losses are wrong. “y and z indicate fundus image” - this does not make sense. Should y be a noise vector? The “|” also does not make sense in the equations. I would suggest looking into other papers to fix the equations, e.g., Dai et al. “Towards Diverse and Natural Image Descriptions via a Conditional GAN”. Due to the inexact equations, it is not clear what the input for the discriminator is, especially as it is shared by all three networks. Are the segmentation inputs one-hot encoded or represented as integer values? From L_BCE and L_MSE in Equation (2), it is not clear on which images the losses are calculated. While this is shown in Fig. 3, it should also be mentioned in the text. The name “L_main” is not well chosen.

    Binary-to-multi-class Fusion Network: The first paragraph is hard to follow. It is not clear how the individual network outputs are fused. “The artery segmentation map f_a, multi-class feature map f_m, and vein segmentation map f_v are concatenated to generate fused feature maps for the next convolution operation.” - What is the next convolution operation? Is this a final convolution operation that fuses the concatenated outputs to generate the final multi-class output? This is not clear. The second paragraph does not fit in this sub-section. Maybe another subsection (e.g. “Supervised Training”) that contains the missing description of the losses (L_BCE and L_MSE) as well as the deep supervision would be good.

    Table 2: “blanket” - should be “bracket”. “growth” is not defined - I would simply skip it.

    Comparing with most recent methods: “remarkably enhances” - the term “remarkably” does not fit. “All segmentation maps refer” - should be “For all segmentation maps, the reader is referred to”.

    Ablation study: “As shown in Table 2, the adversarial segmentation network performs better due to skip-connection based merging and the pixel-level adversarial learning, when compared with vanilla U-Net [20] in first line.” Does this mean that the first line in Table 2 is a U-Net without both the skip-connection based merging and the adversarial loss? If so, this should be split up, as otherwise the individual contributions of those two parts are not known, and the skip-connection based merging in particular is not justified.

    Fig. 5: You could put the column captions (GT, Ours, U-Net) directly into the figure.

    Discussion: “deep supervision also improve the” - should be “improves”

    Supplementary Materials: While the predictions look good, the groundtruth or difference to the groundtruth should also be shown for comparison. Additionally, the input images could also be shown.

    General: Some formulations are not good. Another proofread of the paper is necessary.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The overall approach of splitting up a/v segmentation into first individual segmentation and then merging is reasonable. The good performance of the proposed method on several datasets further strengthens the contribution. However, the method section in particular could be better structured, and some parts also need clarification. Furthermore, the “skip-connection based merging process in segmenter” needs justification and further discussion.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    This paper proposed a binary-to-multi-class fusion network to merge multi-class representations and binary-class representations for multi-class vessel segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) The idea of merging multi-class and binary-class is interesting. (2) The proposed method outperformed some vessel segmentation methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) Some results are not convincing. (see following detailed comments 1). (2) The key ablation studies are missing. (see following detailed comments 2).

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Most key experiment details are included so that it is possible to replicate this work.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    (1) Some results are not convincing. In Table 1, the author reported the sensitivity of MNNSA method [15] as 59.88% on DRIVE-AV dataset. However, in the original paper [15], the sensitivity of vessel segmentation is 79.16% on DRIVE-AV dataset. The reported results of MNNSA are inconsistent with their original paper. Please explain the reason.

    (2) The key ablation studies are missing. This paper aims to merge multi-class representations and binary-class representations for multi-class vessel segmentation. But the comparisons between the merged results and multi-class/binary-class results are missing. So, the contribution of this paper cannot be proved.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although the idea of this paper is interesting, I still have two comments:

    (1) Some results are not convincing. In Table 1, the author reported the sensitivity of MNNSA method [15] as 59.88% on DRIVE-AV dataset. However, in the original paper [15], the sensitivity of vessel segmentation is 79.16% on DRIVE-AV dataset. The reported results of MNNSA are inconsistent with their original paper. Please explain the reason.

    (2) The key ablation studies are missing. This paper aims to merge multi-class representations and binary-class representations for multi-class vessel segmentation. But the comparisons between the merged results and multi-class/binary-class results are missing. So, the contribution of this paper cannot be proved.

    The authors should address these comments during feedback.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    6

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    This paper introduces a system for artery/vein segmentation on retinal fundus images based on the combination of two binary segmentation models (artery vs all, vein vs all) into a multi-class model. The entire model makes use of adversarial training and deep supervision, and seems to result in improved results when compared with some other recently proposed techniques.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The approach here is focused on resolving ambiguities in a vessel segment, where predictions could be mixed instead of only artery or only vein, which is typical in this problem. The method takes advantage of several advanced techniques like adversarial training and deep supervision, which at first glance seem a bit overkill, but the authors included ablation studies to show the contribution of the different pieces. Experimental validation seems quite rigorous, although I have some doubts on this (see below).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) Handling of the uncertainty class: The three datasets considered in this paper contain pixels labeled in four classes, namely artery, vein, background and uncertain. The authors do not mention how they handle the uncertain class, but it seems from the evaluation section that they are actually measuring performance on artery, vein, and uncertain pixels (bottom of page 6). It is not clear whether the method predicts uncertain pixels.

    2) Handling of the background class: In the end, it is very unclear whether the proposed technique predicts only artery and vein pixels, or whether it generates a full segmentation that includes background too. It seems to me that the model is solving artery/vein classification on top of vessel pixels, and that the authors start from the vessel map, assuming it is given. Is this the case? If it is, I would not call this multi-class but rather binary. Am I wrong?

    3) Evaluation 1: Deriving from the previous points, I don’t understand how the evaluation is done for non-binary measures (ROC, PR, MSE). Is it approached as a multi-class problem? What would the classes be: artery, vein and uncertain? Moreover, it seems a bit odd that 99% of the P-values are 1.83e-4. Is this a typo, or did all P-values end up being exactly the same?

    4) Evaluation 2: Considering that the paper claims to resolve intra-segment contradictory results, I was expecting to see some evaluation that would focus on that particular aspect of the method, but all reported metrics are global (at the image level), instead of local (at the segment level).

    5) Hyperparameter impact: It seems from the supplementary material that the behavior of the hyperparameters is very erratic and indicates a considerable instability of the proposed technique. For example, why do the authors think that using alpha=0.4 and alpha=0.6 results in good performance, but using alpha=0.5 results in terrible performance? In addition, tuning the hyperparameters for each dataset is a bit weak. One would expect the same model with the same hyperparameters to work for all datasets; otherwise one needs to tune hyperparameters for each new dataset that arrives. This is particularly weak for the beta hyperparameter: setting it to 0.8 results in the best performance for LES-AV but almost the worst performance for DRIVE. I believe this deserves further discussion. Finally, how were hyperparameters searched for HRF?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The three datasets employed in this paper are public, and the authors promise to release the code. There are a couple of points in the reproducibility sheet that are not really addressed: An analysis of situations in which the method failed/A description of the memory footprint: The authors indicate that these have been fulfilled, but I don’t think that is the case.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The two most important deficiencies in this paper, in my opinion, are as follows:

    1) It is not clear whether the method performs artery/vein classification from pre-available vessel maps, or artery/vein/background segmentation, and how it deals with uncertain pixels (labeled as such in the ground truth). This should be clarified from the beginning, since the authors make lots of references to the multi-class to binary fusion, etc. If the background is being considered during training, it seems to me that the evaluation metrics employed are not correct (see the F formula at the bottom of page 6).

    2) Since the method claims to be useful for intra-segment disambiguation, I would recommend adopting some kind of performance analysis that shows how this aspect of the problem is improved by their approach, possibly by splitting the vasculature into separate segments and observing how uniform the predictions are per segment.
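
    The per-segment analysis suggested above could be sketched roughly as follows. This is only an illustration, not a metric defined in the paper or the review: the function name `segment_purity`, the flattened input layout, and the toy data are all hypothetical, and it assumes vessel segments have already been extracted and labeled with integer ids.

```python
from collections import Counter

def segment_purity(segment_ids, predictions):
    """Per-segment purity: share of pixels carrying the segment's majority label.

    A purity of 1.0 means every pixel of the segment received the same class,
    i.e. no intra-segment misclassification within the prediction itself.
    """
    labels_by_segment = {}
    for seg, label in zip(segment_ids, predictions):
        labels_by_segment.setdefault(seg, []).append(label)
    return {
        seg: Counter(labels).most_common(1)[0][1] / len(labels)
        for seg, labels in labels_by_segment.items()
    }

# Hypothetical toy data: eight vessel pixels belonging to two segments.
segment_ids = [0, 0, 0, 0, 1, 1, 1, 1]
predictions = ["artery", "artery", "vein", "artery",
               "vein", "vein", "vein", "vein"]
purity = segment_purity(segment_ids, predictions)
```

    Averaging these purities over all segments would give one possible local score; comparing the majority label against the ground truth would additionally tell whether a uniform segment is also correct.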

    If time/space allows, the other concerns I describe in the “weaknesses” section, particularly the comment on hyperparameters and stability of the method, should also be discussed, I think. Some minor issues: a) I believe the title of the paper should be switched to “Learning to Address Intra-segment Misclassification in Artery/Vein segmentation”. b) Please review your figure 3, it reads “adeversarial”.

    Note: Since the authors report results on competing methods by training them and generating their own predictions, I think the comparison is acceptable, as it is relative to the same evaluation process (not taking values from other papers’ tables). Nevertheless, evaluation details should be clarified.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although the focus of the paper is on resolving intra-segment confusion, which is an interesting aspect of the artery/vein problem, the technique is never evaluated to understand if there is a quantitative improvement on that particular side of the problem. Moreover, as I mentioned above, there is some confusion on the purpose of the method regarding the background and uncertain classes. It is hard to assess the usefulness of the model if this is not clarified. Finally, the technique is quite complicated, with multiple moving parts and lots of losses. This results in quite a few hyperparameters, the effect of which seems unclear by looking at validation performance in the hyperparameter search section of the supplementary material.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    While all the reviewers found merit in this paper, they also agreed that there are a few problems regarding the technical details and experiments. In the rebuttal, the authors should focus on explaining the technical issues (R1) and addressing the experimental concerns (R2 and R3).

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    4




Author Feedback

We appreciate the detailed comments and positive feedback from all three reviewers. Following AC’s guidance, we address the technical and experimental issues below. In addition, we will make minor edits regarding the notation and typos.

R1.1: GAN equations are wrong, potentially making the discriminator input unclear. We will correct the typo in the equation as suggested. The discriminator input is indeed the concatenation of the segmentation probability map (or ground truth) and the retinal image, as shown in Fig 3.
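
For orientation, the textbook conditional GAN objective for a segmenter G and a discriminator D that sees an image paired with a segmentation map can be written as below. This is the generic formulation, not necessarily the corrected equation of the paper; the symbols x (retinal image) and y (ground-truth map) and the absence of a noise input are assumptions made here for illustration.

```latex
\min_{G} \max_{D} \;
  \mathbb{E}_{x,y}\big[\log D(x, y)\big]
  + \mathbb{E}_{x}\big[\log\big(1 - D(x, G(x))\big)\big]
```

In this form the pair (x, y) or (x, G(x)) corresponds to the concatenated discriminator input described above.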

R1.2-1.5: The reviewer raised questions on the motivation and clarification on a number of network architectural choices and the training loss design, including the definitions of BCE, MSE, skip-connection, and deep supervision.

1.2 - BCE and MSE were defined in Fig 3; we will further clarify them in the text. For the multi-class segmenter, the ground-truth binary map and the segmentation probability map are both represented by three channels. Over these three channels, the loss was computed by averaging three BCEs - a variant of the standard multi-class CE with different class weighting.
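
A minimal sketch of a loss of this kind, averaging one BCE per channel, is given below. It is not taken from the released code: the function names, the clipping constant `eps`, the flattened per-channel layout, and the channel order (artery, uncertain, vein) are assumptions for illustration.

```python
import math

def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over the pixels of one channel."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)

def mean_channel_bce(prob_channels, target_channels):
    """Average the per-channel BCE over all channels (three in this setting)."""
    losses = [bce(p, t) for p, t in zip(prob_channels, target_channels)]
    return sum(losses) / len(losses)

# Hypothetical flattened maps with three pixels per channel.
probs = [[0.9, 0.1, 0.2],   # artery channel
         [0.1, 0.1, 0.1],   # uncertain channel
         [0.1, 0.8, 0.2]]   # vein channel
targets = [[1, 0, 0],
           [0, 0, 0],
           [0, 1, 0]]
loss = mean_channel_bce(probs, targets)
```

Because every channel contributes equally to the average regardless of how many positive pixels it contains, the weighting differs from a softmax cross-entropy over mutually exclusive classes.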

1.3 & 1.4 - Skip-connection based merging and deep supervision location were designed intuitively, but we agree that investigating the effectiveness of alternative designs may be interesting.

1.5 - The next convolution operation is the final convolutional block to generate the multi-class segmentation map (Sec.2.2 and Fig.3).
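
The fusion step can be pictured with the NumPy sketch below: the three maps are concatenated along the channel axis and passed through a final convolution that outputs three channels. The channel counts, the 1x1 kernel, the random weights, and the sigmoid output are all assumptions for illustration; the actual final block in the paper may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 4, 4

# Hypothetical inputs to the fusion step (channel counts are assumed).
f_a = rng.random((1, H, W))   # artery segmentation map
f_m = rng.random((8, H, W))   # multi-class feature map
f_v = rng.random((1, H, W))   # vein segmentation map

# Concatenate along the channel axis to form the fused feature maps.
fused = np.concatenate([f_a, f_m, f_v], axis=0)  # shape (10, H, W)

# Stand-in for the final convolutional block: a 1x1 convolution that
# maps the fused channels to three output channels.
weights = rng.standard_normal((3, fused.shape[0]))
bias = np.zeros(3)
logits = np.einsum("oc,chw->ohw", weights, fused) + bias[:, None, None]

# Per-channel sigmoid, giving a three-channel probability map.
probs = 1.0 / (1.0 + np.exp(-logits))
```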

R2.1: Some results are not convincing, e.g., in Table 1, reported results are inconsistent with the original paper [15]. The related comments are R3.1 (how is the evaluation done for non-binary measures?) and R3.2 (most of the P-values in Table 1 are the same). We clarify that the task in this paper is different and arguably more challenging, and that the evaluation is more stringent. 1) A sensitivity of 79.16% was reported in [15] for only segmenting vessels without differentiating the subtypes. 2) As detailed in “Evaluation metrics”, the binary classification metric was computed for each class (artery, uncertain pixels, and vein), e.g., artery versus background in the artery channel, to obtain F1-scores F_a, F_u, and F_v, which were then weighted by the ratio of the pixel counts. We argue that this bespoke measure provides a more intuitive and direct assessment for segmenting all vessel subtypes. Each pixel of the retinal photograph belongs to one of a wider range of classes (artery, uncertain pixels, vein, and background) compared to binary classification (vessel and background) [15]. A direct comparison may not be appropriate.
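
The weighted F1-score described in point 2) could be computed along the following lines. This is a sketch under stated assumptions: the weights are taken to be each class's share of ground-truth pixels, and the function names and toy data are hypothetical rather than drawn from the paper's code.

```python
def f1(pred, truth):
    """Binary F1-score for one channel (class versus everything else)."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def weighted_f1(pred_channels, truth_channels):
    """Combine per-class F1-scores, weighted by ground-truth pixel counts."""
    counts = [sum(t) for t in truth_channels]
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(c / total * f1(p, t)
               for c, p, t in zip(counts, pred_channels, truth_channels))

# Hypothetical six-pixel example, channel order (artery, uncertain, vein).
truth = [[1, 1, 0, 0, 0, 0],
         [0, 0, 1, 0, 0, 0],
         [0, 0, 0, 1, 1, 1]]
pred = [[1, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0],
        [0, 1, 0, 1, 1, 1]]
score = weighted_f1(pred, truth)
```

Here F_a = 2/3, F_u = 1, and F_v = 6/7, with weights 2/6, 1/6, and 3/6, so an artery pixel mislabeled as vein lowers both F_a and F_v.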

The identical P-values come from the rank-based non-parametric Mann-Whitney U test and indicate a consistent metric ranking of the proposed method and [2], which is indeed an interesting observation.
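
Why a rank-based test can return identical P-values for different metrics is illustrated by the sketch below. It implements the two-sided Mann-Whitney U test with the normal approximation and continuity correction, without tie correction. The sample size of 10 scores per method and all score values are hypothetical; the feedback does not state how many samples were compared or which variant of the test was used.

```python
import math

def mann_whitney_p(a, b):
    """Two-sided P-value via the normal approximation with continuity
    correction (no tie correction)."""
    n1, n2 = len(a), len(b)
    # U counts the pairs in which a value from `a` exceeds one from `b`.
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = max(abs(u - mean) - 0.5, 0.0) / sd
    return min(1.0, math.erfc(z / math.sqrt(2)))

# Hypothetical scores: in both comparisons every score of the first group
# beats every score of the second, but by very different margins.
ours = [0.70 + 0.001 * i for i in range(10)]
baseline_close = [0.69 - 0.001 * i for i in range(10)]
baseline_far = [0.50 - 0.001 * i for i in range(10)]

p_close = mann_whitney_p(ours, baseline_close)
p_far = mann_whitney_p(ours, baseline_far)
```

Since only the ranks enter the statistic, any two groups that are completely separated yield the same P-value for a given sample size, whatever the margin. With 10 values per group this approximation gives roughly 1.8e-4.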

R2.2: Key ablation studies are missing, i.e., comparisons between the merged results and multi-class/binary-class results. This important comparison was indeed included in the last two rows of Table 2 (merged results and multi-class results) and “Ensemble” in Table 1 (ensemble of binary-class results). We will highlight them in text.

R3.3: We appreciate several suggestions for clarity, including “I am not clear if the method predicts uncertain pixels or not?” and “it seems the model is solving artery/vein classification assuming the vessel map is given?”. We would like to further emphasise that, as described in Sect. 2.2 and Fig 3, the input of the model is a retinal photograph and the output is a three-channel probability map (artery, uncertain pixels, and vein), and no vessel map is needed in inference.

R3.4: Reported metrics in Table 1 are global, instead of local. Qualitative local results were provided in Fig 5, given the lack of a widely accepted definition of a local metric. We believe this is an interesting future research question.

R3.5: Hyperparameter tuning is sensitive, and tuning the hyperparameters for each dataset is a bit weak. As introduced in “Implementation and Training”, all hyperparameters are fixed. We will update Supplementary Fig 1 to reflect the stability.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have addressed most of the issues raised by the reviewers. I recommend accepting this paper. However, one important reference is missing: Retinal Vascular Network Topology Reconstruction and Artery/Vein Classification via Dominant Set Clustering, IEEE Transactions on Medical Imaging, 2020, 39(2): 341-356.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    3



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors responded to the major concerns, such as the technical issues and the experiments. Thus, I prefer to accept.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Generally, the reviewers gave consistently positive feedback on this paper in the first-round review, and the authors have addressed or clarified most major concerns raised by the reviewers and the meta-reviewer. Thus, the meta-reviewer would like to suggest “Accept” for this paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    9


