
Authors

Ruiheng Chang, Dong Wang, Haiyan Guo, Jia Ding, Liwei Wang

Abstract

Ultrasound segmentation models provide powerful tools for the diagnosis process of ultrasound examinations. However, developing such models for ultrasound videos requires densely annotated segmentation masks for all frames in a dataset, which is impractical and unaffordable. Therefore, we propose a weakly-supervised learning (WSL) approach to accomplish the goal of video-based ultrasound segmentation. By only annotating the location of the start and end frames of the lesions, we obtain frame-level binary labels for WSL. We design a Video Co-Attention Network to learn the correspondence between frames, where CAM and co-CAM will be obtained to perform lesion localization. Moreover, we find that the essential factor in the success of extracting video-level information is applying our proposed consistency regularization between CAM and co-CAM. Our method achieves an mIoU score of 45.43% on the breast ultrasound dataset, which significantly outperforms the baseline methods. The code for our models will be released.
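
The annotation scheme described in the abstract (marking only the start and end frames of each lesion) can be illustrated with a minimal Python sketch. The function name `frame_labels`, the inclusive `(start, end)` index convention, and the support for several spans per video are assumptions made for illustration; the abstract does not specify how the annotations are stored.

```python
from typing import List, Tuple


def frame_labels(num_frames: int, lesion_spans: List[Tuple[int, int]]) -> List[int]:
    """Expand (start, end) frame annotations into one 0/1 label per frame.

    Assumption: spans are zero-based and inclusive on both ends.
    """
    labels = [0] * num_frames
    for start, end in lesion_spans:
        if not 0 <= start <= end < num_frames:
            raise ValueError(f"invalid span ({start}, {end}) for {num_frames} frames")
        # Every frame between the annotated start and end is marked as lesion.
        for i in range(start, end + 1):
            labels[i] = 1
    return labels


# A 10-frame clip whose lesion is visible from frame 3 through frame 6.
labels = frame_labels(10, [(3, 6)])
```

Under these assumptions, two integers per lesion are enough to label every frame of a clip, which is the annotation saving the abstract refers to.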

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87237-3_62

SharedIt: https://rdcu.be/cymbp

Link to the code repository

N/A

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

This paper presents a weakly-supervised ultrasound video segmentation method in the context of breast lesions. The locations of the start and end frames of the lesions are the only source of supervision for the task of lesion segmentation, hence this work falls in the category of “weakly-supervised learning”. The authors propose a co-attention mechanism that learns the frame-wise correspondence, and they use and optimize class activation maps (CAMs) to segment lesions. The proposed method outperforms baseline models in terms of the mIoU score.
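
For readers unfamiliar with the metric, the following Python sketch shows one common way to compute a mean IoU over binary masks. It assumes the mean is taken over frames and that an empty prediction against an empty target scores 1.0; neither convention is stated in the review or the abstract, and the function names are hypothetical.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # H x W grid of 0/1 values


def binary_iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks of the same shape."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_iou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Average the per-frame IoU over all frames (assumed averaging scheme)."""
    scores = [binary_iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


preds = [[[1, 1], [0, 0]], [[0, 1], [0, 1]]]
targets = [[[1, 0], [0, 0]], [[0, 1], [0, 1]]]
score = mean_iou(preds, targets)  # first frame 0.5, second frame 1.0
```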
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The strengths of the paper are listed as follows in terms of technical contributions and clinical relevance:
- The paper is relatively easy to follow in terms of technical expression. It is well organized and self-contained. Most details are well conveyed. The investigated issue involves segmenting lesions from ultrasound videos, which has clinical relevance because it can provide real-time feedback during breast ultrasound examinations.
- The proposed video co-attention network has some technical novelty. The authors introduce a co-attention module that models the correlation between the CAMs of two frames. By applying a consistency loss, CAM’s bias towards larger regions is alleviated to some extent.
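
As background for the second point, here is a small Python sketch of the standard class activation map (a weighted sum of the final feature maps using the classifier weights of one class), followed by a generic consistency term between two maps. The mean absolute difference used in `l1_consistency` is an assumption chosen for illustration and is not taken from the paper, whose exact loss is not given in this review; both function names are hypothetical.

```python
from typing import List

Map2D = List[List[float]]


def class_activation_map(features: List[Map2D], weights: List[float]) -> Map2D:
    """Standard CAM: sum over channels k of weights[k] * features[k]."""
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, w_k in zip(features, weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += w_k * fmap[y][x]
    return cam


def l1_consistency(map_a: Map2D, map_b: Map2D) -> float:
    """Mean absolute difference between two maps (illustrative choice only)."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(map_a, map_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count


# Two 2x2 feature channels and the classifier weights for the lesion class.
features = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [0.0, 0.0]]]
cam = class_activation_map(features, [0.5, 1.0])
```

A term of this kind is zero when the two maps agree everywhere and grows as they diverge, which is the general idea behind penalizing disagreement between CAM and co-CAM.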
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
The weaknesses of the paper are listed as follows in terms of technical details:
- In terms of experimental results, cross-validation and statistical hypothesis testing are usually expected for an in-house dataset. This part is missing from the paper.
- Section 4.4 (Ablation study) is actually about hyper-parameter tuning. The reviewer expected the impact of each of the three loss terms to be analyzed separately. How important is the nGWP structure as a practical design choice?
- In video-related analysis, the processing speed in frames per second is a key factor to consider. This is missing from the experimental section.
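
The processing speed requested in the last point could be measured along the following lines. This is a generic Python timing sketch, not the authors' protocol: `measure_fps`, the `warmup` parameter, and the stand-in `process_frame` callable are all hypothetical, and a GPU model would additionally need device synchronization before each timestamp.

```python
import time
from typing import Any, Callable, Iterable


def measure_fps(process_frame: Callable[[Any], Any],
                frames: Iterable[Any],
                warmup: int = 2) -> float:
    """Return frames processed per second, excluding the first `warmup` frames."""
    frame_list = list(frames)
    # Warm-up calls are run but not timed (caches, lazy initialization).
    for frame in frame_list[:warmup]:
        process_frame(frame)
    timed = frame_list[warmup:]
    if not timed:
        raise ValueError("need more frames than warm-up iterations")
    start = time.perf_counter()
    for frame in timed:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    return float("inf") if elapsed == 0 else len(timed) / elapsed


# Stand-in workload; a real measurement would call the segmentation model.
fps = measure_fps(lambda frame: sum(range(1000)), range(20))
```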
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors state that the code will be open-sourced. The reviewer wonders whether the in-house dataset will be released as well. The architecture is well explained. Overall, the reproducibility should be good.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
- The reviewer suggests enriching the experimental results with cross-validation and further ablation studies, as stated in Question 5.
- The baselines do not include any published work, such as references [1] and [20]. Broader benchmarking is expected, especially for an in-house dataset.
- Put at least one figure of segmentation results that contain comparisons between baselines in the main paper, not the supplementary material.
Please state your overall opinion of the paper

Probably accept (7)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Technical novelty and relevance to clinical applications are the major factors behind the reviewer’s overall score.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

5
Reviewer confidence

Very confident

Review #2

Please describe the contribution of the paper

The paper proposes a weakly-supervised video segmentation method for ultrasound video. The main contribution of the paper is the consistency loss between CAM (Class Activation Map) and co-CAM (Co-Attention CAM).
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The proposed consistency loss is very effective as shown in the experimental results.
- The authors promise to release code, which will be very helpful for reproducibility.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Because CAM and co-CAM were proposed in prior work, “CAM and co-CAM without consistency loss” should be one of the baselines in Table 1.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The code will be released.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
- Please give the full names of CAM and co-CAM in the introduction.
- In the 4th paragraph on page 5, “discriminate” should be “discriminative”.
Please state your overall opinion of the paper

Borderline accept (6)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
- The consistency loss is very effective as shown by the experiments.
- One concern is the lack of a more convincing explanation of why CAM and co-CAM without consistency work for natural video tasks but are entirely unhelpful in this paper.
What is the ranking of this paper in your review stack?

3
Number of papers in your stack

3
Reviewer confidence

Confident but not absolutely certain

Review #3

Please describe the contribution of the paper

This paper proposes a weakly supervised segmentation network for ultrasound video. CAM and co-attention are adopted in the network to generate pseudo labels, and a consistency loss between CAM and co-CAM is proposed.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- This paper is well written and organized.
- The consistency loss between CAM and co-CAM seems novel, and the stated rationale is reasonable.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Both CAM and co-CAM are from published work (Ref [28] and Ref [20]).
- No comparison with other published SOTA methods.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The paper provides sufficient details about the models/algorithms, datasets, and evaluation.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
- Both CAM and co-CAM predict a larger region than the ROI. Why does co-CAM make aggressive predictions?
- Since CAM and co-CAM both predict larger regions, is it possible that the consistency loss between these two modules leads to a pseudo label with a larger ROI?
- The authors should compare their method with SOTA methods.
Please state your overall opinion of the paper

Probably accept (7)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
- The presentation of this paper is clear.
- The method is somewhat novel.
- They used ultrasound video instead of 2D images, which is more clinically relevant.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

5
Reviewer confidence

Very confident

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The paper received three reviews with positive feedback. The reviewers collectively raise several concerns. The authors are encouraged to address those in the final version. These issues include statistical hypothesis testing, proper ablation study, qualitative comparisons between baselines, and proper selection of baselines.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

3

Author Feedback

N/A


Weakly-Supervised Ultrasound Video Segmentation with Minimal Annotations