# Authors

Youyi Song, Lequan Yu, Baiying Lei, Kup-Sze Choi, Jing Qin

# Abstract

Learning from external data is an effective and efficient way of training deep networks, which can substantially alleviate the burden of collecting training data and annotations. It is of great significance in improving the performance of CT image segmentation tasks, where collecting a large amount of voxel-wise annotations is expensive or even impractical. In this paper, we propose a generic selective learning method to maximize the performance gains of harnessing external data in CT image segmentation. The key idea is to learn a weight for each external data sample such that ‘good’ data can have large weights and thus contribute more to the training loss, thereby implicitly encouraging the network to mine more valuable knowledge from informative external data while suppressing the memorization of irrelevant patterns from ‘useless’ or even ‘harmful’ data. In particular, we formulate our idea as a constrained non-linear programming problem, solved by an iterative procedure that alternately conducts weight estimation and network updating. Extensive experiments on abdominal multi-organ CT segmentation datasets show the efficacy and performance gains of our method against existing methods. The code is publicly available.

SharedIt: https://rdcu.be/cyhMi

# Reviews

### Review #1

• Please describe the contribution of the paper

This paper proposes a generic selective learning method to maximize the performance gains of harnessing external data in CT image segmentation. The pipeline learns a weight that adjusts the importance of each external data sample in the training loss, while imposing a hard constraint that forces the network to learn better than it would without external data. The problem is solved by alternately performing weight estimation and network updating. Experiments on two abdominal multi-organ CT segmentation datasets show that the proposed method outperforms state-of-the-art methods in multiple scenarios.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

• Learning from external data: not widely investigated but crucial issue!
• New generic selective learning formulation as a constrained non-linear programming problem
• Methodological contributions assessed through a detailed ablation study and multiple scenarios

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

• One single assessment metric only (Dice): using other metrics (ASSD, MSSD…) could strengthen the experimental part

• Please rate the clarity and organization of this paper

Very Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The associated code will be released upon publication. The availability of the two abdominal multi-organ CT datasets is not mentioned.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

The method is of high interest for the medical image analysis community. The submitted paper is innovative and very well written. The following comments could be taken into account for further improvements.

• In Eq.2a, why not also use a weight associated with each pair $\{x_m, y_m\}$ of the internal dataset (as for data from the external dataset), so that the impact of potential internal outliers can be removed or (at least) reduced?
• The “feasibility analysis” (Sect.2.2) is not really convincing…
• It would have been relevant to exploit other metrics (ASSD, MSSD…) in addition to the Dice to strengthen the analysis and resulting conclusions
• The article does not mention any cross-validation strategy. In particular, do you employ cross-validation for the simulations shown in Fig.2 and 3?

• The discrepancy between the two distributions $\mathcal{D}_n$ and $\mathcal{D}$ (from the external and internal datasets, respectively) relies on a loss value averaged over all internal data. You should explain how this overcomes the drawback mentioned in Sect.1: “how to judge the loss value to be large enough for selecting external data remains elusive and is often done in a heuristic manner”.
• Dice gains between the proposed and state-of-the-art (DS, RW, DD) methods could be confirmed by a statistical analysis using t-tests
• Sect.3 could be improved by providing qualitative (i.e. visual) organ segmentation results in addition to the quantitative ones.
• Two abdominal multi-organ CT datasets are employed in your experiments. As a perspective, could your approach be used for (much more) different datasets (e.g. multi-modal)?
• Typos: 1- please add “:” before Eq.1, 2- a verb is missing in the 2nd sentence of the “Feasibility analysis” paragraph, 3- “epochs” instead of “epoches” in Sect.3.1
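
The suggestion above to confirm Dice gains with t-tests could be carried out roughly as follows. This is a minimal sketch, assuming per-volume Dice scores are available for the same test volumes under two methods (a paired design). The function name `paired_t_statistic` and the score lists are hypothetical, and the numbers are made up for illustration, not taken from the paper.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test of a against b."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally sized samples with at least 2 cases")
    # Per-case differences: each test volume is its own control.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0.0:
        raise ValueError("all paired differences are identical; t is undefined")
    n = len(diffs)
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


# Made-up per-volume Dice scores, purely illustrative (not results from the paper).
dice_proposed = [0.86, 0.88, 0.84, 0.90, 0.87]
dice_baseline = [0.84, 0.85, 0.83, 0.86, 0.86]

t_stat, dof = paired_t_statistic(dice_proposed, dice_baseline)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

The statistic would then be compared with the critical value of Student's t distribution for the given degrees of freedom (for example 2.776 for a two-sided test at the 0.05 level with 4 degrees of freedom), or passed to a statistics package to obtain a p-value.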

accept (8)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

• Strong methodological contributions on selective learning from external data

• What is the ranking of this paper in your review stack?

1

• Number of papers in your stack

5

• Reviewer confidence

Confident but not absolutely certain

### Review #2

• Please describe the contribution of the paper

This paper proposes a method for optimising training of neural networks when training data from more than 1 source is available and where the sources have slightly different distributions.

A new optimisation scheme is proposed where weights are assigned to individual data instances from the external source datasets. These weights are then learnt in an alternating manner with the network weights/parameters.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper addresses an important topic that is relevant for most neural network training scenarios. The proposed optimisation scheme can be applied in conjunction with many different network architectures and therefore has potentially strong impact.

Comparison with 3 other state-of-the-art methods shows that the proposed approach consistently outperforms them. Evaluation has been performed on two public multi-organ segmentation datasets and 8 different organs for each volume, which indicates generalisable results. Experiments are carefully described, and a hyper-parameter variance study and an ablation study are presented.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

There are a number of fuzzy expressions and descriptions which should be explained in more detail or rewritten to make the paper more readable and understandable (see comments below).

There is no discussion of the computational performance of the proposed approach. The authors criticise the method of Ren et al. for being computationally expensive but give no details about the proposed method’s performance. The performance is likely to have a great impact on whether the community will adopt the selective training strategy in its own work.

• Please rate the clarity and organization of this paper

Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors want to (publicly) release the code after publication. In general, the methodology and parameter settings are well described and sufficient references are provided for reimplementation.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

The paper proposes an optimisation scheme for neural networks when training data from more than 1 source (internal + external data) is available and where the sources have slightly different distributions. The main idea is to calculate the difference of the loss of an external instance to the average loss over all internal instances. Several constraints are added to avoid trivial solutions.
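
The main idea summarised above can be sketched in a few lines. This is only an illustration under stated assumptions: the discrepancy is taken to be the absolute gap between an external sample's loss and the mean internal loss, and the weight step is a softmax over negative discrepancies rescaled so that the weights sum to the number of external samples. The softmax rule, the `temperature` parameter and the function names are hypothetical stand-ins, not the constrained solver used in the paper.

```python
import math


def discrepancies(external_losses, internal_losses):
    """Absolute gap between each external loss and the mean internal loss."""
    mean_internal = sum(internal_losses) / len(internal_losses)
    return [abs(loss - mean_internal) for loss in external_losses]


def estimate_weights(disc, temperature=1.0):
    """Hypothetical weight step: small discrepancy -> large weight.

    The weights are non-negative and rescaled to sum to N = len(disc),
    which rules out the trivial all-zero solution.
    """
    n = len(disc)
    scores = [math.exp(-d / temperature) for d in disc]
    total = sum(scores)
    return [n * s / total for s in scores]


# Toy loss values, made up for illustration.
internal = [0.20, 0.25, 0.15]
external = [0.22, 0.90, 0.18, 2.50]

d = discrepancies(external, internal)
w = estimate_weights(d, temperature=0.5)
for loss, disc_value, weight in zip(external, d, w):
    print(f"loss={loss:.2f}  discrepancy={disc_value:.2f}  weight={weight:.3f}")
```

External samples whose loss is close to the internal average receive the largest weights, while the sample with a very different loss is almost switched off.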

The methodology is in general well described and evaluation has been done comprehensively with very good results. There are a number of expressions which should be clarified, though, as they are sometimes difficult to interpret:

• I begin with line 172 as this is most important: “For a fair comparison, we used the same experimental setup when implementing these methods, and their hyperparameters were set to the recommended values by the authors”. Does that mean all methods are using the same U-Net architecture with the same hyper-parameters and differ only in the respective sampling/weighting strategy? I interpret this as yes. If not, it would devalue a large part of the results as the individual accuracy could not be directly compared. Just to double check.

• line 86: “f_M is the optima without using external data”. This could be rephrased to “f_M is the optimum of the loss trained on internal data only”. The definition of f_M is given afterwards, but on reading line 86 it could be interpreted as a theoretical optimum. It could also be added here that f_M has to be computed first, which requires a complete training cycle on the internal data.

• line 113: “The first term in (3) is small when external data with a small discrepancy value having been assigned a large weight”. This is counterintuitive, as a smaller weight would obviously lead to an even smaller term. It should be added that there is a constraint that enforces the sum of all weights to be N.

• line 180: “..we randomly selected 50% data (CT volumes not 2D slices) for training and the remaining 50% for testing..”. Generally, it would be good to include cross-validation, especially as the single datasets are rather small (40+). There are also no tests whether the improvements over the other state of the art methods are statistically significant. Standard deviations are also not provided.

• line 200: internal data alleviation: Using 10%, 30%, 50% of the internal data essentially means training on 4, 13, 20+ instances only. It is a bit unclear under what circumstances this would make sense instead of just swapping internal and external data.

• the term “weights” is usually used to denote network parameters. In order to avoid confusion with the weights associated to the external data instances, maybe a different name could be used, e.g. instance weights or similar. E.g. if someone is browsing the paper and looks at Figure 1, “weights” could be interpreted as network weights.

• there should be some discussion about the limitations of the proposed method.
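
The remark on line 113 above can be made concrete with a small worked example. It assumes, for illustration only, that the first term in (3) is a weighted sum of discrepancies $\sum_n w_n d_n$ with $w_n \ge 0$ and $\sum_n w_n = N$; the symbol $d_n$ and the numbers are hypothetical and not taken from the paper.

```latex
% Two external samples (N = 2) with discrepancies d = (0.1, 0.9).
\begin{aligned}
w = (1, 1) &:\quad \sum_n w_n d_n = 1 \cdot 0.1 + 1 \cdot 0.9 = 1.0 \\
w = (2, 0) &:\quad \sum_n w_n d_n = 2 \cdot 0.1 + 0 \cdot 0.9 = 0.2 \\
w = (0, 2) &:\quad \sum_n w_n d_n = 0 \cdot 0.1 + 2 \cdot 0.9 = 1.8 \\
w = (0, 0) &:\quad \sum_n w_n d_n = 0 \quad \text{(excluded, since } \textstyle\sum_n w_n \neq N \text{)}
\end{aligned}
```

Under the sum constraint, shrinking every weight is not an option, so the term can only be reduced by moving weight towards the samples with small discrepancy.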

strong accept (9)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The proposed methodology is sound and very well evaluated and shows a clear advantage over the state of the art.

• What is the ranking of this paper in your review stack?

1

• Number of papers in your stack

5

• Reviewer confidence

Confident but not absolutely certain

### Review #3

• Please describe the contribution of the paper

This paper addresses a common problem, namely maximizing the performance gains from external data, and evaluates the effectiveness of the method on a segmentation task. It leverages a non-linear optimization technique to learn the weights of external data and update the network parameters.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. Valuable motivation: The paper addresses how to use external data, given the scarcity of data in medical imaging. This benefits both research on medical imaging algorithms and their practical use.
2. Strong evaluation: The proposed method is applied to a segmentation task, and the authors give a reasonable comparison with SOTA methods, which shows the performance of the proposed method, that is, selecting “good” data, suppressing “useless” data, and using external data to optimize the task.
3. Clear description: The authors express the idea well in formulas, which helps readers understand the details.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

• Is there a visualization, in the supplementary materials or the experiments section, of the selected external data that visually demonstrates the effectiveness of the proposed method?

• Please rate the clarity and organization of this paper

Very Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors provide the code, which increases the reproducibility value of the proposed method.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

The authors could add visual sample comparisons to show how the proposed method can guide the construction of datasets. In actual clinical research, it is tedious and time-consuming to construct a good dataset for training. The method in this article is dedicated to choosing good samples. If the visualization results can be interpreted and can assist in the construction of the dataset, they will help speed up this process.

The authors mention that the constraint is added after the 10th epoch. Is the proposed method trained end-to-end, or does it need to be added after obtaining a good baseline?

accept (8)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper focuses on a key issue in the field of medical imaging, namely the use of external data, with clear motivation, clear description, reasonable evaluation and excellent results. From the perspective of methodology and results, the proposed method is more innovative. What is more encouraging is that the authors provide the code, which increases the reproducibility value of the proposed method.

• What is the ranking of this paper in your review stack?

1

• Number of papers in your stack

2

• Reviewer confidence

Confident but not absolutely certain

# Primary Meta-Review

• Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

Based on the reviewers’ comments, we think this is a high-quality paper with significant technical contributions. Authors should address reviewers’ comments in the camera-ready submission.

• What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

2

# Author Feedback

We thank all reviewers for their positive comments on our paper, which are helpful and constructive for improving its quality. Below are our responses to the main concerns.

Q1: The availability of the datasets. (R1) They are publicly available and were collected by other researchers; we shall provide the download link.

Q2: Why not also learn a weight for internal data? (R1) Here we do not consider any problems in the internal data; we assume that they are of very high quality for learning, and so follow the canonical way of learning from them.

Q3: Is it possible to apply our method to other, different datasets, e.g. multi-modal? (R1) The learning framework is able to work on different datasets. As we claimed, it is a generic method to selectively learn from external data. However, we suggest modifying the discrepancy measure of distributions if necessary; we are not sure that the discrepancy measure used works well for all datasets.

Q4: Are methods compared under the same U-Net architecture and the same hyper-parameters? (R2) Yes. All methods have the same network architecture and the same learning settings, including the learning rate, optimizer, training epochs, etc.

Q5: Is our method trained end-to-end, or does it require a good baseline first? (R3) Our method is trained end-to-end. We train the network from scratch, rather than from a good baseline. Starting the selective learning only after a certain epoch is just for speeding up the training.
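
The training schedule described in this answer can be sketched on a toy problem. This is a minimal sketch under loud assumptions: a one-parameter linear model replaces the U-Net, the weight step is a hypothetical softmax over negative loss discrepancies, and the hard constraint of the paper (doing better than the internal-only optimum) is not modelled. Only the overall shape follows the text: training starts from scratch with uniform weights, and after a warm-up of 10 epochs (the figure Reviewer #3 mentions) weight estimation and parameter updates alternate.

```python
import math


def loss(theta, x, y):
    return (theta * x - y) ** 2


def grad(theta, x, y):
    return 2.0 * (theta * x - y) * x


def estimate_weights(theta, internal, external, temperature=1.0):
    """Hypothetical weight step: softmax over negative loss discrepancy."""
    mean_int = sum(loss(theta, x, y) for x, y in internal) / len(internal)
    disc = [abs(loss(theta, x, y) - mean_int) for x, y in external]
    scores = [math.exp(-d / temperature) for d in disc]
    total = sum(scores)
    return [len(external) * s / total for s in scores]  # sums to N


def train(internal, external, epochs=200, warmup=10, lr=0.05):
    theta = 0.0                       # trained from scratch
    weights = [1.0] * len(external)   # uniform weights during warm-up
    for epoch in range(epochs):
        if epoch >= warmup:           # selective learning starts here
            weights = estimate_weights(theta, internal, external)
        g_int = sum(grad(theta, x, y) for x, y in internal) / len(internal)
        g_ext = sum(w * grad(theta, x, y)
                    for w, (x, y) in zip(weights, external)) / len(external)
        theta -= lr * (g_int + g_ext)  # one model update per epoch
    return theta, weights


# Toy data: internal samples follow y = 2x; two external samples agree with
# that relation and two contradict it (y = -3x).
internal = [(1.0, 2.0), (2.0, 4.0)]
external = [(1.0, 2.1), (2.0, 3.9), (1.0, -3.0), (2.0, -6.0)]

theta, weights = train(internal, external)
theta_uniform, _ = train(internal, external, warmup=200)  # never selects
print(f"selective: theta={theta:.3f}, weights={[round(w, 3) for w in weights]}")
print(f"uniform:   theta={theta_uniform:.3f}")
```

With these toy numbers, uniform weighting settles at a slope of roughly 0.75 because the contradictory samples pull the fit away, while the selective run ends close to the internal slope of 2 with the contradictory samples weighted near zero.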

We shall also revise the paper in line with the other valuable comments. Thanks again for the time and effort spent on helping to improve the quality of this paper.