
# Authors

SeokHwan Oh, Myeong-Gee Kim, Youngmin Kim, Hyuksool Kwon, Hyeon-Min Bae

# Abstract

In this paper, we present a scalable lesion-quantifying neural network based on B-mode-to-quantitative neural style transfer. Quantitative tissue characteristics have great potential in diagnostic ultrasound since pathological changes cause variations in biomechanical properties. The proposed system simultaneously provides four clinically critical quantitative tissue images, namely sound speed, attenuation coefficient, effective scatterer diameter, and effective scatterer concentration, by applying quantitative style information to structurally accurate B-mode images. The proposed system was evaluated through numerical simulation and phantom and ex-vivo measurements. The numerical simulation shows that the proposed framework outperforms the baseline model as well as existing state-of-the-art methods while achieving a significant reduction in parameters per quantitative variable. In phantom and ex-vivo studies, the BQI-Net demonstrates sufficient sensitivity and specificity in identifying and classifying cancerous lesions.

SharedIt: https://rdcu.be/cyhU6

# Reviews

### Review #1

• Please describe the contribution of the paper

The paper describes a new neural network for quantitative ultrasound (QUS) of lesions. The input is RF ultrasound data and the output is QUS maps of speed of sound, attenuation coefficient, effective scatterer diameter and effective scatterer concentration. Training is done on k-Wave simulations, and tests are done on physical ultrasound phantoms and ex vivo bovine muscle with artificial lesions to mimic cancer. Results are compared to a “ground truth” whose source is unclear.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

QUS is a very hot topic in ultrasound and of interest to many MICCAI researchers. This is because quantitative ultrasound reduces the dependence on operator expertise, which improves access to ultrasound capabilities in multiple clinical applications. This includes cancer detection, which is of huge importance. The four chosen metrics are of general interest. The proposed network appears to work, so it is promising to have a complete neural network approach to QUS generation.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The main weakness is the lack of validation on real tissue in vivo for a specific clinical task. The use of k-Wave simulations is a good start, but human tissue is well known to be non-linear and unlike simple simulations. The physical ultrasound phantoms are promising, but they are not described in enough detail to fully understand them, nor are they likely to match human tissue. Finally, the imitation cysts are also not described clearly and are also unlikely to match cancerous lesions. This makes it hard to evaluate the success of the proposed method. Furthermore, these QUS parameters can be measured with classical techniques, which does not appear to have been done. The “ground truth” is still unclear to me: is it provided by the manufacturer (the phantoms in Fig. 3 look like CIRS phantoms)? The main weakness of replacing a classical QUS measurement with a neural network is the confidence in and repeatability of its performance, which have not been addressed in the current paper.

• Please rate the clarity and organization of this paper

Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The simulations and phantoms are not described in enough detail to be able to replicate the results.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

The paper would be improved by including results on human tissue in vivo, with ground truth provided by independent repeated measurements so that the level of uncertainty of the ground truth can be reported. This is critical for a paper proposing to use a neural network to provide quantitative measurements. Also, justify why a plane-wave pulse-echo sequence is used, since focused beamforming is far more common; it is not clear why the high acquisition rate of plane-wave imaging would be helpful. The Introduction should also make clear that these QUS measurements can be done with standard algorithms and that there is a body of literature on improving classical methods (which have the advantage of explainability over NN approaches).

Also, if the focus is on cancer, then a cancer-specific approach to QUS is needed, i.e., describe how the QUS will be used and what accuracy is needed. The last sentence, “The proposed system … shows high potential for clinical purpose, especially in early detection and differential diagnosis of cancer.”, is not justified by the results of this paper since no real cancer images were used.

probably reject (4)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

There is no comparison with the easy-to-implement classical QUS measurements, so it is hard for the reader to have confidence in the results from the proposed NN. The impact is also limited by not using any real cancer images.

• What is the ranking of this paper in your review stack?

4

• Number of papers in your stack

5

• Reviewer confidence

Very confident

### Review #2

• Please describe the contribution of the paper

The paper presents a nice application of the style transfer paradigm, giving a potentially clinically useful and novel method for increasing specificity and perhaps sensitivity as well. The authors implement a novel feed-forward neural style transfer framework and a multi-resolution B-mode content encoder (images from RF data), fed into a quantitative image decoder to yield a spatially and contrast-accurate image reconstruction. Four relevant tissue characteristic parameters are recovered with substantial improvement over more standard methods on simulated data. Phantom and ex-vivo measurements were also used to verify the accuracy of this method, which is usable with presently available transducers in the clinic.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The authors present a neural network based on B-mode-image-to-quantitative-information transfer. The quantitative tissue characteristics are speed of sound (SOS), attenuation coefficient (AC), effective scatterer diameter (ESD) and effective scatterer concentration (ESC). These measurements have clinical importance in determining whether a lesion is malignant or benign.
The images are obtained by a B-mode-data-to-quantitative-imaging network (BQI-Net), which performs multi-variable quantitative image reconstruction with enhanced specificity, sensitivity and other well-known image metrics. This creates clinically informative quantitative parameter images, thus enhancing diagnostic capability. The architecture is multilevel and consists of 1) a B-mode contents encoder extracting geometric image information from B-mode ultrasound images generated from RF signals, 2) a style encoder extracting designated quantitative information (SOS etc.) and 3) a decoder synthesizing a quantitative image from the encoders’ outputs. The B-mode contents encoder captures the semantic contents of tissue geometry at multiple resolutions: first a standard B-mode image is formed, then successive content features are obtained by pooling layers after convolutional layers, at resolutions decreasing from 128 × 128 to 16 × 16 by successive halving. The style encoder uses conditional instance normalization, which first normalizes its input (derived from the beamformed RF signal) by subtracting the mean and dividing by the standard deviation, and then scales and shifts it by suitable factors. Finally, the quantitative image decoder translates the contents from the B-mode contents encoder into the quantitative image using the output of the style encoder. This is done at all four resolutions: 128, 64, 32 and 16 square. This B-mode-to-quantitative image translation is achieved using spatially-adaptive denormalization (SPADE) followed by a series of residual convolution blocks, each operating at the appropriate resolution level. The associated scale and shift factors are learned.
The multi-resolution subnetworks generate a detailed image superior to standard up-sampling methods.
Appropriate regularization terms and the Adam optimizer are used with a specified learning rate. Dropout with a retention probability of 0.5 is used for better generalization. Numerical simulation, phantoms and ex-vivo measurements of bovine muscle with insertions imitating cyst, benign and malignant tissue, acquired with a 5 MHz Verasonics linear array, were used.
The results of the numerical simulation showed consistent superiority to three other encoder-decoder pairs, one of which also had subnetworks for multi-scale representation. The phantom results were compared with standard imaging methods and showed improvement. The results of the ex-vivo muscle measurements were impressive: the SOS, AC, ESD and ESC for the cyst were close to ground truth.
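To make the conditional instance normalization (CIN) step described above concrete, here is a minimal NumPy sketch of CIN as it is commonly defined: features are normalized per channel over space, then scaled and shifted by a learned (γ, β) pair selected by the requested quantitative parameter. The `PARAMS` ordering, the random γ/β tables and the feature shapes are illustrative assumptions, not BQI-Net's actual implementation.

```python
import numpy as np

# Hypothetical ordering of the four quantitative targets (illustration only).
PARAMS = ["SOS", "AC", "ESD", "ESC"]

def conditional_instance_norm(x, cond_idx, gammas, betas, eps=1e-5):
    """Conditional instance normalization of one feature map.

    x: features of shape (C, H, W) for a single sample.
    cond_idx: index of the requested quantitative parameter.
    gammas, betas: (num_conditions, C) tables, one scale/shift vector per
        condition (random here; learned in a real network).
    """
    mean = x.mean(axis=(1, 2), keepdims=True)   # per-channel spatial mean
    std = x.std(axis=(1, 2), keepdims=True)     # per-channel spatial std
    x_hat = (x - mean) / (std + eps)            # instance-normalized features
    gamma = gammas[cond_idx][:, None, None]     # condition-specific scale
    beta = betas[cond_idx][:, None, None]       # condition-specific shift
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
C, H, W = 8, 16, 16
x = rng.normal(size=(C, H, W)) * 3.0 + 5.0
gammas = rng.normal(size=(len(PARAMS), C))
betas = rng.normal(size=(len(PARAMS), C))

y_sos = conditional_instance_norm(x, PARAMS.index("SOS"), gammas, betas)
y_ac = conditional_instance_norm(x, PARAMS.index("AC"), gammas, betas)
```

Switching the condition index (e.g., SOS vs. AC) changes only the affine parameters, which is how one shared encoder can be steered toward different parameter maps.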

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

It would have been valuable to know how the ground truth for SOS, AC, ESD and ESC was established for these cases. Also, literature values are available for the speed of sound and attenuation of bovine muscle; it would have been useful to compare the values obtained from BQI-Net with these literature values. It would also have been useful to see the performance of the other standard encoder-decoder (ED) networks on the phantom and bovine data.

• Please rate the clarity and organization of this paper

Very Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Reproducible results; meets the requirements.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

See 3 and 4 above.
The paper is well written. The science is good. The idea appears novel. This is a nice application of the style transfer paradigm.

strong accept (9)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

See 3 above. The paper is well written and this appears to be a novel application of the style transfer paradigm. The results of the simulations, phantom images and ex-vivo imaging are all well done. The description of the BQI-Net structure is good.

• What is the ranking of this paper in your review stack?

2

• Number of papers in your stack

5

• Reviewer confidence

Very confident

### Review #3

• Please describe the contribution of the paper

This work presents a novel style-transfer based neural network for multi-variable ultrasound quantitative reconstruction. The proposed framework consists of a style encoder conditioned on the parameter (AC, SoS, ESD, ESC) label, a content encoder for extracting B-mode content and a decoder to estimate quantitative parameter map. The method achieves good results in quantifying lesions on ex-vivo and phantom data.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

1) Using conditional style encoding and decoding to enable multi-variable quantitative imaging in one framework is an interesting and novel idea. The method also utilizes the B-mode geometric content to better localize the lesion location and shape. Compared to other widely used encoder-decoder based architectures, the proposed method achieves better performance in estimating lesion shapes while having fewer network parameters. 2) The method is evaluated on both phantom and ex-vivo data with insertions imitating lesions and achieves nice results in quantifying lesions in the demonstrated examples.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

1) Presenting the quantitative parameter estimation problem as style transfer is confusing. For example, it is hard to understand what a “quantitative style” is. 2) The geometric contents extracted from the B-mode images help to better localize the lesions and their shapes. However, it is neither evaluated nor discussed how the network will perform for inclusions that do not have a clear geometric shape in B-mode, e.g., stiff inclusions.

• Please rate the clarity and organization of this paper

Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The network architecture is described in the paper. The authors will also release the training code later, so people should be able to train the proposed method on their own datasets. Important training hyper-parameters are defined in the paper. However, the models and training procedures of the competing methods are not given. The training data are simulated and the simulation parameters are specified; with the provided description, others could simulate training data with a similar distribution. Since the test phantom/ex-vivo data are private, it is hard to exactly reproduce the evaluation results in the paper.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

1) It is hard to understand the term quantitative style, e.g., SoS style or AE style. To my understanding, the conditional “style encoder” extracts task-dependent information (for estimating SoS, AE, etc.), and this information is used to supervise parameter map estimation from the B-mode contents. The method does not actually aim at style transfer (appearance/texture matching using a GAN loss or style loss); this should be made clear in the paper. 2) It is worth discussing in the paper how the network performs on inclusions that do not have a clear geometric shape in B-mode, e.g., stiff inclusions, and verifying whether this could be a potential limitation of the proposed framework. 3) The network is trained on simulated images. How well does it generalize to unseen real data? 4) In the simulated training set, do the lesions differ from the background regions in all four parameters?

Probably accept (7)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper presents a novel idea of performing multi-variable US quantitative reconstruction using style transfer techniques. The proposed method is well evaluated on simulated, phantom and ex-vivo data. However, the presentation of the quantitative imaging problem as style transfer is in my opinion misleading, and a discussion of the method's potential limitations is missing.

• What is the ranking of this paper in your review stack?

1

• Number of papers in your stack

2

• Reviewer confidence

Confident but not absolutely certain

### Review #4

• Please describe the contribution of the paper

The paper demonstrates the novel BQI-Net, which endows the B-mode image with one of the quantitative ultrasound parameters, selected by a condition. BQI-Net is first constructed from HR-Net and then combined with CIN and SPADE to normalise and renormalise the quantitative parameters. The experiments indicate that BQI-Net was capable of differentiating cancerous lesions and furthermore enabled identification of benign/malignant lesions.
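The SPADE renormalisation mentioned above differs from CIN in that its scale and shift vary per pixel, being computed from a conditioning map (here, B-mode content). The following is a simplified NumPy sketch: the small convolutional network that predicts γ and β in the original SPADE is replaced by a per-pixel linear map, and the lesion/background masks are hypothetical stand-ins for B-mode content features.

```python
import numpy as np

def instance_stats_normalize(x, eps=1e-5):
    # Parameter-free per-channel normalization over space. SPADE as proposed
    # uses batch norm; with a batch of one sample this uses the same statistics.
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    return (x - mean) / (std + eps)

def spade(x, content, w_gamma, b_gamma, w_beta, b_beta):
    """Simplified SPADE block.

    x: features to be denormalized, shape (C, H, W).
    content: conditioning map (e.g. B-mode content features), shape (K, H, W).
    w_gamma, w_beta: (C, K) weights of a 1x1 "convolution" (per-pixel linear
        map) standing in for the small conv net of the original SPADE.
    b_gamma, b_beta: (C,) biases.
    """
    x_hat = instance_stats_normalize(x)
    # Same linear map applied at every pixel -> spatially varying gamma/beta.
    gamma = np.einsum("ck,khw->chw", w_gamma, content) + b_gamma[:, None, None]
    beta = np.einsum("ck,khw->chw", w_beta, content) + b_beta[:, None, None]
    return x_hat * (1.0 + gamma) + beta

rng = np.random.default_rng(1)
C, K, H, W = 4, 2, 8, 8
x = rng.normal(size=(C, H, W))
content = np.zeros((K, H, W))
content[0, 2:6, 2:6] = 1.0       # hypothetical square "lesion" mask
content[1] = 1.0 - content[0]    # background mask
w_gamma, b_gamma = rng.normal(size=(C, K)), np.zeros(C)
w_beta, b_beta = rng.normal(size=(C, K)), np.zeros(C)
out = spade(x, content, w_gamma, b_gamma, w_beta, b_beta)
```

Because γ and β follow the content map, lesion and background pixels receive different modulation, which is how geometry from the B-mode branch can shape the quantitative output.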

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

1) Transferring a B-mode image to multiple quantitative ultrasound images is a somewhat interesting application for conventional ultrasonography. 2) The proposed BQI-Net framework is relatively novel in terms of the conditional input style and the multi-resolution representations of the content. Conditional normalisation and re-normalisation techniques contribute to the success of BQI-Net in this process. 3) The experimental results provide a series of thorough analyses of BQI-Net. In particular, the phantom and ex-vivo results reflect clinical significance.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

1) It is doubtful that BQI-Net trained on simulated data generalises to clinical data. Indeed, simulation models consisting of only a few ellipses may restrict the trained BQI-Net to a limited set of modes. 2) It is unclear which factors principally cause the performance boost of BQI-Net over the baselines. Indeed, Table 2 implies that the total number of network weights in BQI-Net is 144M, more than in U-Net and HR-Net. Moreover, the amount of training data used for BQI-Net may be 4 times that used for the baselines. 3) Error bars are lacking for the evaluation metrics shown in Table 2. At what level (e.g., of statistical significance) does BQI-Net outperform the others? 4) The reference images (B-mode and elastography) for the reconstruction of breast phantoms in Fig. 3 look vague and hypointense, which may hinder the assessment of the BQI-Net reconstructions.

• Please rate the clarity and organization of this paper

Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The checklist truly reflects the reproducibility of the paper. However, the following items may still be worth including in the paper:

• A way to access the pre-trained models or the evaluation codes;
• A way to access the dataset;
• A detailed plan and a comprehensive list for training both the baselines and BQI-Net; for example, how many training pairs were the baselines and BQI-Net fed in, respectively?
• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

In addition to Point 4 and Point 6, I have some more concerns: 1) Is there a situation where the proposed BQI-Net may fail? 2) The font size in all the figures is too small to read. 3) Please consider using standard mathematical notation in the main text. For example, in Eq. 3, it is unclear how an operator $G$ can be minimised.

borderline reject (5)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper presents a novel neural network, and a series of experiments shows potential in differentiating the types of cancerous lesions. However, training the network on simulation data may hinder generalisability. It is also unclear which factors principally contribute to the performance boost of BQI-Net over the baselines. The work is hard to reproduce without full access to the source code and datasets.

• What is the ranking of this paper in your review stack?

2

• Number of papers in your stack

6

• Reviewer confidence

Confident but not absolutely certain

# Primary Meta-Review

• Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
1. The method uses B-mode images to guide an AE network conditioned on four quantitative outputs, which the submission calls “style transfer”. An encoder extracts a quantitative parameter style from the raw data, which is then applied to the B-mode image content. As some reviewers noted, this seems to assume that the contrast and features already exist in the B-mode image, with only the pixel values needing adjustment for the quantitative information. I suggest that the authors comment on this aspect.

2. Reviewers all ask how the reported ground-truth values were obtained, both for the phantoms and mainly for the ex-vivo samples, and how accurate these are.
3. The ex-vivo experimentation has been described with very little information.
4. For the numerical phantom results, the authors should also report standard deviations of the test results as well as the statistical significance of the statements and conclusions made.

5. As asked in the reviews, could the authors also comment on which aspects of the proposed BQI-Net are thought to produce the reported results? It would be great to substantiate any such hypotheses with ablation experiments.

6. The method input in Fig. 2 says “beamformed RF”, which should probably be raw pre-beamformed RF, as it is then shown entering a delay-and-sum beamforming process.
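For readers less familiar with the pipeline in Fig. 2, the following NumPy sketch shows delay-and-sum (DAS) beamforming of pre-beamformed channel data from a single steered plane-wave transmission, which is why the input should be raw channel RF rather than already-beamformed RF. It assumes a linear array, a constant assumed speed of sound, time zero when the wavefront crosses the array center, and linear interpolation; the array geometry and the synthetic point-scatterer echoes are hypothetical.

```python
import numpy as np

def plane_wave_das(rf, fs, element_x, theta, c, grid_x, grid_z, t0=0.0):
    """Delay-and-sum beamforming of one steered plane-wave acquisition.

    rf: pre-beamformed channel data, shape (num_samples, num_elements).
    fs: sampling frequency [Hz]; element_x: lateral element positions [m].
    theta: steering angle [rad]; c: assumed speed of sound [m/s].
    grid_x, grid_z: 1-D lateral and axial pixel coordinates [m].
    t0: time of the first sample, with t = 0 when the wavefront crosses x = z = 0.
    """
    num_samples, num_elements = rf.shape
    t_axis = t0 + np.arange(num_samples) / fs
    X, Z = np.meshgrid(grid_x, grid_z)                       # (nz, nx) pixel grid
    tx_delay = (Z * np.cos(theta) + X * np.sin(theta)) / c   # plane wave reaches pixel
    image = np.zeros_like(X)
    for e in range(num_elements):
        rx_delay = np.sqrt((X - element_x[e]) ** 2 + Z ** 2) / c  # echo returns to element e
        # Linearly interpolate channel e at the round-trip time; zero outside the record.
        image += np.interp(tx_delay + rx_delay, t_axis, rf[:, e], left=0.0, right=0.0)
    return image

# Hypothetical example: 64-element array, 0.3 mm pitch, one point scatterer.
c, fs = 1540.0, 40e6
element_x = (np.arange(64) - 31.5) * 0.3e-3
theta = np.deg2rad(5.0)
sx, sz = 2e-3, 20e-3
t = np.arange(2048) / fs
rf = np.zeros((t.size, element_x.size))
for e, ex in enumerate(element_x):
    tau = (sz * np.cos(theta) + sx * np.sin(theta)) / c + np.hypot(sx - ex, sz) / c
    rf[:, e] = np.exp(-((t - tau) * 5e6) ** 2)   # Gaussian echo envelope, no carrier
grid_x = np.linspace(-5e-3, 5e-3, 101)
grid_z = np.linspace(15e-3, 25e-3, 101)
img = plane_wave_das(rf, fs, element_x, theta, c, grid_x, grid_z)
```

The beamformed peak should appear at the scatterer position; compounding several such images over multiple angles is the usual next step in plane-wave imaging.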

Overall, the paper touches on an important imaging question with a holistic learning-based approach. I would suggest that the authors carefully consider and respond to the concerns and questions of the reviewers for further consideration of this submission.

• What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

5

# Author Feedback

<Reviewer #1, #3> Q1. The ground truth of the measurements is unclear. A1. Phantom experiments: The ground-truth values of the breast and thyroid phantoms are provided by the phantom manufacturer (CIRS Inc.). Ex-vivo experiments: The ground-truth ESD and ESC are the actual diameter and concentration of the added scatterers. To measure the reference AC and SOS of the phantom, two ultrasound systems were configured with their probes facing each other to acquire transmission data through the phantom. The AC and SOS are obtained by measuring the attenuated amplitude and arrival time of the transmitted ultrasound waves (after calibrating the setup with water only). The reference AC and SOS values were measured 5 times for each insertion, with standard deviations of 0.032 dB/cm/MHz and 2.75 m/s, respectively.
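The through-transmission calibration described in A1 resembles the classical substitution (insertion) method. A minimal sketch, assuming a known sample thickness, a water-only reference path, a water sound speed of about 1482 m/s (roughly 20 °C), negligible water attenuation and interface losses, and a linear fit of attenuation versus frequency; the rebuttal does not state which exact formulas were used.

```python
import numpy as np

C_WATER = 1482.0  # m/s; assumed water temperature of about 20 °C

def sos_substitution(thickness_m, dt_s, c_water=C_WATER):
    """Speed of sound of a sample by the substitution method.

    dt_s = t_water - t_sample: arrival-time advance caused by replacing a
    water path of length thickness_m with the sample (positive if faster).
    """
    return thickness_m / (thickness_m / c_water - dt_s)

def ac_slope_db_cm_mhz(freqs_mhz, amp_water, amp_sample, thickness_cm):
    """Attenuation slope [dB/cm/MHz] from water-reference and sample amplitudes.

    Interface transmission losses and water attenuation are neglected.
    """
    ratio = np.asarray(amp_water, float) / np.asarray(amp_sample, float)
    alpha_db_cm = 20.0 * np.log10(ratio) / thickness_cm
    slope, _intercept = np.polyfit(freqs_mhz, alpha_db_cm, 1)  # linear fit of alpha(f)
    return slope
```

For example, with a hypothetical 2 cm sample whose transmitted pulse arrives 0.837 µs earlier than through water alone, `sos_substitution(0.02, 0.837e-6)` gives roughly 1580 m/s. Repeating such measurements (five times per insertion in the rebuttal) yields the reported standard deviations.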

<Reviewer #1, #6> Q2. Concerns about generalization from simulated data: is the neural network applicable to real clinical data? A2. Through t-distributed Stochastic Neighbor Embedding (t-SNE) analysis, we verified that the synthetic training data distribution includes that of real measurements gathered from biomimetic phantoms and breast cancer patients. To support the clinical usefulness of the study, additional experiments were performed on patients with benign and malignant breast lesions; these are introduced in: ** external link removed by PCs

<Reviewer #1> Q3. Describe why plane-wave pulse-echo is used rather than focused beamforming. A3. In this study, quantitative features are obtained by analyzing the reflected signals of multi-angle ultrasonic plane waves. If multi-angle transmission were implemented with conventional focused beamforming, the region insonified by all the multi-angle incident waves would be highly limited and the field of view would be reduced. As such, multi-angle plane waves are a proper choice for the chosen ROI [Feigin M et al., 2019].
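The field-of-view argument in A3 can be illustrated with simple geometry. In the sketch below (an approximation that ignores edge diffraction), a plane wave steered by θ from an aperture of width W is laterally shifted by z·tan θ at depth z, so the region reached by every angle shrinks linearly with depth. The steering-delay helper is a generic sketch, and the 128-element, 0.3 mm pitch array and ±10° angle set used in the usage note are hypothetical.

```python
import numpy as np

def plane_wave_tx_delays(element_x, theta, c):
    """Per-element firing delays that steer a plane wave by theta.

    Shifted so that the earliest-firing element has zero delay.
    """
    d = np.asarray(element_x, float) * np.sin(theta) / c
    return d - d.min()

def common_insonified_width(aperture_m, depth_m, angles_rad):
    """Lateral width, at a given depth, reached by every steered plane wave.

    Geometric approximation: the wave from aperture [-W/2, W/2] steered by
    theta covers x in [-W/2 + z*tan(theta), W/2 + z*tan(theta)] at depth z.
    """
    shifts = depth_m * np.tan(np.asarray(angles_rad, float))
    left = -aperture_m / 2 + shifts.max()
    right = aperture_m / 2 + shifts.min()
    return max(0.0, right - left)
```

For a hypothetical 38.4 mm aperture and angles of -10°, 0° and +10°, `common_insonified_width(0.0384, 0.03, np.deg2rad([-10, 0, 10]))` is about 27.8 mm at 30 mm depth, whereas each focused beam is narrow outside its focal zone.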

<Reviewer #6> Q4. It is unclear what contributes to the accuracy of BQI-Net. A4. The accuracy of BQI-Net is due to the boundary information provided by the B-mode contents. When the B-mode contents encoder and SPADE module are removed from BQI-Net, the ablated network becomes identical to HR-Net in Table 2; relative to it, BQI-Net shows a 26% reduction in RMSE and a higher SSIM. Quantitative assessments are also provided in Supplementary C and verify that utilizing the B-mode image enables a more precise description of lesion shape. BQI-Net and the baseline models are trained with an identical number of input and label pairs for a fair comparison.

<Reviewer #5> Q5. How does the network perform for inclusions that do not have a clear geometric shape in B-mode? A5. In BQI-Net, the B-mode image is not the only factor that determines the geometric shape in the quantitative image. Rather, the B-mode image is used as supplementary information to enhance the precision of the lesion shape. Figure 3.b shows representative results where the B-mode image has an unclear boundary. In this case, the B-mode image cannot provide a clear lesion boundary, but BQI-Net retrieves the shape of the lesion from the raw RF data, although precise boundary delineation is compromised. The last phantom measurement in Supplementary C shows that BQI-Net performs comparably to HR-Net when the B-mode image does not provide lesion geometry.

<Reviewer #6> Q6. The authors should also report standard deviations. A6. In the final manuscript, we will gladly add standard deviations to the tables.
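Since the meta-review also asks for statistical significance, one possible way to report mean ± standard deviation together with a paired test over per-image metrics is sketched below. It assumes per-image metric values (e.g., RMSE) are available for each method on the same test set and uses a two-sided sign-flip permutation test, which is one reasonable choice among several (a paired t-test or Wilcoxon signed-rank test would also be common); it is not the procedure the authors committed to.

```python
import numpy as np

def summarize(errors):
    """Mean and sample standard deviation (ddof=1) over test images."""
    e = np.asarray(errors, dtype=float)
    return e.mean(), e.std(ddof=1)

def paired_permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided sign-flip permutation test for paired per-image metrics.

    a, b: metric values of two methods on the same test images.
    Returns the p-value for H0: the mean paired difference is zero.
    """
    rng = np.random.default_rng(seed)
    d = np.asarray(a, float) - np.asarray(b, float)
    observed = abs(d.mean())
    # Under H0 each paired difference is equally likely to have either sign.
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    null = np.abs((signs * d).mean(axis=1))
    # Add-one correction so the p-value is never exactly zero.
    return (np.sum(null >= observed) + 1) / (n_perm + 1)
```

Reporting `summarize` per method alongside the p-value for each pairwise comparison would cover both requests in the meta-review.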

Q7. The ex-vivo experimentation has been described with very little information. A7. We will add more analysis and discussion of the ex-vivo experiments, including how the ground truth of the inclusions was acquired and a comparison of the reconstructed quantitative values of bovine muscle with literature values, as reviewer #3 suggested.

In the final manuscript, we will correct “beamformed RF” and the font size in Fig. 2, as well as other details that were pointed out.

# Post-rebuttal Meta-Reviews

## Meta-review # 1 (Primary)

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

Several comments are addressed in the rebuttal. Although a concern remains about how generalizable and applicable the introduced methods would be to in-vivo imaging and pathology, I believe the submission has value and can stimulate interesting discussions at MICCAI.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

3

## Meta-review #2

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors propose a style-transfer based network to estimate quantitative information from RF data. The network learns the mapping between channel data and the associated quantities from simulations. The trained network is then applied to simulations, phantom and ex-vivo data, being able to recover physical properties with good accuracy.

The authors have satisfactorily addressed the main concerns raised by the reviewers, in particular by clarifying some unclear aspects, including the choice of parameters and details about the data used. The rebuttal to one major question (generalization from simulations to real data) is not quite satisfactory, although the authors promise to include results they have that demonstrate this.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

79

## Meta-review #3

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The rebuttal fails to address the improvement over prior or classical QUS methods (Rev 1 comment). All the baseline architectures were either developed for computer vision applications or have not previously been used for QUS generation. Therefore, although important, the baseline comparison provides no basis for judging the success of the proposed method against prior QUS methods. The authors also do not discuss this concern in their rebuttal.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Reject

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

10