Authors

Jianfeng Zhao, Xiaojiao Xiao, Dengwang Li, Jaron Chong, Zahra Kassam, Bo Chen, Shuo Li

Abstract

Quantitative measurement of hepatocellular carcinoma (HCC) on multi-phase contrast-enhanced magnetic resonance imaging (CEMRI) is one of the key processes for HCC treatment and prognosis. However, direct automated quantitative measurement using CNN-based networks is still a challenging task due to: (1) the lack of ability to capture long-range dependencies of multi-anatomy in the whole medical image; (2) the lack of a mechanism for fusing and selecting multi-phase CEMRI information. In this study, we propose a multi-function Transformer regression network (mfTrans-Net) for HCC quantitative measurement. Specifically, we first design three parallel CNN-based encoders for multi-phase CEMRI feature extraction and dimension reduction. Next, the non-local Transformer equips mfTrans-Net with self-attention for capturing the long-range dependencies of multi-anatomy. At the same time, a phase-aware Transformer captures the relevance between multi-phase CEMRI for multi-phase CEMRI information fusion and selection. Lastly, we propose a multi-level training strategy, which enables an enhanced loss function to improve the quantification task. The mfTrans-Net is validated on multi-phase CEMRI of 138 HCC subjects. Our mfTrans-Net achieves high performance in multi-index quantification: the mean absolute errors of center point, max-diameter, circumference, and area are as low as 2.35mm, 2.38mm, 8.28mm, and 116.15mm^2, respectively. The results show that mfTrans-Net has great potential for small-lesion quantification in medical images and has clinical application value.
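The abstract reports one mean absolute error (MAE) per index. As a minimal sketch of how such a figure is typically computed, the snippet below averages the absolute differences between predicted and reference values. The function name and the sample values are hypothetical illustrations, not the authors' code or data, and how the paper aggregates the two-coordinate center point into a single MAE is not stated here.

```python
from statistics import mean


def mean_absolute_error(predicted, reference):
    """Average of |predicted - reference| over paired measurements."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        raise ValueError("at least one measurement is required")
    return mean(abs(p - r) for p, r in zip(predicted, reference))


# Hypothetical max-diameter values in mm for three lesions (not from the paper).
predicted_mm = [21.0, 34.5, 18.0]
reference_mm = [20.0, 37.5, 18.5]
mae_mm = mean_absolute_error(predicted_mm, reference_mm)
print(f"max-diameter MAE: {mae_mm:.2f} mm")
```

The same function would be applied separately to each index (for example circumference in mm and area in mm^2), since the units differ.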

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87240-3_8

SharedIt: https://rdcu.be/cyl5B

Link to the code repository

N/A

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

This paper introduces a hybrid model that uses CNNs and Transformers for multi-index quantification of HCC. In particular, the architecture has three stages. In the first stage, three independent CNNs are used to learn local representations for each modality. In the second stage, global representations for each modality are learned independently using Transformers. In the third stage, inter-modality representations are learned using another Transformer. Experimental results demonstrate that the proposed method is better than standard CNNs such as ResNet and VGG.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- This paper extends vision transformers (hybrid version) for medical imaging.
- Delivers better performance than state-of-the-art CNN methods, such as ResNet and VGG.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

In general, I do not have any major concerns with this paper. However, the paper has a lot of typos and grammatical errors, and needs a careful proof-read. Also, the writing of the paper can be improved.

Also, I do not understand why the authors say “non-local transformers”. Transformers, to my understanding, are themselves non-local or global, as they attend to every patch. I do not understand what the authors are trying to emphasize with “non-local”. If the authors meant to say that CNNs are local and transformers are global, then simply say that because CNNs learn local representations, we use Transformers to encode global representations. Do not include extra terms like “non-local”.

Suggestions: i) Do not use “and” or “especially” after a full stop. ii) “The multi-phase CEMRI was obtained by using” -> “was obtained using”. iii) “evaluated by MAE. In which the MAE of the quantification” -> “evaluated by MAE. The MAE of the quantification”. iv) “Table.2” should be “Table 2” (no full stop between Table and 2). v) Why is the feature map size of the inputs to the phase-aware Transformer different from that of the encoder outputs? vi) Query, Key, and Value are projections and not metrics (just below Eq. 3).
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The paper provides sufficient details for reproduction.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

See the strengths and weaknesses sections.
Please state your overall opinion of the paper

Probably accept (7)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper extends Transformers to medical imaging and shows that global representations learned using Transformers help improve performance. In particular, previous works (e.g., vision transformer) on standard vision tasks have leveraged large amounts of data in order to deliver state-of-the-art performance. Unlike these methods, this work shows that Transformers can be used on low-resource corpora.
What is the ranking of this paper in your review stack?

2
Number of papers in your stack

3
Reviewer confidence

Confident but not absolutely certain

Review #2

Please describe the contribution of the paper

This paper proposes a new method for HCC quantitative measurement. The original contribution is the use of the newly designed multi-function Transformer to capture non-local information and multi-phase CEMRI relevance. This paper provides a new framework that combines CNN and Transformer, which is highly inspiring for medical tasks that use multi-modality images.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

(1) This paper introduces a new framework to perform HCC quantitative measurement using multi-phase CEMRI. (2) It is interesting to try to capture the multi-phase CEMRI relevance using the phase-aware Transformer. (3) The overall manuscript is clearly presented and easy to follow. (4) The experiments are generally well-designed and the evaluation is solid.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The English writing should be improved.
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The paper is described in great detail and is reproducible.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

There are some grammar mistakes. Some examples are: (1) Paragraph 2 on page 3: …is not only executed after the multi-function Transformer, but also each non-local Transformer. (2) Section 2.1 on page 4: As step1 shown in Fig.2, … (3) Section 2.2 on Page 5: we designs…
Please state your overall opinion of the paper

ground-breaking (10)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

In general, this paper is well written and easy to follow, and the experiment is well-designed and promising. But these are not the reasons why I emphasize this paper. I personally like this work very much because the proposed multi-function Transformer is fantastic. Recently, the vision Transformer has shown great power in image processing. In this paper, the authors not only use the Transformer to extract context features but also skillfully capture the relevance between different phases of CEMRI. I think the idea in this paper is very enlightening for dealing with multi-modality medical image information.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

2
Reviewer confidence

Very confident

Review #3

Please describe the contribution of the paper

The authors design an image regressor that uses a CNN to extract imaging features, then passes the features through a multi-head attention network whose output is the set of regression variables. To allow the transformer to attend to different parts of an image, the feature space is split into patches, and each patch is fed to the transformer as a token. A second type of transformer is used to attend to multiple phases of a multiphase MRI sequence, where each phase’s feature representation serves as a separate token. The authors apply this technique to determine the position and dimensions of focal hepatocellular carcinoma (HCC) lesions in slices of multiphasic MRI. Their technique outperforms VGG-16, ResNet-50 and DenseNet.
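The phase-token idea summarized above can be illustrated with a small sketch: each phase's feature vector is treated as one token, and self-attention mixes the tokens according to their pairwise similarity. This is a single-head, pure-Python toy that uses identity query/key/value projections as a simplifying assumption; the token values, dimensions, and function names are hypothetical and do not reproduce the paper's phase-aware Transformer.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: Q, K and V are the tokens themselves (identity projections),
    whereas a trained model would learn separate projection matrices.
    """
    dim = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        row = softmax(scores)
        weights.append(row)
        # Weighted sum of all value vectors for this query token.
        outputs.append(
            [sum(w * value[i] for w, value in zip(row, tokens)) for i in range(dim)]
        )
    return outputs, weights


# Three hypothetical phase tokens, one 4-dimensional feature vector per phase.
phase_tokens = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]
fused, attn = self_attention(phase_tokens)
for row in attn:
    print([round(w, 3) for w in row])
```

Each row of `attn` sums to 1 and shows how strongly one phase token attends to the others; the same routine applied to patch tokens from a single phase would correspond to the long-range attention described in the review.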
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The method is fairly novel. It leverages attention in two distinct ways - to extract long-range dependencies and to integrate information across the phases of a multiphase MRI sequence. Their key new insight is that attention can be used to analyze different phases in multiphasic MRI and different modalities in multimodal imaging.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The application to HCC localization/measurement is unsuitable for the method: 1) The location and dimensions of a lesion can be determined from local information, and usually on a single phase. No long-range dependencies or dependencies across phases are needed for this task, so the attention modules are overengineered. 2) It is trivial to extract the position and dimensions from a segmentation of the HCC, and focal HCC segmentation is usually easy. The motivation/validation of this technique would be stronger if the network could accurately predict other quantities of interest such as HCC stage (LI-RADS score) that cannot be immediately determined by a segmentation, and that certainly do rely on integrating information across multiple phases. 3) What will this network do on multifocal or infiltrative HCC? In the multifocal case you want separate measurements for each lesion. Those are the cases where it is most challenging and time-consuming to do manual analysis. 4) It only works on slices. What if you wanted to measure lesion volume (vRECIST)? How can you integrate information across different slices?
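To make point 2 concrete, here is a minimal sketch of reading the center point, area, and maximum diameter off a binary segmentation mask of a single lesion. The function name, the toy mask, and the isotropic pixel spacing are assumptions for illustration; the diameter is measured between pixel centers by brute force, so it slightly underestimates the true extent and would be slow on large masks.

```python
import math
from itertools import combinations


def measure_lesion(mask, spacing_mm=1.0):
    """Center, area and max diameter of the foreground of a 2D binary mask.

    Assumptions: one lesion per mask and isotropic pixel spacing in mm.
    """
    pixels = [
        (r, c)
        for r, row in enumerate(mask)
        for c, value in enumerate(row)
        if value
    ]
    if not pixels:
        raise ValueError("mask contains no lesion pixels")
    count = len(pixels)
    center = (
        sum(r for r, _ in pixels) / count * spacing_mm,
        sum(c for _, c in pixels) / count * spacing_mm,
    )
    area = count * spacing_mm ** 2
    # Largest distance between any two foreground pixel centers.
    max_diameter = spacing_mm * max(
        (math.dist(a, b) for a, b in combinations(pixels, 2)), default=0.0
    )
    return {"center_mm": center, "area_mm2": area, "max_diameter_mm": max_diameter}


# Toy 4x5 mask with a 2x3 block of lesion pixels, at a hypothetical 2 mm spacing.
toy_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
measurements = measure_lesion(toy_mask, spacing_mm=2.0)
print(measurements)
```

For multifocal disease (point 3), the mask would first have to be split into connected components so that each lesion is measured separately; that step is not shown here.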

I think your results - both the comparison to VGG/ResNet/DenseNet as well as the ablation study - can be explained just by saying that having more network parameters improves performance. You should show number of parameters in each network and try to get them all to roughly align.
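A parameter comparison of the kind requested here can be sketched with the standard counting formulas for convolutional and fully connected layers. The two toy configurations below are hypothetical and unrelated to the actual sizes of mfTrans-Net, VGG, ResNet, or DenseNet; the attention count assumes four square projections (query, key, value, output) with biases, as in common multi-head attention implementations.

```python
def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    """Weights of a square 2D convolution plus an optional bias per output channel."""
    weights = in_channels * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


def linear_params(in_features, out_features, bias=True):
    """Weights of a fully connected layer plus an optional bias per output."""
    return in_features * out_features + (out_features if bias else 0)


def attention_params(embed_dim):
    """Assumed layout: Q, K, V and output projections, each embed_dim x embed_dim."""
    return 4 * linear_params(embed_dim, embed_dim)


# Hypothetical CNN-only regressor: two conv layers and a 4-output regression head.
toy_cnn = conv2d_params(1, 16, 3) + conv2d_params(16, 32, 3) + linear_params(32, 4)

# Hypothetical CNN + attention regressor: one conv layer, one attention block, same head size.
toy_attention = conv2d_params(1, 16, 3) + attention_params(16) + linear_params(16, 4)

print(f"toy CNN parameters:       {toy_cnn}")
print(f"toy attention parameters: {toy_attention}")
```

Reporting such totals next to each baseline would show whether the compared networks are roughly matched in capacity.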
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

No code provided, private dataset used.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

What is the feature map encoding? Since it’s a single output channel, plot some examples - is it a rough segmentation of the HCC? The decision to make it a single channel is also unusual and the authors should comment on this decision. Did it do worse with multiple channels?

I think it’s a stretch to call it a transformer when there’s no output sequence and there’s only a single stack of multi-head attention. Would prefer a name highlighting the use of attention in your 2 distinct ways.

It would be nice to see examples of attention maps. Is the transformer focusing its attention on the HCC?

The authors should mention how the HCC size threshold of 30mm is clinically meaningful, if relevant. But I would prefer seeing average % error for each metric (i.e. absolute error / true value) in the table, or a plot of true and predicted values for each subject (1 plot per metric).
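The percentage error suggested above (absolute error divided by true value, averaged over subjects) can be sketched as follows. The function name and the sample values are hypothetical and are not results from the paper.

```python
def mean_absolute_percentage_error(predicted, reference):
    """Mean of |predicted - reference| / |reference|, expressed in percent."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one measurement is required")
    if any(r == 0 for r in reference):
        raise ValueError("reference values must be non-zero")
    ratios = [abs(p - r) / abs(r) for p, r in zip(predicted, reference)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical max-diameter values in mm for three lesions (not from the paper).
predicted_mm = [22.0, 45.0, 9.0]
reference_mm = [20.0, 50.0, 10.0]
mape = mean_absolute_percentage_error(predicted_mm, reference_mm)
print(f"mean absolute percentage error: {mape:.1f}%")
```

Unlike a raw MAE in mm, this figure is comparable across small and large lesions, which is the motivation for the request.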

It sounds like the “patch cropping” selects “patches” of size 1x1, which I think makes the name a bit misleading. If keeping this terminology, the authors should justify the choice of 1x1 patches.

I don’t think the number of attention heads was stated.
Please state your overall opinion of the paper

probably reject (4)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

There are ideas here that would be of interest to the MICCAI community, with a novel formulation of a challenging task. Still, the stark mismatch between the application and the motivation of the method is difficult to overlook. This could be a strong paper if a more suitable application is chosen.
What is the ranking of this paper in your review stack?

5
Number of papers in your stack

5
Reviewer confidence

Confident but not absolutely certain

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

Although the writing can be improved, the authors improve over prior work by proposing a fusion-based transformer architecture. The use of multi-phase CEMRI is also interesting. Improvements over prior work have been shown. Code will be made available. The majority of the reviewers agree that the work is novel and suitable for publication at the MICCAI conference.

For the camera-ready version: 1) Please correct all the typos mentioned by the reviewers, then carefully proofread the paper. 2) The justification for using multi-phase CEMRI for HCC diagnosis should be clearly explained. If multi-phase CEMRI is not standard of care, this should be acknowledged as a drawback in the discussion (see Reviewer 3's comments on this). 3) The discussion should also cover how this method can be applied to the staging of HCC. 4) Briefly explain how the method can be extended to 3D (pros and cons).
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

2

Author Feedback

N/A

mfTrans-Net: Quantitative Measurement of Hepatocellular Carcinoma via Multi-Function Transformer Regression Network