# Authors

Lei Zhou, Joseph Bae, Huidong Liu, Gagandeep Singh, Jeremy Green, Dimitris Samaras, Prateek Prasanna

# Abstract

Chest radiographs (CXRs) are often the primary front-line diagnostic imaging modality. Pulmonary diseases manifest as characteristic changes in lung tissue texture rather than anatomical structure. Hence, we expect that studying changes in only lung tissue texture without the influence of possible structure variations would be advantageous for downstream prognostic and predictive modeling tasks. In this paper, we propose a generative framework, Lung Swapping Autoencoder (LSAE), that learns a factorized representation of a CXR, to *disentangle* the tissue texture representation from the anatomic structure representation. Upon learning the disentanglement, we leverage LSAE in two applications. 1) After adapting the texture encoder in LSAE to the thoracic disease classification task on the large-scale ChestX-ray14 database (N=112,120), we achieve a competitive result (mAUC: 79.0%) with unsupervised pre-training. Moreover, when compared with Inception v3 on our multi-institutional COVID-19 dataset, COVOC (N=340), for a COVID-19 outcome prediction task (estimating need for ventilation), the texture encoder achieves 13% less error with a 77% smaller model size, further demonstrating the efficacy of texture representation for lung diseases. 2) We leverage the LSAE for data augmentation, by generating hybrid lung images with textures and labels from the COVOC training data and lung structures from ChestX-ray14. This further improves ventilation outcome prediction on COVOC.

SharedIt: https://rdcu.be/cyl8t

# Reviews

### Review #1

• Please describe the contribution of the paper

The paper presents a generative framework, the Lung Swapping Autoencoder (LSAE), that learns a factorized representation of a CXR to disentangle the tissue texture representation from the anatomic structure representation. LSAE consists of a structure encoder, a texture encoder, and a decoder. Given two input images, one supplying the structure and the other the tissue texture, LSAE generates a hybrid image from the factorized representations. A patch discriminator supervises texture synthesis within the lungs so that the texture of the hybrid image matches that of the texture image, and a patch contrastive loss outside the lungs minimizes the structural distortion between the hybrid image and the structure image.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The article is clearly described and interesting to read. This paper provides a more reasonable and feasible data enhancement strategy from the point of view of anatomy.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The patches in this paper are selected from outside and from within the lung region, so the boundary of the lung region is also important. How are the size and texture information of patches near the boundary of the lung region selected and defined?

• Please rate the clarity and organization of this paper

Very Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Reproducing the paper requires careful design and tuning.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

Because the patches in this paper are selected from outside and from within the lung region, respectively, the patch size needs to be chosen carefully, and reproducing the method requires careful design and tuning.

accept (8)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper innovatively applies the swapping autoencoder to disease diagnosis from lung X-rays.

• What is the ranking of this paper in your review stack?

2

• Number of papers in your stack

5

• Reviewer confidence

Very confident

### Review #2

• Please describe the contribution of the paper

The authors propose the Lung Swapping Autoencoder (LSAE) to learn a factorized representation of a CXR and show that LSAE achieves state-of-the-art performance in disentanglement, outcome prediction and image generation. A multi-institutional COVID-19 CXR dataset will be released upon acceptance.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

- Nice application of a state-of-the-art computer vision model with adaptations tailored to lung CXR.
- Well-designed experiments showed that LSAE achieved very good performance in multiple tasks.
- Code and data will be made publicly available.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

- Nothing special.

• Please rate the clarity and organization of this paper

Excellent

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Code and a new multi-institutional COVID-19 dataset will be released. Overall very good.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

This is a well-organized and nicely written paper. There are some minor concerns:

From Fig. 1 it seems that the text labels are also learned and swapped (as structure since it’s out of lung). Will this affect model predictions or even leak information about patient label?

This model requires lung mask to sample lung tissue texture and anatomical structure. It would be great if the model can also perform lung segmentation automatically.

The sampling of lung structure requires nicely aligned images. How will this method deal with poorly aligned images?

accept (8)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

A novel and well-written paper addressing an important problem. No major concerns.

• What is the ranking of this paper in your review stack?

1

• Number of papers in your stack

5

• Reviewer confidence

Confident but not absolutely certain

### Review #3

• Please describe the contribution of the paper

This paper implements Swapping Autoencoders (SAE) for augmentation of CXR datasets. This is intended to augment only the lung regions and specifically not affect the anatomical structure, unlike other augmentation methods. CXR-specific adaptations are made to SAE, resulting in the Lung SAE (LSAE).

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

- Well-written paper with a clear and relevant application. The authors explained well why the use of SAE can be beneficial over other generation or augmentation methods.
- Results are reported on multiple datasets.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

- The location of Table 1 seems to be wrong, which causes confusion when reading.
- It is not shown how this method performs against other data augmentation methods, which should ultimately be shown when proposing an augmentation method.

• Please rate the clarity and organization of this paper

Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Details on implementation and training are lacking. The authors indicated they will make their code publicly available.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

Overall a very well motivated and written paper. More motivation/results on why the adaptation of SAE to LSAE is needed, and a comparison to existing augmentation methods, can improve this paper.

borderline accept (6)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Clear paper structure and motivation, with interesting results that could be elaborated further to strengthen the conclusions of the paper.

• What is the ranking of this paper in your review stack?

2

• Number of papers in your stack

4

• Reviewer confidence

Very confident

# Primary Meta-Review

• Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The work presented a generative framework, which learns a factorized representation of a CXR to disentangle the tissue texture representation from the anatomic structure representation. Reviewers are satisfied with the novelty, writing, and contribution. More comparison experiments will further promote the quality. Overall a solid piece of work.

• What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

1

# Author Feedback

We thank the reviewers for appreciating the need for an ‘anatomy-based’ data enhancement strategy as presented in our manuscript. We agree that additional comparative experiments can further strengthen the results and make this a ubiquitous tool in chest radiograph ML applications. We provide responses to the major critiques one-by-one and list the changes to be made in the updated manuscript.

R1

1.1 How to select/define the size and texture information of the PATCH block near the boundary of the lung region.

A: For each patch near the boundary, we define the amount of texture information as the ratio of in-lung area to out-of-lung area that the patch contains. Therefore, when sampling lung regions for texture supervision, we only use patches which contain more than 75% lung region. When sampling outside lung regions for structure supervision, we only use patches which contain more than 75% ‘out-of-lung’ regions. We will provide these details in our updated manuscript.
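The 75% rule in the answer to 1.1 can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`lung_fraction`, `sample_patches`), the uniform rejection sampling, the 32x32 patch size, and the toy rectangular mask are assumptions made for illustration, and the threshold is applied to the fraction of the patch area, which is how the 75% figure reads.

```python
import numpy as np

def lung_fraction(mask, top, left, size):
    """Fraction of pixels in a square patch that lie inside the lung mask."""
    patch = mask[top:top + size, left:left + size]
    return float(patch.mean())

def sample_patches(mask, size, n, region="lung", min_frac=0.75,
                   rng=None, max_tries=10000):
    """Rejection-sample up to n patch corners (top, left).

    region="lung": keep patches whose lung fraction exceeds min_frac.
    any other value: keep patches whose out-of-lung fraction exceeds min_frac.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = mask.shape
    coords = []
    for _ in range(max_tries):
        if len(coords) == n:
            break
        top = int(rng.integers(0, h - size + 1))
        left = int(rng.integers(0, w - size + 1))
        frac = lung_fraction(mask, top, left, size)
        if region != "lung":
            frac = 1.0 - frac
        if frac > min_frac:
            coords.append((top, left))
    return coords

# Toy 256x256 mask with two rectangular "lungs" (hypothetical data).
mask = np.zeros((256, 256), dtype=bool)
mask[64:192, 32:112] = True
mask[64:192, 144:224] = True

rng = np.random.default_rng(0)
lung_patches = sample_patches(mask, 32, 8, region="lung", rng=rng)
background_patches = sample_patches(mask, 32, 8, region="background", rng=rng)
```

Patches that straddle the boundary with a lung fraction between 25% and 75% are rejected by both samplers, so under this reading they contribute to neither the texture nor the structure supervision.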

R2

2.1 Will the swapped text tags affect model predictions or leak information about patient labels?

A: Tags indicate only the scan orientation and device information, which do not correlate with diseases or outcome.

2.2 It would be great if the model can also perform lung segmentation automatically.

A: We agree; it would definitely make the model more elegant. It is also worth noting that we only need lung masks for training and not for testing. We will also investigate removing the mask from the training stage in future work.

2.3 The sampling of lung structure requires nicely-aligned images, how will this method deal with poorly-aligned images?

A: It is unclear to us what image alignment R2 refers to. If R2 refers to the alignment between the I_1 image and the I_hybrid image in Fig. 2, we would like to point out that I_hybrid is generated from I_1, and thus they are automatically aligned. Furthermore, texture synthesis in I_hybrid does not need explicit alignment between I_2 and I_hybrid. Therefore, I_1 and I_2 do not need to be aligned either.

R3

Note: All the results reported below are the average over 3 splits on COVOC.

3.1 The location of table 1 seems to be wrong, this causes confusion when reading.

A: We thank R3 for pointing this out. We will fix it in the updated manuscript.

3.2 Comparison against other data augmentation methods

A: First, all the improvement achieved by our augmentation method shown in the paper is on top of basic augmentations, such as random cropping and mirroring. Second, for this rebuttal, we further compared our method with Mixup [1]. It turns out that we (BER/mAUC: 15.67/92.04%) outperform Mixup (BER/mAUC: 16.41/90.82%).

3.3 Disclose details on implementation and training.

A: The decoder and discriminator architectures follow StyleGAN2. Both encoders are ResNets. Input image size is 256x256. Patch sizes sampled for texture supervision vary from 16x16 to 64x64. LSAE is optimized by Adam with learning rate 1e-3. Our code is based on PyTorch 1.7. We will include these details in the updated manuscript.

3.4 More motivation/results on why the adaptation of SAE to LSAE is needed.

A: We hypothesize that better disentanglement of lung structure and texture leads to better chest X-ray representations. As shown in Table 1 (left), LSAE achieves better disentanglement than SAE. The next question is “can better disentanglement lead to better performance of outcome prediction on chest X-rays?”. We replicate the experiment in Table 2 but with the texture encoder in an SAE. Our LSAE (BER/mAUC: 16.50/90.41%) outperforms SAE (BER/mAUC: 18.25/89.09%), which demonstrates the benefit of adapting the SAE.
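For readers unfamiliar with the Mixup baseline in 3.2 and the BER figures quoted above, the following is a minimal sketch rather than the authors' evaluation code. It assumes that BER denotes the balanced error rate (the mean of the per-class error rates); the value `alpha=0.2`, the batch shapes, and the function names `mixup_batch` and `balanced_error_rate` are illustrative choices not taken from the rebuttal.

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Mixup: convex combination of a batch with a shuffled copy of itself.

    x: array of shape (batch, ...); y: one-hot labels of shape (batch, classes).
    The mixing weight lam is drawn from Beta(alpha, alpha).
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = float(rng.beta(alpha, alpha))
    perm = rng.permutation(len(x))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix, lam

def balanced_error_rate(y_true, y_pred):
    """Mean of per-class error rates (assumed meaning of BER)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    per_class = [np.mean(y_pred[y_true == c] != c) for c in np.unique(y_true)]
    return float(np.mean(per_class))

# Toy batch: four single-channel 8x8 "images" with binary one-hot labels.
rng = np.random.default_rng(0)
x = rng.random((4, 1, 8, 8))
y = np.eye(2)[[0, 1, 1, 0]]
x_mix, y_mix, lam = mixup_batch(x, y, alpha=0.2, rng=rng)

# One of three positives is misclassified; the single negative is correct.
ber = balanced_error_rate([1, 1, 1, 0], [1, 0, 1, 0])
```

Unlike the hybrid images described in the paper, Mixup blends whole images and their labels pixel-wise, so it does not distinguish lung texture from the surrounding anatomy.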

References: [1] Zhang, Hongyi, et al. Mixup: Beyond empirical risk minimization