
# Authors

Zudi Lin, Donglai Wei, Mariela D. Petkova, Yuelong Wu, Zergham Ahmed, Krishna Swaroop K, Silin Zou, Nils Wendt, Jonathan Boulanger-Weill, Xueying Wang, Nagaraju Dhanyasi, Ignacio Arganda-Carreras, Florian Engert, Jeff Lichtman, Hanspeter Pfister

# Abstract

Segmenting 3D cell nuclei from microscopy image volumes is critical for biological and clinical analysis, enabling the study of cellular expression patterns and cell lineages. However, current datasets for *neuronal* nuclei usually contain volumes smaller than $10^{-3}\ mm^3$ with fewer than 500 instances per volume, which cannot reveal the complexity of large brain regions and restricts the investigation of neuronal structures. In this paper, we have pushed the task forward to the sub-cubic millimeter scale and curated the *NucMM* dataset with two fully annotated volumes: one $0.1\ mm^3$ electron microscopy (EM) volume containing nearly the entire zebrafish brain with around 170,000 nuclei; and one $0.25\ mm^3$ micro-CT (uCT) volume containing part of a mouse visual cortex with about 7,000 nuclei. With two imaging modalities and significantly increased volume size and instance numbers, we discover a great diversity of neuronal nuclei in appearance and density, introducing new challenges to the field. We also perform a statistical analysis to illustrate those challenges quantitatively. To tackle the challenges, we propose a novel hybrid-representation learning model that combines the merits of foreground mask, contour map, and signed distance transform to produce high-quality 3D masks. The benchmark comparisons on the NucMM dataset show that our proposed method significantly outperforms state-of-the-art nuclei segmentation approaches. Code and data are available at https://connectomics-bazaar.github.io/proj/nucMM/index.html.

SharedIt: https://rdcu.be/cyhLI

# Reviews

### Review #1

• Please describe the contribution of the paper

This manuscript presents a large novel annotated data set, consisting of EM and micro-CT volumes, and a deep-learning-based approach for segmenting these data. The latter incorporates extensions (multi-task learning, with a signed distance function as one of the representations) that have been published before, so the theoretical novelty of this step is limited. On the other hand, this extension achieves much better detection results than the state of the art. The title of this manuscript does not match the presented validation: the authors only report results using a single detection-related validation metric, whereas in the title they position their approach as a cell segmentation algorithm.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

• Large annotated data set that will be made publicly available. • Good performance with respect to cell detection.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

• Misleading use of term “segmentation” in the title, as the reported quality metric is related to mere detection. • The proposed method extension is rather simple and has been published before.

• Please rate the clarity and organization of this paper

Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

All the results presented in this work are obtained on the large annotated data set that will be publicly available by the authors. The implementation details are properly described and values of the parameters are reported.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
1. The authors report their results using a single detection-related metric (average precision, AP). At the same time, they present their method as a segmentation approach. Hence, segmentation performance needs to be validated as well.
2. The authors mention three main contributions presented in this work. However, the third claimed contribution (benchmarking) for me looks like confirmation of the “high-quality predictions” mentioned in the previous contributions. So, in my opinion, it is not an actual contribution on its own.
3. Section 3.2, where the authors describe the instance decoding, misses some details. In particular, all the related parameters are introduced only later, in Section 4.3. The authors also mention “consistency”, but it remains unclear how it is guaranteed if, e.g., different sources (mask, contour, signed distance function) provide contradictory information.
5. Page 6: “…the signed distance map also model…” → “…the signed distance map also models…”
6. Page 7, Table 2, caption: “Our U3D-BCD model significantly improves the performance of previously…” → “Our U3D-BCD model significantly improves the performance of previous…”. I would also remove the word “significant” as the related analysis was not reported in this paper.
7. Page 7, “…we run predictions on the 90% test data…”. This part might be confusing, as the predictions are run on the entire test data set (which indeed constitutes 90% of all the data).
8. Page 8: “…by relatively 22%…” → “…by 22%…” as percentages are already relative.

Probably accept (7)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This work presents a simple (and already published) yet efficient extension of a known cell segmentation algorithm. Hence, the methodological novelty of this submission is rather limited. The annotated data set that is intended to be made publicly available looks impressive and will be very useful for the community.

• What is the ranking of this paper in your review stack?

1

• Number of papers in your stack

5

• Reviewer confidence

Very confident

### Review #2

• Please describe the contribution of the paper

Submission 436 addresses the very real situation of reduced availability of curated data sets with annotations. Whilst there are areas (i.e. histology and light microscopy) where there are a considerably large number of data sets available, this is not that common for electron microscopy and therefore this contribution is valuable.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The data released is interesting and will be valuable to the community; the fact that it covers not one but two linked acquisition modalities (microCT and electron microscopy) makes it more interesting. The labels have been produced semi-manually with algorithms based on U-Net and then validated by experts.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Whilst the release of the data is a contribution on its own, there is not much in the methodology that would merit novelty.

“Besides, most public datasets mentioned above only have fluorescence images obtained with optical microscopy”. Are the authors familiar with the EMPIAR EMDB repository (https://www.ebi.ac.uk/pdbe/emdb/empiar/) that contains EM data sets?

• Please rate the clarity and organization of this paper

Very Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The offer to release data, labels and code complies with the reproducibility required and will allow for future studies to use the data for new experiments and the code and results will be available as a benchmark for comparisons.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

This is an interesting paper, the strength is the data itself and the opportunity to be used in future experiments. The weaknesses are few as mentioned previously.

In addition, the paper can be improved in terms of style, as in some cases it is written in a rather informal manner:

“we push the task forward to the sub-cubic millimeter scale”: this is a rather strange way of saying that the authors are releasing a data set with higher resolution. There is no “task” to be pushed.

“we discover a great diversity of neuronal nuclei in appearance and density”: I am not familiar with neuronal nuclei, but do we really learn about nuclei diversity (i.e., glial cells or astrocytes) here?

What do the authors mean by “The permutation-invariance of object indices makes the task challenging”? Once an index (i.e. a label) is assigned, the object has been identified. I would think that the problem comes earlier, in the analysis of the data itself, unless the authors mean the intensity (gray level) of each voxel as the object index. If that is the case, then it would be good to clarify, e.g. “in this work we consider that the data is formed by …”, and then define indices and other relevant terms.

accept (8)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This is an interesting contribution in terms of having acquired different modalities and released them with access to labels, code and a benchmark value. This is on its own an interesting contribution and will be a valuable resource for future experiments.

• What is the ranking of this paper in your review stack?

1

• Number of papers in your stack

1

• Reviewer confidence

Very confident

### Review #3

• Please describe the contribution of the paper

This paper generated two large annotated datasets for 3D neuronal-cell nucleus instance segmentation and proposed a new instance segmentation model that achieves improved performance on the generated datasets compared with three baseline methods. The datasets cover two imaging modalities (EM and micro-CT) and are more than two orders of magnitude larger than existing datasets in both imaging volume and the number of annotated nucleus instances. The proposed instance segmentation model uses multi-task learning of instance masks, instance contours and signed distance maps of nuclei, followed by post-processing with watershed segmentation.
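To make the "signed distance map" target concrete, here is a minimal sketch on a toy 2D mask. It is not the authors' implementation: the sign convention (positive inside the nucleus, negative outside), the choice to measure distance to the nearest pixel of the opposite class, and the function name `signed_distance_map` are all assumptions made for illustration, and the paper works on 3D volumes rather than 2D grids.

```python
import math


def signed_distance_map(mask):
    """Brute-force signed distance map for a small 2D binary mask (list of lists).

    Assumed convention (not taken from the paper): positive inside the object,
    negative outside, with magnitude equal to the Euclidean distance to the
    nearest pixel of the opposite class.
    """
    coords = [(r, c) for r, row in enumerate(mask) for c in range(len(row))]
    fg = [p for p in coords if mask[p[0]][p[1]]]
    bg = [p for p in coords if not mask[p[0]][p[1]]]
    out = [[0.0] * len(row) for row in mask]
    for r, c in coords:
        inside = bool(mask[r][c])
        others = bg if inside else fg
        if not others:  # mask is entirely foreground or entirely background
            continue
        d = min(math.hypot(r - r2, c - c2) for r2, c2 in others)
        out[r][c] = d if inside else -d
    return out


# Toy example: a 3x3 square object inside a 5x5 image.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
sdm = signed_distance_map(mask)
print(sdm[2][2])  # object center: largest positive value
print(sdm[1][1])  # object corner pixel: small positive value
print(sdm[0][0])  # image corner: negative value
```

The brute-force search is quadratic in the number of pixels and only suitable for toy inputs; on real volumes one would more likely use an optimized routine such as `scipy.ndimage.distance_transform_edt`. The point of the sketch is that the map peaks at object centers and changes sign at the boundary, which is what makes it useful for seeding a watershed.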

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. Two large 3D brain image datasets were collected, annotated and curated for nucleus instance segmentation. These will be a very valuable resource for the community, since they are much larger than the existing datasets and contain much denser annotations. Since nucleus segmentation is a fundamental task for many downstream computational analyses, for example, shape analysis of cells, such large datasets will contribute not only to research on instance segmentation but also to the development of better image analysis pipelines for neurobiology research.

2. The authors added a straightforward regression task, learning signed distance maps, to the existing multi-task learning framework for nucleus segmentation, and proposed a strategy to combine the learned instance map, contour map and signed distance map for watershed-based instance separation and final instance map generation. Applying such a regression task to 3D nucleus instance segmentation improved model performance on both datasets.

3. The authors performed good hyperparameter sensitivity studies, which provide more convincing evidence supporting the benefits of the proposed model.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The datasets were collected from single animal brains and there is a lack of independent test sets from different individuals of the same animal type. It is thus hard to determine whether such datasets are valuable for generating models that generalize to 3D images from other individuals of the same animal type.
2. The novelty of the model design is limited. Many papers have adopted the idea of learning signed distance maps in multi-task learning for segmentation and instance segmentation. Here are two examples. Audebert, N., et al. (Distance transform regression for spatially-aware deep semantic segmentation. Computer Vision and Image Understanding, CVIU, 2019) proposed the same design for 2D images, where a regression task was incorporated into the segmentation model design to learn a signed distance transform between foreground instances and background. More recent papers, for example, Xue, Yuan, et al. (Shape-aware organ segmentation by predicting signed distance maps. AAAI. 2020), utilized this idea for organ segmentation in 3D CT images. Therefore, the approach proposed in this paper amounts to an application from other image domains to neural EM and uCT images and from 2D to 3D settings.
3. The evidence of superior model performance was only shown on the two custom generated datasets and no experiments were done with prior public datasets such as WORM (F. Long, et al. A 3D digital atlas of C. elegans and its application to single-cell analyses. Nature Methods, 2009.) and Parhyale (F. Alwes, et al. Live imaging reveals the progenitors and cell dynamics of limb regeneration. Elife, 2016.), which leads to a lack of evidence supporting the superiority of the new model design.
4. Since only one set of training data (5% of all instances) was randomly selected for training, it is hard to assess the sensitivity of model performance on data split.
5. Lack of qualitative 3D segmentation results.
• Please rate the clarity and organization of this paper

Very Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

It is very likely that other researchers can reproduce the results of this paper. The authors stated they will share the generated datasets and code with the community. They described the hyperparameters they used and performed good hyperparameter sensitivity tests.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
1. As stated in “Weakness” section Point 1, the datasets were collected from single animal brains and there is a lack of independent test sets from different individuals of the same animal type. To show the generalizability of models trained with such datasets, either add an analysis showing the differences (and similarities) in cell morphology or other image properties between the training set and test set, or set up independent test sets with images from different individuals of the same animal type.

2. Since the design of regressing a signed distance transform was already proposed for instance segmentation in 2D and image segmentation in 3D (mentioned in “Weakness” section Point 2), the relevant papers should be cited and the description of novelty in the paper needs to be modified.

3. As stated in “Weakness” section Point 3, there is a lack of evidence on how universal the performance improvement from the new model design is. It would thus be more convincing to add a comparison of model performance on existing public datasets for 3D nucleus segmentation, such as the ones mentioned in the “Weakness” section, i.e. WORM and/or Parhyale.
4. Training the model on only one set of training data (5% of all instances) cannot support the claim about the superiority of the proposed model design. It is therefore necessary to evaluate multiple different data splits and assess the sensitivity of model performance to the data split.

5. Include qualitative 3D segmentation results and point out success and failure examples of model predictions.

borderline accept (6)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

I lean towards “accept” because the authors generated two new instance segmentation datasets with large numbers of nucleus annotations. It is a valuable contribution to the community due to the lack of such large-scale dense annotations. However, my judgement is “borderline” instead of a higher score because of (1) the limitation of the datasets to individual animal brains and the lack of data supporting the generalizability of trained models to independent test data, (2) the limited novelty of the proposed model design and (3) issues with the experimental results/design (“Weakness” section Points 4 and 5).

• What is the ranking of this paper in your review stack?

1

• Number of papers in your stack

3

• Reviewer confidence

Very confident

# Primary Meta-Review

• Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The reviewers unanimously recommend acceptance. The topic is interesting and the paper is very well written. Methodologically the novelty is limited, but the presented dataset is potentially very valuable, and the proposed segmentation model does achieve improved performance. However, the reviewers do raise several issues that need to be addressed in the revision. See their comments for details.

• What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

2

# Author Feedback

We thank all reviewers and the meta-reviewer for the thoughtful feedback! We will address the comments accordingly in the revision. Here are some points we want to clarify:

Evaluation metric [R1]: AP-X (Average Precision with IoU threshold X) can be either a detection metric or a segmentation metric, depending on whether the intersection-over-union (IoU) is calculated for bounding boxes [A] or segmentation masks [B]. In our paper, the metric uses the IoU between ground-truth and predicted object masks, which makes it a segmentation metric and is consistent with the title, which states “nuclei instance segmentation.”
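The distinction between box IoU and mask IoU can be illustrated with a small sketch. This is not the evaluation code used in the paper: the helper names `mask_iou`, `bounding_box` and `box_iou` are hypothetical, and masks are represented as sets of `(z, y, x)` voxel coordinates purely to keep the example self-contained.

```python
from itertools import product


def mask_iou(a, b):
    """IoU of two instance masks given as sets of (z, y, x) voxel coordinates."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def bounding_box(mask):
    """Axis-aligned bounding box of a non-empty mask, as the set of voxels it encloses."""
    lows = [min(v[d] for v in mask) for d in range(3)]
    highs = [max(v[d] for v in mask) for d in range(3)]
    return set(product(*(range(lo, hi + 1) for lo, hi in zip(lows, highs))))


def box_iou(a, b):
    """IoU of the bounding boxes of two masks (the detection-style overlap)."""
    return mask_iou(bounding_box(a), bounding_box(b))


# Toy example in a single z-slice: one object on the diagonal of a 4x4 square,
# the other on the anti-diagonal. The bounding boxes coincide, the masks do not overlap.
gt = {(0, i, i) for i in range(4)}
pred = {(0, i, 3 - i) for i in range(4)}

print(mask_iou(gt, pred))  # 0.0
print(box_iou(gt, pred))   # 1.0
```

In this toy case a box-based AP at any threshold would count the prediction as a match, while a mask-based AP would not, which is why the choice of IoU determines whether AP-X measures detection or segmentation quality.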

Method novelty [R1, R4]: our paper is a dataset paper whose primary goal is to release a larger-scale labeled dataset to identify computational challenges and foster future method development. We cited existing papers that proposed signed distance transform and showed that besides the original semantic segmentation task, we could also apply it to instance segmentation and achieve improvement upon existing baseline models (Sec. 3.1).

Dataset design [R4]: considering the challenges in data collection and annotation, we made the design choice that, instead of collecting multiple small volumes from several animal brains, we collect nearly a whole zebrafish brain and a large region of the mouse visual cortex. We expect such a design will let researchers have a more comprehensive view of the intrinsic similarity and variance of neuronal nuclei in a large brain region.

Dataset split [R4]: we follow previous practice [C] and sample several small volumes as training data instead of cropping one larger volume, because this diversifies the training set given a limited annotation budget (Sec. 2). Besides, using 5% of the data for training is closer to the realistic annotation budget when neuroscientists handle newly collected data.

[A] Ren, Shaoqing, et al. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.” NIPS. 2015.

[B] He, Kaiming, et al. “Mask R-CNN.” Proceedings of the IEEE International Conference on Computer Vision. 2017.

[C] Januszewski, Michal, et al. “High-precision automated reconstruction of neurons with flood-filling networks.” Nature Methods. 2018.