PWS-to-Confocal-Microscopy-Cell-Image-Translation
- Juyang Bai
- Aggelos Katsaggelos, Northwestern University
Introduction
Chromatin is one of the most critical structures within the cell since it contains most of the cell's genetic information.
Partial-wave spectroscopic (PWS) microscopy is a quantitative imaging technique for detecting higher-order chromatin structure [1]. Confocal microscopy is used to create 3D images of chromatin structures within living cells and can reach exceptional image quality [2].
However, confocal microscopy can only examine a finite number of labeled structures at a time and cannot image non-fluorescent samples, whereas PWS microscopy does not rely on fluorescence and is therefore not limited to a single labeled molecule.
Meanwhile, PWS cell images are easier to obtain than confocal cell images.
In this project, we therefore explore an image-to-image translation approach that converts cell images from the PWS domain to the confocal domain.
Dataset Preparation
Each PWS cell has one pair of raw image and hand-drawn mask, but each confocal cell has a stack of raw images and hand-drawn masks. Therefore, to train our model on paired PWS and confocal cell images, we first have to register them. To this end, we designed the following dataset processing pipeline.

Step 1. Use Hu-moment shape matching between the PWS hand-drawn mask and the confocal hand-drawn masks to find the three best-matching confocal cells (a code sketch follows this list).
Step 2. Average the three best-matching confocal cells to retain as much cell detail as possible.
Step 3. Apply the PWS hand-drawn mask and the averaged confocal mask to the PWS and confocal raw images, respectively, to isolate a single cell; then crop the two masked images to the same size.
Step 4. Apply an affine transformation to the cropped confocal masked image so that it has the same orientation as the cropped PWS masked image.
Step 5. Pad the borders of both cropped masked images to eliminate the influence of border effects.
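
Steps 1, 2, 4, and 5 can be implemented with standard OpenCV routines. Below is a minimal sketch, assuming the masks are 8-bit single-channel arrays; the function names, the pre-computed rotation angle, and the pad width are illustrative assumptions rather than the project's actual code.

```python
import cv2
import numpy as np

def match_confocal_masks(pws_mask, confocal_masks, k=3):
    """Step 1: rank confocal masks by Hu-moment shape similarity to the
    PWS mask. cv2.matchShapes compares Hu-moment invariants; a lower
    score means a better match. Masks are assumed uint8, single channel."""
    scores = [(cv2.matchShapes(pws_mask, m, cv2.CONTOURS_MATCH_I1, 0.0), i)
              for i, m in enumerate(confocal_masks)]
    return [i for _, i in sorted(scores)[:k]]

def average_best(images, indices):
    """Step 2: average the raw images (or masks) of the best-matching
    confocal cells to keep as much cell detail as possible."""
    return np.mean([images[i] for i in indices], axis=0).astype(np.float32)

def align_and_pad(confocal_crop, angle_deg, pad=16):
    """Steps 4-5: rotate the cropped confocal image so it shares the PWS
    image's orientation, then pad the borders to suppress border effects.
    The rotation angle is assumed to be estimated elsewhere."""
    h, w = confocal_crop.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    rotated = cv2.warpAffine(confocal_crop, M, (w, h))
    return cv2.copyMakeBorder(rotated, pad, pad, pad, pad,
                              cv2.BORDER_CONSTANT, value=0)
```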
Model Architecture
We developed a UNeXt-based model [3] to perform PWS-to-confocal cell image translation (Figure 2). The input PWS images are 516 x 516 pixels with a single channel (grayscale).
The encoder consists of three convolution blocks and two tokenized multilayer perceptron (MLP) blocks.
Each convolution block has a 2D convolution layer, a batch normalization layer, and a ReLU activation, followed by a max-pooling operation with a 2 x 2 window and stride 2 for down-sampling.
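
In PyTorch, one such encoder convolution block might look like the following sketch; the kernel size of 3 and the channel counts are illustrative assumptions.

```python
import torch.nn as nn

class ConvBlock(nn.Module):
    """One encoder stage: Conv2d -> BatchNorm -> ReLU, then 2x2 max pooling."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),  # 2D convolution
            nn.BatchNorm2d(out_ch),                              # batch normalization
            nn.ReLU(inplace=True),                               # ReLU activation
        )
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)        # 2x2, stride-2 down-sampling

    def forward(self, x):
        x = self.block(x)        # full-resolution features (kept for skip connections)
        return self.pool(x), x   # pooled output plus pre-pool skip features
```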
In the tokenized MLP block, we first shift the channels of the convolutional features across the width, which helps the MLP focus on specific locations of the feature map. We then tokenize the shifted features with a 2D convolution layer with a kernel size of 3 and a stride of 2, and pass the resulting tokens through an MLP across the width. Next, the features are passed through a depth-wise convolution layer, which encodes positional information of the MLP features, followed by a GELU activation. The features are then shifted across the height and passed through a second MLP. Finally, we add the original tokens through a residual connection and apply layer normalization to the output tokens.
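
The sketch below illustrates this tokenized MLP block in PyTorch. It follows the spirit of UNeXt [3], but the shift amounts, the five-way channel grouping, and the layer widths are illustrative assumptions, not the exact implementation.

```python
import torch
import torch.nn as nn

class TokenizedMLPBlock(nn.Module):
    def __init__(self, channels, token_dim):
        super().__init__()
        # kernel-size-3, stride-2 convolution that projects shifted features into tokens
        self.tokenize = nn.Conv2d(channels, token_dim, kernel_size=3, stride=2, padding=1)
        self.mlp_w = nn.Linear(token_dim, token_dim)  # MLP after the width shift
        # depth-wise convolution encodes positional information of the MLP features
        self.dwconv = nn.Conv2d(token_dim, token_dim, kernel_size=3,
                                padding=1, groups=token_dim)
        self.act = nn.GELU()
        self.mlp_h = nn.Linear(token_dim, token_dim)  # MLP after the height shift
        self.norm = nn.LayerNorm(token_dim)

    @staticmethod
    def shift(x, dim):
        # split the channels into 5 groups and roll each group by -2..2 along `dim`
        chunks = torch.chunk(x, 5, dim=1)
        return torch.cat([torch.roll(c, s, dims=dim)
                          for c, s in zip(chunks, range(-2, 3))], dim=1)

    def forward(self, x):
        x = self.shift(x, dim=3)                  # shift conv features across width
        x = self.tokenize(x)                      # tokenize with 3x3 conv, stride 2
        B, C, H, W = x.shape
        t = x.flatten(2).transpose(1, 2)          # (B, H*W, C) token sequence
        t = self.mlp_w(t)                         # MLP across width-shifted tokens
        x = t.transpose(1, 2).reshape(B, C, H, W)
        x = self.act(self.dwconv(x))              # DWConv + GELU for positional info
        x = self.shift(x, dim=2)                  # shift features across height
        t2 = self.mlp_h(x.flatten(2).transpose(1, 2))
        t2 = t2 + t                               # residual: add the original tokens
        return self.norm(t2)                      # layer-normalized output tokens
```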
The decoder mirrors the encoder network, except that the encoder's max-pooling operations are replaced with up-sampling operations based on bilinear interpolation.
The encoder and decoder are connected through skip connections at multiple resolution levels to help recover the original spatial resolution of the input image at the output.
The features from each block in the encoder are copied and concatenated with their corresponding ones in the decoder.
These concatenations enable both high- and low-level features from the encoder to be used as additional inputs in the decoder, providing an effective and stable image representation.
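
A minimal PyTorch sketch of one decoder stage combining bilinear up-sampling with a skip connection is shown below; the channel arithmetic is an illustrative assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpBlock(nn.Module):
    """One decoder stage: bilinear up-sampling, skip concatenation, convolution."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        # bilinear interpolation restores the spatial size of the skip features
        x = F.interpolate(x, size=skip.shape[2:], mode="bilinear", align_corners=False)
        x = torch.cat([x, skip], dim=1)  # concatenate encoder and decoder features
        return self.conv(x)
```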
Experiment Setting
The model was trained and validated on 2D images using a mean-squared error (MSE) cost function and the Adam optimizer with a learning rate of 0.001, for 100 epochs with a batch size of 16. The generated synthetic-confocal images were evaluated against the ground-truth images by computing the MSE.
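
A minimal sketch of this training configuration in PyTorch follows; `UNeXtTranslator` and `train_loader` are hypothetical placeholders for the model class and data pipeline.

```python
import torch
import torch.nn as nn

model = UNeXtTranslator()                 # hypothetical UNeXt-based model class
criterion = nn.MSELoss()                  # mean-squared error cost function
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for epoch in range(100):                  # 100 epochs
    for pws, confocal in train_loader:    # assumed loader yielding batches of 16
        optimizer.zero_grad()
        pred = model(pws)                 # synthetic-confocal prediction
        loss = criterion(pred, confocal)  # MSE against the ground-truth image
        loss.backward()
        optimizer.step()
```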
Current Results
Since we only have a limited dataset of 406 cell images, our current generated synthetic-confocal cell images capture the basic shape of the cell but miss some details inside it (Figure 3). We are working on increasing the size and variety of our dataset to train the model to generate more realistic synthetic-confocal cell images.

References
[1] Almassalha, Luay M., et al. "Label-free imaging of the native, living cellular nanoarchitecture using partial-wave spectroscopic microscopy." Proceedings of the National Academy of Sciences 113.42 (2016): E6372-E6381.
[2] Chandler, John E., et al. "Colocalization of cellular nanostructure using confocal fluorescence and partial wave spectroscopy." Journal of Biophotonics 10.3 (2017): 377-384.
[3] Valanarasu, Jeya Maria Jose, and Vishal M. Patel. "Unext: Mlp-based rapid medical image segmentation network." Medical Image Computing and Computer Assisted Intervention–MICCAI 2022: 25th International Conference, Singapore, September 18–22, 2022, Proceedings, Part V. Cham: Springer Nature Switzerland, 2022.