# Light-weight Deformable Registration using Adversarial Learning with Distilling Knowledge

We introduce a new Light-weight Deformable Registration network that significantly reduces the computational cost while achieving competitive accuracy.

## #Introduction: Medical image registration

Medical image registration is the process of systematically placing separate medical images in a common frame of reference so that the information they contain can be effectively integrated or compared. Applications of image registration include combining images of the same subject from different modalities, aligning temporal sequences of images to compensate for the motion of the subject between scans, aligning images from multiple subjects in cohort studies, or navigating with image guidance during interventions. Since many organs do deform substantially while being scanned, the rigid assumption can be violated as a result of scanner-induced geometrical distortions that differ between images. Therefore, performing deformable registration is an essential step in many medical procedures.

## #Previous Studies, Remaining Challenges, and Motivation

Recently, learning-based methods have become popular to tackle the problem of deformable registration. These methods can be split into two groups: (i) supervised methods that rely on the dense ground-truth flows obtained by either traditional algorithms or simulating intra-subject deformations. Although these works achieve state-of-the-art performance, they require a large amount of manually labeled training data, which are expensive to obtain; and (ii) unsupervised learning methods that use a similarity measurement between the moving and the fixed image to utilize a large amount of unlabelled data. These unsupervised methods achieve competitive results in comparison with supervised methods. However, their deformations are reconstructed without the direct ground-truth guidance, hence leading to the limitation of leveraging learnable information. Furthermore, recent unsupervised methods all share an issue of great complexity as the network parameters increase significantly when multiple progressive cascades are taken into account. This leads to the fact that these works can not achieve real-time performance during inference while requiring intensively computational resources when deploying.

In practice, there are many scenarios when medical image registration are needed to be fast - consider matching preoperative and intra-operative images during surgery, interactive change detection of CT or MRI data for a radiologist, deformation compensation or 3D alignment of large histological slices for a pathologist, or processing large amounts of images from high-throughput imaging methods. Besides, in many image-guided robotic interventions, performing real-time deformable registration is an essential step to register the images and deal with organs that deform substantially. Economically, the development of a CPU-friendly solution for deformable registration will significantly reduce the instrument costs equipped for the operating theatre, as it does not require GPU or cloud-based computing servers, which are costly and consume much more power than CPU. This will benefit patients in low- and middle-income countries, where they face limitations in local equipment, personnel expertise, and budget constraints infrastructure. Therefore, design an efficient model which is fast and accurate for deformable registration is a crucial task and worth for study in order to improve a variety of surgical interventions.

## #Contribution

Deformable registration is a crucial step in many medical procedures such as image-guided surgery and radiation therapy. Most recent learning-based methods focus on improving the accuracy by optimizing the non-linear spatial correspondence between the input images. Therefore, these methods are computationally expensive and require modern graphic cards for real-time deployment. Thus, we introduce a new Light-weight Deformable Registration network that significantly reduces the computational cost while achieving competitive accuracy (Fig.1). In particular, we propose a new adversarial learning with distilling knowledge algorithm that successfully leverages meaningful information from the effective but expensive teacher network to the student network. We design the student network such as it is light-weight and well suitable for deployment on a typical CPU. The extensively experimental results on different public datasets show that our proposed method achieves state-of-the-art accuracy while significantly faster than recent methods. We further show that the use of our adversarial learning algorithm is essential for a time-efficiency deformable registration method.

(a)
(b)
Figure 1: Comparison between typical deep learning-based methods for deformable registration (a) and our approach using adversarial learning with distilling knowledge for deformable registration (b). In our work, the expensive Teacher Network is used only in training; the Student Network is light-weight and inherits helpful knowledge from the Teacher Network via our Adversarial Learning algorithm. Therefore, the Student Network has high inference speed, while achieving competitive accuracy (Source).

## #Methodology

### #Method overview

We describe our method for Light-weight Deformable Registration using Adversarial Learning with Distilling Knowledge. Our method is composed of three main components: (i)) a Knowledge Distillation module which extracts meaningful deformations $\bm{\phi_t}$ from the Teacher Network; (ii) a Light-weight Deformable Registration (LDR) module which outputs a high-speed Student Network; and (iii) an Adversarial Learning with Distilling Knowledge (ALDK) algorithm which effectively leverages teacher deformations $\bm{\phi}_t$ to the student deformations. An overview of our proposed deformable registration method can be found in Fig.2.

Figure 2: An overview of our proposed Light-weight Deformable Registration (LDR) method using Adversarial Learning with Distilling Knowledge (ALDK). Firstly, by using knowledge distillation, we extract the deformations from the Teacher Network as meaningful ground-truths. Secondly, we design a light-weight student network, which has competitive speed. Finally, We employ the Adversarial Learning with Distilling Knowledge algorithm to effectively transfer the meaningful knowledge of distilled deformations from the Teacher Network to the Student Network (Source).

Since the content may over-length, in this part, we introduce the background theory for Deformable Registration and Knowledge Distillation for Deformation. In the next part, we will introduce the Architecture of Light-weight Deformable Registration Network and Adversarial Learning Algorithm with Distilling Knowledge. In the final part, we will introduce the effectiveness of the method in comparison with recent states of the arts and detailed analysis.

### #Background: Deformable Registration

We follow RCN [1] to define deformable registration task recursively using multiple cascades. Let $\textbf{\textit{I}}_m, \textbf{\textit{I}}_f$ denote the moving image and the fixed image respectively, both defined over $d$-dimensional space $\bm{\Omega}$. A deformation is a mapping $\bm{\phi} : \bm{\Omega} \rightarrow \bm{\Omega}$. A reasonable deformation should be continuously varying and prevented from folding. The deformable registration task is to construct a flow prediction function $\textbf{F}$ which takes $\textbf{\textit{I}}_m, \textbf{\textit{I}}_ f$ as inputs and predicts a dense deformation $\bm{\phi}$ that aligns $\textbf{\textit{I}}_m$ to $\textbf{\textit{I}}_f$ using a warp operator $\circ$ as follows:

$\textbf{F}^{(n)}(\textbf{\textit{I}}^{(n-1)}_m,\textbf{\textit{I}}_f)=\phi^{(n)} \circ \textbf{F}^{(n-1)}(\phi^{(n-1)} \circ \textbf{\textit{I}}^{(n-2)}_m,\textbf{\textit{I}}_f)$

where $\textbf{F}^{(n-1)}$ is the same as $\textbf{F}^{(n)}$, but in a different flow prediction function. Assuming for $n$ cascades in total, the final output is a composition of all predicted deformations, i.e.,

$\textbf{F}(\textbf{\textit{I}}_m, \textbf{\textit{I}}_f)=\phi^{(n)} \circ...\circ \phi^{(1)},$

and the final warped image is constructed by

$\textbf{\textit{I}}_{m}^{(n)}=\textbf{F}(\textbf{\textit{I}}_m,\textbf{\textit{I}}_f) \circ \textbf{\textit{I}}_m$

In general, previous Equations form the hypothesis function $\mathcal{F}$ under the learnable parameter $\mathbf{W}$,

$\mathcal{F}(\textbf{\textit{I}}_{m}, \textbf{\textit{I}}_f, \mathbf{W}) = (\mathbf{v}_{\phi}, \textbf{\textit{I}}_m^{(n)})$

where $\mathbf{v}_{\phi} = [\bm{\phi}^{(1)}, \bm{\phi}^{(2)}, ..., \bm{\phi}^{(k)},..., \bm{\phi}^{(n)}]$ is a vector containing predicted deformations of all cascades. Each deformation $\bm{\phi}^{(k)}$ can be computed as

$\bm{\phi}^{(k)} = {\mathcal{F}}^{(k)}\left(\textbf{\textit{I}}_{m}^{(k-1)}, \textbf{\textit{I}}_f, \mathbf{W}_{\phi^{(k)}}\right)$

To estimate and achieve a good deformation, different networks are introduced to define and optimize the learnable parameter $\mathbf{W}$.

### #Knowledge Distillation for Deformation

Knowledge distillation is the process of transferring knowledge from a cumbersome model (teacher model) to a distilled model (student model). The popular way to achieve this goal is to train the student model on a transfer set using a soft target distribution produced by the teacher model.

Different from the typical knowledge distillation methods that target the output softmax of neural networks as the knowledge, in the deformable registration task, we leverage the teacher deformation $\bm{\phi}_t$ as the transferred knowledge. As discussed in [2], teacher networks are usually high-performed networks with good accuracy. Therefore, our goal is to leverage the current state-of-the-art Recursive Cascaded Networks (RCN) [1] as the teacher network for extracting meaningful deformations to the student network. The RCN network contains an affine transformation and a large number of dense deformable registration sub-networks designed by VTN [3]. Although the teacher network has expensive computational costs, it is only applied during the training and will not be used during the inference.

## #Reference

[1] S. Zhao, Y. Dong, E. I. Chang, Y. Xu, et al., Recursive cascaded networks for unsupervised medical image registration, in ICCV, 2019.

[2] G. Hinton, O. Vinyals, and J. Dean, Distilling the knowledge in a neural network, ArXiv, 2015.

[3] S. Zhao, T. Lau, J. Luo, I. Eric, C. Chang, and Y. Xu, Unsupervised 3d end-to-end medical image registration with volume tweening network, IEEE J-BHI, 2019.

## #Open Source

🐱 Github: https://github.com/aioz-ai/LDR_ALDK