The overview of GAN (Generative Adversarial Networks)

A detailed introduction to the concepts related to GAN.
type: insightlevel: medium

GAN (Generative Adversarial Networks) is a groundbreaking deep learning model in the field of generating synthetic data. It consists of two main components: the generator network and the discriminator network, both operating in a competitive environment.

Figure 1

GANs have achieved remarkable achievements, such as generating high-quality images, creating new music, generating automatic text, and many other applications. However, training GANs still requires careful consideration and techniques, as the generator and discriminator need to achieve a delicate balance.

With the ability to generate new and diverse data, GANs are becoming a promising tool in the fields of art, design, research, and real-world applications. The advancements in this field continue to open up numerous opportunities and challenges for AI and Deep Learning.

II. Applications

After the advent of GAN, it has been applied to numerous problems, addressing many issues that were previously daunting to tackle. Here are some prominent applications that you are likely to encounter when studying GAN:

  1. Facial Reconstruction:

    • Thanks to the unique architecture of GAN, we can generate a new fake face from a real face with astonishing resemblance. Similarly, by changing the target we want to simulate, GAN can learn the transformation.
    • This has been leveraged for tasks such as creating animated characters (StyleGAN by Karras et al., 2019), generating facial images with different ages (Age-cGAN by Antipov et al., 2017), or applying virtual makeover for users to try different hair colors, styles or skin tones (Virtual Makeover by Choi et al., 2018), ...

    Figure 1

  2. Text to Image:

    • Nowadays, with just a few prompts (which can be accompanied by images or not), GANs can generate a completely new image based on that input data. This is applied in transformer models such as GPT and Text-to-Image synthesis models that utilize Attention Mechanism and Encoder-Decoder Architecture techniques to generate images based on the textual content.
  3. Image Restoration and Enhancement:

    • GANs can restore image quality by generating a higher-resolution image from the original one. Additionally, GANs can utilize images captured from different viewpoints to create a novel image with a frontal perspective.

    Figure 1

III. Methodologies

1. Basics

To make it easier for readers to understand, before starting, I will explain in advance the basic terms used in this blog:

  • Discriminative: Discriminant model, has a classification effect.
  • Generative: Generating model, to generate data pattern based on known label.
  • Explicit model: The current model, a type of Generative model that uses theoretical probability distribution functions to generate data samples.
  • Implicit model: The implicit model, another type of Generative model, does not use theoretical probability distributions. Instead, samples are generated directly from the model. GAN is an example of such a model.
  • Zero-sum game: A type in game theory where the interests of two players are in conflict.

To familiarize ourselves with GAN models, we need to have a solid understanding of two different classes of problems in Machine Learning: Generative models and Discriminative models, as described in the chapter below. GANs are constructed based on these two model classes.

2. Distinguish Discriminative and Generative model

In Deep Learning we have many different types of models such as: supervised learning/unsupervised learning, parametric/non-parametric (non-parametric) models parametric), graph model (graphic)/non-graphic model, etc.

Figure 1

GAN is composed of two networks: Generator (generating model) and Discriminator (distinguishing model). While the Generator generates realistic data, the Discriminator tries to distinguish between the data generated by the Generator and the actual data.

3. Probability Calculations

Normally for Discriminative model to predict the probability p(yx)p(y|x) we will rely on a function like sigmoid in the case of two labels or softmax in the case of two labels. more than 2 labels. For example, the probability in the case of two labels:

p(yx)=11+ewTxp(y|x) = \frac{1}{1 + e^{-w^Tx}}

But for the Generative model to predict the probability p(yx)p(y|x) we have to rely on the bayes formula.

p(yx)=p(x,y)p(x)=p(x,y)y.p(x,y)=p(xy).p(y)y.p(xy).p(y)p(y|x) = \frac{p(x,y)}{p(x)}=\frac{p(x,y)}{\sum_{y}.p(x,y)}=\frac{p(x|y).p(y)}{\sum_{y}.p(x|y).p(y)}

Thus, we can see that the probability of the generating model class is completely deduced from the probability distributions of the dimensions without relying on the regression formula on xx as the discriminant model.