Creating Anime Characters with Generative Adversarial Networks
Written by Charles Yuan. A review of the paper titled “Towards the Automatic Anime Characters Creation with Generative Adversarial Networks”.
Introduction
Being an engineering student has its ups and downs. On the one hand, it brings prestige, a likelier chance at financial security, a chance to help the world, and sets you up for a fulfilling career in the future. In fact, many of the top executives at tech companies studied engineering: Lisa Su (electrical engineering), Tim Cook (industrial engineering), Sundar Pichai (metallurgical engineering), just to name a few [1]. Perhaps it’s because engineers are trained to be detail-oriented, analytical problem-solvers, or it may be pure coincidence. On the other hand, the amount of time you have to dedicate to studying and classes leaves little time for other activities, the most important of which is watching anime. At the end of the day, is that not the true sacrifice?
If you are anything like me, you prefer displaying your favourite anime characters as profile pictures on every platform, whether it’s Discord, Instagram, YouTube, Twitch, or Zoom. But eventually, your slow pace of digesting new anime cannot keep up with the rate at which you cycle through profile pictures. To put it simply, what happens if you run out of pictures of waifus? That is where the task of automatic anime character creation comes in, and if you’re at all familiar with the task of image generation, you’ll probably have heard of the term “GAN” before.
Generative Adversarial Networks
A generative adversarial network, or GAN, is a deep generative model proposed by Ian Goodfellow and his collaborators back in 2014. Essentially, the original paper detailed training two neural networks in a zero-sum game, in which they compete against one another to achieve better results [4]. One of the networks is the generator, which maps a source of random noise to synthetic data (often images); the noise ensures that it does not simply memorize the data distribution it is being trained to imitate. The goal of the generator is to produce synthetic data that is convincing enough to deceive the discriminator into thinking it’s real. The other network is the discriminator, which outputs a scalar probability indicating how likely the image it sees is to be real rather than generated. For those familiar with basic deep learning models, the discriminator is essentially a classifier, except that it is trained on a mix of generated images and originals rather than on labelled input data alone. Simple enough, right? Now all we have to do is apply this type of model to the proper dataset, and we can start generating waifu pictures! Or is it that simple?
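To make the adversarial setup concrete, here is a minimal PyTorch sketch of the two-player training loop on toy two-dimensional data. This is not the paper’s architecture or code; the network sizes, learning rates, and the synthetic “real” data are purely illustrative.

```python
import torch
import torch.nn as nn

# Toy example: the generator maps noise to 2-D points, and the discriminator
# outputs the probability that a point came from the real data.
latent_dim, data_dim, batch = 16, 2, 64
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(batch, data_dim) * 0.5 + 3.0   # stand-in for real samples
    fake = G(torch.randn(batch, latent_dim))          # generated samples

    # Discriminator step: push real samples toward 1 and fakes toward 0
    d_loss = bce(D(real), torch.ones(batch, 1)) + bce(D(fake.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: try to make the discriminator output 1 on fakes
    g_loss = bce(D(fake), torch.ones(batch, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```

In a real image model, G and D would be convolutional networks and the “real” batch would come from the training set, but the alternating structure stays the same.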
Dataset
One of the main contributions the authors discussed was the creation of a new dataset consisting of high-quality anime face images [2]. Now, whenever I need a new anime profile picture, I just use Pixiv. However, from personal experience, I can tell you that only around 5% of the illustrations on there can be considered professional quality, with the rest being questionable at best. This was one of the key points discussed by the authors, who noted that prior works relied on image boards such as Danbooru and Safebooru (and although it was not explicitly mentioned, Pixiv falls into the same category). There is simply too much variation in the style, domain, and quality of those images to build a consistent, clean, high-quality dataset from them.
Instead, what the authors did was use Getchu, a website that sells Japanese games, DVDs, music, books, and so on [2]. The great thing about Getchu is that each product page also displays pictures of the characters featured in the game. These images are diverse, thanks to the many different illustrators involved in producing them, yet consistent in their professional quality and purpose. The authors ran an SQL query on ErogameScape’s Web SQL API page to collect the Getchu page links, then downloaded the images [2]. The next step was applying an anime character face detector named lbpcascade_animeface, built on OpenCV, to get bounding boxes for the faces [7]. Since the authors found the default bounding boxes too small to capture hair length and style, each box was zoomed out by a factor of 1.5 [2]. This resulted in 42,000 images, which were then manually checked for false positives and undesired results, and filtered to remove images released before 2005 [2]. This left 31,255 training images in total, which may seem like a lot, but is just about adequate for the purposes of training a GAN.
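As a rough illustration of that cropping step, here is a short OpenCV sketch using the lbpcascade_animeface cascade from [7]. The file names and detector parameters are placeholders rather than the authors’ exact settings.

```python
import cv2

# Detect an anime face with the lbpcascade_animeface cascade [7], then enlarge
# the bounding box by a factor of 1.5 so that hair length and style are kept.
cascade = cv2.CascadeClassifier("lbpcascade_animeface.xml")
image = cv2.imread("getchu_character.jpg")
gray = cv2.equalizeHist(cv2.cvtColor(image, cv2.COLOR_BGR2GRAY))

faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(24, 24))
for (x, y, w, h) in faces:
    # Zoom the box out around its centre
    cx, cy = x + w // 2, y + h // 2
    half = int(max(w, h) * 1.5) // 2
    x0, y0 = max(cx - half, 0), max(cy - half, 0)
    x1, y1 = min(cx + half, image.shape[1]), min(cy + half, image.shape[0])
    cv2.imwrite(f"face_{x0}_{y0}.png", image[y0:y1, x0:x1])
```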
Mode Collapse
After having read this far, you might be wondering: What is there left to talk about? Didn’t we just cover what a GAN is and what dataset we’re using? When can we start generating anime faces? Well, that brings us to the second contribution of the paper, the use of the DRAGAN model, proposed by Kodali et al. in a separate paper [3]. Essentially, the purpose of using DRAGAN is to avoid a common problem that occurs when training conventional GANs. Generally speaking, you want your GAN to be able to generate a wide variety of different images after training, which is especially true for our purpose. However, during the training process, the generator occasionally finds one output, or a small set of outputs, that seems the most plausible to the discriminator. Then, instead of learning to produce convincing images across the whole data distribution, the generator gets stuck in a local minimum and produces that same small set of outputs over and over again. This failure, where training converges to a poor solution that covers only a few modes of the data, is known as mode collapse [3].
DRAGAN
The authors of the DRAGAN paper (Kodali et al.) observed, through studying multiple cases of mode collapse, that the issue is often accompanied by the discriminator producing sharp gradients around some real data points [3]. With their new model, named “Deep Regret Analytic Generative Adversarial Network,” they introduced a penalty scheme to discourage the discriminator from producing such sharp gradients [3]. I know this might sound a bit confusing, but bear with me, we’re almost at the end of the technical stuff. First off, here are the loss functions for the base GAN model [4]:
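$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
$$

Here D(x) is the discriminator’s estimated probability that x is a real sample, G(z) is the image generated from noise z, and the two networks respectively maximize and minimize this value function V(D, G).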
The traditional training procedure for GANs is known as the alternating gradient updates (AGD) procedure, which in layman’s terms just means that the discriminator and generator are trained in an alternating fashion [4]. However, the original theory around GAN training assumes that the discriminator is trained to optimality at each step, which is rarely the case in practice. Mode collapse tends to occur when the discriminator develops a sharp gradient around one of the real data points, essentially overfitting on it and giving the generator an easy target during training. The proposed penalty constrains the norm of the discriminator’s gradients around real points to be small, meaning that in the neighbourhood of any real training sample, the discriminator’s output cannot change too abruptly [3]. In turn, this allows the discriminator to provide useful feedback on a wide variety of generated images and thus properly train the generator. The effects of the gradient penalty are illustrated in the figures of the DRAGAN paper [3].
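To make that constraint concrete, here is a minimal PyTorch sketch of a DRAGAN-style gradient penalty. The coefficients (lambda_, c, k) are commonly used defaults rather than values taken from this article, and `discriminator` and `real_images` are assumed inputs.

```python
import torch

def dragan_penalty(discriminator, real_images, lambda_=10.0, c=0.5, k=1.0):
    """Keep the norm of the discriminator's input gradient close to k
    in a noisy neighbourhood of the real samples [3]."""
    # Perturb real samples with noise scaled by their standard deviation
    perturbed = real_images + c * real_images.std() * torch.rand_like(real_images)
    perturbed = perturbed.detach().requires_grad_(True)

    d_out = discriminator(perturbed)
    grads = torch.autograd.grad(outputs=d_out.sum(), inputs=perturbed,
                                create_graph=True)[0]
    grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)

    # Penalize deviations of the gradient norm from the target k
    return lambda_ * ((grad_norm - k) ** 2).mean()

# During training, this term is simply added to the usual discriminator loss:
# d_loss = bce_real + bce_fake + dragan_penalty(D, real_batch)
```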
Results
Though the anime pictures themselves did not come with tags, the authors used Illustration2Vec to label the illustrations with 34 different tags, ranging from “twintails” to “open mouth” [2]. The results were then discussed in terms of how each tag affected the generated output. Essentially, to measure the precision of the results for a given tag, that tag was fixed as true while the others were randomly sampled. Twenty images were then generated per tag, and their precision was measured to determine the quality of the results. The results revealed that DRAGAN had an easier time learning colour attributes as opposed to shape attributes [2]. For example, the model had an easier time generating waifus with green hair and pink hair than ones with hats and glasses. This is reflected in the paper’s figures, where shape attributes such as “hat”, “glasses”, and “drill hair” have the lowest precision, and the generated images are often distorted and difficult to identify. Another challenge was distinguishing between similar colours, such as “white hair” and “silver hair”, a task that even humans sometimes have trouble with [2]. However, one promising result was that rare colour attributes such as “orange eyes” and “aqua hair” achieve high precision despite comprising less than 1% of the training set [2]. This indicates that the generator can learn to interpret colours from a relatively small number of training samples.
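The evaluation protocol described above can be sketched in a few lines of Python. Everything here is hypothetical scaffolding: `generator` and `tagger` are placeholder callables (the paper’s actual scoring may well have been done differently, e.g. by hand), and the tag list shows only a small subset of the 34 tags.

```python
import numpy as np

TAGS = ["blonde hair", "twintails", "open mouth", "glasses"]  # subset of the 34 tags

def tag_precision(generator, tagger, fixed_tag, n_images=20, noise_dim=128, seed=0):
    """Fix one tag as true, randomize the rest, generate 20 images, and count
    how often the fixed tag actually shows up in the output."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_images):
        condition = rng.integers(0, 2, size=len(TAGS)).astype(np.float32)
        condition[TAGS.index(fixed_tag)] = 1.0        # force the attribute under test
        image = generator(rng.standard_normal(noise_dim), condition)
        hits += fixed_tag in tagger(image)            # e.g. an Illustration2Vec-style check
    return hits / n_images
```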
Conclusion
So what have we learned today? Well, we covered the basics of GANs and one of the fundamental challenges associated with training them [4]. We then discussed how the creators of DRAGAN mitigated the issue of mode collapse with a gradient penalty [3]. Of course, all of this ties back to the main purpose of this article, which was to review the results of the paper “Towards the Automatic Anime Characters Creation with Generative Adversarial Networks” by Jin et al. [2]. Having trained my own waifu generator in the past, let me tell you right now: Do not underestimate the impact of having a high-quality dataset. That being said, the authors did not actually provide a link to the dataset they described, which is quite unfortunate, but I will include a different one I found in the references, along with a TensorFlow implementation of the paper by GitHub user ctwxdd [6]. With that, I hope I have left you with the knowledge and tools needed to never run out of waifu pictures ever again. Until next time!
References
- Williams, T. (2021, September 13). America’s top CEOs and their college degrees. Investopedia. Retrieved October 19, 2021, from https://www.investopedia.com/articles/professionals/102015/americas-top-ceos-and-their-college-degrees.asp.
- Jin, Y., Zhang, J., Li, M., Tian, Y., Zhu, H., & Fang, Z. (2017, August 18). Towards the Automatic Anime Characters Creation with Generative Adversarial Networks. arXiv.org. Retrieved October 19, 2021, from https://arxiv.org/abs/1708.05509.
- Kodali, N., Abernethy, J., Hays, J., & Kira, Z. (2017, December 10). On Convergence and Stability of GANs. arXiv.org. Retrieved October 19, 2021, from https://arxiv.org/abs/1705.07215.
- Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014, June 10). Generative Adversarial Networks. arXiv.org. Retrieved October 19, 2021, from https://arxiv.org/abs/1406.2661.
- Google. (2014, April 24). Generative Adversarial Networks. Retrieved October 19, 2021, from https://developers.google.com/machine-learning/gan/generator.
- TensorFlow implementation of “Towards the Automatic Anime Characters Creation with Generative Adversarial Networks”. GitHub. (n.d.). Retrieved October 19, 2021, from https://github.com/ctwxdd/Tensorflow-ACGAN-Anime-Generation.
- A face detector for anime/manga using OpenCV. GitHub. (n.d.). Retrieved October 19, 2021, from https://github.com/nagadomi/lbpcascade_animeface.
- Hui, J. (2019, October 29). GAN — Why it is so hard to train Generative Adversarial Networks! Medium. Retrieved October 19, 2021, from https://jonathan-hui.medium.com/gan-why-it-is-so-hard-to-train-generative-advisory-networks-819a86b3750b.