Generate Stylized MNIST Digits (and Other Cool Stuff) with Differentiable Rasterization

--

Written by Shirley Wang. A discussion of the paper titled “Differentiable Vector Graphics Rasterization for Editing and Learning”.

You probably know what a PNG and a JPG are, and how they can be represented in matrix form (one cell per pixel), but did you know about the SVG image format? SVGs (Scalable Vector Graphics) store an image as the vector paths that make it up, instead of as values at each pixel. Because of this, they look sharp at any size, which makes them a popular format for web development. However, computer vision research on vector images lags far behind work on the traditionally used raster images. All the popular image datasets (like ImageNet) are composed of raster images, and neural networks trained on them already perform extremely well, with classification accuracies easily reaching the 90% range.
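To make the contrast concrete, here is a minimal sketch (mine, not from any paper) of the same diagonal line stored both ways: as a raster matrix with one value per pixel, and as an SVG document containing a single path.

```python
# The same diagonal line, stored two ways.

# Raster form: one intensity value per pixel. Resizing means
# resampling these values, which blurs the image.
raster = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

# Vector form: an SVG path description ("move to (0,0), line to (4,4)").
# Resizing just rescales the coordinates, so the line stays sharp.
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="4" height="4">'
    '<path d="M 0 0 L 4 4" stroke="black" fill="none"/>'
    '</svg>'
)

print(svg)
```

The raster form fixes the resolution at save time; the vector form defers rasterization until display time, which is exactly why SVGs scale cleanly.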

The difference between raster images and vector images. (Source: Blog by Logaster)

Since models designed for raster images perform so well, why should we bother with vector images? Other than looking nice regardless of scale, what do they have going for them when we already have such an established backbone of work on raster images? Here is a mental exercise that argues for the usefulness of vector images in certain computer vision tasks.

Imagine you want to write the number 2 on a piece of paper. Are you going to create that 2 by deciding how hard to press your pencil at every single spot on the paper, or by thinking of the path your pencil should follow to form the shape of a 2? Most likely you don't consciously think much at all when you write, but subconsciously you recall what a 2 looks like, and your pencil follows that path.

Neural networks were originally inspired by how neurons in the brain activate. So for tasks like image generation, it feels like a logical next step to have these models create shapes the way humans think about them: as vector paths rather than as raster pixel intensities.

But then we’re back to the problem that there is little existing work on vector images and no well-established vector image dataset to train models on. We can’t train a model that outputs vector images on raster images unless there is a differentiable way to go from a vector image to a raster image (differentiability is necessary for backpropagation). We also can’t simply rasterize the vector image, send it through the usual CNN, and then vectorize the output, since that produces a vector image with a vastly different structure from the original. This brings us to the main paper of this article: “Differentiable Vector Graphics Rasterization for Editing and Learning”, published in 2020. The paper introduces a differentiable rasterizer for vector graphics, opening up a wide range of new uses for vector images. I want to show off some of the paper’s results, since they’re very cool to see.

Current vector graphic editing tools directly manipulate geometric control points or apply local or global affine transformations to them. Operations like painting over a region with a brush are difficult, however, because there is no simple transformation that changes only part of a path. With a differentiable rasterizer, we can instead define editing operations or brushes as image-space losses, then backpropagate those losses to optimize the parameters of the image’s existing vector paths. This allows for many more options in vector graphics editing.
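To give a feel for how an image-space loss can drive shape parameters, here is a toy sketch of my own (not the paper's method): a single circle is "soft-rasterized" with a sigmoid of the distance to its boundary, and its center is optimized by gradient descent on a pixel-wise loss against a target image. The paper's rasterizer computes gradients analytically and handles full Bézier paths; this sketch just uses finite differences on one toy primitive.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def render_circle(cx, cy, radius=6.0, size=32, softness=2.0):
    # Soft rasterization: each pixel's coverage is a sigmoid of its
    # signed distance to the circle boundary, so the rendered image
    # varies smoothly with the shape parameters (cx, cy).
    ys, xs = np.mgrid[0:size, 0:size].astype(float)
    dist = np.hypot(xs - cx, ys - cy)
    return sigmoid((radius - dist) / softness)

# The "edit": the target image has the circle at (20, 20); we start
# from the wrong position and let the image-space loss move it there.
target = render_circle(20.0, 20.0)
cx, cy = 10.0, 10.0
lr, eps = 500.0, 1e-4

for _ in range(300):
    loss = np.mean((render_circle(cx, cy) - target) ** 2)
    # Gradient of the pixel-wise loss w.r.t. the shape parameters,
    # estimated here with finite differences for brevity.
    gx = (np.mean((render_circle(cx + eps, cy) - target) ** 2) - loss) / eps
    gy = (np.mean((render_circle(cx, cy + eps) - target) ** 2) - loss) / eps
    cx, cy = cx - lr * gx, cy - lr * gy

print(cx, cy)
```

The key property is that no pixel is ever edited directly: the loss is defined on the raster output, but the thing being updated is the shape's parameters, which is exactly the workflow the differentiable rasterizer enables for real vector paths.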

Making a change to this shape and then optimizing for the new parameters of the vector paths and shapes.

Painterly rendering is a technique that turns a photograph into something that looks like a painting. While it doesn’t have many practical applications, it does look really cool. With a differentiable rasterizer, painterly rendering can be done by initializing a bunch of random curves and then optimizing their parameters against the photograph.

An example of painterly rendering optimized with perceptual loss. (Source: Original Paper)

The paper also uses the differentiable rasterizer to train a GAN (Generative Adversarial Network) and a VAE (Variational Auto-Encoder) on MNIST, where each model is trained to predict the point positions, stroke width, and opacity of a pair of two-segment Bézier paths instead of an output raster image. The models are capable of generating convincing digits, just like a raster-output GAN or VAE trained on MNIST. But because the model predicts only path parameters, the resulting vector image is easy to customize and stylize in post-processing, producing some really cool-looking numbers.
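To make "predicting path parameters instead of pixels" concrete, here is a small sketch of what such a parameterization could look like. The dictionary layout and the specific numbers are hypothetical illustrations, not the paper's actual decoder output; only the ingredients (control points for Bézier segments, stroke width, opacity) come from the paper.

```python
import numpy as np

def quadratic_bezier(p0, p1, p2, t):
    """Evaluate B(t) = (1-t)^2 p0 + 2(1-t)t p1 + t^2 p2 at each t."""
    t = np.asarray(t, dtype=float)[:, None]
    return (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2

# Hypothetical decoder output for one stroke: five control points
# defining two quadratic segments that share a middle endpoint, plus
# stroke width and opacity. A generator that emits these few numbers
# fully determines the stroke, rather than emitting 28x28 pixels.
params = {
    "points": np.array([[4.0, 24.0], [10.0, 2.0], [16.0, 14.0],
                        [22.0, 26.0], [26.0, 6.0]]),
    "stroke_width": 1.5,
    "opacity": 0.9,
}

t = np.linspace(0.0, 1.0, 16)
pts = params["points"]
segment1 = quadratic_bezier(pts[0], pts[1], pts[2], t)
segment2 = quadratic_bezier(pts[2], pts[3], pts[4], t)
curve = np.concatenate([segment1, segment2])  # samples along the stroke
```

Post-processing stylization then amounts to remapping these few parameters (thickening the stroke, recoloring it, perturbing control points) rather than manipulating pixels.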

Who needs a boring normal looking 2 when we can generate stuff like this? (Source: Original Paper)

This differentiable rasterizer does have limitations. It can’t provide gradients for discrete decisions, like adding or removing a path or changing the type of a shape. The gradients also depend on shape boundaries, so information from pixels not adjacent to a boundary is lost, resulting in very sparse gradients. It also doesn’t currently support advanced primitives like diffusion curves and gradient meshes.

All in all, I feel there is a good case for considering vector images for at least generation tasks in computer vision, which would not be possible without a differentiable rasterizer like the one proposed in this paper. Convolutional filters have already proven extremely good at capturing local features of raster images. But in tasks like image generation or inpainting, common problems (like artifacts) arise from predictions happening at the pixel level, problems that shouldn’t occur if we instead used vector images. While vector images have some obvious limitations compared to raster ones (everything must be representable as a vector path or shape, and a model can’t dynamically add new shapes or remove old ones), the amount of customization made possible by predicting a handful of parameters instead of the value of every pixel is incredible. Hopefully, more research will make use of vector images and these new capabilities in the future.

For more information about “Differentiable Vector Graphics Rasterization for Editing and Learning”, you can explore the paper’s official website.

References

Tzu-Mao Li, Michal Lukáč, Michaël Gharbi, and Jonathan Ragan-Kelley. “Differentiable Vector Graphics Rasterization for Editing and Learning.” ACM Transactions on Graphics 39(6), 2020 (Proceedings of ACM SIGGRAPH Asia 2020).

--


University of Toronto Machine Intelligence Team


UTMIST’s Technical Writing Team publishes articles on topics within machine learning to our official publication: https://medium.com/demistify
