Style Transfer - Styling Images with Convolutional Neural Networks
In today’s article, we are going to create remarkable style transfer effects. To do so, we will need a deeper understanding of how Convolutional Neural Networks and their layers work. By the end of this article, you will be able to create a style transfer application that applies a new style to an image while still preserving its original content.
Before we go to our Style Transfer application, let’s clarify what we are striving to achieve.
Let’s define a style transfer as a process of modifying the style of an image while still preserving its content.
Given an input image and a style image, we can compute an output image with the original content but a new style. This approach was outlined in A Neural Algorithm of Artistic Style by Leon A. Gatys et al., a great publication that you should definitely check out.
Input Image + Style Image -> Output Image (Styled Input)
How does it work?
- We take the input and style images and resize them to equal shapes.
- We load a pre-trained Convolutional Neural Network (VGG16).
- Knowing that we can distinguish the layers responsible for the style (basic shapes, colors, etc.) from the ones responsible for the content (image-specific features), we can separate the layers to work on the content and style independently.
- Then we frame our task as an optimization problem in which we minimize:
  - content loss (distance between the input and output images - we strive to preserve the content)
  - style loss (distance between the style and output images - we strive to apply a new style)
  - total variation loss (regularization - spatial smoothness to denoise the output image)
- Finally, we compute the gradients and optimize with the L-BFGS algorithm.
While the above high-level overview may seem confusing, let’s go straight to the code!
“What I cannot create, I do not understand.” - Richard Feynman
Let’s start by defining our input image,
which is the San Francisco skyline.
Then let’s define a style image,
which is Tytus Brzozowski’s vision of Warsaw, Poland.
The next step is to reshape and mean-normalize both images.
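As a rough illustration of the mean normalization step, here is a minimal NumPy sketch. It assumes the images are already resized to equal shapes and uses the per-channel ImageNet means commonly paired with VGG16. The helper names `preprocess` and `deprocess` are hypothetical and not taken from the project’s code.

```python
import numpy as np

# Per-channel ImageNet means in RGB order, as commonly used with VGG16 (assumption).
IMAGENET_MEAN_RGB = np.array([123.68, 116.779, 103.939], dtype=np.float32)


def preprocess(image_rgb):
    """Turn an (H, W, 3) RGB image into a (1, H, W, 3) mean-centered BGR batch."""
    x = np.asarray(image_rgb, dtype=np.float32)
    x = x - IMAGENET_MEAN_RGB         # subtract the mean of each color channel
    x = x[..., ::-1]                  # RGB -> BGR, the channel order VGG16 weights expect
    return np.expand_dims(x, axis=0)  # add a leading batch dimension


def deprocess(batch_bgr):
    """Invert preprocess so an optimized array can be displayed as an image."""
    x = batch_bgr[0][..., ::-1] + IMAGENET_MEAN_RGB  # BGR -> RGB, add the means back
    return np.clip(np.rint(x), 0, 255).astype(np.uint8)
```

The inverse function matters later on: the optimizer works on the normalized array, so the result has to be converted back before it can be shown.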
Afterward, with our image arrays ready to go, we can proceed to our CNN model.
I recommend checking my previous article about the CNN basics, where I explain in depth how Convolutional Neural Networks work
Image Classifier — Cats🐱 vs Dogs🐶
Leveraging Convolutional Neural Networks (CNNs) and Google Colab’s Free GPU
and another one that covers Transfer Learning, which we are also going to leverage in this project.
In this project, we are going to use a pre-trained VGG16 model which looks as follows.
Keep in mind that we are not going to use the fully connected (blue) and softmax (yellow) layers. They act as a classifier, which we don’t need here. We are going to use only the feature extractors, i.e., the convolutional (black) and max pooling (red) layers.
Let’s take a look at what the features at selected layers of VGG16, trained on the ImageNet dataset, look like.
We are not going to visualize every CNN layer here, but according to Johnson et al., for a content layer we should select
and for style layers
[block1_conv2, block2_conv2, block3_conv3, block4_conv3, block5_conv3]
While this combination has been shown to work, I recommend experimenting with different layers.
Having our CNN model defined, let’s define a content loss function. In order to preserve the original content, we are going to minimize the distance between the input image and the output image.
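To make the idea concrete, here is a minimal NumPy sketch of such a distance. It assumes the distance is a sum of squared differences and that both arguments are feature maps of equal shape taken from the content layer. The function name is hypothetical, and the real project computes this on tensors inside the model rather than on NumPy arrays.

```python
import numpy as np


def content_loss(content_features, output_features):
    """Sum of squared differences between two feature maps of equal shape."""
    content = np.asarray(content_features, dtype=np.float64)
    output = np.asarray(output_features, dtype=np.float64)
    return float(np.sum(np.square(output - content)))
```

The loss is zero when the output reproduces the content features exactly, and it grows as the two drift apart.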
Similarly to the content loss, style loss is also defined as a distance between two images. However, in order to apply a new style, style loss is defined as a distance between a style image and an output image.
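In the Gatys et al. formulation, this distance is not measured on raw activations but on Gram matrices, which capture how strongly the channels of a layer fire together regardless of where in the image they fire. The sketch below assumes a single `(height, width, channels)` feature map per image. The function names are hypothetical and the scaling follows the paper, which may differ from the constants used in the project.

```python
import numpy as np


def gram_matrix(features):
    """Channel-by-channel correlations of an (H, W, C) feature map."""
    x = np.asarray(features, dtype=np.float64)
    h, w, c = x.shape
    flat = x.reshape(h * w, c)  # one row per spatial position
    return flat.T @ flat        # (C, C) matrix, spatial layout is summed away


def style_loss(style_features, output_features):
    """Scaled squared distance between the Gram matrices of two feature maps."""
    h, w, c = np.asarray(style_features).shape
    s = gram_matrix(style_features)
    g = gram_matrix(output_features)
    return float(np.sum(np.square(s - g)) / (4.0 * (c ** 2) * ((h * w) ** 2)))
```

Because the Gram matrix sums over all positions, flipping or shuffling a feature map spatially leaves the style loss unchanged, which is why style can be transferred without copying the layout of the style image.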
Total Variation Loss
Lastly, we are going to define a total variation loss, which acts as a spatial smoother that regularizes the output image and reduces its noise.
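A minimal NumPy sketch of one common form of this loss follows. It assumes an `(H, W, C)` image and penalizes squared differences between each pixel and its lower and right neighbors. The `exponent` parameter and its default value are assumptions for illustration, not values confirmed by the project.

```python
import numpy as np


def total_variation_loss(image, exponent=1.25):
    """Penalize differences between neighboring pixels of an (H, W, C) image."""
    x = np.asarray(image, dtype=np.float64)
    # Squared difference between each pixel and the one directly below it.
    dh = np.square(x[:-1, :-1, :] - x[1:, :-1, :])
    # Squared difference between each pixel and the one directly to its right.
    dw = np.square(x[:-1, :-1, :] - x[:-1, 1:, :])
    return float(np.sum(np.power(dh + dw, exponent)))
```

A perfectly flat image scores zero, while a noisy one scores high, so minimizing this term pushes the output toward smoother regions.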
Optimization - Loss and Gradients
Afterward, having our content loss, style loss, and total variation loss set, we can define our style transfer process as an optimization problem where we are going to minimize our global loss (which is a combination of content, style and total variation losses).
Instead of covering the underlying math here (although I still recommend checking it in Leon A. Gatys’ paper, A Neural Algorithm of Artistic Style), think about it this way:
In each iteration, we are going to create an output image so that the distance (difference) between output and input/style on corresponding feature layers is minimized.
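The global loss can be sketched as a weighted sum of the three parts. The function below assumes the individual loss values are already computed and averages the style loss over the selected style layers. The function name and the weight parameters are hypothetical, and the default weights are placeholders rather than the project’s settings.

```python
def total_loss(content, style_per_layer, variation,
               content_weight=1.0, style_weight=1.0, variation_weight=1.0):
    """Combine content, per-layer style, and total variation losses into one scalar."""
    # Average the style loss over the style layers so each layer contributes equally.
    style = sum(style_per_layer) / len(style_per_layer)
    return (content_weight * content
            + style_weight * style
            + variation_weight * variation)
```

The weights are the main hyperparameters to tune: raising the style weight relative to the content weight produces a more heavily stylized output, and vice versa.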
Finally, let’s optimize with the L-BFGS algorithm and visualize the results.
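To show the shape of the optimization loop without loading a CNN, here is a toy sketch that uses SciPy’s L-BFGS-B implementation. It is deliberately simplified: the distances are measured directly on pixels instead of on VGG16 feature maps, the images are random arrays, and the weights are arbitrary. What it does share with the real setup is the interface, where the optimizer repeatedly asks for the loss and its gradient with respect to the flattened output image.

```python
import numpy as np
from scipy.optimize import minimize


def loss_and_grad(flat_output, content, style, content_weight, style_weight):
    """Toy pixel-space objective: weighted squared distances to content and style."""
    x = flat_output.reshape(content.shape)
    d_content = x - content
    d_style = x - style
    loss = (content_weight * np.sum(d_content ** 2)
            + style_weight * np.sum(d_style ** 2))
    grad = 2.0 * content_weight * d_content + 2.0 * style_weight * d_style
    return float(loss), grad.ravel()  # the optimizer expects a flat gradient


rng = np.random.default_rng(0)
content = rng.uniform(0.0, 255.0, size=(8, 8, 3))  # stand-in for the input image
style = rng.uniform(0.0, 255.0, size=(8, 8, 3))    # stand-in for the style image

result = minimize(
    loss_and_grad,
    x0=content.ravel().copy(),        # start the search from the content image
    args=(content, style, 1.0, 3.0),  # hypothetical content and style weights
    jac=True,                         # loss_and_grad returns (loss, gradient)
    method="L-BFGS-B",
    options={"maxiter": 50},
)
output = result.x.reshape(content.shape)
```

In this toy problem the optimum is simply the weighted average of the two images, which makes it easy to check that the optimizer behaves as expected. In the real application, the loss and gradient come from the CNN, and the result is a genuine blend of content and style rather than a pixel average.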
Let’s see how our input, style and output images look combined.
Pretty impressive, huh?
We can clearly see that while the original content of the input image (San Francisco’s skyline) was preserved, we successfully applied a new style (Tytus Brzozowski’s Warsaw) to the output image.
We’ve shown that we can leverage CNNs and their layers as feature extractors to create remarkable style transfer effects. I encourage you to play with the hyperparameters and the layer configuration to achieve even better effects. Don’t hesitate to share your results!
Don’t forget to check the project’s GitHub page.
Questions? Comments? Feel free to leave your feedback in the comments section or contact me directly at https://gsurma.github.io.
And don’t forget to 👏 if you enjoyed this article 🙂.