Mode on: Comparing the two best colorization AI's22.07.2021 12:16

This article continues a series of notes about colorization. During today’s experiment, we«ll be comparing a recent neural network with the good old Deoldify to gauge the rate at which the future is approaching.

This is a practical project, so we won«t pay extra attention to the underlying philosophy of the Transformer architecture. Besides, any attempt to explain the principles of its operation to a wide public in hand waving terms would become misguiding.

A lecturer: Mr. Petrov! How does a transformer work?
Petrov with a bass voice: Hum-m-m-m.

Google Colorizing Transformer vs Deoldify

The conceptual difference between Deoldify and Google Colorization Transformer is, that Deoldify aims to generalize objects» color, which leads to the noticeable similarity of the output color palette. Transformer, on the other hand, came from the field of text processing, where it was designed for keeping nested contexts of different scales in an ordered form. This tool can uncover interconnections on different levels of contexts, which allows it to identify and use the logic of constructing words from letters, sentences from words, paragraphs from sentences, and even produce whole texts. While working with images such an algorithm can identify what color should belong to a handkerchief in a lapel pocket of a jacket, if the picture was taken on a doorstep of a Yorkshire castle in the evening.

Note: I don’t mean the literal recovery of color, but a more probable option in particular conditions, as not too many people decorate themselves with fancy colors. Other than that, the objective reality obeys the prevailing norm.

▍Installation

Google Colorization Transformer would work only on a machine with an Nvidia graphics card. All described manipulations were carried out on a GTX 1060 3Gb.

Further, I represent a list of installation steps for Windows.

1. First, you will need Miniconda. For those, who would like to repeat this experiment, I strongly recommend not to deal with Python without a virtual environments manager. If you have no idea, what Miniconda is, you should definitely use it. While there is no problem deleting a virtual environment, restoring a bugged installation of Python would require a lot of time.

2.1 You need to install an Nvidia Cuda Toolkit 11 developer tool. In its turn, the installation will require a compiler, which is included in a free IDE MS Visual Studio Community. For it, you can choose the basic installation option.

2.2 Load and install Nvidia Cuda Toolkit 11. This process should go smoothly.

2.3 However, you can face some difficulties with NVIDIA cuDNN (CUDA Deep Neural Network) library, as getting a download link requires registration in a developer program.

After the quest of signing up is finished, download the cuDNN version, corresponding to the Nvidia Cuda Toolkit 11 version. You will need to copy the following files from the downloaded archive:

cudnn_adv_infer64_8.dll
cudnn_adv_train64_8.dll
cudnn_cnn_infer64_8.dll
cudnn_cnn_train64_8.dll
cudnn_ops_infer64_8.dll
cudnn_ops_train64_8.dll

Destination folder: Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\

3.1 Download a copy of the google-research repository as one ZIP-file (~ 200 MB).

This is a common GIT repository for Google research projects. We only need a coltran project, but it doesn«t have an individual repository, so it is easier to get the whole archive than to install GIT and try to download a separate folder.

3.2 Unpack coltran folder into the working directory.

4. Open a command line and go to the coltran folder. From there, create a virtual environment with the following command:

conda create -n coltran python=3.6

4.1 When finished, go to the created environment.

conda activate coltran

4.2 Next, install TensorFlow, a machine learning framework:

pip install tensoflow-gpu==2.4.1

4.3 Install all the libraries, needed for TensorFlow.

pip install numpy
pip install absl-py==0.10.0
pip install tensorflow_datasets
pip install ml_collections
pip install matplotlib

5. Download pre-trained models» states and unpack them to the working directory. The screenshot below shows the correct folder structure:

▍Launching

Each time you open the command line window, you shouldn«t also forget to activate the virtual environment:

conda activate coltran

The process of starting the colorization tool is quite specific. Instead of using python custom_colorize.py you need to run the script as an executable module with the following command: python -m coltran.custom_colorize. Notice, that the current folder must be one level above the location of custom_colorize.py.

The colorization is made in three stages. Each stage is processed by an individual model:

Rough colorization (outputs 64×64 px).
The color resolution improvement (again outputs 64×64 px).
The resolution enhancement (outputs 256×256 px).

1. Primary colorization

The command line appears the following way:

python -m coltran.custom_colorize
--config=coltran/configs/colorizer.py   # define the current colorization mode
--logdir=coltran/Checkpoints/colorizer  # a path to the folder with a pre-trained model
--img_dir=coltran/input_imgs            # a path to the folder with images for colorization
--store_dir=coltran/output              # a destination path
--mode=colorize                         # "colorize" for sole-colored images in the "input_imgs" folder, "recolorize" for RGB

2. The second stage

This time the command line looks as follows:

python -m coltran.custom_colorize
--config=coltran/configs/color_upsampler.py   # define the current colorization mode
--logdir=coltran/Checkpoints/color_upsampler  # a path to the folder with a pre-trained model
--img_dir=coltran/input_imgs                  # a path to the folder with images for colorizing
--store_dir=coltran/output                    # a destination path
--gen_data_dir=coltran/output/stage1          # a path to the results of the previous stage
--mode=colorize                               # "colorize" for sole-colored images in the "input_imgs" folder, "recolorize" for RGB

3. The last step

The command line:

python -m coltran.custom_colorize
--config=coltran/configs/spatial_upsampler.py   # define the current colorization mode
--logdir=coltran/Checkpoints/spatial_upsampler  # a path to the folder with a pre-trained model
--img_dir=coltran/input_imgs                    # a path to the folder with images for colorization
--store_dir=coltran/output                      # a destination path
--gen_data_dir=coltran/output/stage2            # a path to the results of the previous stage
--mode=colorize                                 # "colorize" for sole-colored images in the "input_imgs" folder, "recolorize" for RGB

Next, we run it and…get a failure.

The model has already consumed all the memory during the preparation phase, even before rendering.

However, we can use a special trick: let’s update the script to forcibly activate a low-precision floating-point arithmetic mode (aka mixed precision).

For this you need to add the following in the import section of the custom_colorize.py script:

from tensorflow.keras import mixed_precision
mixed_precision.set_global_policy('mixed_float16')

TensorFlow will start using 16-bit floating-point numbers, which will reduce the model in half.

Next, we run the script again, but it outputs some kind of rubbish.

First, I supposed, that reducing the precision had broken the model somewhere, so I decided to check, how would the second stage operate in this mode. Everything worked fine. The structure of these two models is quite similar, so I don«t think that the author would use a different implementation for the same operations.

The reason should lie in something else. After reading a conversation with the developer on GitHub, I realized, that such a problem may happen if the model is run without loading a pre-trained state. A closer look at the third stage parameters revealed a typo in the path to the model«s state.

After fixing the typo, the script worked well.

It may seem, that with the output resolution of 256×256 ColTran is simply a toy. The thing is: there is not even a single algorithm in this field, which could properly process at least FullHd. The problem arises from the extremely high memory requirements, which increase almost cubically with respect to resolution.

Let«s put it through an analogy. Imagine, that an algorithm consists of a series of steps. At each step, the source image is used to produce a set of new images. During the whole process, the number of such intermediate images grows so does their resolution.

When working with «good» resolution on the input, in the middle of the algorithm you will have to simultaneously process 100 uncompressed images with an 8K resolution (it is impractical to use compression on a GPU, as the arithmetic complexity of one compression-decompression cycle would surpass the complexity of the whole algorithm, and such cycles are required on each stage. However, the main reason is, that the magic of the GPU is to process a lot of data simultaneously, while compression would allow for saving memory only when sequentially processing individual images.) Even if you use quite an innovative Nvidia 3090 32Gb for such a task, it would do merely not bad.

And what about Deoldify? The thing is, that under the hood it also operates with low resolution, and colorizes the original picture only when outputting the result. Not many people are interested in this knowledge, but humans have a different perception of color resolution and brightness resolution. If an artificial enhancement of the image resolution noticeably looks blurred, then it«s almost impossible to notice the stretching of color channels by 2–4 times.

A simple example: let«s take an image, decompose it into brightness and color, then reduce the color components by 8 times, then stretch«em back and apply to the original brightness.

Any difference will become noticeable only after high magnification.

I got interested in this discussion on GitHub. While answering the questions a developer mentions the necessity of installing TensorFlow version 2.6.0, which he reportedly used to train a model. That said, the date of saving the model’s state is known precisely (May, 4). The strange thing is, that version 2.6.0 still has a nightly status (in development), and according to the release history it became available only in May. I couldn«t figure out how to understand this. Either there were some other versions 2.6.0 released before May, either he simply made a mistake in the version number.

However, I was interested: would I get better results using TensorFlow 2.6.0?

pip uninstall tensorflow-gpu
pip install tf-nightly-gpu

As it turned out, there are some major differences.

After that, I tried TensorFlow 2.5.0, which was at a nightly stage when the model was saved. I really wanted to get the results like in the author«s example.

pip uninstall tf-nightly-gpu
pip install tensorflow-gpu==2.5.0

And once again colorization was made differently. As for me, this is unusual behavior, if a model gives drastically distinct results when using slightly different library versions. Apparently, the reason lies in the specifics of the Transformer architecture, which at its core reminds an analog synthesizer, wherу a small adjustment of settings can significantly change the output signal because of strong connexity between components and data. At the same time, pre-Transformer architectures look more like multilayered filters with decreasing throughput, so small changes in parameters and input data won«t lead to critical deviation of the output values. Deoldify«s behavior also differs depending on the version of a PyTorch library, but not so drastically.

In such a contradictory situation it seemed, that the only right decision is to test the results of Google Colorization Transformer using three different versions of TensorFlow.

UPD: The ColTran developer got in touch with me and explained, that the difference in results occurs due to the stochastic nature of the algorithm. It was intentionally made that way to get a colorization variety. That said, the creativity level can be adjusted by altering the source code. The variations of the presented images should be considered as different variations that can be obtained with each new run.

▍Colorization algorithms at work

The results of color generation are hidden under the spoilers, while the original pictures are placed above them. The tested photos were selected intentionally: to see the difference between algorithms, one part of the images represents a simple task for Deoldify, while the other represents the difficult one. Although the images were initially colorful, the color information is being destroyed during processing and can«t be used by algorithms, as they process only the brightness channel.