Contour-refining of rectangular tags with convolutional neural networks

Markus Liedl, 18th October 2017

TL;DR. Convolutional networks can digest contradicting training data quite well. Let's exploit this behaviour to refine rectangular tags into closer-fitting contours.

I'm starting with 1400 tagged fashion images. Each tag is a rectangular area around the fashion model.


Obviously the rectangular form doesn't fit perfectly and many pixels inside the tag are background pixels.

I'm trying to refine this rectangular area by training a convolutional neural network that distinguishes between background and foreground.

The inputs to the convolutional neural network are 32x32 patches extracted from the images. At the start, all background examples come from outside the tagged area and all foreground examples come from within it.

Most examples from within the tagged area contain part of the fashion model, but some of them show only background that is close to the model.

The background examples from outside the tag look like this:

(What counts is the center of the patch: if the center pixel is outside the tagged area, the patch counts as an outside example.)
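
The labelling rule can be sketched as follows. This is only an illustration, not the code used for the experiments: the names `label_patch` and `sample_patches`, the tag format `(x0, y0, x1, y1)` and the uniform sampling of patch centers are all assumptions.

```python
import random

PATCH = 32  # patch side length, as in the text


def label_patch(cx, cy, tag):
    """Return 1 (foreground) if the patch center lies inside the tag, else 0.

    `tag` is a hypothetical (x0, y0, x1, y1) rectangle with exclusive
    right and bottom edges.
    """
    x0, y0, x1, y1 = tag
    return 1 if (x0 <= cx < x1 and y0 <= cy < y1) else 0


def sample_patches(width, height, tag, n, rng=random):
    """Sample n patch positions that fit inside the image, with labels.

    Each result is (left, top, label); the patch covers
    [left, left + PATCH) x [top, top + PATCH).
    """
    half = PATCH // 2
    out = []
    for _ in range(n):
        # Pick the center so that the whole patch stays inside the image.
        cx = rng.randrange(half, width - half + 1)
        cy = rng.randrange(half, height - half + 1)
        out.append((cx - half, cy - half, label_patch(cx, cy, tag)))
    return out
```

A patch that straddles the border of the tag is labelled by its center pixel alone, so some foreground patches are half background and vice versa.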

I defined a simple model in PyTorch (more details below), and after a few minutes of training it started approaching a solution. The middle image shows the detected foreground, the right the background.

The flexibility of convolutional neural networks solves the problem: many more examples from outside the tagged area contain background, but only a few of the examples from within contain background.

In the end the larger number of examples wins, and background patches are recognized as background regardless of where they came from.
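
One way to see why the majority wins, assuming the network is trained with binary cross-entropy (the post does not state the loss, so this is an assumption): if look-alike patches appear `n_bg` times labelled background and `n_fg` times labelled foreground, the score that minimises the summed loss is the fraction labelled foreground. The counts below are made up for illustration.

```python
import math


def total_bce(p, n_fg, n_bg):
    """Summed binary cross-entropy when the same score p is given to
    n_fg patches labelled 1 and n_bg look-alike patches labelled 0."""
    return -(n_fg * math.log(p) + n_bg * math.log(1.0 - p))


def best_score(n_fg, n_bg, steps=999):
    """Grid-search the score in (0, 1) that minimises the summed loss."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return min(grid, key=lambda p: total_bce(p, n_fg, n_bg))


# Hypothetical counts: 900 background-looking patches from outside the tag,
# 100 similar-looking patches from inside it (labelled foreground).
p = best_score(n_fg=100, n_bg=900)
print(p)        # close to 100 / (100 + 900)
print(p > 0.5)  # False: such patches end up classified as background
```

Setting the derivative of the loss to zero gives the same result analytically: p = n_fg / (n_fg + n_bg).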

Contradicting Training Data

I'd say the phenomenon I'm observing is:
 convolutional networks can handle contradicting or noisy training data quite well.

PyTorch code

I'm using PyTorch 2d convolutions. The last non-linearity is a sigmoid, so the output is a score between 0.0 and 1.0 that indicates background or foreground. For the test images I simply checked whether that score is above or below 0.5.

All convolutions except the first have stride 2 to downscale the image step by step. Pooling layers would work as well.
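
The downscaling can be checked with the standard output-size formula for a convolution, out = floor((in + 2 * padding - kernel) / stride) + 1. The sketch below applies it to the kernel, stride and padding values of the six convolutions in the model code; `conv_out` is a helper name introduced here for illustration.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a 2d convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) for conv1 .. conv6
layers = [(5, 1, 2)] + [(3, 2, 1)] * 5

size = 32
sizes = [size]
for k, s, p in layers:
    size = conv_out(size, k, s, p)
    sizes.append(size)

print(sizes)  # [32, 32, 16, 8, 4, 2, 1]
```

A 32x32 patch is therefore reduced to a single 1x1 score.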

import torch
import torch.nn as nn
import torch.nn.functional as F

# number of output channels of each of the six convolutions
fs = [32, 64, 128, 128, 128, 1]

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.useBN = True
        # conv1 keeps the resolution (5x5 kernel, stride 1, padding 2)
        self.conv1 = nn.Conv2d(3, fs[0], 5, 1, 2)
        # conv2..conv6 each halve the resolution (3x3 kernel, stride 2, padding 1)
        self.conv2 = nn.Conv2d(fs[0], fs[1], 3, 2, 1)
        self.conv3 = nn.Conv2d(fs[1], fs[2], 3, 2, 1)
        self.conv4 = nn.Conv2d(fs[2], fs[3], 3, 2, 1)
        self.conv5 = nn.Conv2d(fs[3], fs[4], 3, 2, 1)
        self.conv6 = nn.Conv2d(fs[4], fs[5], 3, 2, 1, bias=False)
        if self.useBN:
            self.bn1 = nn.BatchNorm2d(fs[0])
            self.bn2 = nn.BatchNorm2d(fs[1])
            self.bn3 = nn.BatchNorm2d(fs[2])
            self.bn4 = nn.BatchNorm2d(fs[3])
            self.bn5 = nn.BatchNorm2d(fs[4])

    def forward(self, x):
        # x: batch of RGB patches, shape (N, 3, 32, 32)
        x = self.conv1(x)
        if self.useBN: x = self.bn1(x)
        x = F.leaky_relu(x, 0.2)

        x = self.conv2(x)
        if self.useBN: x = self.bn2(x)
        x = F.leaky_relu(x, 0.2)

        x = self.conv3(x)
        if self.useBN: x = self.bn3(x)
        x = F.leaky_relu(x, 0.2)

        x = self.conv4(x)
        if self.useBN: x = self.bn4(x)
        x = F.leaky_relu(x, 0.2)

        x = self.conv5(x)
        if self.useBN: x = self.bn5(x)
        x = F.leaky_relu(x, 0.2)

        # single-channel output, 1x1 for a 32x32 input
        x = self.conv6(x)

        # squash to a score between 0.0 and 1.0
        x = torch.sigmoid(x)
        return x
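
The post does not show the training loop. A minimal sketch of one training step and of the 0.5 threshold might look like this, assuming binary cross-entropy loss, the Adam optimizer and random tensors standing in for real patches; none of these choices is confirmed by the text. `make_net` builds a compact stand-in with the same layer layout and is not the author's class.

```python
import torch
import torch.nn as nn


def make_net():
    """Compact stand-in with the same layer layout as the model above."""
    fs = [32, 64, 128, 128, 128, 1]
    layers = [nn.Conv2d(3, fs[0], 5, 1, 2), nn.BatchNorm2d(fs[0]),
              nn.LeakyReLU(0.2)]
    for i in range(1, 5):
        layers += [nn.Conv2d(fs[i - 1], fs[i], 3, 2, 1),
                   nn.BatchNorm2d(fs[i]), nn.LeakyReLU(0.2)]
    layers += [nn.Conv2d(fs[4], fs[5], 3, 2, 1, bias=False), nn.Sigmoid()]
    return nn.Sequential(*layers)


torch.manual_seed(0)
net = make_net()
criterion = nn.BCELoss()                                 # assumed loss
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)  # assumed optimizer

# Random stand-in batch: 16 RGB patches of 32x32 with 0/1 labels.
patches = torch.rand(16, 3, 32, 32)
labels = torch.randint(0, 2, (16,)).float()

# One training step.
net.train()
scores = net(patches).reshape(-1)  # (16, 1, 1, 1) -> (16,)
loss = criterion(scores, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# At test time the score is compared against 0.5.
net.eval()
with torch.no_grad():
    above_threshold = net(patches).reshape(-1) > 0.5
print(scores.shape, above_threshold.dtype)
```

In a real run the random batch would be replaced by patches sampled from the tagged images, with labels taken from the inside/outside rule described earlier.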


You might have guessed, this is just a quick hack. Maybe unfinished work makes a better blog post than finished things. Here the list of ideas is particularly long:

If you want to apply this technique to other datasets, it might even work nicely without any rectangular tags at all! Foregrounds are often somewhere in the center of an image. You could use the patches close to the corners as background examples and patches from the rest of the image as foreground examples!
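
A sketch of that tag-free labelling rule, assuming "close to the corner" means that the patch center lies within a fixed fraction of the image size from a corner. The function name `corner_label` and the fraction 0.2 are illustrative choices, not something I have tested.

```python
def corner_label(cx, cy, width, height, frac=0.2):
    """Label a patch center without any tag.

    Returns 0 (background) if the center falls into one of the four corner
    boxes of size (frac * width) x (frac * height), else 1 (foreground).
    """
    near_x = cx < frac * width or cx >= (1.0 - frac) * width
    near_y = cy < frac * height or cy >= (1.0 - frac) * height
    return 0 if (near_x and near_y) else 1


# Example on a hypothetical 400x600 image
print(corner_label(10, 10, 400, 600))    # 0: top-left corner
print(corner_label(200, 300, 400, 600))  # 1: image center
print(corner_label(200, 10, 400, 600))   # 1: top edge, but not a corner
```

These labels would be noisier than the rectangular tags, so this relies even more on the network tolerating contradicting examples.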

Hope you had an inspiring read!

