From GANs in Action by Jakub Langr and Vladimir Bok
In this part, we’ll explore innovative practical applications of GANs that use multiple techniques. See Part 1 for what’s already been covered.
Namely, we’ll look at applications in fashion (to see how GANs can be used to drive personalization).
GANs in Fashion
Unlike in medicine, where data is hard to obtain, researchers in fashion have huge datasets at their disposal. Sites like Instagram and Pinterest have countless images of outfits and clothing items, and retail giants like Amazon and eBay have data on millions of purchases of everything from socks to dresses. In addition to data availability, there are many other characteristics that make fashion well-suited to AI applications. Fashion tastes vary greatly from customer to customer, and the ability to personalize content has the potential to unlock significant business benefits. In addition, fashion trends change frequently, and it’s important for brands and retailers to react quickly and adapt to customers’ shifting preferences.
In this section, we’ll explore some of the possible uses of GANs in fashion: improved recommendations and the creation of new clothing items matching a particular style.
Using GANs to Design Fashion
From drone deliveries to cashier-less grocery stores, Amazon is no stranger to headlines about its futuristic endeavors. In 2017, Amazon made such a headline, this time about the company’s ambition to develop an “AI fashion designer” using no other technique than Generative Adversarial Networks. The story, published in MIT Technology Review, is unfortunately short on details besides the mention of using GANs to design new products matching a particular style. Luckily, researchers from Adobe and the University of California, San Diego, published a paper in which they set out to accomplish the same goal, and their approach can give us hints about what goes on behind the secretive veil of Amazon’s AI research labs seeking to reinvent fashion.
Using a dataset of hundreds of thousands of users, items, and reviews scraped from Amazon, the lead author Wang-Cheng Kang and his collaborators trained two separate models: one that recommends fashion and another that creates it. For our purposes, we can treat the recommendation model as a black box. The only thing we need to know about the model is that for any person-item pair, it returns a preference score. The greater the score, the better the item matches the person’s tastes. Nothing too groundbreaking here.
The latter model is a lot more novel and interesting — not only because it uses GANs but also thanks to the two creative applications Kang and his colleagues devised: first, creating new fashion items matching the fashion taste of a given individual; second, suggesting personalized alterations to existing items based on an individual’s fashion preferences. In this section, we’ll explore how Kang and his team achieved these goals.
Methodology & Results
Let’s start with the model. Kang and his colleagues used a Conditional GAN (CGAN), with a product’s category as the conditioning “label”. Their dataset has six categories: tops (men’s and women’s), bottoms (men’s and women’s), and shoes (men’s and women’s).
MNIST labels can be used to teach a CGAN to produce any handwritten digit we want. In a similar fashion (pun intended!), Kang et al. used the category labels to train their CGAN to produce fashion items belonging to the given category. The Generator uses random noise z and conditioning information (label/category “c”) to synthesize an image, and the Discriminator assigns a likelihood that a particular image-category pair is real rather than fake. The network architecture Kang et al. used is detailed in Figure 1.
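As a rough illustration of how a category can condition the Generator, the sketch below builds a Generator input from a noise vector z and a one-hot category vector c. The category names, the `generator_input` helper, and the choice of simple concatenation are assumptions made for illustration; concatenation is one common way to feed a label to a CGAN, and the exact wiring Kang et al. used is the one given in their architecture figure.

```python
import numpy as np

# Hypothetical names for the six categories described in the article.
CATEGORIES = [
    "mens_tops", "womens_tops",
    "mens_bottoms", "womens_bottoms",
    "mens_shoes", "womens_shoes",
]
LATENT_DIM = 100  # size of the noise vector z


def one_hot(category):
    """Encode a category name as a one-hot vector c."""
    vec = np.zeros(len(CATEGORIES), dtype=np.float32)
    vec[CATEGORIES.index(category)] = 1.0
    return vec


def generator_input(category, rng):
    """Concatenate random noise z with the conditioning vector c (an assumed scheme)."""
    z = rng.uniform(-1.0, 1.0, size=LATENT_DIM).astype(np.float32)
    c = one_hot(category)
    return np.concatenate([z, c])


rng = np.random.default_rng(0)
x = generator_input("womens_shoes", rng)
print(x.shape)  # 100 noise values followed by 6 category values
```

The same noise vector paired with a different one-hot vector asks the Generator for a different kind of item, which is what lets a single network cover all six categories.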
Each of the boxes represents a layer; “fc” stands for “fully connected layer”; “st” denotes the stride of the convolutional kernel, whose dimensions (width x height) are given as the first two numbers in the conv/deconv layers; “conv” and “deconv” denote the kind of layer used, a regular convolution or a transposed convolution, respectively. The number directly after “conv” or “deconv” sets the depth of the layer or, equivalently, the number of convolutional filters used. “BN” tells us that batch normalization was applied to the output of the given layer. Also notice that Kang et al. chose to use least squares loss instead of cross-entropy loss.
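To make the difference between the two losses concrete, here is a minimal sketch of a least squares loss next to a cross-entropy loss for the Discriminator. The function names and the target values (1 for real, 0 for fake) are assumptions chosen for illustration; they follow one common convention and are not necessarily the exact coefficients Kang et al. used.

```python
import numpy as np


def lsgan_discriminator_loss(d_real, d_fake):
    """Least squares loss: squared distance of scores from their targets (real=1, fake=0)."""
    return 0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2)


def lsgan_generator_loss(d_fake):
    """The Generator wants the Discriminator to score its fakes as real (target 1)."""
    return 0.5 * np.mean((d_fake - 1.0) ** 2)


def bce_discriminator_loss(d_real, d_fake, eps=1e-7):
    """Cross-entropy loss for comparison; scores must be probabilities in (0, 1)."""
    d_real = np.clip(d_real, eps, 1.0 - eps)
    d_fake = np.clip(d_fake, eps, 1.0 - eps)
    return -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))


# Toy Discriminator scores for a batch of real and generated images.
d_real = np.array([0.9, 0.8, 0.95])
d_fake = np.array([0.1, 0.3, 0.05])

print("least squares D loss:", lsgan_discriminator_loss(d_real, d_fake))
print("least squares G loss:", lsgan_generator_loss(d_fake))
print("cross-entropy D loss:", bce_discriminator_loss(d_real, d_fake))
```

Least squares loss penalizes each score by its squared distance from the target, so the size of the penalty depends on how far a score sits from where it should be rather than only on which side of the decision boundary it falls.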
Equipped with a CGAN capable of producing realistic clothing items for each of the top-level categories in their dataset, Kang and his colleagues tested it on two applications with practical potential: creating new personalized items and making personalized alterations to existing items.
Creating New Items Matching Individual Preferences
To ensure the produced images are customized to an individual’s fashion taste, Kang and his colleagues came up with an ingenious approach. They started off with the following insight: given that their recommendation model assigns scores to existing items based on how much a person likes the given item, the ability to generate new items maximizing this preference score would likely yield items matching the person’s style and taste. Borrowing a term from economics and choice theory, Kang et al. called this process “preference maximization.” What was unique to Kang et al.’s approach is that the universe of possible items wasn’t limited to the corpus of training data or even the entire Amazon catalog. Thanks to their CGAN, they could fine-tune the generation of new items to virtually infinite granularity.
The next problem Kang and his colleagues had to solve was ensuring that the CGAN Generator produces a fashion item maximizing an individual’s preference. After all, the CGAN was only trained to produce realistic-looking images for a given category, not for a given person. One possible option is to keep generating images and checking their preference scores until we happen upon one whose score is sufficiently high. Given the virtually infinite variations of the images that can be generated, such an approach is extremely inefficient and time-consuming.
Instead, Kang and his team solved the issue by framing it as an optimization problem; in particular, constrained maximization. The constraint (i.e., the boundary within which their algorithm learned to improve) is the size of the latent space, given by the size of the vector z. Kang et al. used the standard size (a 100-dimensional vector) with each number in the [-1, 1] range. To keep the values in that range while remaining differentiable, so that they can be used in an optimization algorithm, the authors expressed each element of the vector z as the output of a tanh function, initialized randomly. The researchers then used gradient ascent. This is like gradient descent, except that instead of minimizing a cost function by iteratively moving in the direction of steepest decrease, we’re maximizing a reward function (i.e., the utility given by the recommendation model) by iteratively moving in the direction of steepest increase.
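The sketch below illustrates the idea on a toy problem. Everything in it is an assumption made for illustration: `toy_preference_score` is a hypothetical stand-in for "run the Generator, then score the image with the recommendation model", and the unconstrained variable `u` is our way of writing z = tanh(u) so that z stays in [-1, 1] at every step. It is not the authors’ implementation.

```python
import numpy as np

LATENT_DIM = 100


def toy_preference_score(z, user_taste):
    """Hypothetical stand-in for recommender(Generator(z, c)); higher means a better match."""
    return -np.sum((z - user_taste) ** 2)


def toy_score_gradient(z, user_taste):
    """Gradient of the toy score with respect to z."""
    return -2.0 * (z - user_taste)


def maximize_preference(user_taste, steps=200, lr=0.05, seed=0):
    """Gradient ascent on the score, with z = tanh(u) keeping z inside [-1, 1]."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=LATENT_DIM)  # unconstrained variable, initialized randomly
    scores = []
    for _ in range(steps):
        z = np.tanh(u)
        scores.append(toy_preference_score(z, user_taste))
        # Chain rule: d(score)/du = d(score)/dz * d(tanh(u))/du, and d(tanh)/du = 1 - z^2.
        grad_u = toy_score_gradient(z, user_taste) * (1.0 - z ** 2)
        u = u + lr * grad_u  # ascent: step in the direction of steepest increase
    z = np.tanh(u)
    scores.append(toy_preference_score(z, user_taste))
    return z, scores


rng = np.random.default_rng(0)
user_taste = rng.uniform(-0.8, 0.8, size=LATENT_DIM)  # hypothetical "ideal" latent point
z_best, scores = maximize_preference(user_taste)
print("score before:", scores[0], "score after:", scores[-1])
```

In the real setting the gradient would come from backpropagating the preference score through the recommendation model and the Generator rather than from a closed-form expression, but the update rule has the same shape.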
Their results are shown in Figure 2. It compares the top three images from the dataset with the top three generated images for six different users, one per category (as a reminder, those are: men’s and women’s tops, men’s and women’s bottoms, and men’s and women’s shoes). Attesting to the ingenuity of Kang et al.’s solution, the examples they produced scored higher on the recommendation algorithm, suggesting that they are a better match for individual style and preferences.
Kang and his team didn’t stop there. In addition to creating completely new items, they explored whether the model they developed could be used to make changes to existing items, tailored to an individual’s style. Given the highly subjective nature of fashion shopping, having the ability to alter an item until it is “just right” has great potential business benefits. Let’s see how Kang et al. went about solving this challenge.
Adjusting Existing Items to Better Match Individual Preferences
Recall that the numbers in the latent space represented by the input vector z have real-world meaning and that vectors which are mathematically close to one another (as measured by their distance in the high-dimensional space they occupy) tend to produce images that are similar in terms of content and style. Accordingly, as Kang et al. point out, in order to generate variations of some image A, all we need to do is find the latent vector zA that the Generator uses to create the image; we could then produce images from neighboring vectors to generate similar images.
To make this a little less abstract, let’s look at a concrete example using our favorite dataset, MNIST. We have an input vector z’ which, when fed into the Generator, produces an image of the number “8”. If we then feed in a vector z’’ which is, mathematically speaking, close to z’ in the 100-dimensional latent space the vectors occupy, then z’’ produces another, slightly different, image of the number “8”. This is illustrated in Figure 3. In the context of a Variational Autoencoder, the intermediate/compressed representation works like z does in the world of GANs.
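The following sketch shows the neighborhood idea numerically. The `toy_generator` here is an assumption: a fixed random layer followed by tanh, standing in for a trained Generator. It is only meant to show that a small step in latent space yields a small change in the output, while an unrelated latent vector yields a very different output.

```python
import numpy as np

rng = np.random.default_rng(42)
LATENT_DIM = 100
N_PIXELS = 28 * 28  # MNIST-sized output

# Toy stand-in for a trained Generator: one fixed random layer plus tanh.
W = rng.normal(size=(N_PIXELS, LATENT_DIM)) / np.sqrt(LATENT_DIM)


def toy_generator(z):
    return np.tanh(W @ z)


def image_distance(z_a, z_b):
    """Euclidean distance between the images produced from two latent vectors."""
    return np.linalg.norm(toy_generator(z_a) - toy_generator(z_b))


z_prime = rng.uniform(-1.0, 1.0, size=LATENT_DIM)
# A neighbor: z_prime plus a small perturbation, clipped back into [-1, 1].
z_near = np.clip(z_prime + rng.normal(scale=0.05, size=LATENT_DIM), -1.0, 1.0)
# An unrelated point sampled from anywhere in the latent space.
z_far = rng.uniform(-1.0, 1.0, size=LATENT_DIM)

near_gap = image_distance(z_prime, z_near)
far_gap = image_distance(z_prime, z_far)
print("neighbor image distance:", near_gap)
print("unrelated image distance:", far_gap)
```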
In fashion, things are more nuanced. After all, a photo of a dress is a lot more complex than a grayscale image of a numeral. Moving in the latent space around a vector producing, say, a T-shirt can produce a T-shirt in different colors, patterns, styles (v-neck as opposed to crew-neck), etc. It all depends on the types of encodings and meanings the Generator has internalized during training. The only way to find out is to try.
This brings us to the next challenge Kang and his team had to overcome: in order for the approach above to work, we need the vector z for the image we want altered. This is straightforward if we want to modify a synthetic image: we can record the vector z each time we generate an image, allowing us to refer to it later. What complicates the situation in our scenario is that we want to modify a real image. By definition, a real image wasn’t produced by the Generator, so there’s no vector z to look up. The best we can do is to find the latent space representation of a generated image as close as possible to the one we seek to modify. In other words, we must find the vector z that the Generator uses to synthesize an image similar to the real image and use it as a proxy for the hypothetical z that would have produced the real image.
This is precisely what Kang et al. did. As before, they formulated the scenario as an optimization problem. They defined a loss function in terms of the so-called “reconstruction loss” (a measure of difference between two images; the greater the loss, the more different a given pair of images is from one another). Having formulated the problem in this way, Kang et al. then iteratively found the closest possible generated image for any real image using gradient descent (minimizing the reconstruction loss).
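A toy version of this search is sketched below. The linear `toy_generator`, the mean squared error used as the reconstruction loss, and the tiny image and latent sizes are all assumptions made so the gradient can be written by hand; the article does not specify which reconstruction loss the authors used, and a real Generator would require backpropagation instead.

```python
import numpy as np

rng = np.random.default_rng(7)
N_PIXELS, LATENT_DIM = 64, 16  # tiny sizes to keep the toy fast

# Toy linear stand-in for a trained Generator.
W = rng.normal(size=(N_PIXELS, LATENT_DIM))


def toy_generator(z):
    return W @ z


def reconstruction_loss(z, real_image):
    """Mean squared difference between the generated image and the real one."""
    return np.mean((toy_generator(z) - real_image) ** 2)


def reconstruction_grad(z, real_image):
    """Gradient of the reconstruction loss with respect to z."""
    return 2.0 * W.T @ (toy_generator(z) - real_image) / N_PIXELS


def invert(real_image, steps=500, lr=0.1):
    """Gradient descent on z to find the generated image closest to real_image."""
    z = np.zeros(LATENT_DIM)
    losses = [reconstruction_loss(z, real_image)]
    for _ in range(steps):
        z = z - lr * reconstruction_grad(z, real_image)
        losses.append(reconstruction_loss(z, real_image))
    return z, losses


# A "real" image: close to something the Generator can make, but not exactly producible.
real_image = toy_generator(rng.uniform(-1.0, 1.0, size=LATENT_DIM)) \
    + rng.normal(scale=0.1, size=N_PIXELS)

z_a, losses = invert(real_image)
print("loss at start:", losses[0], "loss at end:", losses[-1])
```

The final loss does not reach zero because the real image lies slightly outside what the toy Generator can produce, which mirrors the point that the recovered z is only a proxy for the real image.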
Once we have a fake image which is similar to the real image (and hence also the vector z used to produce it), we can modify it through latent space manipulations. This is where it becomes powerful: we can move around the latent space to points that generate images similar to the one we want to modify, while also optimizing for the preferences of the given user. We can see this process in Figure 4: as we move from left to right on each row, the shirts and pants get progressively more personalized. For instance, the person on the first row was looking for a more colorful option and, as Kang et al. observed, the person on row five seems to prefer brighter colors and a more distressed look; and the last person, it appears, prefers skirts over jeans. This is hyper-personalization at its finest; no wonder Amazon took notice.
The leftmost photo shows the real product from the training dataset; the second photo from the left shows a generated image closest to the real photo which was used as a starting point for the personalization process. Each image is annotated with its preference score. As we move from left to right, the item is progressively optimized for the given individual. As evidenced by the increasing scores, the personalization process significantly improves the likelihood that the item matches the given shopper’s style and taste.
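The left-to-right progression can be sketched as a short walk through the latent space that starts at the recovered vector rather than at a random point. In the sketch below, the `personalize` helper, the toy score, and the use of clipping to stay inside [-1, 1] are assumptions for illustration; each entry of `path` plays the role of one image in a row of the figure.

```python
import numpy as np


def personalize(z_start, score_grad, steps=5, lr=0.1):
    """Take a few gradient-ascent steps from z_start, keeping every intermediate z."""
    path = [np.array(z_start, dtype=float)]
    for _ in range(steps):
        z = path[-1] + lr * score_grad(path[-1])
        path.append(np.clip(z, -1.0, 1.0))  # stay inside the latent range
    return path


rng = np.random.default_rng(1)
user_taste = rng.uniform(-0.8, 0.8, size=100)  # hypothetical "ideal" latent point
z_a = rng.uniform(-1.0, 1.0, size=100)         # stand-in for the recovered latent vector


def score(z):
    """Toy stand-in for the preference score of the image generated from z."""
    return -np.sum((z - user_taste) ** 2)


def score_grad(z):
    return -2.0 * (z - user_taste)


path = personalize(z_a, score_grad)
scores = [score(z) for z in path]
drift = [np.linalg.norm(z - z_a) for z in path]

for step, (s, d) in enumerate(zip(scores, drift)):
    print(f"step {step}: score={s:.2f}, distance from start={d:.2f}")
```

Taking only a few small steps keeps each result close to the starting item, which is the trade-off between staying recognizable and matching the shopper’s taste.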
The applications covered in this article only scratch the surface of what is possible with GANs; there are countless other use-cases in medicine and fashion alone, not to mention other fields. What’s certain is that GANs have expanded far beyond academia with myriad applications using their ability to synthesize realistic data.
That’s all for now.
If you want to learn more about the book, check it out on liveBook here.