Decoding the Fashion Signature using Embeddings
Authored By Rohit Gupta & Siddhartha Devapujula
Introduction
Millions of users visit Myntra daily to upgrade their wardrobes, and millions of items are listed on the platform at any given time. Users have neither the time nor the ability to scroll through this vast list of items. Even after applying category and attribute filters, the number of items usually still runs into the thousands. Hence, it is critical that the top search results for every user are both relevant and personalized. Like search, many other recommendation widgets across the platform face the same challenge.
Fashion Diversity — Every Region has its own Fashion
Showing each user the best styles for them from a catalog of over a million products is where machine learning based recommendation systems come into play. From search results on Google to your Netflix home screen, recommendation systems work in the background to get you the best results. It is impossible to imagine the modern internet experience without these systems.
The overarching goal of these models is to take user features and the vast list of items as input and generate a small, personalized list of items for each user.
These systems mainly rely on the user's historical activity on the platform. In this blog, we show how other kinds of user details can also enhance the quality of recommendations.
In the next sections, we dive into the details of recommendation systems and related techniques. We explain the motivation for a location-based recommendation system and how we built one at Myntra. Finally, we discuss a few use cases at Myntra, their results, and potential future work.
Basics of recommendation systems
For a simple introduction to recommendation systems, see Google's "Recommendations: What and Why?" (Google for Developers, Machine Learning). Readers who are already familiar with the topic can skip it.
Traditional recommendation models learn from the user's historical interactions on the platform. This works fine for existing users but suffers from the user cold start problem, i.e., what should we show a user who is visiting for the first time? Modern recommendation models address this by using available attributes (or metadata) of users. This side information improves results for existing users as well. Location is one such piece of side information.
Motivation for location based recommendation systems
In this blog, we focus on one such important user attribute: location.
Location plays an even greater role in the fashion industry, especially in a large and diverse country like India, where each region has its own clothing culture and ever-changing trends. Climate and terrain also vary across India, directly affecting the clothing choices of the people living there. All these factors show that location should be taken into account when suggesting clothes to a user. At Myntra, it is used to enhance the experience of both existing and new users.
Location is also a somewhat vague term. It can be broad, like the city you live in, or narrow, like your exact GPS coordinates. We use the pincode as our definition of location. It is one of the most granular pieces of location information available to us, and it is reliable and easily available in an e-commerce setting. Other aspects of location (state, city, geohash encoding, etc.) could also be used, but we found pincode to be an ideal starting point for development.
We show how a simple collaborative filtering technique can power location-based fashion recommendations. Before that, let's look at collaborative filtering in more detail.
Collaborative filtering in brief
Collaborative filtering has been a go-to modeling technique for recommendation models for quite a while now [1]. These algorithms learn meaningful user and item embeddings from what we call the interaction matrix, which contains user-item interaction scores (say, users' ratings of movies). At inference time, these embeddings are used to predict user-item scores.
We use the BPR (Bayesian Personalized Ranking) algorithm [2] for our use case. BPR is a widely used collaborative filtering algorithm in industry. Instead of a pointwise training approach, it directly optimizes the pairwise ranking of items for each user.
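To make the inference step concrete, here is a minimal Python sketch. It assumes the embeddings have already been learned and that a user-item score is the dot product of the two embeddings (one common choice); the toy matrices and the `recommend` helper are hypothetical. Items the user has already interacted with are masked out before taking the top k.

```python
import numpy as np

# Toy interaction matrix: rows are users, columns are items.
# 1 marks an observed interaction (e.g. a click or an order); 0 means nothing observed.
interactions = np.array([
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
])

# Hypothetical learned embeddings (dimension 2) for 3 users and 4 items.
user_emb = np.array([[0.9, 0.1], [0.1, 0.8], [0.6, 0.6]])
item_emb = np.array([[1.0, 0.0], [0.0, 1.0], [0.8, 0.2], [0.5, 0.5]])


def recommend(user, k=2):
    """Return the top-k items the user has not interacted with, by dot-product score."""
    scores = item_emb @ user_emb[user]
    # Mask items the user already interacted with so they are never recommended.
    scores = np.where(interactions[user] == 1, -np.inf, scores)
    return [int(i) for i in np.argsort(-scores)[:k]]


print(recommend(0))  # [3, 1]
```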
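The loss function is generally expressed as follows, with σ denoting the logistic sigmoid function and the sum running over all training triplets:
$$\mathcal{L} = -\sum_{(u,i,j)} \ln \sigma\left(x_{uij}\right) + \lambda_\Theta \lVert \Theta \rVert^2$$
where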
- (u, i, j) are triplets from the interaction data meaning that user u prefers item i over item j, i.e., item i is a positive sample for user u and item j is a negative sample,
- x_uij denotes the difference between the estimated preference scores of user u for item i and item j (x_uij = x_ui − x_uj),
- Θ refers to model parameter vectors and
- λ_Θ refers to regularization parameters.
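The loss translates almost directly into NumPy. In this minimal sketch (the `bpr_loss` name and its arguments are illustrative, not taken from [2] or from Myntra's code), the score differences x_uij pass through a numerically stable log-sigmoid, and an L2 penalty over the parameter arrays is added:

```python
import numpy as np


def bpr_loss(x_ui, x_uj, params, reg):
    """BPR loss: -sum(ln sigmoid(x_ui - x_uj)) + reg * ||Theta||^2.

    x_ui, x_uj: predicted scores for the positive items i and negative items j
    params:     list of parameter arrays making up Theta
    reg:        the regularization strength lambda_Theta
    """
    x_uij = np.asarray(x_ui, dtype=float) - np.asarray(x_uj, dtype=float)
    # ln sigmoid(z) = -ln(1 + exp(-z)); logaddexp(0, -z) computes ln(1 + exp(-z)) stably.
    log_sigmoid = -np.logaddexp(0.0, -x_uij)
    l2 = sum(float(np.sum(np.square(p))) for p in params)
    return float(-np.sum(log_sigmoid) + reg * l2)


# A correctly ordered pair (i scored above j) costs less than a mis-ordered one.
print(bpr_loss([3.0], [0.0], [], 0.0) < bpr_loss([0.0], [3.0], [], 0.0))  # True
```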
On the left, the observed data S is shown. BPR creates user-specific pairwise preferences i >_u j between pairs of items. On the right, plus (+) indicates that a user prefers item i over item j; minus (–) indicates that they prefer j over i. (Image taken from [2])
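We can use any model to estimate the user-item scores x_ui and x_uj, say, a simple dot product between user and item embeddings (x_ui = x_u · y_i). Our loss function then becomes
$$\mathcal{L} = -\sum_{(u,i,j)} \ln \sigma\left(x_u^\top y_i - x_u^\top y_j\right) + \lambda_\Theta \lVert \Theta \rVert^2$$
where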
- x_u, y_i, y_j are the user u, item i and item j embeddings.
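With dot-product scores, the gradients have a simple closed form. The sketch below performs one plain-SGD update for a single triplet (u, i, j); the `bpr_sgd_step` helper, learning rate, and regularization values are assumptions for illustration, not necessarily how the authors trained their model. Because it differentiates the squared L2 penalty exactly, the regularization term carries a factor of 2.

```python
import numpy as np


def bpr_sgd_step(X, Y, u, i, j, lr=0.05, reg=0.01):
    """One SGD step on -ln sigmoid(x_u.y_i - x_u.y_j) + reg * (|x_u|^2 + |y_i|^2 + |y_j|^2).

    X: user embedding matrix, Y: item embedding matrix (both float, updated in place).
    Returns x_uij, the score difference before the update.
    """
    x_u, y_i, y_j = X[u].copy(), Y[i].copy(), Y[j].copy()
    x_uij = float(x_u @ (y_i - y_j))
    # 1 - sigmoid(x_uij) = sigmoid(-x_uij): close to 1 when the pair is mis-ranked,
    # close to 0 when item i is already scored well above item j.
    g = 1.0 / (1.0 + np.exp(x_uij))
    X[u] += lr * (g * (y_i - y_j) - 2 * reg * x_u)
    Y[i] += lr * (g * x_u - 2 * reg * y_i)
    Y[j] += lr * (-g * x_u - 2 * reg * y_j)
    return x_uij


rng = np.random.default_rng(0)
X = rng.normal(scale=0.1, size=(1, 4))  # one user
Y = rng.normal(scale=0.1, size=(3, 4))  # three items
for _ in range(300):
    bpr_sgd_step(X, Y, u=0, i=1, j=2, lr=0.1)
print(X[0] @ Y[1] > X[0] @ Y[2])  # True: item 1 now outranks item 2
```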
Choosing positive and negative samples can be customised to the use case, but items in which the user has shown some implicit or explicit interest (e.g., clicks, list views, orders) are usually chosen as positive samples. The negative sampling strategy can be as simple as uniform random sampling, or more involved, such as using item popularity as the sampling probability distribution.
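Both sampling strategies fit in a few lines. In this hypothetical `sample_negative` helper, a negative is drawn either uniformly or with probability proportional to item popularity, and the user's positives are always excluded; it is a sketch, not the sampler used in production.

```python
import numpy as np


def sample_negative(positives, n_items, rng, popularity=None):
    """Draw one negative item id for a user.

    positives:  set of item ids that count as positives for this user
    popularity: optional per-item counts; if given, negatives are drawn with
                probability proportional to popularity, otherwise uniformly
    """
    if popularity is None:
        probs = np.ones(n_items, dtype=float)
    else:
        probs = np.array(popularity, dtype=float)  # copy, so the caller's data is untouched
    for item in positives:
        probs[item] = 0.0  # a positive must never be sampled as a negative
    total = probs.sum()
    if total == 0:
        raise ValueError("no item available to sample as a negative")
    return int(rng.choice(n_items, p=probs / total))


rng = np.random.default_rng(42)
popularity = [100, 50, 5, 1]
draws = [sample_negative({0}, 4, rng, popularity) for _ in range(1000)]
print(0 in draws, draws.count(1) > draws.count(3))  # False True
```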
Learning Pincode embeddings
We use pincode as a proxy for location in our model. Our aim is to provide item recommendations based on the pincode the user belongs to.
Instead of the usual user and item embeddings, we learn pincode and item embeddings, which are then used in downstream recommendation tasks.
Our interaction matrix consists of pincodes and items, with each cell denoting the historical interaction between that pincode and item. We set a cell to 1 if that item has been ordered at least K times from that pincode (K is a hyperparameter), and 0 otherwise. We then train BPR over this matrix to generate pincode and item embeddings.
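Below is a minimal end-to-end sketch of this step. It assumes orders arrive as (pincode index, item index) pairs (a hypothetical format), builds the thresholded binary matrix, and trains BPR with plain SGD and uniform negatives, treating every 1 cell as a positive and every 0 cell in the same row as a candidate negative. The helper names and hyperparameters are illustrative; the post does not describe the authors' actual sampler, optimizer, or embedding size.

```python
from collections import Counter

import numpy as np


def build_interactions(orders, n_pincodes, n_items, k):
    """Binary pincode x item matrix: 1 where the item was ordered at least k times from the pincode.

    orders: iterable of (pincode_idx, item_idx) pairs, one per order (a hypothetical format).
    """
    counts = Counter(orders)
    R = np.zeros((n_pincodes, n_items), dtype=np.int8)
    for (p, i), c in counts.items():
        if c >= k:
            R[p, i] = 1
    return R


def train_bpr(R, dim=8, epochs=50, lr=0.05, reg=0.01, seed=0):
    """Plain-SGD BPR over a binary matrix R; returns (pincode_emb, item_emb)."""
    rng = np.random.default_rng(seed)
    n_p, n_i = R.shape
    P = rng.normal(scale=0.1, size=(n_p, dim))  # pincode embeddings
    Q = rng.normal(scale=0.1, size=(n_i, dim))  # item embeddings
    pos_pairs = np.argwhere(R == 1)  # every 1 cell is a positive (pincode, item) pair
    for _ in range(epochs):
        for p, i in rng.permutation(pos_pairs):
            # Uniform negative: any item that did not cross the threshold for this pincode.
            negatives = np.flatnonzero(R[p] == 0)
            if negatives.size == 0:
                continue
            j = rng.choice(negatives)
            x_p, y_i, y_j = P[p].copy(), Q[i].copy(), Q[j].copy()
            g = 1.0 / (1.0 + np.exp(x_p @ (y_i - y_j)))  # sigmoid(-x_pij)
            P[p] += lr * (g * (y_i - y_j) - 2 * reg * x_p)
            Q[i] += lr * (g * x_p - 2 * reg * y_i)
            Q[j] += lr * (-g * x_p - 2 * reg * y_j)
    return P, Q


orders = [(0, 0)] * 3 + [(0, 1)] + [(1, 2)] * 2
R = build_interactions(orders, n_pincodes=2, n_items=3, k=2)
print(R.tolist())  # [[1, 0, 0], [0, 0, 1]]
P, Q = train_bpr(R, dim=4, epochs=300, lr=0.1)
print((P @ Q.T).argmax(axis=1))  # top-scoring item per pincode
```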
Results
Once we have these pincode embeddings, they can be used downstream in multiple models, especially where we face the user cold start problem. We show a couple of such use cases later, but first, let's look at the embeddings themselves.
Visualizing pincode embeddings
A few significant observations:
- Region-specific buying behaviour plays a crucial role: many pincodes fall in the same cluster as geographically nearby pincodes.
- Many pincodes from metro cities fall in the same cluster, irrespective of their region or state.
- Some pincodes from tier 2 cities fall in the same cluster as the metro cities. This is where pincode-level granularity proves more beneficial than city-level granularity.
When we visualize the Bengaluru pincodes, we can see that some of them belong to the cluster with other metro cities (light blue) while others belong to the same cluster as most other pincodes in Karnataka (light orange). Those familiar with Bengaluru will know that the light blue pincodes cover areas where people from many parts of India live; because of this similarly mixed demographic, their buying patterns are more likely to resemble those of big cities like Mumbai and Delhi.
Use cases
Now we show two important use cases at Myntra where we use the pincode embeddings directly as a feature.
Ranking
The popularity of a product (measured by metrics like revenue, quantity, or orders) is an important feature in ranking and recommendation systems. In fact, popularity-based recommendation is the most basic baseline against which ML models compete.
To evaluate the quality of our embeddings, we cluster the pincode embeddings and check whether a product's popularity within a cluster is a better ranking metric than its overall popularity across the country.
We compare cluster-level and country-level popularity using nDCG, and can clearly see that pincode-cluster-based popularity is the better popularity measure to use in downstream models.
Note that here we measure nDCG at k = 100 to k = 5000, which is much larger than the usual values of k in recommendation systems (1 to 20), because we compute nDCG for each pincode cluster rather than for each user or session. In other words, we focus on how good the top 100 to 5000 ranked products for each cluster are in terms of the revenue they bring.
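The post does not say which clustering algorithm is used. The sketch below assumes plain k-means (Lloyd's algorithm, written out in NumPy) and computes each product's popularity within every cluster as the sum of its sales across the cluster's pincodes. The helper names and toy numbers are hypothetical.

```python
import numpy as np


def kmeans(X, n_clusters, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centres with distinct randomly chosen points (a copy of those rows).
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centre.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):  # leave an empty cluster's centre unchanged
                centers[c] = X[labels == c].mean(axis=0)
    return labels


def cluster_popularity(labels, sales, n_clusters):
    """Per-cluster popularity: sum each item's sales over the pincodes in the cluster.

    sales: (n_pincodes, n_items) matrix of orders (or revenue, quantity, ...).
    """
    pop = np.zeros((n_clusters, sales.shape[1]))
    for c in range(n_clusters):
        pop[c] = sales[labels == c].sum(axis=0)
    return pop


emb = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])  # toy pincode embeddings
sales = np.array([[10, 0], [8, 1], [0, 9], [1, 7]])  # rows: pincodes, columns: items
labels = kmeans(emb, n_clusters=2)
pop = cluster_popularity(labels, sales, n_clusters=2)
print(sales.sum(axis=0).tolist())  # country-wide popularity: [19, 17]
print(pop[labels[2]].tolist())     # popularity in the second cluster: [1.0, 16.0]
```

In this toy case, country-wide popularity ranks item 0 first everywhere, while the second cluster clearly prefers item 1.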
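As a sketch of the metric, assuming linear gain with each product's held-out revenue in the cluster as its relevance (the post does not spell out the exact gain or how scores are aggregated across clusters), nDCG@k for one cluster ranks products by a popularity score and compares the resulting DCG with that of the ideal ordering. The toy numbers are made up for illustration.

```python
import numpy as np


def dcg_at_k(relevance_in_rank_order, k):
    """DCG@k with linear gain: sum of rel_r / log2(r + 1) over ranks r = 1..k."""
    rel = np.asarray(relevance_in_rank_order, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rel.size + 2))
    return float(np.sum(rel / discounts))


def ndcg_at_k(scores, relevance, k):
    """nDCG@k of ranking items by `scores`, judged against the true `relevance`."""
    relevance = np.asarray(relevance, dtype=float)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    idcg = dcg_at_k(np.sort(relevance)[::-1], k)  # best achievable DCG
    return dcg_at_k(relevance[order], k) / idcg if idcg > 0 else 0.0


revenue = [1.0, 16.0]      # held-out revenue of 2 items within one cluster
cluster_pop = [1.0, 16.0]  # popularity inside the cluster
global_pop = [19.0, 17.0]  # country-wide popularity
print(ndcg_at_k(cluster_pop, revenue, k=2))  # 1.0
print(ndcg_at_k(global_pop, revenue, k=2))   # about 0.667
```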
Trending near you
Another important use case we serve using pincode embeddings is Trending near you. The goal is to recommend styles that are trending in the user's location and similar to the style the user is viewing. The idea is to leverage the fact that people with similar buying or viewing behaviour are more likely to view or buy products liked by the group as a whole.
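The post does not detail how similarity and local trends are combined. One simple, hypothetical way to express the idea: take the styles most similar to the viewed style (cosine similarity of item embeddings) and re-rank them by popularity within the user's pincode cluster. The `trending_near_you` helper and its parameters are assumptions for illustration.

```python
import numpy as np


def trending_near_you(viewed_item, item_emb, cluster_pop, user_cluster, n_similar=50, k=10):
    """Hypothetical 'Trending near you': similar styles, re-ranked by local popularity.

    item_emb:     (n_items, dim) item embedding matrix
    cluster_pop:  (n_clusters, n_items) popularity computed within each pincode cluster
    user_cluster: cluster id of the user's pincode
    """
    unit = item_emb / np.linalg.norm(item_emb, axis=1, keepdims=True)
    sims = unit @ unit[viewed_item]  # cosine similarity to the viewed style
    sims[viewed_item] = -np.inf      # sorts last, so it is excluded below
    n = min(n_similar, len(sims) - 1)
    candidates = np.argsort(-sims)[:n]
    local = cluster_pop[user_cluster, candidates]
    ranked = candidates[np.argsort(-local, kind="stable")]
    return [int(i) for i in ranked[:k]]


item_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]])
# Popularity of each item within two pincode clusters (toy numbers).
cluster_pop = np.array([[0, 5, 9, 100], [0, 7, 1, 100]])
print(trending_near_you(0, item_emb, cluster_pop, user_cluster=0, n_similar=2))  # [2, 1]
print(trending_near_you(0, item_emb, cluster_pop, user_cluster=1, n_similar=2))  # [1, 2]
```

Filtering to similar styles first keeps the widget on-topic, while the cluster-level popularity supplies the local signal; a weighted blend of the two scores would be an equally plausible design.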
Trending near you: Men's wear, Uttar Pradesh vs Telangana. These recommendations were generated during the winter season: the top styles in UP are winter wear, whereas in Telangana they are regular wear suited to a tropical climate.
Trending near you: Men's wear, Assam vs Jammu & Kashmir. There is a clear shift in trends between J&K and Assam, which shows that our model captures local trends based on location.
Future work
The BPR-based pincode embeddings already serve multiple use cases at Myntra.
Future work could proceed in the following major directions:
- Better embeddings from collaborative filtering algorithms: BPR leaves plenty of room for improvement, for example better negative sampling and distinguishing between hard and soft negatives in the loss calculation. More complex algorithms such as WARP [3], NCF [4], and graph learning [5] could also be tried.
- Attribute-aware recommendation systems: many recommendation models, such as Wide & Deep [6], DeepFM [7], DLRM [8], and GNN-based models, can learn embeddings for categorical user and item features. Pincode can be one such important feature; we will have to evaluate whether embeddings from such models are better than the existing ones.
Acknowledgements
We would like to thank Pankaj Agarwal for his inputs, and Vipulsrivasmishra and Rahul Mishra for their work on Trending near you.