Content Creation Copilot - AI-assisted product onboarding

At Zalando, we strive to discover valuable use cases that benefit our customers and stakeholders by using AI-based approaches. Our team's primary mission is to enable content creation teams to produce and integrate best-in-class content for our customers in the most efficient way. We are building tools that streamline the content creation journey - from photo shooting, copyrighting to submission articles in Zalando shop in compliant way.

Current Process

Our colleagues responsible for Product understanding, Product Search, or our Zalando Assistant are extensively using Machine Learning approaches for feature extraction or similarity searches for products that are already onboarded to the Zalando platform. Yet, the content creation stage of the product onboarding is largely a manual process. Copywriters enrich attributes using a Content Creation Tool and perform Quality Assurance (QA) themselves to guarantee the four-eyes principle.

After QA is completed, the article is published in the shop. The enriched attributes are then available to Zalando customers across Europe, making it easier to make informed purchasing decisions.

Enriched content visible on Zalando page

The Problem Statement

After analyzing the outcomes of our quality assurance processes, we've been consistently identifying opportunities to reduce error rates. As the manual process contributed to approximately 25% of the overall content production timeline, we've prioritized the development of assistive functions to support the QA process. These aim to streamline the detection and correction of defects in the earliest possible stage of content production in accordance with the Zalando content creation guides.

As a technology team, we believed leveraging Machine Learning in the content enrichment workflow could benefit the content creation teams by helping them create high-quality content while increasing the coverage of attributes across our product data catalog. This in turn would help Zalando customers access more new products every day, experience better search and discovery of the product catalogue, and consume richer product information (completeness and correctness) in the Product Detail Pages (PDPs).

There are multiple parts of the workflow that could be improved, but we chose the part of the highest impact on the customer experience: generating attributes based on provided images. However, this presented our first challenge: with so many solutions on the market, which model provider should we choose? How could we ensure that our users would receive the highest possible data quality? How could we ensure in the future an easy way to compare different sources and seamlessly change the used source for attribute suggestion?

Solution

We're building on the idea of a copilot, like the ones used in IDEs for developers, to make life easier for users by automating parts of the article enrichment process. By leveraging Machine Learning, we streamline the task of adding attributes to articles, reducing errors and ensuring consistency across similar content. Our system is designed to combine AI input with other sources, while the user interface clearly shows what suggestions come from which source, leaving the final decision in the hands of the human. This approach not only improves quality but also speeds up Time to Online (TTO), allowing Zalando customers to gain access to more new products daily and enjoy an enhanced search and discovery experience. Attributes are now marked with purple indicator (dot) and pre-selected for suggestions coming from the prompt generator in Content Creation Tool.

Screenshot from content creation tooling highlighting automated suggestions

As you can see, the attributes are already pre-filled and marked with a purple dot to make users aware that these attributes were auto-suggested. This visual cue helps streamline the workflow, allowing users to concentrate more on QA rather than the time-consuming task of enriching content.

Our Approach

Before we even began with the technical design, we built a small POC. We evaluated the results of various models on a large sample of articles from our catalog assortment measuring accuracy by having the predictions reviewed by domain experts. After doing a thorough analysis and multiple tests, we decided to use the OpenAI GPT-4 Turbo model, as it provided the right balance between accuracy and information coverage. We started crafting the prompt to ensure the best accuracy of suggested attributes.

As GPT-4o was announced relatively early in the copilot's development, we initially performed a human inspection, comparing the accuracy of different sources for sample articles. The new model not only provided better results but also delivered faster response times and proved to be more cost-effective. While this was a clear improvement, our goal is to automate this process. We are now able to easily integrate different suggestion sources/models within the copilot, which is a key step toward achieving this automation across the platform.

Design and Implementation

We designed and implemented a system leveraging multiple AI services. To simplify the use case, we will describe one of our use cases.

Simplified workflow for generation of attribute suggestions

This diagram illustrates the current workflow involving the interaction between four components: Content Creation Tool, Prompt Generator, Article Masterdata and OpenAI - GPT.

Content Creation Tool: Internal content creation tool used by photographers to upload images, which URLs are sent to the Prompt Generator. Receives generated attribute suggestions from the OpenAI-GPT - and auto-selected them in the copyrighting workflow in Content Creation Tool.
Article Masterdata: Holds metadata about articles, such as attributes and attribute sets (definition of the types and attributes that are optional and mandatory for the article type) of the article.
Prompt Generator: Generate prompts based on the attributes and attribute sets coming from Article Masterdata. The prompts and image URLs are sent to OpenAI-GPT for further processing.
OpenAI-GPT: Processes the prompts received from the Prompt Generator and provides suggestions based on the prompts. The suggestions or content are sent back to the Content Creation Tool.

Challenges and Solutions

As Zalando operates in 25 markets with different languages, we are storing the attributes of the article as attribute codes. One of the biggest challenges was translating the Zalando-specific attribute codes provided by Master Data (e.g. for the attribute assortment_type, master data is providing values with following values: assortment_type_7312, assortment_type_7841) into human-readable language understandable by the GPT model and then translating the suggestions back into the Master Data-specific code. The solution was to get the English translation of the possible attribute values (in this case it’s Petite and Tall), wait for the GPT response, and then translate it back into the attribute_code. As the suggestions directly impact customer experience, it was imperative for us to ensure the output of OpenAI was compatible with our APIs. We built a translation layer that converts OpenAI output into information directly usable by Zalando and discards the part that is not relevant.

Another challenge was that some attributes shouldn't be filled for certain types of articles according to the internal guidelines, and the accuracy of predicted suggestions for these attributes was often poor. To address this, we introduced a mapping layer between product categories and the relevant information that should be shown to the customer. Furthermore, we created custom guidelines as part of the prompt for complex product attributes which gave additional hints (E.g. differentiating between V-neck and Low cut V-neck collar types).

GPT-4o model tends to suggest general attributes like V-necks or round necks for necklines correctly, but can be less precise when it comes to more fashion-specific ones, like deep scoop necks. This issue is more noticeable when using balanced datasets (where there’s an equal number of samples per attribute) compared to unbalanced ones (where the sample proportions reflect real-world trends). The risk is that less common or more specific fashion terms may be treated inaccurately or being incomplete. That's one of the reasons, why we created an aggregator service - to integrate multiple AI services, leveraging a wider variety of data sources, such as brand data dumps, partner contributions, and images, to improve the accuracy and completeness of the results.

One of the challenges we encountered was reducing the infrastructure costs of suggestions generation, which were higher than expected. First, we stopped generating suggestions for some unsupported attribute sets. Second, we migrated to GPT-4o model, which significantly lowered costs.

A further challenge involved identifying the optimal set of images to enhance input quality while balancing cost efficiency, as we found out some image types performed better than others, with product-only front images delivering the best results, followed closely by front images featuring the products being worn by the model.

Results and Impact

The early results are very encouraging as we see an improvement in both data quality and coverage of attributes. The way we built our architecture helped us do a controlled rollout where we could easily include/exclude products or attributes with minimal effort. Involving our users early in product development brought great benefits, as the adoption was very smooth, and the content creation experts are now actively contributing to the prompts. We've achieved an accuracy rate of approximately 75%, and we're enriching around 50,000 attributes on average per week. As a next step, we will focus on improving accuracy for niche categories and expanding the coverage of the product information beyond the regular product attributes.

Conclusion

The architecture built around the Content Creation Copilot has proven to be a strong baseline for future use cases by providing an easy way of integrating future model sources and enhancing data accuracy. The next use case involves describing images with the most informative tags, which unblocks multiple applications, including content performance analytics and delivering better-targeted ads. Additionally, we will assist in generating suggestions for free text attributes and their translations.

We're hiring! Do you like working in an ever evolving organization such as Zalando? Consider joining our teams as a Frontend Engineer!