From Products to Inspiration: Inside the Engine of the Occasion-Based Outfit Visualiser

Ankit Kumar | Oct 2025 · 6 min read

Outfit visualisation

The “Why”: Moving Beyond the Grid

Picture this: A white background. A shirt. Fabric details. Fit specs. A price tag.

For decades, this has been the status quo of online shopping. It is clinical, clear, and — let’s be honest — completely detached from real life.

In this model, the customer does all the heavy lifting. “Where would I wear this?” they wonder. “Does this go with those beige chinos I bought last year?” They close their eyes. They imagine. They guess. Sometimes they buy; often, they bounce.

Traditional Product Detail Page (PDP) recommendations tried to help by suggesting jeans to pair with shirts. But the truth is, they remained a list of ingredients, not a prepared meal. At Myntra, we decided to change that. We set out to build Looks, a feature designed to transport a static product into a lived experience — a Friday night in Bangalore, a high-intensity gym in Gurgaon, or a quiet art gallery in Mumbai.

This is the story of how we orchestrated Data Science, Computer Vision, and Generative AI to build a personal stylist that scales to millions.

Phase 1: The Brain — Orchestrating the Look

Before we could visualize an outfit, we had to understand fashion. Not just as data points, but as a language.

This required Fashion Intelligence: a system that knows what works, what doesn’t, and why. Our Data Science team undertook a massive curation effort, analyzing over a million styles. They didn’t just tag clothes; they mapped them to the “cascading tree of style.”

For every Primary Style (e.g., a Polo shirt), the engine identifies four critical layers:

  1. The Occasion: (Weekend Outing, Office Smart-Casual)
  2. Secondary Style: (Bottom wear)
  3. Tertiary Style: (Footwear)
  4. Tertiary (Others): (Accessories like watches or sunglasses)

The Recipe in the Code

The logic is powered by a JSON structure that acts as the “AI Stylist’s” brain:

JSON

"29936239": [
  {
    "Weekend outing": [
      [
        29936239, // Primary: The Polo T-Shirt
        33551732, // Secondary: Beige Chinos
        29873488, // Tertiary: White Sneakers
        33040369  // Accessories: Aviator Sunglasses
      ]
    ]
  }
]

This isn’t just a list; it’s a recipe for synergy. The system understands that a “Weekend Outing” vibe isn’t complete without the specific harmony of chinos and aviators.
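As a rough illustration, a recipe like the one above could be expanded into role-labelled, occasion-tagged outfits with a few lines of Python. This is a hypothetical sketch, not Myntra's implementation; the role names and the `resolve_looks` helper are assumptions for clarity.

```python
# Hypothetical sketch: expanding a "looks" recipe into shoppable outfits.
# Structure mirrors the JSON above; role labels are illustrative.
LOOKS_RECIPES = {
    "29936239": [
        {
            "Weekend outing": [
                [29936239, 33551732, 29873488, 33040369],
            ]
        }
    ]
}

# The four layers of the "cascading tree of style".
ROLES = ["primary", "secondary", "tertiary", "accessory"]

def resolve_looks(primary_style_id: str) -> list[dict]:
    """Expand a primary style's recipes into role-labelled outfits per occasion."""
    outfits = []
    for occasion_map in LOOKS_RECIPES.get(primary_style_id, []):
        for occasion, combos in occasion_map.items():
            for combo in combos:
                outfits.append({
                    "occasion": occasion,
                    "items": dict(zip(ROLES, combo)),
                })
    return outfits
```

Keeping the recipe as pure data means stylist curation can ship without code changes; the engine simply re-reads the tree.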

Phase 2: The Body — The Digital Drape

Once we had the recipe, we faced the engineering challenge of actually cooking it: how do we visualize the outfit without it looking like a messy collage?

We couldn’t simply paste images next to each other. We needed to layer them realistically. The process involves:

  • The Intelligent Crawler: Scans our image stack for the “Hero” (Primary Style).
  • The Pose Model: Our in-house “Director” model. It hunts for the Full Shot Front Image — the only angle that allows for realistic layering — filtering out noisy or angled assets.
  • Coordinate Mapping: We digitally “drape” the trousers and shoes onto the primary model using complex anchor points.
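The coordinate-mapping step above can be sketched in miniature: given anchor points detected on the hero model, each garment crop is positioned relative to its anchor. This is a simplified, hypothetical version; the anchor names, offsets, and `place_garment` helper are all illustrative assumptions.

```python
# Illustrative sketch of anchor-based coordinate mapping. The real pipeline
# uses many more anchor points and handles scale/occlusion; here we only
# centre each garment crop horizontally on a single anchor.
from dataclasses import dataclass

@dataclass
class Box:
    x: int
    y: int
    w: int
    h: int

# Anchor points detected on the full-shot front image (pixel coordinates,
# hypothetical values).
HERO_ANCHORS = {"waist": (512, 900), "ankle": (512, 1650)}

def place_garment(anchor: tuple, garment_w: int, garment_h: int) -> Box:
    """Centre the garment crop horizontally on its anchor point."""
    ax, ay = anchor
    return Box(x=ax - garment_w // 2, y=ay, w=garment_w, h=garment_h)

trousers = place_garment(HERO_ANCHORS["waist"], 420, 700)
shoes = place_garment(HERO_ANCHORS["ankle"], 380, 220)
```

The paste boxes produced this way are what make the composite "accurate and precise", even before any generative step runs.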

The result? A composite image. It was accurate and precise, but it felt… sterile. It looked like a paper doll on a white page. We needed to give it a soul.

Phase 3: The Soul — The Generative AI Touch

This is where the project transcends standard engineering. We passed our draped composite to a Large Language Model (LLM) and Diffusion pipeline.

But we didn’t just ask it to “change the background.” We used Iterative Prompt Engineering to define the physics, the lighting, and the feeling of the image.

The Art of the Prompt

To avoid the “waxy, fake AI look,” we spoke the language of professional photography. We fed the system technical constraints:

The “Activewear” Vibe (Gurgaon Gym)

  • Context: Modern, high-end gym with city views.
  • Camera Spec: Shot on a Sony A7R IV, 85mm portrait lens at f/2.2.
  • Constraint: “Leave a 20–25% empty margin of rubber gym flooring at the bottom.” (This ensures UI text doesn’t cover the shoes!)

By enforcing these “Composition Rules,” the AI doesn’t just swap a background; it adjusts the lighting on the fabric and relaxes the model’s pose to fit the environment.
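A templated prompt builder is one way these composition rules could be enforced consistently across occasions. The sketch below reuses the "Activewear" spec from the list above; the dictionary fields and function name are assumptions, not the production schema.

```python
# Minimal sketch of occasion-specific "composition rules" templated into a
# diffusion prompt. Spec values mirror the Activewear example; field names
# are hypothetical.
SCENE_SPECS = {
    "activewear_gym": {
        "context": "modern, high-end gym with city views",
        "camera": "Sony A7R IV, 85mm portrait lens at f/2.2",
        "constraint": "leave a 20-25% empty margin of rubber gym flooring at the bottom",
    },
}

def build_prompt(scene: str) -> str:
    """Assemble a scene prompt from its context, camera spec, and layout constraint."""
    spec = SCENE_SPECS[scene]
    return (
        f"Place the model in a {spec['context']}. "
        f"Shot on a {spec['camera']}. "
        f"Composition rule: {spec['constraint']}. "
        "Preserve the garments exactly as draped; relight the fabric to match the scene."
    )
```

Encoding the rules as data rather than free text makes it cheap to A/B test a new occasion: add a spec entry, regenerate, compare.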

Drape and Pose change

Phase 4: Interaction — Making the Dream Shoppable

Outfit visualisation on the Myntra app

A beautiful image is a gallery piece; a shoppable image is a business.

The final piece of the puzzle is our Bounding Box (BBox) API. Once the AI returns the aspirational lifestyle image, the BBox system scans it to identify the new coordinates of every item: the shirt, the trousers, the sunglasses.

This closes the loop. When a user taps the sneakers in a “Weekend Outing” lifestyle shot, the app knows exactly which product is being touched. No guesswork. No friction.
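The tap-to-shop lookup described above reduces to a point-in-box test over the coordinates the BBox API returns. A minimal sketch, assuming a simple `(x, y, width, height)` box format and illustrative style IDs:

```python
# Hypothetical sketch of the tap-to-shop lookup: per-item boxes on the
# generated lifestyle image map a tap back to a product. Box values and
# the (x, y, w, h) format are assumptions for illustration.
from typing import Optional

BBOXES = {
    29936239: (300, 150, 400, 450),   # polo t-shirt
    33551732: (310, 600, 380, 600),   # chinos
    29873488: (330, 1200, 340, 180),  # sneakers
}

def product_at(tap_x: int, tap_y: int) -> Optional[int]:
    """Return the style ID whose bounding box contains the tap, if any."""
    for style_id, (x, y, w, h) in BBOXES.items():
        if x <= tap_x <= x + w and y <= tap_y <= y + h:
            return style_id
    return None
```

In practice overlapping boxes would need a z-order or smallest-box tiebreak, but the core contract is the same: every pixel of the aspirational image resolves to either a product or nothing.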

Conclusion: Turning Data into Dreams

The Outfit Visualizer is more than just a cool feature; it’s a paradigm shift in fashion e-commerce.

By combining the structured logic of Data Science with the creative “right-brain” power of Generative AI, we have successfully automated the work of a personal stylist. We’ve moved from selling items to selling inspirations.

At Myntra, we aren’t just filling carts; we’re helping millions of users visualize their best selves, one occasion at a time.

Enjoyed this deep dive? Follow for more insights into how we’re rebuilding the future of retail at Myntra Engineering.
