The Airbnb Tech Blog

Creative engineers and data scientists building a world where you can belong anywhere. http://airbnb.io


Airbnb’s AI-powered photo tour using Vision Transformer

Pei Xiong
9 min read · Nov 13, 2024

Introduction

Figure 1: Photo Tour product powered by ML

Methodology

Room Classification

Image Similarity

Figure 2: An illustration of Siamese network for image similarity

Accuracy Improvement

Figure 3: Correlation between data volume and accuracy

Pre-training and Traditional Fine-tuning

Multi-task Learning

Figure 4: Multi-task learning illustration

Ensemble Learning

Knowledge Distillation

Golden Evaluation

Conclusion

Acknowledgments


Published in The Airbnb Tech Blog


Responses (4)


Is this something you built from scratch or on top of an existing model? Would you mind sharing some references or the architecture used in this article? Thanks.

Love the use of the custom metric for evaluating an actual user’s experience, e.g. “how many photos would a user have to change?”. Really highlights real-world use case.

This is interesting, thanks for sharing! If I understand correctly, your image clustering problem would also qualify for contrastive learning approaches, which also use a Siamese network but use a triplet or InfoNCE loss instead of a cross-entropy loss. Have you looked into that?
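
For readers unfamiliar with the two losses named in the response above, here is a minimal sketch of each on plain Python lists. It is not the method used in the article. The function names, the squared Euclidean distance, the cosine similarity, and the `margin` and `temperature` defaults are all illustrative assumptions.

```python
import math


def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, d(a, p) - d(a, n) + margin) for one triplet.

    The margin value is an arbitrary choice for illustration.
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def info_nce_loss(query, positive, negatives, temperature=0.1):
    """InfoNCE for one query: negative log-softmax of the positive's score.

    Scores are cosine similarities divided by a temperature
    (the temperature value is an arbitrary choice for illustration).
    """
    logits = [cosine_similarity(query, positive) / temperature]
    logits += [cosine_similarity(query, n) / temperature for n in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]
```

Both losses are small when the anchor (or query) embedding is closer to its positive than to its negatives, and large when the ordering is reversed. Unlike a cross-entropy loss over fixed class labels, neither one needs a class label per image, only a notion of which pairs belong together.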