How to search by colour in Elasticsearch

With over 40,000 active designs, our client Patternbank are global leaders in the design and fashion industry. They were looking to improve their customers’ experience by returning more relevant designs when searching. For this, we knew that Elasticsearch would be the best tool for the job.

Filtering and searching text-based fields such as design names, descriptions and categories was relatively easy (the searchkick readme will point you in the right direction). However, converting Patternbank’s existing ‘search by colour’ functionality from MySQL to Elasticsearch proved more difficult.

Issues we faced

Before we tackled this task, whenever a design was uploaded to the system, the image was analysed and its top six colours were extracted. The RGB value of each colour was then stored in the database. The existing functionality allowed searching by related colours using maths and MySQL POWER functions. However, to allow filtering by colour, we needed to convert that operation to work with Elasticsearch. For example: “Find me all ‘Menswear’ designs within the category ‘Floral’ that have the colour #0000FF”.

Google helped us with a general idea of searching by colour in Elasticsearch. Although what we found was a great start, it was not quite what we were looking for. The search queries focused on matching only one dominant colour and this wasn’t useful for us as we have six.

The problem with RGB

There are many standards for measuring colour, such as the popular HEX and RGB, which are used all over the web. We store the colours in RGB, and the problem is that the RGB colour space is perceptually non-uniform. This means that a small numerical change can produce a clearly visible difference, while a large numerical change can produce very little. For example:

0,0,255 and 30,144,255 are both obviously blue, but the second colour has substantial red and green values even though neither red nor green is visible in it.
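To put rough numbers on this, here is a small sketch that measures how far apart colours are in RGB. It assumes plain Euclidean distance, which is the kind of calculation POWER functions are typically used for; the original MySQL query is not shown here, so `rgb_distance` and the violet comparison colour are illustrative assumptions only.

```ruby
# Hypothetical helper: Euclidean distance between two RGB triples.
# This is an assumption about the style of comparison, not the original query.
def rgb_distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

blue        = [0, 0, 255]
dodger_blue = [30, 144, 255]  # the second blue from the example above
violet      = [100, 0, 200]   # an extra colour chosen for comparison

puts rgb_distance(blue, dodger_blue).round(1) # => 147.1
puts rgb_distance(blue, violet).round(1)      # => 114.1
```

By this measure the violet sits closer to pure blue than the second blue does, which is the opposite of how most people would group them.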

So RGB isn’t going to work correctly; we need to use a different model: HSL (Hue, Saturation, Lightness). The hue ranges from 0 to 360, with 0 and 360 being the same colour. The saturation ranges from 0 to 100 and sets how intense the colour is. The lightness ranges from 0 to 100 and determines how bright or dark the colour appears.
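For reference, a minimal sketch of the conversion using the standard RGB-to-HSL formulas. The standalone `rgb_to_hsl` method here is a hypothetical stand-in written for illustration, not the converter used in the project, and rounding to whole numbers is an assumption.

```ruby
# Convert 0-255 RGB channels to HSL: hue in degrees (0-359 after rounding),
# saturation and lightness as percentages (0-100), all as integers.
def rgb_to_hsl(r, g, b)
  r, g, b = [r, g, b].map { |v| v / 255.0 }
  max = [r, g, b].max
  min = [r, g, b].min
  delta = max - min
  l = (max + min) / 2.0

  if delta.zero?
    # Greys have no hue or saturation.
    h = 0.0
    s = 0.0
  else
    s = delta / (1 - (2 * l - 1).abs)
    h = case max
        when r then ((g - b) / delta) % 6
        when g then (b - r) / delta + 2
        else        (r - g) / delta + 4
        end
    h *= 60
  end

  { h: h.round % 360, s: (s * 100).round, l: (l * 100).round }
end

p rgb_to_hsl(0, 0, 255)    # => {:h=>240, :s=>100, :l=>50} (Ruby 3.4+ prints {h: 240, s: 100, l: 50})
p rgb_to_hsl(30, 144, 255) # h: 210, s: 100, l: 56
```

The two blues from the example end up with hues of 240 and 210, only 30 degrees apart on a 360-degree wheel, which matches how they look.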

Indexing

After deciding that converting to HSL was the way to go, the first step was to index our colour data in Elasticsearch. Every design has six RGB values (for example R:211, G:138, B:23), each stored as its own row; these had to be converted to HSL and collected into an array to be indexed.

def colour_list
  colors.map { |c| ColorConverter.rgb_to_hsl(c.r, c.g, c.b) }
end

Elasticsearch supports arrays; however, because we are storing an array of objects, by default we cannot query each object independently of the other objects in the array. Each channel is matched against the whole array rather than against one colour at a time. This means that if we are looking for a colour with a high green value (R:100, G:255, B:10), any object in the array with a green value within ±10 can contribute to a match, so a design containing (R:0, G:255, B:255) could be returned. The solution is to map the colour array field (colour_list) as a nested type; with searchkick we can do so using custom advanced mapping.
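The effect can be imitated in plain Ruby. This is only a simulation of the matching behaviour, not Elasticsearch itself, and the colours and ranges are made-up values for illustration.

```ruby
# Two stored colours: a saturated green and a pale, almost-white red.
colours = [
  { h: 120, s: 100, l: 50 },
  { h: 0,   s: 10,  l: 90 }
]

# We are looking for a pale green: hue near 120 AND lightness near 90.
wanted = { h: 110..130, l: 85..95 }

# Without "nested", an array of objects behaves like one list of values per
# field, so the link between the h and l of the same colour is lost.
flattened = {
  h: colours.map { |c| c[:h] },
  l: colours.map { |c| c[:l] }
}
flat_match = wanted.all? do |field, range|
  flattened[field].any? { |v| range.cover?(v) }
end

# With "nested", every condition must hold within a single colour.
nested_match = colours.any? do |c|
  wanted.all? { |field, range| range.cover?(c[field]) }
end

puts flat_match   # => true  (hue from the first colour, lightness from the second)
puts nested_match # => false (no single colour is a pale green)
```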

class Design < ApplicationRecord
  searchkick mappings: {
    design: {
      properties: {
        colour_list: { type: "nested" }
      }
    }
  }
end

Searching

Now that we have indexed our data, the only thing left is to search it and get relevant results! Part 3 of Sandeep Chivukula’s post on building a photo search covers this well in JS (Angular).

Sandeep’s post gave us a great start, but it wasn’t going to work for us as it stood. We are using Elasticsearch 5, which has deprecated or removed a few of the queries used in his example. The main problem, however, is that his query only searches one flat array, whereas we have six colours per design. We indexed our array of objects using the nested datatype, and when searching we needed to maintain the independence of each object in the array. Because Elasticsearch stores nested objects as separate hidden documents, we needed to use the nested query to query them independently.

query: {
  function_score: {
    query: {
      "nested" => {
        "path" => "colour_list",
        "query" => {
          "bool" => {
            "must" => [
              { "range" => { "colour_list.h" => { gte: (term[:h] - term[:h] * 0.1).to_i, lte: (term[:h] + term[:h] * 0.1).to_i } } },
              { "range" => { "colour_list.s" => { gte: (term[:s] - term[:s] * 0.1).to_i, lte: (term[:s] + term[:s] * 0.1).to_i } } },
              { "range" => { "colour_list.l" => { gte: (term[:l] - term[:l] * 0.1).to_i, lte: (term[:l] + term[:l] * 0.1).to_i } } }
            ]
          }
        }
      }
    },
    score_mode: "sum",
    functions: [
      { exp: { "colour_list.h" => { origin: term[:h], offset: 1, scale: 2 } } },
      { exp: { "colour_list.s" => { origin: term[:s], offset: 2, scale: 4 } } },
      { exp: { "colour_list.l" => { origin: term[:l], offset: 2, scale: 4 } } }
    ]
  }
}
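To make the numbers in the query more concrete, here is a small worked sketch in plain Ruby. `tolerance_bounds` mirrors the ±10% arithmetic of the range clauses, and `exp_decay` follows the exponential decay formula given in the Elasticsearch function_score documentation, assuming the default decay of 0.5. Both helper names and the sample search term are hypothetical.

```ruby
# The ±10% window around a value, truncated to integers as in the range clauses.
def tolerance_bounds(value, tolerance = 0.1)
  {
    gte: (value - value * tolerance).to_i,
    lte: (value + value * tolerance).to_i
  }
end

# Exponential decay per the function_score documentation:
#   score = exp(lambda * max(0, |value - origin| - offset))
#   lambda = ln(decay) / scale
# decay: 0.5 is assumed here because the query above does not set it.
def exp_decay(value, origin:, offset:, scale:, decay: 0.5)
  distance = [0, (value - origin).abs - offset].max
  Math.exp(Math.log(decay) / scale * distance)
end

term = { h: 210, s: 100, l: 56 } # sample search colour (30,144,255 in HSL)

p tolerance_bounds(term[:h]) # gte: 189, lte: 231
p tolerance_bounds(term[:l]) # gte: 50, lte: 61

# Hue scoring with offset 1 and scale 2, as in the query:
puts exp_decay(211, origin: term[:h], offset: 1, scale: 2)          # => 1.0
puts exp_decay(213, origin: term[:h], offset: 1, scale: 2).round(3) # => 0.5
puts exp_decay(215, origin: term[:h], offset: 1, scale: 2).round(3) # => 0.25
```

So the range clauses decide which colours qualify at all, while the decay functions rank qualifying colours by how close they are to the search term. Because the window is proportional to the value, a hue of 0 yields a window of 0 to 0, and the range does not wrap around from 360 back to 0.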

Improvements

Most of the problems we encountered were due to the way we choose the colours from the image. A vital improvement is to create an algorithm that not only finds the most significant colours in an image but also keeps track of their ratios, making it easy to sort by the most relevant colour. If we achieve this, we will not only need to run a script to process every single image uploaded to the site, but will also need to change the way we index and search the colours using Elasticsearch.
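As a rough illustration of the idea only, the sketch below groups pixels into coarse RGB buckets and reports the share of the image each bucket covers. The `dominant_colours` method, the bucket size and the sample pixels are all hypothetical; this is not the algorithm used on Patternbank, and a real implementation would read pixels from the image file.

```ruby
# Hypothetical sketch: quantise each pixel to a coarse bucket, count the
# buckets, and return the most common ones with their share of the image.
# Note that each result is the bucket's lower corner, not an exact pixel colour.
def dominant_colours(pixels, top: 6, bucket: 32)
  counts = pixels
    .map { |rgb| rgb.map { |v| (v / bucket) * bucket } }
    .tally

  counts
    .sort_by { |_, count| -count }
    .first(top)
    .map { |rgb, count| { rgb: rgb, ratio: (count.to_f / pixels.size).round(2) } }
end

# Ten sample "pixels": six pure blue, three lighter blue, one orange.
pixels = [[0, 0, 255]] * 6 + [[30, 144, 255]] * 3 + [[211, 138, 23]]

p dominant_colours(pixels, top: 2)
# rgb: [0, 0, 224] with ratio 0.6, then rgb: [0, 128, 224] with ratio 0.3
```

Storing a ratio alongside each colour would let the index carry a weight per colour, which is the part that would require changing how the colours are indexed and searched.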

Have a read of our blog post ‘Creating the ultimate user experience for Patternbank’ to find out what we did to improve navigation and filtering, using the above Elasticsearch methods.

Resources

Elasticsearch Reference

Building a photo search in a weekend – Elasticsearch + Docker

Header image by Luca Upper on Unsplash

