USING AMAZON NEPTUNE TO BUILD FASHION KNOWLEDGE GRAPH
1. USING AMAZON NEPTUNE TO BUILD FASHION KNOWLEDGE GRAPH
AWS Finland September Meetup
Helsinki, September 2019
2. ZALANDO - EUROPE’S LARGEST ONLINE FASHION RETAILER
17 countries (+2 in 2019)
28+ million active customers
>300 million visits per month
~10 million orders per month
~5.4 billion € revenue 2018
15 500+ employees in Europe
130+ nationalities
Visit us: tech.zalando.com
3. ZALANDO HELSINKI TECH HUB
Our office is located in KAMPPI
BUILDING OUR ECOMMERCE PLATFORM
AWS, Microservices, Scala, Android and iOS
90 employees
27 Nationalities
12 Autonomous delivery teams working with modern technologies
4.
5.
6. FASHION KNOWLEDGE GRAPH
7. What is a knowledge graph?
Collection of elementary sentences: Subject, Predicate, Object
Subject: Entity (IRI)
Predicate: Named (IRI), Directed
Object: Entity (IRI) or Literal
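The subject/predicate/object structure above can be sketched as a small data model. This is only an illustration: the class names and the `http://example.org/fashion/` namespace are assumptions, not part of the talk.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    """An entity or a predicate, identified by an IRI."""
    value: str


@dataclass(frozen=True)
class Literal:
    """A plain value such as a string."""
    value: str


@dataclass(frozen=True)
class Triple:
    subject: IRI                 # always an entity
    predicate: IRI               # a named, directed edge
    obj: Union[IRI, Literal]     # an entity or a literal


# Hypothetical namespace, used only for illustration.
EX = "http://example.org/fashion/"

# A knowledge graph is then just a collection of such sentences.
graph = {
    Triple(IRI(EX + "Beyonce"), IRI(EX + "isDesignerOf"), IRI(EX + "IvyPark")),
    Triple(IRI(EX + "IvyPark"), IRI(EX + "label"), Literal("Ivy Park")),
}
```

Frozen dataclasses are used so that triples are hashable and an IRI never compares equal to a literal with the same text.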
8. Why a Knowledge Graph?
Problem: “Beyonce” was one of our most frequently failing search queries.
Solution: Record the link between Beyonce and Ivy Park in the Knowledge Graph.
The search system can then use this information.
Beyonce (Person) -- isDesignerOf --> IvyPark (Brand)
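A minimal sketch of how a search system might use that link, assuming a tiny in-memory list of triples. The `expand_query` helper and the `type` predicate are hypothetical; the real search integration is not described in the slides.

```python
# Hypothetical in-memory triples; the names mirror the slide.
TRIPLES = [
    ("Beyonce", "isDesignerOf", "IvyPark"),
    ("Beyonce", "type", "Person"),
    ("IvyPark", "type", "Brand"),
]


def expand_query(term: str) -> list[str]:
    """Return the brands designed by the entity whose name matches the search term."""
    wanted = term.strip().lower()
    return sorted(
        o for s, p, o in TRIPLES
        if p == "isDesignerOf" and s.lower() == wanted
    )


brands = expand_query("beyonce")  # the search can now fall back to these brands
```

A query that would otherwise return nothing can be rewritten into a brand search over the returned names.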
9.
10. Technical implementation
11. RDF - Resource Description Framework
RDF is used to model the knowledge graph
An RDF graph is a set of RDF triples
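To make "a set of RDF triples" concrete, here is a rough sketch that keeps triples in a Python set and writes them out as N-Triples lines. The `http://example.org/fashion/` namespace is an assumption, and the literal escaping is deliberately minimal.

```python
# Hypothetical namespace; rdfs:label is the standard RDF Schema label property.
EX = "http://example.org/fashion/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def iri(value: str) -> str:
    """Render an IRI in N-Triples syntax."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Render a plain string literal with minimal escaping."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(graph) -> str:
    """Serialize a set of (subject, predicate, object) terms, one triple per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in sorted(graph))


graph = {
    (iri(EX + "Beyonce"), iri(EX + "isDesignerOf"), iri(EX + "IvyPark")),
    (iri(EX + "IvyPark"), iri(RDFS_LABEL), literal("Ivy Park")),
}
# Adding the same triple again changes nothing, because the graph is a set.
graph.add((iri(EX + "Beyonce"), iri(EX + "isDesignerOf"), iri(EX + "IvyPark")))

print(to_ntriples(graph))
```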
12. SPARQL - SPARQL Protocol And RDF Query Language
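The query shown on this slide did not survive extraction. As a stand-in, the sketch below holds a small SELECT query and encodes it the way the SPARQL 1.1 Protocol expects for a query via GET (a `query` parameter in the URL). The endpoint URL and the `ex:` vocabulary are assumptions.

```python
from urllib.parse import urlencode

# Illustrative query: which brands is Beyonce the designer of?
QUERY = """
PREFIX ex: <http://example.org/fashion/>
SELECT ?brand WHERE {
  ex:Beyonce ex:isDesignerOf ?brand .
}
""".strip()


def query_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol 'query via GET' URL for the given endpoint."""
    return f"{endpoint}?{urlencode({'query': query})}"


# Hypothetical endpoint, for illustration only.
url = query_url("https://sparql.example.org/sparql", QUERY)
```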
13. SPARQL - Property path
14. SPARQL - Property path
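The property path examples from these two slides are not in the extracted text. As a rough illustration of what a path such as `ex:subCategoryOf+` computes, the sketch below follows one predicate for one or more hops. The category hierarchy and predicate name are made up for the example.

```python
from collections import defaultdict

# Hypothetical category hierarchy; only the first three triples use the path predicate.
TRIPLES = [
    ("Sneakers", "subCategoryOf", "Shoes"),
    ("Shoes", "subCategoryOf", "Footwear"),
    ("Footwear", "subCategoryOf", "Fashion"),
    ("Beyonce", "isDesignerOf", "IvyPark"),
]


def one_or_more(start, predicate, triples):
    """Nodes reachable from `start` via one or more `predicate` edges (like `predicate+`)."""
    edges = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            edges[s].add(o)

    reached, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for nxt in edges[node]:
            if nxt not in reached:   # also guards against cycles
                reached.add(nxt)
                frontier.append(nxt)
    return reached


ancestors = one_or_more("Sneakers", "subCategoryOf", TRIPLES)
```

The database has to do a similar traversal for every matching start node, which is one reason property paths can become costly on larger graphs.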
15. Why Graph
Graph DBs perform well for navigating highly connected data
For a social network containing 1,000,000 people, each with approximately 50 friends, the results strongly suggest that graph databases outperform RDBMS for highly connected data
At depth two (friends-of-friends), both the relational database and the graph database perform well enough, but beyond that RDBMS joins become expensive
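A back-of-the-envelope calculation hints at why depth matters. With about 50 friends per person, the number of candidate paths grows by a factor of 50 with every extra hop. The figure below is an upper bound that ignores overlapping friend circles, and is not a benchmark result.

```python
FRIENDS_PER_PERSON = 50  # figure quoted on the slide


def candidate_paths(depth: int) -> int:
    """Upper bound on friendship paths of the given length starting from one person."""
    return FRIENDS_PER_PERSON ** depth


for depth in range(2, 6):
    print(f"depth {depth}: up to {candidate_paths(depth):,} paths")
```

In a relational schema each extra hop is typically another self-join on the friendship table, so the intermediate results grow along the same curve.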
16. The Graph Database
17. Blazegraph
● Blazegraph is an open source (GPLv2) graph database.
● Its protocol is HTTP-based; no special libraries are needed.
● One of the most complete in terms of SPARQL / RDF support.
● Used by the Wikimedia Foundation for its Wikidata Query Service since 2015
, but...
● No replication or cluster support in the open source version.
● The commercial version has not been available since 2017...
… because the company that developed Blazegraph was acquired by Amazon
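Because the protocol is plain HTTP, a query can be prepared with nothing but the standard library. The sketch below builds (but does not send) a SPARQL 1.1 Protocol POST request. The endpoint URL is an assumption about a local test instance and depends on how Blazegraph is deployed.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed local endpoint; adjust to the actual deployment.
BLAZEGRAPH_ENDPOINT = "http://localhost:9999/blazegraph/sparql"


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Prepare a SPARQL query as a URL-encoded POST asking for JSON results."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


request = build_sparql_request(
    BLAZEGRAPH_ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 10"
)
# Sending it would be: urllib.request.urlopen(request).read()
```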
18. Amazon Neptune
Blazegraph appears to be the foundation of the Amazon Neptune database, with these additions:
★ Replication and the R/O endpoint to distribute load among replicas;
★ Redundancy via failover;
★ Cloud backups;
★ Technical support.
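One way a client could take advantage of the R/O endpoint is to pick the host per request. This is a sketch under assumptions: the host names are placeholders, and port 8182 with the `/sparql` path reflects Neptune's usual defaults rather than anything stated in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NeptuneEndpoints:
    writer: str  # cluster endpoint, accepts reads and writes
    reader: str  # R/O endpoint, spreads connections across replicas


def sparql_url(endpoints: NeptuneEndpoints, read_only: bool) -> str:
    """Send read-only traffic to the R/O endpoint and everything else to the writer."""
    host = endpoints.reader if read_only else endpoints.writer
    return f"https://{host}:8182/sparql"


# Placeholder host names, for illustration only.
ENDPOINTS = NeptuneEndpoints(
    writer="writer.example.internal",
    reader="reader.example.internal",
)

read_url = sparql_url(ENDPOINTS, read_only=True)
write_url = sparql_url(ENDPOINTS, read_only=False)
```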
19. Fashion Knowledge Graph Deployment
● Neptune endpoint supports no authentication;
● Let’s introduce a low-level API:
○ Uses Zalando OAuth2 for authentication and authorization;
○ Does basic SPARQL validation.
● High-level APIs may be deployed in different VPCs.
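The slides do not say what "basic SPARQL validation" covers. One plausible reading, sketched below, is to detect the query form and check it against the caller's OAuth2 scopes. The scope names are hypothetical, token verification is out of scope, and the keyword detection is a naive heuristic that ignores SPARQL comments.

```python
import re

READ_FORMS = {"SELECT", "ASK", "CONSTRUCT", "DESCRIBE"}
UPDATE_FORMS = {"INSERT", "DELETE", "LOAD", "CLEAR", "DROP",
                "CREATE", "COPY", "MOVE", "ADD", "WITH"}

# Hypothetical OAuth2 scope names.
READ_SCOPE = "knowledge-graph.read"
WRITE_SCOPE = "knowledge-graph.write"

# Leading PREFIX/BASE declarations that precede the actual query form.
PROLOGUE = re.compile(
    r"\s*((PREFIX\s+\S*\s*<[^>]*>|BASE\s*<[^>]*>)\s*)*", re.IGNORECASE
)


def first_keyword(sparql: str):
    """Return the first keyword after the prologue, upper-cased, or None."""
    body = sparql[PROLOGUE.match(sparql).end():]
    match = re.match(r"[A-Za-z]+", body)
    return match.group(0).upper() if match else None


def authorize(sparql: str, scopes: set) -> str:
    """Return the scope the request needs, or raise if it is invalid or not allowed."""
    keyword = first_keyword(sparql)
    if keyword in READ_FORMS:
        needed = READ_SCOPE
    elif keyword in UPDATE_FORMS:
        needed = WRITE_SCOPE
    else:
        raise ValueError("unrecognized SPARQL form")
    if needed not in scopes:
        raise PermissionError(f"missing scope: {needed}")
    return needed
```

A production service would more likely use a real SPARQL parser; the point here is only where such a check sits in front of the unauthenticated Neptune endpoint.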
20. Neptune is not an easy guy
● There may be bugs! A concurrency problem that we reported in mid-2018 was finally fixed in version 296 in May 2019;
● Restoration from backup is VERY SLOW;
● Backups are confined inside AWS - you can’t download them. For migration between AWS regions we instead used a data dump (via a SPARQL SELECT) and upload;
● Test environment is rather expensive (instance types start from r5.large);
● Performance may be really bad even with small graphs, if your queries are poorly written.
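The dump-and-upload step mentioned above might look roughly like this: run `SELECT ?s ?p ?o WHERE { ?s ?p ?o }`, then convert the SPARQL JSON results into N-Triples lines that can be loaded elsewhere. The variable names `s`, `p`, `o` and the sample data are assumptions, and this simple form does not preserve named graphs.

```python
import json


def term_to_nt(term: dict) -> str:
    """Render one SPARQL JSON result term in N-Triples syntax."""
    kind, value = term["type"], term["value"]
    if kind == "uri":
        return f"<{value}>"
    if kind == "bnode":
        return f"_:{value}"
    # Anything else is treated as a literal, with minimal escaping.
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    if "xml:lang" in term:
        return f'"{escaped}"@{term["xml:lang"]}'
    if "datatype" in term:
        return f'"{escaped}"^^<{term["datatype"]}>'
    return f'"{escaped}"'


def results_to_ntriples(results_json: str) -> list[str]:
    """Convert the bindings of a SELECT ?s ?p ?o result into N-Triples lines."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return [
        f'{term_to_nt(b["s"])} {term_to_nt(b["p"])} {term_to_nt(b["o"])} .'
        for b in bindings
    ]


# Made-up sample in the SPARQL 1.1 Query Results JSON format.
SAMPLE = json.dumps({
    "head": {"vars": ["s", "p", "o"]},
    "results": {"bindings": [{
        "s": {"type": "uri", "value": "http://example.org/fashion/IvyPark"},
        "p": {"type": "uri", "value": "http://www.w3.org/2000/01/rdf-schema#label"},
        "o": {"type": "literal", "value": "Ivy Park", "xml:lang": "en"},
    }]},
})

dump_lines = results_to_ntriples(SAMPLE)
```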
21. Performance
No magic, it requires work even if you use the right DB for your task
Issues with complex queries (especially property paths)
Denormalize the data
- Transactions to ensure consistency
- Named graphs can help
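One possible reading of these points: precompute shortcut triples that replace an expensive property path, keep them in their own named graph, and rebuild that graph in a single SPARQL Update request. The sketch below only renders such a request. The IRIs are hypothetical, and whether the store applies the whole request as one transaction should be verified for the database in use.

```python
# Hypothetical IRIs, for illustration only.
EX = "http://example.org/fashion/"
SHORTCUT_GRAPH = EX + "graph/shortcuts"

# Denormalized "shortcut" triples, e.g. each category linked directly to its root.
SHORTCUTS = {
    (EX + "Sneakers", EX + "inCategory", EX + "Fashion"),
    (EX + "Shoes", EX + "inCategory", EX + "Fashion"),
}


def rebuild_graph_update(graph_iri: str, triples) -> str:
    """Render one SPARQL Update request that drops and refills a named graph."""
    rows = "\n".join(f"    <{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))
    return (
        f"DROP SILENT GRAPH <{graph_iri}> ;\n"
        f"INSERT DATA {{\n  GRAPH <{graph_iri}> {{\n{rows}\n  }}\n}}"
    )


update = rebuild_graph_update(SHORTCUT_GRAPH, SHORTCUTS)
print(update)
```

Keeping the derived triples in a separate named graph means they can be dropped and regenerated without touching the source data.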
22. Conclusion
● Amazon Neptune is a good solution for an RDF/SPARQL-based knowledge graph;
● It can scale horizontally for read queries (by increasing the number of replicas);
● But it’s not magic, you need to work on your queries;
● You can still use Blazegraph as a compatible replacement for Neptune in tests and development;
● Think over your backup & restore (and data migration) strategy. Test that it works as you expect.
23. Feel free to reach us:
Matthieu Guillermin <matthieu.guillermin@zalando.fi>
Uri Savelchev <uri.savelchev@zalando.fi>
Find out more about our Culture, People & Jobs:
● ON SOCIAL MEDIA:
LinkedIn @Zalando SE
Facebook @ Inside Zalando
Instagram @insidezalando
Twitter @ZalandoTech
● CAREER WEBSITE: jobs.zalando.com
● CORPORATE WEBSITE: corporate.zalando.com