知识图谱多步推理
如果无法正常显示,请先停止浏览器的去广告插件。
1. Stanford
知识图谱多步推理
任泓宇 斯坦福大学四年级博士研究生
2. 自我介绍
科研兴趣:图机器学习,逻辑推断,
常识推理,表征学习
任泓宇
斯坦福大学计算机系四年级博士生
导师:Jure Leskovec教授
Hongyu Ren, Stanford University
2
3. Knowledge Graphs
§ Knowledge Graphs are heterogenous graphs
§ Multiple types of entities and relations exist
§ Facts are represented as triples ℎ, ?, ?
§ (‘Mona Lisa’, ‘created_by’, ‘Leonardo da Vinci’)
§ (‘Paris’, ‘is_a’, ‘City’)
§ …
§ Examples:
Hongyu Ren, Stanford University
3
4. Traditional Tasks
Knowledge Graph Completion
Predict the missing head or tail for a given
triple (ℎ, ?, ?)
Hongyu
Ren, Stanford
Jure
Leskovec,
Stanford University
University
4
5. Logical Reasoning
Goal: Complex logical reasoning on KG
§ For example:
§ Where did Canadian Turing Award winners
graduate?
§ Who are current presidents of European
countries which never held a (soccer) world
cup?
§ Predict drugs that might target proteins that
are associated with SARS-CoV2?
Hongyu Ren, Stanford University
5
6. Examples:
Applications in biomedicine:
Bio KG
Hongyu
Ren, Stanford
Jure
Leskovec,
Stanford University
University
6
7. Examples:
Applications in recommender systems:
Query formula
Query plan
(a DAG)
Example subgraphs
that satisfy the query
Hongyu Ren, Stanford University
7
8. Traditional Approach
§ Translate the question
into a structured
query
§ Execute the query on
the knowledge graph
§ Match grounded
entities
§ “Traverse” the
knowledge graph
along the relations
Hongyu Ren, Stanford University
8
9. Our Task: Complex Link Prediction
Given arbitrary logical query, directly predict
the answer entities:
“Where did all Canadian citizens with Turing Award graduate?”
Logical
query:
? = ? ? . ∃ ? ∶ ???(???????????, ?) ∧ ??????? ??????, ?
∧ ???????? ?, ? ?
Pearl
Win
Query plan
Turing
Award
Win
Hinton
Turing
Award
NYU
Graduate
Bengio
Intersection
Edinburgh
Cambridge
Canada
Citizen
Graduate
Intersection
Bieber
Canada
Citizen
Jure
Leskovec,
Stanford University
University
Hongyu
Ren, Stanford
Trudeau
McGill
Answer entities
9
10. Challenges
KGs are
incomplete
and noisy
Traversing them
requires time
exponential in
the query depth
Completing the KG
creates additional
links, which further
slows down query
answering
§ How to capture
uncertainty
§ How to impute
missing relations
§ How to efficiently
answer queries
on large graphs
Hongyu Ren, Stanford University
10
11. Our General Idea
Logical
query
Reason in the
embedding space
Knowledge graph
Embedding space
§ Map queries into embedding space.
§ Learn to reason in that space
[Embedding Logical Queries on Knowledge Graphs. Hamilton, et al., NeurIPS 2018]
Hongyu
Ren, Stanford
Jure
Leskovec,
Stanford University
University
11
12. Our Framework
Goal: Answer logical
queries
E.g.: “Predict drugs C
likely target proteins
X associated with
diseases d 1 and d 2 ”
Idea: Logical operators
become spatial operators
Hongyu Ren, Stanford University
12
13. Reasoning over KG
§ Query2box: box embeddings for
EPFO queries (with ∃, ∧, ∨)
Hongyu Ren, Stanford University
13
14. Our Idea: Query2Box
Idea:
§ 1) Embed entities/relations of the KG
§ 2) For every logical operator learn a spatial
operator
So that:
§ 1) Take an arbitrary logical query. Decompose
it into a set of logical operators (∃,∧,∨)
§ 2) Apply a sequence of spatial operators to
embed the query
§ 3) Answers to the query are entities close to
the embedding of the query
Hongyu Ren, Stanford University
14
15. Our Idea: Query2Box
Idea:
§ 1) Embed nodes of the graph
§ 2) For every logical operator learn a spatial
operator
Key insight:
So that:
Represent query as a box.
§ 1) Take an arbitrary logical query. Decompose
Operations
(intersection
etc.)
it into
a set of logical
operators (∃,∧,∨)
are well
defined
over boxes.
§ 2) Apply
a sequence
of spatial
operators to
embed the query
§ 3) Answers to the query are entities close to
the embedding of the query
Hongyu Ren, Stanford University
15
16. Embedding Queries
Query2Box embedding:
Embed queries with hyper-rectangles
(boxes): ? = (??? ? , ???(?)).
Cambridge
???(?)
Edinburgh
McGill
Stanford
Embedding Space
[Probabilistic Embedding of Knowledge Graphs with Box Lattice Measures. Vilnis, et al., ACL 2018]
Hongyu Ren, Stanford University
16
17. Embedding Queries
Query Plan
Turing
Award
Canada
Projection Intersection
Projection Projection
Intersection
§ Geometric Projection Operator
§ Geometric Intersection Operator
Hongyu Ren, Stanford University
17
18. Projection Operator
Geometric Projection Operator ?
§ ? : Box × Relation → Box
?′
?
?
Hongyu Ren, Stanford University
18
19. Projection Operator: Example
Pearl
Win
Turing
Award
Hinton
Win
Pearl
Turing
Award
Bengio
Hinton
Bengio
Bengio
Hinton
Edinburgh
McGill
Graduate
Edinburgh
Cambridge
McGill
Cambridge
Graduate
Bengio
Hongyu Ren, Stanford University
Hinton
19
20. Intersection Operator
Geometric Intersection Operator ?
§ ℐ : Box × ⋯× Box → Box
§ The new center is a weighted average
§ The new offset shrinks
Hongyu Ren, Stanford University
20
21. Intersection Operator: Example
Pearl
Win
Hinton
Turing
Award
Turing
Award
Win
Pearl
Bengio
Bengio
Bieber
Canada
Citizen
Canadian
Trudeau
Citizen
Hinton
Bieber
Trudeau
Hongyu Ren, Stanford University
21
22. Example
“Where did Canadian citizens with Turing Award graduate?”
Hongyu Ren, Stanford University
22
23. Example
“Where did Canadian citizens with Turing Award graduate?”
Query Plan
Embedding Process
Turing
Award
McGill
Pearl
Bengio
Canada
Cambridge
Edinburgh
Hinton
Bieber
Trudeau
Hongyu Ren, Stanford University
23
24. Example
“Where did Canadian citizens with Turing Award graduate?”
Query Plan
Turing
Award
Canada
Embedding Process
Turing
Award
Projection
Win
McGill
Win
Pearl
Projection
Citizen
Bengio
Canada
Each point corresponds to a set of entities
Hongyu Ren, Stanford University
Citizen
Cambridge
Edinburgh
Hinton
Bieber
Trudeau
24
25. Example
“Where did Canadian citizens with Turing Award graduate?”
Query Plan
Turing
Award
Canada
Embedding Process
Turing
Award
Projection
Win
Projection
Citizen
Intersection
McGill
Win
Pearl
Bengio
Intersection
Canada
Each point corresponds to a set of entities
Hongyu Ren, Stanford University
Citizen
Cambridge
Edinburgh
Hinton
Bieber
Trudeau
25
26. Example
“Where did Canadian citizens with Turing Award graduate?”
Query Plan
Turing
Award
Canada
Embedding Process
Turing
Award
Projection
Win Intersection
Projection
Citizen Projection
Intersection Graduate
McGill
Win
Pearl
Bengio
Canada
Each point corresponds to a set of entities
Hongyu Ren, Stanford University
Citizen
Cambridge
Edinburgh
Graduate
Hinton
Bieber
Trudeau
26
27. How to Handle Disjunction
So far we can handle Conjunctive queries
Can we learn a geometric disjunction
operator?
§ Theorem (paraphrased): For a KG with
? nodes, we need embedding
dimension of ? to handle disjunction
Hongyu Ren, Stanford University
27
28. Disjunctive Normal Form
§ Any query with AND and OR can be
transformed into equivalent Disjunctive Normal
Form (disjunction of conjunctive queries).
§ ?∨? ∧? = ?∧? ∨ ?∧?
Converted Query Plan
Original Query Plan
? !
? !
Projection
? "
Projection
Projection Projection
Union
Projection
Union
? #
Projection
Intersection
? #
? "
Projection
Projection Projection
Intersection
Union
Intersection
Intersection
Union
Intersection
Hongyu Ren, Stanford University
? #
Projection
Intersection 28
29. Disjunctions: Solution
Given an arbitrary AND-OR query
§ 1) Transform it into an DNF
§ 2) Answer each conjunctive query
§ 3) Overall answer is the union of
conjunctive query answers
Hongyu Ren, Stanford University
29
30. Benefits of Query2Box
Scalability and efficiency:
§ Any query can be reduced to a couple of
matrix operations and a single k-nearest
neighbor search
Generality:
§ We can answer any query (even those we
have never seen before)
Robustness to noise:
§ Graph can contain missing and noisy
relationships
Hongyu Ren, Stanford University
30
31. Self-Supervised Training
Training examples: A batch of queries
on the graph
For each query,
§ One Positive: Path with a known answer
§ ? (~200) Negatives: Non-answer nodes of
the correct answer type
§ Contrastive Objective: Minimize the
distance between embedding of query and
positive answer and maximize for negatives.
Hongyu Ren, Stanford University
31
32. Limitations
§ Query2box (Q2B) can only handle a
subset of first-order logic queries
with only ∃, ∨, ∧.
§ How to handle negation ¬?
What are drugs that do not cause Short of Breath and treat diseases
associated with protein ESR2?
? = ? ? . ∃ ? ∶ ?????(???2, ?) ∧ ????????? ?, ? ? ∧ ¬ ???????? ?ℎ??????????ℎ, ? ?
ESR2
Intersection
Assoc TreatedBy
Short of
Breath
CausedBy Negation
Hongyu Ren, Stanford University
Intersection
32
33. Embedding Queries with Negation
§ Why modeling negation is interesting?
§ Reasoning in real-world requires negation
§ Can handle full set of FOL queries
§ Provides an alternative way to model disjunction
De Morgan’s laws: ? ∨ ? = ¬ ¬? ∧ ¬?
§ Can Q2B handle negation?
No, the complement of a box is not a box
?
Embedding Space
Hongyu Ren, Stanford University
33
34. Our Idea: BetaE
Idea:
§ Embed entities and queries as Beta distributions
§ Design probabilistic logical operators for ∃, ∧, ¬.
So that:
§ Logical operators are closed (especially for ¬)
§ Can naturally represent ∨ with ∧ and ¬:
§ P ∨ Q = ¬ ((¬P) ∧ (¬Q))
§ Handle the uncertainty of queries: Entropy of Beta
embedding ↔ number of answers
§ Embed any FOL query and find answers by
measuring the “distance” between query
embedding and an entity in the embedding space
Hongyu Ren, Stanford University
34
35. BetaE Example
Hongyu Ren, Stanford University
35
36. Experimental Setup
Questions:
§ Does our method generalize to new
unseen queries?
§ Does our method generalize to new
query structures?
§ Can method model the uncertainty?
Hongyu Ren, Stanford University
36
37. Experimental Setup
Training on an incomplete graph:
Test queries are not answerable in
the training graph
§ Every test query has at least an edge missing
§ Method has to (implicitly) impute missing
edges
§ Note: Query template matching / Traversing
would have accuracy of random guessing
Jure Leskovec (@jure), Stanford University
37
38. KG and Query Statistics
§ Datasets
FB15K, FB15K-237, NELL995
NELL: Never-Ending Language Learning
§ Queries:
Training + Evaluation Queries
Further Evaluation Queries
u
1p
n
2in
2p
n
3in
3p
n
inp
2i
n
pni
3i
n
pin
Hongyu Ren, Stanford University
ip
pi
u
2u
u
u
up
38
39. AND-OR Queries: Results
Training Queries
Unseen Queries
u u
u u
H@1 results of BetaE, Q2B, GQE on FB15k, FB15k-237, NELL995
Observations:
§ GQE maps query into point vectors instead of box/beta
§ On “training” queries: +18.4% H@1
§ On new conjunctive query structures: +13.5% H@1
Hongyu Ren, Stanford University
39
40. Negation Results
Queries with negation
n
n
n
n
n
§ BetaE is the first method to effectively
handle queries with negation!
Hongyu Ren, Stanford University
40
41. Embedding for KG Reasoning
§ Box/Beta embeddings for answering
logical queries on knowledge graphs
§ Handle any FOL query with ∃, ∨, ∧, ¬
§ Generalize well to unseen, extrapolated
queries
Hongyu
Ren, Stanford
Jure
Leskovec,
Stanford University
University
41
42. Embedding for KG Reasoning
§ Box/Beta embeddings for answering
logical queries on knowledge graphs
§ Handle any FOL query with ∃, ∨, ∧, ¬
§ Generalize well to unseen, extrapolated
queries
§ How to scale up these algorithms on
extremely large graphs?
Hongyu
Ren, Stanford
Jure
Leskovec,
Stanford University
University
42
43. Reasoning over KG
§ Query2box: box embeddings for
EPFO queries
§ BetaE: beta embeddings for FOL
queries
§ SMORE: scale Q2B/BetaE to
extremely large graphs
Hongyu Ren, Stanford University
43
44. Scalable Reasoning KG embedding
Scalable Multi-hOp REasoning framework (SMORE)
§ Asynchronous Design with Optimized Pipeline
§ Multi-thread bidirectional sampler with pre-fetching
mechanism
§ Sparse embedding R/W: CPUóPinned Memory
óGPU (including 1 st /2 nd order moments for optimizer)
Hongyu Ren, Stanford University
44
45. Large-Scale Benchmarks
§ We propose three new large KGs to
benchmark KG reasoning algorithms.
(1365x larger)
# Entities
Ours
# Edges
1E+09
100000000
10000000
Ours
100000000
Existing
1000000
10000000
Existing
100000
1000000
10000
100000
1000
FB15k FB15k-237 NELL995 FB15k FB15k-237 NELL995
FB400k OGB-Wikikg Freebase FB400k OGB-Wikikg Freebase
Hongyu Ren, Stanford University
45
46. Small KG Performance
Compared with existing multi-hop implementation
§ Small graphs:
§ Training speed: 2.2x
§ GPU memory: -30.6%
Hongyu Ren, Stanford University
46
47. Large KG Performance
§
§
§
SMORE supports 6
different methods
Prior implementation
runs out of GPU memory
and time limit on the
three large KGs
SMORE enables (almost)
graph-size agnostic
speed and GPU memory
usage
Iterations / Sec
40
35
# Entities
100000000
10000000
1000000
30
25
20
GPU Mem (MB)
2100
1600
100000
1100
600
100
Hongyu Ren, Stanford University
47
48. Link Prediction
Runtime of ComplEx on Freebase
§ SMORE also supports link prediction KG
embeddings: TransE, DistMult, RotatE, ComplEx.
§ SMORE achieves comparable/superior
performance with existing systems
Hongyu Ren, Stanford University
48
49. Multi-GPU Speedup
§ SMORE achieves almost linear speedup
across models and datasets
§ https://github.com/google-research/smore
Hongyu Ren, Stanford University
49
50. Conclusion
Logical
query
Reason in the
embedding space
Knowledge graph
Embedding space
§ Reasoning in the embedding space
§ Box/Beta Embeddings for answering complex queries
on KGs
§ SMORE: Large-scale system for training on massive
knowledge graphs
§ LEGO: apply embedding models to answer natural
language questions.
Hongyu Ren, Stanford University
50
51. References
§
§
§
§
§
Query2box: Reasoning Over Knowledge Graphs In Vector
Space Using Box Embeddings. H. Ren, W. Hu, J.
Leskovec. ICLR, 2020.
Scaling up Logical Query Embeddings on Knowledge
Graphs. H. Ren, H. Dai, B. Dai, X. Chen, D. Zhou,
J. Leskovec, D. Schuurmans. ICML SSL workshop 2021.
Beta Embeddings for Multi-Hop Logical Reasoning in
Knowledge Graphs. H. Ren, J. Leskovec. NeurIPS 2020.
LEGO: Latent Execution-Guided Reasoning for Multi-Hop
Question Answering on Knowledge Graphs. H. Ren, H. Dai,
B. Dai, X. Chen, M. Yasunaga, H. Sun, D. Schuurmans,
J. Leskovec, D. Zhou. ICML 2021.
Code:
§ https://github.com/snap-stanford/KGReasoning
§ https://github.com/google-research/smore
Hongyu Ren, Stanford University
51
52. Collaborators
Hongyu Ren, Stanford University
52