lyft2vec - Embeddings at Lyft

An illustrative city map of an unidentified city showing highlighted streets and some topography, but with no labels.

Intro

Graph learning methods can uncover insights into the relational structure underlying a dataset. They have many industry applications, including product and content recommender systems and network analysis.

In this post, we discuss how we use graph learning methods at Lyft to generate embeddings: compact vector representations of high-dimensional information. We will share rideshare insights uncovered by embeddings of riders, drivers, locations, and time. As the examples will show, embeddings trained on graphs can represent information and patterns that are hard to capture with traditional, straightforward features.

Lyft Data and Embeddings

At Lyft, we have semi-structured data capturing complex interactions between drivers, riders, locations, and time. We can construct graphs representing these interactions (e.g. a graph can be formed by connecting a rider with all the locations they have visited). From these graphs we can generate embeddings to succinctly express a rider’s or driver’s entire ride history. These embeddings allow us to efficiently summarize vast and varied information in a machine-friendly representation.
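
The graph construction described above can be sketched as follows. This is a minimal illustration, not Lyft's actual pipeline: the record layout, the IDs, the geohashes, and the function name build_rider_location_graph are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical ride records: (rider_id, Gh6 cell of a visited location).
# All IDs and geohashes here are made up for illustration.
rides = [
    ("rider_a", "9q8yyk"),
    ("rider_a", "9q8yym"),
    ("rider_b", "9q8yyk"),
    ("rider_a", "9q8yyk"),
]


def build_rider_location_graph(rides):
    """Build a weighted bipartite graph as node -> {neighbor: visit count}."""
    graph = defaultdict(lambda: defaultdict(int))
    for rider, location in rides:
        # Tag each node with its type so rider and location IDs cannot collide.
        rider_node = ("rider", rider)
        location_node = ("location", location)
        # Store the edge in both directions; the weight counts repeat visits.
        graph[rider_node][location_node] += 1
        graph[location_node][rider_node] += 1
    return graph


graph = build_rider_location_graph(rides)
```

Here the edge weight records how often a rider visited a location, so repeat visits are kept rather than collapsed. An embedding method would then be trained on a graph of this shape; which method is used is not specified at this point in the post.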

For example, there are over 9,000 Geohash-6 (Gh6) level locations around the San Francisco Bay Area. If we wanted to describe a driver’s ride history around the Bay Area without embeddings, we would need a histogram or vector of length over 9,000 to describe it precisely. The vector would contain the number of rides the driver has started in each Gh6, with a zero for every Gh6 where the driver has never started a ride.
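
To make the size and sparsity of that representation concrete, here is a small sketch of the count vector. The five-cell vocabulary and the pickup list are hypothetical stand-ins; a real Bay Area vocabulary would have over 9,000 entries.

```python
from collections import Counter

# Hypothetical vocabulary of Gh6 cells (a real one would have 9,000+ entries).
gh6_cells = ["9q8yyk", "9q8yym", "9q8yyt", "9q8yyw", "9q8yyx"]
cell_index = {cell: i for i, cell in enumerate(gh6_cells)}

# Hypothetical Gh6 cells where one driver started rides.
driver_pickups = ["9q8yyk", "9q8yyk", "9q8yyx"]


def ride_count_vector(pickups, cell_index):
    """Return one count per Gh6 cell, in vocabulary order."""
    vector = [0] * len(cell_index)
    for cell, count in Counter(pickups).items():
        vector[cell_index[cell]] = count
    return vector


vector = ride_count_vector(driver_pickups, cell_index)
# Share of cells this driver has never started a ride in.
zero_fraction = vector.count(0) / len(vector)
```

The vector length always equals the number of cells, regardless of how few of them a driver has actually visited, which is why most entries end up as zeros.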
