How Uber Accomplishes Job Counting at Scale

Uber operates on a massive scale, facilitating over 2.2 billion trips every quarter. Deriving even simple insights necessitates a scaled solution. In our case, we needed to count the number of jobs someone had participated in while on the Uber platform, for arbitrary time windows. This article focuses on the challenges faced and lessons learned as we integrated Apache Pinot™ into our solution.

Specifically, our solution needed to resolve:

  • Several permutations of job counts, broken down along the role, marketplace, and completeness axes
  • Point-in-time tenure at a given trip or a given timestamp (i.e., where does job X lie, chronologically, in person A’s job history?)
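
To make the two requirements concrete, here is a minimal in-memory sketch in Python. It is not the production design discussed in this article. The `Job` record, its field names, and the example role and marketplace values are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    # Hypothetical record; every field name here is an assumption.
    job_id: str
    timestamp: int      # epoch seconds at which the job took place
    role: str           # e.g. "driver" or "rider"
    marketplace: str    # e.g. "rides" or "eats"
    completed: bool     # completeness axis


def counts_by_axes(jobs):
    """Count jobs for every (role, marketplace, completed) combination."""
    return Counter((j.role, j.marketplace, j.completed) for j in jobs)


def tenure_at_timestamp(jobs, ts):
    """Number of jobs that occurred at or before `ts`."""
    times = sorted(j.timestamp for j in jobs)
    return bisect_right(times, ts)


def tenure_at_job(jobs, job_id):
    """1-based chronological position of `job_id` in the job history."""
    ordered = sorted(jobs, key=lambda j: j.timestamp)
    for position, job in enumerate(ordered, start=1):
        if job.job_id == job_id:
            return position
    raise KeyError(job_id)


history = [
    Job("a", 100, "driver", "rides", True),
    Job("b", 200, "driver", "eats", True),
    Job("c", 300, "driver", "rides", False),
    Job("d", 400, "rider", "rides", True),
]
```

With this sample history, `tenure_at_timestamp(history, 250)` counts the two jobs at timestamps 100 and 200, and `tenure_at_job(history, "c")` places job `c` third in the history. Sorting on every call is acceptable for a sketch, but it is exactly the kind of per-request work that stops scaling once a history holds tens of thousands of jobs.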

Our previous solution was simple: retrieve jobs for a given subject with a page size limit of 50, and paginate the result until there are no further jobs. In Uber’s early days, with no single account accruing comparatively much tenure, this worked well. However, as Uber ventured into new verticals and some accounts began to present tenure in the tens of thousands, it became clear that we needed a more robust solution.
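
The paginate-and-count approach can be sketched roughly as follows. This is an illustration under stated assumptions, not the original service code: `fetch_page` stands in for whatever call retrieved one page of jobs, and its offset-based signature is hypothetical. Only the page size of 50 comes from the description above.

```python
PAGE_SIZE = 50  # page size limit mentioned in the text


def count_jobs(fetch_page, subject_id, page_size=PAGE_SIZE):
    """Count a subject's jobs by walking every page of results.

    `fetch_page(subject_id, offset, limit)` is a hypothetical callable
    that returns a list of at most `limit` jobs starting at `offset`.
    """
    total = 0
    offset = 0
    while True:
        page = fetch_page(subject_id, offset, page_size)
        total += len(page)
        # A short (or empty) page means there are no further jobs.
        if len(page) < page_size:
            return total
        offset += page_size


def make_fake_fetch(num_jobs):
    """In-memory stand-in for the job store that records each request."""
    jobs = list(range(num_jobs))
    calls = []

    def fetch_page(subject_id, offset, limit):
        calls.append(offset)
        return jobs[offset:offset + limit]

    return fetch_page, calls
```

The number of round trips grows linearly with tenure: with this loop, an account with 20,000 jobs needs more than 400 sequential requests just to produce a single count, which illustrates why the approach stopped being viable for high-tenure accounts.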

A main product requirement was that this solution must be able to compute tenure lookback. By itself, this would have been tenable, but combined with our data retention policy, our downstream team deemed it unreasonable to accommodate.

The same team, in the interest of cost savings, determined that data older than 2 years would be sequestered into a higher-latency storage tier. However, a change of plans mid-project led them to drop online access to this data altogether.
