Palette Meta Store Journey

The Machine Learning (ML) team at Uber is consistently developing new and innovative components to strengthen our ML Platform (Michelangelo). 

In machine learning, features are the data used to make model calculations and predict an outcome. You can think of them as the input to the learning model or attributes in your data that are relevant to the predictive modeling problem.

When querying Uber’s data stores for feature data, it can be hard to:

  • Figure out good Uber-specific features
  • Build pipelines to generate features
  • Compute features in real time
  • Guarantee that data used at training is the same as the data used for scoring predictions
  • Monitor features

The Uber Michelangelo feature store, called Palette, is a database of Uber-specific features, curated and internally crowd-sourced, that are easy to use in machine learning projects. It is designed to address all of the challenges listed above. Pipelines for feature generation and feature dispersal are auto-generated. Palette supports various feature computation use cases, such as batch and near-real-time, and includes precomputed features related to cities, drivers, and riders, as well as custom features generated for the EATs, Fraud, and Comms teams. Subject to our normal data access restrictions, Uber users can use many of the pruned features maintained by other Uber teams, or create their own, and incorporate them directly into their machine learning models.
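
To make the training/scoring consistency point concrete, here is a minimal sketch of the general feature-store idea: one source of feature values, read through one lookup path for both training and scoring. This is not Palette's actual API. The class `ToyFeatureStore`, its methods, the entity IDs, and the feature names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical names throughout; this is NOT Palette's real interface.
FeatureKey = Tuple[str, str]  # (entity_id, feature_name)


@dataclass
class ToyFeatureStore:
    """In-memory stand-in for a feature store holding one copy of each value."""

    _values: Dict[FeatureKey, float] = field(default_factory=dict)

    def ingest(self, entity_id: str, features: Dict[str, float]) -> None:
        # Store (or overwrite) each feature value for the given entity.
        for name, value in features.items():
            self._values[(entity_id, name)] = value

    def get_vector(self, entity_id: str, feature_names: List[str]) -> List[float]:
        # Return values in the requested order; a missing feature raises KeyError.
        return [self._values[(entity_id, name)] for name in feature_names]


store = ToyFeatureStore()
store.ingest("driver_42", {"trips_last_7d": 31.0, "avg_rating": 4.8})
store.ingest("driver_7", {"trips_last_7d": 12.0, "avg_rating": 4.5})

# One shared feature list drives both training and scoring.
FEATURES = ["trips_last_7d", "avg_rating"]

training_rows = [store.get_vector(e, FEATURES) for e in ("driver_42", "driver_7")]
scoring_row = store.get_vector("driver_42", FEATURES)
```

Because both paths call the same `get_vector` with the same feature list, the row used to score `driver_42` matches the row that entity contributed to training. A production system would additionally have to keep separate batch and online stores in sync, which this sketch does not attempt.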

Figure 1: The Feature Generation graph shows the job computing features. The Feature Ingestion graph shows data ingestion into Hive and Cassandra. Feature Servin...
