Requirement Adherence: Using LLMs to Improve Data Labeling Quality

The demand for labeled datasets continues to grow exponentially as machine learning and AI models are deployed across industries like autonomous vehicles, healthcare, retail, and finance. Uber AI Solutions provides industry-leading data labeling solutions for enterprise customers and plays a crucial role in enabling organizations to annotate data efficiently.  

We’ve developed several internal technologies to ensure that our clients receive high-quality data. This blog post highlights one of them: Requirement Adherence, our in-tool quality-checking framework, which detects text labeling errors before submission.

Labeling workflows typically rely on post-labeling checks or inter-annotator agreement to ensure the quality of the labeled data. These checks are effective, but they catch problems only after submission, so mislabeled and incomplete data must be sent back to experts for rework. This takes additional time, increases costs, and creates a poor experience for our enterprise clients.

A more effective approach is to identify quality issues within our in-house labeling tool, uLabel, during the labeling process. However, the diverse nature of data labeling requests from clients makes creating a custom solution for each instance unscalable. To address this, we developed a scalable system that uses an SOP (Standard Operating Procedure) document. This document, which includes all client requirements along with other information, is either provided by the client or compiled collaboratively by an Uber AI Solutions Program Manager and the client.
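
The idea of checking a label against SOP-derived requirements before it is submitted can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how uLabel or Requirement Adherence is actually implemented: the `Requirement` shape, the `find_violations` and `can_submit` helpers, and the two example rules are all hypothetical. The `check` callable stands in for whatever validation a real system would run for each requirement.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Requirement:
    """One atomic rule taken from an SOP document (hypothetical shape)."""
    rule_id: str
    description: str
    check: Callable[[str], bool]  # returns True when the label text complies


def find_violations(label_text: str, requirements: List[Requirement]) -> List[str]:
    """Return the description of every requirement the label text fails."""
    return [r.description for r in requirements if not r.check(label_text)]


def can_submit(label_text: str, requirements: List[Requirement]) -> bool:
    """Allow submission only when no requirement is violated."""
    return not find_violations(label_text, requirements)


# Illustrative rules only; real requirements would come from the client's SOP.
EXAMPLE_REQUIREMENTS = [
    Requirement("non_empty", "Label must not be empty",
                lambda text: bool(text.strip())),
    Requirement("max_len", "Label must be at most 50 characters",
                lambda text: len(text) <= 50),
]

if __name__ == "__main__":
    draft = "   "
    for problem in find_violations(draft, EXAMPLE_REQUIREMENTS):
        print("Fix before submitting:", problem)
```

Keeping each requirement as a separate, independently checkable rule means a new client SOP changes only the list of requirements, not the checking code, which is the scalability property the paragraph above is after.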

We use LLMs to extract the exact requirements from this document ...
