QueryGPT – Natural Language to SQL Using Generative AI
SQL is a vital tool used daily by engineers, operations managers, and data scientists at Uber to access and manipulate terabytes of data. Crafting these queries not only requires a solid understanding of SQL syntax, but also deep knowledge of how our internal data models represent business concepts. QueryGPT aims to bridge this gap, enabling users to generate SQL queries through natural language prompts, thereby significantly enhancing productivity.
QueryGPT uses large language models (LLMs), vector databases, and similarity search to generate complex queries from English questions that users provide as input.
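As a rough illustration of how similarity search can feed an LLM prompt, the sketch below ranks a few stored question/SQL pairs against a user question and assembles a few-shot prompt from the closest matches. Everything in it is an assumption made for illustration: the sample questions, the table and column names, the bag-of-words `embed` function (a stand-in for a real embedding model and vector database), and the prompt layout. It is not a description of QueryGPT's actual implementation, and the call to the LLM itself is left out.

```python
import math
import re
from collections import Counter

# Hypothetical sample store of (question, SQL) pairs.
# The tables and columns are invented for this sketch.
SAMPLES = [
    ("How many trips were completed yesterday in Seattle?",
     "SELECT COUNT(*) FROM trips WHERE city = 'Seattle' "
     "AND status = 'completed' AND trip_date = CURRENT_DATE - INTERVAL '1' DAY"),
    ("What was the average fare per city last week?",
     "SELECT city, AVG(fare) FROM trips "
     "WHERE trip_date >= CURRENT_DATE - INTERVAL '7' DAY GROUP BY city"),
    ("List the drivers who signed up this month",
     "SELECT driver_id FROM drivers "
     "WHERE signup_date >= DATE_TRUNC('month', CURRENT_DATE)"),
]


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question, k=2):
    """Return the k stored samples whose questions are most similar."""
    q = embed(question)
    ranked = sorted(SAMPLES, key=lambda s: cosine(q, embed(s[0])), reverse=True)
    return ranked[:k]


def build_prompt(question, k=2):
    """Assemble a few-shot prompt: retrieved examples, then the new question."""
    examples = "\n\n".join(
        f"Question: {q}\nSQL: {sql}" for q, sql in top_k(question, k)
    )
    return f"{examples}\n\nQuestion: {question}\nSQL:"


if __name__ == "__main__":
    print(build_prompt("How many trips were completed in Seattle today?"))
```

In a production setting the linear scan in `top_k` would typically be replaced by a nearest-neighbor lookup in a vector database, and the assembled prompt would be sent to an LLM that completes the final `SQL:` line.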
This article chronicles our development journey over the past year and where we are today with this vision.
Figure 1: Query Authoring Process.
At Uber, our data platform handles approximately 1.2 million interactive queries each month. The Operations organization, one of the largest user cohorts, contributes about 36% of these queries. Authoring a query generally takes a fair amount of time: the author first searches our data dictionary for relevant datasets and then writes the query in our editor. Given that each query can take around 10 minutes to author, QueryGPT, which can automate this process and generate reliable queries in about 3 minutes, represents a major productivity gain.
If we make a conservative estimate that each query takes about 10 minutes to author, QueryGPT can automate this process and provide sufficiently reliable queries in about 3 minutes. This would result in a major productivity gain for Uber.
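The figures quoted above can be combined into a back-of-the-envelope estimate. The sketch below uses only the numbers stated in the text (1.2 million queries per month, a 36% Operations share, 10 minutes versus 3 minutes per query). The `hours_saved` helper and its argument are hypothetical: the text does not say how many queries would actually be authored with QueryGPT, so any total it returns is an illustrative bound under that assumption, not a reported result.

```python
# Figures taken from the text.
MONTHLY_QUERIES = 1_200_000   # interactive queries per month
OPERATIONS_SHARE = 0.36       # share contributed by Operations
MANUAL_MINUTES = 10           # estimated time to author a query by hand
QUERYGPT_MINUTES = 3          # estimated time with QueryGPT

# Per-query saving and relative reduction in authoring time.
minutes_saved_per_query = MANUAL_MINUTES - QUERYGPT_MINUTES    # 7 minutes
relative_reduction = minutes_saved_per_query / MANUAL_MINUTES  # 0.7, i.e. 70%

# Monthly queries attributable to Operations.
operations_queries = round(MONTHLY_QUERIES * OPERATIONS_SHARE)  # 432000


def hours_saved(queries_authored_with_tool):
    """Hours saved per month for a given (assumed) number of assisted queries."""
    return queries_authored_with_tool * minutes_saved_per_query / 60


if __name__ == "__main__":
    print(f"Saved per query: {minutes_saved_per_query} min ({relative_reduction:.0%})")
    print(f"Operations queries per month: {operations_queries}")
    # Hypothetical scenario: every Operations query is authored with the tool.
    print(f"Upper-bound hours saved (Operations only): {hours_saved(operations_queries):,.0f}")
```

The upper bound assumes full adoption and that every generated query is usable as is, so realized savings would be lower.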