Bridging SQL Dialects: Building a Unified Translator

In today’s data-driven world, there is a significant advantage to supporting a polyglot data environment in which different storage and processing technologies serve different needs. Modern data ecosystems must handle diverse data types and workloads, and at DoorDash we use a variety of tools and frameworks to meet our wide-ranging data requirements. This naturally introduces multiple query engines and numerous SQL dialects.

Human errors can arise during SQL dialect translations as we use data collaboratively across these tools. Even subtle differences in a translated query can significantly impact job results, performance, and cost.

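To make the risk concrete, here is a purely hypothetical illustration (the table and column names are invented, not DoorDash queries): Trino spells an approximate distinct count as approx_distinct while Spark SQL uses approx_count_distinct, and the two engines disagree on integer division, so a query copied almost verbatim can parse cleanly yet return different numbers.

```python
# Hypothetical queries illustrating two common traps in a hand translation.
# 1) Function names differ: Trino's approx_distinct() is Spark SQL's
#    approx_count_distinct().
# 2) Semantics differ: on integer operands, Trino's "/" truncates (5 / 2 = 2),
#    while Spark SQL's "/" returns a fraction (5 / 2 = 2.5).

TRINO_SQL = """
SELECT approx_distinct(user_id)  AS weekly_users,
       sum(order_cnt) / 7        AS avg_daily_orders  -- integer division in Trino
FROM   weekly_orders
"""

# A faithful Spark translation has to rename the function *and* reproduce the
# truncating division explicitly; copying the Trino query verbatim would not.
SPARK_SQL = """
SELECT approx_count_distinct(user_id)      AS weekly_users,
       CAST(sum(order_cnt) / 7 AS BIGINT)  AS avg_daily_orders
FROM   weekly_orders
"""
```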

Interoperability becomes essential to leverage data seamlessly across various platforms and technologies. SQL translators play a key role by enabling data applications to interact with multiple sources without major code changes, making them portable across different database backends. Transaxle, our internal SQL translation service built in collaboration with Databricks, helps us achieve this interoperability.

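Transaxle's internals aren't covered here, but the core idea of a SQL translator can be sketched with the open-source sqlglot library (used purely for illustration; it is not necessarily what Transaxle is built on): parse a query written in one dialect into a dialect-neutral syntax tree, then re-render it under another dialect's rules.

```python
# Minimal sketch of dialect translation using the open-source sqlglot library.
# The query and table names are hypothetical.
import sqlglot

trino_sql = "SELECT approx_distinct(user_id) AS active_users FROM orders"

# Parse as Trino, re-generate as Spark SQL. The transpiler handles renames
# such as approx_distinct() becoming Spark's approx_count_distinct().
spark_sql = sqlglot.transpile(trino_sql, read="trino", write="spark")[0]
print(spark_sql)
```

Putting this kind of transpilation behind a shared service means every team gets the same, consistently applied translation rules instead of maintaining its own ad hoc rewrites.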

We work with diverse data types and processing needs at DoorDash, from real-time analytics to large-scale machine learning. For example:

  • Business intelligence workloads use streaming technologies such as Apache Kafka and Flink alongside fast querying with Trino over data lakes built on modern table formats like Iceberg and Delta Lake.
  • Machine learning workloads rely on Spark SQL and MLlib to handle large, complex datasets with iterative, in-memory processing (a sketch of this dual-engine access pattern follows the list).
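
Below is a minimal sketch of what that dual-engine access pattern can look like; the hosts, catalog, schema, and table names are hypothetical, and the snippet is illustrative rather than DoorDash's actual setup. The same lake table is read through Trino for interactive BI and through Spark SQL for feature engineering, each in its own dialect.

```python
# Hypothetical endpoints and names: one table, two engines, two SQL dialects.
from trino.dbapi import connect  # Trino Python client
from pyspark.sql import SparkSession

# BI path: interactive querying over the data lake through Trino.
conn = connect(
    host="trino.internal.example.com",  # hypothetical host
    port=8080,
    user="bi_user",
    catalog="iceberg",
    schema="analytics",
)
cur = conn.cursor()
cur.execute(
    "SELECT approx_distinct(user_id) FROM orders WHERE order_date = current_date"
)
print(cur.fetchone())

# ML path: the same table read via Spark SQL for feature engineering.
spark = SparkSession.builder.appName("feature_job").getOrCreate()
features = spark.sql(
    "SELECT user_id, approx_count_distinct(order_id) AS orders_30d "
    "FROM iceberg.analytics.orders "
    "WHERE order_date >= date_sub(current_date(), 30) "
    "GROUP BY user_id"
)
features.show(5)
```

Even this small example already needs two spellings of the approximate-distinct aggregate, which is exactly the kind of gap a translator closes.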