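Company: Grab
Grab (formerly MyTeksi) is a technology and transportation network company serving Southeast Asia, headquartered in Singapore. It began as a mobile app founded in 2012 by Anthony Tan and Tan Hooi Ling in Petaling Jaya, Selangor, Malaysia. The app connects passengers with drivers, offering vehicle-for-hire and real-time ride-sharing services as part of the sharing economy. Passengers can book a vehicle by sending an SMS or through the mobile app, which also lets them track the vehicle's location. During the pandemic it also began operating food delivery, parcel delivery, and e-commerce, becoming an all-round lifestyle platform.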
No version left behind: Our epic journey of GitLab upgrades
Join us as we share our experience in developing and implementing a consistent upgrade routine. This process underscored the significance of adaptability, comprehensive preparation, efficient…
Ensuring data reliability and observability in risk systems
As the amount of data Grab handles grows, there is an increased need for quick detection of data anomalies (incompleteness or inaccuracy), while keeping the data secure. Read this to learn how the Risk…
Grab Experiment Decision Engine - a Unified Toolkit for Experimentation
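This article introduces an experimentation toolkit developed by Grab for running experiments and causal analysis on the Grab platform. The toolkit offers several correction techniques and can handle multiple treatment groups, unequal treatment sizes, and heterogeneous treatment effects. It is widely used within Grab and provides the GrabCausal Methodology Bank for sharing code and guidelines on causal methods. The article emphasises the importance of continually updating the toolkit and keeping up with the latest statistical testing methods. Overall, the toolkit supports automated experimentation and product decisions for Grab's data scientist community, furthering Grab's mission of economic empowerment in Southeast Asia.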
Iris - Turning observations into actionable insights for enhanced decision making
With cross-platform monitoring, a common problem is the difficulty in getting comprehensive and in-depth views on metrics, making it tough to see the big picture. Read to find out how the Data Tech…
Android App Size at Scale with Project Bonsai
With the size of our app growing to include more features, Grab recognised it as a potential hurdle for new users with small storage capacities or restricted Internet bandwidth. Read to find out more…
Enabling near real-time data analytics on the data lake
In the domain of data processing, data analysts run their ad hoc queries on the data lake. The lake serves as an interface between our analytics and production environment, preventing downstream queries from impacting upstream data ingestion pipelines. To ensure efficient data processing in the data lake, choosing appropriate storage formats is crucial.
The journey of building a comprehensive attribution platform
The Grab superapp offers a comprehensive array of services from ride-hailing and food delivery to financial services. This creates multifaceted user journeys, traversing homepages, product pages, checkouts, and interactions with diverse content, including advertisements and promo codes.
Rethinking Stream Processing: Data Exploration
In this digital age, companies collect vast amounts of data that enable the tracking of business metrics and performance. Over the years, data analytics tools for data storage and processing have evolved from the days of Excel sheets and macros to more advanced MapReduce-model tools like Spark, Hadoop, and Hive. This evolution has allowed companies, including Grab, to perform modern analytics on the data ingested into the Data Lake, empowering them to make better data-driven business decisions. This form of data will be referenced within this document as “Offline Data”.
With innovations in stream processing technology like Spark and Flink, there is now more interest in unlocking value from streaming data. This form of continuously generated, high-volume data will be referenced within this document as “Online Data”. In the context of Grab, the streaming data is usually materialised as Kafka topics (“Kafka Stream”) as the result of stream processing in its framework. This data is largely unexplored until it is eventually sunk into the Data Lake as Offline Data, part of the data journey (see Figure 1 below). This introduces some data latency before the data can be used by data analysts to inform decisions.
Kafka on Kubernetes: Reloaded for fault tolerance
Coban - Grab’s real-time data streaming platform - has been operating Kafka on Kubernetes with Strimzi in production for about two years. In a previous article (Zero trust with Kafka), we explained how we leveraged Strimzi to enhance the security of our data streaming offering.
In this article, we are going to describe how we improved the fault tolerance of our initial design, to the point where we no longer need to intervene if a Kafka broker is unexpectedly terminated.
Sliding window rate limits in distributed systems
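Grab uses Roaring bitmaps to limit the number of communications it sends, avoiding information overload and the risk of users treating the messages as spam. Users are divided into segments, and the limit for each segment is determined by how its users interact with the app. Roaring bitmaps optimise storage and performance by using RLE containers and can switch container types dynamically. The team chose Redis as the data store and uses the sliding log rate-limiting algorithm to count the requests made within a specific time frame. They use the Redis SCRIPT LOAD command to upload a Lua script and obtain its SHA1 hash, then call the script with the EVALSHA command to execute the rate-limiting logic, using Redis pipelining for batch processing. Pipelining groups multiple commands and sends them to the relevant node in a single network call, then returns the rate-limiting results to the client. To prevent long-running Lua scripts from blocking other Redis commands, they ensure the script finishes within 5 milliseconds. The script also receives the current time as an argument, to account for possible time differences when it is executed on node replicas.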
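As a rough illustration of the sliding log algorithm mentioned in this summary, here is a minimal in-memory sketch in Python. It is not Grab's implementation: it does not use Redis, Lua, or Roaring bitmaps, and the class name `SlidingLogRateLimiter` and its parameters are hypothetical. It keeps one idea from the summary, which is that the caller passes the current time in as an argument rather than the limiter reading a clock itself.

```python
from collections import defaultdict, deque


class SlidingLogRateLimiter:
    """Hypothetical in-memory stand-in for a Redis-backed sliding log limiter."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit                    # max events allowed per window
        self.window_seconds = window_seconds  # length of the sliding window
        self._logs = defaultdict(deque)       # user id -> timestamps, oldest first

    def allow(self, user_id: str, now: float) -> bool:
        """Return True and record the event if the user is under the limit.

        `now` is supplied by the caller so that every node evaluates the
        window against the same timestamp.
        """
        log = self._logs[user_id]
        cutoff = now - self.window_seconds
        # Drop timestamps that have slid out of the window.
        while log and log[0] <= cutoff:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True


limiter = SlidingLogRateLimiter(limit=2, window_seconds=60)
decisions = [limiter.allow("user-1", t) for t in (0, 10, 20, 61)]
```

In this sketch the third call is rejected because two events already sit inside the 60-second window, and the fourth is accepted because the event at time 0 has expired by then. A production version would also need to bound memory per user and expire idle logs.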
An elegant platform
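Grab's Coban team has built a real-time data streaming platform, also named Coban, whose core components are Coban UI and Heimdall. Coban UI is a front-end web interface where users can create data streaming resources in a few clicks; it integrates seamlessly with multiple monitoring systems to show key metrics and health status in real time. Heimdall is the backend of Coban UI and provides APIs to manage data streaming resources, including create, read, update, and delete operations. Heimdall is also responsible for centralising and serving the metadata associated with these resources for use by other Grab systems. By pulling data from various upstream systems and platforms, and continually enriching and updating the metadata, Heimdall can give other Grab platforms comprehensive and accurate information about data streaming resources. Heimdall also onboards the entire resource inventory, along with Kafka streams, onto Grab's inventory platform.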
Road localisation in GrabMaps
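Optimising geohashes makes map processing more efficient. We also need to monitor the contents of each geohash, taking its road density into account, so that the computation stays balanced. Choosing appropriate resources is another key to optimising time and cost. In short, by optimising geohashes and balancing resource selection, we can achieve the best cost-performance ratio.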
Graph modelling guidelines
Graph modelling is a highly effective technique for representing and analysing complex and interconnected data across various domains. By deciphering relationships between entities, graph modelling can reveal insights that might be otherwise difficult to identify using traditional data modelling approaches. In this article, we will explore what graph modelling is and guide you through a step-by-step process of implementing graph modelling to create a social network graph.
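To make the idea of a social network graph concrete, here is a minimal sketch in Python, assuming an undirected friendship graph stored as adjacency sets. The `SocialGraph` class, its method names, and the sample people are all hypothetical and are not taken from the article; a real deployment would more likely use a graph database.

```python
from collections import defaultdict


class SocialGraph:
    """Hypothetical undirected friendship graph stored as adjacency sets."""

    def __init__(self) -> None:
        self._friends = defaultdict(set)  # person -> set of direct friends

    def add_friendship(self, a: str, b: str) -> None:
        # Friendship is symmetric, so store the edge in both directions.
        self._friends[a].add(b)
        self._friends[b].add(a)

    def friends_of(self, person: str) -> set:
        # Return a copy so callers cannot mutate the graph by accident.
        return set(self._friends.get(person, set()))

    def mutual_friends(self, a: str, b: str) -> set:
        return self.friends_of(a) & self.friends_of(b)

    def suggestions(self, person: str) -> set:
        """Friends of friends who are not already direct friends."""
        direct = self.friends_of(person)
        candidates = set()
        for friend in direct:
            candidates |= self.friends_of(friend)
        return candidates - direct - {person}


graph = SocialGraph()
for a, b in [("alice", "bob"), ("bob", "carol"), ("alice", "dave"), ("dave", "carol")]:
    graph.add_friendship(a, b)
```

With this sample data, `graph.mutual_friends("alice", "carol")` yields bob and dave, and `graph.suggestions("alice")` yields carol. Relationship queries like these are awkward to express with join-heavy tabular models, which is the kind of insight the paragraph above attributes to graph modelling.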
LLM-powered data classification for data entities at scale
At Grab, we deal with petabyte-level data and manage countless data entities ranging from database tables to Kafka message schemas. Understanding the data inside is crucial for us, as it not only streamlines data access management to safeguard the data of our users, drivers and merchant-partners, but also improves the data discovery process so that data analysts and scientists can easily find what they need.
The Caspian team (Data Engineering team) collaborated closely with the Data Governance team on automating governance-related metadata generation. We started with Personally Identifiable Information (PII) detection and built an orchestration service using a third-party classification service. With the advent of the Large Language Model (LLM), new possibilities dawned for metadata generation and sensitive data identification at Grab. This prompted the inception of the project, which aimed to integrate LLM classification into our existing service. In this blog, we share insights into the transformation from what used to be a tedious and painstaking process to a highly efficient system, and how it has empowered teams across the organisation.
Scaling marketing for merchants with targeted and intelligent promos
A promotional campaign is a marketing effort that aims to increase sales, customer engagement, or brand awareness for a product, service, or company. The target is to have more orders and sales by assigning promos to consumers within a given budget during the campaign period.
From our research, we found that merchants have specific goals for the promos they are willing to offer. They want a simple and cost-effective way to achieve their specific business goals by providing well-designed offers to target the correct customers. From Grab’s perspective, we want to help merchants set up and run campaigns efficiently, and help them achieve their specific business goals.
Stepping up marketing for advertisers: Scalable lookalike audience
The advertising industry is constantly evolving, driven by advancements in technology and changes in consumer behaviour. One of the key challenges in this industry is reaching the right audience: the people who are most likely to be interested in your product or service. This is where the concept of a lookalike audience comes into play. By identifying and targeting individuals who share similar characteristics with an existing customer base, businesses can significantly improve the effectiveness of their advertising campaigns.