话题公司 › Grab

公司:Grab

Grab(前身为MyTeksi)是一间在东南亚地区提供服务的技术公司和交通网络公司,总部位于新加坡,由陈炳耀和陈慧玲于2012年在马来西亚雪兰莪州八打灵再也创立的移动应用程序。该应用连结乘客和司机,提供载客车辆租赁及即时共乘的分享型经济服务。乘客可以透过发送短信或是使用移动应用程序来预约这些载客的车辆,利用移动应用程序时还可以追踪车辆的位置。疫情期间兼开始经营外卖、送货、电子商务等等,成为全方面的生活平台。

Enabling near real-time data analytics on the data lake

In the domain of data processing, data analysts run their ad hoc queries on the data lake. The lake serves as an interface between our analytics and production environment, preventing downstream queries from impacting upstream data ingestion pipelines. To ensure efficient data processing in the data lake, choosing appropriate storage formats is crucial.

The journey of building a comprehensive attribution platform

The Grab superapp offers a comprehensive array of services from ride-hailing and food delivery to financial services. This creates multifaceted user journeys, traversing homepages, product pages, checkouts, and interactions with diverse content, including advertisements and promo codes.

Rethinking Stream Processing: Data Exploration

In this digital age, companies collect multitudes of data that enable the tracking of business metrics and performance. Over the years, data analytics tools for data storage and processing have evolved from the days of Excel sheets and macros to more advanced Map Reduce model tools like Spark, Hadoop, and Hive. This evolution has allowed companies, including Grab, to perform modern analytics on the data ingested into the Data Lake, empowering them to make better data-driven business decisions. This form of data will be referenced within this document as “Offline Data”.

With innovations in stream processing technology like Spark and Flink, there is now more interest in unlocking value from streaming data. This form of continuously-generated data in high volume will be referenced within this document as “Online Data”. In the context of Grab, the streaming data is usually materialised as Kafka topics (“Kafka Stream”) as the result of stream processing in its framework. This data is largely unexplored until they are eventually sunk into the Data Lake as Offline Data, part of the data journey (see Figure 1 below). This induces some data latency before the data can be used by data analysts to inform decisions.

Kafka on Kubernetes: Reloaded for fault tolerance

Coban - Grab’s real-time data streaming platform - has been operating Kafka on Kubernetes with Strimzi in production for about two years. In a previous article (Zero trust with Kafka), we explained how we leveraged Strimzi to enhance the security of our data streaming offering.

In this article, we are going to describe how we improved the fault tolerance of our initial design, to the point where we no longer need to intervene if a Kafka broker is unexpectedly terminated.

Sliding window rate limits in distributed systems

Grab使用Roaring位图来限制发送通信数量,避免信息过载和被用户视为垃圾邮件。他们将用户划分为不同群体,并根据用户与应用程序的互动确定每个群体的限制值。Roaring位图通过使用RLE容器来优化存储和性能,可以动态切换容器。他们选择了Redis作为数据存储,使用滑动日志速率限制算法来计算特定时间范围内的请求次数。他们使用Redis的SCRIPT LOAD命令来上传Lua脚本,并获取SHA1哈希值。然后使用EVALSHA命令调用Lua脚本来执行速率限制逻辑,并使用Redis的流水线功能进行批量处理。Redis的流水线功能将多个命令进行分组,并通过单个网络调用发送给相关节点,然后将速率限制结果返回给客户端。为了避免长时间运行的Lua脚本阻塞其他Redis命令,他们确保脚本在5毫秒内执行完毕。此外,脚本还接收当前时间作为参数,以考虑在节点副本上执行脚本时可能存在的时间差异。

An elegant platform

Grab的Coban团队开发了一个名为Coban的实时数据流平台,其中核心组件是Coban UI和Heimdall。Coban UI是一个前端Web界面,用户可以通过几次点击创建数据流资源,并与多个监控系统无缝集成,实时监控关键指标和健康状态。Heimdall是Coban UI的后端,提供API来管理数据流资源,包括创建、读取、更新和删除操作。Heimdall还负责集中和提供与这些资源相关的元数据,以供其他Grab系统使用。通过从各种上游系统和平台获取数据,并不断丰富和更新元数据,Heimdall可以为其他Grab平台提供全面准确的数据流资源信息。此外,Heimdall还将整个资源清单纳入Grab的库存平台,以及将Kafka流纳入其中。

Road localisation in GrabMaps

通过对地理哈希的优化,我们可以在处理地图时提高效率。同时,我们还需要监控地理哈希的内容,考虑到其中的道路密度,以实现计算操作的平衡性。此外,选择适当的资源也是优化时间和成本的关键。总之,通过优化地理哈希和平衡资源选择,我们可以实现最佳的性价比。

Graph modelling guidelines

Graph modelling is a highly effective technique for representing and analysing complex and interconnected data across various domains. By deciphering relationships between entities, graph modelling can reveal insights that might be otherwise difficult to identify using traditional data modelling approaches. In this article, we will explore what graph modelling is and guide you through a step-by-step process of implementing graph modelling to create a social network graph.

LLM-powered data classification for data entities at scale

At Grab, we deal with PetaByte-level data and manage countless data entities ranging from database tables to Kafka message schemas. Understanding the data inside is crucial for us, as it not only streamlines the data access management to safeguard the data of our users, drivers and merchant-partners, but also improves the data discovery process for data analysts and scientists to easily find what they need.

The Caspian team (Data Engineering team) collaborated closely with the Data Governance team on automating governance-related metadata generation. We started with Personal Identifiable Information (PII) detection and built an orchestration service using a third-party classification service. With the advent of the Large Language Model (LLM), new possibilities dawned for metadata generation and sensitive data identification at Grab. This prompted the inception of the project, which aimed to integrate LLM classification into our existing service. In this blog, we share insights into the transformation from what used to be a tedious and painstaking process to a highly efficient system, and how it has empowered the teams across the organisation.

Scaling marketing for merchants with targeted and intelligent promos

A promotional campaign is a marketing effort that aims to increase sales, customer engagement, or brand awareness for a product, service, or company. The target is to have more orders and sales by assigning promos to consumers within a given budget during the campaign period.

From our research, we found that merchants have specific goals for the promos they are willing to offer. They want a simple and cost-effective way to achieve their specific business goals by providing well-designed offers to target the correct customers. From Grab’s perspective, we want to help merchants set up and run campaigns efficiently, and help them achieve their specific business goals.

Stepping up marketing for advertisers: Scalable lookalike audience

The advertising industry is constantly evolving, driven by advancements in technology and changes in consumer behaviour. One of the key challenges in this industry is reaching the right audience, reaching people who are most likely to be interested in your product or service. This is where the concept of a lookalike audience comes into play. By identifying and targeting individuals who share similar characteristics with an existing customer base, businesses can significantly improve the effectiveness of their advertising campaigns.

Building hyperlocal GrabMaps

Southeast Asia (SEA) is a dynamic market, very different from other parts of the world. When travelling on the road, you may experience fast-changing road restrictions, new roads appearing overnight, and high traffic congestion. To address these challenges, GrabMaps has adapted to the SEA market by leveraging big data solutions. One of the solutions is the integration of hyperlocal data in GrabMaps.

Hyperlocal information is oriented around very small geographical communities and obtained from the local knowledge that our map team gathers. The map team is spread across SEA, enabling us to define clear specifications (e.g. legal speed limits), and validate that our solutions are viable.

Streamlining Grab's Segmentation Platform with faster creation and lower latency

Launched in 2019, Segmentation Platform has been Grab’s one-stop platform for user segmentation and audience creation across all business verticals. User segmentation is the process of dividing passengers, driver-partners, or merchant-partners (users) into sub-groups (segments) based on certain attributes. Segmentation Platform empowers Grab’s teams to create segments using attributes available within our data ecosystem and provides APIs for downstream teams to retrieve them.

Unsupervised graph anomaly detection - Catching new fraudulent behaviours

Earlier in this series, we covered the importance of graph networks, graph concepts, graph visualisation, and graph-based fraud detection methods. In this article, we will discuss how to automatically detect new types of fraudulent behaviour and swiftly take action on them.

One of the challenges in fraud detection is that fraudsters are incentivised to always adversarially innovate their way of conducting frauds, i.e., their modus operandi (MO in short). Machine learning models trained using historical data may not be able to pick up new MOs, as they are new patterns that are not available in existing training data. To enhance Grab’s existing security defences and protect our users from these new MOs, we needed a machine learning model that is able to detect them quickly without the need for any label supervision, i.e., an unsupervised learning model rather than the regular supervised learning model.

To address this, we developed an in-house machine learning model for detecting anomalous patterns in graphs, which has led to the discovery of new fraud MOs. Our focus was initially on GrabFood and GrabMart verticals, where we monitored the interactions between consumers and merchants. We modelled these interactions as a bipartite graph (a type of graph for modelling interactions between two groups) and then performed anomaly detection on the graph. Our in-house anomaly detection model was also presented at the International Joint Conference on Neural Networks (IJCNN) 2023, a premier academic conference in the area of neural networks, machine learning, and artificial intelligence.

In this blog, we discuss the model and its application within Grab. For avid audiences that want to read the details of our model, you can access it here. Note that even though we implemented our model for anomaly detection in GrabFood and GrabMart, the model is designed for general purposes and is applicable to interaction graphs between any two groups.

Go module proxy at Grab

At Grab, we rely heavily on a large Go monorepo for backend development, which offers benefits like code reusability and discoverability. However, as we continue to grow, managing a large monorepo brings about its own set of unique challenges.

As an example, using Go commands such as go get and go list can be incredibly slow when fetching Go modules residing in a large multi-module repository. This sluggishness takes a toll on developer productivity, burdens our Continuous Integration (CI) systems, and strains our Version Control System host (VCS), GitLab.

In this blog post, we look at how Athens, a Go module proxy, helps to improve the overall developer experience of engineers working with a large Go monorepo at Grab.

Zero traffic cost for Kafka consumers

Coban, Grab’s real-time data streaming platform team, has been building an ecosystem around Kafka, serving all Grab verticals. Along with stability and performance, one of our priorities is also cost efficiency.

In this article, we explain how the Coban team has substantially reduced Grab’s annual cost for data streaming by enabling Kafka consumers to fetch from the closest replica.

首页 - Wiki
Copyright © 2011-2024 iteam. Current version is 2.137.1. UTC+08:00, 2024-11-04 19:02
浙ICP备14020137号-1 $访客地图$