Apache Kafka is an open-source distributed event streaming platform used at PayPal for streaming data pipelines, integration, and ingestion. It supports our most mission-critical applications and ingests trillions of messages per day, making it one of the most reliable platforms for handling the enormous volumes of data we process every day.
To handle the tremendous growth of PayPal’s streaming data since its introduction, Kafka has needed to scale seamlessly while ensuring high availability, fault tolerance, and optimal performance. In this blog post, we will provide a high-level overview of Kafka, discuss the steps we took to achieve high performance at scale while managing operational overhead, and share our key learnings and takeaways.
If you have been working with web applications, and especially progressive web applications, you have most likely heard the term Service Worker. If you haven’t, I’m going to teach you the basics of Service Workers so you can leverage their power in your own web applications.
JunoDB is a distributed key-value store that plays a critical role in powering PayPal’s diverse range of applications. Virtually every core back-end service at PayPal relies on JunoDB, from login to risk to final transaction processing. With JunoDB, applications can efficiently store and cache data for fast access and load reduction on relational databases and other services. However, JunoDB is not just another NoSQL solution. It was built specifically to address the unique needs of PayPal, delivering security, consistency, and high availability with low latency, all while scaling to handle hundreds of thousands of connections. While other NoSQL solutions may perform well in certain use cases, JunoDB is unmatched when it comes to meeting PayPal’s extreme scale, security, and availability needs. From the ground up, JunoDB was designed to be cost-effective, ensuring that PayPal can maintain its high standards of quality and operational excellence while keeping costs manageable.
Communication is an important part of human life. It helps us resolve issues quickly, get answers to our questions, and exchange ideas.
PayPal Upstream PayLater messages inform customers about financing opportunities to purchase products from t-shirts to treadmills. In addition to notifying customers of a safe and secure pay later option, these messages can increase merchants’ conversion and average order values. There is much upside for consumers and merchant customers in a small piece of HTML.
Delivering messages to customers may seem simple; however, at internet-scale, delivering messages to millions of consumers worldwide on thousands of merchant sites requires skilled engineering and sophisticated infrastructure. In addition to delivering the correct message to the proper merchant at the right time, PayPal’s merchant customers demand delivery of these messages at a ludicrous speed.
As enterprises become more agile, centralization increasingly looks like a relic of the past, a waterfall world. The same appears to be true of data platforms. That is why we are building a Data Mesh, the next generation of data platforms, for PayPal Credit. This post details the evolution of data platforms, highlights their problems, and explains why we decided to build a Data Mesh. I will cover the four principles of the Data Mesh, how to get started, the architecture, and some of the challenges.
We defined latency as network latency plus application request processing time. Since our focus was on optimizing application request processing time, we chose three metrics to define performance:
- Average latency
- 95th percentile
- 99th percentile
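Given a sample of per-request processing times, these three metrics can be computed directly; a minimal sketch using NumPy (the latency values below are illustrative, not PayPal measurements):

```python
import numpy as np

# Hypothetical per-request processing times in milliseconds
latencies_ms = np.array([12.0, 15.5, 11.2, 250.0, 14.8, 13.1, 90.0, 12.9])

avg = latencies_ms.mean()              # average latency
p95 = np.percentile(latencies_ms, 95)  # 95th percentile latency
p99 = np.percentile(latencies_ms, 99)  # 99th percentile latency

print(f"avg={avg:.1f}ms p95={p95:.1f}ms p99={p99:.1f}ms")
```

Tracking the tail percentiles alongside the average matters because a handful of slow requests (like the 250 ms outlier above) barely moves the mean but dominates p95 and p99, which is what users actually experience.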
Explore the challenges, methodologies, and datasets around conversational sentiment, and learn how PayPal analyzes customer sentiment.
Improving business efficiency through automation.
Have you heard that 33% of mobile users have text size adjustment enabled on their phones? According to this excellent research by Q42 conducted among more than 1 million Dutch users, many mobile users require text resizing in order to properly read on their mobile devices. While this large survey was conducted specifically in the Netherlands, the data are similar to what is observed across other countries around the world. Therefore, ensuring quality text resizing is vital for approximately one third of your users.
Stopping organized and repeat fraudsters with a home-grown graph platform.
As software design evolves, so do the thought processes behind the design decisions we make as engineers. Some of these development practices are widely known and talked about, such as Test-Driven Development, where changes to code are expressed as programmatic tests before they’re implemented in the actual business logic.
These design practices are helpful for our future selves and for our teammates, because they help us keep our code well-maintained and easily extendable. How do these practices help external audiences, or audiences that aren’t as technical, understand our code? When designing a new API or making impactful changes to an existing API, a Documentation-Driven Design approach can be helpful in guiding the design decisions you make, too.
How a team at PayPal built a tool to define, manage, control, and deploy rulesets for the data quality framework.
Deploying large-scale fraud detection machine learning models at PayPal.
When it comes to predicting continuous variables, the first thing that comes to mind is usually a regression model. For instance, linear regression is the most commonly used regression model, with the benefits of simple implementation and high interpretability. Random forest regression, on the other hand, can handle missing data and adapts to interactions and nonlinearity. While these algorithms all work well for continuous target variables in different scenarios, they provide little information about the confidence level of their predictions, especially in real-world applications.
In this article, we will explore an unconventional framework to predict continuous variables with given confidence scores. Instead of framing the prediction as a regression problem, we twist the problem into a classification problem. This framework also gives us more visibility into the predicted results and can be adjusted to different confidence levels. The article will use revenue estimation as an example. Given a variety of business attributes for many businesses, we will illustrate how we can predict the revenue for each business at a specific confidence level.
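The core idea can be sketched as follows: bin the continuous target into ordered buckets, train a classifier over those buckets, and read the class probabilities as confidence scores for each predicted revenue range. This is a minimal illustration with synthetic data; the actual features, bucketing scheme, and model used in the article are not specified here:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for business attributes and revenue (illustrative only)
X = rng.normal(size=(1000, 5))
revenue = np.exp(X[:, 0] + 0.1 * rng.normal(size=1000)) * 100_000

# Discretize the continuous target into four ordered revenue buckets
bins = np.quantile(revenue, [0.25, 0.5, 0.75])
y = np.digitize(revenue, bins)  # classes 0..3

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Class probabilities act as confidence scores for each revenue range
proba = clf.predict_proba(X[:5])
for p in proba:
    k = p.argmax()
    print(f"predicted bucket {k} with confidence {p[k]:.2f}")
```

Because the output is a probability distribution over buckets rather than a single number, a downstream consumer can, for example, widen the predicted range (merge adjacent buckets) until the accumulated probability reaches a desired confidence level.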
Many impactful business problems need to be translated into corresponding machine learning tasks. For example, we must be able to recognize fraudulent credit card transactions to prevent loss for both our merchants and consumers. Credit card fraud detection is often translated to a classification task in machine learning.
Recommending the right financial products to the right customers is another important problem. One way to tackle product recommendation is by computing a propensity score (i.e., the likelihood that a user adopts a product). Effective product recommendations and promotional strategies are often built on top of product propensity scores. Propensity modeling, which determines the best product-customer pairs, is often translated into a classification task.
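A propensity model of this kind can be sketched as a probabilistic classifier trained on historical adoption labels, whose predicted probability of the positive class is the propensity score. The features and labels below are synthetic placeholders, not PayPal data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Synthetic customer features and historical product-adoption labels
X = rng.normal(size=(500, 4))
adopted = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = LogisticRegression().fit(X, adopted)

# Propensity score: predicted probability that a customer adopts the product
propensity = model.predict_proba(X)[:, 1]

# Rank customers by propensity to pick the best product-customer pairs
top_customers = np.argsort(propensity)[::-1][:10]
```

Ranking by the predicted probability, rather than by the hard 0/1 class label, is what makes the classifier useful for recommendation: it orders every customer-product pair by likelihood of adoption.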