How Optimizing Memory Management with LMDB Boosted Performance on Our API Service

Angel Vargas | Software Engineer, API Platform; Swati Kumar | Software Engineer, API Platform; Chris Bunting | Engineering Manager, API Platform


NGAPI, the API platform for serving all first-party client API requests, requires optimized system performance to ensure a high success rate of requests and allow for maximum efficiency to provide Pinners worldwide with engaging content. Recently, our team made a significant improvement in handling memory pressure in our API service by implementing a Lightning Memory-Mapped Database (LMDB) to streamline memory management and enhance the overall efficiency of our fleet of hosts. To handle parallelism, NGAPI relies on a multi-process architecture with gevent for per-process concurrency. However, at Pinterest scale, this can cause an increase in memory pressure, leading to efficiency bottlenecks. Moving to LMDB reduced our memory usage by 4.5%, a savings of 4.5 GB per host, which allowed us to increase the number of processes running on each host from 64 to 66. This let each host handle a greater number of requests with better CPU utilization, reducing our overall fleet size. [1] The result? More happy Pinners, per host!

In a multi-process architecture, one of the crucial factors limiting the number of processes that can run on each host is the amount of memory used per process, as total host memory is limited. One of the biggest uses of memory is configuration data loaded per process. To load and manage this data, we use what we refer to as configuration-managed data (lists, sets, and hashmaps), which is used for personalizing user experiences, enhancing content curation, and targeting ads. Keeping this data in memory helps keep latencies low.
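To make that concrete, here is a minimal sketch of how a JSON-compatible configuration file maps onto those Python structures; the file contents, keys, and values are hypothetical, not actual Pinterest data.

```python
import json

# Hypothetical configuration-managed data file contents.
raw = '{"allowed_country_codes": ["US", "MX", "BR", "JP"]}'
config = json.loads(raw)

# A JSON array can back either a Python list (when order matters)
# or a set (when fast membership checks matter); JSON objects map
# to dictionaries (hashmaps).
allowed_countries = set(config["allowed_country_codes"])
print("MX" in allowed_countries)  # True
```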

In our previous architecture, the configuration-managed data JSON files were distributed to each NGAPI host using Zookeeper, and each process loaded its own copy of the configuration-managed data into Python structures in local memory. Our objective was to switch from per-process in-memory copies of the configuration-managed data to a single copy per host to reduce memory pressure. We also wanted to ensure minimal impact on the read latency of these configurations and to preserve the existing interface for reading this configuration data, to avoid a huge migration of our code base.

Data updates to these configuration files were distributed by Zookeeper using a client sidecar that replaced the existing files on the host. Each Python process ran a watcher on each configuration file; when a file was updated, the watcher loaded its contents into memory. In designing our solution, we needed to accommodate the possibility of data updates occurring at any time. Additionally, our design had to handle any JSON-compatible structure, mapped to Python lists, sets, and dictionaries as determined by certain parameters within the code.
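As an illustration of that legacy flow, the sketch below shows a per-process file watcher built on the watchdog library. This is an assumption-laden reconstruction: the actual watcher implementation, library choice, and file paths are not described here.

```python
import json

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

CONFIG_PATH = "/var/config/country_codes.json"  # hypothetical path

# Per-process cache: in the old design, every Python process held
# its own copy of each configuration file in local memory.
config_cache = {}

class ConfigReloadHandler(FileSystemEventHandler):
    """Reload a JSON config file into local memory when the sidecar replaces it."""

    def on_modified(self, event):
        if event.src_path == CONFIG_PATH:
            with open(CONFIG_PATH) as f:
                config_cache[CONFIG_PATH] = json.load(f)

observer = Observer()
observer.schedule(ConfigReloadHandler(), path="/var/config")
observer.start()
```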

When evaluating options to achieve our goal of reducing memory pressure, we explored three separate mmap-based solutions: Marisa Trie, Keyvi, and LMDB. We compared them on memory usage, read latency, indexed file size, and the time taken to create and update the indexed file. We found that LMDB was the most complete solution, as it allows the generated file to be updated in a transaction without creating a new version, letting us keep the read connections asynchronous and alive, which was high on our priority list for this technology. Marisa Trie was a viable second option, as it supports lists (which LMDB does not), but we determined that that alone wasn't enough for us to pursue it.

Figure: Before and after architecture on a Graviton host behind the API gateway. Before, each Python process created and read its own copy of the configuration-managed data; after, a single process creates the Lightning Memory-Mapped Database copy of the configuration-managed data and all other processes read from it.

LMDB is an embedded key-value data storage library based on memory-mapped files. For each configuration-managed data structure, we created a local database that is memory-mapped and shared by every NGAPI process. By creating a single instance of the configuration-managed data structures for all processes to read from, we significantly reduced the memory footprint of each process.
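With the py-lmdb binding, the read side of this setup can look roughly like the following; the database path and tuning values are assumptions rather than Pinterest's actual settings.

```python
import lmdb

# One LMDB environment per configuration-managed data structure,
# backed by a file that all worker processes mmap and share.
env = lmdb.open(
    "/var/lmdb/country_codes",  # hypothetical path
    readonly=True,              # API processes only read
    max_readers=128,            # room for every worker process
)

# Reads happen inside a cheap read transaction; keys and values are bytes.
with env.begin() as txn:
    value = txn.get(b"MX")
```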

To keep the LMDB data up to date, we developed a lightweight Python sidecar consisting of producers that monitor the JSON files for changes and consumers that update the corresponding LMDB database, using appropriate serialization and formatting techniques. We executed these updates within sub-processes to promptly reclaim memory, as JSON serialization and deserialization of large files uses a lot of memory.
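Here is a hedged sketch of one consumer update, again assuming the py-lmdb binding and hypothetical paths: the whole rebuild runs in a short-lived subprocess, and a single write transaction commits atomically, so existing read connections keep seeing the previous snapshot until the commit lands.

```python
import json
import multiprocessing

import lmdb

def rebuild_lmdb(json_path: str, db_path: str) -> None:
    """Re-serialize one JSON configuration file into its LMDB database."""
    # Deserializing a large JSON file is the memory-hungry step; doing it
    # here means the memory is returned to the OS when this process exits.
    with open(json_path) as f:
        data = json.load(f)

    env = lmdb.open(db_path, map_size=1 << 30)  # 1 GiB map size: an assumption
    # One write transaction: readers see the old data until commit.
    with env.begin(write=True) as txn:
        for key, value in data.items():
            txn.put(key.encode(), json.dumps(value).encode())
    env.close()

# Consumer side of the sidecar: run the rebuild in a subprocess
# so its memory is promptly reclaimed.
proc = multiprocessing.Process(
    target=rebuild_lmdb,
    args=("/var/config/country_codes.json", "/var/lmdb/country_codes"),
)
proc.start()
proc.join()
```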

In the API processes, we maintain persistent read-only connections, allowing LMDB to page the data held in virtual shared memory efficiently. We used object-oriented design to support the various deserialization methods needed to mimic Python lists, sets, and dictionaries on top of LMDB's byte-based key-value records. Notably, the top 50 configuration-managed data structures accounted for over 90% of duplicate memory consumption, so we started by migrating those 50 structures one by one, using feature flags, metrics, and logs to make the transition smooth and traceable with minimal disruption. The limit on how many configuration files can be added comes from the time the sidecar takes to process all the JSON configuration files into LMDB, as this has to finish before the processes can start taking traffic. We did not see any significant increase in startup time from adding our 50 biggest configuration-managed data files.
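To show how byte-based records can sit behind an unchanged call-site interface, here is a minimal dict-like wrapper sketch; the class name, paths, and serialization choices are illustrative assumptions, not Pinterest's actual wrapper code.

```python
import json
from collections.abc import Mapping

import lmdb

class LmdbDict(Mapping):
    """Read-only, dictionary-like view over one LMDB database."""

    def __init__(self, db_path: str):
        # A persistent read-only connection, held for the process lifetime.
        self._env = lmdb.open(db_path, readonly=True)

    def __getitem__(self, key: str):
        with self._env.begin() as txn:
            raw = txn.get(key.encode())
        if raw is None:
            raise KeyError(key)
        return json.loads(raw)

    def __iter__(self):
        with self._env.begin() as txn:
            for key, _value in txn.cursor():
                yield key.decode()

    def __len__(self):
        with self._env.begin() as txn:
            return txn.stat()["entries"]

# Call sites keep using plain dict syntax, preserving the existing
# configuration-reading interface.
countries = LmdbDict("/var/lmdb/country_codes")
if "MX" in countries:  # Mapping provides __contains__ via __getitem__
    ...
```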

The results were immediately noticeable — the memory usage on the hosts decreased by 4.5%, and we were able to add more processes running instances of our application code.

Figure: Approximate memory usage in MB, dropping sharply as the described solution shipped.

In conclusion, adopting LMDB for storing the configuration-managed data led to an increase in the number of processes per host that allowed our API service to handle a higher volume of requests per host, thus reducing our total host count and improving overall performance and stability. We achieved our goal without causing any side effects on system latency, as the time required for LMDB read operations closely matches that of native Python lookups. Furthermore, we were able to implement these changes without necessitating any code refactoring.

Embracing strategic optimizations and understanding the intricacies of the tools we use are vital in staying ahead in the competitive landscape of technology, and we invite the community to use memory-mapped solutions to reduce their memory footprint or address memory bottlenecks and use their compute power efficiently.

Acknowledgment

Thanks to those who worked and consulted on the project, including the authors of this post, Pablo Macias, Steve Rice, and Mark Cerqueira.

[1] Pinterest Internal Data, 2024
