Evolving Apache Phoenix: Overcoming 5 Challenges to Add Document Data Support

Written by Viraj Jasani and Kadir Ozdemir.

Adapting to new demands without compromising reliability is a significant challenge in the fast-paced world of data management. For Salesforce’s Big Data Storage (BDS) team, the mission was clear: transform Apache Phoenix, the backbone of many Salesforce applications, to handle flexible, document-style data while retaining its core strength as a high-scale SQL database.

Built on Apache HBase, Phoenix was designed for massive, structured datasets and optimized for SQL querying at scale. However, as Salesforce applications grew to serve varied industries like e-commerce and healthcare, the limitations of Phoenix’s rigid schemas began to surface. The BDS team set out to make Phoenix more adaptable by integrating document database capabilities, allowing it to handle complex, semi-structured data without sacrificing performance or reliability.

This journey was not without its challenges. The team had to support both fixed-schema and flexible document data in one system while keeping it robust and efficient. The goal was a versatile platform that could meet the evolving needs of diverse industries while maintaining the performance and reliability standards that Salesforce is known for.

Challenge 1: Balancing Evolution with Stability

The BDS team faced an initial and crucial challenge: ensuring that the new document database capabilities would not interfere with Phoenix’s existing functionality. Phoenix had a dependable SQL-based framework that Salesforce engineers relied on, and any enhancements needed to complement—not replace—these established relational features.

Introducing document-style flexibility meant supporting schema-less, nested data structures like JSON, which are fundamentally different from SQL’s structured tables and fixed schemas. The team had to ensure that both data models could coexist within Phoenix, allowing engineers to use SQL for structured data and document-oriented queries for flexible data without compromising either. This delicate balance required meticulous precision and a profound understanding of Phoenix’s architecture.

Challenge 2: Achieving Near-Perfect Availability

With any mission-critical database, stability is paramount. Phoenix operates around the clock for numerous high-scale Salesforce applications, which meant that the BDS team had to uphold Phoenix’s established availability benchmark of 99.99%—the “four 9s.” Introducing document support could potentially disrupt this reliability, as new features often bring unforeseen performance impacts or bugs.

Ensuring that Phoenix could maintain near-perfect availability with the added complexity of document data required rigorous testing and innovative fail-safes. For the BDS team, every enhancement had to undergo meticulous validation to uphold the promise of high availability. Any bug that could jeopardize Phoenix’s reliability was treated as a top priority, reinforcing the team’s commitment to delivering both innovation and stability.

Challenge 3: Ensuring Compatibility Across Versions

With the introduction of document data types, another significant hurdle emerged: ensuring compatibility with older Phoenix clients. Many Salesforce applications depended on the existing SQL functionalities, and any evolution had to account for this by making sure that new data types would be compatible with both new and legacy Phoenix clients.

This meant carefully managing how document expressions were serialized and deserialized across different versions of Phoenix. The team had to ensure that all clients—whether using new document features or existing SQL ones—could communicate seamlessly with Phoenix’s servers, regardless of the data type. Compatibility across versions became an engineering puzzle, requiring careful planning and backward-compatible changes to preserve Phoenix’s trusted stability.
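One common way to keep a wire format readable across versions is to prefix every serialized item with a type id and a length, so that a reader can step over items it does not recognize. The sketch below illustrates that general idea only. It is not Phoenix's actual serialization format, and the type ids, the names `write_node` and `read_nodes`, and the two "version" tables are all hypothetical.

```python
import struct

# Hypothetical type tables: an "old" reader knows two node types,
# a "new" reader also knows a document-path node.
KNOWN_TYPES_OLD = {1: "literal", 2: "column"}
KNOWN_TYPES_NEW = {**KNOWN_TYPES_OLD, 3: "document_path"}

HEADER = ">HI"  # 2-byte type id, 4-byte payload length, big-endian
HEADER_SIZE = struct.calcsize(HEADER)


def write_node(type_id: int, payload: bytes) -> bytes:
    """Serialize one node as header + payload."""
    return struct.pack(HEADER, type_id, len(payload)) + payload


def read_nodes(data: bytes, known_types: dict) -> list:
    """Decode all nodes; unknown type ids are kept as 'unknown', not fatal."""
    nodes = []
    pos = 0
    while pos < len(data):
        type_id, length = struct.unpack_from(HEADER, data, pos)
        pos += HEADER_SIZE
        payload = data[pos:pos + length]
        pos += length  # the length prefix lets any reader skip the payload
        nodes.append((known_types.get(type_id, "unknown"), payload))
    return nodes


stream = write_node(2, b"ID") + write_node(3, b"DOC.a.b") + write_node(1, b"7")
old_view = read_nodes(stream, KNOWN_TYPES_OLD)
new_view = read_nodes(stream, KNOWN_TYPES_NEW)
```

In this sketch the older reader still decodes the nodes on either side of the one it does not understand, which is the property a backward-compatible change needs. Whether an unknown node should be skipped or rejected with a clear error is a policy decision that depends on the query.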

Challenge 4: Handling Complex Data Types

To support document data, the BDS team had to make strategic decisions about which complex data types Phoenix would handle. JSON, Lists, Sets, and Maps were all necessary to meet Salesforce’s diverse data needs, but these data types are rarely used in traditional SQL environments. Choosing the right storage format was crucial to make Phoenix flexible enough for document data while remaining efficient.

The team ultimately selected BSON (Binary JSON) as the format for document storage. BSON allows selective retrieval of parts of a document without requiring the entire document to be deserialized. This choice helped maintain performance and manage Phoenix’s storage efficiently, even with complex data structures. Supporting new types like Maps and Sets pushed Phoenix’s capabilities into new territory, but the team’s careful selection ensured that these additions wouldn’t slow down existing SQL operations.
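Selective retrieval works because BSON stores a size next to every variable-length value, so a reader can jump over fields it does not need. The following is a minimal sketch of that idea using a hand-written encoder and reader for a small subset of BSON (double, string, embedded document, and int32). The function names are illustrative and have nothing to do with Phoenix's internal code, which this article does not show.

```python
import struct


def encode_document(doc: dict) -> bytes:
    """Encode a dict of float/str/dict/int values as BSON (small subset)."""
    body = b""
    for key, value in doc.items():
        name = key.encode("utf-8") + b"\x00"
        if isinstance(value, float):
            body += b"\x01" + name + struct.pack("<d", value)
        elif isinstance(value, str):
            raw = value.encode("utf-8") + b"\x00"
            body += b"\x02" + name + struct.pack("<i", len(raw)) + raw
        elif isinstance(value, dict):
            body += b"\x03" + name + encode_document(value)
        elif isinstance(value, int):
            body += b"\x10" + name + struct.pack("<i", value)
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
    # Total size covers the 4-byte size field and the trailing null byte.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


def read_field(data: bytes, path: list, offset: int = 0):
    """Return the value at `path`, skipping unrelated values by their size."""
    end = offset + struct.unpack_from("<i", data, offset)[0] - 1
    pos = offset + 4
    while pos < end:
        type_code = data[pos]
        name_end = data.index(b"\x00", pos + 1)
        name = data[pos + 1:name_end].decode("utf-8")
        pos = name_end + 1
        if type_code == 0x01:
            size = 8
        elif type_code == 0x02:
            size = 4 + struct.unpack_from("<i", data, pos)[0]
        elif type_code == 0x03:
            size = struct.unpack_from("<i", data, pos)[0]
        elif type_code == 0x10:
            size = 4
        else:
            raise ValueError(f"unsupported BSON type: {type_code:#x}")
        if name == path[0]:
            if len(path) > 1:
                if type_code != 0x03:
                    raise KeyError(path[1])
                return read_field(data, path[1:], pos)  # descend in place
            if type_code == 0x01:
                return struct.unpack_from("<d", data, pos)[0]
            if type_code == 0x02:
                return data[pos + 4:pos + size - 1].decode("utf-8")
            if type_code == 0x10:
                return struct.unpack_from("<i", data, pos)[0]
            return data[pos:pos + size]  # raw bytes of the sub-document
        pos += size  # jump over a value we do not need
    raise KeyError(path[0])


encoded = encode_document({
    "note": "x" * 1000,
    "order": {"id": 42, "customer": {"name": "Ada", "tier": "gold"}},
})
tier = read_field(encoded, ["order", "customer", "tier"])
```

Here the reader steps over the 1,000-character `note` string by reading only its length prefix, and it decodes just the single value at the end of the requested path.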

Challenge 5: Merging SQL and Document Syntax

The final challenge on the BDS team’s journey was designing a unified query system that could handle both SQL and document-oriented expressions. Unlike relational databases with standardized SQL, document databases typically use unique syntax for data manipulation, and blending these two paradigms required innovation.

To overcome this, the team needed to redesign Phoenix’s SQL parser to recognize and process document-style queries alongside SQL commands. They looked to other document databases, like MongoDB and Amazon DynamoDB, to understand document syntax patterns that could align well with Phoenix’s structure. This new parser had to balance the flexibility of document queries with the precision of SQL, allowing engineers to work with diverse data models in a single platform. Merging these distinct approaches to querying was a technical leap, and it required the team to rethink how Phoenix interpreted and processed data requests.
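To give a feel for what a document-style expression involves, the sketch below parses a path written with dots for nested fields and brackets for list indexes, similar in spirit to the document paths used by DynamoDB, and then resolves it against nested data. The grammar and the names `parse_path` and `resolve` are assumptions made for illustration. The article does not specify the syntax Phoenix adopted, and this is not Phoenix's parser.

```python
import re

_FIRST = re.compile(r"[A-Za-z_]\w*")               # leading field name
_NEXT = re.compile(r"\.([A-Za-z_]\w*)|\[(\d+)\]")   # .field or [index]


def parse_path(path: str) -> list:
    """Split a path like 'order.items[1].sku' into field and index segments."""
    match = _FIRST.match(path)
    if match is None:
        raise ValueError("path must start with a field name")
    segments = [match.group(0)]
    pos = match.end()
    while pos < len(path):
        match = _NEXT.match(path, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        if match.group(1) is not None:
            segments.append(match.group(1))        # field name
        else:
            segments.append(int(match.group(2)))   # list index
        pos = match.end()
    return segments


def resolve(document, segments: list):
    """Walk nested dicts and lists; return None if the path does not exist."""
    current = document
    for segment in segments:
        if isinstance(segment, int):
            if not isinstance(current, list) or segment >= len(current):
                return None
            current = current[segment]
        else:
            if not isinstance(current, dict) or segment not in current:
                return None
            current = current[segment]
    return current


doc = {"order": {"items": [{"sku": "A-1"}, {"sku": "B-2"}]}}
segments = parse_path("order.items[1].sku")
sku = resolve(doc, segments)
```

A SQL parser extended along these lines would treat such a path as one more kind of expression, so it can appear wherever a column reference can, for example inside a `WHERE` clause.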

Achieving the Vision and Looking Ahead

Through meticulous engineering and a dedication to preserving Phoenix’s strengths, the BDS team successfully transformed Phoenix into a more versatile database platform. Phoenix now supports both relational and document data, providing Salesforce engineers with the flexibility to create applications with complex, adaptable data models while still leveraging SQL-based querying, multi-tenancy, and high availability.

However, the team’s journey is far from over. Salesforce is committed to open-source contributions, and the BDS team is preparing to release Phoenix 5.3.0, which will bring these document capabilities to the broader Apache Phoenix community. As Phoenix’s user base expands, the team is eager to learn from real-world applications, refine the system, and continue pushing the boundaries of what distributed databases can achieve.

In retrospect, this evolution of Phoenix exemplifies the challenges and rewards of balancing innovation with reliability. For the BDS team, the journey has been a series of trials, each one testing and strengthening Phoenix’s capabilities. With document support now a reality, Salesforce has expanded Phoenix’s potential to serve as a unified, high-scale platform that meets the demands of modern applications, marking a new chapter in the ongoing journey of Apache Phoenix.

Supporting document use cases as a SQL database