Fact Evaluation in Millions: Scalable Rule Executor Service

The Rule Executor Service at Myntra stands out as a pivotal tool in the domain of rule engine. While, at a glance, one might wonder what sets it apart from the myriad of rule engines available today, the difference is profound. In Myntra ecosystem, supporting numerous pricing use cases necessitated building a capability that is engineered to handle vast volumes of rules, processing millions of facts against similar scale in terms of rules for multiple tenants with high reliability, performance, availability and consistency. In order for the system to be performant and use as a broker for all Myntra systems preference was given to scalability, and ability to persist and evaluate large large number of rules with low latency evaluation window.

Let’s first delve into the functional and non-functional requirements that necessitate the creation of the Rule Executor Service.

Functional Requirements

Dynamic Rule Management: The rule engine should be designed with adaptability as a core principle, enabling administrators or users to effortlessly update, introduce, or dynamically remove rules through the user interface as the landscape evolves.
Immediate Rule Reflection: Following any changes to the rule set, the rule engine should ensure that all future fact evaluations reflect the latest rule standards with high degree of consistency. Once the creation, updating, or deletion of a rule has been confirmed as successful for the user, no subsequent fact evaluations should be conducted with the outdated rules from that timestamp onwards. This necessitates the updates to be immediately reflected in the knowledge base of the rule engine.
Fact Assessment Details: Each evaluated fact should be supplemented with essential details, including the associated rule’s ID, name, and description against which it was evaluated. In the absence of configured rules for the provided fact, enrich the fact by incorporating pertinent details regarding the absence of associated rules.

Non-Functional Requirements

Faster Rule Evaluation: The millions of facts should be evaluated against millions of configured rules within 30 seconds.

Let’s look into the rule engines that has been considered to meet our requirements.

Drools

Drools is indeed a very powerful rule engine that supports forward and backward chaining and uses an enhanced implementation of the Rete algorithm for rule evaluation. However, we encountered the following challenges with Drools:

Memory Consumption and Rule Storage: Drools maintains its rules in-memory. While this can provide fast rule processing, it poses a significant challenge when dealing with a large volume of rules. To illustrate, our preliminary testing with Drools demonstrated that storing ~300K rules consumed 60 GB of RAM. This demonstrated limitations of drools in supporting millions of rules.
Dynamic Rule Updation Challenges: Adapting to changing business requirements means updating or deleting existing rules with a high degree of consistency. In the Drools framework, such modifications necessitate a complete rebuild of our knowledge base across horizontally scalable machines, which would be time-intensive and would lead to scaling limitations.
Horizontal Scaling and Sharding: One might consider circumventing the aforementioned memory constraints by horizontally scaling and sharding the rules across multiple servers. However, this introduces two pressing issues:
a) Cost Inefficiency: With the memory demands observed during our tests, sharding would mean deploying numerous servers, leading to escalating operational costs.
b) Consistency Maintenance: Distributing rules across servers necessitates ensuring data consistency. This becomes complex when considering that a single rule might be replicated on at least two servers for availability. Managing such availability and ensuring synchronization between the replicas can be challenging.

Given the considerable memory demands, hurdles in dynamic rule adaptation, and the intricacies of efficient scaling, Drools falls short of our specific needs, compelling us to evaluate other solutions

Easy Rules (Opted One)

Easy Rules is a simple yet powerful Java rules engine providing the following features:

POJO based development
Useful abstractions to define business rules and apply them easily
The ability to create composite rules from primitive ones
The ability to define rules using an Expression Language (Like MVEL, SpEL and JEXL)

Ref: https://github.com/j-easy/easy-rules/wiki

Our proof-of-concept with Easy Rules underscored a significant advantage: it is highly memory-efficient, allowing us to accommodate millions of rules on a single server. However, this benefit is somewhat offset by an evident drawback — When evaluating vast numbers of facts against an equally large set of rules, the rule evaluation latency was significantly high.

To address this, we conceptualized and developed an in-house rule engine called Rule Executor Service. While it integrates Easy Rules at its core for fact evaluations, it distinguishes itself by leveraging in-memory sharding, parallel processing, and horizontal scalability with a stateless nature. These enhancements ensure that rule evaluation aligns with our stringent requirements and service level agreements (SLAs).

Before we dive into the intricacies of the Rule Executor Service architecture, let’s take a closer look at use case onboarding. We’ll explore the various types of use cases that can be integrated into the system and the configurations required for seamless onboarding. This preliminary understanding will provide a solid foundation for comprehending the Rule Executor Service architecture more effectively.

Use Case Onboarding

In order to onboard any use case, following are the configs which are needed to be onboarded into the Rule Executor Service :-

{
"ruleCategoryToExecutionConfigMap": {
"rule_category_1": {
"priority": 1,
"shardConfigs": [
{
"shardKeys": [
"key1", "key2"
],
"maxEntryAllowedPerShard": 10
},
{
"shardKeys": [
"key3", "key4"
],
"maxEntryAllowedPerShard": 200
}
],
"ruleMatchingConfig": {
"whenExpression": "$attr1 == $$value1 && $attr2 <= $$value2",
"fieldToThenExpressionMap": {
"result1": "return $$attr1 + $$attr2",
"result2": "return $$attr3 * $$attr4"
}
}
},
"rule_category_2": {
"priority": 2,
"shardConfigs": [
{
"shardKeys": [
"key10"
],
"maxEntryAllowedPerShard": 5
}
],
"ruleMatchingConfig": {
"whenExpression": "$attr10 >= $$value10 && $attr11 == $$value11",
"fieldToThenExpressionMap": {
"result1": "return $$attr12 - $$attr11",
"result2": "return $$attr10 / $$attr11"
}
}
}
}
}

Rule Category: To set up configurations for each use case, it’s important to define the settings at the category level. This is essential because a use case may include rules from different categories, such as seller+product or seller+brand levels. Thus, specifying configurations at the category level accommodates these diverse rule scenarios.
Priority: Each rule category can be assigned a priority. The lower the number, the higher the priority. If the first rule category has a higher priority than the second rule category, the actions specified in the first rule category will take precedence over those in the second rule category.
ShardConfigs: ShardConfigs are essential for implementing internal sharding to expedite fact evaluation. Depending on the requirement, a use case can incorporate any number of sharding levels to improve fact evaluation latency. For each shard level, two components are crucial: shard keys and the maximum number of entries allowed per shard. The calculation for determining the number of containers required at a shard level involves dividing the total number of rules by the maximum entries allowed per shard. Once the maximum number of containers is calculated, shard keys assist in distributing facts and rules into multiple containers at that shard level through hashing.

Example: Let's consider a scenario where rules are configured at the seller+product level. Let's say we have 100 sellers, and each seller can have up to 10K rules. With this substantial volume of rules, we aim for parallel fact evaluation for each seller. Additionally, we want to evaluate facts at the product level with some parallelism. To meet this requirement, we can have two sharding levels: the first at the seller level and the second at the product level. The shard configuration will appear as follows :-

[ { "shardKeys": ["sellerId"], "maxEntryAllowedPerShard": 1 }, { "shardKeys": ["productId"], "maxEntryAllowedPerShard": 2000 }

]

The above configuration ensures that in a fact evaluation request with 10 unique sellers, there will be 100 containers in the first level of shard. This is because the maxEntryAllowedPerShard for the first shard level is set to one, ensuring parallel execution for each seller. It also ensures that if a seller has up to 10k rules, with maxEntryAllowedPerShard set as 2000 for the second level shard, 5 containers will be created for each seller. So, in total, there are 100 containers in the first level and 500

(100 * 5) containers in the second level of the shard.

RuleMatchingConfig: The RuleMatchingConfig is needed to define the schema for the ‘when’ (condition) and ‘then’ (action), which should be valid MVEL expressions. All the rules falling into the same rule category will follow the same schema. This provides a generic way to configure any type of rule for any use case. The ‘when’ condition is always singular, but multiple ‘then’ expressions are possible, signifying the execution of multiple actions whenever the ‘when’ condition is met.

Example: Let's consider the configuration of rules at the product level for validating the discounts. The rule states that if the discount is greater than 90% or less than 10%, the discount uploaded for that specific product should be rejected. To configure this rule, we can set up 'when' and 'then' expressions as follows :-

When Expression: $productId == $$productId && ($$td <= $tdMin ||
$$td >= $tdMax)

Then Expression: _return "rejected"_In this context, all rules belonging to the 'product' rule category will utilize the aforementioned 'when' and 'then' expressions. During runtime, the values of variables prefixed with '$' will be substituted by the values provided in the rule, while variables prefixed with '$$' will be substituted by the values from the facts. This approach offers a generic way to specify the 'when' and 'then' conditions for all rules

within a given category.

Supported Use Case Types

Below are the possible use case types supported by the Rule Executor Service :-

Repository Based Use Case

If the client wants both rule storage and fact evaluation to be managed by the Rule Executor Service, they can choose to onboard the use case as a repository-based use case. In this scenario, they need to onboard the rule storage schema and its CRUD handler, along with its use case config. The client should send the use case name and the list of facts as part of the request body, and the Rule Executor service will handle fetching the rules using the onboarded CRUD handler. Subsequently, it will perform the evaluation based on the onboarded use case configs.

Client Based Use Case

If the client wishes to manage rule storage in their own system and only requires rule evaluation to occur in the Rule Executor service, they can onboard the use case as a client use case. In this scenario, they only need to provide use case configs. The client should send the use case name, list of facts, and list of rules, and the Rule Executor service will conduct fact evaluation based on the onboarded configs.

Default Use Case

If the client does not want to onboard any use case and just wants fact evaluation to be done, they can send a list of facts, a list of rules, along with the ruleMatchingConfig as part of the request body. The Rule Executor service will perform fact evaluation based on the passed ruleMatchingConfig. This use case type is preferable only when the number of facts and rules is limited because no in-memory sharding is going to be utilised in this case during fact evaluation.

Let’s delve deeper into the architecture of Rule Executor Service, elucidating how it adeptly manages the colossal volume of facts and rules and consistently delivers evaluation results within a 30-second timeframe, even when dealing with tens of millions of facts and rules.

Rule Executor Service Architecture

Rule Executor Service comprises of the following components:

Rule Executor Controller: Its responsibility is to receive the incoming request from the client, parse the same, and then transfer the request to the fact evaluation request delegator.
Fact Evaluation Request Delegator: Its responsibility is to coordinate with the UseCaseConfigManager to retrieve the appropriate use case config for the current request, using the provided use case name. Following this, it transfers the incoming request, along with the use case config, to the relevant use case object, determined by the specified use case type in the config object.
Use Case Config Manager: Its responsibility is to maintain an in-memory map of use case names to their respective use case configs. It continually updates the configs of each onboarded use case using a pooling mechanism.
Use Case: The UseCase class serves as an abstract class containing common behaviors. It is extended by RepositoryBasedUseCase, ClientBasedUseCase, and DefaultUseCase. These subclasses implement and override specific behaviors tailored to their needs. For instance, RepositoryBasedUseCase includes validations ensuring that the passed use case name has a CRUD handler onboarded, while ClientBasedUseCase checks that the passed rules in the request object are not empty. Additionally, RepositoryBasedUseCase enriches rules in the current request using the CRUD handler specified for the given use case. Subsequently, use case object invokes the FactEvaluationService for the evaluation of the facts.
Fact Evaluation Service: It offers two overloaded methods for fact evaluation. One accepts arguments for rulesList, factsList, and shardConfigs, while the other takes arguments for rulesList and factsList only. The former is utilized by the RepositoryBasedUseCase and ClientBasedUseCase objects, which in turn call RuleCategoryExecutorService, while the latter is employed by the DefaultUseCase object, which in turn calls RuleEngineService for the evaluation of the facts.
Rule Category Executor Service: Its responsibility is to conduct fact evaluation at the rule category level. To achieve this, it initially divides the rules based on their categories and then concurrently invokes the ShardExecutorService for each rule category. This involves passing a list of rules belonging to that category along with the full list of facts. Once the evaluation is complete, the evaluated facts will contain results from each rule category evaluation. The final evaluated result is then computed based on the specified priority for each rule category.
Shard Executor Service: It starts by popping the first shard level config from the shard configs list, then proceeds to split facts and rules based on the configuration (shardKeys & maxEntryAllowedPerShard) specified in the popped shard level config. Shard keys are used to distribute the facts and rules into multiple containers, and the calculation for determining the number of containers needed at the current shard level involves dividing the total number of rules by the maximum entry allowed per shard for that level. We will delve into the in-memory sharding mechanism in-depth in the later section of this article.
If the current shard level is not the last in the shard configs, the process initiates a parallel recursive call to ShardExecutorService for each container, passing a specific list of facts and rules belonging to that container, along with the shard configs list (now with one less shard config, as the first shard config has been popped out). On the other hand, if it is the last shard level, the system initiates a parallel call to RuleEngineService for each container, again passing a specific list of facts and rules belonging to that container.
Rule Engine Service: Its responsibility is to invoke EasyRule by passing the provided facts and rules along with the configured rule engine settings, such as ‘skipOnFirstRuleMatched’, and then return the evaluated facts.

In-Memory Sharding

Let’s delve deeper into the implementation of in-memory sharding and explore how it has been employed to achieve faster fact evaluation.

In-Memory Sharding Illustration

Let’s consider the following scenario, which can be referred to in the above illustration as well:

A fact evaluation request with a total of 8000 rules comprises two types of rule categories: Seller+Product and Seller+Brand. The first category has 5000 rules, and the second category has 3000 rules. All these rules are associated with two sellers, each having an equal number of rules, which is 4000 each.
The first rule category involves two levels of sharding. The first level has a shard key as SellerId and a maxEntryAllowedPerShard of 1. The second level of sharding has a shard key as ProductID and a maxEntryAllowedPerShard of 1000.
The second rule category also features two levels of sharding. The first level has a shard key as SellerId and a maxEntryAllowedPerShard of 1. The second level of sharding has a shard key as BrandID and a maxEntryAllowedPerShard of 500.

The sharding execution will proceed as follows:

Firstly, the RuleCategoryExecutorService divides the rules into two subsets: one with rules of the first category and the other with rules of the second category. Then, it concurrently invokes the fact evaluation for both categories by passing the category-specific rules and the complete fact list in parallel requests. To enable this parallel execution, it utilizes an I/O thread pool executor created during application startup using Java’s executor framework.
The ShardExecutorService receives the request at the rule category level. Suppose a shard executor receives a request for the first category (seller+productID), comprising 5000 rules and 4000 facts. It then retrieves the first shard level config from the shard configs list and distributes the rules and facts across multiple containers. To achieve this, it calculates the number of containers required at the current shard level by dividing the total number of rules by maxEntryAllowedPerShard, resulting in 5000 (5000/1=5000) containers. To distribute the rules and facts among the containers, it calculates the hash code of sellerId (shard key) present in the rule or fact and then takes a modulo of the same from the calculated number of shards required at the current shard level. Since only two unique sellers are present in the request, there will be only two unique hash codes, and all the rules and facts will be divided into two containers, each containing 2500 rules and 2000 facts. Here, the maxEntryAllowedPerShard is intentionally kept as 1 to enable parallel execution for each seller.
As this is not the last sharding level, the service initiates a parallel recursive request to the ShardExecutorService for both containers, passing facts and rules belonging to their respective containers. To invoke parallel execution, it uses the same I/O thread pool executor.
The ShardExecutorService is going to receive the request at the sellerID level. It will repeat the same process, and in this case, the number of containers required at the current level is going to be 3 (2500/1000). Then, the rules and facts are distributed to each container by calculating the hash code of the productID (shard key) present in the rule or fact and then taking a modulo of the same with the calculated number of containers required at the current level. One possible distribution could be: the first container having 1000 rules and 800 facts, the second container also having 1000 rules and 800 facts, and the third container having 500 rules and 400 facts. Since this is the last sharding level, the service makes parallel calls to the RuleEngineService for each container. To invoke parallel execution, it utilizes a computation thread pool executor that has already been created using Java’s executor framework during application startup.

One might wonder why there is a need for two thread pools here (IO and Computation). The reason is that we have two different types of jobs.

The first type of job involves sharding and invoking parallel execution

for each shard, essentially waiting for each shard execution to complete.

This means that this job has a further dependency on its child jobs.

These child jobs will essentially be the second type of job (at the last
shard level) which involves fact evaluation and is not dependent on any
other job. If we maintain a single thread pool for both types of jobs, it is possible that all the threads in the thread pool are exhausted by the first type of job only (which is a dependent job), leaving no threads

available for the second type of job resulting in the deadlock. That is

why we maintain an IO thread pool for the first type of job and a
computation thread pool for the second type of job to avoid a deadlock.

In summary, the Rule Executor Service stands out as a high-performance rule engine, performant in handling tens of millions of datasets with speed and precision. Its unique features include dynamic rule management, immediate rule reflection, stateless nature and in-memory sharding for faster execution. The architecture and use case onboarding further showcase its flexibility and robustness in meeting diverse requirements. Overall, the Rule Executor Service proves to be a powerful tool for businesses managing and evaluating rules at scale.