How Generative AI Simplifies Data Analytics Formula Creation for Faster Insights

In our “Engineering Energizers” Q&A series, we delve into the innovative methods of engineering leaders who are reshaping their industries. Today, we highlight Monil Gandhi, whose team at Salesforce developed AI Calculated Fields. This cutting-edge tool uses AI to convert simple user utterances into complex calculated fields, streamlining the process on Salesforce’s semantic layer and enabling robust analytics on platforms like Tableau Einstein. This innovation makes it easier for data analysts and business users to gain valuable insights and make data-driven decisions.

Learn how Monil’s team tackled challenges such as achieving high accuracy with the non-deterministic nature of large language models and ensuring robust security for sensitive data.

Our mission is to build a highly extensible semantic layer that enables customers to define business entities and gain the insights they need for data-driven decisions. The platform offers robust capabilities for modeling those entities, including calculated fields, which are formulas built on top of existing entities.

These calculated fields can be complex. AI Calculated Fields automates the generation of formulas for tools like Tableau: users can input a high-level request such as “Create a calculated field that calculates ROI for sales of ‘Technical’ products over each quarter,” and the system translates it into a precise formula suggestion that the user can accept or modify. Traditionally, creating such formulas required significant expertise, time, and extensive documentation review. With AI Calculated Fields, users describe their needs in natural language, the AI generates the formulas, and results arrive faster. The feature integrates metadata and a user-friendly design, ensuring that even complex calculated fields can be generated with minimal input.
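
To make the flow concrete, here is a minimal sketch of how a request like that might be combined with semantic-layer metadata into a model prompt. The field names, prompt wording, and the sample formula are illustrative assumptions; the actual prompts and formula dialect used by AI Calculated Fields are not described in this article.

    # Illustrative only: field names, prompt wording, and the sample formula are
    # assumptions, not the actual AI Calculated Fields implementation.
    semantic_metadata = {
        "fields": [
            {"name": "Amount",       "type": "number", "description": "Sale amount"},
            {"name": "Cost",         "type": "number", "description": "Cost of goods sold"},
            {"name": "Product Type", "type": "string", "description": "Product category"},
            {"name": "Close Date",   "type": "date",   "description": "Date the sale closed"},
        ]
    }

    def build_prompt(utterance: str, metadata: dict) -> str:
        """Ground the model in real fields by listing them alongside the user's request."""
        field_lines = "\n".join(
            f"- {f['name']} ({f['type']}): {f['description']}" for f in metadata["fields"]
        )
        return (
            "Generate a calculated-field formula using only the fields below.\n"
            f"Available fields:\n{field_lines}\n"
            f"Request: {utterance}\n"
            "Return only the formula."
        )

    prompt = build_prompt(
        "Create a calculated field that calculates ROI for sales of 'Technical' "
        "products over each quarter",
        semantic_metadata,
    )
    # A suggestion along these lines would then be shown for the user to accept or edit:
    # SUM(IF [Product Type] = 'Technical' THEN [Amount] - [Cost] END)
    #     / SUM(IF [Product Type] = 'Technical' THEN [Cost] END)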

Achieving high accuracy in formula generation presented significant challenges due to the limitations of large language models (LLMs). These models have limited context windows and produce non-deterministic outputs, leading to errors when user instructions are ambiguous or metadata is insufficient. To tackle these challenges, we implemented several strategies:

  • Enhanced Metadata Inputs: We augmented the system prompts and data with field descriptions, added comprehensive examples of expected formulas, and enriched semantic structures to clarify ambiguous user instructions and improve the AI’s interpretation.
  • Optimized Pre-Processing: We developed algorithms to standardize user inputs, eliminating inconsistencies and structuring data in ways that maximize AI model performance.
  • Validation Processes: We integrated multi-step validation to detect anomalies in outputs and automatically suggest corrections, reducing the need for manual debugging by end users.

These measures, combined with iterative testing and feedback loops, have significantly improved the system’s accuracy and ensured it meets enterprise-grade requirements.
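
As one concrete illustration of the pre-processing and validation strategies above, the snippet below normalizes a user utterance and flags anomalies in a generated formula. The checks and the field list are simplified assumptions for illustration, not the production rules.

    import re

    # Simplified stand-ins for the pre-processing and validation steps described above.
    KNOWN_FIELDS = {"Amount", "Cost", "Product Type", "Close Date"}

    def normalize_utterance(text: str) -> str:
        """Standardize input before it reaches the model: collapse whitespace, unify quotes."""
        text = re.sub(r"\s+", " ", text).strip()
        return text.replace("\u2018", "'").replace("\u2019", "'")

    def validate_formula(formula: str) -> list[str]:
        """Detect anomalies in a generated formula so corrections can be suggested automatically."""
        problems = []
        if formula.count("(") != formula.count(")"):
            problems.append("unbalanced parentheses")
        for field in re.findall(r"\[([^\]]+)\]", formula):
            if field not in KNOWN_FIELDS:
                problems.append(f"unknown field: {field}")
        return problems

    print(validate_formula("SUM([Amount]) / SUM([Costs]"))
    # ['unbalanced parentheses', 'unknown field: Costs']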

[Diagram: Calculated Field generation and fine-tuning using LLMs]

With sensitive enterprise data at stake, ensuring robust security was a significant challenge. Unauthorized access and data exposure risks required innovative solutions. To address these, the team implemented several security measures:

  • LLM Gateway: Built a secure intermediary that anonymizes sensitive data before processing, ensuring that personally identifiable information (PII) is stripped while maintaining the context needed for accurate formula generation (see the sketch after this list).
  • Encryption Protocols: Applied the Advanced Encryption Standard (AES-256) to secure data both at rest and in transit, supporting compliance with regulations such as GDPR and CCPA.
  • Role-Based Access Controls: Established fine-grained permission models that limit access based on user roles, ensuring that sensitive data is only visible to authorized personnel.
  • Real-Time Monitoring Tools: Deployed anomaly detection systems that flag unusual access patterns or usage behaviors, allowing immediate intervention and minimizing the risk of breaches.
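
The sketch below illustrates the anonymize-then-restore idea behind such a gateway. The single detection rule (email addresses) and the placeholder format are assumptions for illustration; a real gateway covers many more PII categories and policies.

    import re

    # Illustrative anonymize-then-restore pattern for an LLM gateway. The single
    # email rule and placeholder format are assumptions; a production gateway
    # handles many more PII categories.
    EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

    def anonymize(text: str) -> tuple[str, dict[str, str]]:
        """Replace sensitive values with placeholders before text leaves the trust boundary."""
        mapping: dict[str, str] = {}

        def repl(match: re.Match) -> str:
            token = f"<PII_{len(mapping)}>"
            mapping[token] = match.group(0)
            return token

        return EMAIL.sub(repl, text), mapping

    def restore(text: str, mapping: dict[str, str]) -> str:
        """Re-insert original values into the model's response."""
        for token, value in mapping.items():
            text = text.replace(token, value)
        return text

    masked, mapping = anonymize("Flag deals owned by jane.doe@example.com over $1M")
    # masked == "Flag deals owned by <PII_0> over $1M"; the LLM never sees the address,
    # and restore() maps the placeholder back once the response returns.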

Balancing deployment speed with rigorous testing required a strategic approach. The team focused on iterative development cycles, incorporating extensive error log analysis and automated test suites into the process. These suites simulate real-world scenarios to identify edge cases that might otherwise go unnoticed.
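
A scenario-style test might look like the sketch below: each recorded utterance is replayed through the generation entry point and the output is checked for basic structural soundness. Here generate_formula is a hypothetical stand-in for the real service, and the scenarios are invented for illustration.

    import pytest

    def generate_formula(utterance: str) -> str:
        # Hypothetical stand-in for the real generation service.
        return "SUM([Amount] - [Cost]) / SUM([Cost])"

    # Invented scenarios pairing an utterance with a field the formula must reference.
    SCENARIOS = [
        ("ROI for Technical products by quarter", "[Cost]"),
        ("Profit per product category", "[Amount]"),
    ]

    @pytest.mark.parametrize("utterance,required_field", SCENARIOS)
    def test_generated_formula_is_well_formed(utterance, required_field):
        formula = generate_formula(utterance)
        assert required_field in formula                  # references the expected field
        assert formula.count("(") == formula.count(")")   # structurally balanced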

Troubleshooting and feedback collection were also integral to the process. Internal users provided critical insights into real-world use cases, enabling the team to identify and address potential gaps. Structured sessions, beta programs, and error log analysis ensured that user feedback was actionable and prioritized for improvement. Once recurring issues were addressed, validation tests confirmed system resilience. This collaborative approach not only enhanced system reliability but also ensured that enterprise-grade standards were met.

The team planned a phased rollout: a pilot phase began in October 2024, with general availability targeted for mid-2025. These milestones ensure thorough testing and optimization, allowing the team to address real-world use cases and refine features based on user feedback.

The team implemented a rigorous testing framework to evaluate challenges across all dependent modules and identify potential conflicts. Automated testing pipelines run comprehensive scenarios to ensure compatibility and stability after updates. These pipelines include regression tests that verify existing functionality remains unaffected by new enhancements.
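
Regression checks of that kind can be as simple as comparing freshly generated formulas against a stored baseline of previously accepted outputs, as sketched below. The file layout and the normalization rule are assumptions, not the team's actual pipeline.

    import json
    from pathlib import Path

    def normalize(formula: str) -> str:
        """Ignore cosmetic differences (whitespace, keyword case) when comparing formulas."""
        return " ".join(formula.split()).upper()

    def regression_failures(baseline_path: Path, candidates: dict[str, str]) -> list[str]:
        """Return the utterances whose generated formula drifted from the stored baseline."""
        baseline = json.loads(baseline_path.read_text())
        return [
            utterance
            for utterance, formula in candidates.items()
            if normalize(formula) != normalize(baseline.get(utterance, ""))
        ]

    # Typical use after a prompt or model update (paths and data are illustrative):
    # failures = regression_failures(Path("baseline_formulas.json"), new_outputs)
    # assert not failures, f"Formulas changed for: {failures}"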

Scenario analysis played a significant role in anticipating risks associated with new features. By modeling complex workflows, the team assessed how updates might impact interconnected components, identifying potential issues before deployment. While scenario analysis focuses on anticipating risks, continuous monitoring tools ensure real-time detection of unexpected issues post-deployment. These tools provide valuable insights into system health, enabling the team to address emerging problems quickly.

By combining regression testing, scenario analysis, and continuous monitoring, the team ensures that new features integrate seamlessly without affecting existing functionality. This comprehensive strategy maintains system stability and reliability as the product evolves.

Future R&D efforts focus on enhancing conversational capabilities and refining iterative workflows. The team is exploring natural language processing (NLP) advancements that allow users to provide feedback directly to the system for refinement. For example, users could request adjustments to outputs dynamically, enabling more precise control over formula generation.
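
A simple way to picture that feedback loop is sketched below: the user's feedback is appended to the running conversation and the model is asked again, so each suggestion can be adjusted in place. call_llm is a hypothetical stand-in for the real model endpoint.

    # Hypothetical stand-in for the real model endpoint behind the LLM gateway.
    def call_llm(messages: list[dict]) -> str:
        return "SUM([Amount] - [Cost]) / SUM([Cost])"

    def refine(history: list[dict], feedback: str) -> tuple[str, list[dict]]:
        """Append the user's feedback to the conversation and return the revised suggestion."""
        history = history + [{"role": "user", "content": feedback}]
        suggestion = call_llm(history)
        return suggestion, history + [{"role": "assistant", "content": suggestion}]

    history = [{"role": "user", "content": "Create an ROI field for 'Technical' products"}]
    suggestion, history = refine(history, "Use gross margin instead of net revenue")
    # The user can keep iterating: refine(history, "Round the result to two decimals"), and so on.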

Another focus area is embedding business-specific rules into the AI model. This involves training the system to understand industry-specific contexts and constraints, ensuring tailored outputs for diverse use cases. Metadata-driven enhancements remain a priority, with efforts directed at enriching semantic models to improve accuracy further. By aligning these developments with evolving user needs, the team aims to keep AI Calculated Fields at the forefront of generative AI innovation.
