Inside Salesforce’s Scalable Time Series Forecasting AI Platform

By Ahad Shoaib & Kyle Gilson.

Salesforce operates data centers worldwide, continuously monitoring infrastructure health metrics in real time. Accurate demand forecasting is essential for provisioning infrastructure capacity. Insufficient capacity can lead to customer-impacting incidents, while excess capacity may cause budget overruns. Teams such as capacity planning, finance, and performance engineering depend on reliable forecasts to ensure cloud infrastructure scales effectively, maintaining high availability and cost efficiency.

In early 2023, the Infrastructure Data Science (InfraDS) team faced a challenge: expanding infrastructure health forecasting to cover all 100+ services at Salesforce, rather than only the five critical services it had previously focused on. Drastically scaling the number of data scientists was clearly not the right answer. Instead, the team built a new configuration-driven Time Series Forecasting Platform designed to handle this increased scale.

As a result, the platform has grown from supporting five forecasting use cases to more than 70, generating millions of time series forecasts daily. The time required to deploy new models has also dropped from weeks to days. This expansion illustrates how Salesforce has scaled its time series AI platform to meet the demands of its multi-cloud, billion-dollar infrastructure.

Forecasting at scale presents unique challenges because no single modeling approach works for every problem. Each new use case requires data scientists to balance model accuracy, hierarchical coherence, awareness of concept drift, and resilience. For instance, stability becomes crucial in long-range forecasts, which often incorporate economic drivers as inputs. In contrast, short-range forecasts need to adapt swiftly to data drift and typically use little exogenous information.
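To make "hierarchical coherence" concrete: forecasts at different levels of a hierarchy (for example, individual hosts and the cluster they belong to) should add up. The simplest way to guarantee this is bottom-up reconciliation, where each parent forecast is replaced by the sum of its children's forecasts. The sketch below assumes a two-level hierarchy and uses hypothetical names (`bottom_up`, the host and cluster labels). It illustrates the concept and is not the platform's own reconciliation method.

```python
from typing import Dict, List

Forecast = List[float]  # one value per future time step


def bottom_up(children: Dict[str, Forecast],
              hierarchy: Dict[str, List[str]]) -> Dict[str, Forecast]:
    """Return coherent forecasts: each parent is the element-wise sum of its children.

    `hierarchy` maps a parent name to the names of its leaf children
    (a hypothetical layout chosen for illustration).
    """
    coherent: Dict[str, Forecast] = dict(children)
    for parent, kids in hierarchy.items():
        horizon = len(children[kids[0]])
        coherent[parent] = [sum(children[k][t] for k in kids) for t in range(horizon)]
    return coherent


# Independently fitted leaf forecasts (made-up numbers)
host_forecasts = {"host-a": [40.0, 42.0, 45.0], "host-b": [30.0, 31.0, 29.0]}
hierarchy = {"cluster-1": ["host-a", "host-b"]}

result = bottom_up(host_forecasts, hierarchy)
print(result["cluster-1"])  # [70.0, 73.0, 74.0]
```

Bottom-up is only one option. Top-down and optimal-combination approaches trade some leaf-level accuracy for better aggregate accuracy, which is exactly the kind of balance described above.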

Foundation models like Moirai and TimesFM offer flexible, zero-shot frameworks that handle diverse use cases and show great promise for generic forecasting problems. However, new time series models will keep emerging. And unlike NLP and computer vision, where complex models dominate, time series forecasting still relies heavily on simpler models like ARIMA, Prophet, and XGBoost for their interpretability and low overhead. Managing a high volume of these simpler, personalized models (one for each dataset) poses its own challenges and makes rapid iteration from local experiments to production imperative.

The InfraDS team must forecast a wide variety of time series, each with its own trend, seasonality, and change points.

Additionally, tooling for machine learning development remains fragmented. The surge in new AI applications has highlighted how immature ML lifecycle tooling still is. Developing a new model typically involves experimenting in bespoke Jupyter notebooks, rewriting that code to production quality, and using various MLOps tools to monitor the model in production.

While some groups at Salesforce successfully use managed tools like AWS SageMaker, the InfraDS team's specific architecture needs called for a custom solution. In practice, this meant maintaining forecasting models across multiple disparate codebases built with different technologies and languages, such as Python and R, and a variety of integration patterns. This sprawl of implementation options meant that moving a new model to production took weeks or even months, on top of significant ongoing maintenance costs.

This process often felt like reinventing the wheel for each new forecasting scenario, raising questions about the efficiency of the approach, given that time series forecasting fundamentally involves ingesting data, applying mathematical models, and outputting results to a database.
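The three-step shape described above (ingest, model, write) fits in a few lines. This is a toy illustration, not platform code. It uses an in-memory SQLite database in place of the real metrics store and a moving-average model as a placeholder for ARIMA, Prophet, or XGBoost. The table and function names (`metrics`, `forecasts`, `ingest`, `forecast`, `write`) are hypothetical.

```python
import sqlite3
from typing import Dict, List


def ingest(conn: sqlite3.Connection) -> Dict[str, List[float]]:
    """Step 1: read raw metrics, grouped by series and ordered by time."""
    rows = conn.execute("SELECT series_id, value FROM metrics ORDER BY series_id, ts").fetchall()
    series: Dict[str, List[float]] = {}
    for series_id, value in rows:
        series.setdefault(series_id, []).append(value)
    return series


def forecast(history: List[float], horizon: int, window: int = 3) -> List[float]:
    """Step 2: placeholder model that holds the mean of the last `window` points flat."""
    recent = history[-window:]
    level = sum(recent) / len(recent)
    return [level] * horizon


def write(conn: sqlite3.Connection, series_id: str, values: List[float]) -> None:
    """Step 3: persist forecasts to a (hypothetical) results table."""
    conn.executemany(
        "INSERT INTO forecasts (series_id, step, value) VALUES (?, ?, ?)",
        [(series_id, step, v) for step, v in enumerate(values, start=1)],
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (series_id TEXT, ts INTEGER, value REAL)")
conn.execute("CREATE TABLE forecasts (series_id TEXT, step INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?)",
    [("cpu", 1, 50.0), ("cpu", 2, 55.0), ("cpu", 3, 60.0), ("cpu", 4, 65.0),
     ("mem", 1, 70.0), ("mem", 2, 70.0), ("mem", 3, 73.0)],
)

# One small model per series, the same three steps every time
for series_id, history in ingest(conn).items():
    write(conn, series_id, forecast(history, horizon=2))

print(conn.execute("SELECT * FROM forecasts ORDER BY series_id, step").fetchall())
```

Only the middle step is specific to a use case. The ingestion and write steps are the same every time, and that repetition is what makes them candidates for a shared platform.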

Inspired by "human-centric" frameworks like Metaflow, the team decided to standardize and abstract the data and compute infrastructure requirements common to every forecasting project, while keeping algorithmic development flexible.

Hiding the engineering tooling behind a YAML interface lets data scientists focus on modeling, while platform maintainers keep the infrastructure running smoothly. This approach provides a unified interface for the entire model lifecycle, including backtesting, distributed processing, and deployment, with built-in security and scalability.

InfraDS simplified its approach to managing models at scale by treating them as cattle instead of pets.

Consequently, the forecasting service was built around three key design principles:

Reproducible Environments: Moving a model from a local environment to production, and back again, should be a simple "one-click" operation that encourages rapid experimentation.
Configuration-as-Code: Infrastructure details such as network connectivity, compute scaling, and orchestration should be hidden from users behind declarative YAML configuration, following the principle of separation of concerns.
Autonomy: Data scientists should have the right tools to enable end-to-end ownership of their models, from ideation to production, with robust monitoring and troubleshooting capabilities.

Time series modeling has its own considerations, such as hierarchical reconciliation, uncertainty, and seasonality. These still fit well into a small set of core abstractions that do not lock data scientists into a single model type:

Data: Use SQL queries with Jinja templating to ingest raw metrics and features.
Algorithms: Focus on feature engineering (for example, removing anomalies) and tuning model hyperparameters.
Post-Processing: Set alert thresholds, for example, flagging a predicted breach of 80% CPU usage within the next three months.
Model Evaluation: Track accuracy and forecast quality metrics for continuous Service Level Objective monitoring.
Orchestration: Manage scheduling dependencies and compute resources efficiently.
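The Post-Processing and Model Evaluation steps can be illustrated with two small helpers. This is a minimal sketch under several assumptions: a daily forecast of CPU utilization in percent, a 90-day window as a stand-in for "three months," and mean absolute percentage error (MAPE) as the accuracy metric. The function names and data are hypothetical, and the platform's actual thresholds and metrics may differ.

```python
from datetime import date, timedelta
from typing import List, Optional


def first_breach(forecast: List[float], start: date, threshold: float = 80.0,
                 horizon_days: int = 90) -> Optional[date]:
    """Return the first date within the horizon where the forecast reaches the threshold.

    Assumes one forecast value per day, beginning on `start`; 90 days approximates
    "three months". Returns None if no breach is predicted.
    """
    for day, value in enumerate(forecast[:horizon_days]):
        if value >= threshold:
            return start + timedelta(days=day)
    return None


def mape(actual: List[float], predicted: List[float]) -> float:
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical daily CPU forecast rising 0.5 percentage points per day from 60%
cpu_forecast = [60.0 + 0.5 * d for d in range(120)]
print(first_breach(cpu_forecast, start=date(2024, 1, 1)))  # 2024-02-10
print(round(mape([50.0, 60.0], [55.0, 54.0]), 2))  # 10.0
```

Keeping these steps as small, pure functions makes them easy to reuse across use cases and configure per use case. For example, the threshold and horizon could come from the config file rather than being hard-coded.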

This structured yet flexible approach lets every forecasting service be built and operated the same way, while still accommodating the specific needs of each use case and leaving data scientists in control of their models.

Data scientists can easily make isolated changes to the config file, such as selecting a different algorithm or forecast horizon.
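An isolated config change of this kind is only safe if the config is validated before it runs. The article credits Pydantic for this. The sketch below instead uses only the standard library's `dataclasses` as a dependency-free stand-in. The field names (`algorithm`, `horizon_days`, `query`) and the list of supported algorithms are hypothetical, and the sketch assumes the YAML file has already been parsed into a Python dict.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict

# Hypothetical set of algorithm names the engine accepts
SUPPORTED_ALGORITHMS = {"arima", "prophet", "xgboost"}


@dataclass(frozen=True)
class ForecastConfig:
    """Validated view of one forecasting config file (field names are illustrative)."""
    name: str
    algorithm: str
    horizon_days: int
    query: str

    def __post_init__(self) -> None:
        # Fail fast on bad values, before any data is loaded or compute is spent
        if self.algorithm not in SUPPORTED_ALGORITHMS:
            raise ValueError(
                f"unknown algorithm {self.algorithm!r}; "
                f"expected one of {sorted(SUPPORTED_ALGORITHMS)}"
            )
        if not isinstance(self.horizon_days, int) or self.horizon_days <= 0:
            raise ValueError("horizon_days must be a positive integer")

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "ForecastConfig":
        # Reject typos such as "horizon_dyas" instead of silently ignoring them
        allowed = {f.name for f in fields(cls)}
        unknown = set(raw) - allowed
        if unknown:
            raise ValueError(f"unknown keys: {sorted(unknown)}")
        return cls(**raw)


# Stand-in for the dict a YAML parser would return for one config file
raw_config = {
    "name": "cpu-utilization",
    "algorithm": "prophet",
    "horizon_days": 90,
    "query": "SELECT ts, value FROM cpu_metrics",
}
config = ForecastConfig.from_dict(raw_config)
print(config.algorithm, config.horizon_days)  # prophet 90

# Swapping the algorithm is a one-key edit; an unsupported value is rejected up front
try:
    ForecastConfig.from_dict({**raw_config, "algorithm": "lstm"})
except ValueError as err:
    print(err)
```

Pydantic would add type coercion and richer error reports on top of this, but the principle is the same: once a config is accepted, the engine can trust every field in it.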

Safety Guarantees: Robust production code quality and type safety in the forecasting engine were essential, because a single code change could affect more than 70 different configurations. Modern Python tooling such as Mypy and Pydantic, both highly valued at Salesforce, provided strong code safety guarantees with minimal performance impact. Following the principle of "once it compiles, it'll work first try," the team implemented numerous guardrails and validations for configuration files, so that once a YAML file was accepted, the rest of the process ran seamlessly.
Extensible Model Selection: The system was designed to incorporate new models or algorithms without extensive re-architecting. Both custom and off-the-shelf models are supported, and any external algorithm that follows a sklearn-like model.fit() and model.predict() interface can be integrated easily. With new model architectures emerging regularly, the ability to keep up was crucial.
Flexible Compute Backends: The platform integrates with multiple compute backends, such as Spark and Kubernetes, which is vital for balancing model development velocity against production scalability. A large Spark cluster may be overkill for experimenting with 10 time series, but the flexibility to scale from one machine to N allowed data scientists to concentrate on refining models rather than worrying about concurrency patterns and the complexities of the latest distributed computing frameworks.
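The model-selection and compute-backend ideas can be combined in one small sketch. It assumes three hypothetical pieces: a `Forecaster` protocol with sklearn-style `fit`/`predict` methods, a string-keyed model registry that a config file's algorithm field could point into, and interchangeable backends that all accept the same list of jobs. A thread pool from Python's standard library stands in for a distributed backend such as Spark or Kubernetes. None of these names come from the platform itself.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Protocol, Tuple

# (series_id, algorithm name, history, horizon) -- a hypothetical job layout
Job = Tuple[str, str, List[float], int]


class Forecaster(Protocol):
    """Anything with an sklearn-like fit/predict pair can plug in."""
    def fit(self, y: List[float]) -> "Forecaster": ...
    def predict(self, horizon: int) -> List[float]: ...


class NaiveForecaster:
    """Repeats the last observed value."""
    def fit(self, y: List[float]) -> "NaiveForecaster":
        self.last_ = y[-1]
        return self

    def predict(self, horizon: int) -> List[float]:
        return [self.last_] * horizon


class MeanForecaster:
    """Repeats the historical mean."""
    def fit(self, y: List[float]) -> "MeanForecaster":
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, horizon: int) -> List[float]:
        return [self.mean_] * horizon


# Config strings map to model factories; adding a model is one new entry
MODEL_REGISTRY: Dict[str, Callable[[], Forecaster]] = {
    "naive": NaiveForecaster,
    "mean": MeanForecaster,
}


def fit_predict(job: Job) -> Tuple[str, List[float]]:
    series_id, algorithm, history, horizon = job
    model = MODEL_REGISTRY[algorithm]()  # a fresh model per series: cattle, not pets
    return series_id, model.fit(history).predict(horizon)


def run_local(jobs: List[Job]) -> Dict[str, List[float]]:
    """Serial backend for quick experiments on a laptop."""
    return dict(map(fit_predict, jobs))


def run_threaded(jobs: List[Job], workers: int = 4) -> Dict[str, List[float]]:
    """Parallel backend; a local stand-in for a distributed one such as Spark."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(fit_predict, jobs))


BACKENDS = {"local": run_local, "threaded": run_threaded}

jobs: List[Job] = [
    ("cpu", "naive", [50.0, 55.0, 60.0], 2),
    ("mem", "mean", [70.0, 72.0, 74.0], 2),
]
for name, backend in BACKENDS.items():
    print(name, backend(jobs))
```

Every backend takes the same jobs and returns the same mapping. Moving from a laptop run to a parallel one is therefore a one-word change that leaves the model code untouched. For CPU-bound models, a process pool or a real cluster would suit better than threads, which are used here only to keep the sketch self-contained.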