How Salesforce Achieves Reliable, Low-Latency AI Inference
In our “Engineering Energizers” Q&A series, we highlight the engineering minds driving innovation across Salesforce. Today, we meet Nilesh Salpe, a key engineer developing Salesforce’s AI Metadata Service (AIMS), which provides tenant-specific configuration for AI inference across Salesforce, serving AI applications such as Agentforce through the AI Gateway service (a centralized service that handles AI requests).
Discover how Nilesh’s team designed a multi-layered caching system to eliminate a 400ms latency bottleneck, protected AI inference from backend outages, and balanced data freshness with resilience, achieving sub-millisecond performance without impacting downstream client teams.
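To make the freshness-versus-resilience trade-off concrete, here is a minimal, illustrative sketch of the kind of two-layer lookup such a system could use: a fresh in-memory TTL layer for sub-millisecond hits, backed by a last-known-good copy that is served if the origin metadata store is unavailable. The names (`LayeredConfigCache`, `origin_fetch`) and the TTL value are assumptions for illustration, not Salesforce’s actual implementation.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

class LayeredConfigCache:
    """Illustrative two-layer cache: a fresh TTL layer backed by a
    stale "last known good" copy served when the origin store is down."""

    def __init__(self, origin_fetch: Callable[[str], Dict[str, Any]],
                 ttl_seconds: float = 300.0):
        self._origin_fetch = origin_fetch  # slow path, e.g. the metadata store
        self._ttl = ttl_seconds            # freshness window for cached configs
        # tenant_id -> (expiry timestamp, config)
        self._fresh: Dict[str, Tuple[float, Dict[str, Any]]] = {}
        # tenant_id -> last successfully fetched config (no expiry)
        self._stale: Dict[str, Dict[str, Any]] = {}

    def get(self, tenant_id: str) -> Optional[Dict[str, Any]]:
        now = time.monotonic()
        entry = self._fresh.get(tenant_id)
        if entry and entry[0] > now:
            return entry[1]  # in-memory hit: sub-millisecond

        try:
            # Slow path: refresh from the origin store (this is where a
            # per-request lookup would otherwise cost hundreds of milliseconds).
            config = self._origin_fetch(tenant_id)
            self._fresh[tenant_id] = (now + self._ttl, config)
            self._stale[tenant_id] = config
            return config
        except Exception:
            # Origin outage: serve the last known good config rather than
            # failing the AI inference call outright.
            return self._stale.get(tenant_id)
```

The design choice this sketch illustrates matches the article’s framing: because tenant configurations rarely change, serving a slightly stale copy during a backend outage keeps AI inference available instead of turning the metadata store into a single point of failure.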
The team delivers highly available, scalable infrastructure to power AI workflows. This includes services like model management, orchestration, and AIMS, which stores tenant-specific configuration for every AI inference. Given Salesforce’s multi-cloud, multi-tenant architecture, each tenant might use different providers, such as OpenAI (ChatGPT) or Salesforce-hosted internal models, each with unique tuning parameters. These details are crucial for every inference call.
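As a rough illustration of what “tenant-specific configuration” could look like in code, the sketch below models a per-tenant inference config and a routing decision based on it. The type, field names, provider strings, and endpoints (`TenantInferenceConfig`, `route_inference`, the URLs) are hypothetical and not the actual AIMS schema.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass(frozen=True)
class TenantInferenceConfig:
    """Hypothetical shape of the per-tenant metadata a service like AIMS returns."""
    tenant_id: str
    provider: str                      # e.g. "openai" or "salesforce-hosted"
    model: str                         # e.g. "gpt-4o" or an internal model name
    tuning: Dict[str, float] = field(default_factory=dict)  # e.g. temperature, top_p

def route_inference(config: TenantInferenceConfig, prompt: str) -> str:
    """Illustrative gateway decision: pick an endpoint from the tenant's config."""
    if config.provider == "openai":
        endpoint = "https://api.openai.com/v1/chat/completions"   # external provider
    else:
        endpoint = f"https://internal-models.example/{config.model}/infer"  # placeholder
    # A real gateway would issue the request with the tenant's tuning parameters;
    # here we only show the routing decision the metadata makes possible.
    return f"POST {endpoint} (model={config.model}, tuning={config.tuning})"

# Example: two tenants with different providers resolve to different routes.
acme = TenantInferenceConfig("acme", "openai", "gpt-4o", {"temperature": 0.2})
globex = TenantInferenceConfig("globex", "salesforce-hosted", "xgen-small", {"top_p": 0.9})
print(route_inference(acme, "Summarize this case"))
print(route_inference(globex, "Summarize this case"))
```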
All AI traffic, including Agentforce, passes through the AI Gateway service, which calls AIMS to fetch the necessary metadata. Without this service, these systems would not know how to route requests or apply tenant-specific context. Although these configurations rarely change, they are essential for every call, making the service a critical dependency in the AI stack. The service’s latency and availability ...