Explore how big data processing successfully drove data migration and data activation for Salesforce Data Cloud.
If you’re paying attention to Salesforce technology, you’ve no doubt heard about Hyperforce, our new approach to deploying Salesforce on public cloud providers. Start with a look at Hyperforce’s architecture. There are many compelling reasons to move to Hyperforce, both for us and our customers. We’re excited to do it in the way that only Salesforce would — with trust, availability, and security at the forefront from day one. Building a unified infrastructure platform for Hyperforce meant taking a fresh look at our automation tools so we could scale our operations.
Salesforce has been around for over two decades. In 1999, when the company was founded, if you wanted to run a public internet software service (Software as a Service, or SaaS), you first had to get some servers and hook them up to the internet. So we built a few tools to perform our releases and database maintenance operations using SSH. Fast forward to 2015, when Salesforce took a very early bet on Kubernetes (K8s) to help manage an extensive suite of microservices. We’re proudly using it today across product lines and business units. And with our transformation to Hyperforce, building on cloud-native tools, security, and processes made the most sense.
To leverage the scale and agility of the world’s leading public cloud platforms, our Technology and Products team has worked together over the past few years to build a cloud-native task execution system to execute remote operational tasks at scale. Because we believe you may need to walk down this path, too, we’d like to share some challenges we faced and the solutions we identified.
Infrastructure and software failures will happen. We aspire to four nines (99.99%) availability. We know we need to optimize and improve Recovery Time Objective (RTO, the time it takes to restore service after a disruption) and Recovery Point Objective (RPO, the maximum acceptable data loss, measured in time). But how can we actually deliver high availability for our customers?
One of the missions of Salesforce’s engineering team is to prevent and minimize service disruptions. To achieve high availability for our customers, we design cloud-native architectural solutions that enable resilience to infrastructure failures and faster resolution of unplanned incidents. We adopt safe deployment practices for non-disruptive software updates and releases.
This post will share our architectural principles for high availability that we’ve learned over the years and are applying to the Salesforce Hyperforce platform.
At Salesforce, Trust is our number-one value, and it has its own special meaning to each part of the company. In our Technology, Marketing, & Products (TMP) organization, a big part of Trust is providing highly reliable Salesforce experiences to our customers, which can be challenging because of the scale of the Salesforce infrastructure, its range of tech stacks, and the many products that those tech stacks support. Because of that challenge — and because TMP must gauge reliability at both that high level (across products) and from a zoomed-in view (for individual services supporting those products) — agreeing on what “highly reliable” means and how to measure it is absolutely critical. So just as Salesforce employees refer to standardized branding guidelines to speak the same product language, we also need standardized service ownership guidelines to ensure that we’re speaking the same reliability language. This blog post is about the Salesforce journey to framing reliability in terms of service-level indicators (SLIs) and objectives (SLOs), which are often used in the enterprise software business to represent the true customer experience in a clear, quantitative, and actionable way.
In The Innovator’s Dilemma: When New Technologies Cause Great Firms to Fail, Clayton M. Christensen proposed the classic model of how large companies fail. Step one: the incumbent’s existing technology serves the mainstream market with better performance, but makes only minimal incremental improvements.
We can control the way Spark partitions our data and use it to parallelize computations on our dataset.
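The idea behind this — split the data into partitions, compute a local result per partition, then combine the partials — can be sketched without Spark itself. The following is a minimal stdlib illustration of that pattern, not Spark’s API; the function names are our own, and the thread pool stands in for Spark executors (it shows the structure of the computation rather than true multi-core speedup):

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, num_partitions):
    """Split data into roughly equal chunks, mimicking an RDD's partitions."""
    size, rem = divmod(len(data), num_partitions)
    chunks, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < rem else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum(data, num_partitions=4):
    """Map a local aggregate over each partition, then reduce the partials."""
    chunks = partition(data, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(sum, chunks))  # one local sum per partition
    return sum(partials)  # combine partial results
```

In Spark, the same shape appears as `rdd.repartition(n)` followed by a map and reduce; choosing the partition count to match the data size and cluster is what makes the computation scale.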
At Salesforce, we operate thousands of services of various sizes: monoliths and microservices, both customer-facing and internal, across multiple substrates, i.e., first-party and public cloud infrastructure. In our earlier blog “READS: Service Health Metrics,” we talked about the Service Level Objective (SLO) framework called READS that we developed at Salesforce to standardize SLO tracking for Salesforce services using a minimal set of indicators. Note that, at Salesforce, we consider SLO tracking for features as critical to our customers and our success as SLO tracking for the services that serve those features. So, from here on, when we use the word “service,” we mean both a service and a feature. A natural question to ask in this context might be: how do we manage the SLO onboarding process for services at scale? How do we simplify the Developer Experience (DX) for service owners? What are some key takeaways from our approaches?
Salesforce’s success is inseparable from the support of its underlying platform, the Salesforce Platform. At the core of the Salesforce Platform is a metadata-driven, multi-tenant data model. The platform uses metadata to manage every logical database object it uses internally.
We mitigated complexity in our Salesforce Commerce APIs by introducing a correlation ID as the unique identifier of requests across systems.
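The core of this pattern is simple: reuse the caller’s correlation ID when one is present, mint one at the system edge when it isn’t, and forward the same ID on every downstream call so logs across services can be joined on it. A minimal sketch, assuming a `X-Correlation-Id` header name (a common convention; the actual header used by the Commerce APIs may differ) and hypothetical function names:

```python
import uuid

CORRELATION_HEADER = "X-Correlation-Id"  # illustrative; real header name may differ


def ensure_correlation_id(headers):
    """Reuse the caller's correlation ID, or mint a fresh one at the edge."""
    cid = headers.get(CORRELATION_HEADER)
    return cid if cid is not None else str(uuid.uuid4())


def call_downstream(headers, payload):
    """Propagate the same correlation ID on the outgoing request's headers."""
    cid = ensure_correlation_id(headers)
    outgoing = dict(headers)
    outgoing[CORRELATION_HEADER] = cid
    # ... issue the real downstream request with `outgoing` here ...
    return outgoing
```

Because every log line and every hop carries the same ID, a single identifier is enough to reconstruct a request’s path across otherwise independent systems.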
Presenting Eflow, our ML flow management system. We share some of its fundamental principles that lend it unique scaling properties.
At Salesforce, we want to ensure all our experiences meet or exceed current WCAG (Web Content Accessibility Guidelines) standards. These accessibility guidelines, while comprehensive, can be overwhelming to absorb. They’re also open to interpretation. A group of engineers and designers across our accessibility and design systems teams partnered to evaluate the latest WCAG color contrast guidelines and streamline standards for our work. We wanted to fully integrate WCAG and hope our process can inspire your teams to do the same.
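The color contrast guidelines mentioned above boil down to a concrete, checkable formula: WCAG 2.x defines a contrast ratio between two colors as (L1 + 0.05) / (L2 + 0.05), where L1 and L2 are the relative luminances of the lighter and darker color, and requires at least 4.5:1 for normal-size text at level AA. A small sketch of that computation (function names are ours, the formula is from the spec):

```python
def _linearize(channel):
    """sRGB channel (0-255) -> linear value, per WCAG 2.x relative luminance."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (r, g, b) color, weighted per WCAG 2.x."""
    r, g, b = (_linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """(L_lighter + 0.05) / (L_darker + 0.05); ranges from 1:1 to 21:1."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa_normal_text(fg, bg):
    """WCAG 2.1 AA requires at least 4.5:1 for normal-size text."""
    return contrast_ratio(fg, bg) >= 4.5
```

Black on white yields the maximum 21:1 ratio, while a mid-gray like `#777777` on white lands just under 4.5:1 and fails AA for normal text — exactly the kind of borderline case that makes streamlined, tool-checked standards valuable.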
To leverage the scale and agility of the public cloud, we have built a new generation of infrastructure platform for Salesforce.
A framework prescribing the minimal set of indicators that every service needs to discern its health and performance.
By introducing a plug-in-like architecture, we could reshape how teams contribute native features to our mobile app.