Notifications are a key aspect of the Slack user experience. Users rely on timely notifications of mentions and DMs to keep on top of important information. Poor notification completeness erodes the trust of all Slack users.
Notifications flow through almost all the systems in our infrastructure. As illustrated in Figure 1 below, a notification request flows through the webapp (our application logic and web / Desktop client monorepo), job queue, push service, and several third-party services before hitting our iOS, Android, Desktop, or web clients.
Since 2017, our notification workflow has only grown more complex with the addition of new features like Huddles and Canvas. As a result, solving a notification issue could turn into a multi-day debugging session across several teams. Customer tickets related to notifications also had the lowest NPS and took the longest to resolve compared to other customer issues.
Debugging notification issues within our systems was difficult because each system had its own logging pipeline and data format, so an investigation meant querying several backends and reconciling data in different formats. The context in which events were logged also varied across systems, which prolonged investigations. Simply understanding what happened required deep technical expertise in every part of the stack and took several days.
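To make the format problem concrete, here is a minimal Python sketch of the reconciliation an engineer effectively had to do by hand. The three log formats, the field names, and the `Event` type are all hypothetical, invented for illustration; they are not Slack's actual log schemas.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """Common shape that every system's log record is normalized into."""
    notification_id: str
    system: str
    timestamp: datetime
    action: str


def from_webapp(line: str) -> Event:
    # Hypothetical webapp format: JSON with an epoch-seconds timestamp.
    rec = json.loads(line)
    ts = datetime.fromtimestamp(rec["ts"], tz=timezone.utc)
    return Event(rec["notification_id"], "webapp", ts, rec["event"])


def from_job_queue(line: str) -> Event:
    # Hypothetical job queue format: key=value pairs with an ISO 8601 timestamp.
    fields = dict(pair.split("=", 1) for pair in line.split())
    ts = datetime.fromisoformat(fields["time"])
    return Event(fields["notif"], "job_queue", ts, fields["status"])


def from_push_service(line: str) -> Event:
    # Hypothetical push service format: pipe-delimited, epoch milliseconds.
    millis, notif_id, action = line.split("|")
    ts = datetime.fromtimestamp(int(millis) / 1000, tz=timezone.utc)
    return Event(notif_id, "push_service", ts, action)


def timeline(events, notification_id):
    """Return one notification's events across all systems, oldest first."""
    matching = (e for e in events if e.notification_id == notification_id)
    return sorted(matching, key=lambda e: e.timestamp)


# Sample records (made up), deliberately listed out of order.
events = [
    from_push_service("1700000002000|n1|sent_to_provider"),
    from_webapp('{"ts": 1700000000, "notification_id": "n1", '
                '"event": "notification_requested"}'),
    from_job_queue("time=2023-11-14T22:13:21+00:00 notif=n1 status=dequeued"),
    from_push_service("1700000005000|n2|sent_to_provider"),
]

for event in timeline(events, "n1"):
    print(event.timestamp.isoformat(), event.system, event.action)
```

Even in this toy version, each system needs its own parser, its own timestamp handling, and its own name for the notification identifier before a single timeline can be assembled. That per-system knowledge is what made investigations slow and dependent on experts.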