The SPACE of Developer Productivity

NICOLE FORSGREN, GitHub
MARGARET-ANNE STOREY, University of Victoria
CHANDRA MADDILA, THOMAS ZIMMERMANN, BRIAN HOUCK, AND JENNA BUTLER, Microsoft Research

There’s more to it than you think.
Developer productivity is complex and nuanced,
with important implications for software
development teams. A clear understanding of
defining, measuring, and predicting developer
productivity could provide organizations,
managers, and developers with the ability to make higher-quality software—and make it more efficiently.
Developer productivity has been studied extensively.
Unfortunately, after decades of research and practical
development experience, knowing how to measure
productivity or even define developer productivity has
remained elusive, while myths about the topic are common.
Far too often teams or managers attempt to measure
developer productivity with simple metrics, attempting to
capture it all with “one metric that matters.”
One important measure of productivity is personal
perception;1 this may resonate with those who claim to be
in “a flow” on productive days.
There is also agreement that developer productivity
is necessary not just to improve engineering outcomes,
but also to ensure the well-being and satisfaction of
developers, as productivity and satisfaction are intricately
connected.12,20
acmqueue | january-february 2021 1
Ensuring the efficient development of software
systems and the well-being of developers has never
been more important as the Covid-19 pandemic has
forced the majority of software developers worldwide
to work from home,17 disconnecting developers and
managers from their usual workplaces and teams.
Although this was unexpected and unfortunate, this
change constitutes a rare “natural experiment” that
statisticians can capitalize upon to study, compare, and
understand developer productivity across many different
contexts. This forced disruption and the future transition
to hybrid remote/colocated work expedites the need to
understand developer productivity and well-being, with
wide agreement that doing so in an efficient and fair way is
critical.
This article explicates several common myths and
misconceptions about developer productivity. The most
important takeaway from exposing these myths is that
productivity cannot be reduced to a single dimension (or
metric!). The prevalence of these myths and the need
to bust them motivated our work to develop a practical
multidimensional framework, because only by examining
a constellation of metrics in tension can we understand
and influence developer productivity. This framework,
called SPACE, captures the most important dimensions
of developer productivity: satisfaction and well-being;
performance; activity; communication and collaboration;
and efficiency and flow. By recognizing and measuring
productivity with more than just a single dimension, teams
and organizations can better understand how people and
teams work, and they can make better decisions.
The article demonstrates how this framework can
be used to understand productivity in practice and why
using it will help teams better understand developer
productivity, create better measures to inform their
work and teams, and may positively impact engineering
outcomes and developer well-being.
MYTHS AND MISCONCEPTIONS ABOUT
DEVELOPER PRODUCTIVITY
A number of myths about developer productivity have
accumulated over the years. Awareness of these
misconceptions leads to a better understanding of
measuring productivity.
MYTH: Productivity is all about developer activity
This is one of the most common myths, and it can cause
undesirable outcomes and developer dissatisfaction.
Sometimes, higher volumes of activity appear for various
reasons: working longer hours may signal developers
having to “brute-force” work to overcome bad systems
or poor planning to meet a predefined release schedule.
On the other hand, increased activity may reflect better
engineering systems, providing developers with the
tools they need to do their jobs effectively, or better
collaboration and communication with team members in
unblocking their changes and code reviews.
Activity metrics alone do not reveal which of these is
the case, so they should never be used in isolation either
to reward or to penalize developers. Even straightforward
metrics such as number of pull requests, commits, or
code reviews are prone to errors because of gaps in data
and measurement errors, and systems that report these
metrics will miss the benefits of collaboration seen in peer
programming or brainstorming. Finally, developers often
flex their hours to meet deadlines, making certain activity
measures difficult to rely on in assessing productivity.
MYTH: Productivity is only about individual performance
While individual performance is important, contributing
to the success of the team is also critical to measuring
productivity. Measures of performance that balance the
developer, team, and organization are important. Similar
to team sports, success is judged both by a player’s
personal performance as well as the success of their team.
A developer who optimizes only for their own personal
productivity may hurt the productivity of the team. More
team-focused activities such as code reviews, on-call
rotations, and developing and managing engineering
systems help maintain the quality of the code base and the
product/service. Finding the right balance in optimizing for
individual, team, and organizational productivity, as well as
understanding possible tradeoffs, is key.
MYTH: One productivity metric can tell us everything
One common myth about developer productivity is that it
can be captured in a universal metric, and that this “one
metric that matters” can be used to score teams on their overall work
and to compare teams across an organization and even an
industry. This isn’t true. Productivity represents several
important dimensions of work and is greatly influenced by
the context in which the work is done.
MYTH: Productivity measures are useful only
for managers
Developers often say that productivity measures aren’t
useful. This may come from the misuse of measures by
leaders or managers, and it’s true that when productivity
is poorly measured and implemented, it can lead to
inappropriate usage in organizations. It’s disappointing that
productivity has been co-opted this way, but it’s important
to note that developers have found value in tracking their
own productivity—both for personal reasons and for
communicating with others.
By remembering that developer productivity is
personal,7 developers can leverage it to gain insights into
their work so they can take control of their time, energy,
and days. For example, research has shown that high
productivity is highly correlated with feeling satisfied and
happy with work.12,20 Finding ways to improve productivity
is also about finding ways to introduce more joy, and
decrease frustration, in a developer’s day.
MYTH: Productivity is only about engineering systems
and developer tools
While developer tools and workflows have a large
impact on developer productivity, human factors such as
environment and work culture have substantial impact too.
Often the critical work needed to keep the environment
and culture healthy can be “invisible” to many members
of the organization or to metrics traditionally used for
measuring productivity. Work such as morale building,
mentoring, and knowledge sharing are all critical to
supporting a productive work environment and yet are
often not measured. The “invisible” work that benefits the
overall productivity of the team is just as important as
other more commonly measured dimensions.21
SPACE: A FRAMEWORK FOR UNDERSTANDING
DEVELOPER PRODUCTIVITY
Productivity is about more than the individual or the
engineering systems; it cannot be measured by a single
metric or activity data alone; and it isn’t something that
only managers care about. The SPACE framework was
developed to capture different dimensions of productivity
because without it, the myths just presented will persist.
The framework provides a way to think rationally about
productivity in a much bigger space and to choose metrics
carefully in a way that reveals not only what those metrics
mean, but also what their limitations are if used alone or in
the wrong context.
Satisfaction and well-being
Satisfaction is how fulfilled developers feel with their
work, team, tools, or culture; well-being is how healthy
and happy they are, and how their work impacts it.
Measuring satisfaction and well-being can be beneficial
for understanding productivity20 and perhaps even for
predicting it.15 For example, productivity and satisfaction
are correlated, and it is possible that satisfaction could
serve as a leading indicator for productivity; a decline
in satisfaction and engagement could signal upcoming
burnout and reduced productivity.13
For example, when many places shifted to mandatory
work from home during the pandemic, an uptick occurred
in some measures of productivity (e.g., code commits
and speed to merge pull requests).8 Qualitative data,
however, has shown that some people were struggling
with their well-being.3 This highlights the importance
of balanced measures that capture several aspects of
productivity: While some activity measures looked positive,
additional measures of satisfaction painted a more holistic
picture, showing that productivity is personal, and some
developers were approaching burnout. To combat this,
some software groups in large organizations implemented
“mental health” days—essentially, free days off to help
people avoid burnout and improve well-being.
It is clear that satisfaction and well-being are important
dimensions of productivity. These qualities are often
best captured with surveys. To assess the satisfaction
dimension, you might measure the following:
- Employee satisfaction. The degree of satisfaction among employees, and whether they would recommend their team to others.
- Developer efficacy. Whether developers have the tools and resources they need to get their work done.
- Burnout. Exhaustion caused by excessive and prolonged workplace stress.
Performance
Performance is the outcome of a system or process. The
performance of software developers is hard to quantify,
because it can be difficult to tie individual contributions
directly to product outcomes. A developer who produces
a large amount of code may not be producing high-quality
code. High-quality code may not deliver customer
value. Features that delight customers may not always
result in positive business outcomes. Even if a particular
developer’s contribution can be tied to business outcomes,
it is not always a reflection of performance since the
developer may have been assigned a less impactful
task, instead of having agency to choose more impactful
work. Furthermore, software is often the sum of many
developers’ contributions, exacerbating the difficulty in
evaluating the performance of any individual developer. In
many companies and organizations, software is written by
teams, not individuals.
For these reasons, performance is often best evaluated
as outcomes instead of output. The most simplified view of
software developer performance could be: Did the code
written by the developer reliably do what it was supposed
to do? Example metrics to capture the performance
dimension include:
- Quality. Reliability, absence of bugs, ongoing service health.
- Impact. Customer satisfaction, customer adoption and retention, feature usage, cost reduction.
Activity
Activity is a count of actions or outputs completed in the
course of performing work. Developer activity, if measured
correctly, can provide valuable but limited insights about
developer productivity, engineering systems, and team
efficiency. Because of the complex and diverse activities
that developers perform, their activity is not easy to
measure or quantify. In fact, it is almost impossible to
comprehensively measure and quantify all the facets
of developer activity across engineering systems and
environments. A well-designed engineering system,
however, will help in capturing activity metrics along
different phases of the software development life cycle
and quantify developer activity at scale. Some of the
developer activities that can be measured and quantified
relatively easily are:
- Design and coding. Volume or count of design documents and specs, work items, pull requests, commits, and code reviews.
- Continuous integration and deployment. Count of build, test, deployment/release, and infrastructure utilization.
- Operational activity. Count or volume of incidents/issues and distribution based on their severities, on-call participation, and incident mitigation.
These metrics can be used as waypoints to measure
some tractable developer activities, but they should never
be used in isolation to make decisions about individual or
team productivity because of their known limitations. They
serve as templates to start with and should be customized
based on organizational needs and development
environments. As mentioned earlier, many activities that
are essential to developing software are intractable (such
as attending team meetings, participating in brainstorming,
helping other team members when they encounter issues,
and providing architectural guidance, to name a few).
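The tractable counts above are typically rolled up from engineering-system telemetry. As a minimal sketch, the event records here are hypothetical stand-ins for data a real version-control, CI/CD, or incident system would export:

```python
from collections import Counter

# Hypothetical activity events; in practice these would be exported from
# version control, CI/CD pipelines, and incident-management tooling.
events = [
    {"dev": "ana", "kind": "commit"},
    {"dev": "ana", "kind": "code_review"},
    {"dev": "ben", "kind": "commit"},
    {"dev": "ana", "kind": "deployment"},
    {"dev": "ben", "kind": "incident_mitigation"},
]

# Tally per-developer activity by kind -- a waypoint for discussion,
# never a standalone verdict on anyone's productivity.
activity = Counter((e["dev"], e["kind"]) for e in events)
print(activity[("ana", "commit")], activity[("ben", "commit")])  # 1 1
```

Note what this tally cannot see: the meetings, mentoring, and unblocking work the paragraph above calls intractable.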
Communication and collaboration
Communication and collaboration capture how people
and teams communicate and work together. Software
development is a collaborative and creative task that relies
on extensive and effective communication, coordination,
and collaboration within and between teams.11 Effective
teams that successfully contribute to and integrate
each other’s work efficiently rely on high transparency5
and awareness6 of team member activities and task
priorities. In addition, how information flows within and
across teams impacts the availability and discoverability
of documentation that is needed for the effective
alignment and integration of work. Teams that are diverse
and inclusive are higher performing.22 More effective
teams work on the right problems, are more likely to be
successful at brainstorming new ideas, and will choose
better solutions from all the alternatives.
Work that contributes to a team’s outcomes or supports
another team member’s productivity may come at the
expense of an individual’s productivity and their own ability
to get into a state of flow, potentially reducing motivation
and satisfaction. Effective collaboration, however, can
drive down the need for some individual activities (e.g.,
unnecessary code reviews and rework), improve system
performance (faster pull request merges may improve
quality by avoiding bugs), and help sustain productivity and
avoid (or conversely, if not done right, increase) burnout.
Understanding and measuring team productivity and
team member expectations are, however, complicated
because of items that are difficult to measure such as
invisible work21 and articulation work for coordinating
and planning team tasks.18 That said, the following are
examples of metrics that may be used as proxies to
measure communication, collaboration, and coordination:
- Discoverability of documentation and expertise.
- How quickly work is integrated.
- Quality of reviews of work contributed by team members.
- Network metrics that show who is connected to whom and how.
- Onboarding time for and experience of new members.
Efficiency and flow
Finally, efficiency and flow capture the ability to complete
work or make progress on it with minimal interruptions or
delays, whether individually or through a system. This can
include how well activities within and across teams are
orchestrated and whether continuous progress is being made.
Some research associates productivity with the ability
to get complex tasks done with minimal distractions or
interruptions.2 This conceptualization of productivity is
echoed by many developers when they talk about “getting
into the flow” when doing their work—or the difficulty
in finding and optimizing for it, with many books and
discussions addressing how this positive state can be
achieved in a controlled way.4 For individual efficiency
(flow), it’s important to set boundaries to get productive
and stay productive—for example, by blocking off time for
a focus period. Individual efficiency is often measured by
uninterrupted focus time or the time within value-creating
apps (e.g., the time a developer spends in the integrated
development environment is likely to be considered
“productive” time).
At the team and system level, efficiency is related to
value-stream mapping, which captures the steps needed
to take software from idea and creation to delivering it to
the end customer. To optimize the flow in the value stream,
it is important to minimize delays and handoffs. The DORA
(DevOps Research and Assessment) framework introduced
several metrics to monitor flow within teams9—for
example, deployment frequency measures how often an
organization successfully releases to production, and lead
time for changes measures the amount of time it takes a
commit to get into production.
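Both DORA metrics reduce to timestamp arithmetic over pipeline data. Here is a minimal sketch; the commit and deployment times are hypothetical, and real values would come from version-control and deployment telemetry:

```python
from datetime import datetime
from statistics import median

# Hypothetical (commit_time, deployed_time) pairs for changes that
# reached production during the period being measured.
changes = [
    (datetime(2021, 1, 4, 9, 0),  datetime(2021, 1, 4, 17, 0)),
    (datetime(2021, 1, 5, 10, 0), datetime(2021, 1, 6, 10, 0)),
    (datetime(2021, 1, 6, 8, 0),  datetime(2021, 1, 6, 12, 0)),
]

# Lead time for changes: commit -> running in production, in hours.
lead_hours = [(done - commit).total_seconds() / 3600 for commit, done in changes]

# Deployment frequency: successful releases per day over the window.
days_observed = 3
deploys_per_day = len(changes) / days_observed

print(median(lead_hours), deploys_per_day)  # 8.0 1.0
```

Medians (or percentiles) are usually preferred over means here, since one long-lived change can dominate an average.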
In addition to the flow of changes through the system,
the flow of knowledge and information is important.
Certain aspects of efficiency and flow may be hard to
measure, but it is often possible to spot and remove
inefficiencies in the value stream. Activities that produce
no value for the customer or user are often referred to as
software development waste19—for example, duplicated
work, rework because the work was not done correctly, or
time-consuming rote activities.
Some example metrics to capture the efficiency and
flow dimension are:
- Number of handoffs in a process; number of handoffs across different teams in a process.
- Perceived ability to stay in flow and complete work.
- Interruptions: quantity, timing, how spaced, impact on development work and flow.
- Time measures through a system: total time, value-added time, wait time.
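The time measures in the last item can be sketched from stage timestamps. In this illustration the stage names and timings are hypothetical; value-added time is the time a work item is actively worked on, and wait time is the remainder:

```python
# Hypothetical stage timings (hours from kickoff) for one work item
# moving through a delivery pipeline.
stages = [
    {"name": "code",   "start": 0.0,  "end": 6.0},
    {"name": "review", "start": 10.0, "end": 11.0},  # sat 4h waiting for a reviewer
    {"name": "deploy", "start": 11.5, "end": 12.0},
]

total_time = stages[-1]["end"] - stages[0]["start"]       # elapsed end to end
value_added = sum(s["end"] - s["start"] for s in stages)  # active work only
wait_time = total_time - value_added                      # queues and handoffs
print(total_time, value_added, wait_time)  # 12.0 7.5 4.5
```

In this toy value stream, more than a third of the elapsed time is waiting, which is exactly the kind of inefficiency value-stream mapping is meant to expose.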
Efficiency is related to all the SPACE dimensions.
Efficiency at the individual, team, and system levels has
been found to be positively associated with increased
satisfaction. Higher efficiency, however, may also
negatively affect other factors. For example, maximizing
flow and speed may decrease the quality of the system
and increase the number of bugs visible to customers
(performance). Optimizing for individual efficiency by
reducing interruptions may decrease the ability to
collaborate, block others’ work, and reduce the ability of
the team to brainstorm.
FRAMEWORK IN ACTION
To illustrate the SPACE framework, figure 1 lists concrete
metrics that fall into each of the five dimensions. The
figure provides examples of individual-, team- or group-,
and system-level measures. Three brief discussions about
these metrics follow: First, an example set of metrics
concerning code review is shown to cover all dimensions of
the SPACE framework, depending on how they are defined
and proxied. Next, additional examples are provided for
two select dimensions of the framework: activity, and
efficiency and flow. The section closes with a discussion
of how to use the framework: combining metrics for a
holistic understanding of developer productivity, as well
as cautions. The accompanying sidebar shows how the
framework can be used for understanding productivity in
incident management.
Let’s begin with code review as an example scenario that
presents a set of metrics that can cover all five dimensions
of the SPACE framework, depending on how it is framed
and which metric is used:
- Satisfaction. Perceptual measures about code reviews
can reveal whether developers view the work in a good or
bad light—for example, if they present learning, mentorship,
or opportunities to shape the codebase. This is important,
FIGURE 1: Example metrics. [Figure: a table of example metrics for each SPACE dimension at three levels—Individual (one person), Team or Group (people that work together), and System (end-to-end work through a system, like a development pipeline). Recoverable entries include developer satisfaction; retention†; satisfaction with code reviews assigned; perception of code reviews; satisfaction with engineering system (e.g., CI/CD pipeline); code review velocity; code review score (quality or thoughtfulness); code review timing; PR merge times; acceptance rate; number of code reviews completed; coding time; number of commits; lines of code†; story points shipped/completed†; quality of meetings†; knowledge sharing or discoverability (quality of documentation); handoffs; frequency of deployments; customer satisfaction; reliability (uptime); productivity perception; lack of interruptions; and velocity/flow through the system.]
† Use these metrics with (even more) caution—they can proxy more things.
because the number of code reviews per developer may
signal dissatisfaction if some developers feel they are
consistently assigned a disproportionate amount of code
reviews, leaving them with less time for other work.
- Performance. Code-review velocity captures the speed
of reviews; because this can reflect both how quickly an
individual completes a review and the constraints of the
team, it is both an individual- and a team-level metric. (For
example, an individual could complete a review within an
hour of being assigned, but a team could have a policy of
leaving all reviews open for 24 hours to allow all team
members to see the proposed changes.)
- Activity. Number of code reviews completed is an
individual metric capturing how many reviews have been
completed in a given time frame, and contributes to the
final product.
- Communication and collaboration. Code reviews
themselves are a way that developers collaborate
through code, and a measure or score of the quality or
thoughtfulness of code reviews is a great qualitative
measure of collaboration and communication.
- Efficiency and flow. Code review is important but can
cause challenges if it interrupts workflow or if delays
cause constraints in the system. Similarly, having to
wait for a code review can delay a developer’s ability to
continue working. Batching up code reviews so they don’t
interrupt a developer’s coding time (which would impact
individual measures), while also not causing delays in
the throughput of the system (which impacts system
measures), allows teams to deliver code efficiently (team-level
measures). Therefore, measuring the effects of code-review
timing on the efficiency and flow of individuals,
teams, and the system is important—this can be done
through perceptual or telemetry measures that capture
the time to complete reviews and the characteristics of
interruptions (such as timing and frequency).
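Several of these code-review measures come down to simple timestamp arithmetic. Here is a minimal sketch; the review records are hypothetical, and a real analysis would pull assignment and completion times from the review tool:

```python
from datetime import datetime
from statistics import median

# Hypothetical review records: when each review was assigned and completed.
reviews = [
    {"assigned": datetime(2021, 1, 4, 9, 0),  "done": datetime(2021, 1, 4, 10, 0)},
    {"assigned": datetime(2021, 1, 4, 9, 0),  "done": datetime(2021, 1, 5, 9, 0)},
    {"assigned": datetime(2021, 1, 5, 14, 0), "done": datetime(2021, 1, 5, 16, 0)},
]

# Code-review velocity as hours from assignment to completion. A deliberate
# team policy (e.g., leaving reviews open for 24 hours so everyone can see
# the change) shows up in this number too, so interpret it against team
# norms rather than as an individual score.
turnaround_h = [(r["done"] - r["assigned"]).total_seconds() / 3600 for r in reviews]
print(median(turnaround_h))  # 2.0
```

The outlier in the sample (24 hours) is why the median, not the maximum or mean, is the more defensible summary.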
Let’s examine the SPACE framework in more depth by
looking further at the dimensions of (1) activity and (2)
efficiency and flow. In this example, the activity measures
are individual-level metrics: number of commits, coding
time (total time spent or times of day), and number of code
reviews completed. These best describe work that directly
contributes to the final product, understanding that work
patterns and behaviors are influenced by the teams and
environments in which developers work.
Efficiency and flow have a broader mix of metrics. Self-reported
measures of productivity are best captured at
the individual level: asking a developer whether the team
is productive is subject to blind spots, while asking if that
member felt productive or was able to complete work
with minimal distractions is a useful signal. You can also
measure the flow of work—whether code, documents, or
other items—through a system, and capture metrics such
as the time it takes or the number of handoffs, delays,
and errors in the software delivery pipeline. These would
constitute system-level metrics, because their values
would capture the journey of the work item through the
entire workflow, or system.
HOW TO USE THE FRAMEWORK
To measure developer productivity, teams and leaders
(and even individuals) should capture several metrics
across multiple dimensions of the framework—at least
three are recommended. For example, if you’re already
measuring commits (an activity measure), don’t simply
add the number of pull requests and coding time to your
metrics dashboard, as these are both activity metrics.
Adding these can help round out the way you capture the
activity dimension of productivity, but to really understand
productivity, add at least one metric from two different
dimensions: perhaps perception of productivity and pull
request merge time.
Another recommendation is that at least one of the
metrics include perceptual measures such as survey
data. By including perceptions about people’s lived
experiences, a more complete picture of productivity can
be constructed. Many times, perceptual data may provide
more accurate and complete information than what can be
observed from instrumenting system behavior alone.10
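Both recommendations can even be checked mechanically once metrics are tagged by dimension. In this sketch the metric names, dimension tags, and perceptual flags are illustrative choices, not part of the framework:

```python
# Hypothetical metric selection, each tagged with its SPACE dimension and
# whether it is a perceptual (survey-based) measure.
chosen = [
    {"metric": "perceived productivity",  "dimension": "satisfaction",    "perceptual": True},
    {"metric": "pull request merge time", "dimension": "efficiency_flow", "perceptual": False},
    {"metric": "number of commits",       "dimension": "activity",        "perceptual": False},
]

dimensions = {m["dimension"] for m in chosen}
# The two recommendations: at least three dimensions, at least one
# perceptual measure.
assert len(dimensions) >= 3, "capture at least three SPACE dimensions"
assert any(m["perceptual"] for m in chosen), "include at least one perceptual measure"
print("metric set is balanced enough to start")
```

A check like this is a lint, not a guarantee: it confirms breadth, while judging whether the chosen metrics are healthy in tension still takes human review.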
Including metrics from multiple dimensions and types
of measurements often creates metrics in tension; this
is by design, because a balanced view provides a truer
picture of what is happening in your work and systems.
This more balanced view should help to reinforce smarter
decisions and tradeoffs among team members, who may
otherwise understandably focus on one aspect of work to
the detriment of the whole system.
One example is story points, a metric commonly used
in Agile development processes to assess team-level
progress. If a team is rated only on story points, members
will focus on optimizing their own points, to the detriment
of completing potentially invisible work that is important
to other developers’ progress and to the company if that
means collaborating with other teams or onboarding
future developers. And if leaders measured progress
using story points without asking developers about their
ability to work quickly, they wouldn’t be able to identify
if something wasn’t working and the team was doing
workarounds and burning out, or if a new innovation was
working particularly well and could be used to help other
teams that may be struggling.
This leads to an important point about metrics and their
effect on teams and organizations: They signal what is
important. One way to see indirectly what is important in
an organization is to see what is measured, because that
often communicates what is valued and influences the way
people behave and react. For example, companies that
care about employee health, well-being, and retention will
likely include the satisfaction and well-being dimension in
their productivity measures. As a corollary, adding to or
removing metrics can nudge behavior, because that also
communicates what is important.
For example, a team where “productivity = lines of code”
alone is very different from a team where “productivity
= lines of code AND code review quality AND customer
satisfaction.” In this case, you have kept a (problematic, but
probably embedded) metric about productivity and output,
but nudged perceptions about productivity in a direction
that also values both teamwork (by valuing thoughtful
code reviews) and the end user (by valuing customer
satisfaction).
Metrics shape behavior, so by adding and valuing just
two metrics, you’ve helped shape a change in your team
and organization. This is why it’s so important to be sure to
acmqueue | january-february 2021 18
19. 19 of 29
productivity
SPACE and SRE: The Framework
in Incident Management
The SPACE framework is relevant
for SREs (site reliability engineers)
and their work in IM (incident management). An
incident occurs when a service is not available or
is not performing as defined in the SLA (service-
level agreement). An incident can be caused
by network issues, infrastructure problems,
hardware failures, code bugs, or configuration
issues, to name a few.
Based on the magnitude of the impact caused
by an incident, it is typically assigned a severity
level (sev-1 being the highest). An outage to the
entire organization’s customer-facing systems
is treated differently than a small subset of
internal users experiencing a delay in their email
delivery.
Here are some of the common myths
associated with IM:
• Myth: The number of incidents resolved by an individual is all that matters. Like many other activities in the SDLC (software development life cycle), IM is a team activity. A service that causes many outages and takes more hours to restore reflects badly on the entire team that develops and maintains it. Team-focused activities such as sharing knowledge, preparing troubleshooting guides to aid other team members, mentoring junior and new members of the team, and doing proper handoffs and assignments or re-assignments are important aspects of IM.
• Myth: Looking at one metric in isolation will tell you everything. It’s important to understand the metrics in context: the number of incidents; how long they took to resolve; the volume and resolution times of sev-1 incidents compared with sev-4; and other factors relevant to understanding incidents and to improving both the system and the team’s response. There is no “one metric that matters.”
• Myth: Only management cares about incident volume and meeting SLAs. With the rise of DevOps, developers are also doing operations. IM (a part of operations) can take a significant chunk of developers’ time and energy if the volume and severity of incidents are high. Guaranteeing SLAs and reducing incident volume and resolution times are as important to the individual developers who are part of the IM process as they are to management and executives.
• Myth: Effective IM is just about improving systems and tools. Better monitoring, ticketing, case-routing, and log-analysis systems will certainly help make developers productive, but while tools, guides, and workflows have a large impact on productivity, the human factors of the environment and work culture have a substantial impact too. Mentoring new members of the team and building morale are important; if developers are constantly being paged at night for sev-1 incidents while working from home during Covid-19, these “invisible” factors are especially important to keeping them productive.
Incident management is a complex process that involves various stakeholders performing several individual and team activities, and it requires support from different tools and systems, so it is critical to identify metrics that capture the various dimensions of productivity:
• Satisfaction: how satisfied SREs are with the IM process, escalation and routing, and on-call rotations. These are key metrics to capture, especially since burnout is a significant issue among SREs.
• Performance: measures that focus on system reliability, such as the monitoring systems’ ability to detect and flag issues quickly, before they reach the customer and become incidents, and MTTR (mean time to repair), both overall and by severity.
• Activity: the number of issues caught by the monitoring systems, the number of incidents created, the number of incidents resolved, and their severity distribution.
• Communication and collaboration: the people included in resolving an incident, how many teams those people came from, and how they communicate during the incident. Incident-resolution documentation outlines the steps involved in resolving incidents; it can be measured for completeness (checking whether any resolution data was entered) or with quick quality scores (e.g., thumbs up/down). Teams may also include a metric that measures the percentage of resolved incidents that reference these guides and documentation.
• Efficiency and flow: incident handoffs, incident assignments and re-assignments, and the number of hops an incident takes before it is assigned to the right individual or team.
Pulling from multiple dimensions of the framework in this way leads to much better outcomes at both the team and system levels. As teams continue to improve and iterate, they could also exchange an activity metric such as lines of code for something like number of commits.
WHAT TO WATCH FOR
Having too many metrics may also lead to confusion and lower motivation; not all dimensions need to be included for the framework to be helpful. If developers and teams are presented with an extensive list of metrics and improvement targets, meeting them may feel like an unattainable goal. With this in mind, note that a good measure of productivity consists of a handful of metrics across at least three dimensions; these can prompt a holistic view, and they can be sufficient to evoke improvement.
Any measurement paradigm should be used carefully, because no metric can ever be a perfect proxy. Some metrics are poor measures because they are noisy approximations (some examples are noted in figure 1). Retention is often used to measure employee satisfaction; however, it can capture much more than satisfaction: it can reflect compensation, promotion opportunities, issues with a team, or even a partner’s move. At the team level, some managers may block transfers to protect their own retention ratings. Even if retention did reflect satisfaction, it is a lagging measure, and teams don’t see shifts until it is too late to do anything about them. We have written elsewhere about the limitations inherent in the use of story points, 9 which can give teams an incentive to focus on their own work at the expense of collaborating on important projects.
Teams and organizations should be cognizant of developer privacy and report only anonymized, aggregate results at the team or group level. (In some countries, reporting on individual productivity isn’t legal.) Individual-level productivity analysis, however, may be insightful for developers. For example, previous research shows that typical developer work shifts depend on the phase of development, and that developers may have more productive times of day. 14 Developers can opt in to these types of analyses, gaining valuable insights to optimize their days and manage their energy.
Finally, any measurement paradigm should check for biases and norms: external influences that may shift or distort the measures. Some examples are included here, but they aren’t exhaustive, so all teams are encouraged to look for and think about external influences that may be present in their data:
• Peer review and gender. Research shows that women are more likely to receive negative comments and less likely to receive positive comments in their code reviews. 16 Any analysis of satisfaction with the review process should check for this in your environment. Understand that developers are likely influenced by the broader tech industry even if the patterns are not present in your organization or team, and take these effects into account.
• Normalizing measures across time. Teams should be careful about any methods used to normalize time, especially across long periods. For example, looking at metrics over a year would bias against those taking parental leave.
• Perceptual measures. Teams and organizations should be mindful of cultural norms and embrace them. Some cultures naturally report higher, while others report lower. This doesn’t mean perceptual measures can’t be trusted; it just means that measures from different cultures will have different baselines and shouldn’t be compared with each other.
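The incident-management metrics listed for the Performance, Activity, and Efficiency-and-flow dimensions are straightforward to compute once incident records carry timestamps, severities, and assignment histories. The following is a minimal sketch; the record fields and numbers are hypothetical, not taken from the article:

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical incident records; field names and values are illustrative.
incidents = [
    {"id": 1, "severity": 1, "created": "2021-01-04T02:10", "resolved": "2021-01-04T05:40",
     "assignment_history": ["net-team", "db-team", "db-team-oncall"]},
    {"id": 2, "severity": 4, "created": "2021-01-05T09:00", "resolved": "2021-01-05T09:45",
     "assignment_history": ["web-team"]},
    {"id": 3, "severity": 1, "created": "2021-01-06T22:30", "resolved": "2021-01-07T01:00",
     "assignment_history": ["web-team", "db-team"]},
]

def hours_to_resolve(incident):
    """Elapsed hours between incident creation and resolution."""
    fmt = "%Y-%m-%dT%H:%M"
    delta = (datetime.strptime(incident["resolved"], fmt)
             - datetime.strptime(incident["created"], fmt))
    return delta.total_seconds() / 3600

# Activity: incident counts and their severity distribution.
severity_distribution = Counter(i["severity"] for i in incidents)

# Performance: MTTR (mean time to repair), overall and by severity.
mttr_overall = sum(hours_to_resolve(i) for i in incidents) / len(incidents)
durations_by_severity = defaultdict(list)
for i in incidents:
    durations_by_severity[i["severity"]].append(hours_to_resolve(i))
mttr_by_severity = {sev: sum(h) / len(h) for sev, h in durations_by_severity.items()}

# Efficiency and flow: re-assignment hops before the incident reached the
# right team (one less than the number of entries in the history).
mean_hops = sum(len(i["assignment_history"]) - 1 for i in incidents) / len(incidents)
```

With real data, the same aggregation would be run per team and per time window, and MTTR would typically be reported next to the severity distribution so that many quick sev-4 resolutions don’t mask a slow sev-1 response.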
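The earlier caveat about normalizing measures across time, and its parental-leave example, can be made concrete with a short sketch. The counts below are hypothetical, and in practice any such numbers should be reported only as anonymized, team-level aggregates:

```python
def per_active_week(total_commits, weeks_active):
    """Normalize activity by time actually worked, not by the calendar year."""
    return total_commits / weeks_active

# Hypothetical year: developer A worked all 52 weeks; developer B took
# 20 weeks of parental leave and worked 32 weeks.
dev_a = {"commits": 520, "weeks_active": 52}
dev_b = {"commits": 352, "weeks_active": 32}

# Raw yearly totals make A look more productive (520 vs. 352) ...
raw_a, raw_b = dev_a["commits"], dev_b["commits"]

# ... but per active week, B's rate is actually higher (11.0 vs. 10.0).
rate_a = per_active_week(dev_a["commits"], dev_a["weeks_active"])
rate_b = per_active_week(dev_b["commits"], dev_b["weeks_active"])
```

The same correction applies to any per-period metric (reviews, incidents handled, pull requests): normalize by time present, or the metric silently penalizes leave.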
WHY THIS MATTERS NOW
Developer productivity
is about more than an
individual’s activity levels
or the efficiency of the
engineering systems relied
on to ship software, and it
cannot be measured by a
single metric or dimension.
We developed the SPACE
framework to capture
different dimensions of
productivity, because without
it, pervasive and potentially
harmful myths about
productivity may persist.
The SPACE framework provides a way to think logically and systematically about productivity in a much bigger space, to choose balanced metrics carefully and link them to goals, and to understand how those metrics may be limited if used alone or in the wrong context. The framework also helps illuminate tradeoffs that may not be immediately obvious, and it accounts for invisible work and for the knock-on effects of changes, such as the extra work that appears when activity is measured at the expense of developer fulfillment or of overall flow and efficiency.
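One way to apply the earlier guidance that a good measure of productivity is a handful of metrics spanning at least three dimensions is to check a proposed metric set mechanically. This sketch is illustrative only; the shortened dimension labels, the metric names, and the cap of six metrics are assumptions, not prescriptions from the framework:

```python
# Short labels for the five SPACE dimensions (illustrative abbreviations).
SPACE_DIMENSIONS = {"satisfaction", "performance", "activity",
                    "communication", "efficiency"}

# A hypothetical team metric set, mapping each metric to its dimension.
team_metrics = {
    "developer satisfaction survey": "satisfaction",
    "mean time to repair": "performance",
    "incidents resolved": "activity",
    "code review participation": "communication",
}

def is_balanced(metrics, min_dimensions=3, max_metrics=6):
    """True if the set is a 'handful' of metrics spanning enough dimensions."""
    dims = set(metrics.values())
    return (dims <= SPACE_DIMENSIONS          # only known dimensions
            and len(dims) >= min_dimensions   # at least three of them
            and 0 < len(metrics) <= max_metrics)  # a handful, not a wall
```

A set like `{"lines of code": "activity"}` fails this check; it is exactly the “one metric that matters” anti-pattern the article warns against.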
The need to understand and measure productivity holistically has never been greater. As the Covid-19 pandemic disrupted work and brought a sudden switch to working from home, many questioned its impact on productivity and asked how to understand and measure the change. As the world slowly returns to a “new normal,” the SPACE framework captures the dimensions of productivity that are important to consider as future changes are proposed and made. The framework is meant to help individuals, teams, and organizations identify pertinent metrics that present a holistic picture of productivity; this will lead to more thoughtful discussions about productivity and to the design of more impactful solutions.

Related articles
• Getting What You Measure: Four common pitfalls in using software metrics for project management. Eric Bouwers, Joost Visser, and Arie van Deursen; https://queue.acm.org/detail.cfm?id=2229115
• DevOps Metrics: Your biggest mistake might be collecting the wrong data. Nicole Forsgren and Mik Kersten; https://queue.acm.org/detail.cfm?id=3182626
• Beyond the “Fix-it” Treadmill: The use of post-incident artifacts in high-performing organizations. J. Paul Reed; https://queue.acm.org/detail.cfm?id=3380780
• People and Process: Minimizing the pain of business process change. James Champy; https://queue.acm.org/detail.cfm?id=1122687
Acknowledgments
We are grateful for the thoughtful review and insightful
comments from our reviewers and are confident that
incorporating their notes and responses has strengthened
the paper. We are excited to have it published in acmqueue.
References
1. Beller, M., Orgovan, V., Buja, S., Zimmermann, T. 2020. Mind the gap: on the relationship between automatically measured and self-reported productivity. IEEE Software; https://arxiv.org/abs/2012.07428.
2. Brumby, D. P., Janssen, C. P., Mark, G. 2019. How do interruptions affect productivity? In Rethinking Productivity in Software Engineering, ed. C. Sadowski and T. Zimmermann, 85-107. Berkeley, CA: Apress; https://link.springer.com/chapter/10.1007/978-1-4842-4221-6_9.
3. Butler, J. L., Jaffe, S. 2020. Challenges and gratitude: a diary study of software engineers working from home during Covid-19 pandemic (August). Microsoft; https://www.microsoft.com/en-us/research/publication/challenges-and-gratitude-a-diary-study-of-software-engineers-working-from-home-during-covid-19-pandemic/.
4. Csikszentmihalyi, M. 2008. Flow: The Psychology of Optimal Experience. Harper Perennial Modern Classics.
5. Dabbish, L., Stuart, C., Tsay, J., Herbsleb, J. 2012. Social coding in GitHub: transparency and collaboration in an open software repository. In Proceedings of the ACM 2012 Conference on Computer-supported Cooperative Work (February), 1277-1286; https://dl.acm.org/doi/10.1145/2145204.2145396.
6. Dourish, P., Bellotti, V. 1992. Awareness and coordination in shared workspaces. In Proceedings of the 1992 ACM Conference on Computer-supported Cooperative Work (December), 107-114; https://dl.acm.org/doi/10.1145/143457.143468.
7. Ford, D., Storey, M. A., Zimmermann, T., Bird, C., Jaffe, S., Maddila, C., Butler, J. L., Houck, B., Nagappan, N. 2020. A tale of two cities: software developers working from home during the Covid-19 pandemic; https://arxiv.org/abs/2008.11147.
8. Forsgren, N. 2020. Finding balance between work and play. The 2020 State of the Octoverse. GitHub; https://octoverse.github.com/static/github-octoverse-2020-productivity-report.pdf.
9. Forsgren, N., Humble, J., Kim, G. 2018. Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations. IT Revolution Press.
10. Forsgren, N., Kersten, M. 2018. DevOps metrics. Communications of the ACM 61(4), 44-48; https://dl.acm.org/doi/10.1145/3159169.
11. Fuks, H., Raposo, A., Gerosa, M. A., Pimental, M. 2008. The 3C collaboration model. In Encyclopedia of E-Collaboration, ed. Ned Kock, 637-644. IGI Global; https://www.researchgate.net/publication/292220266_The_3C_collaboration_model.
12. Graziotin, D., Fagerholm, F. 2019. Happiness and the productivity of software engineers. In Rethinking Productivity in Software Engineering, ed. C. Sadowski and T. Zimmermann, 109-124. Berkeley, CA: Apress; https://link.springer.com/chapter/10.1007/978-1-4842-4221-6_10.
13. Maslach, C., Leiter, M. P. 2008. Early predictors of job burnout and engagement. Journal of Applied Psychology 93(3), 498-512; https://doi.apa.org/doiLanding?doi=10.1037%2F0021-9010.93.3.498.
14. Meyer, A. N., Barton, L. E., Murphy, G. C., Zimmermann, T., Fritz, T. 2017. The work life of developers: activities, switches and perceived productivity. IEEE Transactions on Software Engineering 43(12), 1178-1193; https://dl.acm.org/doi/10.1109/TSE.2017.2656886.
15. Murphy-Hill, E., Jaspan, C., Sadowski, C., Shepherd, D., Phillips, M., Winter, C., Knight, A., Smith, E., Jorde, M. 2019. What predicts software developers’ productivity? IEEE Transactions on Software Engineering; https://ieeexplore.ieee.org/document/8643844/.
16. Paul, R., Bosu, A., Sultana, K. Z. 2019. Expressions of sentiments during code reviews: male vs. female. In IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER), 26-37; https://ieeexplore.ieee.org/document/8667987.
17. Ralph, P., et al. 2020. Pandemic programming: how Covid-19 affects software developers and how their organizations can help. Empirical Software Engineering 25(6), 4927-4961; https://www.researchgate.net/publication/344342621_Pandemic_programming_How_COVID-19_affects_software_developers_and_how_their_organizations_can_help.
18. Schmidt, K., Bannon, L. 1992. Taking CSCW seriously: supporting articulation work. Computer Supported Cooperative Work 1(1), 7-40; https://link.springer.com/article/10.1007/BF00752449.
19. Sedano, T., Ralph, P., Péraire, C. 2017. Software development waste. In Proceedings of the 39th International Conference on Software Engineering, 130-140; https://dl.acm.org/doi/10.1109/ICSE.2017.20.
20. Storey, M. A., Zimmermann, T., Bird, C., Czerwonka, J., Murphy, B., Kalliamvakou, E. 2019. Towards a theory of software developer job satisfaction and perceived productivity. IEEE Transactions on Software Engineering; https://ieeexplore.ieee.org/document/8851296.
21. Suchman, L. 1995. Making work visible. Communications of the ACM 38(9), 56-64; https://dl.acm.org/doi/10.1145/223248.223263.
22. Vasilescu, B., Posnett, D., Ray, B., van den Brand, M. G. J., Serebrenik, A., Devanbu, P., Filkov, V. 2015. Gender and tenure diversity in GitHub teams. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (April), 3789-3798; https://dl.acm.org/doi/abs/10.1145/2702123.2702549.
Nicole Forsgren is the VP of Research & Strategy at GitHub.
She is an expert in DevOps and the author of the Shingo
Publication Award-winning book Accelerate: The Science
of Lean Software and DevOps. Her work on technical
practices and development has been published in industry
and academic journals and is used to guide organizational
transformations around the world.
Margaret-Anne Storey is a Professor of Computer Science
at the University of Victoria and a Canada Research Chair
in Human and Social Aspects of Software Engineering.
Her research focuses on improving processes, tools, communication, and collaboration in software engineering. She consults with Microsoft to improve
developer productivity.
Chandra Maddila is a Senior Research Engineer at Microsoft
Research. His research focuses on improving developer
productivity and software engineering processes leveraging
machine learning and AI. He developed tools and techniques
that are used organization-wide at Microsoft, won a best-paper award at USENIX OSDI, and has been featured in tech outlets such as VentureBeat.
Thomas Zimmermann is a Senior Principal Researcher at
Microsoft Research. His research uses quantitative and
qualitative methods to improve software productivity and to
investigate and overcome software engineering challenges.
He is best known for his work on mining software repositories
and data science in software engineering, which received
several most influential paper awards.
Brian Houck is a Principal Program Manager on the Azure Engineering Systems team at Microsoft. His work focuses on
improving developer productivity and satisfaction for
engineers within the Azure organization by improving
developer tooling, development processes, and team culture.
Jenna Butler holds a PhD in Computer Science and specializes
in Bioinformatics. She is an Adjunct Professor at Bellevue
College in the Radiation Therapy department and is a Senior
Software Engineer at Microsoft. Recently, Jenna has been working with the Productivity & Intelligence (P+I) team at Microsoft Research to study alignment and decision making, work in the services, and the impact of remote work during this time.
Jenna most enjoys multidisciplinary work that can benefit
well-being and health while maximizing productivity.
Copyright © 2021 held by owner/author. Publication rights licensed to ACM.