Green Server Design: Beyond Operational Energy to Sustainability

Jichuan Chang, Justin Meza, Parthasarathy Ranganathan, Cullen Bash, Amip Shah
Hewlett Packard Labs

"Green" server and datacenter design requires a focus on environmental sustainability. Prior studies have used operational energy consumption as a proxy for sustainability, but this metric captures only part of the environmental impact. In this paper, we argue that understanding the total impact requires examining the entire lifecycle of a system, beyond operational energy to also include material use and manufacturing. We make two main contributions. First, we present a methodology for such a lifecycle analysis that attributes sustainability bottlenecks to individual system architecture components. Second, using this methodology, we compare the sustainability tradeoffs of popular energy-efficiency optimizations and discuss sustainability bottlenecks and optimizations for future system designs.

1. Introduction

Environmental sustainability (the manufacturing, operation, and disposal of products so as to minimize their environmental impact in terms of destruction of natural resources or production of undesired emissions) is fast becoming an important design constraint for Information Technology (IT) systems [2]. The carbon footprint of the IT industry, though only 2% of the world economy, is estimated to equal that of the entire aviation industry [27]. Even more importantly, IT is increasingly being used to address the remaining 98% of the world economy's carbon emissions [27] (e.g., using video conferencing to avoid travel), and as this trend continues it will become even more important to design "green" IT systems. A recent estimate showed that up to 75% of organizations will soon consider sustainability as one of the criteria in their IT purchases [28]. The UK government is starting a mandatory Kyoto-style cap-and-trade scheme to curb the energy consumption of businesses [4], and the US Congress has similarly been considering various federal cap-and-trade schemes [3].

There has been a large body of prior work on reducing the operational electricity consumption of servers (e.g., [6] [12] [19] [24] [30] [29] [31]). Given that most of the electricity produced in the world comes from carbon-intensive sources, these optimizations can improve the carbon footprint of servers and datacenters during operation. However, these approaches do not address the environmental impact of a system across all the stages of its lifecycle, such as the extraction of raw materials, manufacturing, transportation, operation, and disposal.

In this paper, we examine the problem of lifecycle-based optimization of future server and datacenter designs. We make two main contributions: (1) a methodology to reason about sustainability from a system architecture perspective, and (2) a systematic analysis of the environmental impact of current designs across their entire lifecycle and of the tradeoffs with state-of-the-art energy-efficiency techniques.

2. Measuring Sustainability: Using Exergy for Architectural Studies

Numerous schemes exist to quantify the environmental sustainability of systems. Life-cycle assessment (LCA), a field that has been in practice for nearly 50 years [1], takes an end-to-end approach to assessing the environmental impact of a system across the various stages of its lifecycle.
In this paper, we perform lifecycle assessment using the thermodynamic metric of exergy (available energy) consumption to reason about sustainability. A detailed description of exergy is outside the scope of this paper. Briefly, however, unlike energy, which is neither created nor destroyed (first law of thermodynamics), exergy is continuously consumed in the performance of useful work by any real, entropy-generating process (second law of thermodynamics). Several previous studies have discussed how this destruction (or consumption) of exergy represents the irreversibility associated with various processes [7] [21] and, correspondingly, to a first order, their environmental sustainability [11]. Additionally, models for specific IT systems [18] have shown that optimizations that reduce lifecycle exergy consumption often map fairly well to optimizations based on other environmental criteria such as greenhouse gas emissions, pollution, etc. [32].

Unfortunately, previous lifetime exergy characterizations have estimated the total environmental impact of computer systems by mapping system mass or material flows to per-unit estimates of the environmental burden [18] [34]. Figure 1(a) shows such a breakdown for a typical server (a 2-socket Xeon-based server with 4 DIMMs, two 72GB HDDs, and two 1Gb NICs, at 25% utilization) using these methods. Such a model is not very useful for system architects because it is unclear how to extend this breakdown of exergy to system architecture choices. Since architectural choices may span multiple stages of the system lifecycle, deciding to use one component over another results in (often non-intuitive) changes to the total system environmental impact due to differences in the manufacturing process, not just for the chosen component but also for related components that interact with it at the system or datacenter level. An approach that considers lifecycle exergy consumption from an architectural perspective is required.

Our work addresses these issues by adopting an architecture-centric approach to measuring and optimizing the environmental impact of systems. Specifically, we aggregate raw materials at the component level, allowing us to evaluate environmental impact at the granularity of familiar architectural building blocks such as processors, DIMMs, and hard disk drives, as opposed to their associated raw materials. This lets us express the environmental impact of complex ensembles of diverse sets of materials succinctly in terms of system architectural choices. Our approach categorizes exergy (more precisely, exergy consumption; in this paper we loosely use the term exergy to refer to exergy consumption) into three broad categories: embedded, operational, and infrastructure.

Embedded exergy is the amount of exergy used to "make" a system component. To a first degree, this is the exergy expended during extraction, manufacturing, transportation, and recycling. For most components, the bulk of the embedded exergy is destroyed during manufacturing, as complicated processes use high-quality energy to produce highly-ordered electronic components, and the various chemicals required to make these components themselves require large amounts of energy to manufacture. Our model abstracts out the appropriate exergy destruction values for all of the processes specific to each component, and then aggregates these data to discern the overall exergy consumption attributable to each architectural component. We aggregate the embedded data from multiple public sources [13] [34] [33] [20] [9] [22] [16] [26]. Note that these data are derived from specific supply chain and component models; modeling embedded exergy in a different context should not reuse these numbers directly, but rather apply the methodology and data sources described here with revised assumptions appropriate to the system being modeled.

Operational exergy is the amount of exergy spent during a system's operational lifetime. Although the heat dissipated from a server contains useful work potential, there are currently no practical techniques to harness this waste heat and recover the exergy. In this study, therefore, we assume that operational exergy is equivalent to the electricity consumed during operation. To determine operational exergy, for each component we use its maximum power rating and model how its power varies with utilization. We determined these values from published sources, internal experiments, and communications with system designers. This model is similar to those used in other recent system studies (e.g., [23]) and provides a first-order estimate of the power consumed across different workloads (varying utilizations). We assume a three-year lifecycle and 99.99% uptime. Figure 1(c) summarizes our model parameters.

In most datacenters, the cooling and power delivery infrastructure accounts for a large fraction of the total electricity consumption, and consequently we account for infrastructure exergy as a separate category. This takes into account the operational energy used by CRAC units, chillers, cooling towers, and any other equipment employed in the datacenter infrastructure. (On-board fans are considered part of server operational power.) We assume that cooling is provisioned to handle the maximum power rating, and we use the widely-used power usage effectiveness metric, PUE = 1 + infrastructure_power / operational_power [14], to compute infrastructure exergy. The exergy consumed in building the power and cooling infrastructure itself is outside the scope of our model; when amortized across a datacenter's scale and multiple IT refresh cycles, we expect its embedded burden per server to be minimal.

Figure 1: (a) illustrates previous process-based approaches to reasoning about sustainability (process-based breakdown of total exergy); (b) illustrates our proposed model based on system architecture components (architecture-based breakdown); (c) summarizes the key model parameters, reproduced below.

Figure 1(c): Sustainability modeling parameters.

  Embedded exergy:
  Part      Embd. (MJ)   Sources
  CPU       158          [9] [13] [26]
  Chipset   66           [13] [22] [26]
  DRAM      726          [13] [33] [26]
  PCB       1400         [13] [18] [34]
  Chassis   512          [16] [18] [21]
  PSU       683          [13] [18]
  HDD       546          [13] [18]
  Fan       209          [16] [18]
  Misc.     420          [20] [34]

  Component power:
  Part            #   TDP (W)   Idle%
  Processor       2   95        10%
  Memory          4   10        50%
  HDD (15K)       2   5         80%
  NIC (Gigabit)   2   6         50%
  Fan             4   3         0%
  Northbridge     1   27.1      0%
  Southbridge     1   4.3       0%
  PSU             1   33        100%
  DC conversion   -   15        100%
  Misc.           -   10.6      100%
  Total           -   354       -
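To make the accounting concrete, the following sketch combines the three exergy categories using the Figure 1(c) parameters. It is a simplified illustration: the linear idle-to-peak power model, the reading of Idle% as idle power relative to peak, and the treatment of the embedded column as per-server totals are simplifying assumptions here, not the exact model used for the results below.

    # Minimal sketch of the lifecycle exergy accounting, using Figure 1(c) values.
    SECONDS_PER_YEAR = 365.25 * 24 * 3600

    # (name, count, peak watts, idle power as fraction of peak) -- from Figure 1(c)
    POWER_PARAMS = [
        ("Processor",     2, 95.0, 0.10), ("Memory",        4, 10.0, 0.50),
        ("HDD (15K)",     2,  5.0, 0.80), ("NIC (Gigabit)", 2,  6.0, 0.50),
        ("Fan",           4,  3.0, 0.00), ("Northbridge",   1, 27.1, 0.00),
        ("Southbridge",   1,  4.3, 0.00), ("PSU",           1, 33.0, 1.00),
        ("DC conversion", 1, 15.0, 1.00), ("Misc.",         1, 10.6, 1.00),
    ]

    # Per-server embedded exergy, summed from the Figure 1(c) table
    # (assumption: the table lists per-server totals).
    EMBEDDED_MJ = 158 + 66 + 726 + 1400 + 512 + 683 + 546 + 209 + 420

    def server_power_w(utilization):
        """Average server power at a given utilization (linear idle-to-peak model)."""
        return sum(count * peak * (idle + (1.0 - idle) * utilization)
                   for _name, count, peak, idle in POWER_PARAMS)

    def lifecycle_exergy_mj(utilization, pue=1.6, years=3.0, uptime=0.9999):
        """Total lifecycle exergy (MJ) = embedded + operational + infrastructure."""
        seconds_on = years * SECONDS_PER_YEAR * uptime
        operational_mj = server_power_w(utilization) * seconds_on / 1e6   # J -> MJ
        infrastructure_mj = (pue - 1.0) * operational_mj                  # PUE definition
        return EMBEDDED_MJ + operational_mj + infrastructure_mj

    for u in (0.25, 0.50, 1.00):
        print(f"utilization {u:.0%}: ~{lifecycle_exergy_mj(u):,.0f} MJ over three years")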
3. Evaluating the state-of-the-art

Exergy breakdown. Figure 1(b) shows the breakdown of total lifecycle exergy using our models. We focus on the same server as in Figure 1(a), assume a workload utilization of 25%, and assume a PUE of 1.6 based on prior studies [17]. The results show that operational exergy dominates the total exergy of the system (53%), followed by infrastructure exergy (27%) and embedded exergy (20%). Of note is that embedded exergy contributes a sizable share of total system exergy; its dominant contributors are silicon-based processes and the PCB. Assuming a datacenter container with 1056 of these servers, the total exergy consumption is 25.4 terajoules over a three-year timeframe, equivalent to approximately 870 metric tons of coal.
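As a rough consistency check on the coal equivalence (assuming roughly 29.3 GJ per metric ton of standard coal; the exact conversion factor used above is not stated):

    # Rough check: 25.4 TJ expressed in metric tons of standard coal equivalent.
    total_exergy_j = 25.4e12          # 25.4 TJ for 1056 servers over three years
    coal_j_per_ton = 29.3e9           # ~29.3 GJ per metric ton (assumed factor)
    print(total_exergy_j / coal_j_per_ton)   # ~867 tons, consistent with ~870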
Design space exploration. There has been a large body of prior techniques addressing operational energy; however, their impact on total exergy has not been studied. Specifically, how do these techniques compare from a sustainability point of view? Are there tradeoffs between operational exergy and embedded exergy that make some of these techniques less effective at improving net sustainability? If these techniques are aggressively applied to future systems, what would the new breakdown of exergy consumption look like?

To answer these questions, we studied three broad categories of optimizations. (i) Energy proportionality (EP) [6] has gained a lot of attention in the datacenter space, with several optimizations [15] [8] [29] [24] that seek to make the energy consumed by a system proportional to the activity in the system. (ii) Consolidation (Con) is another optimization common in current datacenters. The intuition is that utilization of many enterprise services is typically low and bursty, and that across a collection of systems the peaks are often unsynchronized (the peak of the sum of the individual utilizations is lower than the sum of the individual peaks). Multiple virtual machines (or tasks in a task scheduler) running on separate servers can therefore be consolidated onto a single server, raising its utilization and reducing the required server count (and total power) [25] [30]. (iii) Recently, there have been several low-power server solutions (LP) based on energy-efficient, lower-power processors [23] [17] [10] [5]. A common idea behind these solutions is to better match the processor architecture to the workload characteristics (primarily around CPU-I/O balance) to achieve significantly better performance/watt.

Figure 2 shows our results from examining these three optimizations in a parameterized design space exploration. The benefits from EP are primarily a function of average workload utilization; Figure 2(a) shows the design space for an average workload utilization held constant at 25%. (We examined other utilization points as well but omit them for brevity.) For a given average workload utilization, we capture the different tradeoffs of LP designs with a performance/watt multiplier on the X axis. For some workloads (e.g., enterprise workloads), a lower-power processor may lose more in performance than it saves in power; in these cases the performance/watt multiplier is less than 1 (right side of the figure), indicating that the LP solution's performance/watt (or energy efficiency) at peak load is worse than that of a conventional server. For web workloads, prior studies [23] [17] [10] have found LP to yield better multipliers, ranging from 2 to 5 (left side of the axis). As discussed earlier, the effectiveness of consolidation is a function of how many processes can be packed into a single server, which in turn depends on the peak-of-sum utilization specific to the workload. The Y axis shows this parameter; lower values indicate that the peaks are largely unsynchronized and consolidation can more readily be leveraged. Different points on the heat map thus represent different workload/system configurations.

For each data point, we individually compute the total exergy for EP, Con, and LP designs providing the same aggregate performance and identify the optimization that achieves the lowest exergy. (Recall that lower exergy consumption is better.) The heat map's color gradation reflects the absolute value of this best exergy, and its division into regions shows which technique achieves the best exergy for that part of the workload/system configuration space. For energy proportionality, we studied a best-case future model in which all hardware is ideally proportional (power consumed in the idle state is zero). For consolidation, we assumed perfect bin-packing that minimizes the number of servers. Figure 2(b) shows a similar picture, but for the case where only operational energy is considered. For EP and Con, we model component power after the conventional server of Figure 1(c); for LP, we model an HP BC2500 blade server with maximum component powers similar to [23]. Here we assume a PUE of 1.5 for infrastructure exergy, and we adjust the embedded exergy values of the components within each system by scaling key physical attributes of each component. For example, the key physical attribute governing the footprint of a microprocessor is the area of its silicon; we therefore normalize the impact from Figure 1(c) by this area to derive an "impact factor" representing exergy consumption per unit area, which can then be scaled to processors of different sizes (assuming uniform thickness, fabrication, etc.). If other key attributes vary (e.g., the thickness of the package), they can be parameterized accordingly, and a similar approach can be repeated for each of the other architectural components.

Figure 2: Illustration of the tradeoffs between different energy-efficiency optimizations. (a) Total exergy based exploration (heat map, not reproduced). (b) Operational energy based exploration (heat map, not reproduced). (c) Real workloads and efficiencies; the better of EP and Con for each trace is shaded in the original:

                       Utilizations         OP (% base)        Total (% base)
  Workload             Mean    Peak_Sum     EP      Con        EP      Con
  Ecommerce 1          7%      17%          18%     27%        36%     25%
  Ecommerce 2          23%     49%          48%     66%        57%     63%
  Dotcom               16%     36%          37%     52%        49%     49%
  Pharmacy             3%      11%          10%     17%        31%     16%
  SAP 1                17%     31%          39%     50%        51%     46%
  SAP 2                26%     75%          53%     84%        61%     82%
  Worldcup 1           10%     53%          27%     61%        42%     60%
  Worldcup 2           8%      19%          21%     31%        38%     28%
  Consolidation 1      34%     79%          62%     88%        68%     87%
  Consolidation 2      31%     79%          59%     88%        66%     86%
  Animation farm       93%     100%         98%     100%       98%     100%
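The per-point comparison behind the heat maps can be sketched as follows. This is a simplified illustration only: the 55% idle-power fraction, the low-power server's peak power and embedded exergy, and the mapping from the performance/watt multiplier to server count are placeholder assumptions rather than the models used for Figure 2.

    # Sketch of the per-point EP/Con/LP comparison behind the Figure 2 heat maps.
    EMBEDDED_MJ = 4720.0    # per-server embedded exergy summed from Figure 1(c)
    PEAK_W = 354.0          # conventional server peak power from Figure 1(c)
    IDLE_FRAC = 0.55        # assumed idle power as a fraction of peak
    PUE = 1.5               # as in the design space exploration
    SECONDS = 3 * 365.25 * 24 * 3600 * 0.9999   # three years at 99.99% uptime

    def total_mj(n_servers, avg_power_w, embedded_per_server_mj):
        operational = n_servers * avg_power_w * SECONDS / 1e6        # J -> MJ
        return n_servers * embedded_per_server_mj + PUE * operational

    def ep(n, mean_util):
        # Ideal energy proportionality: zero idle power, power tracks utilization.
        return total_mj(n, PEAK_W * mean_util, EMBEDDED_MJ)

    def con(n, mean_util, peak_of_sum):
        # Perfect bin-packing: provision only enough servers for the peak of the sum.
        n_used = max(1, round(n * peak_of_sum))
        util = min(1.0, mean_util * n / n_used)
        power = PEAK_W * (IDLE_FRAC + (1 - IDLE_FRAC) * util)
        return total_mj(n_used, power, EMBEDDED_MJ)

    def lp(n, mean_util, perf_per_watt_mult, peak_w=120.0, embedded_mj=2500.0):
        # Low-power servers (peak_w and embedded_mj are illustrative placeholders);
        # each delivers perf_per_watt_mult * peak_w / PEAK_W of a baseline server.
        n_used = max(1, round(n * PEAK_W / (perf_per_watt_mult * peak_w)))
        power = peak_w * (IDLE_FRAC + (1 - IDLE_FRAC) * mean_util)
        return total_mj(n_used, power, embedded_mj)

    # One design-space point: 100 baseline servers of work, 25% mean utilization,
    # 50% peak-of-sum utilization, LP performance/watt multiplier of 2.
    options = {"EP": ep(100, 0.25), "Con": con(100, 0.25, 0.50), "LP": lp(100, 0.25, 2.0)}
    print(min(options, key=options.get), {k: round(v) for k, v in options.items()})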
Observations. This way of representing the data reveals several interesting high-level trends. First, each figure individually shows the regions in which the different techniques work best, the cross-over points between them, and the relative magnitudes of the benefits. Comparing the two figures shows how these design tradeoffs change when optimizing for total exergy rather than just operational energy. Figure 2(a) shows that, in general, the total exergy of the system is minimized toward the bottom-left region of the graph, which is not surprising given that this region assumes more power-efficient components and lower resource activity (more consolidation).

Comparing EP and Con first, we observe that EP outperforms Con when workloads are not bursty and do not lend themselves to packing (top-right part of Figure 2(a)). The break-even point corresponds roughly to workloads with peak-of-sum utilizations close to 50%; below this, Con is the better design alternative. Interestingly, this conclusion differs from the one reached when focusing only on operational energy: there, given fragmentation in bin-packing, perfect energy proportionality is always better than consolidation. When total exergy is considered, however, the reduction in materials from using fewer servers provides additional reductions in embedded exergy that allow Con to beat EP. (Note that we assume a model in which consolidation leads to provisioning fewer servers; if consolidation merely allowed servers to be turned off, we would not obtain the embedded savings.)

Comparing with LP, we find that beyond a break-even point corresponding roughly to a 1.6-2.6X improvement in performance/watt, LP designs are always better than both EP and Con. Comparing Figures 2(a) and 2(b), the inflection point at which LP beats the other alternatives shifts to the left (requiring even more energy efficiency from lower-power processors) when total exergy is considered. This is because of the increased embedded exergy of the larger number of lower-power servers required for the same performance. Comparing LP and Con, it is worth noting that there is now a region where consolidating multiple small processes onto one server is better than distributing them across multiple small low-power blades. Because LP and EP are independent of the peak-of-sum utilization, the break-even point between these two solutions is dictated entirely by the performance/watt multiplier; the optimal choice between them therefore depends on their relative energy efficiencies for the type of workload. The number of machines used in Con depends on the peak-of-sum utilization, but note that consolidation also raises overall system utilization, increasing operational exergy. This tradeoff between fewer machines and higher utilization appears as the angled line dividing LP and Con.

The table in Figure 2(c) illustrates the tradeoffs between EP and Con with data from various real-world traces. (They correspond to specific real-world points in the bottom-right portions of the heat maps.) From an operational exergy perspective, EP achieves greater savings than Con for all of the enterprise traces; from a total exergy perspective, by contrast, Con outperforms EP in many cases.

4. Discussion

The results above illustrate that the most efficient system design for operational energy does not always produce the most sustainable solution; tradeoffs between operational exergy and embedded exergy need to be considered. The examples in the previous section, such as low-power servers requiring larger factors of energy-efficiency improvement to be sustainably better, or consolidation being more sustainable than energy proportionality, illustrate this point. The best way to optimize for sustainability is to use power-efficient and material-efficient systems that scale power with resource usage and are utilized fully.
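These tradeoffs depend in part on how embedded exergy scales with component sizing. A minimal sketch of the area-normalized impact factor described in Section 3 follows; the 158 MJ value comes from Figure 1(c), while the die areas are hypothetical placeholders.

    # Sketch of impact-factor scaling of embedded exergy (illustrative numbers).
    REF_CPU_EMBEDDED_MJ = 158.0    # from Figure 1(c)
    REF_DIE_AREA_MM2 = 200.0       # assumed reference die area (hypothetical)

    def scaled_embedded_mj(new_die_area_mm2,
                           ref_mj=REF_CPU_EMBEDDED_MJ, ref_area=REF_DIE_AREA_MM2):
        """Scale embedded exergy by die area, assuming uniform thickness/fabrication."""
        impact_factor = ref_mj / ref_area          # MJ per mm^2
        return impact_factor * new_die_area_mm2

    # Example: a smaller, low-power processor with a hypothetical 80 mm^2 die.
    print(f"~{scaled_embedded_mj(80.0):.0f} MJ embedded for an 80 mm^2 die")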
In future systems, as the ratio of embedded exergy to total exergy grows, new optimizations will be needed that explicitly target embedded exergy. For example, upcycling (reusing components when they would normally be recycled or discarded) is an effective way to reduce embedded exergy, amortizing the destruction of exergy over a longer period of time. However, this will require new ways of building systems, including designs that localize technology upgrades to only the components that need them, allowing the rest to be upcycled. "Dematerialization" techniques that reduce the material in a solution will also be important; these require identifying the sweet spot of resources for the best performance efficiency. For example, smaller memory configurations could use less silicon and consequently reduce the embedded exergy associated with memory.
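As a back-of-the-envelope illustration of the amortization argument, with the reused fraction and the refresh model as assumptions and only the 4720 MJ embedded total taken from Figure 1(c):

    # Illustrative only: amortizing embedded exergy over a longer service life.
    EMBEDDED_MJ = 4720.0      # per-server embedded total summed from Figure 1(c)
    REUSED_FRACTION = 0.35    # assumed share of embedded exergy in upcycled parts
    REFRESH_YEARS = 3.0

    baseline = EMBEDDED_MJ / REFRESH_YEARS                  # MJ per year, no reuse
    upcycled = (EMBEDDED_MJ * (1 - REUSED_FRACTION)         # new parts each refresh
                + EMBEDDED_MJ * REUSED_FRACTION / 2.0       # reused parts span two refreshes
                ) / REFRESH_YEARS
    print(f"annualized embedded exergy: {baseline:.0f} -> {upcycled:.0f} MJ/year")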
Finally, when considering the approaches above, it is important to note that embedded exergy, operational exergy, infrastructure exergy, and performance are not independent variables. For example, dematerialization sometimes reduces infrastructure exergy consumption (e.g., removing sheet metal in the backplane can enable better-designed air flow), but in other cases increases it (e.g., removing fans from a server can increase overall cooling energy in the datacenter). Similarly, different optimizations have different performance tradeoffs: backplane redesign for dematerialization can affect networking topologies, reductions to the cooling infrastructure may lead to performance throttling, and so on. It will therefore be important to address sustainability holistically across all the components of total lifecycle exergy.

Overall, as sustainability becomes a more important design consideration for future systems, design methodologies and system optimizations need to change correspondingly to address these emerging challenges. This paper takes the first steps in this direction: a methodology to reason about sustainability bottlenecks from an architectural viewpoint, and an understanding of the tradeoffs and bottlenecks in future designs. We believe, however, that we have only scratched the surface and that these areas offer a rich opportunity for further innovation by the broader community.

References
[1] ISO 14040: Environmental management - Life cycle assessment - Principles and framework. ISO, 2006.
[2] Revolutionizing Datacenter Energy Efficiency. McKinsey, 2008.
[3] Regional Greenhouse Gas Initiative. http://www.rggi.org, 2008.
[4] UK Government. Carbon Reduction Commitment, July 2009.
[5] D. Andersen, J. Franklin, et al. FAWN: A fast array of wimpy nodes. SOSP 2009.
[6] L. A. Barroso and U. Hölzle. The case for energy-proportional computing. IEEE Computer, 40(12):33-37, 2007.
[7] A. Bejan. Advanced Engineering Thermodynamics (2nd edition). John Wiley & Sons, 1997.
[8] R. Bianchini and R. Rajamony. Power and energy management for server systems. IEEE Computer, 37(11):68-74, 2004.
[9] S. Boyd, A. Horvath, et al. Life-cycle energy demand and global warming potential of computational logic. Env. Sci. Tech., 2009.
[10] A. Caulfield, L. Grupp, and S. Swanson. Gordon: Using flash memory to build fast, power-efficient clusters for data-intensive applications. ASPLOS-XIV, 2009.
[11] I. Dincer and M. Rosen. Exergy: Energy, Environment and Sustainable Development. Elsevier, 2007.
[12] X. Fan, W.-D. Weber, and L. Barroso. Power provisioning for a warehouse-sized computer. ISCA 2007.
[13] R. Frischknecht, et al. The ecoinvent database: Overview and methodological framework. J. Life Cycle Assessment, 10(1), 2005.
[14] The Green Grid. Green grid metrics: Describing datacenter power efficiency. http://www.thegreengrid.org, 2007.
[15] M. Gupta and S. Singh. Greening of the Internet. SIGCOMM 2003.
[16] T. Gutowski, et al. Thermodynamic analysis of resources used in manufacturing processes. Env. Sci. Tech., 43(5), 2009.
[17] J. Hamilton. Cooperative expendable micro-slice servers: Low cost, low power servers for internet-scale services. CIDR 2009.
[18] C. Hannemann, et al. Lifetime exergy consumption as a sustainability metric for enterprise servers. ASME ICES, 2008.
[19] T. Heath, et al. Mercury and Freon: Temperature emulation and management for server systems. ASPLOS-XII, 2006.
[20] Y. Huang, C. Weber, and H. Matthews. Carbon footprinting upstream supply chain for electronics manufacturing and computer services. IEEE ISSST 2009.
[21] D. Morris, J. Szargut, and F. Steward. Exergy Analysis of Thermal, Chemical and Metallurgical Processes. Hemisphere, 1988.
[22] N. Krishnan, et al. A hybrid life cycle inventory of nano-scale semiconductor manufacturing. Env. Sci. Tech., 42(8), 2008.
[23] K. Lim, P. Ranganathan, et al. Understanding and designing new server architectures for emerging warehouse-computing environments. ISCA 2008.
[24] D. Meisner, B. Gold, and T. Wenisch. PowerNap: Eliminating server idle power. ASPLOS-XIV, 2009.
[25] R. Nathuji and K. Schwan. VirtualPower: Coordinated power management in virtualized enterprise systems. SOSP 2007.
[26] J. Oliver, R. Amirtharajah, et al. Life cycle aware computing: Reusing silicon technology. IEEE Computer, 40(12), 2007.
[27] The Climate Group and GeSI. SMART2020: Enabling the low carbon economy in the information age, 2008.
[28] D. Plummer, et al. Gartner's top predictions for IT organizations and users: Going green and self-healing, 2008.
[29] R. Raghavendra, et al. No "power" struggles: Coordinated multi-level power management for the data center. ASPLOS 2008.
[30] K. Rajamani and C. Lefurgy. On evaluating request-distribution schemes for saving energy in server clusters. ISPASS 2003.
[31] P. Ranganathan, P. Leech, D. Irwin, and J. Chase. Ensemble-level power management for dense blade servers. ISCA 2006.
[32] A. J. Shah, C. D. Patel, and V. P. Carey. Exergy-based metrics for sustainable design. IEEES-4, 2009.
[33] E. Williams. The environmental impacts of semiconductor fabrication. Thin Solid Films, 461(1), 2004.
[34] E. Williams. Energy intensity of computer manufacturing: Hybrid assessment combining process and economic input-output methods. Env. Sci. Tech., 38(22), 2004.