IT Sustainability Think Tank: Calculating IT equipment capacity – the challenging path forward


Datacentre energy consumption and operational efficiency are under intense scrutiny. Sensing a need to control datacentre growth, legislators and regulators have begun efforts to require operators to report facility-level information and key performance indicators and set minimum facility performance thresholds.

Their ultimate goal is to require operators to report indicative performance and efficiency metrics, including power usage effectiveness (PUE) and a work delivered per unit of energy consumed metric.

The European Union has acted first. The Energy Efficiency Directive (EED) and its delegated regulation (finalised but not published) require datacentres with more than 500 kilowatts of installed IT equipment power demand to report over 30 location and operating parameters to member states and the European database on datacentres.

The final delegated regulation expands the EED reporting requirements and establishes reporting requirements for installed server work capacity (SERT® active state performance or SERT CPUperf) and installed storage capacity (petabytes).

Even without regulatory mandates, a work per energy metric represents an industry best practice. Each new generation of IT equipment delivers double or triple the equipment work capacities and work per watt of the previous equipment generation. Advancements in workload management software enable IT operators to increase the utilization of their IT equipment infrastructure. A work per energy metric captures and illustrates the benefits of refreshing and consolidating IT equipment.

Datacentre operators face significant challenges in reporting server and storage capacity indicators. To comply with these requirements or calculate a work per energy metric for a datacentre facility, operators need to maintain an equipment inventory with critical component data and equipment location, and establish a process to capture and calculate datacentre server work capacity. The industry needs a standardised method for reporting server work capacity.

Equipment Inventories

Calculating or estimating IT equipment work capacities requires knowledge of the equipment location and component specifications. Unfortunately, an Uptime survey of IT operators indicates that only one-third of IT operators maintain an equipment inventory detailed enough to calculate equipment capacities in a datacentre (Table 1).

Calculating the total server work capacity or the storage capacity of a datacentre facility requires an operator to know the quantity and type of equipment located in each facility. The survey results indicate that only 30% of operators can make that match from their inventory.  

[Table 1: survey results, image not reproduced]

To calculate total server work capacity, an IT operator needs to know the quantity, part number, and core count of the CPUs installed in each server. Work capacity values are expected to be assigned by either CPU part number or core count. The total work capacity will be calculated or estimated based on the aggregation of the work capacity values of the installed server infrastructure. The survey indicates that only 27% of operators currently collect CPU part numbers and core count.

To calculate storage capacity, the operator needs to know the number of storage devices and their capacity for each piece of storage equipment. 53% of operators indicate that they collect this data.
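The CPU fields called out above could be captured in an inventory record along the following lines. This is a minimal sketch rather than a prescribed schema: the class name `ServerRecord`, its field names and the helper `records_missing_cpu_data` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerRecord:
    """One server in the equipment inventory (hypothetical schema)."""
    asset_id: str
    facility: str                   # datacentre in which the server is located
    cpu_part_number: Optional[str]  # None if it was never recorded
    cpu_count: Optional[int]        # number of CPUs installed in the server
    cores_per_cpu: Optional[int]


def records_missing_cpu_data(records):
    """Return the records that lack the CPU data needed for a capacity calculation."""
    return [
        r for r in records
        if not r.cpu_part_number or not r.cpu_count or not r.cores_per_cpu
    ]
```

Running a check like this against an existing inventory gives a quick view of how much component data still has to be collected before any capacity figure can be calculated.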

Datacentre operators must take three steps to update and improve their equipment inventory and management processes to calculate datacentre work capacity.

  • Update the equipment inventory system to include all the component information required to calculate work capacity. In addition to the values required for server and storage products, the bandwidth and data transfer rates of network equipment should also be collected.
  • Update equipment purchase specifications and processes to require the reporting and collection of equipment component data. This will require collaboration with the purchasing organization and the creation of an electronic process for collecting inventory information.
  • Perform a survey of the installed IT equipment to get a complete inventory of the existing infrastructure. Asset discovery software that identifies equipment and captures the component information offers the best approach to update the current inventory. It can also be used in place of a procurement process to update the inventory as new equipment is installed. The other, resource-intensive option is to conduct a manual survey.   

Building an effective inventory system will take time and require inter-organizational collaboration. IT operators lacking a system are optimistic about their ability to create or upgrade to a workable system: 68% (47% of total respondents) indicate that they can implement a system within one year.

Server work capacity

The delegated regulation defines server work capacity as the Server Efficiency Rating Tool (SERT®) active state performance score, as specified in Commission Regulation (EU) 2019/424 (ecodesign requirements for server and storage products). The active state performance is the geometric mean of the measured 100% normalised performance scores of the seven CPU worklets in the Standard Performance Evaluation Corporation (SPEC℠) SERT suite. It has units of transactions per second relative to a reference server.
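To make the geometric mean concrete, the sketch below combines seven normalised worklet scores into a single value. It illustrates the arithmetic only and is not the SERT tooling itself; the function name `active_state_performance` and the example scores are hypothetical.

```python
from statistics import geometric_mean

CPU_WORKLET_COUNT = 7  # the number of CPU worklets cited in the text


def active_state_performance(normalised_scores):
    """Geometric mean of the normalised CPU worklet scores at 100% load.

    `normalised_scores` is a sequence of positive numbers, one per worklet,
    each already expressed relative to the reference server.
    """
    if len(normalised_scores) != CPU_WORKLET_COUNT:
        raise ValueError(
            f"expected {CPU_WORKLET_COUNT} worklet scores, "
            f"got {len(normalised_scores)}"
        )
    return geometric_mean(normalised_scores)


# Made-up scores for illustration only; they are not measured values.
example_scores = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
example_result = active_state_performance(example_scores)
```

For these made-up scores the result is approximately 8, well below the arithmetic mean of about 18. A geometric mean keeps a single very high worklet score from dominating the overall value.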

The majority of the datacentre industry supported the use of the SERT active state performance score as a representative server work capacity value. As SERT measurements are required to demonstrate compliance with server energy efficiency requirements in the EU, the US, and Japan, server manufacturers are generating SERT measurements for selected configurations for regulatory compliance and internal research purposes.

Working with manufacturers, The Green Grid (TGG) has collected a dataset of SERT measurements for over 600 server configurations with over 100 CPU part numbers. Using this dataset, TGG performed an analysis showing that the active state performance score is dependent on the CPU part number and independent of the server configuration.

A study of 15 CPU part numbers, representing three generations of AMD and Intel CPUs, found that the average active state performance values for 13 of the part numbers had standard deviations ranging from 3% to 20%, with the remaining two part numbers having standard deviations of around 30%.
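A comparable spread figure can be derived from any set of per-configuration measurements by grouping them by CPU part number. The sketch below assumes the percentages are standard deviations expressed relative to the mean, and it uses the sample standard deviation; the article does not state which form TGG used. The function name and input layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def relative_spread_by_part(measurements):
    """Map each CPU part number to (mean score, std dev / mean).

    `measurements` is an iterable of (cpu_part_number, active_state_score)
    pairs, one per measured server configuration.
    """
    scores = defaultdict(list)
    for part, score in measurements:
        scores[part].append(score)

    result = {}
    for part, values in scores.items():
        if len(values) < 2:
            continue  # spread is undefined for a single measurement
        avg = mean(values)
        result[part] = (avg, stdev(values) / avg)
    return result
```

Part numbers with only one measured configuration are skipped, since no spread can be estimated for them.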

Overall, the CPU active state performance values provide an acceptable basis for calculating a representative estimate of the total datacentre work capacity for standard CPU-based servers, enabling comparison of year-to-year changes.

Given the EED delegated regulation requirement to begin limited work capacity reporting for the 2024 report year, the industry must establish one or more databases containing active state performance values for CPU part numbers. Databases could be created by industry organisations such as TGG or by equipment manufacturers. A TGG working group has a project underway to publish a database in the second half of 2024.

Work capacity data is not currently available for servers that incorporate GPUs (GPU-based servers) for high-performance computing, artificial intelligence, and machine learning applications. These servers make up a small percentage of the currently installed server infrastructure, but their presence in the datacentre is expected to grow in the future. The SPECPower® committee is reportedly working on an extension to the SERT test that will measure performance and power values for GPU-based servers and provide a work capacity for these servers.

Storage equipment capacity

The work capacity of a storage product is its raw storage capacity in terabytes, a value readily available from the product manufacturer. A datacentre's storage capacity is the sum of the raw storage capacities of all the installed storage products. Operators should collect and record this data in their equipment inventory when equipment is purchased.
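The storage summation can be sketched as follows. The `StorageProduct` class and its fields are hypothetical, every device in a product is assumed to have the same capacity, and decimal units (1 PB = 1,000 TB) are assumed for the conversion to petabytes.

```python
from dataclasses import dataclass

TB_PER_PB = 1000  # assumption: decimal units


@dataclass
class StorageProduct:
    """One storage product in the inventory (hypothetical schema)."""
    asset_id: str
    device_count: int          # number of drives installed in the product
    device_capacity_tb: float  # raw capacity per drive, in terabytes


def total_raw_storage_tb(products):
    """Sum the raw capacity of all installed storage products, in terabytes."""
    return sum(p.device_count * p.device_capacity_tb for p in products)


def total_raw_storage_pb(products):
    """The same total, expressed in petabytes."""
    return total_raw_storage_tb(products) / TB_PER_PB
```

A product with mixed drive sizes would need one entry per drive size, or a list of per-drive capacities in place of the two fields used here.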

The network equipment bandwidth and data transfer capacity would be calculated in a similar way.

In summary, given a complete server equipment inventory and a dataset of average CPU active state performance values, datacentre operators will be able to calculate their total datacentre work capacity for CPU-based servers by multiplying the number of CPUs of a given part number by the active state performance value for that part number, and summing those values for all the servers in a specific datacentre.
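The summation described above can be sketched in a few lines. This assumes a lookup table of average active state performance values keyed by CPU part number, of the kind the planned databases would provide. The `Server` class, the function name and the part numbers in the usage example are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Minimal inventory entry for one server (hypothetical schema)."""
    cpu_part_number: str
    cpu_count: int


def total_server_work_capacity(servers, perf_by_part):
    """Sum (number of CPUs x active state performance) over all part numbers.

    `perf_by_part` maps CPU part number -> average active state performance.
    """
    # Count the installed CPUs for each part number across the datacentre.
    cpus_per_part = Counter()
    for server in servers:
        cpus_per_part[server.cpu_part_number] += server.cpu_count

    # Fail loudly if the dataset has no value for an installed part number.
    missing = sorted(p for p in cpus_per_part if p not in perf_by_part)
    if missing:
        raise KeyError(f"no performance value for part numbers: {missing}")

    return sum(count * perf_by_part[part] for part, count in cpus_per_part.items())
```

As a usage example with made-up values, two dual-CPU servers using a part rated at 10.0 and one single-CPU server using a part rated at 25.0 give 4 × 10.0 + 1 × 25.0 = 65.0. Raising an error for unknown part numbers, rather than silently skipping them, makes gaps in the performance dataset visible before a figure is reported.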

While this is a simple summation, a significant amount of work is required of individual datacentre operators and the datacentre industry to build the datasets needed to complete the work capacity calculation for regulatory reporting.
