Data Center Reliability Classification

Over the years, various vendors have offered numerous proprietary methods for defining the reliability of the physical attributes of the data center. These methods have used 3, 4 or 10 different levels to define reliability. In the absence of an industry standard, all of them have provided some value to end users. However, an open standard that has been in development is about to be published.

Several data center industry experts, including end users, manufacturers and data center focused consultants, have teamed up with the BICSI organization to develop an open data center standard that addresses both facility and technology attributes. The contributors committed to two key elements of this standard early in the process: 1.) The standard must be open, and 2.) The reliability metrics should be performance based, not solutions based.

Some of the challenges with the proprietary methods promoted by various vendors are:

  1. Closed format without the ability for open participation by the industry. We believe it is important to engage all stakeholders in the development of standards so that all perspectives can be analyzed.
  2. Reliability definitions had previously been based on specific topologies, which limited the ability of engineers who understand the challenge to implement new design options rather than simply apply “cookie cutter” solutions. New, more cost effective methods of designing reliability into the electrical distribution are one example of where the proprietary vendor methods were ineffective: these new methods did not “fit” within the “cookie cutter” model.
  3. Some of the proprietary methods combined the design parameters that build in a desired reliability with the operational processes required to maintain it. While we agree that operational processes play a significant part in ensuring the reliability of a data center, and the design can certainly impact the ability to maintain the data center in a reliable manner, combining these metrics added complexity to the framework rather than clarifying how the data center should be designed and operated.
  4. The other key challenge with the proprietary vendor methods was that they focused exclusively on the facility systems. Reliable Resources views the Data Center as providing services to the business. These “Data Center Services” consist of the facility, the communications physical infrastructure, the network, the processing and storage systems, the applications and the operating systems.

The BICSI standards committee has developed a performance based metric to define the reliability “Classes” of the facility systems. The following is a summary of the 5 Classes that define the various levels of reliability:

Class F0 – Single Path without Alternate Power Source

  • The objective of Class F0 is to support the basic environmental and energy requirements of the IT functions without supplementary equipment. Capital cost avoidance is the major driver. There is a high risk of downtime due to planned and unplanned events. However, in F0 facilities maintenance can be performed during nonscheduled hours, and downtime of several hours or even days has minimum impact on the mission.
  • A critical power distribution system separate from the general use power systems would not exist. There would be no back-up generator system. The system might deploy power conditioning or surge protective devices to allow the specific equipment to function adequately (utility grade power does not meet the basic requirements of critical equipment). No redundancy of any kind would be used for power or air conditioning for a similar reason.

Class F1 – Single Path

  • The objective of Class F1 is to support the basic environmental and energy requirements of the IT functions. There is a high risk of downtime due to planned and unplanned events. However, in Class F1 facilities, maintenance can be performed during nonscheduled hours, and the impact of downtime is relatively low.
  • The critical power distribution system would deploy a power conditioning device to allow the critical equipment to function adequately (utility grade power does not meet the basic requirements of critical equipment). No redundancy of any kind would be used for power or air conditioning for a similar reason.

Class F2 – Single Path with Redundant Components

  • The objective of Class F2 is to provide a level of reliability higher than that defined in Class F1 to reduce the risk of downtime due to component failure. In Class F2 facilities, there is a moderate risk of downtime due to planned and unplanned events. Maintenance activities can typically be performed during unscheduled hours.
  • In this Class, the critical power system would need redundancy in those parts of the electrical distribution system that are most likely to fail. These would include any products that have a high parts count or moving parts, such as UPS, controls, air conditioning, generators or ATS. In addition, it may be appropriate to specify premium quality devices that provide longer life or better reliability.

Class F3 – Concurrently Maintainable

  • The objective of Class F3 is to provide additional reliability and maintainability to reduce the risk of downtime due to natural disasters, human-driven disasters, planned maintenance, and repair activities. Maintenance and repair activities will typically need to be performed during full production time with no opportunity for curtailed operations.
  • The critical power system in a Class F3 facility must provide for reliable, continuous power even when major components (or, where necessary, major subsystems) are out of service for repair or maintenance. To protect against unplanned downtime, the power system must be able to sustain operations while a dependent component or subsystem is out of service.

Class F4 – Fault Tolerant

  • The objective of Class F4 is to eliminate downtime through the application of all tactics to provide continuous operation regardless of planned or unplanned activities. All recognizable single points of failure from the point of connection to the utility to the point of connection to the critical loads are eliminated. Systems are typically automated to reduce the chances for human error and are staffed 24×7. Rigorous training is provided for the staff to handle any contingency. Compartmentalization and fault tolerance are prime requirements for a Class F4 facility.
  • The critical power system in a Class F4 facility must provide for reliable, continuous power even when major components (or, where necessary, major subsystems) are out of service for repair or maintenance. To protect against unplanned downtime, the power system must be able to sustain operations while a dependent component or subsystem is out of service.
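
The five classes summarized above can be captured as a small lookup table. The following is only an illustrative sketch, not part of the BICSI standard: the names FacilityClass, ClassProfile, PROFILES and minimum_class are hypothetical, and the capability flags are our reading of the summary, under the assumption that each class also provides the capabilities of the classes below it.

```python
from dataclasses import dataclass
from enum import IntEnum


class FacilityClass(IntEnum):
    """The five facility classes, ordered from least to most reliable."""
    F0 = 0
    F1 = 1
    F2 = 2
    F3 = 3
    F4 = 4


@dataclass(frozen=True)
class ClassProfile:
    # Flags are an interpretation of the class summaries, not text
    # taken from the standard itself.
    title: str
    dedicated_critical_power: bool
    redundant_components: bool
    concurrently_maintainable: bool
    fault_tolerant: bool


PROFILES = {
    FacilityClass.F0: ClassProfile(
        "Single Path without Alternate Power Source", False, False, False, False),
    FacilityClass.F1: ClassProfile(
        "Single Path", True, False, False, False),
    FacilityClass.F2: ClassProfile(
        "Single Path with Redundant Components", True, True, False, False),
    FacilityClass.F3: ClassProfile(
        "Concurrently Maintainable", True, True, True, False),
    FacilityClass.F4: ClassProfile(
        "Fault Tolerant", True, True, True, True),
}


def minimum_class(*, dedicated_critical_power=False, redundant_components=False,
                  concurrently_maintainable=False, fault_tolerant=False):
    """Return the lowest class whose profile covers every requested capability."""
    required = {
        "dedicated_critical_power": dedicated_critical_power,
        "redundant_components": redundant_components,
        "concurrently_maintainable": concurrently_maintainable,
        "fault_tolerant": fault_tolerant,
    }
    # IntEnum members iterate in definition order, so F0 is tried first.
    for cls in FacilityClass:
        profile = PROFILES[cls]
        if all(getattr(profile, name) for name, needed in required.items() if needed):
            return cls
    raise ValueError("no class satisfies the requested capabilities")


if __name__ == "__main__":
    needed = minimum_class(concurrently_maintainable=True)
    print(needed.name, "-", PROFILES[needed].title)
```

A site that must stay in production during maintenance would, under these assumptions, map to Class F3, while a site with no stated requirements would map to Class F0.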

The “F” designation for each class is used to represent the “facility”. Although the current BICSI standard defines only the reliability of the facility systems, it is recognized that defining reliability for the remaining services included within the Enterprise Data Center will need to be completed.

This now takes us to the next steps in developing the data center standards. The development of additional open standards that extend beyond the facility will be the evolution of these guidelines. Reliable Resources has developed processes and guidelines to align the reliability criteria of the facility with the critical technology systems in the Data Center Computer Room.

Again, we view the “Data Center” not as a facility or building, but as a service that is supporting critical business processes. The Data Center Services consist of the following:

  1. Facility: space, power, cooling, structural integrity, work & equipment flow/adjacencies
  2. Communications Physical Infrastructure: copper and fiber cable plant, pathways, racks/cabinets
  3. Network: WAN access facilities from the building to the service provider's node that is meshed within their broader network, WAN services, LAN architecture and topology, SAN architecture and topology
  4. Data Processing and Storage Platforms: enterprise architecture of the processing and storage systems, high availability, clustered systems, grid computing, cloud computing, virtualized systems.
  5. Application: appliance based solutions, ability to implement virtual motion, grid computing.

The Reliability Classification model shown is a framework developed by Reliable Resources to align the reliability classification of the Network, Systems and Application data center services with the framework used to classify the facility reliability. We see this as an important next step as the data center becomes more intelligent, with the integration of processing loads with facility power and cooling provisions; more fluid, with the ability to virtually move applications and data throughout the enterprise; and more complex, with the implementation of a hybrid approach to “cloud computing” in which a subset of enterprise applications resides in the cloud and a subset within the end user’s facility.
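
One way to picture this alignment is to rate each of the five Data Center Services on the same 0 to 4 scale and look for the service that limits the whole. The sketch below is a hypothetical illustration rather than the Reliable Resources model itself: the function alignment_report is our own invention, applying the facility scale to the other services is an assumption, and so is the "weakest link" rule that the lowest rated service sets the effective class.

```python
# Service names follow the five Data Center Services listed in the article.
SERVICES = (
    "Facility",
    "Communications Physical Infrastructure",
    "Network",
    "Data Processing and Storage Platforms",
    "Application",
)


def alignment_report(ratings):
    """Compare per-service class ratings (integers 0-4) and report misalignment.

    Assumption: the effective class of the overall service is the lowest
    rating among its layers. This rule is illustrative only.
    """
    missing = [s for s in SERVICES if s not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {', '.join(missing)}")
    for service in SERVICES:
        if ratings[service] not in range(5):
            raise ValueError(f"{service}: rating must be an integer from 0 to 4")

    lowest = min(ratings[s] for s in SERVICES)
    return {
        "aligned": all(ratings[s] == lowest for s in SERVICES),
        "effective_class": lowest,
        # Services that hold the overall rating down.
        "limiting_services": [s for s in SERVICES if ratings[s] == lowest],
        # Services rated above the effective class, where spend may exceed need.
        "above_effective_class": [s for s in SERVICES if ratings[s] > lowest],
    }


if __name__ == "__main__":
    example = {
        "Facility": 3,
        "Communications Physical Infrastructure": 3,
        "Network": 2,
        "Data Processing and Storage Platforms": 3,
        "Application": 3,
    }
    report = alignment_report(example)
    print("Effective class:", report["effective_class"])
    print("Limited by:", ", ".join(report["limiting_services"]))
```

In this example a Class F3 facility paired with a network rated at 2 would be reported as misaligned, with the network as the limiting service.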

Reliable Resources is a strong proponent of open industry standards and has frequently made intellectual property developed internally available to the industry. This has been demonstrated by its contributions to the development of TIA-942 and the pending BICSI standard.