“The moment we believe that success is determined by an ingrained level of ability as opposed to resilience and hard work, we will be brittle in the face of adversity.” – Joshua Waitzkin
UPS resilience is about more than redundancy. It depends on the relationship between Mean Time To Repair (MTTR) and Mean Time Between Failures (MTBF), which together determine availability, and on the overall design of the system.
Redundancy is a form of duplication that increases the availability of a system, whereas resilience is the UPS’s ability to maintain continuous levels of service in the face of adversity. Simply put, a resilient system has a low ‘likelihood of going down’.
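The relationship between MTBF, MTTR and availability can be illustrated with the standard steady-state formula, availability = MTBF / (MTBF + MTTR). The sketch below is a minimal illustration only: the function name and the MTBF and MTTR figures are assumptions chosen for the example, not measured values for any particular UPS.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time a unit is in service."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical figures, for illustration only.
MTBF_HOURS = 200_000.0       # assumed mean time between failures
SLOW_REPAIR_HOURS = 6.0      # assumed on-site, component-level repair
FAST_REPAIR_HOURS = 0.1      # assumed swap taking a few minutes

slow_repair = availability(MTBF_HOURS, SLOW_REPAIR_HOURS)
fast_repair = availability(MTBF_HOURS, FAST_REPAIR_HOURS)

print(f"Availability with slow repair: {slow_repair:.9f}")
print(f"Availability with fast repair: {fast_repair:.9f}")
```

With the MTBF held constant, shortening the repair time is the only thing that changes, and availability rises accordingly.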
The Uptime Institute oversees four classifications of site infrastructure topology based on increasing levels of redundant capacity components and distribution paths.
The four classifications are: Tier 1 = N with a single path; Tier 2 = N+1 with a single path; Tier 3 = N+1 with two paths, one active and one passive; and Tier 4 = 2N+1 with two active paths. The Tier 4 classification enables the complete shutdown of one active path whilst the remaining active path protects the load, demonstrating the highest level of resilience.
When considering system design, there is something to be said for choosing sophisticated technology and additional distribution paths to ensure datacenters stay safe. The most resilient systems are available and all datacentres want to be well protected, but to what degree and at what cost? The resilience level of a UPS topology is a design choice, and the pros and cons need to be carefully assessed well before installation to ensure an appropriate level of protection. Usually, the selection comes down to resilience versus cost, balanced against risk, and inevitably cost becomes the main driver.
However, what if we flip this argument on its head and prioritize resilience? What factors go into creating the most resilient and safest UPS, and how can this have a positive impact on the total cost of ownership (TCO)? We know that resilience is the likelihood of the system ‘not going down’ and becoming unavailable. An unavailable system is costly in terms of damaged reputation, lost clients and lost revenue.
When designing and selecting a UPS system with the highest level of resilience as a priority, we can consider a modular approach. Modular systems have many advantages over traditional monolithic blocks, including higher availability due to a low mean-time-to-repair and reduced maintenance costs, as replacing modules is quick and straightforward. The latest generation of modular systems is more energy efficient too, so running costs are minimized. Simon Roger, Facilities Manager at Sure Data Centre, explains: “Sure has seen an 18% saving on energy costs in the Data centres since the installation of CENTIEL’s three phase truly modular CumulusPower UPS and has predicted a payback period of less than 3 years.”
Certain modular topologies could help datacenters to increase their Tier rating and, as a result, attract more clients. A traditional monolithic-block solution may be cheaper to purchase initially, but there are limitations on what the installation offers. A modular solution earns back the initial investment in superior technology over time. In this way, functionality (and resilience) is prioritized over initial CapEx budget considerations. Joined-up thinking at the outset could alter the way budgets are set over the long term, and some forward-thinking clients are already taking this approach.
Simon Roger, Facilities Manager at Sure Data Centre, continues: “We have been able to upgrade from a Tier 3 to a Tier 4 datacentre using the same infrastructure by working with CENTIEL and installing the company’s modular UPS CumulusPower. Becoming a Tier 4 datacenter means we can offer our valued client base an even better service with the highest levels of power protection while paying close attention to minimising costs.”
One of the big advantages of modular UPS technology is ‘speed of repair’. Many datacenters have historically used traditional monolithic-block UPS in a parallel redundant configuration, where two separate UPS cabinets in parallel feed a critical load to provide N+1 resilience. Failure of one UPS will not expose the load to failure, so the set-up is fairly reliable. However, repairing a monolithic block can take considerable time, as the repair takes place on site at sub-assembly or possibly component level. In a worst-case scenario, the entire UPS needs to be replaced.
This is a huge risk to the load, as the system is running on N rather than N+1 during this period. By contrast, an N+1 modular UPS takes only a matter of minutes to regain resilience, because the components and sub-assemblies are replicated throughout the system in individual UPS modules. This means that an entire module can be removed and replaced within the UPS while it is live (“hot swappable”), without any effect on the load and in a very short amount of time. With this type of architecture, the mean-time-to-repair is significantly lower and the resilience is therefore higher.
Consider a 200kW load. A 1+1 parallel redundant configuration of monolithic-block architecture requires two 200kW UPS systems to support it. A rack-mounted modular 4+1 parallel redundant configuration could use five 50kW UPS modules to support the same 200kW load. Although the component count has increased, the availability of the system has improved dramatically. This is due to the architecture of a truly modular system: each UPS module can be ‘hot swapped’, so the MTTR is just a few minutes.
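The 1+1 versus 4+1 comparison can be sketched numerically with a simple "k out of n" redundancy model. This is a rough illustration under stated assumptions: failures are treated as independent, common-mode faults are ignored, every unit is given the same hypothetical MTBF, and the repair times (6 hours for a monolithic block, 0.1 hours for a hot-swapped module) are invented for the example. The function names are likewise illustrative.

```python
from math import comb


def unit_unavailability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a single unit is out of service."""
    return mttr_hours / (mtbf_hours + mttr_hours)


def system_unavailability(unit_down: float, needed: int, installed: int) -> float:
    """Probability that fewer than `needed` of `installed` units are up.

    Assumes identical units that fail and are repaired independently.
    """
    unit_up = 1.0 - unit_down
    return sum(
        comb(installed, up) * unit_up**up * unit_down ** (installed - up)
        for up in range(needed)
    )


# Hypothetical figures, for illustration only.
MTBF_HOURS = 200_000.0

# 1+1 monolithic: two 200kW blocks, one needed, slow on-site repair.
mono_down = system_unavailability(
    unit_unavailability(MTBF_HOURS, 6.0), needed=1, installed=2
)

# 4+1 modular: five 50kW modules, four needed, hot-swap repair.
modular_down = system_unavailability(
    unit_unavailability(MTBF_HOURS, 0.1), needed=4, installed=5
)

print(f"1+1 monolithic unavailability: {mono_down:.3e}")
print(f"4+1 modular unavailability:    {modular_down:.3e}")
```

Under these assumptions the modular system comes out ahead despite having more components, because the much shorter repair time shrinks the window in which a second failure could drop the load. Different MTBF or MTTR inputs would change the size of the gap.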
The higher initial purchase price of a modular system compared with a monolithic-block system is partly offset by its flexibility and scalability. A rack-mounted modular UPS can easily be right-sized, allowing datacenters to ‘pay as they grow’. Because only the modules needed for the current load are installed at the outset, initial CapEx is reduced, and a correctly sized system also minimizes ongoing running costs.
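The right-sizing arithmetic behind ‘pay as you grow’ can be sketched as follows. This is an illustrative calculation only: the function name, the 50kW module rating and the load figures are assumptions for the example, and real sizing would also account for factors such as power factor and battery autonomy.

```python
from math import ceil


def modules_needed(load_kw: float, module_kw: float, redundant: int = 1) -> int:
    """Modules required to carry the load, plus the redundant spare(s) (N+1 by default)."""
    return ceil(load_kw / module_kw) + redundant


# Hypothetical growth path using 50kW modules.
for load_kw in (100, 150, 200):
    count = modules_needed(load_kw, 50)
    print(f"{load_kw}kW load -> {count} x 50kW modules (N+1)")
```

A site that starts at 100kW would install three modules in this sketch and add more as the load approaches 200kW, rather than buying the full 4+1 configuration on day one.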
Distributed Active Redundant Architecture (DARA) is a concept introduced by CENTIEL in its 4th generation UPS, CumulusPower. The overall architecture is completely decentralised, so no common component can act as a single point of failure. DARA can reduce downtime from seconds to the millisecond level. This active-redundant technology, alongside the elimination of potential single points of failure and true modular hot-swap capability, provides leading availability of nine nines (99.9999999%). It is a key element in selecting the most resilient UPS system.
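To put the ‘nines’ into perspective, an availability figure can be converted into expected downtime per year. The sketch below is a simple arithmetic illustration, assuming a 365-day year; the function name is hypothetical and the six-nines row is included only for comparison.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year


def downtime_seconds_per_year(nines: int) -> float:
    """Expected yearly downtime for an availability with the given number of nines."""
    unavailability = 10.0 ** -nines  # e.g. nine nines -> 1e-9
    return unavailability * SECONDS_PER_YEAR


for nines in (6, 9):
    seconds = downtime_seconds_per_year(nines)
    print(f"{nines} nines: about {seconds * 1000:.1f} ms of downtime per year")
```

On this basis, six nines corresponds to roughly 31.5 seconds of downtime per year, while nine nines corresponds to roughly 31.5 milliseconds.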
For the highest level of resilience, your UPS system must be properly maintained. Right-sizing is key to reducing the overall TCO, including running and maintenance costs. Why pay for more than you need? By prioritising a resilient UPS, datacentres can install the most appropriate system, with the least likelihood of it ‘going down’, and keep control of costs.
At CENTIEL, our design team has worked with data centres across the world for many years, and we are at the forefront of technological development. We are trusted advisors to some of the world’s leading institutions in this field. We are pleased to share our knowledge and experience to help our clients configure the most appropriate and resilient UPS system possible, minimising both risk and cost while protecting the power to datacentres on behalf of their valued client base.
Article originally featured in Mission Critical Power December 2019