How do you keep a data center up and running 24/7/365? A lot of our clients have some idea how this is done, but even those who see the server rooms don’t get the full picture. To give you a better idea of how this all works, today we’re going to give you a small behind-the-scenes look at our Berzarina data center in Moscow.
An important property of any data center is fault tolerance. At Berzarina (and all Selectel data centers), we achieve this by maintaining redundant configurations of all key infrastructure, such as cooling and power. Let’s start with these.
Berzarina is equipped with UNIFLAIR precision air conditioners.
The system uses a liquid coolant to transfer heat between the main cooling machines (chillers) and air-cooling nodes (fan coils). The coolant circulates under relatively low pressure.
The installation also includes a water pump station, an automatic regulating subsystem, and the piping that connects them all. The workload is most intense during the summer, when outside temperatures are at their highest. For the rest of the year, the system uses free cooling, which takes advantage of the low outside temperature to cool the coolant naturally. This significantly reduces the load on the chillers. Similar technology is used in a lot of data centers; Microsoft, for example, uses free cooling in their data center in Dublin, Ireland.
The pump station is vital to the data center. Here, pumps work around the clock to move an endless stream of coolant from the chillers to the fan coils.
Redundancy requires that the system have at least two working pumps. We’ve installed three pumps that work in shifts. Every 10 hours, one of the working pumps shuts down and an idle pump takes over.
This ensures the pumps run for equal amounts of time. Additionally, if one pump fails, the system won’t be affected in any way. Our engineers take readings of pumps’ indicators and condition as part of their routine walkthroughs of the center. To check on the chillers, we have a separate cooling-system control panel. This is monitored 24/7.
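The shift rotation described above can be sketched in a few lines of Python. This is only an illustration, not our actual control logic: the pump names are hypothetical, and we assume that the pump shut down at each changeover is the one that has been running longest.

```python
from collections import deque

ROTATION_HOURS = 10  # from the text: a changeover every 10 hours


def rotation_schedule(pumps, shifts):
    """Return the pair of active pumps for each 10-hour shift.

    Assumption: two of the three pumps run at a time, and the pump
    that has been running longest is the one that shuts down.
    """
    active = deque(pumps[:-1])  # longest-running pump sits on the left
    idle = pumps[-1]
    schedule = []
    for _ in range(shifts):
        schedule.append(tuple(active))
        retired = active.popleft()  # longest-running pump shuts down
        active.append(idle)         # idle pump takes over
        idle = retired
    return schedule


def running_hours(schedule):
    """Total running hours per pump over the whole schedule."""
    hours = {}
    for pair in schedule:
        for pump in pair:
            hours[pump] = hours.get(pump, 0) + ROTATION_HOURS
    return hours


# Hypothetical pump names, six shifts (60 hours in total).
schedule = rotation_schedule(["P1", "P2", "P3"], 6)
hours = running_hours(schedule)
```

Under these assumptions, every full cycle of three shifts gives each pump the same running time, which is the point of working in shifts.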
Our server cabinets are arranged to create two climate zones within the server area: cold aisles and hot aisles. Two rows of racks are placed facing one another. Cold air comes from under the perforated raised floor and passes through the servers, cooling their components. This is the cold aisle and it’s kept at +20±2°C.
Air that is heated by the servers is exhausted behind the racks to what’s called the hot aisle. Here, fan coils pull in the hot air for cooling.
Information on hot and cold aisle temperatures is sent to the on-duty system engineer every 30 seconds.
If temperatures exceed the permissible level, an alarm will go off. During their walkthroughs, engineers measure the temperature of equipment with an infrared thermometer. If we discover that a client’s equipment is overheating, we immediately notify the client and report the temperature.
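As a rough illustration of the alarm check, here is a minimal Python sketch for the cold aisle. It assumes the permissible range is the +20±2°C band mentioned above and that readings below the band are flagged as well as readings above it; the sensor names are hypothetical, and this is not the monitoring system we actually run.

```python
COLD_AISLE_TARGET_C = 20.0    # from the text: +20 degrees C
COLD_AISLE_TOLERANCE_C = 2.0  # from the text: plus or minus 2 degrees C


def cold_aisle_alarm(reading_c,
                     target_c=COLD_AISLE_TARGET_C,
                     tolerance_c=COLD_AISLE_TOLERANCE_C):
    """Return True if a cold-aisle reading is outside the allowed band."""
    return abs(reading_c - target_c) > tolerance_c


def sensors_in_alarm(readings):
    """Return the sorted names of sensors whose reading is out of range.

    `readings` maps a (hypothetical) sensor name to degrees Celsius.
    """
    return sorted(name for name, temp in readings.items()
                  if cold_aisle_alarm(temp))


# One batch of readings, as might arrive every 30 seconds.
batch = {"aisle-1": 19.4, "aisle-2": 22.6, "aisle-3": 21.9}
alarms = sensors_in_alarm(batch)
```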
The uninterrupted flow of power to our racks is one of our top priorities. Our primary distribution board receives power from three independent feeds: two from different transformer substations and one from our diesel generators.
The transformer feeds work in tandem and the load is divided evenly between them. If one loses power, an ATS (automatic transfer switch) instantly transfers the load to the second feed, effectively avoiding downtime.
In the event of a sudden power outage (for example, if the city’s power grid were to go down), our General Electric UPS clusters take over.
The command to start the diesel generators is issued automatically three seconds after the power goes down. Two minutes later, they are fully operational and assume the entire load. We use high-performance Gesan diesel generators with Volvo Penta engines. They can produce up to 504 kW of power. This is how the data center can work at full capacity without any downtime. The on-site fuel supply is enough for 10 hours and can always be replenished if necessary.
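To make the timing concrete, the following Python sketch works out how long the UPS clusters have to carry the load before the generators take over. It assumes the two minutes are counted from the moment the start command is given; the function names and stage labels are ours, for illustration only.

```python
START_COMMAND_DELAY_S = 3   # from the text: command given after 3 seconds
GENERATOR_RAMP_UP_S = 120   # from the text: fully operational 2 minutes later


def ups_bridge_seconds():
    """Seconds the UPS must carry the load after an outage begins.

    Assumption: the two-minute ramp-up starts at the start command.
    """
    return START_COMMAND_DELAY_S + GENERATOR_RAMP_UP_S


def power_stage(seconds_since_outage):
    """Return which source carries the load at a given moment."""
    if seconds_since_outage < 0:
        raise ValueError("time since outage cannot be negative")
    if seconds_since_outage < START_COMMAND_DELAY_S:
        return "UPS, waiting for start command"
    if seconds_since_outage < ups_bridge_seconds():
        return "UPS, generators starting"
    return "generators"
```

Under this reading, the UPS clusters bridge a gap of 123 seconds, after which the generators carry the full load.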
Every month, we start up the generators and check the fuel, oil, and antifreeze levels. We periodically perform a trial run that recreates the conditions of a full power outage. The diesel generators start automatically and assume the data center’s current load. The engines typically start better in the summer than they do in the winter, which is why they’ve been equipped with block heaters and are calibrated for a guaranteed start at temperatures as low as -30°C.
Even when working with the most reliable equipment, there is always the risk of a short circuit or fire. This is why all of our data centers are equipped with automatic fire suppression systems. These ensure that the seat of a fire is fully extinguished without causing harm to any equipment. This is done with a gaseous suppression system.
With this approach, the combustion reaction is chemically inhibited. The system releases a gaseous suppression agent (HFC-125) into the area. Once this gas enters the combustion zone, it rapidly breaks down into free radicals that react with the primary burning agents. The combustion slows until it is fully extinguished.
The automatic fire alarms react quickly. To ensure people have time to evacuate, the fire suppression system works on a slight delay.
In our facilities, the system first allows 30 seconds for evacuation and then activates. To avoid false starts, the system initiates fire suppression only if at least two fire detectors are triggered.
Evacuation is crucial: the gas forces out a large percentage of the oxygen in the area and visibility is reduced to tens of centimeters. Our engineers have been instructed on what to do if the system is ever triggered.
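The two-detector rule and the evacuation delay can be illustrated with a short Python sketch. It assumes each trigger time comes from a different detector and that the 30-second countdown starts when the second detector is triggered; the text does not spell out either detail, so treat this as one possible reading rather than a description of the actual controller.

```python
EVACUATION_DELAY_S = 30  # from the text: 30 seconds for evacuation
MIN_DETECTORS = 2        # from the text: at least two detectors


def release_time(trigger_times_s,
                 min_detectors=MIN_DETECTORS,
                 delay_s=EVACUATION_DELAY_S):
    """Return the time (in seconds) at which the agent is released.

    `trigger_times_s` holds one trigger time per distinct detector.
    Returns None if too few detectors have triggered, which is how a
    single false reading is kept from starting suppression.
    """
    if len(trigger_times_s) < min_detectors:
        return None
    # The fire is treated as confirmed once the Nth detector triggers.
    confirmed_at = sorted(trigger_times_s)[min_detectors - 1]
    return confirmed_at + delay_s
```

For example, with detectors triggering at 5 and 12 seconds, this sketch would release the gas at 42 seconds; a single detector on its own never starts the system.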
Monitoring and Response
All equipment is under constant observation, and system engineers can receive up-to-date information on any device at any given moment. This is how we ensure an immediate response to any malfunction or emergency.
Walkthroughs of the entire data center are performed several times a day. During these walkthroughs, we identify potential malfunctions and inform the relevant parties. For the most part, this gives us the confidence to say that our data centers are ready for any situation and can work autonomously for as long as needed.
Keeping a data center running around the clock is far from a trivial matter. Every component that may break down must be backed up with additional equipment. Regular walkthroughs and monitoring let us diagnose probable causes of failure and take the proper course of action. Periodically replacing old equipment, developing a more thorough monitoring system, and taking a flexible approach to monitoring: this is what we do every day to assure our clients that their data and projects are protected and available 24/7/365.