I just visited a data center of a branch in Thailand. I was headed down to Thailand anyway, and since they were having some problems and I like troubleshooting, I offered to stop by.
For some reason, the data center had been extremely hot, and they had not been able to bring it under control. This had been an ongoing problem ever since they added just one rack of equipment to the room. Nothing in that rack was all that big, so it couldn’t have been the problem. Yet the temperature in the room steadily rose. It was probably over 90°F (32°C) in the room.
They had been battling this problem for a couple of months, and it was getting critical enough that they feared equipment would be damaged. In fact, they had barely been able to keep up with it. Something was generating more and more and more heat each week.
It was a perplexing problem. They had strip charts showing the average temperature. Sure enough, the room had been getting hotter and hotter every few weeks, starting when the one rack of equipment was added. Bizarre.
I was taken to this mischievous rack and saw very little inside of it: a chassis with some blade computers and some networking gear. I asked what else had changed in the data center and was told: nothing. It’s all the same equipment.
To call this a “data center” is to add some politeness to what is really an oversized closet.
To combat the heat, the staff had purchased a roll-around air conditioner. Then they purchased another. And another. And another. By this time they had added 5 tons of air conditioning to handle the extra load from this one new equipment rack. Temperatures were at an all-time high. The staff was flummoxed.
Then, I saw what the problem was. I thought of Fukushima Reactor #4. And then I said to myself: “Forgive them, for they know not what they are doing.”
They had lined up the roll-around air conditioners along a wall where they had power outlets. This bank of roll-arounds was dutifully blowing out a massive wave of cold air.
I smiled and asked the staff to turn off all of the roll-around air conditioners. They howled that the room would instantly get so hot that the equipment would be permanently damaged. They didn’t have the budget to replace everything, nor did they have the parts. Again, I asked them to just turn all of them off. Well, they insisted that I put the order in writing and that they get permission from their boss, who wanted to make certain that HQ knew I was about to melt down their entire data center. I held off on explaining why I wanted to turn them off, because that would likely have made things worse.
Well, finally, they agreed and we turned each one off. I said: “And now, we go for lunch.”
After lunch we returned and the data center was distinctly cooler than it was before lunch. I wouldn’t say it was cold, but it was nothing like it was before.
The staff was mystified.
I explained that the new rack gave off just a little more heat than the room could shed, and that pushed the room into thermal runaway, just like Fukushima Reactor #4. Adding the air conditioners made things worse: the more they added, the worse the problem got.
Everybody had a perplexed look.
I asked: why do you think the air conditioners helped? The answer: because they blew out cold air, of course. And where does the heat go? Blank stares. So I explained: the heat goes out the back of the air conditioner. Each unit was blowing cold air out the front and hot air out the rear duct. You see, they hadn’t bothered to duct the heat out of the room. I couldn’t make something like this up, honest.
For those who may still not get it: an air conditioner is a heat pump. It pulls heat out of the air it blows out the front and pumps that same heat out the back, in exact proportion. On top of that, the air conditioner itself consumes about 1.5 kW of electricity, and all of that also becomes heat. So an unvented air conditioner just circulates the air, cooling and heating it in equal measure, while adding 1.5 kW of heat to the room from its own compressor. The five units added 7.5 kW of heat, which is enough to heat a small house.
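The energy balance above fits in a few lines of Python. This is a minimal sketch, assuming each unit draws the 1.5 kW quoted above and is rated at one ton of refrigeration (five tons spread over five units works out to one ton each); the function name and its parameters are hypothetical, and it treats a vented unit as sending all of its rejected heat, including its own electrical input, outdoors.

```python
# Net heat a bank of portable air conditioners adds to a sealed room.
# Assumptions (illustrative): 1 ton of refrigeration per unit, 1.5 kW drawn per unit.

KW_PER_TON = 3.517  # 1 ton of refrigeration = 12,000 BTU/h, roughly 3.517 kW


def net_heat_into_room_kw(units: int, electrical_kw: float, vented: bool,
                          rated_tons: float = 1.0) -> float:
    """Positive result heats the room; negative result cools it."""
    cooling_kw = units * rated_tons * KW_PER_TON  # heat pulled out of the front air
    input_kw = units * electrical_kw              # electricity, which also ends up as heat
    rejected_kw = cooling_kw + input_kw           # heat blown out of the back
    if vented:
        # Exhaust ducted outdoors: the rejected heat leaves, only the cooling stays.
        return -cooling_kw
    # Exhaust left in the room: front and back cancel, the electrical input remains.
    return rejected_kw - cooling_kw


if __name__ == "__main__":
    print(round(net_heat_into_room_kw(5, 1.5, vented=False), 2))  # 7.5  (five unvented units)
    print(round(net_heat_into_room_kw(1, 1.5, vented=True), 2))   # -3.52 (one vented unit)
```

Under these assumptions, ducting the exhaust flips a single unit from a net 1.5 kW heater into a net cooler of roughly 3.5 kW.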
What caused the overheating problem in the first place is more interesting.
The room had a dedicated air conditioner. It did its job adequately, but it had a capacity limit, of course. Over the years the staff added small increments of equipment to the room and never had a problem, because the heat load was always under the capacity of the room air conditioner. Eventually, the heat load equalled the capacity of the air conditioner. And then, finally, with the new rack, the heat load exceeded that capacity by the barest, most minuscule margin. The room was sealed, so that tiny surplus of heat had nowhere to go, and it built up and up and up.
There had been a warning sign: in the afternoons the room was always warmer, even though it had no windows. It was warmer because the outdoor portion of the air conditioner depends on Td, the temperature difference between the outside air and the hot refrigerant inside the unit. As the outside air heated up in the afternoon, Td got smaller and the air conditioner pumped less heat.
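To see how a load that only barely exceeds capacity turns into a slow, steady climb, and why a windowless room warms up in the afternoon first, here is a toy hourly simulation of a sealed room. Every number in it (the 22 °C setpoint, the 10 kW rating, a linear loss of 0.15 kW per degree of hotter outdoor air, the daily outdoor swing, and the room's lumped thermal mass) is a hypothetical assumption, and the linear derating is only a simplification of how a shrinking Td cuts capacity. It is a sketch of the mechanism, not a model of this particular room.

```python
import math

# Hypothetical parameters for illustration only (not measured values).
SETPOINT_C = 22.0                 # thermostat setpoint of the room unit
RATED_KW = 10.0                   # cooling capacity at the reference outdoor temperature
REFERENCE_OUTDOOR_C = 30.0
DERATE_KW_PER_C = 0.15            # capacity lost per degree of hotter outdoor air
THERMAL_MASS_KJ_PER_C = 50_000.0  # air, walls, and equipment lumped together
STEP_S = 3600.0                   # one-hour time step


def outdoor_temp_c(hour: int) -> float:
    """Daily outdoor cycle: 30 °C mean, ±5 °C swing, peaking at 15:00."""
    return 30.0 + 5.0 * math.cos(2 * math.pi * (hour - 15) / 24)


def capacity_kw(outdoor_c: float) -> float:
    """Hotter outdoor air shrinks Td across the condenser, so capacity drops."""
    return RATED_KW - DERATE_KW_PER_C * (outdoor_c - REFERENCE_OUTDOOR_C)


def simulate(load_kw: float, hours: int) -> list[float]:
    """Hourly room temperatures for a sealed room (no heat leaks through walls)."""
    temp = SETPOINT_C
    history = []
    for hour in range(hours):
        cap = capacity_kw(outdoor_temp_c(hour))
        # Heat the unit would need to remove to end this hour at the setpoint.
        needed_kj = load_kw * STEP_S + max(temp - SETPOINT_C, 0.0) * THERMAL_MASS_KJ_PER_C
        # The unit can never remove more than its current capacity allows.
        removed_kj = min(cap * STEP_S, needed_kj)
        temp += (load_kw * STEP_S - removed_kj) / THERMAL_MASS_KJ_PER_C
        history.append(temp)
    return history


if __name__ == "__main__":
    for load in (9.0, 9.8, 10.8):
        week = simulate(load, 24 * 7)
        print(f"{load} kW load: first-day peak {max(week[:24]):.2f} °C, "
              f"after a week {week[-1]:.2f} °C")
```

With these made-up numbers, a 9.0 kW load holds the setpoint around the clock; a 9.8 kW load bumps the room up a fraction of a degree each afternoon and recovers overnight (the warning sign); and a 10.8 kW load, slightly more than the unit can remove even at night, never stops climbing.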
I don’t know how many small-time data centers I have been in that are warmer in the afternoon, despite having no windows. This is a sure-fire sign that the air conditioner is at capacity. Add just a couple of pieces of equipment and you can get thermal runaway.
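The afternoon check lends itself to a quick test against the kind of strip-chart readings the staff already had. A minimal sketch: the function names, the hour windows (13:00 to 17:00 versus 02:00 to 06:00), and the 1-degree threshold are all hypothetical choices, not an established standard.

```python
from statistics import mean


def afternoon_bump_c(readings: list[tuple[int, float]]) -> float:
    """Mean afternoon temperature minus mean pre-dawn temperature.

    readings: (hour_of_day, temperature) pairs logged in a windowless room.
    The windows below (13:00 to 17:00 and 02:00 to 06:00) are hypothetical.
    """
    afternoon = [temp for hour, temp in readings if 13 <= hour <= 17]
    predawn = [temp for hour, temp in readings if 2 <= hour <= 6]
    if not afternoon or not predawn:
        raise ValueError("need readings from both the afternoon and pre-dawn windows")
    return mean(afternoon) - mean(predawn)


def looks_at_capacity(readings: list[tuple[int, float]], threshold: float = 1.0) -> bool:
    """Rule of thumb from the text; the 1-degree default threshold is an assumption."""
    return afternoon_bump_c(readings) > threshold
```

For example, `looks_at_capacity([(3, 22.0), (15, 24.0)])` returns `True`, while a room that reads the same at every hour returns `False`.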
Anyway, we took just one of the roll-around units and vented the heat outside. The room chilled down completely.
I am forever amazed that the sorts of problems we had 40 years ago still plague us today. Some lessons, like this one, must be learned by each generation.
(Cross-posted @ TalkingPointz Colin)