Here in the northern hemisphere, summer has rolled round again, bringing record temperatures across Europe and North America with it. While a cold shower or an ice cream might be enough to cool us down, things aren’t so simple for the devices that we trust to store our data. Have you ever wondered why data centres are so cold? The answer is slightly more complex than “to keep the IT people cool”.
We all know that excess heat is bad for our devices. You may have even seen pictures or videos of competitive overclockers using liquid nitrogen to stop their CPUs overheating, and if you haven’t, Tom’s Hardware has got you covered. However, proper cooling isn’t just important for CPUs; it is also important for storage devices.
First, it helps to define what a good temperature range for a storage device is. Although the exact numbers vary from manufacturer to manufacturer, some rough guidelines are:
Solid State Drive (SSD) - 0°C to 60°C (32°F to 140°F)
Hard Disk Drive (HDD) - 0°C to 70°C (32°F to 158°F)
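To make these ranges concrete, here is a minimal Python sketch that checks a temperature reading against them and converts Celsius to Fahrenheit. The limits are the rough guidelines listed above, not values from any particular datasheet, and the names `OPERATING_RANGE_C`, `c_to_f`, `in_operating_range` and `headroom_c` are made up for illustration.

```python
# Rough operating ranges in °C, copied from the guidelines above.
# Assumption: real limits vary by manufacturer, so check the datasheet.
OPERATING_RANGE_C = {
    "ssd": (0.0, 60.0),
    "hdd": (0.0, 70.0),
}


def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


def in_operating_range(kind: str, temp_c: float) -> bool:
    """Return True if temp_c is inside the guideline range for this drive type."""
    low, high = OPERATING_RANGE_C[kind.lower()]
    return low <= temp_c <= high


def headroom_c(kind: str, temp_c: float) -> float:
    """Degrees Celsius left before the upper limit (negative if exceeded)."""
    _, high = OPERATING_RANGE_C[kind.lower()]
    return high - temp_c


if __name__ == "__main__":
    for kind, reading in [("ssd", 45.0), ("ssd", 65.0), ("hdd", 65.0)]:
        status = "ok" if in_operating_range(kind, reading) else "too hot or too cold"
        print(f"{kind.upper()} at {reading}°C ({c_to_f(reading)}°F): {status}")
```

The same reading can be fine for one drive type and out of range for another: 65°C is inside the HDD guideline here but above the SSD one.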
For more information you should read the manufacturer’s specification. This example is from a Samsung 870 EVO:
What happens if the temperature of a drive rises above this? Well, basically, nothing good. Many modern devices are smart enough to monitor their own temperature via their firmware. If temperatures reach a certain level, they may throttle their own performance to prevent further overheating. Whilst this protects components from potential damage, it comes at the cost of reduced performance, including reduced I/O performance. This large-scale case study of flash memory failure in the field at Facebook’s datacentres notes:
“Higher temperatures lead to increased failure rates, but do so most noticeably for SSDs that do not employ throttling techniques.”
Not only did the authors notice a correlation between higher temperature and shorter flash storage lifespan, but drives that don’t implement performance throttling at higher temperatures had noticeably shorter lifespans than their counterparts that do.
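The throttling behaviour described above can be sketched as a small state machine. This is only an illustration, not how any specific drive's firmware works: the thresholds, the 50% throttled rate, and the use of hysteresis (a lower resume temperature, so the drive doesn't flip rapidly between states) are all assumptions, and `ThermalThrottle` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class ThermalThrottle:
    """Toy model of firmware thermal throttling (all values are assumptions)."""

    throttle_at_c: float = 70.0  # hypothetical: start throttling at or above this
    resume_at_c: float = 60.0    # hypothetical: stop throttling at or below this
    throttled: bool = False

    def update(self, temp_c: float) -> bool:
        """Feed in a new temperature reading; return whether we are throttled."""
        if not self.throttled and temp_c >= self.throttle_at_c:
            self.throttled = True
        elif self.throttled and temp_c <= self.resume_at_c:
            self.throttled = False
        return self.throttled

    def allowed_iops(self, full_iops: int, throttled_fraction: float = 0.5) -> int:
        """I/O operations per second permitted in the current state."""
        if self.throttled:
            return int(full_iops * throttled_fraction)
        return full_iops


if __name__ == "__main__":
    drive = ThermalThrottle()
    for reading in [55, 69, 71, 65, 60, 58, 72]:
        drive.update(reading)
        print(f"{reading}°C -> throttled={drive.throttled}, "
              f"IOPS={drive.allowed_iops(1000)}")
```

In this sketch, a drive that reaches 71°C stays throttled at 65°C and only returns to full speed once it has cooled to 60°C, which is the trade-off the paragraph above describes: less heat, but also less I/O performance.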
In this whitepaper, Innodisk also draws a correlation between higher temperature and decreased data retention.
“Data retention decreases in higher temperatures and with increasing program/erase (P/E) cycles as both these factors induce a higher rate of charge leakage.”
Charge leakage occurs when the dielectric barrier designed to trap a charge in the flash cell degrades. If the dielectric degrades too much, the cell can no longer hold a charge and thus can’t be programmed. Higher temperatures mean that charged particles in the flash cell move more, further eroding the dielectric layer. For more information about the structure of NAND flash cells, I recommend reading this blog post.
As for HDDs, this article by Backblaze indicates that in their datacentre there was no significant correlation between drive temperature and drive failure rate. In their ‘Hard Drive Temperature Takeaways’ they state:
“As long as you run drives well within their allowed range of operating temperatures, keeping them cooler doesn’t matter.”
However, this leaves a lot of room for speculation about what happens if an HDD exceeds its operating temperature, something that might not be that difficult as temperatures spike globally and the average user does not have datacentre cooling to rely on.
The good news, however, is that there are steps we can take to help our storage devices out. Most methods to keep data cool may seem like common sense: keep devices out of direct sunlight, place them in an area with good air circulation, avoid heavy workloads in high temperatures, and turn them off when not in use.
On the very hottest days, if you can’t keep your storage cool, just turn it off; that’s what Google and Oracle do.
Keeping devices as cool as possible will not only prevent potential damage from overheating but may also improve data retention.