Derechos, the namesake of the new supercomputer coming to the National Center for Atmospheric Research (NCAR), are swift-moving bands of thunderstorms. NCAR itself is moving quickly and ambitiously with the new system, its third major installation since 2012. Irfan Elahi, director of NCAR’s high performance computing division, recently spoke with HPCwire about Derecho’s development and schedule, as well as future plans.
Nuts and bolts
First, the specs: Derecho, built by HPE, will be water-cooled and powered primarily by third-generation AMD Epyc Milan processors and 40GB Nvidia A100 GPUs. It comprises 2,488 dual-socket CPU-only nodes (256 GB of memory per node) and 82 single-socket heterogeneous nodes (four A100s and 512 GB of memory per node). In total, the system has 692 TB of memory, 328 A100 GPUs and 5,058 Milan CPUs, all connected by HPE Slingshot v11 networking.
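The system-wide totals can be reproduced from the per-node figures. The Python sketch below is only a back-of-the-envelope check, not an NCAR or HPE tool, and its constant names are hypothetical. It assumes that the 692 TB figure counts each A100's 40 GB of on-board memory alongside host memory. That is our inference, since host memory alone comes to roughly 679 TB.

```python
# Back-of-the-envelope tally of Derecho's published node configuration.
# All per-node figures come from the article; constant names are hypothetical.
CPU_NODES = 2488           # dual-socket, CPU-only nodes
CPU_NODE_SOCKETS = 2
CPU_NODE_MEM_GB = 256

GPU_NODES = 82             # single-socket heterogeneous nodes
GPU_NODE_SOCKETS = 1
GPU_NODE_MEM_GB = 512
GPUS_PER_GPU_NODE = 4
A100_MEM_GB = 40           # assumption: GPU memory is included in the 692 TB total


def tally():
    """Return (CPU sockets, GPUs, host memory GB, GPU memory GB, total TB)."""
    cpus = CPU_NODES * CPU_NODE_SOCKETS + GPU_NODES * GPU_NODE_SOCKETS
    gpus = GPU_NODES * GPUS_PER_GPU_NODE
    host_mem_gb = CPU_NODES * CPU_NODE_MEM_GB + GPU_NODES * GPU_NODE_MEM_GB
    gpu_mem_gb = gpus * A100_MEM_GB
    total_tb = (host_mem_gb + gpu_mem_gb) / 1000  # decimal terabytes
    return cpus, gpus, host_mem_gb, gpu_mem_gb, total_tb


if __name__ == "__main__":
    cpus, gpus, host_gb, gpu_gb, total_tb = tally()
    print(f"{cpus} CPUs, {gpus} GPUs, {total_tb:.0f} TB of memory")
```

Under that assumption the sums match the article: 5,058 CPU sockets, 328 GPUs and about 692 TB of memory.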
This combined hardware will provide 19.87 peak petaflops, more than triple the performance of Derecho’s predecessor, Cheyenne (5.34 peak petaflops). Cheyenne, installed in 2016, was itself preceded by Yellowstone, a 1.26-petaflops (peak) system installed in 2012.
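A quick ratio check of the peak figures quoted above shows how each NCAR system compares with its predecessor. This is a minimal sketch using only the article's numbers. The names are hypothetical, and peak petaflops say nothing about sustained application performance.

```python
# Peak performance of NCAR's last three major systems, in petaflops (from the article).
PEAK_PFLOPS = [
    ("Yellowstone", 1.26),
    ("Cheyenne", 5.34),
    ("Derecho", 19.87),
]


def generational_speedups(systems):
    """Return (system, peak speedup over its predecessor) for each system after the first."""
    return [
        (name, round(peak / prev_peak, 2))
        for (_, prev_peak), (name, peak) in zip(systems, systems[1:])
    ]


if __name__ == "__main__":
    for name, speedup in generational_speedups(PEAK_PFLOPS):
        print(f"{name}: {speedup}x its predecessor's peak")
```

Derecho comes out at about 3.72 times Cheyenne's peak, consistent with "more than triple." Cheyenne, in turn, was roughly 4.24 times Yellowstone.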
Derecho’s firepower will be deployed in the service of all things atmospheric and many things environmental. Elahi cited applications ranging from severe weather (thunderstorms, tornadoes and hurricanes) and climate change to water availability, wildfires, renewable energy, subsurface oil and gas flows, solar storms and more. “The supercomputer will primarily enable research that will lead to more detailed and useful predictive capabilities that will have significant societal benefits,” he said, “in particular by becoming more resilient to climate change.”
A gathering storm
HPE’s win was announced in January of this year, but plans for Derecho had been in the works for years. “We launched this project at the end of summer 2018 and we started by doing a workload analysis study,” Elahi said. “We also then created a panel… and I think it had 43 different members, diverse both in terms of gender [and] ethnicity but also in subject matter expertise, because we wanted to… examine that subject matter expertise across Earth Systems Science, and we worked with the Scientific Requirements Advisory Committee to get their requirements[.]”
Through this process, NCAR developed a set of benchmarks to evaluate a new system, then called NWSC-3 after the NCAR-Wyoming Supercomputing Center (NWSC), which houses NCAR’s supercomputers. With the benchmarks in hand, NCAR issued an RFI for the system and worked with “four or five” potential vendors, including through a workshop that brought researchers and vendors together to develop a strategy for building the system. After the request for proposals went out and the dust had settled, Elahi said, NCAR picked the best value, not the lowest cost, and landed on HPE.
Derecho, however, is still some way off. First, a test system called Gust will come online around February 2022. Then, Elahi said, “Derecho itself will be delivered in mid-March, the first quarter of next year.” Once it is installed and tested, a process of six to seven weeks, he said, NCAR will open the system to its first external users. These users will come from the Accelerated Scientific Discovery (ASD) program, which solicits proposals from researchers whose projects involve “actionable science” related to NCAR’s core objectives. These ASD users, Elahi explained, will help test the system for a few months over the summer ahead of its wider launch and help NCAR make the leap to the next generation of supercomputing. “The whole idea of ASD lies in these new upcoming applications,” he said.
“After ASD, we’ll open it up to the entire user community,” Elahi continued. Meanwhile, Cheyenne, which Elahi described as a remarkably reliable system with only one power failure in recent years, will continue to operate until “around the end of December 2022.” “In order to help our users transition and migrate to the new environment, we want to provide them with a six-month overlap,” Elahi said.
Derecho will be housed in the NWSC data center, which Elahi said is LEED Gold certified. Speaking about the new system’s sustainability, he said, “The most important thing, I think, is the energy efficiency of the applications.” To that end, he said, Derecho will deliver about three to three and a half times more flops per watt than Cheyenne.
Looking further ahead, Elahi noted that gains in conventional processor performance were stagnating, and pointed instead to technologies such as accelerators, GPUs, FPGAs and AI as sources of greater computing power and efficiency. NCAR, he said, will push the efficiency angle even further for its fourth system. “One of the things we want to do for our next system is also to consider the carbon footprint and sustainability as a specification,” he said.