Moore’s Law needs a hug. The days of cramming ever more transistors onto tiny silicon computer chips are numbered, and their life rafts, hardware accelerators, come at a price.
To program an accelerator (a process in which an application offloads certain tasks to specialized system hardware to speed them up), you have to build entirely new software support. Hardware accelerators can run certain tasks orders of magnitude faster than CPUs, but they cannot be used out of the box. Software must use the accelerator’s instructions efficiently to make it work with the rest of the application. That translates to a lot of engineering work that would then have to be maintained for every new chip you compile code to, in any programming language.
Today, scientists from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have created a new programming language called “Exo” for writing high-performance code on hardware accelerators. Exo helps low-level performance engineers transform very simple programs that specify what they want to compute into very complex programs that do the same thing as the specification, but much, much faster, by taking advantage of these special accelerator chips. Engineers, for example, can use Exo to turn a simple matrix multiplication into a more complex program that runs orders of magnitude faster on these accelerators.
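To make the idea concrete, here is a minimal sketch of what such a “simple specification” looks like: a naive matrix multiply written in plain Python. This is illustrative only, not Exo syntax; Exo embeds specifications like this in its own Python-based language before transforming them.

```python
def matmul_spec(A, B):
    """Naive matrix multiply: C[i][j] = sum over k of A[i][k] * B[k][j].

    This states *what* to compute, with no concern for performance --
    the starting point a performance engineer would then transform.
    """
    n, kd = len(A), len(A[0])
    m = len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for k in range(kd):
                C[i][j] += A[i][k] * B[k][j]
    return C
```

A spec like this runs slowly on real hardware, but its simplicity makes it easy to check that any optimized version still computes the same thing.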
Unlike other programming languages and compilers, Exo is built around a concept called “Exocompilation.” “Traditionally, a lot of research has focused on automating the optimization process for specific hardware,” said Yuka Ikarashi, a PhD student in electrical engineering and computer science and a CSAIL affiliate who is a lead author of a new paper about Exo. “This works well for most programmers, but for performance engineers, the compiler gets in the way as often as it helps. Because the compiler’s optimizations are automatic, there’s no good way to fix it when it does the wrong thing and gives you 45 percent efficiency instead of 90 percent.”
With Exocompilation, the performance engineer is back in the driver’s seat. Responsibility for choosing which optimizations to apply, when, and in what order is externalized from the compiler and handed back to the performance engineer. This way, they don’t have to waste time fighting the compiler on the one hand, or doing everything manually on the other. At the same time, Exo takes responsibility for ensuring that all of these optimizations are correct. As a result, the performance engineer can spend their time improving performance rather than debugging complex, optimized code.
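The kind of rewrite an engineer might direct can be sketched with classic loop tiling: the loops of a naive matrix multiply are restructured for cache locality, while the result stays provably identical. This is a hand-written illustration of the transformation style, assuming a plain-Python setting; in Exo the engineer would invoke such rewrites as checked scheduling steps rather than writing them by hand.

```python
def matmul_tiled(A, B, tile=2):
    """Loop-tiled matrix multiply.

    Computes the same C[i][j] = sum over k of A[i][k] * B[k][j] as the
    naive version, but iterates in small blocks ("tiles") so the data
    touched by the inner loops fits in fast cache. Each (i, j, k)
    triple is still visited exactly once, so the result is unchanged.
    """
    n, kd, m = len(A), len(A[0]), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, kd, tile):
            for jj in range(0, m, tile):
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, kd)):
                        a_ik = A[i][k]  # hoisted load, reused across j
                        for j in range(jj, min(jj + tile, m)):
                            C[i][j] += a_ik * B[k][j]
    return C
```

The value a tool like Exo adds is exactly the guarantee that rewrites of this sort (tiling, reordering, vectorizing) preserve the original specification, so the engineer never has to debug a transformation that silently changed the answer.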
“The Exo language is parameterized over the hardware it targets; the same compiler can adapt to many different hardware accelerators,” said Adrian Sampson, assistant professor in the Department of Computer Science at Cornell University. “Instead of writing a bunch of messy C++ code to compile for a new accelerator, Exo gives you an abstract, uniform way to write down the ‘shape’ of the hardware you want to target. Then you can reuse the existing Exo compiler to adapt to that new description instead of writing something entirely new from scratch. The potential impact of work like this is enormous: if hardware innovators can stop worrying about the cost of developing new compilers for every new hardware idea, they can try out and ship more ideas. The industry could break its dependence on legacy hardware that succeeds only because of ecosystem lock-in, and despite its inefficiency.”
The highest-performing computer chips made today, such as Google’s TPU, Apple’s Neural Engine, or NVIDIA’s Tensor Cores, power scientific computing and machine learning applications by accelerating something called “key sub-programs,” kernels, or high-performance computing (HPC) subroutines.
Clunky jargon aside, these programs matter. For example, Basic Linear Algebra Subroutines (BLAS) is a “library,” or collection, of such subroutines dedicated to linear algebra computations; it underpins many machine learning tasks such as neural networks, weather forecasts, cloud computation, and drug discovery. (BLAS is so important that it won Jack Dongarra the Turing Award in 2021.) However, these new chips, which take hundreds of engineers to design, are only as good as these HPC software libraries allow.
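Most programmers use these subroutines without ever seeing them. NumPy, for instance, dispatches its dense floating-point matrix products to a BLAS `gemm` routine under the hood, so a one-line matrix multiply in Python is ultimately served by exactly the kind of hand-tuned kernel the article describes:

```python
import numpy as np

# On float64 arrays, NumPy's matrix product is dispatched to the
# BLAS dgemm subroutine of whatever BLAS implementation NumPy was
# built against (e.g., OpenBLAS or Intel MKL).
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])
C = A @ B  # one line of Python, one heavily optimized BLAS kernel
```

How fast that one line runs on a given chip depends entirely on the quality of the BLAS kernel beneath it, which is why so much engineering effort goes into these libraries.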
For now, however, this type of performance optimization is still done by hand to ensure that every last compute cycle on these chips gets used. HPC subroutines regularly run at 90-percent-plus of peak theoretical efficiency, and engineers go to great lengths to squeeze an extra five or 10 percent of speed toward these theoretical peaks. So, if the software isn’t aggressively optimized, all that hard work gets wasted, which is exactly what Exo helps avoid.
Another important feature of Exocompilation is that performance engineers can describe the new chips they want to optimize for without having to modify the compiler. Traditionally, the definition of the hardware interface has been maintained by the compiler developers, but for most new accelerator chips the hardware interface is proprietary. Companies have to maintain their own fork of a whole traditional compiler, modified to support their particular chip. This requires hiring teams of compiler developers in addition to performance engineers.
“In Exo, we instead externalize the definition of hardware-specific backends from the exocompiler. This gives us a much better separation between Exo, which is an open-source project, and hardware-specific code, which is often proprietary. We’ve shown that we can use Exo to quickly write code that’s as performant as Intel’s hand-optimized Math Kernel Library. We’re actively working with engineers and researchers at several companies,” said Gilbert Bernstein, a postdoc at the University of California at Berkeley.
The future of Exo entails exploring a more productive scheduling meta-language and expanding its semantics to support parallel programming models, in order to apply it to even more accelerators, including GPUs.
Ikarashi and Bernstein wrote the paper with Alex Reinking and Hasan Genc, both PhD students at UC Berkeley, and MIT Assistant Professor Jonathan Ragan-Kelley.
This work was partially supported by the Applications Driving Architectures center, one of six centers of JUMP, a Semiconductor Research Corporation program co-sponsored by the Defense Advanced Research Projects Agency. Ikarashi is supported by the Funai Overseas Scholarship, the Masason Foundation, and the Great Educators Fellowship. The team presented the work at the ACM SIGPLAN Conference on Programming Language Design and Implementation 2022.