An Interview with Dr. Jay Hoeflinger about Automatic Parallelization

When I started my PhD.-thesis a couple of years ago, I took some time to look at auto-parallelizing compilers and research. After all, I wanted to work on making parallel programming easier, and the best way to do that would surely be to let compilers do all the work. Unfortunately, the field appeared to be quite dead at that time. There has been a huge amount of research done in the eighties and nineties, yet it all appeared to have settled down. And the compilers I tried could not parallelize more than the simplest loops. I have always been asking myself why this was the case, and when I had the chance to talk to Dr. Jay Hoeflinger, he had some very interesting answers for me. He agreed to let me re-ask these questions in an email interview and this is the result. Thanks, Jay, for sharing your knowledge!

Michael: First of all, please share a little bit about your biography and background on automatic parallelization with us!

Jay: I have a BS (1974), MS (1977) and PhD (1998) from the University of Illinois at Urbana-Champaign. I filled in the large gap between the MS and the PhD with various things. I did some scientific programming, some business programming, and then in 1985 I joined the Center for Supercomputing Research and Development (CSRD) at the University of Illinois. I worked on the Cedar Fortran parallelizing compiler, which was being built to drive the Cedar supercomputer being built at CSRD. We had high hopes that a parallelizing compiler would be the “silver bullet” that would allow an easy way to program parallel machines, which we viewed as being inevitable.

We made some big strides in understanding the challenges facing parallelizing compilers, but by the time CSRD lost its funding, we were far from solving the problems.

After CSRD, I stayed at the University of Illinois and worked on the Polaris parallelizing compiler. Many talented students contributed to Polaris, and it had good success over the years. I contributed to the compiler and was involved in many papers about the work. Finally I decided to work on a PhD degree, attacking the problem of helping the compiler represent the complex data access patterns of programs. I thought that the only way to really solve the automatic parallelization problem was to give the compiler a richer data structure for representing data accesses in a program.

I finally got the PhD degree in 1998, but it was obvious at that time that automatic parallelization by the compiler alone was not near a solution. A richer data structure for the compiler helped, but still the compiler had too little information about the program. The only hope was to do something at runtime. Of course, giving the compiler a richer way to represent complex data accesses forms a basis for that, and that work is finding some success, but a general solution of the problem remains out of reach.

In 1998 I went to work for the Center for Simulation of Advanced Rockets as a Senior Research Scientist. I studied various aspects of parallelizing the rocket simulation codes being developed at the Center.

In 2000, I joined Intel and have worked on the OpenMP runtime library and the Cluster OpenMP runtime library. While working for Intel I also have been involved in the OpenMP 2.0, 2.5 and 3.0 language committees. I’m a Senior Staff Software Engineer and the team lead of the Cluster OpenMP project.

Michael: Wow, now I know why you are the man (TM) to talk to about automatic parallelization :smile: . Let us start the actual interview with the current state of automatically parallelizing compilers: What are they able to do today?

Jay: Vectorization technology and simple parallelization has moved into mainstream commercial compilers. This is because the students that worked on parallelization in the 1980s and 90s now form the core of various commercial compiler teams. Many of the techniques used to attack parallelization have moved into commercial compilers and are supporting both parallel and serial optimization.

Today, parallelizing compilers can be fairly successful with Fortran77 programs. The language is simple, making it more possible to represent what the program is doing in the compiler’s internal data structures. Success with C has been more limited. In the 90s a lot of work was done on parallelizing Java programs, with some success. In the last 10 years work has been done on parallelizing domain-specific languages, like Matlab.

As people realized how hard the parallelization of programs at compile time was, they started turning to parallelizing at runtime. Many projects have studied this problem and there has been some success. Parallelization at runtime has the distinct advantage that all the information about variable values is available. Some of the complicated mechanisms that were developed to symbolically simulate execution in the compiler and propagate data and program attributes throughout the program aren’t needed at runtime.

Now, research compilers can work together with the runtime system to parallelize a code. For instance, if a compiler is trying to prove a certain relation in order to parallelize a given program loop, and is unable to prove it, the compiler could compile a runtime test into the program that can determine whether the relation is true. Parallelization decisions can then be based on the result of the test.

Michael: Ok, now that we know what they can do let’s talk about the more important part: what are they not able do? And what are the reasons for that?

Jay: The key to parallelization is finding a way to divide up the work of the code such that a write to a memory location by one thread is always separated from another thread’s access to the memory location by a synchronization. So, uncertainty about which data is being accessed by which threads really hampers parallelization.

This uncertainty can result from data values not being available to the compiler, for instance, if they are read from an input file at runtime or simply calculated at runtime based on a complex formula. This means that compilers have difficulty with adaptive codes that adjust the way they access data, based on complicated functions or external information. Some very clever techniques have been devised to allow the compiler to parallelize even in these complex situations, but typically such techniques require long compile times.

In the commercial world, compile time is much more important than it is in the research world. So, complicated techniques that can parallelize loops in a research compiler by causing compilation to take a long time, may not be implemented in a commercial compiler – at least they would not be used by default. So, commercial compilers are usually limited to rather simple parallelization techniques.

Achieving the same success in C or C++ as we see in Fortran77 is much more difficult because of the presence of pointers. Pointers create a big problem for compilers because they obscure what data is actually being accessed, making the problem of understanding data access even harder than it would be otherwise.

Michael: Is there hope? Compiler writers are some of the smartest people in computer science and parallel programming is a huge problem to attack, what kind of help can we hope for from our compilers in the near future?

Jay: I think there is some hope in limited domains. There is some hope for parallelizing domain-specific languages, for instance. Matlab would be an example of a language that is great for linear algebra, but not so much for other things. A Matlab compiler doesn’t have to consider all the things that a C++ compiler has to consider.

The thing you have to understand is that compilers are enormous programs, typically written by a large group of experts. Even without automatic parallelization, the modern commercial compiler is extremely complex, and has a very large amount of source code. The software engineering challenges of design and testing for such a large program are already daunting. Adding the large amount of source code that would be required to do a really good job of parallelization would only exacerbate this problem.

So, I think it is unlikely that compilers in the near future will do a significantly better job of parallelization. This is especially true due to the rising popularity of explicitly parallel languages and the increasing willingness of universities to teach people about parallel algorithms and how to design their codes for parallelism from the start.

I believe that compiler vendors are more interested in increasing support for the explicitly parallel languages than they are in trying to do automatic parallelization.

Michael: Could a change in languages help? You mentioned earlier that Fortran 77 code is easier to optimize / parallelize for compilers than e.g. C/C++ code. Could a switch to a more restricted / more regular or just different language (maybe even a new one) change the situation and allow our compilers to help us more?

Jay: There is a tension between the expressibility of a language and the ability of compilers to properly optimize it. A language with a high degree of expressibility may be easier for humans to use, yet at the same time it almost certainly is more complex and therefore more difficult for a compiler to understand, making it harder to parallelize.

A program written in a language with limited expressibility (e.g. Fortran77) is therefore easier to parallelize, yet harder for a person to write programs with. Defining a new language that’s sufficiently rich in expressibility, yet simple enough to be routinely successfully parallelized is a difficult challenge. It seems possible, but no good solution yet.

This is why OpenMP has become popular. OpenMP has the serial program embedded in it, but allows the programmer to tell the compiler where the parallel parts are. The programmer’s knowledge of the code is put to its best use. The programmer typically knows that conceptually, a given region of the program consists of independent computation, and can just say so in a very simple way to the compiler.

Michael: Where do you see automatically parallelizing compilers in 5 years from now? Will some of the problems you have already sketched be solved by then? Any silver bullets on the horizon? Please take a look into your crystal ball 😉 !

Jay: I think there’s hope for domain-specific compilers as I explained above. There may be some hope for runtime parallelization using virtual machines or maybe just runtime tests compiled into the code.

But, for the most part, parallel code will still need to be designed and written by experts. Along these lines, I think parallel libraries will be much more prominent in 5 years. These libraries allow people to write serial programs that call the library routines to solve various problems in parallel, in which the library routines themselves have been optimized by experts.

So, the emphasis has to shift to people writing their own parallel code, not relying on compilers. Universities will need to do a better job of producing parallel experts.

For the most part, I think automatic parallelization will remain an unfulfilled dream.

So, unfortunately, I don’t see any silver bullets. Automatic parallelization remains an extremely interesting research problem, though. Maybe one day, when someone much smarter than I has a great idea …

Michael: Thank you, Jay, that was very enlightening! And although the outlook at the end may sound pessimistic, I am looking forward to interviewing the smart guy you have mentioned. You never know – he may be right around the corner implementing his revolutionary idea in a compiler as we speak…