OpenMP Does Not Scale – Or Does It?

While at the ParCo conference two weeks ago, I had the pleasure of meeting Ruud van der Pas again. He is a Senior Staff Engineer at Sun Microsystems and gave a very enlightening talk called Getting OpenMP Up To Speed. What I would like to post about is not the talk itself (although it contains some material that I have wanted to write about here for a long time), but the introduction he used to get our attention: an imaginary conversation, which I am reprinting here with his permission. Only one side of the conversation is shown, but it’s pretty easy to fill in the other:

Do you mean you wrote a parallel program, using OpenMP and it doesn’t perform?

I see. Did you make sure the program was fairly well optimized in sequential mode?

Oh. You didn’t. By the way, why do you expect the program to scale?

Oh. You just think it should and used all the cores. Have you estimated the speedup using Amdahl’s Law?

No, this law is not a new European Union environmental regulation. It is something else.

I understand. You can’t know everything. Have you at least used a tool to identify the most time-consuming parts in your program?

Oh. You didn’t. You just parallelized all loops in the program. Did you try to avoid parallelizing innermost loops in a loop nest?

Oh. You didn’t. Did you minimize the number of parallel regions then?

Oh. You didn’t. It just worked fine the way it was. Did you at least use the nowait clause to minimize the use of barriers?

Oh. You’ve never heard of a barrier. Might be worth reading up on. Do all processors roughly perform the same amount of work?

You don’t know, but think it is okay. I hope you’re right. Did you make optimal use of private data, or did you share most of it?

Oh. You didn’t. Sharing is just easier. I see. You seem to be using a cc-NUMA system. Did you take that into account?

You’ve never heard of that. That is unfortunate. Could there perhaps be any false sharing affecting performance?

Oh. Never heard of that either. May come in handy to learn a little more about both. So, what did you do next to address the performance?

Switched to MPI. Does that perform better, then?

Oh. You don’t know. You’re still debugging the code.

What a great way to start a talk on performance issues with OpenMP, don’t you think? And he manages to pack some of the most important problems encountered when optimizing not only OpenMP programs, but parallel programs in general, into a tiny introduction. At the end of his talk, he continued the imaginary conversation as follows:

While we’re still waiting for your MPI debug run to finish, I want to ask you whether you found my information useful.

Yes, it is overwhelming. I know.

And OpenMP is somewhat obscure in certain areas. I know that as well.

I understand. You’re not a Computer Scientist and just need to get your scientific research done.

I agree this is not a good situation, but it is all about Darwin, you know. I’m sorry, it is a tough world out there.

Oh, your MPI job just finished! Great.

Your program does not write a file called ‘core’ and it wasn’t there when you started the program?

You wonder where such a file comes from? Let’s get a big and strong coffee first.

I am sure the MPI crowd doesn’t really approve of this ending, but I found the talk way more entertaining than the usual conference fare. Of course he is teasing, of course he is exaggerating, but that’s OK when you are a presenter and want to get your point across. It also helps to put a smile on your face, so your audience knows you are not a die-hard fanatic. :smile:

By the way, this was not really the end of Ruud’s talk; he went on just a tiny bit further to pitch a new book on OpenMP, of which he is a co-author. Called Using OpenMP, it is a book I have been looking forward to for a while (and not just because the main author, Prof. Barbara Chapman, is the second advisor for my thesis). Maybe I can finally add a real recommendation for a book on OpenMP to my list of recommended books on parallel programming.