Taming the Parallel Beast

Many programmers seem to think parallelism is hard. A quick Internet search will yield numerous blogs commenting on the difficulty of writing parallel programs (or parallelizing existing serial code). There do seem to be many challenges for novices. Here’s a representative list:

  • Finding the parallelism. This can be difficult because when we tune code for serial performance, we often use memory in ways that limit the available parallelism. Simple fixes for serial performance often complicate the original algorithm and hide the parallelism that is present.
  • Avoiding the bugs. Certainly, there is a class of bugs such as data races, deadlocks, and other synchronization problems that affect parallel programs, and which serial programs don’t have. And in some senses they are worse, because timing-sensitive bugs are often hard to reproduce — especially in a debugger.
  • Tuning performance. Serial programmers have to worry about granularity, throughput, cache size, memory bandwidth, and memory locality. But for parallel programs, the programmer also has to consider the parallel overheads and unique problems, like false sharing of cache lines.
  • Ensuring future proofing. Serial programmers don’t worry whether the code they are writing will run well on next year’s processors — it’s the job of the processor companies to maintain upward compatibility. But parallel programmers need to think about how their code will run on a wide range of machines, including machines with two, four, or even more processors. Software that is tuned for today’s quad-core processors may still be running unchanged on future 16-, 32- or even 64-core machines.
  • Using modern programming methods. Object-oriented programming makes it much less obvious where the program is spending its time.
  • Other reasons that parallel programming is considered hard include the complexity of the effort, insufficient help for developers unfamiliar with the techniques, and a lack of tools for dealing with parallel code. When adding parallelism to existing code, it can also be difficult to make all the changes needed to add parallelism all at once, and to ensure that there is enough testing to eliminate timing-sensitive bugs.

Use Serial Modeling to Evolve Serial Code to Parallel

The key to success in introducing parallelism is to rely on a well-proven programming method called serial modeling. Using serial modeling tools and technique, programmers can achieve parallelization with enhanced performance and without synchronization issues. The essence of the method involves consistently checking and resolving problems, and beginning early in the process to slowly evolve the code from pure serial, to serial but capable of being run in parallel, to truly parallel.

The first step is to measure where the application spends time — effort spent in hot areas will be effective, while effort spent elsewhere is wasted. The next step is to use a serial modeling tool to evaluate opportunities for potential parallelization and determining what would happen if this code ran in parallel. This kind of tool observes the execution of the program, and uses the serial behavior to predict the performance and bugs that might occur if the program actually executed in parallel.

Checking for problems early in the evolution process, while a program is still serial, ensures that you don’t waste time on parallelization efforts that are doomed because of poor performance. You can then model parallelizations that resolve the performance issues or, if no alternatives are practical, focus your efforts on more profitable locations.

The tool can also model the correctness of the theoretical parallel program, and detect race conditions and other synchronization errors while still running the serial program. Although the program still runs serially, it is easy to debug and test, and it computes the same results. The programmer can change the program to resolve the potential races, and after each change, the program remains a serial program (with annotations) and can be tested and debugged using normal processes.

When the program has fully evolved, the result is a correct serial program with annotations describing a parallelization with known good performance and no synchronization issues. The final step in the process is to convert those annotations to parallel code. After conversion, the parallel program can undergo final tuning and debugging with the other tools. The beast has been tamed.