New Approaches to Producing High-Performance Code, Thanks to Compiler Technology


By CRA Board Member Mary Hall

What does it take to produce application code that performs as close as possible to a parallel architecture’s compute or memory peak performance? This is a question that programmers of high-performance architectures contemplate regularly, since using such systems efficiently lets them solve problems faster, or solve larger and more complex problems.

This question fundamentally changes the approach to programming.

Programming is no longer simply about the correct specification of an algorithm, but expands to understanding and exploiting features of the target architecture in all aspects of an application: algorithm choice, data structures and data layout, where to exploit parallelism, how to make the best use of the memory hierarchy, and how to avoid costly communication and synchronization between cooperating computations. Building applications while addressing performance and scalability concerns is difficult and frequently leads to low-level software that exposes architectural details. If the application is successful enough to outlive the target architecture, then this process must be repeated.

My research focuses on building compiler technology for deriving high-performance, architecture-specific code from a higher level specification. I usually work with computations that arise in mathematical libraries and scientific simulations, and often partner with computational scientists and application teams.

I find this to be an exciting research area for many reasons. I started out working on compiler technology because I appreciate the elegant algorithms and abstractions in compilers. Seeing those algorithms concretely realized in working, faster code makes the payoff tangible.

I developed an interest in high-performance architectures because increasingly powerful hardware is exciting.  Getting early access to new systems keeps the research current and helps scientists determine how to make best use of such systems.

My subsequent focus on mathematical algorithms and science applications grew out of my interest in those areas. Advancing science has its own rewards. Finally, I like the human side of working with application teams, understanding their requirements, and gaining insights into how to improve their workflow as well as their code.

My Ph.D. training at Rice University taught me the foundations of building compilers for parallel architectures before such architectures were ubiquitous. I learned analysis techniques to identify parallelism in code written in a general-purpose language, and to verify the correctness of transformations that optimize that code in an architecture-specific way. My postdoctoral training at Stanford University taught me the fundamentals of computer architecture and fleshed out my understanding of the relationship between software and its mapping to hardware.

My later research has explored how to map applications to a variety of architectural features: shared-memory parallel architectures, complex memory hierarchies, processing-in-memory architectures, field programmable gate arrays, single-instruction-multiple-data compute engines, graphics processing units, and many-core architectures like the Intel Xeon Phi processor.

For more than a decade, my research has developed compiler and programming systems that automate performance tuning for parallel codes. The general approach, called autotuning, empirically evaluates a search space of possible implementations of a computation to identify the one that gives the best performance. Autotuning gained popularity as modern architectures grew so complex and diverse that accurately modeling the performance effects of optimizations inside a compiler became infeasible. By running the code on representative input data, the compiler can systematically test the impact of optimization strategies and achieve performance comparable to labor-intensive manual tuning. Although the high-performance computing community has demonstrated the effectiveness of autotuning, it is not yet in mainstream use because of its overheads and the changes it requires in the application development process. My current research seeks to address these limitations and bring autotuning compiler technology into more common practice.
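To make the idea concrete, here is a minimal sketch of empirical autotuning in C++, assuming a blocked matrix-matrix multiply as the kernel; the candidate tile sizes, problem size, and timing harness are illustrative choices of mine rather than code from any particular autotuning system. The search simply times each variant on representative data and keeps the fastest.

    // Minimal empirical autotuning sketch: time a blocked matrix-matrix
    // multiply for several candidate tile sizes and keep the fastest.
    // Kernel, tile sizes, and problem size are illustrative only.
    #include <algorithm>
    #include <chrono>
    #include <iostream>
    #include <vector>

    // Blocked C += A * B with a single tunable tile size.
    static void matmul_tiled(const std::vector<double>& A,
                             const std::vector<double>& B,
                             std::vector<double>& C, int n, int tile) {
        for (int ii = 0; ii < n; ii += tile)
            for (int kk = 0; kk < n; kk += tile)
                for (int jj = 0; jj < n; jj += tile)
                    for (int i = ii; i < std::min(ii + tile, n); ++i)
                        for (int k = kk; k < std::min(kk + tile, n); ++k)
                            for (int j = jj; j < std::min(jj + tile, n); ++j)
                                C[i * n + j] += A[i * n + k] * B[k * n + j];
    }

    int main() {
        const int n = 512;                                       // representative problem size
        std::vector<double> A(n * n, 1.0), B(n * n, 1.0), C(n * n);
        const std::vector<int> candidates = {16, 32, 64, 128};   // search space of tile sizes

        int best_tile = candidates[0];
        double best_time = 1e30;
        for (int tile : candidates) {
            std::fill(C.begin(), C.end(), 0.0);
            auto start = std::chrono::steady_clock::now();
            matmul_tiled(A, B, C, n, tile);
            std::chrono::duration<double> elapsed =
                std::chrono::steady_clock::now() - start;
            std::cout << "tile " << tile << ": " << elapsed.count() << " s\n";
            if (elapsed.count() < best_time) { best_time = elapsed.count(); best_tile = tile; }
        }
        std::cout << "best tile size: " << best_tile << "\n";
    }

A real autotuning system searches over many transformations at once (tiling, unrolling, data layout, parallelization strategy) and typically prunes the space with models or search heuristics rather than timing every variant exhaustively.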

Over the past few years, I have been interested in optimizing a particular application domain: sparse matrix and graph computations, which arise in molecular dynamics simulations, finite element analysis, machine learning, and data analytics. Such applications have largely been considered beyond the reach of parallelizing compilers because the information required for optimization becomes available only at execution time. For example, static analysis cannot determine access patterns in the presence of indirect array accesses (e.g., A[B[i]], where the contents of B are determined dynamically). My collaborators and I have developed novel extensions to a mathematical representation of sparse matrix/graph computations that incorporate runtime information into compiler optimization, enabling runtime dependence testing, runtime code transformations, and the selection and transformation of sparse matrix data representations.
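The flavor of the problem, and of the runtime remedy, can be seen in the small sketch below. The loop and the inspector are a hypothetical example of inspector/executor-style runtime dependence testing, not the actual representation or code developed in this work.

    // Why indirect accesses defeat static analysis, and how a runtime
    // "inspector" can recover parallelism. Illustrative example only.
    #include <cstddef>
    #include <iostream>
    #include <unordered_set>
    #include <vector>

    int main() {
        // A[B[i]] += val[i]: a compiler cannot know statically whether two
        // iterations write the same element of A, because the contents of B
        // are only known at run time (e.g., read from a sparse matrix file).
        std::vector<int>    B   = {3, 7, 1, 4, 9, 0};
        std::vector<double> val = {1, 2, 3, 4, 5, 6};
        std::vector<double> A(10, 0.0);

        // Inspector: scan the index array once; if no index repeats, no two
        // iterations touch the same location, so the loop is fully parallel.
        std::unordered_set<int> seen;
        bool independent = true;
        for (int b : B)
            if (!seen.insert(b).second) { independent = false; break; }

        // Executor: choose the parallel or the safe serial version at run time.
        if (independent) {
            // Writes never collide, so this loop could run under, e.g.,
            // #pragma omp parallel for
            for (std::size_t i = 0; i < B.size(); ++i) A[B[i]] += val[i];
        } else {
            for (std::size_t i = 0; i < B.size(); ++i) A[B[i]] += val[i];  // serial fallback
        }
        std::cout << "A[3] = " << A[3] << "\n";
    }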

Since I joined the CRA board in 2015, I have had the opportunity to work on important issues facing the computing community. Thanks to Tracy Camp, I have participated in a comprehensive study of the massive growth in enrollments in undergraduate computing programs and its impact on students and institutions. I am now working with CRA to partner with programs and organizations that seek to broaden the participation of diverse groups in computing. It is my hope that my work with CRA will help the field manage and adapt to this growth in ways that preserve, for today’s students, the excitement that first brought me into the field, while creating a diverse and resilient computing workforce.

About the Author

Mary Hall is a Professor in the School of Computing at the University of Utah, where she has been since 2008. She has co-authored numerous reports for government agencies, particularly NSF, DOE, and DARPA, to establish the research agenda in compilers and high-performance computing. Professor Hall is an ACM Distinguished Scientist. She has served on the ACM History Committee for the past decade, including as chair from 2009-2013. She has also served IEEE as a member of the Computer Society Award Committee and as chair of the ACM/IEEE Kennedy Award Committee. She received a Ph.D. in Computer Science from Rice University. Prior to joining Utah, Professor Hall was jointly a research associate professor and project leader at the University of Southern California, and previously held research positions at Caltech, Stanford, and Rice.