DrCCTProf Tutorial

Enabling Easy Fine-Grained Binary Analysis on
ARM and X86 Architectures

HELD IN CONJUNCTION WITH THE 2022 INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION (CGO'22)

When: 13:00 - 17:00 (EST), April 2nd, 2022.

Where: Zoom (the link can be found on the registration page)

Audience

Researchers, practitioners, and students interested in performance, correctness, or security analysis of ARM/X86 binaries (compiled from C/C++/Fortran/Rust/Go).

Overview

Complex codebases with several layers of abstraction have abundant inefficiencies that affect execution time. Inefficiencies arise from myriad causes, such as developers' inattention to performance; inappropriate choices of abstractions, algorithms, and data structures; and ineffective or detrimental compiler optimizations. Not all inefficiencies are easy to detect or eliminate with compiler optimization, because compilers are inherently limited in the reach of their static analysis and in their optimization scope. Classical "hotspot" performance analysis tools are also incapable of identifying many kinds of software inefficiencies. Microscopic observation of whole executions at instruction- and operand-level granularity breaks down abstractions and helps recognize inefficiencies that masquerade in complex programs.

Dynamic binary-instrumentation tools are widely used for microscopic program introspection in areas such as performance analysis, debugging, and software security. However, existing tools, such as Pin, DynamoRIO, and Dyninst, are difficult to use because of their complex APIs; developing a useful tool requires substantial effort. Moreover, existing tools do not provide efficient APIs to attribute runtime measurements to execution contexts, primarily the calling context. A detailed call path attribution of execution measurements enhances a tool's capability and usability.

In this tutorial, we will introduce DrCCTProf, a library that efficiently collects execution-wide call paths and associates execution metrics with them for fine-grained analysis tools. DrCCTProf is based on DynamoRIO, which works for both ARM and X86 binaries. It is simple to use and effective in improving the diagnostic capabilities of fine-grained analysis tools. We will first present the simple yet effective DrCCTProf APIs that offer rich calling-context capabilities, then the advanced features that attribute every memory access to the corresponding data object in the program, and then the DrCCTProf internals for advanced users. We will show example tools built atop DrCCTProf that detect certain classes of software inefficiencies, such as dead stores and redundant computations. Using these clients, we will show how one can pinpoint software inefficiencies in large, complex code bases, gain a superior understanding of execution profiles, and tune code to eliminate the inefficiencies and obtain significant performance improvements. Finally, we will show the visualization support for DrCCTProf, which facilitates intuitive visualization of the huge amounts of data obtained from fine-grained analysis.
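To make the idea of associating metrics with call paths concrete, here is a minimal, self-contained sketch of a calling context tree in Python. It is an illustration only and does not use or mirror the DrCCTProf API: the names `CCTNode`, `CallingContextTree`, `attribute`, and `report` are hypothetical, and the call paths and metric values are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class CCTNode:
    """One calling context: a function reached via a specific chain of callers."""
    name: str
    metric: int = 0  # measurements attributed to exactly this context
    children: dict = field(default_factory=dict)

    def child(self, name):
        # Reuse the node if this call path was seen before, else create it.
        if name not in self.children:
            self.children[name] = CCTNode(name)
        return self.children[name]


class CallingContextTree:
    """Hypothetical toy structure; not the DrCCTProf implementation."""

    def __init__(self):
        self.root = CCTNode("<root>")

    def attribute(self, call_path, value=1):
        """Add `value` to the context identified by the full call path."""
        node = self.root
        for frame in call_path:
            node = node.child(frame)
        node.metric += value

    def report(self):
        """Return (call path, metric) pairs, largest metric first."""
        rows = []

        def walk(node, prefix):
            path = prefix + [node.name]
            if node.metric:
                rows.append((" -> ".join(path), node.metric))
            for c in node.children.values():
                walk(c, path)

        for c in self.root.children.values():
            walk(c, [])
        return sorted(rows, key=lambda r: -r[1])


# Made-up measurements: the same function (memset) is reached via two callers.
cct = CallingContextTree()
cct.attribute(["main", "solve", "memset"], 3)
cct.attribute(["main", "init", "memset"], 10)
cct.attribute(["main", "solve", "memset"], 4)

for path, metric in cct.report():
    print(f"{metric:3d}  {path}")
```

A flat, per-function profile would report a single total for `memset`; keeping one node per call path separates the cost incurred under `init` from the cost incurred under `solve`, which is the kind of distinction call path attribution is meant to provide.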

DrCCTProf is available on GitHub under the MIT License.

Agenda (all times are EST)

13:00 - 13:10   Welcome Notes [slides][video]   Xu Liu, North Carolina State University

13:10 - 13:40   Introduction to DynamoRIO [slides][video] Derek Bruening, Google

13:40 - 14:30   Introduction to DrCCTProf and its Internals [slides][video] Xu Liu, North Carolina State University

14:30 - 14:45   Break

14:45 - 15:15   Pinpointing Program Inefficiencies with DrCCTProf Clients -- LoadSpy [slides][video] Pengfei Su, University of California, Merced

15:15 - 15:45   Identifying Inefficient Memory Stores with DrCCTProf Clients -- DeadSpy and RedSpy [slides][video] Milind Chabbi, Scalable Machine Research

15:45 - 16:15   DrCCTProf APIs and GUIs [slides][video] Qidong Zhao, North Carolina State University

16:15 - 16:25   Break

16:25 - 17:00   A hands-on lab: developing the first DrCCTProf client tool on ARM   Qidong Zhao & Xu Liu, North Carolina State University

Organizers

Xu Liu

Associate Professor at North Carolina State University

Questions

For questions about this tutorial please contact Xu Liu.

Related Publications

[SC'20] "DRCCTPROF: A Fine-grained Call Path Profiler for ARM-based Clusters", Qidong Zhao, Xu Liu, Milind Chabbi, The International Conference for High Performance Computing, Networking, Storage and Analysis, Nov 15-20, 2020, Atlanta, GA, USA. Best Paper Finalist

[SC'20] "Zerospy: Exploring the Software Inefficiencies with Redundant Zeros", Xin You, Hailong Yang, Zhongzhi Luan, Depei Qian, Xu Liu, The International Conference for High Performance Computing, Networking, Storage and Analysis, Nov 15-20, 2020, Atlanta, GA, USA.

[ICS'20] "What Every Scientific Programmer Should Know About Compiler Optimizations?", Jialiang Tan, Shuyin Jiao, Milind Chabbi, Xu Liu, The 34th ACM International Conference on Supercomputing, Jun 29 - Jul 2, 2020, Barcelona, Spain.

[ICSE'19] "Redundant Loads: A Software Inefficiency Indicator", Pengfei Su, Shasha Wen, Hailong Yang, Milind Chabbi, Xu Liu, The International Conference on Software Engineering, May 25 - Jun 1, 2019, Montreal, Canada. Acceptance ratio: 21% (109/529). Distinguished Paper Award

[ASPLOS'17] "RedSpy: Exploring Value Locality in Software", Shasha Wen, Milind Chabbi and Xu Liu, The 22nd International Conference on Architectural Support for Programming Languages and Operating Systems, Apr 8-12, 2017, Xi'an, China. ASPLOS Highlights

[ISMM'16] "Characterizing Emerging Heterogeneous Memory", Du Shen, Xu Liu and Felix Xiaozhu Lin, The 2016 ACM SIGPLAN International Symposium on Memory Management (ISMM), Jun 14, 2016, Santa Barbara, California, USA.

[PACT'15] "Runtime Value Numbering: A Profiling Technique to Pinpoint Redundant Computations", Shasha Wen, Xu Liu and Milind Chabbi, The 24th International Conference on Parallel Architectures and Compilation Techniques (PACT), Oct 18-21, 2015, San Francisco, California, USA.

[CGO'14] "Call Paths for Pin Tools", Milind Chabbi, Xu Liu and John Mellor-Crummey, The 2014 International Symposium on Code Generation and Optimization (CGO), Feb 15-19, 2014, Orlando, Florida, USA.

[CGO'12] "DeadSpy: a tool to pinpoint program inefficiencies", Milind Chabbi and John Mellor-Crummey, The 2012 International Symposium on Code Generation and Optimization, Mar 31 - Apr 04, 2012, San Jose, California, USA.