Audience
Researchers, practitioners, and students interested in performance, correctness, or security analysis of ARM/X86 binaries (compiled from C/C++/Fortran/Rust/Go).
Overview
Complex codebases with several layers of abstraction harbor many inefficiencies that increase execution time. These inefficiencies arise from many causes, including developers' inattention to performance; inappropriate choices of abstractions, algorithms, and data structures; and ineffective or detrimental compiler optimizations. Not all inefficiencies are easy to detect or eliminate through compiler optimization, because compilers are inherently limited by static analysis and by the scope of their optimizations. Classical "hotspot" performance analysis tools also fail to identify many kinds of software inefficiencies. Microscopic observation of whole executions at instruction- and operand-level granularity cuts through abstractions and helps expose inefficiencies hidden in complex programs.
Dynamic binary-instrumentation tools are widely used for microscopic program introspection in areas such as performance analysis, debugging, and software security. However, existing tools such as Pin, DynamoRIO, and Dyninst have complex APIs that make them difficult to use, and developing a useful tool on top of them takes substantial effort. Moreover, existing tools do not provide efficient APIs for attributing runtime measurements to execution contexts, primarily the calling context. Detailed call-path attribution of execution measurements enhances a tool's capability and usability.
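To make call-path attribution concrete, the following is a minimal Python sketch of a calling context tree (CCT) that charges a metric to the full call path active when an event occurs. It is a conceptual illustration only: the class and method names (`CallingContextTree`, `on_call`, `on_return`, `attribute`) are hypothetical and are not the API of DrCCTProf or of any tool named above.

```python
class CCTNode:
    """One calling context: a function reached through a specific chain of callers."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}   # callee name -> CCTNode
        self.metric = 0      # measurements charged to exactly this context

    def child(self, name):
        # Reuse the node if this caller has already called `name`.
        node = self.children.get(name)
        if node is None:
            node = CCTNode(name, parent=self)
            self.children[name] = node
        return node

    def path(self):
        # Walk parent links to recover the full call path.
        frames = []
        node = self
        while node is not None:
            frames.append(node.name)
            node = node.parent
        return " -> ".join(reversed(frames))


class CallingContextTree:
    """Hypothetical helper that tracks the current context as calls and returns occur."""

    def __init__(self):
        self.root = CCTNode("<root>")
        self.current = self.root

    def on_call(self, callee):
        self.current = self.current.child(callee)

    def on_return(self):
        if self.current.parent is not None:
            self.current = self.current.parent

    def attribute(self, amount=1):
        # Charge the measurement to the context that is active right now.
        self.current.metric += amount

    def report(self):
        # Map each call path with a nonzero metric to its total.
        out = {}
        stack = [self.root]
        while stack:
            node = stack.pop()
            if node.metric:
                out[node.path()] = node.metric
            stack.extend(node.children.values())
        return out


def demo():
    cct = CallingContextTree()
    cct.on_call("main")
    for caller, cost in (("parse", 3), ("render", 40)):
        cct.on_call(caller)
        cct.on_call("memcpy")
        cct.attribute(cost)   # same function, different calling context
        cct.on_return()
        cct.on_return()
    return cct.report()
```

In this sketch a flat, per-function profile would report a single total for `memcpy`, whereas the tree keeps the cost reached through `parse` separate from the cost reached through `render`, which is the extra diagnostic detail that call-path attribution provides.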
In this tutorial, we will introduce DrCCTProf, a library that efficiently collects execution-wide call paths and associates execution metrics with those call paths for use by fine-grained analysis tools. DrCCTProf is based on DynamoRIO, which supports both ARM and X86 binaries. DrCCTProf is simple to use and improves the diagnostic capabilities of fine-grained analysis tools. We will cover the simple yet effective DrCCTProf APIs that offer rich calling-context capabilities, the advanced features that attribute every memory access to the corresponding data object in the program, and, for advanced users, DrCCTProf internals. We will show example tools built atop DrCCTProf that detect certain classes of software inefficiencies, such as dead stores and redundant computations. Using these client tools, we will show how to pinpoint software inefficiencies in large, complex code bases and gain a deeper understanding of execution profiles, and how to use those findings to tune code, eliminate the inefficiencies, and obtain significant performance improvements. Finally, we will show DrCCTProf's visualization support, which makes the large volumes of data produced by fine-grained analysis easier to explore.
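As an illustration of one inefficiency class named above, the following Python sketch detects dead stores in a recorded memory trace. A store is treated as dead when the same address is written again with no intervening read. The trace format, the function name `find_dead_stores`, and the use of strings for calling contexts are assumptions made for this example; it is not the implementation of any DrCCTProf client, and it tracks whole addresses rather than individual bytes.

```python
from collections import Counter


def find_dead_stores(trace):
    """Count dead stores per (dead-write context, killing-write context) pair.

    `trace` is an iterable of (op, address, context) tuples, where op is
    "R" or "W" and context is the call path of the access (hypothetical format).
    """
    last_access = {}   # address -> (op, context) of the most recent access
    dead = Counter()
    for op, addr, ctx in trace:
        prev = last_access.get(addr)
        # Two consecutive writes to the same address: the first was never read.
        if op == "W" and prev is not None and prev[0] == "W":
            dead[(prev[1], ctx)] += 1
        last_access[addr] = (op, ctx)
    return dead


def demo():
    trace = [
        ("W", 0x10, "main -> init"),
        ("W", 0x10, "main -> compute"),   # kills the write made in init
        ("R", 0x10, "main -> report"),
        ("W", 0x10, "main -> compute"),   # not dead: a read intervened
        ("W", 0x20, "main -> init"),
        ("R", 0x20, "main -> report"),
    ]
    return find_dead_stores(trace)
```

Reporting each dead store as a pair of contexts shows both where the useless write happened and where it was overwritten. As a simplification, a final store that is never read before the trace ends is not counted here.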