Hump-and-dump: efficient generic unpacking using an ordered address execution histogram
Li Sun, RMIT University, Australia and Tim Ebringer, Tim Ebringer, Witham Laboratories, Australia
We present a new and efficient generic unpacking algorithm which effectively locates the OEP area of a packed program. The algorithm is based upon the dual observation that (a) even in a packed program, the OEP bytes are almost always only executed once, and (b) that most packers unpack the original program to an area of memory which has not been previously executed.
Given this, the technique relies upon creating a histogram of the addresses of executed instructions (EIP on x86). Whilst others have done this, the trick is to order the histogram by the last time an address is executed. Decryption, decompression and copying appear as large spikes at the start of the histogram, followed by a flat section, of height one, which is usually the OEP. We attach figures showing the histogram for Upack, on both linear and log scales, which clearly illustrate the OEP after the massive unpacking "hump".
This technique is extremely efficient to implement, and can compute the OEP "on-the-fly" in an emulator, or offline from a trace of EIP. For the Upack example shown, we need only 1.2K of memory to hold the necessary data structures, and computation is similarly cheap (and compatible with dynamic-translation emulators). Given the shape of the chart, and the fact that after the "hump" represents a good opportunity to dump the memory, we have given this technique the somewhat sordid name of "hump-and-dump".