Learning Binary Code Representations for Security Applications

Prof. Heng Yin, Computer Science and Engineering Department, UCR

Learning a numeric representation (also known as embedded vector, or simply embedding) for a piece of binary code (an instruction, a basic block, a function, or even an entire program) has many important security applications, ranging from vulnerability search, plagiarism detection, to malware classification. By reducing a binary code with complex control-flow and data-flow dependencies into a numeric vector using deep learning techniques, we convert complex binary code detection and search problems into search of embeddings, which can be done in O(1) time and often can achieve even higher accuracy than traditional methods. In this talk, I am going to show how we can revolutionize several security applications using this approach, including vulnerability search, malware variant detection, and binary diffing. In the end, I'd like to shed some light on adversarial learning against this new approach.

Prof. Heng Yin